Where am I? Using the SF package for R to locate points in an area
Ari Fenn, Researcher
February 10, 2021
We have some exciting research in the pipeline. Part of this involves matching specific locations to larger geographic areas. There are two ways to match locations to areas.
- We could write a long and cumbersome CASE WHEN statement.
- Since we have the latitude and longitude of each specific point and geographic boundaries of the area, we can link them through the fact that both are physical locations.
While it is too early to discuss the upcoming research’s specifics, I will work through an example of how many UTA stops are located in Utah small areas and which Utah small area has the most stops.
In this case, to link our specific locations, transit stops, regions, and Utah small areas, as physical locations, we will exploit the R package Simple Features, sf. The sf package is designed to quickly and simply work with spatial data. This package is frequently used to create maps in R easily, but we will exploit the package’s spatial join abilities for this demonstration.
To answer which Utah small area has the most UTA (Utah Transit Authority) stops, I will use publicly available data from the State of Utah. To better emulate a real-world data situation, one of the files will be a comma-separated value file (.csv), and the second will be a shapefile.
The first will be UTA stops, and these are available here. From the “Data” tab, choose the “Spreadsheet” option from the “Download” dropdown.
The second dataset I will use are the shapes of Utah small areas found here. Under the “Data” tab, choose the “Shapefile” option from the “Download” dropdown. This link will download the shapefile and accompanying files. Make sure to extract all files to your working directory.
From here, I will load the two data sets and convert them with the sf package to Simple Features. The UTA stops data is missing latitude and longitude for three stops. For this demonstration, I will filter them out.
When working with geographic data, it is important to remember that the data comes from a sphere (3D object) and is projected onto a flat (2D) object. This conversion means that some information will be lost in translation.
In the following code, I will use the Mercator Projection, which is the most well-known projection. While making areas in the North appear larger than areas in the South for this project, we only use the same projection for both datasets as we are more concerned with matching stops to small areas.
library(tidyverse)
library(sf)
stops <- read_csv('UTA_Stops_%26_Most_Recent_Ridership.csv') %>%
select(FID, City, ZipCode, County, UTA_StopID, Latitude, Longitude) %>%
filter(Latitude != 0) %>%
st_as_sf(coords = c("Longitude", "Latitude"), crs = 3857)
utsmallarea <- st_read("HealthSmallStatisticalAreas2020.shp") %>%
st_transform(crs = 3857) # Mercator Projection
Now that the UTA Stops data is “stops” and Utah small areas are “utsmallarea,” we want to match the UTA stops with the small area in which they are located. Looking at both datasets, we see that the geometry variable for “stops” has a single set of points. Still, the geometry variable for “utsmallarea” has multiple sets of points.
head(utsmallarea)
## Simple feature collection with 6 features and 5 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: -12654050 ymin: 4969716 xmax: -12245010 ymax: 5160483
## projected CRS: WGS 84 / Pseudo-Mercator
## OBJECTID SmallAreaN SmallAre_1 SHAPE_Leng SHAPE_Area
## 1 1 -- Not assigned 335004.38 3569799641
## 2 2 -- Not assigned 2343.89 256560
## 3 3 -- Not assigned 129015.67 523638467
## 4 4 1 Brigham City 198637.88 887185994
## 5 5 10 Riverdale 55477.75 56762254
## 6 6 11 Clearfield Area/Hooper 108289.15 264496136
## geometry
## 1 POLYGON ((-12245107 5011460...
## 2 POLYGON ((-12361673 5012913...
## 3 POLYGON ((-12631967 5159977...
## 4 POLYGON ((-12465290 5091622...
## 5 POLYGON ((-12465983 5042874...
## 6 POLYGON ((-12466762 5034251...
head(stops)
## Simple feature collection with 6 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -111.8925 ymin: 40.86167 xmax: -111.8798 ymax: 40.86523
## projected CRS: WGS 84 / Pseudo-Mercator
## # A tibble: 6 x 6
## FID City ZipCode County UTA_StopID geometry
## <dbl> <chr> <dbl> <chr> <dbl> <POINT [m]>
## 1 1 Bountiful 84010 Davis 112092 (-111.8798 40.86523)
## 2 2 Bountiful 84010 Davis 112093 (-111.8821 40.86246)
## 3 3 Bountiful 84010 Davis 112094 (-111.8837 40.86187)
## 4 4 Bountiful 84010 Davis 112095 (-111.8925 40.86169)
## 5 5 Bountiful 84010 Davis 112096 (-111.8888 40.86167)
## 6 6 Bountiful 84010 Davis 112097 (-111.8922 40.86221)
The goal is to sort each point in “stops” into an area in “utsmallarea.” To do this, I will use a spatial join:
joined <- st_join(stops, utsmallarea, join = st_contains)
The “join = st_contains” simply states that if a point in the left dataset, “stops,” is in an area in the suitable dataset, “utsmallarea,” the information from “utsmallarea” is joined with the information from “stops.”
## Simple feature collection with 6 features and 10 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -111.8925 ymin: 40.86167 xmax: -111.8798 ymax: 40.86523
## projected CRS: WGS 84 / Pseudo-Mercator
## # A tibble: 6 x 11
## FID City ZipCode County UTA_StopID geometry OBJECTID
## <dbl> <chr> <dbl> <chr> <dbl> <POINT [m]> <int>
## 1 1 Boun~ 84010 Davis 112092 (-111.8798 40.86523) NA
## 2 2 Boun~ 84010 Davis 112093 (-111.8821 40.86246) NA
## 3 3 Boun~ 84010 Davis 112094 (-111.8837 40.86187) NA
## 4 4 Boun~ 84010 Davis 112095 (-111.8925 40.86169) NA
## 5 5 Boun~ 84010 Davis 112096 (-111.8888 40.86167) NA
## 6 6 Boun~ 84010 Davis 112097 (-111.8922 40.86221) NA
## # ... with 4 more variables: SmallAreaN <fct>, SmallAre_1 <fct>,
## # SHAPE_Leng <dbl>, SHAPE_Area <dbl>
We see that the first stop now has the resulting information about the small area; the name, number, shape length, and area. From here, I group by small area and can see that Bountiful has the most stops with 284. Additional analysis using data about each small area is as simple as joining two data sets. Hopefully, this will help add spatial data to your future research, and not forget to look out for our research coming this summer.