# Bringing the family together: Finding the center of geographic points in R

My husband’s family throws a family reunion every year and this year we’ve been tasked with co-planning it. We were trying to decide on the best location for everyone, so I embarked on a mission to find the center of all of our residences.

```
library(tidyverse)
library(leaflet)
```

## Geocoding the locations

I began by putting together a quick `.csv`

of the locations and number of family members in each.

```
df <- read_csv("locations.csv")
DT::datatable(df)
```

To get the longitude and latitude for each location, I used the **opencage** package. This 📦 requires an API key – you can get a free one from their website. For tips on adding environment variables to your `.Renviron`

, check out Jenny Bryan’s Happy Git and GitHub for the useR. If you add this key to your `.Renviron`

as `OPENCAGE_KEY`

the package will automatically locate and use this.

Maëlle Salmon has a blog post for everything! I based my geocoding on the methods she used in her post on the R-Ladies global tour.
The `open_forward()`

function will give multiple results for each call, along with a confidence score for each. We can sort by the confidence score and take the top result. Since I just want to grab the longitude and latitude for the most likely location, I have made a small function to grab that, based on Maëlle’s blog post.

```
geocode_family <- function(city) {
opencage::opencage_forward(city)$results %>%
filter(components._type %in% c("city", "state")) %>%
arrange(desc(confidence)) %>%
select(lat = geometry.lat, lon = geometry.lng) %>%
slice(1)
}
```

```
df <- df %>%
bind_cols(map_df(.$city, geocode_family))
```

## Finding the center

It turns out there are many ways to find the center of geographic locations. The easiest way is to average all of the longitude points and average all of the latitude points. This can optionally be weighted by the number of people at each point by taking a weighted mean. I call this the `geographic_average()`

.

```
geographic_average <- function(lon, lat, weight = NULL) {
if (is.null(weight)) {
weight <- rep(1, length(lon))
}
lon <- weighted.mean(lon, w = weight)
lat <- weighted.mean(lat, w = weight)
data.frame(lon = lon, lat = lat)
}
```

A bit of a more complex version uses vector algebra – first converting the points to radians, then sticking them on a Cartesian plane. This was well explained on Ask Dr. Math. I call this the `geographic_midpoint()`

.

```
geographic_midpoint <- function(lon, lat, weight = NULL) {
if (is.null(weight)) {
weight <- rep(1, length(lon))
}
# degrees to radians
lat <- lat * pi / 180
lon <- lon * pi / 180
# cartesian coordinates
x <- cos(lat) * cos(lon)
y <- cos(lat) * sin(lon)
z <- sin(lat)
# weighted mean
x <- weighted.mean(x, w = weight)
y <- weighted.mean(y, w = weight)
z <- weighted.mean(z, w = weight)
# convert to lat and lon
lon <- atan2(y, x) * 180 / pi
hyp <- sqrt(x * x + y * y)
lat <- atan2(z, hyp) * 180 / pi
data.frame(lon = lon, lat = lat)
}
```

A final way to do this is using the **geosphere** package along with the `centroid`

function. This calculates the center by projecting the points onto a spherical polygon and finding the centroid.

I’ll use all three methods to demonstrate their difference. Some of my husband’s family live in Europe, so I will calculate the center both with and without them.

```
df_nospain <- df %>%
filter(city != "Madrid, Spain")
center <- list(
"Unweighted Average" = geographic_average(
lon = df$lon,
lat = df$lat),
"Weighted Average" = geographic_average(
lon = df$lon,
lat = df$lat,
weight = df$number),
"Unweighted Midpoint" = geographic_midpoint(
lon = df$lon,
lat = df$lat),
"Weighted Midpoint" = geographic_midpoint(
lon = df$lon,
lat = df$lat,
weight = df$number),
"Centroid" = data.frame(geosphere::centroid(
df[, c("lon", "lat")])),
"Unweighted Average (No Spain)" = geographic_average(
lon = df_nospain$lon,
lat = df_nospain$lat),
"Weighted Average (No Spain)" = geographic_average(
lon = df_nospain$lon,
lat = df_nospain$lat,
weight = df_nospain$number),
"Unweighted Midpoint (No Spain)" = geographic_midpoint(
lon = df_nospain$lon,
lat = df_nospain$lat),
"Weighted Midpoint (No Spain)" = geographic_midpoint(
lon = df_nospain$lon,
lat = df_nospain$lat,
weight = df_nospain$number),
"Centroid (No Spain)" = data.frame(geosphere::centroid(
df_nospain[, c("lon", "lat")]))
)
center <- bind_rows(center, .id = "calculation")
```

## Visualizing

Now that we’ve calculated the various “center” points, we can plot them using **leaflet**. Since I have so many different things to visualize, I am going to use a color palette to help me choose the marker colors! You can install this package from GitHub with `devtools::install_github("EmilHvitfeldt/paletteer")`

What a great excuse to try out Emil Hvitfeldt’s new **paletteer** 📦!

```
cols <- paletteer::paletteer_d("LaCroixColoR",
"PassionFruit",
n = 12,
type = "continuous")
```

```
leaflet() %>%
addTiles() %>%
setView(lng = -82, lat = 38, zoom = 04) %>%
addCircleMarkers(lng = df$lon,
lat = df$lat,
color = cols[1],
label = df$city,
radius = 5) %>%
addCircleMarkers(lng = center$lon,
lat = center$lat,
color = cols[2:12],
label = center$calculation,
radius = 8)
```

The various methods seemed to yield very similar results when not including Spain, but otherwise they are noticeably different. Without Spain, it looks like if we want the next family reunion to be equidistant (as the crow flies 🐦), then we will be somewhere in West Virginia or southern Ohio. If we incorporate a weight based on how many relatives live in each location, it looks like we’ll end up in Wayne National Forest (coincidentally, we held the reunion here 4 years ago!). If we include the European relatives, using the average or midpoint methods, we could end up in Pennsylvania, Maryland, or Delaware; otherwise we will be taking a cruise 🛳 across the Atlantic!

comments powered by Disqus