I am using the tidycensus R package to pull in census data and geometries. I want to calculate population densities and have the results match what I see on censusreporter.org. I am noticing a difference between the area returned by tidycensus and the area I calculate myself using the sf::st_area() function from the sf package.
library(tidyverse)
library(tidycensus)
census_api_key("my_api_key")
library(sf)
options(tigris_use_cache = TRUE)
pop_texas <-
  get_acs(geography = 'state',
          variables = "B01003_001", # Total Population
          year = 2020,
          survey = 'acs5',
          keep_geo_vars = TRUE,
          geometry = TRUE) %>%
  filter(GEOID == '48') # Filter to Texas
Because I set keep_geo_vars = TRUE, the result includes an ALAND column, which I believe is the correct land area of the returned geography in square meters (m^2).
> pop_texas$ALAND %>% format(big.mark=",")
[1] "676,680,588,914"
# Conversion to square miles
> (pop_texas$ALAND / 1000000 / 2.5899881) %>% format(big.mark=",")
[1] "261,267.8"
When I convert the ALAND amount to square miles, I get the same number as shown on censusreporter.org.
I have also tried to calculate the area using the sf::st_area() function, but I get a different result:
> sf::st_area(pop_texas) %>% format(big.mark=",", scientific=FALSE)
[1] "688,276,954,146 [m^2]"
# Conversion to square miles
> (sf::st_area(pop_texas) / 1000000 / 2.5899881) %>%
+ as.numeric() %>%
+ format(big.mark=",", scientific=FALSE)
[1] "265,745.2"
Please let me know if there is something I am missing to reconcile these numbers. I would expect to get the same results either directly through tidycensus or calculating the area using sf::st_area().
Right now I am off by a lot:
> (pop_texas$ALAND - as.numeric(st_area(pop_texas)) ) %>%
+ format(big.mark=",")
[1] "-11,596,365,232"
If you want the "official" area of a shape like Texas, you should always use ALAND or another published area value. st_area() calculates the area from the polygon geometry, which is always going to be a simplified and imperfect representation of Texas (or any other area). For smaller shapes (like Census tracts) the calculations will probably be pretty close; for larger shapes like states (especially those with complex coastal geography, like Texas) you're going to be further off. Part of the gap is likely also that ALAND counts land only, so a polygon that covers inland water as well will measure larger than ALAND no matter how detailed the boundary is.
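As a minimal sketch of the ALAND-based approach, the base R snippet below converts a land area in square meters to square miles and divides a population by it. The names SQ_M_PER_SQ_MI, sq_m_to_sq_mi() and density_per_sq_mi() are hypothetical helpers made up for illustration, not part of tidycensus or sf. The conversion factor and the Texas ALAND value are copied from the question.

```r
# Square meters in one square mile, using the factor from the question:
# 1e6 m^2 per km^2 times 2.5899881 km^2 per square mile.
SQ_M_PER_SQ_MI <- 1e6 * 2.5899881

# Hypothetical helper: convert an area in square meters (e.g. ALAND)
# to square miles.
sq_m_to_sq_mi <- function(area_sq_m) {
  area_sq_m / SQ_M_PER_SQ_MI
}

# Hypothetical helper: people per square mile of land area.
density_per_sq_mi <- function(population, land_area_sq_m) {
  population / sq_m_to_sq_mi(land_area_sq_m)
}

# ALAND for Texas, copied from the output shown in the question.
texas_aland <- 676680588914
texas_sq_mi <- sq_m_to_sq_mi(texas_aland)

# Should agree with the 261,267.8 square miles reported in the question.
print(format(texas_sq_mi, big.mark = ","))
```

With the data from the question, this would be called as density_per_sq_mi(pop_texas$estimate, pop_texas$ALAND), where estimate is the population column that get_acs() returns. Dividing by ALAND rather than by st_area() keeps the denominator a published land area, which is probably why it lines up with the figure on censusreporter.org.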