Today’s post is by Kyle Walker, a professor of geography at Texas Christian University. I’ve been a fan of Kyle’s work for a while. When I saw that he wrote a package for accessing the Census Bureau’s International Data Base, I asked him to write a guest post about it.
The US Census Bureau’s International Data Base (IDB) is one of the best resources on the web for obtaining both historical estimates and future projections of international demographic indicators. I’ve long used the IDB in my teaching, generally downloading data extracts through its web interface. However, the Census Bureau also makes the IDB accessible via its API, which is much more convenient for programmers. Earlier this year, I wrote the R package idbr (https://github.com/walkerke/idbr) to help R programmers use the IDB in their projects.
idbr is available on CRAN, and can be installed with the command install.packages('idbr'). Once installed, start up your idbr session with the following code:
library(idbr)
idb_api_key('Your API key goes here')
To use the US Census Bureau API, you’ll need to get an API key from http://api.census.gov/data/key_signup.html. The key is free and takes only a moment to obtain. Supply the key to the idb_api_key function to set your API key as an environment variable in your R session; you only need to do this once per session.
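If you would rather not paste the key into every script, one option is to store it outside your code and read it in at the start of a session. The following is a minimal sketch, not part of idbr itself: the environment variable name CENSUS_API_KEY is a hypothetical choice (for example, set in your ~/.Renviron file), and the idbr call only runs if the package is installed and a key is found.

```r
# Read the key from a user-defined environment variable.
# "CENSUS_API_KEY" is a hypothetical name chosen for this sketch.
key <- Sys.getenv("CENSUS_API_KEY", unset = "")

if (nzchar(key) && requireNamespace("idbr", quietly = TRUE)) {
  # Hand the key to idbr for the rest of the session
  idbr::idb_api_key(key)
} else {
  message("No key found (or idbr is not installed); set CENSUS_API_KEY first.")
}
```

This keeps the key out of any code you share or publish.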
There are two main functions in idbr:
idb1 taps into the 1-year-age-band IDB dataset; this dataset includes population counts for single years of age, optionally for specific age ranges or by sex. At its simplest, a user can request data for a given country in a given year. Countries are identified by their FIPS 10-4 codes, which can be looked up with the countrycode package if you don’t know the code for your country of interest.
library(countrycode)
countrycode('Canada', 'country.name', 'fips104')

[1] "CA"
The country code for Canada is “CA”; this can now be supplied to the idb1 function:
ca <- idb1('CA', 2016)
head(ca)

Source: local data frame [6 x 6]

    AGE AREA_KM2   NAME    POP  FIPS  time
  (dbl)    (dbl)  (chr)  (dbl) (chr) (dbl)
1     0  9093507 Canada 361796    CA  2016
2     1  9093507 Canada 361650    CA  2016
3     2  9093507 Canada 361299    CA  2016
4     3  9093507 Canada 360863    CA  2016
5     4  9093507 Canada 360321    CA  2016
6     5  9093507 Canada 360143    CA  2016
The function returns a data frame with the default variables available in the 1-year dataset.
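Because the result is an ordinary data frame, the usual R tools apply to it. As a small illustration that runs without an API call, the sketch below rebuilds by hand the six rows printed above (it is not a fresh query) and totals the population under age 5. The object names ca_head and under5 are made up for this example.

```r
# Rebuild the six rows shown in the printed output (ages 0 to 5, Canada, 2016)
ca_head <- data.frame(
  AGE = 0:5,
  AREA_KM2 = 9093507,
  NAME = "Canada",
  POP = c(361796, 361650, 361299, 360863, 360321, 360143),
  FIPS = "CA",
  time = 2016,
  stringsAsFactors = FALSE
)

# Total population aged 0 through 4
under5 <- sum(ca_head$POP[ca_head$AGE < 5])
under5
```

The same subsetting would work on the full data frame returned by idb1, which covers every single year of age.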
More demographic indicators are available via the idb5 function, which includes population data by 5-year age bands as well as measurements of birth, death, and migration rates. Variables are accessible by supplying a variable name or a concept, which refers to a group of variables. To get a list of available variable and concept names, call idb_variables() and idb_concepts(), respectively.
For example, we can specify the concept “Fertility rates” to get all of the fertility variables for Canada in 2016:
idb5('CA', 2016, concept = 'Fertility rates')

Source: local data frame [1 x 12]

  ASFR15_19 ASFR20_24 ASFR25_29 ASFR30_34 ASFR35_39 ASFR40_44 ASFR45_49    GRR    SRB    TFR  FIPS  time
      (dbl)     (dbl)     (dbl)     (dbl)     (dbl)     (dbl)     (dbl)  (dbl)  (dbl)  (dbl) (chr) (dbl)
1      13.8      53.3     101.9     101.4      41.7       7.1       0.3 0.7767 1.0563 1.5972    CA  2016
The idb5 function as called above returns age-specific fertility rates for five-year age bands, as well as the gross reproduction rate, sex ratio at birth, and total fertility rate.
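For readers less familiar with these indicators, the arithmetic connecting them can be checked by hand. The sketch below copies the values from the output above and assumes the standard textbook definitions: age-specific rates are births per 1,000 women, each band spans five years, and the gross reproduction rate scales the total fertility rate by the share of female births implied by the sex ratio at birth. The variable names are illustrative and not part of idbr, and small differences from the published figures are expected because the printed rates are rounded.

```r
# Age-specific fertility rates for Canada, 2016 (births per 1,000 women),
# copied from the idb5 output above for ages 15-19 through 45-49
asfr <- c(13.8, 53.3, 101.9, 101.4, 41.7, 7.1, 0.3)

# Total fertility rate: each rate applies for five years of a woman's life
tfr_approx <- 5 * sum(asfr) / 1000

# Gross reproduction rate: daughters only, using the sex ratio at birth
# (male births per female birth)
srb <- 1.0563
grr_approx <- tfr_approx / (1 + srb)

round(c(TFR = tfr_approx, GRR = grr_approx), 4)
```

Both approximations land within rounding distance of the TFR (1.5972) and GRR (0.7767) reported by the IDB.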
These demographic indicators are even more useful, however, when you can compare them across countries or over time. You can supply a vector of years to the idb1 function to get single-year-of-age population counts over multiple years; idb5 accepts both a vector of country codes and a vector of years. From here, you can design analyses or data visualizations to examine these temporal or cross-country comparisons.
Two animated examples with code are below. The examples use the gganimate extension to ggplot2, which wraps the animation package; to reproduce them, you’ll need ImageMagick installed on your machine and on your system PATH.
Animated population pyramid of Nigeria, 1990-2050 (projected)
library(idbr)
library(ggplot2)
library(dplyr)
library(gganimate)
library(animation)

idb_api_key("Your key goes here")

male <- idb1('NI', 1990:2050, sex = 'male') %>%
  mutate(POP = POP * -1,
         SEX = 'Male')

female <- idb1('NI', 1990:2050, sex = 'female') %>%
  mutate(SEX = 'Female')

nigeria <- rbind(male, female)

g1 <- ggplot(nigeria, aes(x = AGE, y = POP, fill = SEX, width = 1, frame = time)) +
  coord_fixed() +
  coord_flip() +
  geom_bar(data = subset(nigeria, SEX == "Female"), stat = "identity", position = 'identity') +
  geom_bar(data = subset(nigeria, SEX == "Male"), stat = "identity", position = 'identity') +
  scale_y_continuous(breaks = seq(-5000000, 5000000, 2500000),
                     labels = c('5m', '2.5m', '0', '2.5m', '5m'),
                     limits = c(min(nigeria$POP), max(nigeria$POP))) +
  theme_minimal(base_size = 14, base_family = "Tahoma") +
  scale_fill_manual(values = c('#98df8a', '#2ca02c')) +
  ggtitle('Population structure of Nigeria,') +
  ylab('Population') +
  xlab('Age') +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  labs(caption = 'Chart by @kyle_e_walker | Data source: US Census Bureau IDB via the idbr R package') +
  guides(fill = guide_legend(reverse = TRUE))

gg_animate(g1, interval = 0.1, ani.width = 700, ani.height = 600)
Life expectancy at birth by sex in the former USSR, 1989-2016
library(idbr)
library(dplyr)
library(ggplot2)
library(tidyr)
library(countrycode)
library(gganimate)
library(tweenr)

ctrys <- countrycode(c('Russia', 'Ukraine', 'Belarus', 'Moldova', 'Georgia',
                       'Kazakhstan', 'Uzbekistan', 'Lithuania', 'Latvia', 'Estonia',
                       'Kyrgyzstan', 'Tajikistan', 'Turkmenistan', 'Armenia', 'Azerbaijan'),
                     'country.name', 'fips104')

idb_api_key("Your API key here")

full <- idb5(country = ctrys, year = 1989:2016,
             variables = c('E0_F', 'E0_M'), country_name = TRUE)

tmp <- full %>%
  filter(time == 1989) %>%
  arrange(E0_F)

ord <- as.character(as.vector(tmp$NAME))

dft <- full %>%
  mutate(diff = E0_F - E0_M,
         ease = 'cubic-in-out') %>%
  select(-FIPS) %>%
  rename(Male = E0_M, Female = E0_F) %>%
  tween_elements(time = 'time', group = 'NAME', ease = 'ease', nframes = 500) %>%
  gather(Sex, value, Male, Female, -diff, -.group) %>%
  mutate(.group = factor(.group, levels = ord))

g <- ggplot() +
  geom_point(data = dft,
             aes(x = value, y = .group, color = Sex, frame = .frame),
             size = 14) +
  scale_color_manual(values = c('darkred', 'navy')) +
  geom_text(data = dft,
            aes(x = value, y = .group, frame = .frame,
                label = as.character(round(dft$value, 1))),
            color = 'white', fontface = 'bold') +
  geom_text(data = dft,
            aes(x = 80, y = 1.5, frame = .frame, label = round(dft$time, 0)),
            color = 'black', size = 12) +
  theme_minimal(base_size = 16, base_family = "Tahoma") +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank()) +
  labs(y = '', x = '', color = '',
       caption = 'Data source: US Census Bureau IDB via the idbr R package; chart by @kyle_e_walker',
       title = 'Life expectancy at birth in the former USSR, 1989-2016')

gg_animate(g, interval = 0.05, ani.width = 750, ani.height = 650, title_frame = FALSE)