Hans Rosling-style Animated Bubble Plots in R
The post-Soviet Edition
Purpose
In this blog post, I’ll show how to make Hans Rosling-style interactive bubble charts with R. The interactive plot we’ll build looks like this:
Hans Rosling-style Bubble Plots
Hans Rosling was a physician, academic, public speaker and enthusiastic promoter of a fact-based worldview. I first heard about him when I was taking intro courses on statistics, and I remember thinking even back then that his visualizations were impressive. I also (finally) managed to read his book Factfulness (Rosling, Rönnlund, and Rosling 2020) recently, and I thought it would be nice to try out the kind of visualizations he uses in his presentations.
Luckily, it turned out that imitating Rosling’s visualizations isn’t so hard these days if you know a little bit of R. I started my investigation by simply Googling “hans rosling style plots in r.” The first instructions I found were written by Tristan Ganry (2018), and I learned from them that you can generate Rosling-style plots easily using either the gganimate package (for creating GIF files) or the plotly package (for HTML files). Since I was mainly interested in creating interactive HTML plots, this post deals only with the plotly method.
Interactive (Bubble) Charts Using the plotly Package
So, how do we imitate the kind of bubble chart shown under the title of this blog post? After some Googling, I found the needed instructions on the plotly package’s webpage under the topic “Intro to Animations in R” (plotlygraphs 2017). The example data used in these instructions comes from Gapminder, and there’s an R package for this dataset. Install it once with install.packages("gapminder"), and then load it by typing library(gapminder) in your RMarkdown code chunk or your R console:
library(gapminder)
gapminder
## # A tibble: 1,704 × 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
## 7 Afghanistan Asia 1982 39.9 12881816 978.
## 8 Afghanistan Asia 1987 40.8 13867957 852.
## 9 Afghanistan Asia 1992 41.7 16317921 649.
## 10 Afghanistan Asia 1997 41.8 22227415 635.
## # … with 1,694 more rows
Here’s an example of an interactive plot from plotly’s webpage (ibid.):
library(plotly)
library(gapminder)
df <- gapminder
fig <- df %>%
  plot_ly(
    x = ~gdpPercap,
    y = ~lifeExp,
    size = ~pop,
    color = ~continent,
    frame = ~year,
    text = ~country,
    hoverinfo = "text",
    type = 'scatter',
    mode = 'markers'
  )
fig <- fig %>% layout(
  xaxis = list(
    type = "log"
  )
)
fig
In the chart above, life expectancy is displayed on the Y-axis (the vertical axis), gross domestic product (GDP) per capita is displayed on the X-axis (the horizontal axis, on a logarithmic scale), bubble size captures the population of each country (the bigger the bubble, the more populous the country), and color shows which continent a country belongs to. By pressing the “Play” button, we can see how these variables changed between 1952 and 2007.
As we can see from the chart, life expectancy and wealth are related to each other: as one grows, the other tends to grow too. We can also see that in 1952 Asian and African countries were clustered in the bottom-left corner of the graph (poor and short life expectancy), whereas European and Oceanian countries were closer to the top-right corner (rich and long life expectancy). Most of the countries in the “Americas” group, in turn, were somewhere in between. However, as we press the “Play” button and watch how these variables change over time, we can see that more and more countries cluster in the top-right corner of the chart. This is exactly one of Rosling’s main points: it’s dated to speak about “the West and the rest” because today many non-Western countries have caught up with the “West” in life expectancy and wealth.
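To put a rough number on the relationship described above, here is a minimal sketch that computes the correlation between life expectancy and log-transformed GDP per capita for each year in the gapminder data. The log transform is my own choice, made to mirror the log-scaled X-axis of the chart; a correlation coefficient is only a crude summary and is not part of the plotly example itself.

```r
library(gapminder)

# Correlation between log10(GDP per capita) and life expectancy, per year.
# The log10 transform is an assumption that mirrors the chart's log X-axis.
years <- sort(unique(gapminder$year))
cor_by_year <- sapply(years, function(y) {
  d <- gapminder[gapminder$year == y, ]
  cor(log10(d$gdpPercap), d$lifeExp)
})
names(cor_by_year) <- years

# One correlation coefficient per survey year
round(cor_by_year, 2)
```

Positive values in every year would be consistent with what the animation shows visually.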
However, since I specialize in Russia and the countries of the former Soviet Union, I decided to do my Rosling-style bubble chart only with post-Soviet countries. Also, since all post-Soviet countries are located either in Europe or in Asia, I wanted to create custom country groups for my own visualization (Rosling usually divides countries by continent, as seen in the visualization above).
Data
As always with data analysis, the first step is to find an appropriate dataset for your purposes. First, I played around with the Gapminder Foundation’s data on life expectancy, GDP per capita, and population (available at: Gapminder Foundation 2022), but I eventually decided to use data from Our World in Data (2022), which, by the way, is perhaps my favorite data source.
# Import libraries
library(tidyverse)
library(here)
# Import GDP data (here() builds the path relative to the project root)
gdp <- read.csv(here("content/blog/psoviet-bubble-charts/data/gdp-per-capita-worldbank.csv"))
# Import life expectancy data
life_expectancy <- read.csv(here("content/blog/psoviet-bubble-charts/data/life-expectancy.csv"))
# Import population data
population <- read.csv(here("content/blog/psoviet-bubble-charts/data/population-since-1800.csv"))
# Put all data frames into list
df_list <- list(gdp, life_expectancy, population)
# Merge all data frames in list (merge() joins on the columns they share)
full_data <- Reduce(function(x, y) merge(x, y, all = TRUE), df_list)
# Filter for post-Soviet countries
p_soviet <- full_data %>%
  filter(Entity %in% c("Russia",
                       "Belarus",
                       "Ukraine",
                       "Estonia",
                       "Latvia",
                       "Lithuania",
                       "Armenia",
                       "Azerbaijan",
                       "Georgia",
                       "Turkmenistan",
                       "Uzbekistan",
                       "Kazakhstan",
                       "Kyrgyzstan",
                       "Tajikistan",
                       "Moldova"),
         # Let's select only years 1990-2019
         Year >= 1990 & Year <= 2019) %>%
  # Rename some lengthy variable names for the new dataset
  rename(gdp_per_capita = GDP.per.capita..PPP..constant.2017.international...,
         life_exp = Life.expectancy,
         pop = Population..historical.estimates.,
         country = Entity,
         cntr_code = Code,
         year = Year)
# Add country groups to post-Soviet states data
p_soviet <- p_soviet %>%
  mutate(
    country_group = case_when(
      # CIS-membership
      country %in% c("Belarus",
                     "Kazakhstan",
                     "Kyrgyzstan",
                     "Russia",
                     "Tajikistan",
                     "Uzbekistan") ~ "CIS-country",
      # EU-membership
      country %in% c("Estonia",
                     "Latvia",
                     "Lithuania") ~ "EU-member",
      # EU-partnership
      country %in% c("Georgia",
                     "Ukraine") ~ "EU-partner",
      # EU-partnership and CIS-member
      country %in% c("Armenia",
                     "Azerbaijan",
                     "Moldova") ~ "CIS-country & EU-partner",
      # Other categories
      country == "Turkmenistan" ~ "Other"
    )
  )
# Look at the data
head(p_soviet)
## country cntr_code year gdp_per_capita life_exp pop
## 1 Armenia ARM 1990 5180.061 67.879 3538164
## 2 Armenia ARM 1991 4616.945 67.870 3505249
## 3 Armenia ARM 1992 2735.787 67.990 3442820
## 4 Armenia ARM 1993 2554.172 68.218 3363111
## 5 Armenia ARM 1994 2757.232 68.538 3283664
## 6 Armenia ARM 1995 3008.233 68.938 3217349
## country_group
## 1 CIS-country & EU-partner
## 2 CIS-country & EU-partner
## 3 CIS-country & EU-partner
## 4 CIS-country & EU-partner
## 5 CIS-country & EU-partner
## 6 CIS-country & EU-partner
tail(p_soviet)
## country cntr_code year gdp_per_capita life_exp pop country_group
## 445 Uzbekistan UZB 2014 5764.493 70.671 30426394 CIS-country
## 446 Uzbekistan UZB 2015 6086.716 70.928 30929556 CIS-country
## 447 Uzbekistan UZB 2016 6346.335 71.171 31441753 CIS-country
## 448 Uzbekistan UZB 2017 6518.805 71.388 31959774 CIS-country
## 449 Uzbekistan UZB 2018 6755.481 71.573 32476232 CIS-country
## 450 Uzbekistan UZB 2019 7014.325 71.725 32981714 CIS-country
It seems that the data wrangling went as intended. Let’s now proceed to the actual interactive plotting.
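Since head() and tail() only show a handful of rows, it can help to see what the Reduce() and merge() step does on a small example. The sketch below uses three made-up toy data frames (gdp_toy, life_toy and pop_toy are hypothetical stand-ins for the CSV files, with invented values) to show that merge() joins on the shared columns and that all = TRUE keeps unmatched rows, filling the gaps with NA.

```r
# Toy stand-ins for the three CSV files (all values are made up)
gdp_toy <- data.frame(Entity = c("A", "A", "B"),
                      Code   = c("AAA", "AAA", "BBB"),
                      Year   = c(1990, 1991, 1991),
                      gdp    = c(100, 110, 200))
life_toy <- data.frame(Entity = c("A", "A", "B", "B"),
                       Code   = c("AAA", "AAA", "BBB", "BBB"),
                       Year   = c(1990, 1991, 1990, 1991),
                       life   = c(70, 71, 65, 66))
pop_toy <- data.frame(Entity = c("A", "B"),
                      Code   = c("AAA", "BBB"),
                      Year   = c(1991, 1991),
                      pop    = c(1000, 2000))

# merge() joins on all shared column names (Entity, Code, Year);
# all = TRUE keeps rows without a match and fills the gaps with NA
merged_toy <- Reduce(function(x, y) merge(x, y, all = TRUE),
                     list(gdp_toy, life_toy, pop_toy))
merged_toy

# Count missing values per column to see where the gaps are
colSums(is.na(merged_toy))
```

Running the same colSums(is.na(...)) check on the real merged data is a quick way to spot variables or years with gaps before plotting.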
Interactive Bubble Chart: the post-Soviet Edition
Now that the data is imported and wrangled, all we need to do is pass it to the plot_ly() function from the plotly package.
Note that I filter the year variable to start from 1995, because GDP per capita data for Estonia, Latvia, and Lithuania is missing for 1990–1994 in this dataset.
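A cutoff year like this can also be found programmatically rather than by eye. The following is a minimal sketch on a made-up toy data frame (the toy object and its values are hypothetical, not the real data): it lists the years in which at least one country has no GDP figure and takes the year after the last such gap as the starting point.

```r
library(dplyr)

# Toy data: Estonia lacks GDP values before 1995 (all values are made up)
toy <- data.frame(
  country = rep(c("Estonia", "Russia"), each = 4),
  year = rep(1993:1996, times = 2),
  gdp_per_capita = c(NA, NA, 9000, 9500,
                     8000, 7500, 7200, 7000)
)

# Years in which at least one country has a missing GDP value
incomplete_years <- toy %>%
  group_by(year) %>%
  summarise(n_missing = sum(is.na(gdp_per_capita))) %>%
  filter(n_missing > 0) %>%
  pull(year)

# First year after the last gap
first_complete_year <- max(incomplete_years) + 1
first_complete_year
```

This simple rule assumes the gaps sit at the start of the series; if values were missing in the middle of the period, the years would need to be inspected individually.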
# Import the plotly package
library(plotly)
# Create the plotly plot
soviet_fig <- p_soviet %>%
  filter(year >= 1995) %>%
  plot_ly(
    x = ~gdp_per_capita, # defines the X-axis/horizontal axis
    y = ~life_exp, # defines the Y-axis/vertical axis
    size = ~pop, # maps population to the size of the "bubbles"
    color = ~country_group, # defines categories
    frame = ~year, # defines the animation frames (one frame per year)
    # Adding hover over options to scatter dots
    text = ~paste("</br> Country: ", country,
                  "</br> Life Expectancy: ", round(life_exp, digits = 0),
                  "</br> GDP per capita: ", round(gdp_per_capita, digits = 0),
                  "</br> Population: ", pop),
    # Adding hex colors to country groups
    colors = c("#ff031c", # CIS-country
               "#ffa303", # CIS-country & EU-partner
               "#0307ff", # EU-member
               "#1bab3a", # EU-partner
               "#03dbfc"), # Other
    alpha = 0.5, # defines the opacity/transparency of bubbles
    sizes = c(10, 1000), # defines the range of bubble sizes: play around with these values
    hoverinfo = "text", # defines what is displayed when hovering over the bubbles
    type = "scatter", # defines chart type
    mode = "markers")
# Add labels
soviet_fig <- soviet_fig %>%
  layout(title = list(text = "Life expectancy, GDP per capita and population in post-Soviet\ncountries (in 1995-2019)",
                      x = 0.1), # this changes the title position to left
         xaxis = list(title = "GDP per capita (in constant international-$)"),
         yaxis = list(title = "Life expectancy (in years)"),
         legend = list(title = list(text = '<b> Country Group </b>')))
# Print the final plot
soviet_fig
Now our interactive plot is ready: life expectancy is displayed on the Y-axis, GDP per capita on the X-axis, bubble size captures the population, and colors depict the country groups. Here, “CIS-country” refers to members of the Commonwealth of Independent States, “EU-partner” refers to the European Union’s Eastern Partnership countries, and “EU-member” refers to European Union member states. “CIS-country & EU-partner,” in turn, depicts countries that are both Eastern Partnership countries and members of the Commonwealth of Independent States (NB: Belarus’s participation in the Eastern Partnership is suspended, and hence it is displayed only as a “CIS-country”).
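One detail worth noting about the colors: in the plot_ly() call above, the hex codes are listed in the alphabetical order of the group labels, which, as far as I can tell, is how an unnamed colors vector gets assigned to the groups. A less fragile alternative is a named vector, so that each color is tied to its label explicitly. The sketch below demonstrates this on a tiny made-up data frame (toy, toy_fig and group_colors are hypothetical names, and the x and y values are invented); it follows the named-vector pattern shown in plotly’s own documentation for custom color scales.

```r
library(plotly)

# Named vector: each hex color is tied to a group label explicitly
group_colors <- c(
  "CIS-country"              = "#ff031c",
  "CIS-country & EU-partner" = "#ffa303",
  "EU-member"                = "#0307ff",
  "EU-partner"               = "#1bab3a",
  "Other"                    = "#03dbfc"
)

# Tiny made-up data set, one point per group, just to show the mapping
toy <- data.frame(
  x = 1:5,
  y = c(70, 72, 78, 74, 68),
  country_group = names(group_colors)
)

toy_fig <- plot_ly(
  toy,
  x = ~x,
  y = ~y,
  color = ~country_group,
  colors = group_colors, # matched to groups by name rather than by position
  type = "scatter",
  mode = "markers"
)
toy_fig
```

With a named vector, adding or renaming a country group later should not silently shift the colors of the other groups.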
As we can see by examining this chart, there seems to be a positive, roughly linear relationship between life expectancy and wealth within the group of post-Soviet countries too. We may also observe that most of the post-Soviet countries were still quite tightly clustered together in 1995, and both life expectancy and GDP per capita were low in most of them. However, as we press the “Play” button and move closer to the present, we can see that in 2019 the Baltic states (Estonia, Latvia, and Lithuania) are significantly richer in terms of GDP per capita than the rest of the post-Soviet countries, and their life expectancy is also higher than in the rest of the countries examined here. Moreover, Turkmenistan has a significantly lower life expectancy than the other post-Soviet countries in 2019, and it’s the only country that belongs to none of the groupings used here. We can also see that Russia, Kazakhstan, Uzbekistan, and Ukraine are the most populous of the former Soviet Union countries (i.e. the biggest bubbles). It’s also notable how GDP per capita stagnates or falls back in 2008–2009, shortly after the global financial crisis hit the world. Naturally, the chart lends itself to further interpretations too.
Now we are done. The next step is to do your own analysis with your own data. Thanks for reading; hopefully you found these instructions useful and interesting!