Hans Rosling-style Animated Bubble Plots in R

The post-Soviet Edition

Image is a screenshot from Gapminder’s webpage

Purpose

In this blog post, I’ll show how to make Hans Rosling-style interactive bubble charts with R. The interactive plot we’ll build looks like this:


Hans Rosling-style Bubble Plots

Hans Rosling was a physician, academic, public speaker and enthusiastic promoter of a fact-based worldview. I first heard about him when I was taking intro courses on statistics, and I remember thinking even then that his visualizations were impressive. I also recently (finally) managed to read his book Factfulness (Rosling, Rönnlund, and Rosling 2020), and I thought it would be nice to try out the kind of visualizations he uses in his presentations.

Luckily, it turned out that imitating Rosling’s visualizations isn’t so hard these days if you know a little bit of R. I started my investigation by simply Googling “hans rosling style plots in r.” The first instructions I found were written by Tristan Ganry (2018), and I learned from them that you can generate Rosling-style plots easily using either the gganimate package (for creating GIF files) or the plotly package (for HTML files). Since I was mainly interested in creating interactive HTML plots, this post deals only with the plotly method.

Interactive (Bubble) Charts Using the plotly Package

So, how do we imitate the kind of bubble chart shown under the title of this blog post? After some Googling, I found the needed instructions on the plotly package’s webpage under the topic “Intro to Animations in R” (plotlygraphs 2017). In fact, the example data used in these instructions comes from Gapminder, and there’s an R package for this dataset. Once the package is installed (for example with install.packages("gapminder")), you can load it into your R session by typing library(gapminder) in your RMarkdown code chunk or your R console:

library(gapminder)
gapminder
## # A tibble: 1,704 × 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## # … with 1,694 more rows

Here’s an example of an interactive plot from plotly’s webpage (ibid.):

library(plotly)
library(gapminder)

df <- gapminder 
fig <- df %>%
  plot_ly(
    x = ~gdpPercap, 
    y = ~lifeExp, 
    size = ~pop, 
    color = ~continent, 
    frame = ~year, 
    text = ~country, 
    hoverinfo = "text",
    type = 'scatter',
    mode = 'markers'
  )
fig <- fig %>% layout(
    xaxis = list(
      type = "log"
    )
  )

fig


In the chart above, life expectancy is displayed on the Y-axis (the vertical axis), gross domestic product (GDP) per capita is displayed on the X-axis (the horizontal axis, on a logarithmic scale), bubble size captures the population of each country (the bigger the bubble, the more populous the country), and color shows which continent each country belongs to. By pressing the “Play” button, we can see how these variables have changed between the years 1952 and 2007.

As we can see from the chart, life expectancy and wealth are related to each other: as one grows, the other tends to grow too. We can also see that in 1952 Asian and African countries were clustered in the bottom-left corner of the graph (poor and short life expectancy), whereas European and Oceanian countries were closer to the top-right corner (rich and long life expectancy). Most of the countries in the “Americas” group, in turn, were somewhere in between these two extremes.

However, as we press the “Play” button and look at how these variables change over time, we can see that more and more countries cluster in the top-right corner (rich and long life expectancy) of the chart. This is exactly one of Rosling’s main points: it’s outdated to speak about “the West and the rest,” because today many non-Western countries have caught up with the “West” in life expectancy and wealth.
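The relationship between wealth and life expectancy described above can also be checked numerically. The following is only a rough sketch, not part of the original plotly example: it assumes the gapminder package is installed, and the choice of a Pearson correlation on log10 GDP per capita (mirroring the chart’s logarithmic X-axis) is my own.

```r
library(gapminder)

# Years covered by the gapminder dataset
years <- sort(unique(gapminder$year))

# Pearson correlation between log10(GDP per capita) and life expectancy,
# computed separately for each year
cor_by_year <- sapply(years, function(yr) {
  d <- gapminder[gapminder$year == yr, ]
  cor(log10(d$gdpPercap), d$lifeExp)
})
names(cor_by_year) <- years

# One correlation coefficient per year, rounded for readability
round(cor_by_year, 2)
```

A positive coefficient for a given year means that richer countries tended to have longer life expectancy in that year; it says nothing about causation.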

However, since I specialize in Russia and the countries of the former Soviet Union, I decided to build my Rosling-style bubble chart only with post-Soviet countries. Also, since all post-Soviet countries are located either in Europe or in Asia, I wanted to create custom country groups for my own visualization (Rosling usually groups countries by continent, as seen in the visualization above).

Data

As always with data analysis, the first step is to find an appropriate dataset for your purposes. First, I played around with the Gapminder Foundation’s data on life expectancy, GDP per capita, and population (available at: Gapminder Foundation 2022), but I eventually decided to use data from Our World in Data (2022), which, by the way, is perhaps my favorite data source.

# Import libraries
library(tidyverse)
library(here)

# Import GDP data
gdp <- read.csv(here("content/blog/psoviet-bubble-charts/data/gdp-per-capita-worldbank.csv"))

# Import life expectancy data
life_expectancy <- read.csv(here("content/blog/psoviet-bubble-charts/data/life-expectancy.csv"))

# Import population data
population <- read.csv(here("content/blog/psoviet-bubble-charts/data/population-since-1800.csv"))

# Put all data frames into list
df_list <- list(gdp, life_expectancy, population)

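# Merge all data frames in list pairwise; merge() joins on the columns
# the data frames share, and all = TRUE keeps unmatched rows (filled with NA)
full_data <- Reduce(function(x, y) merge(x, y, all = TRUE), df_list)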

# Filter for post-Soviet countries
p_soviet <- full_data %>% 
  filter(Entity %in% c("Russia",  
                        "Belarus",
                        "Ukraine",
                        "Estonia",
                        "Latvia",
                        "Lithuania",
                        "Armenia",
                        "Azerbaijan",
                        "Georgia",
                        "Turkmenistan",
                        "Uzbekistan",
                        "Kazakhstan",
                        "Kyrgyzstan",
                        "Tajikistan",
                        "Moldova"),
         # Let's select only years 1990-2019
         Year >= 1990 & Year <= 2019) %>% 
  # Rename some lengthy variable names for the new dataset
  rename(gdp_per_capita = GDP.per.capita..PPP..constant.2017.international...,
         life_exp = Life.expectancy,
         pop = Population..historical.estimates.,
         country = Entity,
         cntr_code = Code,
         year = Year)

# Add country groups to post-Soviet states data
p_soviet <- p_soviet %>% 
  mutate(
    country_group = case_when(
      # CIS-membership
      country %in% c("Belarus",
                     "Kazakhstan",
                     "Kyrgyzstan",
                     "Russia",
                     "Tajikistan",
                     "Uzbekistan") ~ "CIS-country",
      # EU-membership
      country %in% c("Estonia",
                     "Latvia",
                     "Lithuania") ~ "EU-member",
      # EU-partnership
      country %in% c("Georgia",
                     "Ukraine") ~ "EU-partner",
      # EU-partnership and CIS-member
      country %in% c("Armenia",
                     "Azerbaijan",
                     "Moldova") ~ "CIS-country & EU-partner",
      # Other categories
      country == "Turkmenistan" ~ "Other"
    )
  )

# Look at the data
head(p_soviet)
##   country cntr_code year gdp_per_capita life_exp     pop
## 1 Armenia       ARM 1990       5180.061   67.879 3538164
## 2 Armenia       ARM 1991       4616.945   67.870 3505249
## 3 Armenia       ARM 1992       2735.787   67.990 3442820
## 4 Armenia       ARM 1993       2554.172   68.218 3363111
## 5 Armenia       ARM 1994       2757.232   68.538 3283664
## 6 Armenia       ARM 1995       3008.233   68.938 3217349
##              country_group
## 1 CIS-country & EU-partner
## 2 CIS-country & EU-partner
## 3 CIS-country & EU-partner
## 4 CIS-country & EU-partner
## 5 CIS-country & EU-partner
## 6 CIS-country & EU-partner
tail(p_soviet)
##        country cntr_code year gdp_per_capita life_exp      pop country_group
## 445 Uzbekistan       UZB 2014       5764.493   70.671 30426394   CIS-country
## 446 Uzbekistan       UZB 2015       6086.716   70.928 30929556   CIS-country
## 447 Uzbekistan       UZB 2016       6346.335   71.171 31441753   CIS-country
## 448 Uzbekistan       UZB 2017       6518.805   71.388 31959774   CIS-country
## 449 Uzbekistan       UZB 2018       6755.481   71.573 32476232   CIS-country
## 450 Uzbekistan       UZB 2019       7014.325   71.725 32981714   CIS-country

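As an aside on the merge step above, here is a minimal sketch of how Reduce() combined with merge(..., all = TRUE) behaves. The three toy data frames and their values are invented for illustration and only mimic the Entity/Year layout of the real files.

```r
# Toy stand-ins for the three imported data frames (values are made up)
toy_gdp  <- data.frame(Entity = c("A", "B"), Year = c(2000, 2000), gdp  = c(1, 2))
toy_life <- data.frame(Entity = c("A", "B"), Year = c(2000, 2000), life = c(70, 75))
toy_pop  <- data.frame(Entity = c("A", "C"), Year = c(2000, 2000), pop  = c(10, 30))

# Reduce() merges the first two data frames, then merges the result with the third.
# merge() joins on the shared columns (Entity and Year here); all = TRUE keeps
# rows that have no match and fills the missing cells with NA.
toy_merged <- Reduce(function(x, y) merge(x, y, all = TRUE),
                     list(toy_gdp, toy_life, toy_pop))

toy_merged
```

Entity "C" appears only in the population data, so it survives the merge with NA in the gdp and life columns. This is also why gaps in one source file show up as missing values in the merged data rather than as dropped rows.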

It seems that the data wrangling went as intended. Let’s now proceed to the actual interactive plotting.

Interactive Bubble Chart: the post-Soviet Edition

After the data importing and wrangling is done and our data is ready, all we need to do is to use our data within the plot_ly() function from the plotly package.

Note that I filtered the year variable to start from 1995, because GDP per capita data for Estonia, Latvia, and Lithuania is missing for 1990–1994 in this dataset.
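A gap like the one mentioned above can be located with a quick check of the first year for which each country has a non-missing value. This is a hedged sketch on an invented toy data frame (the numbers are not real GDP figures); with the real data, the same idea could be applied to p_soviet.

```r
# Toy data with the same column names as p_soviet (values are made up)
toy <- data.frame(
  country = rep(c("Estonia", "Russia"), each = 3),
  year = rep(1993:1995, times = 2),
  gdp_per_capita = c(NA, NA, 11000, 12000, 11000, 10500)
)

# For each country, find the earliest year with a non-missing GDP value
first_year <- sapply(split(toy, toy$country), function(d) {
  min(d$year[!is.na(d$gdp_per_capita)])
})

first_year
```

The latest of these first years across all countries is a reasonable starting point for the animation, since every bubble then has data in every frame.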

# Import the plotly package 
library(plotly)

# Create the plotly plot 
soviet_fig <- p_soviet %>% 
  filter(year >= 1995) %>% 
  plot_ly(
    x = ~gdp_per_capita, # defines the X-axis/horizontal axis
    y = ~life_exp, # defines the Y-axis/vertical axis
    size = ~pop, # bubble size is mapped to population
    color = ~country_group, # bubble color is mapped to country group
    frame = ~year, # defines the animation frames (one frame per year)
    # Text shown when hovering over a bubble (<br> starts a new line)
    text = ~paste("<br> Country: ", country,
                  "<br> Life Expectancy: ", round(life_exp, digits = 0),
                  "<br> GDP per capita: ", round(gdp_per_capita, digits = 0),
                  "<br> Population: ", pop),
    # Hex colors for the country groups (in alphabetical order of the groups)
    colors = c("#ff031c", # CIS-country
               "#ffa303", # CIS-country & EU-partner
               "#0307ff", # EU-member
               "#1bab3a", # EU-partner
               "#03dbfc"), # Other
    alpha = 0.5, # defines the opacity/transparency of bubbles
    sizes = c(10, 1000), # range of bubble sizes: play around with these values
    hoverinfo = "text", # show only the custom text when hovering over the bubbles
    type = "scatter", # defines chart type
    mode = "markers")

# Add labels
soviet_fig <- soviet_fig %>%
  layout(title = list(text = "Life expectancy, GDP per capita and population in post-Soviet\ncountries (in 1995-2019)",
                      x = 0.1), # this changes the title position to left
         xaxis = list(title = "GDP per capita (in constant international-$)"),
         yaxis = list(title = "Life expectancy (in years)"),
         legend=list(title=list(text='<b> Country Group </b>')))

# Print the final plot
soviet_fig

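Optionally, the playback speed and the slider label of an animated plotly chart can be adjusted with animation_opts() and animation_slider() from the plotly package. The sketch below uses a small invented data frame so that it runs on its own; the timing values are arbitrary assumptions, and the same two calls could be piped onto soviet_fig instead.

```r
library(plotly)

# Tiny made-up dataset: three points observed in two "years"
demo <- data.frame(
  x = c(1, 2, 3, 2, 3, 4),
  y = c(1, 3, 2, 2, 4, 3),
  id = rep(c("a", "b", "c"), times = 2),
  year = rep(c(1995, 1996), each = 3)
)

# Animated scatter plot; ids keeps each point's identity across frames
demo_fig <- plot_ly(demo, x = ~x, y = ~y, frame = ~year, ids = ~id,
                    type = "scatter", mode = "markers")

demo_fig <- demo_fig %>%
  # frame: milliseconds each frame is shown; transition: milliseconds of easing
  animation_opts(frame = 1000, transition = 500, easing = "linear") %>%
  # Prefix the value shown above the slider
  animation_slider(currentvalue = list(prefix = "Year: "))

demo_fig
```

Slower frames make it easier to follow individual bubbles, which can help when the chart contains only a handful of countries.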

Now our interactive plot is ready: life expectancy is displayed on the Y-axis, GDP per capita on the X-axis, bubble size captures population, and colors depict the country groups. Here, “CIS-country” refers to members of the Commonwealth of Independent States, “EU-partner” refers to countries in the European Union’s Eastern Partnership, and “EU-member” refers to European Union member states. “CIS-country & EU-partner,” in turn, depicts countries that are both Eastern Partnership countries and members of the Commonwealth of Independent States (NB: Belarus is suspended from the Eastern Partnership, and hence displayed only as a “CIS-country”).

As we can see by examining this chart, there seems to be a positive relationship between life expectancy and wealth within the group of post-Soviet countries too. We may also observe that most of the post-Soviet countries were still quite tightly clustered together in 1995, and both life expectancy and GDP per capita were low in most countries. However, as we press the “Play” button and move closer to the present, we can see that in 2019 the Baltic states (Estonia, Latvia, and Lithuania) are clearly richer in terms of GDP per capita than the rest of the post-Soviet countries, and their life expectancy is also higher than in the rest of the countries examined here. Moreover, Turkmenistan has a markedly lower life expectancy than the other post-Soviet countries in 2019, and it’s the only country here that doesn’t belong to any of the cross-national groupings used in this chart. We can also see that Russia, Kazakhstan, Uzbekistan, and Ukraine are the most populous of the former Soviet Union countries (i.e. the biggest bubbles). It’s also notable how GDP per capita stagnates or falls in 2008–2009, when the global financial crisis hit. Naturally, the chart invites further interpretations too.

Now we are done. The next step is to do your own analysis with your own data. Thanks for reading! Hopefully you found these instructions useful and interesting.
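If you want to share a plot like this as an HTML file, one option is saveWidget() from the htmlwidgets package. This is a minimal sketch rather than the workflow used for this post: the demo figure, the output file name, and the use of a temporary directory are all assumptions made so that the example runs on its own.

```r
library(plotly)
library(htmlwidgets)

# Small stand-in figure; in practice you would pass your own plotly object
demo_fig <- plot_ly(
  data = data.frame(x = c(1, 2, 3), y = c(2, 4, 8), step = c(1, 2, 3)),
  x = ~x, y = ~y, frame = ~step,
  type = "scatter", mode = "markers"
)

# Hypothetical output location (a temporary directory, to avoid clutter)
out_file <- file.path(tempdir(), "bubble-chart.html")

# selfcontained = FALSE writes the JavaScript dependencies to a folder next to
# the HTML file; selfcontained = TRUE bundles everything into one file but
# requires pandoc to be available
saveWidget(demo_fig, file = out_file, selfcontained = FALSE)

file.exists(out_file)
```

The resulting HTML file opens in any browser and keeps the “Play” button and hover text working.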

References

Ganry, Tristan. 2018. “How to Build Animated Charts Like Hans Rosling Doing It All in R.” https://towardsdatascience.com/how-to-build-animated-charts-like-hans-rosling-doing-it-all-in-r-570efc6ba382.
Gapminder Foundation. 2022. “Download the Data | Gapminder.” https://www.gapminder.org/data/.
Our World in Data. 2022. “Our World in Data.” https://ourworldindata.org.
plotlygraphs. 2017. “Intro to Animations.” https://plotly.com/r/animations/.
Rosling, Hans, Anna Rosling Rönnlund, and Ola Rosling. 2020. Factfulness: Ten Reasons We’re Wrong about the World and Why Things Are Better Than You Think. Reprint edition. New York City: Flatiron Books.
Eemil Mitikka
PhD Researcher in Russian & Eurasian Studies