Life Expectancy Animation#
In this project, you will create an animated scatterplot of global demographic data, similar to the one Hans Rosling showed in a famous talk.
Step 1: Download the Data#
You will use data from the Gapminder Foundation. Go to www.gapminder.org/data and download the CSV files on:
life expectancy
fertility rate, total
population
Step 2#
Load all three tables. Make sure the data is correct:
inspect the dimensions: you should have 200+ rows and 60+ columns
make sure the country names are in the row index
make sure the years are in the column index
use df.columns to check whether the column index consists of strings or integers (both are OK, but you need to know which)
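The checks above can be sketched as follows. This is a minimal example, not the only way to do it: the CSV text is a made-up stand-in for a downloaded Gapminder file (the country names and values are invented), and it assumes the first column holds the country names.

```python
import io
import pandas as pd

# Toy stand-in for a downloaded Gapminder CSV (names and values are made up).
csv_text = """country,1960,2015
Aland,60.1,80.2
Borduria,45.3,70.4
Syldavia,52.0,
"""

# index_col=0 moves the first column (country names) into the row index.
life = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(life.shape)             # (number of rows, number of columns)
print(life.index[:3])         # should show country names
print(type(life.columns[0]))  # str or int: tells you how to select a year

# Optional: convert the year labels to integers so that life[2015] works.
life.columns = life.columns.astype(int)
```

With a real file, replace the `io.StringIO(...)` argument with the path of the CSV you downloaded.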
Step 3#
Choose fertility and life expectancy for the year 2015 and put them into a single table.
Step 4#
Remove all rows with missing values.
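Steps 3 and 4 could look roughly like this. The two small tables are invented placeholders for the real fertility and life expectancy data, the column names `fertility` and `life_expectancy` are arbitrary choices, and the year labels are assumed to be strings.

```python
import numpy as np
import pandas as pd

countries = ["Aland", "Borduria", "Syldavia"]

# Made-up placeholder tables: countries in the rows, years in the columns.
fert = pd.DataFrame(
    {"1960": [6.1, 5.0, 4.2], "2015": [2.1, np.nan, 1.8]}, index=countries
)
life = pd.DataFrame(
    {"1960": [45.0, 50.0, 55.0], "2015": [70.0, 72.0, 80.0]}, index=countries
)

year = "2015"

# Building a DataFrame from two Series aligns them on the country index.
df = pd.DataFrame({"fertility": fert[year], "life_expectancy": life[year]})

# Drop every country that lacks a value in at least one of the columns.
df = df.dropna()
print(df)
```

Here Borduria is removed because its fertility value for 2015 is missing.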
Step 5#
Draw a scatterplot of fertility over life expectancy in 2015.
Step 6#
Repeat steps 3-5 for the year 1960. What differences do you observe?
Ideally, give both plots the same axis limits so that they are directly comparable. This
can be done, for example, with plt.axis([xmin, xmax, ymin, ymax]).
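A minimal sketch of a scatterplot with fixed axis limits is shown below. The data are invented, the limits `[20, 90, 0, 9]` are only a guess at a range that suits both years, and putting fertility on the y-axis is one reading of "fertility over life expectancy".

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; omit this line in a notebook
import pandas as pd

# Made-up values standing in for the cleaned 2015 table.
df = pd.DataFrame(
    {"life_expectancy": [70.0, 80.0, 62.5], "fertility": [2.1, 1.8, 4.4]},
    index=["Aland", "Syldavia", "Borduria"],
)

ax = df.plot.scatter(x="life_expectancy", y="fertility")

# Same limits for every year: [xmin, xmax, ymin, ymax] (assumed values).
ax.axis([20, 90, 0, 9])
ax.set_title("2015")
```

Using identical limits for 1960 and 2015 keeps the bubbles from appearing to stand still while the axes move.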
Step 7#
Write a function that allows you to draw a scatterplot for any given year.
Step 8#
Create one scatterplot for each year from 1960 to 2010 and write each one to its own file.
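One possible shape for steps 7 and 8 is sketched here. The function name `plot_year`, the `outdir` parameter, the axis limits and the toy data are all assumptions; only the file name pattern `myplot_{year}.png` is taken from the animation step. The year labels are assumed to be integers.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render off-screen; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

countries = ["Aland", "Borduria", "Syldavia"]

# Made-up placeholder tables with integer year labels.
fert = pd.DataFrame({1960: [6.1, 5.0, 4.2], 1961: [6.0, 4.9, 4.1]}, index=countries)
life = pd.DataFrame({1960: [45.0, 50.0, 55.0], 1961: [45.5, 50.4, 55.3]}, index=countries)


def plot_year(year, outdir="."):
    """Draw the scatterplot for one year, save it, and return the file path."""
    df = pd.DataFrame(
        {"life_expectancy": life[year], "fertility": fert[year]}
    ).dropna()
    ax = df.plot.scatter(x="life_expectancy", y="fertility")
    ax.axis([20, 90, 0, 9])  # identical limits in every frame (assumed values)
    ax.set_title(str(year))
    path = os.path.join(outdir, f"myplot_{year}.png")
    ax.figure.savefig(path)
    plt.close(ax.figure)  # free the figure so many frames do not pile up
    return path


outdir = tempfile.mkdtemp()
paths = [plot_year(year, outdir) for year in range(1960, 1962)]
```

With the real data, `range(1960, 2011)` would cover 1960 through 2010 inclusive, since `range` excludes its end value.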
Step 9#
Connect the scatterplots to an animation.
You can use the imageio module. Install it with:
pip install imageio
Then adapt and run the code:
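import imageio

# Read the saved frames in order; fill in the first year and the last year + 1.
plots = [
    imageio.imread(f'myplot_{year}.png')
    for year in range(___, ___)
]
# Write all frames into a single animated GIF.
imageio.mimsave('animation.gif', plots, fps=20)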
Step 10#
Also read the population file. Because the population is written like 500k or 20M, you need a conversion function that converts a single value:
def pop_convert(pop):
    # 'k' stands for thousands, e.g. '500k' -> 500000
    if pop.endswith('k'):
        return int(float(pop[:-1]) * 1_000)
    ...
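Use the apply() method of a single column to convert it value by value. To convert
every cell of the entire DataFrame, use df.applymap() instead (renamed to df.map() in
pandas 2.1); df.apply() would pass whole columns to the function rather than single values:
df['2015'].apply(pop_convert)   # one column
df.applymap(pop_convert)        # every cell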
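One way to complete the conversion function is sketched below. The set of suffixes (`k`, `M`, `B`) is an assumption, so check which ones actually occur in your file. Missing values are not handled here and would need to be removed or filled first.

```python
# Assumed suffixes and their multipliers; adjust to what your file contains.
MULTIPLIERS = {"k": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def pop_convert(pop):
    """Convert one value such as '500k' or '20M' to an integer."""
    pop = str(pop).strip()
    suffix = pop[-1:]
    if suffix in MULTIPLIERS:
        # float() rather than int() so that values like '1.5k' also work.
        return round(float(pop[:-1]) * MULTIPLIERS[suffix])
    # No known suffix: treat the value as a plain number.
    return round(float(pop))


print(pop_convert("500k"))
print(pop_convert("20M"))
```

A lookup table keeps the function short and makes it easy to add another suffix later.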
Step 11#
Divide the population by 100_000. Use the result to control the size of the bubbles in the scatterplot (df.plot.scatter accepts an argument s of the type pd.Series).
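A short sketch of the bubble sizing, using invented numbers. The column name `population` and the `alpha` transparency are illustrative choices, not requirements.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; omit this line in a notebook
import pandas as pd

# Made-up values; population is already converted to plain numbers.
df = pd.DataFrame(
    {
        "life_expectancy": [70.0, 80.0, 62.5],
        "fertility": [2.1, 1.8, 4.4],
        "population": [5_000_000, 80_000_000, 1_300_000_000],
    },
    index=["Aland", "Syldavia", "Borduria"],
)

# Scale the population down so the marker areas stay readable.
sizes = df["population"] / 100_000

# alpha makes overlapping bubbles easier to tell apart.
ax = df.plot.scatter(x="life_expectancy", y="fertility", s=sizes, alpha=0.5)
```

The `s` argument sets the marker area in points squared, so a country with ten times the population gets a bubble with ten times the area.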
Step 12#
Let's color by continent:
Read a list of country-continent pairs (continents.csv).
Create an extra continent column.
Ignore countries for which the continent information does not fit (probably because of spelling).
Use the continents to color the scatterplot bubbles by continent.
Hint
In matplotlib, this is very tedious. You may need to convert the continents to integers starting from 0. With seaborn it is a lot easier.
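The integer conversion mentioned in the hint could be sketched like this. The column names `country` and `continent` are assumptions about the layout of continents.csv, and the data are invented. An inner join silently drops countries whose names do not match in both tables.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; omit this line in a notebook
import pandas as pd

df = pd.DataFrame(
    {"life_expectancy": [70.0, 62.5, 80.0], "fertility": [2.1, 4.4, 1.8]},
    index=["Aland", "Borduria", "Syldavia"],
)

# Stand-in for continents.csv (assumed columns: country, continent).
continents = pd.DataFrame(
    {
        "country": ["Aland", "Syldavia", "Atlantis"],
        "continent": ["Europe", "Asia", "Oceania"],
    }
)

# Keep only the countries present in both tables.
df = df.join(continents.set_index("country"), how="inner")

# Categories are sorted alphabetically and numbered from 0.
df["continent_code"] = df["continent"].astype("category").cat.codes

ax = df.plot.scatter(
    x="life_expectancy", y="fertility",
    c="continent_code", cmap="tab10", colorbar=False,
)
```

With seaborn, `sns.scatterplot(data=df, x="life_expectancy", y="fertility", hue="continent")` colors by the text column directly, without the integer codes.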
Step 13#
If you use the data on your website or GitHub profile, please copy the license remark from the Gapminder page.