Timelines of Baby Names#

In this chapter, we would like to know how the popularity of a name changes over time.

To reach that, we will need to wrangle and aggregate the data.

If you want to use the pandas library, you find a list of useful functions at the bottom.

Task 1#

Calculate for each name in the file yob2000.txt its percentage of total births. Store this percentage as an additional column.

Read all files yob1880.txt, yob1881.txt ... yob2021.txt. Add an extra column for the year. Concatenate them into a single data structure.

Calculate the total number of births for each year. Visualize the timeline as a line plot.

Now, create a timeline for your own name. First check if your name occurs at all. If yes, create a table with the columns year and number.

You may want to sum up the binary genders or choose one. With few exceptions, the influence on the result is tiny.

If your name is not very frequent, there might be missing data for some years. Insert missing data with a 0.

Draw a line plot and label the axes.

Investigate the popularity of the names of some US celebrities over the last 130 years. Plot a time line with 2-4 names.

Inspect the following celebrities or choose your own:

name	comment
Madonna	wrote “Like a Prayer”
Lance	went to the moon
Barack	president
Katrina	hurricane in New Orleans
Luke	Jedi
Leia	princess from Star Wars
Frida	painter, biography went on a Broadway show
Arielle	mermaid
Harley	it's not a motorcycle. It's a chopper
Tyrion	character in ‘Game of Thrones’
Khaleesi	job title in ‘Game of Thrones’

Finally, we will normalize the data. Repeat Task 4 or 5, but divide the count of a given name by the total number of births of that year.

How does the result change and why is this important?

function	comment
`df[new] = value`	create a column filled with the same value
``df[new] = df[col] * x``	create a column using arithmetics on an old one
`df[new] = df[col].a pply(func)`	create a column by calling a function for each value
`pd.concat[list]`	concatenates DataFrames. `list` is a list of DataFrames
`df[col].fillna(valu e, inplace=True)`	fills missing data
`df.groupby(col)`	splits the DataFrame into multiple sub-DataFrames
`df[col].plot()`	create line plots for 1+ columsn
`df.set_index(col)`	moves a column into the index (used for the x-axis in a plot)