Select Rows and Columns#

Find your Crew#

../_images/crew.png

As a spaceship captain, you are nothing without your crew. You have five capable officers that help you run things. Unfortunately, you have no idea where they are. You assemble the entire crew on the flight deck. How did your officers look like again?

You will need to load the crew roster crew.csv to identify them.


Show column names#

You may want to access column names as a Python list. This is also useful to check what types the names are.

df.columns

Select a column#

A single column is returned as a pd.Series:

df['id']

Select multiple columns#

Multiple columns require double square brackets. The inner one is a list of column names:

df[['black_spots', 'white_spots']]

Select columns by position#

The Python slicing notation can be applied to select by position. The first slice selects all rows, the second selects columns 2-5.

df.iloc[:, 1:4]

Select rows by position#

You can use the first slice only to access rows:

df.iloc[10:20]

Select rows by index label#

This is very useful if your index contains something else than numbers, e.g.

earcolor = df.set_index('ears')  # new DF with different index
earcolor.loc['pink']

Filter by value#

This is very powerful selection logic that is applied to all rows simultaneously.

The notation with double square brackets looks a bit weird first. It is easier to understand if you know the inner expression results in a boolean mask that is used to filter the rows of the DataFrame.

df[df['ears'] == 'pink']

df[df['black_spots'] < 3]

df[df['black_spots'].between(3, 7)]

df[(df['black_spots'] < 3) & (df['white_spots'] > 7)]

Note that you have to use the binary operators &, | to combine multiple conditions. The logical operators and, or will not work.


Select random rows#

df.sample(7)

../_images/space_panda.jpeg

Challenge#

Select rows from the crew roster crew.csv to find your five officers. You have a couple of hints:

  • the Helmspanda (responsible for steering the ship) has the id 247.

  • the Data Science Officer (responsible for DS of course) has more than 18 white spots. They also have their ears dyed in indigo.

  • the Paw Plant (responsible for the reactor and engines) has an id between 100 and 199.

  • the Bamboo Chef (responsible for nutrition) has their ears dyed in chartreuse. They have fewer white spots than the paw plant.

  • the Pandalorian (responsible for weapons and tactics) has an unknown ear color. They wear a helmet all the time.

  • all of your officers have at least 12 white spots.

  • three of your officers have exactly 9 black spots.

  • none of your officers has their ears dyed blue.

Identify all five of them.


Data Generation#

Below you find the code to generate the data in crew.csv:

import pandas as pd
import numpy as np

np.random.seed(42)
N = 500
EARS = ['black', 'white', 'pink', 'blue', 'green', 'red', 'neon', 'orange', 'chartreuse', 'indigo', 'peachpuff', 'piercing', None]

index = pd.Series(np.arange(N), name='id')

df = pd.DataFrame(
    {
        'white_spots': np.random.randint(1, 20, size=N),
        'black_spots': np.random.randint(1, 20, size=N),
        'ears': np.random.choice(EARS, size=N)
    },
    index=index
)
df.to_csv('crew.csv')

Recap Exercise: Select Rows#

Load a dataset of penguins:

import seaborn as sns

df = sns.load_dataset('penguins')

Solve the following tasks:

# 1. display all Chinstrap penguins
...

# 2. display all penguins with a beak that is exactly 36 mm long
...

# 3. display the penguins with flippers shorter than 190 mm
...

# 4. display penguins weighing between 4000 and 4300 g
...

# 5. display female Adelie penguins
...

# 6. find at least one female Chinstrap penguin with a beak between 50 and 55 mm
...

# 7. display penguins with a missing bill length
...