Inspect DataFrames

Check all Stations


Before you can take off, you need to check whether your ship is in good shape. The Data Starfleet inspection procedure is to check all stations of the ship and see whether anything requires your attention. The ship's computer has already prepared a tabular report. All you need to do is look at what is in the table.

Load the report ship_inspection.csv into a DataFrame and try the following commands.
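A minimal sketch of loading a CSV file with pandas.read_csv. The real report is not reproduced here, so the example reads a small hypothetical table from an in-memory string via io.StringIO. The columns station, crew and status are made up for illustration and will not match ship_inspection.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for ship_inspection.csv; the real report's
# columns and values will differ.
csv_text = """station,crew,status
bridge,3,ok
engine room,2,check
cargo bay,0,ok
"""

# With the real file you would pass its path instead:
# df = pd.read_csv("ship_inspection.csv")
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

read_csv accepts either a file path or a file-like object, which is why the StringIO stand-in can be swapped for the real filename without other changes.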


What is a DataFrame?

The class pandas.DataFrame is the central data structure in pandas. In most respects, you can think of it as a table with row and column labels. Both sets of labels are stored as pandas.Index objects: df.index holds the row labels and df.columns holds the column labels. Each column is a pandas.Series.

[Figure: diagram of a DataFrame]
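To see these building blocks directly, here is a small sketch on a hypothetical two-row table (the column names and the row labels "a" and "b" are made up). It prints the row labels, the column labels, and the type of a single selected column.

```python
import pandas as pd

# A small hypothetical table with explicit row labels "a" and "b".
df = pd.DataFrame(
    {"station": ["bridge", "engine room"], "crew": [3, 2]},
    index=["a", "b"],
)

print(type(df))          # the table itself: a pandas.DataFrame
print(df.index)          # row labels, stored as a pandas.Index
print(df.columns)        # column labels, also a pandas.Index
print(type(df["crew"]))  # selecting one column returns a pandas.Series
```

Note that the selected Series keeps the row labels of the DataFrame as its own index.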

Inspect a few rows

To see a few rows, type the variable name in a Jupyter cell, or use print(df) in a Python script. The more explicit methods head() and tail() show the first and the last rows:

df.head(3)
df.tail(3)
Can I call head() and tail() without a number?

Yes. If you leave out the argument, the default is 5.
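A quick sketch of head() and tail() on a hypothetical 8-row table, including the default of 5 rows when no argument is given:

```python
import pandas as pd

# Hypothetical 8-row table with a single numeric column.
df = pd.DataFrame({"value": range(8)})

first_three = df.head(3)  # rows 0, 1, 2
last_three = df.tail(3)   # rows 5, 6, 7
default_head = df.head()  # no argument: the first 5 rows

print(first_three)
print(last_three)
print(len(default_head))  # 5
```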


Number of rows and columns

The shape of a DataFrame is always a tuple of two integers: (rows, columns).

df.shape
When do I need to check df.shape?

df.shape is your most important tool when debugging. If the shape is not what you expect, something went wrong in an earlier step, and every result built on it is suspect.
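A minimal sketch of using shape as a debugging check, on a hypothetical table of four stations. It unpacks the tuple and asserts the expected shape after a filtering step; the filter condition is only an illustration.

```python
import pandas as pd

# Hypothetical table: four stations, two columns.
df = pd.DataFrame(
    {"station": ["bridge", "engine room", "cargo bay", "sick bay"],
     "crew": [3, 2, 0, 1]}
)

n_rows, n_cols = df.shape  # shape is an attribute, not a method: no ()
print(n_rows, n_cols)      # 4 2

# Typical debugging check: confirm a filtering step kept the rows you expect.
staffed = df[df["crew"] > 0]
print(staffed.shape)       # (3, 2)
assert staffed.shape == (3, 2), "unexpected number of rows after filtering"
```

Failing fast with an assert like this points you at the step where the data stopped looking right, instead of at some later, confusing error.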


Data Types

When you load data from a CSV file, pandas infers the data type of each column automatically. A single malformed value, such as a typo in a number, can be enough to make pandas read a whole numerical column as strings.

df.dtypes
What does the type object mean?

The type object usually means that the column contains strings, or a mix of different types.
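The effect of one bad value can be reproduced with a hypothetical CSV in which a number contains the letter O instead of a zero. The sketch shows that the column is no longer numeric, then uses pandas.to_numeric with errors="coerce" to convert it back, turning unparseable entries into NaN so they are easy to locate. This is one possible repair, not the only one.

```python
import io

import pandas as pd

# Hypothetical CSV: "oxygen" is numeric except for one typo,
# "9O" (capital letter O instead of a zero).
csv_text = """station,oxygen
bridge,98
engine room,9O
cargo bay,100
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)  # oxygen shows as object (newer pandas may show a string dtype)
print(pd.api.types.is_numeric_dtype(df["oxygen"]))  # False

# Convert back to numbers; values that cannot be parsed become NaN.
df["oxygen"] = pd.to_numeric(df["oxygen"], errors="coerce")
print(df.dtypes)  # oxygen is now float64, since NaN is a float
print(df[df["oxygen"].isna()])  # shows the row that held the bad value
```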


Generic Overview

You can check the data types, the number of non-null entries in each column, and the memory usage with a single command:

df.info()
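A short sketch of info() on a hypothetical table with one missing value, so the non-null count of the crew column is lower than the number of rows. The exact layout of the printed summary varies between pandas versions, so it is not reproduced here.

```python
import pandas as pd

# Hypothetical table with one missing value in "crew".
df = pd.DataFrame(
    {"station": ["bridge", "engine room", "cargo bay"],
     "crew": [3, None, 0]}
)

# Prints the index range, each column's non-null count and dtype,
# and an estimate of the memory usage.
df.info()

# memory_usage="deep" also measures the memory used by the strings themselves.
df.info(memory_usage="deep")
```

Comparing the non-null count with the number of rows is a quick way to spot missing values column by column.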

Unique Values

For categorical columns, you might want to know which values occur and which are the most frequent. This also helps you spot data errors such as misspelled categories.

df['column_name'].value_counts()

If you are not interested in the counts, list the unique values instead:

df['column_name'].unique()
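A sketch comparing value_counts(), unique() and nunique() on a hypothetical column that contains a misspelled category ("Bridge" vs "bridge"). The typo shows up as an extra distinct value, which is exactly the kind of data error these commands help to find.

```python
import pandas as pd

# Hypothetical column with a misspelled category ("Bridge" vs "bridge").
df = pd.DataFrame(
    {"location": ["bridge", "engine room", "bridge", "Bridge", "cargo bay"]}
)

counts = df["location"].value_counts()  # frequency of each value, most frequent first
values = df["location"].unique()        # distinct values in order of appearance
n_values = df["location"].nunique()     # number of distinct values

print(counts)
print(values)
print(n_values)  # 4
```

nunique() is handy when you only need the number of different values, for example the number of distinct categories in a column.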


Challenge

Inspect the report ship_inspection.csv and solve the following tasks:

  • display the number of rows and columns

  • display the last 5 rows

  • list the column names

  • how many life forms are there on the bridge?

  • how many stations does the ship have?

  • how many different security levels are there?

  • there is one life support value that is neither 0 nor 100%. What is it?