In this recipe we’ll learn how to easily count missing values in Pandas DataFrames and Series. We’ll specifically look into two key cases:
- Count missing/null values in a DataFrame column (Series).
- Count missing values in a specific DataFrame row
Create the Dataset
Type this into your Python environment:
# Python3
# let's create the dataframe
import pandas as pd
data= [["Debbie", np.nan], ["Tommy", np.nan],[np.nan, 25000],
['Tim', 30000]]
cols = ['person', 'budget']
df = pd.DataFrame(data, columns=cols)
df.head()
Here’s our newly created Dataset:
person | budget | |
---|---|---|
0 | Debbie | NaN |
1 | Tommy | NaN |
2 | NaN | 25000.0 |
3 | Tim | 30000.0 |
Count nan values in Pandas columns
We’ll first subset out the budget column into a Series an apply the isna() Series method.
# count NA in a column
df['budget'].isna().sum()
The result is 2.
Important Note: Using the df[‘budget’].isna().count() statement will simply calculate the Series lenght and provide an erroneous result. In this case – 4.
Count missing values in a row
A bit more interesting scenario is how to calculate the number of nan values ina specific row.
# sum NA in rows
df['num_of_na'] = df.apply(lambda na: na.isnull().sum(), axis=1)
We use a lambda function to calculate the number of missing values. The axis=1 statement allows us to apply the calculation across the different columns of the Dataframe.
person | budget | num_of_na | |
---|---|---|---|
0 | Debbie | NaN | 1 |
1 | Tommy | NaN | 1 |
2 | NaN | 25000.0 | 1 |
3 | Tim | 30000.0 | 0 |
In case of feedback or questions, please leave us a comment.