How to slice Series and DataFrames with pandas notnull?

Data rarely arrives ready for analysis. As part of the data wrangling process, we often need to filter and subset our data, omitting missing (NaN / empty) values, so that we can make sense of what is in front of us.

To explain this topic we’ll use a very simple DataFrame, which we’ll manually create:

# Python 3
import pandas as pd

# Create the students test DataFrame
students = pd.DataFrame([["Tommy", 90], ["Harry", 95],
                         ["Liam", None]], columns=["Name", "GPA"])

Let’s look at the DataFrame, using the head method:

students.head()
    Name   GPA
0  Tommy  90.0
1  Harry  95.0
2   Liam   NaN

Filter Null values from a Series

The function pandas.notnull flags the values in a Series (or any array-like object) that are not missing: it returns True where a value is present and False where it is empty (NaN or None).

Let’s see pd.notnull in action on our example:

pd.notnull(students["GPA"])

This returns True for the first two rows of the Series and False for the last:

0     True
1     True
2    False
Name: GPA, dtype: bool

We can use this boolean Series to filter the original Series as follows:

students["GPA"][pd.notnull(students["GPA"])]
0    90.0
1    95.0
Name: GPA, dtype: float64
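The same filter can also be written with the Series.notnull method and .loc, which some readers find easier to follow than chained brackets. The snippet below is a minimal sketch that rebuilds the example DataFrame; the variable names gpa, mask and valid_gpa are only illustrative.

```python
import pandas as pd

# Rebuild the example DataFrame so the snippet runs on its own
students = pd.DataFrame(
    [["Tommy", 90], ["Harry", 95], ["Liam", None]],
    columns=["Name", "GPA"],
)

gpa = students["GPA"]

# Boolean mask: True where a GPA is present, False where it is missing
mask = gpa.notnull()

# Keep only the entries where the mask is True
valid_gpa = gpa.loc[mask]
print(valid_gpa)
```

Storing the mask in its own variable makes it easy to reuse, for example to count the non-missing values with mask.sum().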

Filter Null values from a DataFrame

It is more interesting to use notnull on a DataFrame that you might have acquired from a file, a database table, or an API.

Let’s see an example of using the notnull method of a column to filter a DataFrame:

students[students["GPA"].notnull()]

This filters out the rows that have an empty value in the GPA column:

    Name   GPA
0  Tommy  90.0
1  Harry  95.0
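Two closely related variations may be useful here. The sketch below, which again rebuilds the example DataFrame, keeps only the rows that have no missing value in any column, and separately selects the rows where GPA is missing by using isnull, the inverse of notnull. The names complete_rows and missing_gpa are illustrative.

```python
import pandas as pd

# Rebuild the example DataFrame so the snippet runs on its own
students = pd.DataFrame(
    [["Tommy", 90], ["Harry", 95], ["Liam", None]],
    columns=["Name", "GPA"],
)

# notnull on the whole DataFrame returns a DataFrame of booleans;
# all(axis=1) is True only for rows where every column has a value
complete_rows = students[students.notnull().all(axis=1)]
print(complete_rows)

# isnull is the inverse check: select the rows where GPA is missing
missing_gpa = students[students["GPA"].isnull()]
print(missing_gpa)
```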

Notes:

  • This might look like a very simplistic example, but when working with huge datasets, the ability to easily select non-null values is extremely powerful.
  • The notna method is an alias of notnull, so it would have returned the same result.
  • You may instead want to remove the observations that contain empty values. For that, use the dropna method.
  • More examples are available in our tutorial on filtering empty rows from a DataFrame.
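As a quick check of the notes above, the following sketch compares notna with notnull and compares boolean filtering with dropna on the example data. The subset argument restricts dropna to the GPA column; the variable names are illustrative.

```python
import pandas as pd

# Rebuild the example DataFrame so the snippet runs on its own
students = pd.DataFrame(
    [["Tommy", 90], ["Harry", 95], ["Liam", None]],
    columns=["Name", "GPA"],
)

# notna and notnull produce the same boolean mask
same_mask = students["GPA"].notna().equals(students["GPA"].notnull())
print(same_mask)

# Filtering with the mask and dropping rows with dropna give the same rows
via_mask = students[students["GPA"].notna()]
via_dropna = students.dropna(subset=["GPA"])
print(via_mask.equals(via_dropna))
```

Boolean filtering is handy when the mask will be combined with other conditions, while dropna is more concise when the only goal is to discard incomplete rows.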
