Data we source is often not suitable for analysis from the get-go. As part of the data wrangling process, we often need to filter and subset our data and omit missing (NaN or empty) values to make sense of what's in front of us.
To explain this topic we’ll use a very simple DataFrame, which we’ll manually create:
# Python 3
import pandas as pd

# Create the students test DataFrame
students = pd.DataFrame(
    [["Tommy", 90], ["Harry", 95], ["Liam", None]],
    columns=["Name", "GPA"]
)
Let’s look at the DataFrame, using the head method:
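A minimal sketch of that call, assuming the students DataFrame created above (note that the GPA column is upcast to float64 because of the missing value):

```python
import pandas as pd

# Recreate the students test DataFrame
students = pd.DataFrame(
    [["Tommy", 90], ["Harry", 95], ["Liam", None]],
    columns=["Name", "GPA"]
)

# Show the first rows of the DataFrame
print(students.head())
#     Name   GPA
# 0  Tommy  90.0
# 1  Harry  95.0
# 2   Liam   NaN
```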
Filter Null values from a Series
The method pandas.notnull detects non-missing values in a Series (or any array): it returns True where a value is present and False where the value is empty (NaN).
Let's see pd.notnull in action on our example.
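The call itself, sketched against the students DataFrame created above:

```python
import pandas as pd

# Recreate the students test DataFrame
students = pd.DataFrame(
    [["Tommy", 90], ["Harry", 95], ["Liam", None]],
    columns=["Name", "GPA"]
)

# Build a boolean mask: True where GPA is present, False where it is NaN
mask = pd.notnull(students["GPA"])
print(mask)
```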
This returns True for the first two rows in the Series and False for the last:
0     True
1     True
2    False
Name: GPA, dtype: bool
We can use the boolean array to filter the Series as follows:
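The filtering step itself, again assuming the students DataFrame from above:

```python
import pandas as pd

# Recreate the students test DataFrame
students = pd.DataFrame(
    [["Tommy", 90], ["Harry", 95], ["Liam", None]],
    columns=["Name", "GPA"]
)

gpa = students["GPA"]
# Index the Series with the boolean mask to keep only the non-null values
filtered = gpa[pd.notnull(gpa)]
print(filtered)
```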
0    90.0
1    95.0
Name: GPA, dtype: float64
Filter Null values from a DataFrame
It's more interesting to use the notnull method on a DataFrame that you might have acquired from a file, a database table, or an API.
Let's see an example of using pd.notnull on a DataFrame:
This filters out the observations with empty values in the GPA column.
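A minimal sketch of filtering the whole DataFrame on the GPA column, assuming the students DataFrame from above:

```python
import pandas as pd

# Recreate the students test DataFrame
students = pd.DataFrame(
    [["Tommy", 90], ["Harry", 95], ["Liam", None]],
    columns=["Name", "GPA"]
)

# Keep only the rows whose GPA is not null
clean = students[pd.notnull(students["GPA"])]
print(clean)
```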
- This might look like a very simplistic example, but when working with huge datasets, the ability to easily select non-null values is extremely powerful.
- Using the notna method would have provided the same result.
- You may need to remove observations that include empty values. For that, use the dropna method.
- More examples are available in our tutorial on filtering empty rows from a DataFrame.
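A quick sketch of the notna and dropna alternatives mentioned above, assuming the same students DataFrame:

```python
import pandas as pd

# Recreate the students test DataFrame
students = pd.DataFrame(
    [["Tommy", 90], ["Harry", 95], ["Liam", None]],
    columns=["Name", "GPA"]
)

# notna is an alias of notnull, so this keeps the same rows as before
same = students[students["GPA"].notna()]

# dropna removes any row that contains a missing value in any column
dropped = students.dropna()
```

On this small frame both approaches keep only Tommy and Harry, but note that dropna by default considers all columns, while the notna mask above filters on GPA alone.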