How to display notnull rows and columns in a Python dataframe?

When doing data wrangling, one of the common tasks you might have is to deal with empty values. Those typically show up as NaN in your pandas DataFrame. There are several ways to deal with NaN values, such as dropping them altogether or filled them with an aggregated value. Today’s tutorial provides the basic tools for filtering and selecting columns and rows that don’t have any empty values.

As always we’ll first create a simple DataFrame in Python Pandas:

#import libraries
import pandas as pd
import numpy as np

#Define a dictionary with some smaple data
data = ({"Year": [ 2018, 2017, 2016, 2019, 2020, 2020] ,
         "Quarter": [ np.nan, "Q1", np.nan, "Q2", "Q3", "Q4"],
         "Sales": [20000, 30000, 40000, 35000, 16000, 16000]})

# Create a dataframe from a dictionary
data = pd.DataFrame.from_dict(data)

#look into the data
data.head(6)
YearQuarterSales
02018NaN20000
12017Q130000
22016NaN40000
32019Q235000
42020Q316000
52020Q416000

Find number of non-empty entries

As the DataFrame is rather simple, it’s pretty easy to see that the Quarter columns have 2 empty (NaN) values.

That said, let’s use the info() method for DataFrames to take a closer look at the DataFrame columns information:

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Year     6 non-null      int64 
 1   Quarter  4 non-null      object
 2   Sales    6 non-null      int64 
dtypes: int64(2), object(1)
memory usage: 272.0+ bytes

We clearly see that the Quarter column has 4 non-nulls. We could have found that in this following way as well:

data["Quarter"].notnull()

This will return the following mask:

0    False
1     True
2    False
3     True
4     True
5     True
Name: Quarter, dtype: bool

Then we can apply the sum() method.

data["Quarter"].notnull().sum()

Which returns the value 4.

Displaying rows without null values

If we want just to select rows with no NaN value, then the easiest way to do that is use the DataFrame dropna() method. This removes any empty values from the dataset.

data.dropna()

Here’s the result

YearQuarterSales
12017Q130000
32019Q235000
42020Q316000
52020Q416000

Note: If you want to persist the changes to the dataset, you should use the inplace parameter.

# create a new DataFrame with no empty values.
new_data = data.dropna(inplace=True)

An alternative (and less elegant) way to remove the empty entries is by using the mask we defined in the previous section:

data[data["Quarter"].notnull()]

Select columns with NaNs

This is also easily accomplished with the dropna() method, as shown below:

data.dropna(axis=1)

The entire Quarter column is removed from the DataFrame


Year
Sales
0201820000
1201730000
2201640000
3201935000
4202016000
5202016000

As indicated above, use the inplace switch with dropna() to persist your changes.

Plotting the data

After removing the non empty values, we can visualize the data with a simple bi-variate bar chart. Simple visualization can be accomplished in Pandas without using the Matplotlib or Seaborn libraries.

data.dropna().groupby("Year")["Sales"].sum().plot(x = "Year", y= "Sales",kind="bar") 

Leave a Comment