How to delete rows based on conditions from a Pandas dataframe?

When working with spreadsheets and tabular data, being imported from CSV files or database table, you might need to clean up rows from your Pandas dataframe based on a row condition.

This short tutorial provides a few Python snippets that you’ll be able to use in the following situations:

  1. You would like to delete a row that contains one or more column values
  2. You would like to remove a row based on its index
    • Here will specifically look into dropping your first and last dataframe rows.

As always, we’ll create our example Pandas dataframe first. Will do it by passing a dictionary with column value to the pd.DataFrame method.

import pandas as pd

data = ({"Year": [ 2020, 2019, 2020, 2019, 2019, 2020] ,
         "Sales Person": ["John", "Xiao", "Evelyn", "Maria", "Samantha", "Maria"],
         "Sales Budget": [20000, 32000, 30000, 32000, 54000, 25000]})

df = pd.DataFrame(data=data)

Let’s take a look at the data:

df.head(6)
YearSales PersonSales Budget
02020John20000
12019Xiao32000
22020Evelyn30000
32019Maria32000
42019Samantha54000
52020Maria25000

Drop rows based on value or condition

Let’s assume that we want to filter the dataframe based on the Sales Budget. We’ll go ahead and first remove all rows with Sales budget greater or equal to 30K. The easiest way to do that is to create a new dataframe which represents a subset of the original dataframe according to the required values/conditions.

Let’s do that step by step.

df["Sales Budget"]>30000

We get the vector of booleans

0    False
1     True
2    False
3     True
4     True
5    False
Name: Sales Budget, dtype: bool

Now we’ll use the vector in order to filter the dataframe as shown below:

subset = df[(df["Sales Budget"]>30000)]
subset.head()

Here’s the result:

YearSales PersonSales Budget
12019Xiao32000
32019Maria32000
42019Samantha54000

There is an alternative way to accomplish that: we can use the unary operand (~) to negate a complete statement in Python. Our code could also look as following and produce the same result:

subset = df[~(df["Sales Budget"]<=30000)]
subset.head()

Filter rows according to a complex condition

Let’s assume that we want to filter the rows that has a budget > 30K AND are assigned to Maria.

# the "&" sign represents a boolean AND
subset_AND = df[~(df["Sales Budget"]<=30000) & (df["Sales Person"] =="Maria") ]
subset_AND.head()
YearSales PersonSales Budget
32019Maria32000

Now we’ll that we want to filter the rows that has a budget > 30K OR are assigned to Maria.

# the "|" sign represents a boolean OR
subset_OR = df[~(df["Sales Budget"]<=30000) | (df["Sales Person"] =="Maria") ]
subset_OR.head()
YearSales PersonSales Budget
12019Xiao32000
32019Maria32000
42019Samantha54000
52020Maria25000

Drop a row by index

Let’s assume we want to drop the observation with index 1 and 3. We’ll just go ahead and pass a list of index numbers to the Pandas drop method.

df.drop(index=[1,3])
YearSales PersonSales Budget
02020John20000
22020Evelyn30000
42019Samantha54000
52020Maria25000

Persisting your changes

If you are interested in removing the rows and persist the change in your dataframe, instead or creating a new one, use the inplace parameter as shown below.

df.drop(index=[1,3], inplace=True)

Removing the first row

Index [0] represents the first row in your dataframe, so we’ll pass it to the drop method.

df.drop(index=[0])

Deleting the first couple of rows

df.drop(index = df.index[0:2])

Removing the last row

df.drop(len(df)-1)

len(<dataframe>) returns the dataframe number of rows; we’ll substruct 1 as the index of the first row is 0.

Leave a Comment