How to drop columns from your dataframe based on conditions?

Yesterday we learnt about how we can easily delete rows from a dataframe based on specific conditions, and today, we’ll focus on a similar process just for columns.

In this post we’ll show how to handle a few scenarios:

Removing one columns based on labels
Deleting multiple columns based on labels
Dropping columns base on conditions on cell values.
Dropping columns based on its name

OK, as always let’s quickly create our very basic test dataframe in the Python data analysis library, Pandas.

import pandas as pd

sales_data = ({"Year": [ 2020, 2019, 2020, 2019, 2019, 2020] ,
         "Sales Person": ["John", "Xiao", "Evelyn", "Maria", "Samantha", "Maria"],
         "Area": ["HK", "USA", "JAP", "USA", "USA", "USA"],
         "Sales Budget": [10000, 12000, 30500, 22000, 38600, 15000]})

sales = pd.DataFrame(data=sales_data)

sales.head(6)

	Year	Sales Person	Area	Sales Budget
0	2020	John	HK	10000
1	2019	Xiao	USA	12000
2	2020	Evelyn	JAP	30500
3	2019	Maria	USA	22000
4	2019	Samantha	USA	38600
5	2020	Maria	USA	15000

Remove dataframe column by label

This is a simple scenario, all we want to do is remove a column by it’s label/title/name. We’ll use the Pandas drop method.

sales.drop(axis=1, labels="Area")

Same result can be accomplished using the columns parameter


sales.drop(columns ="Area")

Delete multiple columns

Also here the procedure is not too complex, we just need to assign a list of the columns we want to remove to the columns parameter.

# Drop multiple columns
sales.drop(columns =["Area", "Sales Person"])

Remove columns based on row values /conditions

This one is a bit more tricky. Let us assume that we would like to remove any column if in any of its rows (observations) the value HK is found.

We’ll first create a list using list comprehensions, that will parse the dataframe and add to the list one or more column names that has the value HK in one of their cells.

collist = [item for item in sales.columns if sales[item].any() == "HK" ]
sales.drop(columns = collist)

The Area colun will be removed:

	Year	Sales Person	Sales Budget
0	2020	John	10000
1	2019	Xiao	12000
2	2020	Evelyn	30500
3	2019	Maria	22000
4	2019	Samantha	38600
5	2020	Maria	15000

Delete columns based on column names

Quite similarly, we can use a list comprehension to drop columns based on attributes. In this example we’ll remove columns which name starts with Sales.

collist = [item for item in sales.columns if item.startswith("Sales")]
 sales.drop(columns = collist)

And the result:

	Year	Area
0	2020	HK
1	2019	USA
2	2020	JAP
3	2019	USA
4	2019	USA
5	2020	USA

Note: if you would like to persist your changes to the datafram don’t forget to use the switch inplace=True in your drop method:

sales.drop(columns ="Area", inplace=True)

How to delete columns from your Pandas dataframe?

Remove dataframe column by label

Delete multiple columns

Remove columns based on row values /conditions

Delete columns based on column names

Leave a Comment Cancel reply