How to delete columns from your Pandas dataframe?

Yesterday we learnt about how we can easily delete rows from a dataframe based on specific conditions, and today, we’ll focus on a similar process just for columns.

In this post we’ll show how to handle a few scenarios:

  1. Removing one columns based on labels
  2. Deleting multiple columns based on labels
  3. Dropping columns base on conditions on cell values.
  4. Dropping columns based on its name

OK, as always let’s quickly create our very basic test dataframe in the Python data analysis library, Pandas.

import pandas as pd

sales_data = ({"Year": [ 2020, 2019, 2020, 2019, 2019, 2020] ,
         "Sales Person": ["John", "Xiao", "Evelyn", "Maria", "Samantha", "Maria"],
         "Area": ["HK", "USA", "JAP", "USA", "USA", "USA"],
         "Sales Budget": [10000, 12000, 30500, 22000, 38600, 15000]})

sales = pd.DataFrame(data=sales_data)

sales.head(6)

YearSales PersonAreaSales Budget
02020JohnHK10000
12019XiaoUSA12000
22020EvelynJAP30500
32019MariaUSA22000
42019SamanthaUSA38600
52020MariaUSA15000

Remove dataframe column by label

This is a simple scenario, all we want to do is remove a column by it’s label/title/name. We’ll use the Pandas drop method.

sales.drop(axis=1, labels="Area")

Same result can be accomplished using the columns parameter


sales.drop(columns ="Area")

Delete multiple columns

Also here the procedure is not too complex, we just need to assign a list of the columns we want to remove to the columns parameter.

# Drop multiple columns
sales.drop(columns =["Area", "Sales Person"])

Remove columns based on row values /conditions

This one is a bit more tricky. Let us assume that we would like to remove any column if in any of its rows (observations) the value HK is found.

We’ll first create a list using list comprehensions, that will parse the dataframe and add to the list one or more column names that has the value HK in one of their cells.

collist = [item for item in sales.columns if sales[item].any() == "HK" ]
sales.drop(columns = collist)

The Area colun will be removed:

YearSales PersonSales Budget
02020John10000
12019Xiao12000
22020Evelyn30500
32019Maria22000
42019Samantha38600
52020Maria15000

Delete columns based on column names

Quite similarly, we can use a list comprehension to drop columns based on attributes. In this example we’ll remove columns which name starts with Sales.

collist = [item for item in sales.columns if item.startswith("Sales")]
 sales.drop(columns = collist)

And the result:

YearArea
02020HK
12019USA
22020JAP
32019USA
42019USA
52020USA

Note: if you would like to persist your changes to the datafram don’t forget to use the switch inplace=True in your drop method:

sales.drop(columns ="Area", inplace=True)

Leave a Comment