Yesterday we learnt about how we can easily delete rows from a dataframe based on specific conditions, and today, we’ll focus on a similar process just for columns.
In this post we’ll show how to handle a few scenarios:
- Removing one columns based on labels
- Deleting multiple columns based on labels
- Dropping columns base on conditions on cell values.
- Dropping columns based on its name
OK, as always let’s quickly create our very basic test dataframe in the Python data analysis library, Pandas.
import pandas as pd
sales_data = ({"Year": [ 2020, 2019, 2020, 2019, 2019, 2020] ,
"Sales Person": ["John", "Xiao", "Evelyn", "Maria", "Samantha", "Maria"],
"Area": ["HK", "USA", "JAP", "USA", "USA", "USA"],
"Sales Budget": [10000, 12000, 30500, 22000, 38600, 15000]})
sales = pd.DataFrame(data=sales_data)
sales.head(6)
Year | Sales Person | Area | Sales Budget | |
---|---|---|---|---|
0 | 2020 | John | HK | 10000 |
1 | 2019 | Xiao | USA | 12000 |
2 | 2020 | Evelyn | JAP | 30500 |
3 | 2019 | Maria | USA | 22000 |
4 | 2019 | Samantha | USA | 38600 |
5 | 2020 | Maria | USA | 15000 |
Remove dataframe column by label
This is a simple scenario, all we want to do is remove a column by it’s label/title/name. We’ll use the Pandas drop method.
sales.drop(axis=1, labels="Area")
Same result can be accomplished using the columns parameter
sales.drop(columns ="Area")
Delete multiple columns
Also here the procedure is not too complex, we just need to assign a list of the columns we want to remove to the columns parameter.
# Drop multiple columns
sales.drop(columns =["Area", "Sales Person"])
Remove columns based on row values /conditions
This one is a bit more tricky. Let us assume that we would like to remove any column if in any of its rows (observations) the value HK is found.
We’ll first create a list using list comprehensions, that will parse the dataframe and add to the list one or more column names that has the value HK in one of their cells.
collist = [item for item in sales.columns if sales[item].any() == "HK" ]
sales.drop(columns = collist)
The Area colun will be removed:
Year | Sales Person | Sales Budget | |
---|---|---|---|
0 | 2020 | John | 10000 |
1 | 2019 | Xiao | 12000 |
2 | 2020 | Evelyn | 30500 |
3 | 2019 | Maria | 22000 |
4 | 2019 | Samantha | 38600 |
5 | 2020 | Maria | 15000 |
Delete columns based on column names
Quite similarly, we can use a list comprehension to drop columns based on attributes. In this example we’ll remove columns which name starts with Sales.
collist = [item for item in sales.columns if item.startswith("Sales")]
sales.drop(columns = collist)
And the result:
Year | Area | |
---|---|---|
0 | 2020 | HK |
1 | 2019 | USA |
2 | 2020 | JAP |
3 | 2019 | USA |
4 | 2019 | USA |
5 | 2020 | USA |
Note: if you would like to persist your changes to the datafram don’t forget to use the switch inplace=True in your drop method:
sales.drop(columns ="Area", inplace=True)