Pandas: select multiple columns in a dataframe based on value / condition / index

In today’s recipe we’ll learn how you can subset a DataFrame by selecting multiple columns.

Preparations

We’ll first create a simple DataFrame that we’ll use throughout the recipe:

# Import Pandas
import pandas as pd

# Create Dataframe from a dictionary
target = pd.DataFrame({"quarter": [ 1, 2, 3, 4,4,3,2,1],
                     "person": ["Debbie", "Kim", "Dorothy", "Tim"] * 2,
                    "area":['North', 'South', 'West','Midwest']* 2,
                     "budget": [43654, 35795, 20860, 25390, 41575, 31964, 31284, 42118]})
target.head()

Here’s our DataFrame

quarterpersonareabudget
01DebbieNorth43654
12KimSouth35795
23DorothyWest20860
34TimMidwest25390
44DebbieNorth4157

Selecting multiple columns in a Pandas Dataframe

We’ll look into the following cases:

  • Subset by column names / label
  • Subset based on a list of columns
  • Select by column position / index
  • Select by condition / specific value

Select by label

We’ll pass the column names as a list as shown below:

# by name
target[['area', 'budget']]

Using the loc indexer

# using the loc indexer - by label
target.loc[:,['area', 'budget']]

Subset multiple cols by list of names

Pretty similar than above, but for better readibility we’ll define the list as a variable upfront.

# predefined Python list
my_cols = ['area', 'budget']
target[my_cols]

Select columns by index / position

In this case we’ll use the iloc indexer to make selections according to the position of the column which is expressed as an integer.

# using the column index
target.iloc[:,2:4]

All the methods discussed above will render the same result

areabudget
0North43654
1South35795
2West20860
3Midwest25390
4North41575
5South31964
6West31284
7Midwest42118

Select by condition / value

Let’s now assume that we want to subset the df by conditions on the column names. This would be useful if you have a very wide data set and you would like to select only columns matching a specific pattern in their name.

In our example we would like to select our columns which name contains the pattern ‘a’.

target.loc[:,target.columns.str.contains('a')]

If you would like to filter the DataFrame rows by one or more conditions, look into this tutorial.

Leave a Comment