Filter Pandas dataframes using loc and multiple columns / values / conditions

In today’s recipe we’ll learn how to quickly combine between using the loc label indexer with boolean selection to subset and filter DataFrames as part of our exploratory data analysis process.

Note: The Dataframe loc method gets both rows and columns parameters. Here’s the syntax: df.loc[rows, columns]. In today’s post we’ll pass the condition into the rows parameter input to subset the DataFrame. There is no need to input the columns labels.

Let’s start by creating a sample sales budgets dataset.

# Import Pandas into Python
import pandas as pd

# Now, let's create the Sales budgets dataframe 
df = pd.DataFrame({"quarter": [ 1, 2, 3, 4] * 2 ,
                     "person": ["Debbie", "Kim", "Dorothy", "Tim"] * 2,
                     "budget": [20000, 30000, 40000, 10000, 15000, 20000, 30000, 45000]})
df.head(10)

Filter DataFrame rows with loc and conditions

  1. Define the filter statement – this could involve one or more columns with boolean conditions.
  2. Filter the DataFrame according to the statement you have defined

Using a single condition example:

Let’s assume that we want to look into all sales budgets for the first quarter. We’ll define the filter and use the loc indexer to subset the data.

my_filter = df['quarter'] == 1
df.loc[my_filter]

Will return:

quarterpersonbudget
01Debbie20000
41Debbie15000

Using multiple conditions example:

In the same fashion we can define more complex conditions involving multiple columns / values with boolean indexing.

Example 1:

Let’s assume we would like to look into all budgets over 30K AND relevant to Q3:

mf_1 = df['quarter'] == 3
mf_2 = df['budget'] > 30000

# & represents the Boolean AND operator
df.loc[mf_1 & mf_2]

Will return:

quarterpersonbudget
23Dorothy40000

Example 2:

Let’s assume we would like to look into all budgets over 30K OR relevant to Q3:

mf_1 = df['quarter'] == 3
mf_2 = df['budget'] > 30000

# | represents the Boolean OR operator
df.loc[mf_1 | mf_2]

Will return:

quarterpersonbudget
23Dorothy40000
63Dorothy30000
74Tim45000

Example 3:

Let’s write a more complex condition using isin. We’ll want to select budget rows for Kim and Tim.

mf = df['person'].isin(['Kim', 'Tim'])

df.loc[mf]

Will return:

  quarter person budget
1 2 Kim 30000
3 4 Tim 10000
5 2 Kim 20000
7 4 Tim 45000

Follow up questions or feedback – please leave me a comment.

Leave a Comment