In today’s recipe we’ll learn how to leverage the loc label indexer with boolean selection to subset and filter DataFrames as part of our exploratory data analysis process.
Note: The Dataframe loc method gets both rows and columns parameters. Here’s the syntax: df.loc[rows, columns]. In today’s post we’ll pass the condition into the rows parameter input to subset the DataFrame. In our case, there will be no need to input the columns labels.
Let’s start by creating a sample sales budgets dataset.
# Import Pandas into Python
import pandas as pd
# Now, let's create the Sales budgets dataframe
df = pd.DataFrame({"quarter": [ 1, 2, 3, 4] * 2 ,
"person": ["Debbie", "Kim", "Dorothy", "Tim"] * 2,
"budget": [20000, 30000, 40000, 10000, 15000, 20000, 30000, 45000]})
df.head(10)
Filter DataFrame rows with loc and conditions
- Define the filter statement – this could involve one or more columns with boolean conditions.
- Filter the DataFrame according to the statement you have defined
Using a single condition example:
Let’s assume that we want to look into all sales budgets for the first quarter. We’ll define the filter and use the loc indexer to subset the data.
my_filter = df['quarter'] == 1
df.loc[my_filter]
Will return:
quarter | person | budget | |
---|---|---|---|
0 | 1 | Debbie | 20000 |
4 | 1 | Debbie | 15000 |
Using multiple conditions example:
In the same fashion we can define more complex conditions involving multiple columns / values with boolean indexing.
Example 1:
Let’s assume we would like to look into all budgets over 30K AND relevant to Q3:
mf_1 = df['quarter'] == 3
mf_2 = df['budget'] > 30000
# & represents the Boolean AND operator
df.loc[mf_1 & mf_2]
Will return:
quarter | person | budget | |
---|---|---|---|
2 | 3 | Dorothy | 40000 |
Example 2:
Let’s assume we would like to look into all budgets over 30K OR relevant to Q3:
mf_1 = df['quarter'] == 3
mf_2 = df['budget'] > 30000
# | represents the Boolean OR operator
df.loc[mf_1 | mf_2]
Will return:
quarter | person | budget | |
---|---|---|---|
2 | 3 | Dorothy | 40000 |
6 | 3 | Dorothy | 30000 |
7 | 4 | Tim | 45000 |
Example 3:
Let’s write a more complex condition using isin. We’ll want to select budget rows for Kim and Tim.
mf = df['person'].isin(['Kim', 'Tim'])
df.loc[mf]
Will return:
quarter | person | budget | |
---|---|---|---|
1 | 2 | Kim | 30000 |
3 | 4 | Tim | 10000 |
5 | 2 | Kim | 20000 |
7 | 4 | Tim | 45000 |
In case of follow up questions or feedback – kindly leave me a comment using the comment section below.