How to filter and subset a Python dataframe by specific values?

Quite often as part of your data wrangling, you’ll invariably need to subset or filter your dataset and select for specific rows by one or multiple values. In today’s quick tutorial i’ll show how to do exactly that.

Creating your data

We’ll start by creating a very simple data set which we’ll use for learning purposes. As we typically do, we’ll first go ahead and import the Pandas library to our work environment and then create a dataframe from a dictionary as shown below.

# import Python Pandas
import pandas as pd

# Now, let's create the dataframe 
df = pd.DataFrame({"Year": [ 2018, 2019, 2020] ,
                     "Person": ["John", "Kim", "Bob"],
                     "Budget": [20000, 30000, 40000]})
'Show the dataframe header
df.head()
Year Person Budget
0 2018 John 20000
1 2019 Kim 30000
2 2020 Bob 40000

Find df entries by single value

Let’s assume that we would like to show all entries with budget greater than 30K. We’ll first go ahead and define a variable containing the filtering criteria. Then we’ll apply it on the dataframe and pull the selected records

# single condition
mask = df["Budget"]> 30000
df[mask]

Here’s the result:

Year Person Budget
2 2020 Bob 40000

Select rows by multiple conditions

Next we’ll run through a bit more complex example that has a complex logical select criteria.

# Complex select criteria with boolean AND (&)
mask = (df["Budget"]>= 30000) & (df["Year"] == 2020)
df[mask]
Year Person Budget
1 2019 Kim 30000

We can also use boolean OR as well.

# Using logical boolean OR
df[(df["Budget"]>= 30000) | (df["Year"] == 2020)]

The result will be:

Year Person Budget
1 2019 Kim 30000
2 2020 Bob 40000

Subset with label (Loc)

Here’s a good example on filtering with boolean conditions with loc.

Find rows by index

You can also select specific rows or values in your dataframe by index as shown below. As a simple example, the code below will subset the first two rows according to row index.

df.iloc[[0,1],:]

The following subset will be returned

Year Person Budget
0 2018 John 20000
1 2019 Kim 30000

Bonus: Create a quick bar plot of your data

Besides its very powerful data manipulation capabilities, Pandas has built in data visualization capabilities based on other Python library named MatplotLib. Here’s a simple bar chart that you can create with one line of code, the parameters are pretty much self explanatory.

df.plot(x="Year", y="Budget", kind="bar", title="Budget by year");
Pandas bar chart showing filtered dataframe row values

Frequently Asked Questions

How do I filter a Pandas dataframe to find rows matching multiple conditions?

Combine boolean conditions using the & (and) and | (or) operators, wrapping each condition in parentheses. For example, df[(df['age'] > 30) & (df['city'] == 'Boston')] returns only rows where both criteria are met. You can chain as many conditions as needed this way. The .loc[] accessor works well here too, since it accepts boolean arrays and lets you select specific columns in the same operation. Once you have filtered your subset, you may want to drop unnecessary columns from your filtered dataframe to keep only the fields relevant to your analysis.

What is the difference between loc, iloc, and conditional filtering?

The .loc[] accessor selects rows and columns by label, so you reference actual index values and column names. The .iloc[] accessor selects by integer position, treating the dataframe like a zero-indexed array. Conditional filtering uses boolean masks applied directly to the dataframe, such as df[df['score'] >= 90], and is the most common approach for value-based lookups. A practical workflow involves filtering rows by condition, then using groupby to count unique values across the resulting subset to summarize your findings quickly.

Can I find the first row that matches a specific value without scanning the entire dataframe?

Use df[df['column'] == value].head(1) or df.loc[df['column'] == value].iloc[0] to retrieve the first matching row. For large dataframes, .idxmax() or .idxmin() can locate the index of extreme values efficiently without a full boolean scan. Another approach is df.query("column == @value"), which can be more readable when working with complex expressions and variable references.

Leave a Comment