How to count unique values in a Pandas Groupby object?

Today’s Python data wrangling recipe is about counting the number of unique entries in a Pandas DataFrame GroupBy object.

We’ll start by importing the Pandas library and loading our dataset from a comma-separated values (CSV) file into a DataFrame, using the Pandas read_csv function:

import pandas as pd
# assuming your CSV file is placed in a folder named data
data = pd.read_csv('../data/deliveries.csv')

Let’s take a quick look at the first few rows of the DataFrame:

data.head()
        type       time  day  order_amount  del_tip
0       Food  Afternoon  TUE            32        4
1       Food      Night  WED            46        6
2  Medicines    Morning  TUE            41        0
3  Medicines       Noon  TUE            42        3
4  Groceries       Noon  WED            10        3

Let’s first create a GroupBy object:

data.groupby('type')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x00000206A70DA6A0>

As you can see, the returned object is a DataFrameGroupBy.

Let’s now go ahead and aggregate the unique values of the GroupBy according to the delivery type.

Method 1:

We’ll use named aggregation to define a new column, num_orders, that holds the number of unique values (nunique).

data.groupby('type').agg(num_orders=('order_amount','nunique'))

The result will be the following DataFrame:

           num_orders
type
Food                3
Gear                6
Groceries           5
Medicines           6

Method 2:

We can also aggregate using a dictionary notation:

data.groupby('type').agg({'order_amount':'nunique'})

Here too, we’ll receive a DataFrame object as the result:

           order_amount
type
Food                  3
Gear                  6
Groceries             5
Medicines             6

Method 3:

Here we’ll use an alternative column-selection notation and get back a Series object:

data.groupby('type')['order_amount'].nunique()
type
Food         3
Gear         6
Groceries    5
Medicines    6
Name: order_amount, dtype: int64

From here we can use Pandas, Matplotlib, or Seaborn to draw a simple bar chart and visualize our results.
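As a minimal sketch of that last step, here is one way to plot the grouped unique counts with Matplotlib. The small DataFrame below is hypothetical stand-in data mirroring the deliveries dataset structure, and the output filename is an assumption:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Hypothetical data mirroring the structure of deliveries.csv
data = pd.DataFrame({
    'type': ['Food', 'Food', 'Medicines', 'Medicines', 'Groceries'],
    'order_amount': [32, 46, 41, 42, 10],
})

# Same aggregation as Method 3: unique order amounts per delivery type
num_orders = data.groupby('type')['order_amount'].nunique()

# Pandas' built-in plotting wraps Matplotlib, so a bar chart is one line
ax = num_orders.plot(kind='bar', title='Unique order amounts per type')
ax.set_ylabel('num_orders')
plt.tight_layout()
plt.savefig('num_orders.png')
```

Seaborn's `barplot` would work equally well here; the Pandas `.plot` accessor is simply the shortest path from an aggregated Series to a chart.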

Practical Tips

When should I use nunique() instead of count() with groupby?

Use nunique() when you need the number of distinct values per group, and count() when you need the total number of non-null entries including duplicates. For example, grouping sales data by region with count() tells you how many transactions occurred, while nunique() on the customer column reveals how many unique buyers exist per region. After computing grouped counts, visualizing the results often clarifies patterns — a guide on plotting grouped data in Pandas covers bar charts and other visualizations that pair naturally with aggregated output.
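The sales example above can be sketched in a few lines. The `region` and `customer` column names are illustrative, not from the original dataset:

```python
import pandas as pd

# Hypothetical sales data: two repeat purchases by the same customers
sales = pd.DataFrame({
    'region':   ['North', 'North', 'North', 'South', 'South'],
    'customer': ['A',     'A',     'B',     'C',     'C'],
})

# count(): every non-null row, duplicates included
transactions = sales.groupby('region')['customer'].count()

# nunique(): distinct values only
unique_buyers = sales.groupby('region')['customer'].nunique()
```

For the North region this yields 3 transactions but only 2 unique buyers, which is exactly the distinction described above.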

How do I count unique values across multiple columns in a single groupby?

Pass a dictionary to agg() specifying nunique for each target column: df.groupby('department').agg({'employee_id': 'nunique', 'project': 'nunique'}). This returns one row per department with separate unique counts. You can rename the output columns by chaining .rename(columns={...}) for clarity. If the raw data contains exact duplicates across rows that inflate your counts, identifying and handling duplicate values before grouping ensures your aggregation reflects genuinely distinct entries rather than repeated records.
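Putting the dictionary aggregation and the rename together looks like this; the department/employee/project data is hypothetical:

```python
import pandas as pd

# Hypothetical HR data, following the column names used in the text
df = pd.DataFrame({
    'department':  ['Eng', 'Eng', 'Eng', 'Sales'],
    'employee_id': [1,     1,     2,     3],
    'project':     ['x',   'y',   'y',   'z'],
})

# One unique count per target column, then clearer output column names
result = (
    df.groupby('department')
      .agg({'employee_id': 'nunique', 'project': 'nunique'})
      .rename(columns={'employee_id': 'unique_employees',
                       'project': 'unique_projects'})
)
```

Named aggregation (`agg(unique_employees=('employee_id', 'nunique'), ...)`) achieves the same result without the separate rename step.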

Does nunique() handle NaN values in grouped data?

By default, nunique() excludes NaN values from the distinct count, which is usually the desired behavior. A group containing values [A, B, NaN, A] returns a unique count of 2, not 3. If you need to include NaN as its own category, fill missing values before grouping with df['col'].fillna('Missing') and then apply groupby().nunique(). This approach keeps the aggregation transparent and avoids silent data loss in your analysis. The same NaN-handling logic applies to value_counts(), which accepts a dropna=False parameter to explicitly count missing entries.
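The [A, B, NaN, A] example can be verified directly. The group and column names below are placeholders:

```python
import pandas as pd
import numpy as np

# Hypothetical column with one missing entry
df = pd.DataFrame({'grp': ['g1'] * 4,
                   'col': ['A', 'B', np.nan, 'A']})

# NaN is excluded by default: 2 distinct values, not 3
default_count = df.groupby('grp')['col'].nunique()

# Treat missing values as their own category by filling first
filled_count = (
    df.assign(col=df['col'].fillna('Missing'))
      .groupby('grp')['col']
      .nunique()
)
```

Using `assign` keeps the original DataFrame untouched, so the fill is visible in the pipeline rather than mutating your source data.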
