How to count unique values in a Pandas Groupby object?

Today’s Python data wrangling recipe is about counting number of non unique entries in a Pandas DataFrame groupby object.

We’ll start by importing the Pandas library and then acquiring our dataset from a comma separated value file into a DataFrame. We’ll use the Pandas read_csv function:

import pandas as pd
# assuming your csv file is placed in your folder named data
data = pd.read_csv('../data/deliveries.csv')

Let’s take a quick look at the DataFrame Header:

data.head()
typetimedayorder_amountdel_tip
0FoodAfternoonTUE324
1FoodNightWED466
2MedicinesMorningTUE410
3MedicinesNoonTUE423
4GroceriesNoonWED103

We would like to first create a Groupby object:

data.groupby('type')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x00000206A70DA6A0>

As you can see, the object type is a DataFrame Groupby.

Let’s now go ahead and aggregate the unique values of th Groupbyaccroding to the delivery type.

Method 1:

We’ll define a new column named num_orders to hold the unique values (nunique).

data.groupby('type').agg(num_orders=('order_amount','nunique'))

The result will be the following data frame.

num_orders
type
Food3
Gear6
Groceries5
Medicines6

Method 2:

We can also aggregate using a dictionary notation:

data.groupby('type').agg({'order_amount':'nunique'})

Also here, we’ll receive a resulting DataFrame object .

order_amount
type
Food3
Gear6
Groceries5
Medicines6

Method 3

Here we’ll use an alternative grouping notation and get a Series object

data.groupby('type')['order_amount'].nunique()
type
Food         3
Gear         6
Groceries    5
Medicines    6
Name: order_amount, dtype: int64

From here we can use Pandas, Matplotlib or Seaborn to draw a simple bar chart to visualize our results

Leave a Comment