Today’s Python data wrangling recipe is about counting number of non unique entries in a Pandas DataFrame groupby object.
We’ll start by importing the Pandas library and then acquiring our dataset from a comma separated value file into a DataFrame. We’ll use the Pandas read_csv function:
import pandas as pd
# assuming your csv file is placed in your folder named data
data = pd.read_csv('../data/deliveries.csv')
Let’s take a quick look at the DataFrame Header:
data.head()
type | time | day | order_amount | del_tip | |
---|---|---|---|---|---|
0 | Food | Afternoon | TUE | 32 | 4 |
1 | Food | Night | WED | 46 | 6 |
2 | Medicines | Morning | TUE | 41 | 0 |
3 | Medicines | Noon | TUE | 42 | 3 |
4 | Groceries | Noon | WED | 10 | 3 |
We would like to first create a Groupby object:
data.groupby('type')
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x00000206A70DA6A0>
As you can see, the object type is a DataFrame Groupby.
Let’s now go ahead and aggregate the unique values of th Groupbyaccroding to the delivery type.
Method 1:
We’ll define a new column named num_orders to hold the unique values (nunique).
data.groupby('type').agg(num_orders=('order_amount','nunique'))
The result will be the following data frame.
num_orders | |
---|---|
type | |
Food | 3 |
Gear | 6 |
Groceries | 5 |
Medicines | 6 |
Method 2:
We can also aggregate using a dictionary notation:
data.groupby('type').agg({'order_amount':'nunique'})
Also here, we’ll receive a resulting DataFrame object .
order_amount | |
---|---|
type | |
Food | 3 |
Gear | 6 |
Groceries | 5 |
Medicines | 6 |
Method 3
Here we’ll use an alternative grouping notation and get a Series object
data.groupby('type')['order_amount'].nunique()
type Food 3 Gear 6 Groceries 5 Medicines 6 Name: order_amount, dtype: int64
From here we can use Pandas, Matplotlib or Seaborn to draw a simple bar chart to visualize our results