How to plot multiple data columns in a DataFrame?

Today’s recipe covers plotting and visualizing multiple data columns in Pandas. We’ll use the DataFrame plot method, which wraps Matplotlib so that basic charts don’t require calling the Matplotlib library directly.

Data acquisition

We’ll use a simple dataset, which we’ll generate and load into a Pandas DataFrame using the code below. Copy and paste it into your Python editor or notebook if you’d like to follow along.

# Import Pandas
import pandas as pd

# Define data
quarter = [3, 1, 4, 2, 1, 4, 4, 4]
city = ['Chicago', 'Boca Raton', 'Miami', 'Omaha']
actual = [3568, 2367, 2152, 3027, 3695, 2495, 4134, 2162]
target = [1492, 2064, 2180, 1014, 2836, 1064, 2880, 2862]

# Load the DataFrame (the city list is repeated to match the 8 rows)
sales_df = pd.DataFrame({"quarter": quarter,
                         "city": city * 2,
                         "actual": actual,
                         "target": target})
sales_df.head()
   quarter        city  actual  target
0        3     Chicago    3568    1492
1        1  Boca Raton    2367    2064
2        4       Miami    2152    2180
3        2       Omaha    3027    1014
4        1     Chicago    3695    2836

Pandas DataFrame plotting examples

In this section we’ll go through the most common plot types for Pandas DataFrames:

  • Bars
  • Stacked Bars
  • Scatter
  • Multiple Lines

Grouping the data

We’ll start by grouping the data by city with the groupby method, summing the target and actual columns:

# group the data
sales_by_city = sales_df.groupby('city').agg(planned_sales=('target', 'sum'),
                                             actual_sales=('actual', 'sum'))

Here’s our data:

            planned_sales  actual_sales
city
Boca Raton           3128          4862
Chicago              4328          7263
Miami                5060          6286
Omaha                3876          5189
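As a cross-check on the grouping step, the same table can be produced by summing the two columns and renaming them afterwards. This is only a sketch of an alternative, assuming the sample data defined earlier; the variable name check is introduced here for illustration.

```python
import pandas as pd

# Rebuild the sample data from the data acquisition section
city = ['Chicago', 'Boca Raton', 'Miami', 'Omaha']
sales_df = pd.DataFrame({
    "quarter": [3, 1, 4, 2, 1, 4, 4, 4],
    "city": city * 2,
    "actual": [3568, 2367, 2152, 3027, 3695, 2495, 4134, 2162],
    "target": [1492, 2064, 2180, 1014, 2836, 1064, 2880, 2862],
})

# Named aggregation, as used in this post
sales_by_city = sales_df.groupby('city').agg(
    planned_sales=('target', 'sum'),
    actual_sales=('actual', 'sum'),
)

# Alternative (illustrative): sum both columns, then rename them
check = (
    sales_df.groupby('city')[['target', 'actual']]
    .sum()
    .rename(columns={'target': 'planned_sales', 'actual': 'actual_sales'})
)

# Compare the two results
print(sales_by_city.equals(check))
```

Named aggregation keeps the renaming and the aggregation in one call, which is why it reads more compactly when the output columns need new names.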

Create Pandas bar charts

# create a pandas bar plot
sales_by_city.plot(kind='bar', title='Planned vs Actual', cmap='Dark2', figsize=(10,6), rot=30);

The result is a grouped bar chart with two bars per city, one for planned sales and one for actual sales.

Note:

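  • The figsize parameter takes a tuple with the width and height of the chart, in inches.
  • The title parameter sets the chart title (‘Planned vs Actual’).
  • The cmap parameter selects the Matplotlib colormap used to color the columns.
  • The rot parameter rotates the axis tick labels by the given number of degrees.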
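The plot method returns a Matplotlib Axes object, so the chart can be adjusted further after it is drawn. The following is a minimal sketch rather than part of the original recipe: it rebuilds the grouped table from the values shown earlier, and the axis labels 'City' and 'Total sales' are assumptions chosen for illustration.

```python
import pandas as pd

# Grouped table, rebuilt from the values shown in the grouping section
sales_by_city = pd.DataFrame(
    {"planned_sales": [3128, 4328, 5060, 3876],
     "actual_sales": [4862, 7263, 6286, 5189]},
    index=pd.Index(['Boca Raton', 'Chicago', 'Miami', 'Omaha'], name='city'),
)

# Keep a reference to the Axes object returned by plot
ax = sales_by_city.plot(kind='bar', title='Planned vs Actual',
                        cmap='Dark2', figsize=(10, 6), rot=30)

# Illustrative axis labels (not in the original post)
ax.set_xlabel('City')
ax.set_ylabel('Total sales')

# The title and size passed to plot can be read back from the Axes
print(ax.get_title())
print(ax.figure.get_size_inches())
```

In a plain Python script, as opposed to a notebook, a call to matplotlib.pyplot.show() is typically needed to display the chart.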

Stacked bar charts

Adding the parameter stacked=True produces a stacked bar chart, with the columns drawn on top of each other for each city:

# create a stacked plot
sales_by_city.plot(kind='bar', title='Planned vs Actual', cmap='Dark2', stacked=True, figsize=(10,6), rot=30);
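In a stacked chart the full height of each bar is the row total, that is, planned plus actual sales for that city. One possible way to make this visible is to write the total above each bar. This is a sketch only, assuming the grouped values shown earlier; the totals variable and the text labels are additions for illustration.

```python
import pandas as pd

# Grouped table, rebuilt from the values shown in the grouping section
sales_by_city = pd.DataFrame(
    {"planned_sales": [3128, 4328, 5060, 3876],
     "actual_sales": [4862, 7263, 6286, 5189]},
    index=pd.Index(['Boca Raton', 'Chicago', 'Miami', 'Omaha'], name='city'),
)

ax = sales_by_city.plot(kind='bar', title='Planned vs Actual', cmap='Dark2',
                        stacked=True, figsize=(10, 6), rot=30)

# Row totals: the height of each stacked bar
totals = sales_by_city.sum(axis=1)

# Bars are centered on positions 0, 1, 2, ... so enumerate gives the x value
for position, total in enumerate(totals):
    ax.text(position, total, str(total), ha='center', va='bottom')
```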

Horizontal bars

sales_by_city.plot(kind='barh', title='Planned vs Actual', cmap='Accent', figsize=(10,6), rot=30);

Line chart (multiple lines)

sales_by_city.plot(kind='line', title='Planned vs Actual', style='-o', cmap='Dark2', figsize=(10,6), rot=30);

Note the use of the style parameter, which takes a Matplotlib format string: here '-o' draws solid lines with circle markers.
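The style parameter also accepts a list with one format string per column, which helps tell the lines apart. A minimal sketch, assuming the grouped values shown earlier; the particular styles ('--s' for dashed with square markers, '-o' for solid with circle markers) are arbitrary choices for illustration.

```python
import pandas as pd

# Grouped table, rebuilt from the values shown in the grouping section
sales_by_city = pd.DataFrame(
    {"planned_sales": [3128, 4328, 5060, 3876],
     "actual_sales": [4862, 7263, 6286, 5189]},
    index=pd.Index(['Boca Raton', 'Chicago', 'Miami', 'Omaha'], name='city'),
)

# One format string per column, in column order:
# planned_sales -> dashed line with squares, actual_sales -> solid line with circles
ax = sales_by_city.plot(kind='line', title='Planned vs Actual',
                        style=['--s', '-o'], cmap='Dark2',
                        figsize=(10, 6), rot=30)
```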

Pandas scatter with multiple columns

For completeness, here’s the code for the scatter chart. Note that the x and y columns must be specified explicitly.

sales_by_city.plot(kind='scatter', x='actual_sales', y='planned_sales', title='Planned vs Actual', figsize=(10,6));
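A scatter chart does not show which point belongs to which city. One way to address this, sketched here under the assumption that the grouped values are those shown earlier, is to annotate each point with its index label through the returned Axes object. The offset of 5 points is an arbitrary choice.

```python
import pandas as pd

# Grouped table, rebuilt from the values shown in the grouping section
sales_by_city = pd.DataFrame(
    {"planned_sales": [3128, 4328, 5060, 3876],
     "actual_sales": [4862, 7263, 6286, 5189]},
    index=pd.Index(['Boca Raton', 'Chicago', 'Miami', 'Omaha'], name='city'),
)

ax = sales_by_city.plot(kind='scatter', x='actual_sales', y='planned_sales',
                        title='Planned vs Actual', figsize=(10, 6))

# Label each point with its city name, shifted slightly up and to the right
for city_name, row in sales_by_city.iterrows():
    ax.annotate(city_name,
                (row['actual_sales'], row['planned_sales']),
                xytext=(5, 5), textcoords='offset points')
```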

You might also want to take a look at our tutorial on plotting Pandas datetime timeseries.
