How to sum columns in Pandas dataframes (+ Examples)?

In today’s recipe we’ll touch on the basics of adding numeric values in a pandas DataFrame.

We’ll cover the following cases:

Sum all rows of one or multiple columns
Sum by column name/label into a new column
Adding values by index
Dealing with nan values
Sum values that meet a certain condition

Creating the dataset

We’ll start by creating a simple dataset

# Python3
# import pandas into your Python environment.
import pandas as pd

# Now, let's create the dataframe 
budget = pd.DataFrame({"person": ["John", "Kim", "Bob"],
                        "quarter": [1, 1, 1] ,
                        "consumer_budg": [15000, 35000, 45000],
                         "enterprise_budg": [20000, 30000, 40000] })
budget.head()

	person	quarter	consumer_budg	enterprise_budg
0	John	1	15000	20000
1	Kim	1	35000	30000
2	Bob	1	45000	40000

How to sum a column? (or more)

For a single column we’ll simply use the Series Sum() method.

# one column
budget['consumer_budg'].sum()

95000

Also the DataFrame has a Sum() method, which we’ll use to add multiple columns:

#addingmultiple columns
cols = ['consumer_budg', 'enterprise_budg']
budget[cols].sum()

We’ll receive a Series objects with the results:

consumer_budg      95000
enterprise_budg    90000
dtype: int64

Sum row values into a new column

More interesting is the case that we want to compute the values by adding multiple column values in a specific row. See this simple example below

# using the column label names
budget['total_budget'] = budget['consumer_budg'] + budget['enterprise_budg']

We have created a new column as shown below:

	person	quarter	consumer_budg	enterprise_budg	total_budget
0	John	1	15000	20000	35000
1	Kim	1	35000	30000	65000
2	Bob	1	45000	40000	85000

Note: We could have also used the loc method to subset by label.

Adding columns by index

We can also refer to the columns to sum by index, using the iloc method.

# by index
budget['total_budget'] = budget.iloc[:,2]+ budget.iloc[:,3]

Result will be similar as above

Sum with conditions

In this example, we would like to define a column named high_budget and populate it only if the total_budget is over the 80K threshold.

budget['high_budget'] = budget.query('consumer_budg + enterprise_budg > 80000')['total_budget']

Adding columns with null values

Here we might need a bit of pre-processing to get rid of the null values using fillna().

Let’s quickly create a sample dataset containing null values (see last row).

# with nan
import numpy as np
budget_nan = pd.DataFrame({"person": ["John", "Kim", "Bob", 'Court'],
                        "quarter": [1, 1, 1,1] ,
                        "consumer_budg": [15000, 35000, 45000, 50000],
                         "enterprise_budg": [20000, 30000, 40000, np.nan ] })

	person	quarter	consumer_budg	enterprise_budg	high_budget
0	John	1	15000	20000.0	35000.0
1	Kim	1	35000	30000.0	65000.0
2	Bob	1	45000	40000.0	85000.0
3	Court	1	50000	NaN	NaN

Now lets use the DataFrame fillna() method to mass override the null values with Zeros so that we can sum the column values.

budget_nan.fillna(0, inplace=True)
budget_nan['high_budget'] = budget_nan['consumer_budg'] + budget_nan['enterprise_budg']
budget_nan

Voi’la

	person	quarter	consumer_budg	enterprise_budg	high_budget
0	John	1	15000	20000.0	35000.0
1	Kim	1	35000	30000.0	65000.0
2	Bob	1	45000	40000.0	85000.0
3	Court	1	50000	0.0	50000.0

Pandas: Sum column values (entire or in a row) in Python