Pandas: Sum column values (entire or in a row) in Python

In today’s recipe we’ll touch on the basics of adding numeric values in a pandas DataFrame.

We’ll cover the following cases:

  • Sum all rows of one or multiple columns
  • Sum by column name/label into a new column
  • Adding values by index
  • Dealing with nan values
  • Sum values that meet a certain condition

Creating the dataset

We’ll start by creating a simple dataset

# Python3
# import pandas into your Python environment.
import pandas as pd

# Now, let's create the dataframe 
budget = pd.DataFrame({"person": ["John", "Kim", "Bob"],
                        "quarter": [1, 1, 1] ,
                        "consumer_budg": [15000, 35000, 45000],
                         "enterprise_budg": [20000, 30000, 40000] })
budget.head()
personquarterconsumer_budgenterprise_budg
0John11500020000
1Kim13500030000
2Bob14500040000

How to sum a column? (or more)

For a single column we’ll simply use the Series Sum() method.

# one column
budget['consumer_budg'].sum()

95000

Also the DataFrame has a Sum() method, which we’ll use to add multiple columns:

#addingmultiple columns
cols = ['consumer_budg', 'enterprise_budg']
budget[cols].sum()

We’ll receive a Series objects with the results:

consumer_budg      95000
enterprise_budg    90000
dtype: int64

Sum row values into a new column

More interesting is the case that we want to compute the values by adding multiple column values in a specific row. See this simple example below

# using the column label names
budget['total_budget'] = budget['consumer_budg'] + budget['enterprise_budg']

We have created a new column as shown below:

personquarterconsumer_budgenterprise_budgtotal_budget
0John1150002000035000
1Kim1350003000065000
2Bob1450004000085000

Note: We could have also used the loc method to subset by label.

Adding columns by index

We can also refer to the columns to sum by index, using the iloc method.

# by index
budget['total_budget'] = budget.iloc[:,2]+ budget.iloc[:,3]

Result will be similar as above

Sum with conditions

In this example, we would like to define a column named high_budget and populate it only if the total_budget is over the 80K threshold.

budget['high_budget'] = budget.query('consumer_budg + enterprise_budg > 80000')['total_budget']

Adding columns with null values

Here we might need a bit of pre-processing to get rid of the null values using fillna().

Let’s quickly create a sample dataset containing null values (see last row).

# with nan
import numpy as np
budget_nan = pd.DataFrame({"person": ["John", "Kim", "Bob", 'Court'],
                        "quarter": [1, 1, 1,1] ,
                        "consumer_budg": [15000, 35000, 45000, 50000],
                         "enterprise_budg": [20000, 30000, 40000, np.nan ] })
personquarterconsumer_budgenterprise_budghigh_budget
0John11500020000.035000.0
1Kim13500030000.065000.0
2Bob14500040000.085000.0
3Court150000NaNNaN

Now lets use the DataFrame fillna() method to mass override the null values with Zeros so that we can sum the column values.

budget_nan.fillna(0, inplace=True)
budget_nan['high_budget'] = budget_nan['consumer_budg'] + budget_nan['enterprise_budg']
budget_nan

Voi’la

personquarterconsumer_budgenterprise_budghigh_budget
0John11500020000.035000.0
1Kim13500030000.065000.0
2Bob14500040000.085000.0
3Court1500000.050000.0

Leave a Comment