Pandas: Sum column values (entire column or across a row) in Python

In today’s recipe we’ll touch on the basics of adding numeric values in a pandas DataFrame.

We’ll cover the following cases:

  • Sum all rows of one or multiple columns
  • Sum by column name/label into a new column
  • Adding values by index
  • Dealing with nan values
  • Sum values that meet a certain condition

Creating the dataset

We’ll start by creating a simple dataset

# Python3
# import pandas into your Python environment.
import pandas as pd

# Now, let's create the dataframe
budget = pd.DataFrame({"person": ["John", "Kim", "Bob"],
                       "quarter": [1, 1, 1],
                       "consumer_budg": [15000, 35000, 45000],
                       "enterprise_budg": [20000, 30000, 40000]})
budget.head()
  person  quarter  consumer_budg  enterprise_budg
0   John        1          15000            20000
1    Kim        1          35000            30000
2    Bob        1          45000            40000

How to sum a column? (or more)

For a single column we’ll simply use the Series sum() method.

# one column
budget['consumer_budg'].sum()

95000

The DataFrame also has a sum() method, which we’ll use to add multiple columns:

# adding multiple columns
cols = ['consumer_budg', 'enterprise_budg']
budget[cols].sum()

We’ll receive a Series object with the results:

consumer_budg      95000
enterprise_budg    90000
dtype: int64
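As a side note, chaining a second sum() collapses that per-column Series into a single grand total. A small self-contained sketch, reusing the dataset from above:

```python
import pandas as pd

budget = pd.DataFrame({"person": ["John", "Kim", "Bob"],
                       "quarter": [1, 1, 1],
                       "consumer_budg": [15000, 35000, 45000],
                       "enterprise_budg": [20000, 30000, 40000]})

cols = ['consumer_budg', 'enterprise_budg']

# DataFrame.sum() returns a per-column Series;
# calling sum() again on that Series yields one grand total
grand_total = budget[cols].sum().sum()
print(grand_total)  # 185000
```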

Sum row values into a new column

More interesting is the case where we want to compute a total by adding multiple column values within each row. See the simple example below:

# using the column label names
budget['total_budget'] = budget['consumer_budg'] + budget['enterprise_budg']

We have created a new column as shown below:

  person  quarter  consumer_budg  enterprise_budg  total_budget
0   John        1          15000            20000         35000
1    Kim        1          35000            30000         65000
2    Bob        1          45000            40000         85000

Note: We could have also used the loc indexer to subset by label.
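For completeness, here is one way that loc-based version could look, a sketch reusing the same dataset:

```python
import pandas as pd

budget = pd.DataFrame({"person": ["John", "Kim", "Bob"],
                       "quarter": [1, 1, 1],
                       "consumer_budg": [15000, 35000, 45000],
                       "enterprise_budg": [20000, 30000, 40000]})

# loc selects by label: all rows, just the two budget columns,
# then sum(axis=1) adds across each row
budget['total_budget'] = budget.loc[:, ['consumer_budg', 'enterprise_budg']].sum(axis=1)
print(budget['total_budget'].tolist())  # [35000, 65000, 85000]
```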

Adding columns by index

We can also refer to the columns to sum by position, using the iloc indexer.

# by index
budget['total_budget'] = budget.iloc[:, 2] + budget.iloc[:, 3]

The result will be the same as above.
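An equivalent variant slices the positions and lets sum(axis=1) do the addition. This is a sketch on the same dataset; one practical difference is that, unlike the + operator, sum(axis=1) skips NaN by default:

```python
import pandas as pd

budget = pd.DataFrame({"person": ["John", "Kim", "Bob"],
                       "quarter": [1, 1, 1],
                       "consumer_budg": [15000, 35000, 45000],
                       "enterprise_budg": [20000, 30000, 40000]})

# iloc positions 2:4 cover consumer_budg and enterprise_budg
budget['total_budget'] = budget.iloc[:, 2:4].sum(axis=1)
print(budget['total_budget'].tolist())  # [35000, 65000, 85000]
```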

Sum with conditions

In this example, we would like to define a column named high_budget and populate it only where total_budget exceeds the 80K threshold; rows below the threshold receive NaN.

budget['high_budget'] = budget.query('consumer_budg + enterprise_budg > 80000')['total_budget']
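An alternative sketch uses numpy.where, which makes the threshold explicit. It assumes total_budget was already computed as in the earlier step:

```python
import numpy as np
import pandas as pd

budget = pd.DataFrame({"person": ["John", "Kim", "Bob"],
                       "consumer_budg": [15000, 35000, 45000],
                       "enterprise_budg": [20000, 30000, 40000]})
budget['total_budget'] = budget['consumer_budg'] + budget['enterprise_budg']

# keep the total only where it exceeds 80K, NaN otherwise
budget['high_budget'] = np.where(budget['total_budget'] > 80000,
                                 budget['total_budget'], np.nan)
print(budget['high_budget'].tolist())  # [nan, nan, 85000.0]
```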

Adding columns with null values

Here we might need a bit of pre-processing to get rid of the null values using fillna().

Let’s quickly create a sample dataset containing null values (see last row).

# with nan
import numpy as np
budget_nan = pd.DataFrame({"person": ["John", "Kim", "Bob", 'Court'],
                        "quarter": [1, 1, 1,1] ,
                        "consumer_budg": [15000, 35000, 45000, 50000],
                         "enterprise_budg": [20000, 30000, 40000, np.nan ] })
  person  quarter  consumer_budg  enterprise_budg
0   John        1          15000          20000.0
1    Kim        1          35000          30000.0
2    Bob        1          45000          40000.0
3  Court        1          50000              NaN

Now let's use the DataFrame fillna() method to replace the null values with zeros so that we can sum the column values.

budget_nan.fillna(0, inplace=True)
budget_nan['high_budget'] = budget_nan['consumer_budg'] + budget_nan['enterprise_budg']
budget_nan

Voilà!

  person  quarter  consumer_budg  enterprise_budg  high_budget
0   John        1          15000          20000.0      35000.0
1    Kim        1          35000          30000.0      65000.0
2    Bob        1          45000          40000.0      85000.0
3  Court        1          50000              0.0      50000.0

Things You Should Know

How does Pandas handle NaN values when summing columns?

The sum() method skips NaN values by default, treating them as zero for summation purposes. A column with values [10, NaN, 30] returns 40, not NaN. Override this with df['col'].sum(skipna=False), which returns NaN if any value is missing, useful when incomplete data should invalidate the total. For cumulative sums, cumsum() also skips NaN by default but preserves the NaN positions in the output, so you can trace exactly where gaps exist.
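A minimal sketch of those defaults:

```python
import numpy as np
import pandas as pd

s = pd.Series([10, np.nan, 30])

print(s.sum())               # 40.0  (NaN skipped by default)
print(s.sum(skipna=False))   # nan   (any missing value poisons the total)
print(s.cumsum().tolist())   # [10.0, nan, 40.0]  (gap preserved in place)
```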

What is the difference between summing across rows versus columns?

Setting axis=0 (the default) sums each column top to bottom, producing one total per column. Setting axis=1 sums each row left to right, producing one total per row, useful for calculating row-level totals like a student’s combined exam score across subjects. Combine sums with groupby() for segmented totals: group by category first, then sum within each group.
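A small sketch of both axes plus a grouped total; the exam-score frames here are invented for illustration:

```python
import pandas as pd

# wide frame: one row per student, one column per subject (illustrative data)
wide = pd.DataFrame({"math": [80, 70], "physics": [90, 60]},
                    index=["Ann", "Ben"])

col_totals = wide.sum(axis=0)   # per-subject totals (down each column)
row_totals = wide.sum(axis=1)   # per-student totals (across each row)
print(col_totals.tolist())      # [150, 150]
print(row_totals.tolist())      # [170, 130]

# long frame + groupby: same per-student totals, one sum per group
scores = pd.DataFrame({"student": ["Ann", "Ann", "Ben", "Ben"],
                       "score": [80, 90, 70, 60]})
print(scores.groupby("student")["score"].sum().tolist())  # [170, 130]
```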

Can I sum only specific columns in a DataFrame?

Pass a list of column names to select before calling sum(): df[['revenue', 'cost']].sum() returns totals for just those two columns. For conditional sums, combine boolean indexing with column selection — df[df['region'] == 'West'][['revenue']].sum() totals revenue for the West region only. You can also use agg() to apply different functions to different columns simultaneously: df.agg({'revenue': 'sum', 'transactions': 'mean'}) returns the total revenue alongside the average transaction count in a single operation.
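Sketching those three patterns on a tiny invented sales frame (the column names are assumptions for illustration, not from the dataset above):

```python
import pandas as pd

sales = pd.DataFrame({"region": ["West", "East", "West"],
                      "revenue": [100, 200, 300],
                      "transactions": [2, 4, 6]})

# select columns first, then sum
print(sales[['revenue']].sum().tolist())                 # [600]

# boolean indexing before summing
west_total = sales[sales['region'] == 'West']['revenue'].sum()
print(west_total)                                        # 400

# different aggregations per column in one call
summary = sales.agg({'revenue': 'sum', 'transactions': 'mean'})
print(summary['revenue'], summary['transactions'])
```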
