In today’s recipe we’ll touch on the basics of adding numeric values in a pandas DataFrame.
We’ll cover the following cases:
- Sum all rows of one or multiple columns
- Sum by column name/label into a new column
- Adding values by index
- Dealing with nan values
- Sum values that meet a certain condition
Creating the dataset
We’ll start by creating a simple dataset
# Python3
# import pandas into your Python environment.
import pandas as pd
# Now, let's create the dataframe
budget = pd.DataFrame({"person": ["John", "Kim", "Bob"],
"quarter": [1, 1, 1] ,
"consumer_budg": [15000, 35000, 45000],
"enterprise_budg": [20000, 30000, 40000] })
budget.head()
| person | quarter | consumer_budg | enterprise_budg | |
|---|---|---|---|---|
| 0 | John | 1 | 15000 | 20000 |
| 1 | Kim | 1 | 35000 | 30000 |
| 2 | Bob | 1 | 45000 | 40000 |
How to sum a column? (or more)
For a single column we’ll simply use the Series Sum() method.
# one column
budget['consumer_budg'].sum()
95000
Also the DataFrame has a Sum() method, which we’ll use to add multiple columns:
#addingmultiple columns
cols = ['consumer_budg', 'enterprise_budg']
budget[cols].sum()
We’ll receive a Series objects with the results:
consumer_budg 95000 enterprise_budg 90000 dtype: int64
Sum row values into a new column
More interesting is the case that we want to compute the values by adding multiple column values in a specific row. See this simple example below
# using the column label names
budget['total_budget'] = budget['consumer_budg'] + budget['enterprise_budg']
We have created a new column as shown below:
| person | quarter | consumer_budg | enterprise_budg | total_budget | |
|---|---|---|---|---|---|
| 0 | John | 1 | 15000 | 20000 | 35000 |
| 1 | Kim | 1 | 35000 | 30000 | 65000 |
| 2 | Bob | 1 | 45000 | 40000 | 85000 |
Note: We could have also used the loc method to subset by label.
Adding columns by index
We can also refer to the columns to sum by index, using the iloc method.
# by index
budget['total_budget'] = budget.iloc[:,2]+ budget.iloc[:,3]
Result will be similar as above
Sum with conditions
In this example, we would like to define a column named high_budget and populate it only if the total_budget is over the 80K threshold.
budget['high_budget'] = budget.query('consumer_budg + enterprise_budg > 80000')['total_budget']
Adding columns with null values
Here we might need a bit of pre-processing to get rid of the null values using fillna().
Let’s quickly create a sample dataset containing null values (see last row).
# with nan
import numpy as np
budget_nan = pd.DataFrame({"person": ["John", "Kim", "Bob", 'Court'],
"quarter": [1, 1, 1,1] ,
"consumer_budg": [15000, 35000, 45000, 50000],
"enterprise_budg": [20000, 30000, 40000, np.nan ] })
| person | quarter | consumer_budg | enterprise_budg | high_budget | |
|---|---|---|---|---|---|
| 0 | John | 1 | 15000 | 20000.0 | 35000.0 |
| 1 | Kim | 1 | 35000 | 30000.0 | 65000.0 |
| 2 | Bob | 1 | 45000 | 40000.0 | 85000.0 |
| 3 | Court | 1 | 50000 | NaN | NaN |
Now lets use the DataFrame fillna() method to mass override the null values with Zeros so that we can sum the column values.
budget_nan.fillna(0, inplace=True)
budget_nan['high_budget'] = budget_nan['consumer_budg'] + budget_nan['enterprise_budg']
budget_nan
Voi’la
| person | quarter | consumer_budg | enterprise_budg | high_budget | |
|---|---|---|---|---|---|
| 0 | John | 1 | 15000 | 20000.0 | 35000.0 |
| 1 | Kim | 1 | 35000 | 30000.0 | 65000.0 |
| 2 | Bob | 1 | 45000 | 40000.0 | 85000.0 |
| 3 | Court | 1 | 50000 | 0.0 | 50000.0 |
Things You Should Know
How does Pandas handle NaN values when summing columns?
The sum() method skips NaN values by default, treating them as zero for summation purposes. A column with values [10, NaN, 30] returns 40, not NaN. Override this with df['col'].sum(skipna=False), which returns NaN if any value is missing — useful when incomplete data should invalidate the total. For cumulative sums, cumsum() also skips NaN by default but preserves the NaN positions in output, so you can trace exactly where gaps exist. When you need to isolate specific rows before summing, techniques for finding and filtering DataFrame rows by value let you build precise subsets that feed directly into your aggregation.
What is the difference between summing across rows versus columns?
Setting axis=0 (the default) sums each column top to bottom, producing one total per column. Setting axis=1 sums each row left to right, producing one total per row — useful for calculating row-level totals like a student’s combined exam score across subjects. Combine row-wise sums with groupby() for segmented totals: group by category first, then sum within each group. For related aggregation workflows where you need distinct counts rather than sums, counting unique values with groupby and nunique follows the same grouping pattern with a different aggregation function.
Can I sum only specific columns in a DataFrame?
Pass a list of column names to select before calling sum(): df[['revenue', 'cost']].sum() returns totals for just those two columns. For conditional sums, combine boolean indexing with column selection — df[df['region'] == 'West'][['revenue']].sum() totals revenue for the West region only. You can also use agg() to apply different functions to different columns simultaneously: df.agg({'revenue': 'sum', 'transactions': 'mean'}) returns the total revenue alongside the average transaction count in a single operation.