How to insert an empty column into a Pandas DataFrame with Python?

Here’s a question we got:

Is there a simple way to add an empty column to a DataFrame in Pandas, or should I add the new column to my source comma-separated values (CSV) file so that it is created when I import the file to create the DataFrame?

Inserting empty columns into a Pandas DataFrame

In a nutshell, it’s quite easy to add columns (empty or populated) to an existing DataFrame. Let’s see a few simple examples.

We’ll use a very simple test DataFrame. Run this code in your Data Analysis environment:

import pandas as pd
# NumPy is used here to generate a random matrix of integers that we'll
# convert to a DataFrame. If you are working with your own DataFrame,
# you only need it for the np.nan example further down.
import numpy as np
np.random.seed(42)  # fixed seed so the random grades are reproducible

# Let's create the DataFrame: 5 rows, 3 tests, grades between 70 and 99
grades = pd.DataFrame(np.random.randint(70, 100, (5, 3)),
                      columns=["test1", "test2", "test3"])

grades.head()

Here’s the fictional grades DataFrame we just created:

   test1  test2  test3
0     73     90     74
1     87     93     92
2     99     78     77
3     94     94     81
4     80     78     91

Now let us assume that we want to add an empty Average column. Let’s call it Avg. Here’s the code you’ll need to run:

grades["Avg"] = ""

Here’s our DataFrame now, note the empty column that was added:

grades.head()

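We can also populate the new column with the same value across all the rows. In this example we’ll use NaN (missing) values, which are usually a better placeholder than an empty string for a column that will later hold numbers. Note: don’t forget to import NumPy first.

grades["Avg"] = np.nan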
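
To make the difference between the two placeholders concrete, here is a minimal sketch. It uses a small stand-in DataFrame (the name demo and its values are made up for illustration) rather than the grades DataFrame, and shows that only the NaN column is treated as missing data:

```python
import numpy as np
import pandas as pd

# Hypothetical two-row stand-in for the grades DataFrame
demo = pd.DataFrame({"test1": [73, 87], "test2": [90, 93]})

demo["as_text"] = ""      # empty strings: a text column, not counted as missing
demo["as_nan"] = np.nan   # NaN: a float64 column, counted as missing

print(demo.dtypes)        # inspect the resulting column types
print(demo.isna().sum())  # missing values per column
```

If the column is meant to hold numbers later on, the NaN version avoids having to convert a text column back to a numeric one.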

Filling the empty column

For completeness, here’s how to add a column with calculated values to our DataFrame. In this example we’ll calculate the average grade per student.

grades["Avg"] = grades.mean(axis=1)
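
Note that mean(axis=1) averages every column in the row. Once the DataFrame holds columns that are not test scores, it is safer to select the score columns explicitly. The sketch below assumes a small stand-in DataFrame with a hypothetical text column named student; depending on the Pandas version, averaging over such a column either fails or is silently skipped.

```python
import pandas as pd

# Hypothetical stand-in data; "student" is a made-up non-numeric column
grades_demo = pd.DataFrame({
    "student": ["Zayn", "Harry"],
    "test1": [73, 87],
    "test2": [90, 93],
    "test3": [74, 92],
})

# Average only the columns that actually contain test scores
test_cols = ["test1", "test2", "test3"]
grades_demo["Avg"] = grades_demo[test_cols].mean(axis=1)

print(grades_demo)
```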

Add new columns with lambda

Let’s assume that we now want to calculate the difference between the highest and lowest score for each observation. For that we can use an anonymous lambda function and populate a new column with the calculated values as shown below.

grades['max_diff'] = grades.apply(lambda x: (x.max() - x.min()), axis=1)

Note the axis=1 argument above: it means that the maximum and minimum are calculated across the columns of each row, rather than down each column.

We can also combine lambda functions with the apply method in order to create multiple columns at the same time.
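
As a rough sketch of creating several columns in one pass, the lambda below returns a tuple per row, and result_type="expand" asks apply to turn each tuple element into its own column. The DataFrame scores and the column names highest and lowest are made up for this illustration.

```python
import pandas as pd

# Hypothetical stand-in for the three test columns
scores = pd.DataFrame({"test1": [73, 87], "test2": [90, 93], "test3": [74, 92]})

# One tuple per row; "expand" turns the tuple into two columns (named 0 and 1)
extremes = scores.apply(
    lambda row: (row.max(), row.min()), axis=1, result_type="expand"
)

# Give the new columns readable names and attach them to the original data
extremes.columns = ["highest", "lowest"]
scores = scores.join(extremes)

print(scores)
```

For simple aggregations like these, calling scores.max(axis=1) and scores.min(axis=1) directly is typically faster than apply; the lambda form is mainly useful when the per-row logic is more involved.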

Add a new column with default value

Another possibility is to populate all the new column rows with the same value.

grades['college'] = 'Michigan Data College'
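
If you would rather leave the original DataFrame untouched, one alternative is the assign method, which returns a new DataFrame with the extra column. This is a minimal sketch on a made-up DataFrame named base, not part of the grades example above.

```python
import pandas as pd

# Hypothetical stand-in data
base = pd.DataFrame({"test1": [73, 87], "test2": [90, 93]})

# assign() returns a copy with the new column; base itself is not modified
with_college = base.assign(college="Michigan Data College")

print(base.columns.tolist())
print(with_college.columns.tolist())
```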

Insert a column at a specific position

By default, new columns are added at the last position. Let’s now assume that we want to insert the column at a different position in the DataFrame.

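We can use the DataFrame insert method. Note that insert raises a ValueError if a column with the same name already exists, so we first drop the college column that we added in the previous step:

grades = grades.drop(columns='college')
grades.insert(loc=0, column='college', value='Michigan Data College')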

Note that the loc parameter is the zero-based integer position at which the column is inserted. In this case we inserted the college column in the first position (loc=0).

                 college  test1  test2  test3  max_diff
0  Michigan Data College     73     90     74      17.0
1  Michigan Data College     87     93     92       6.0
2  Michigan Data College     99     78     77      22.0
3  Michigan Data College     94     94     81      13.0
4  Michigan Data College     80     78     91      13.0
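
When you know the neighbouring column by name rather than by number, you can look up its position first. The sketch below assumes a small made-up DataFrame and a hypothetical bonus column, and places that column right after test1:

```python
import pandas as pd

# Hypothetical stand-in data
df = pd.DataFrame({"test1": [73, 87], "test2": [90, 93], "test3": [74, 92]})

# get_loc returns the integer position of an existing column label
position = df.columns.get_loc("test1") + 1

# insert() modifies df in place and returns None
df.insert(loc=position, column="bonus", value=0)

print(df.columns.tolist())
```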

Column values from a list

Now let’s assume that we want to populate the new column with values from a list:

students = ['Zayn', 'Harry', 'Liam', 'Louis', 'Niall']
grades['students'] = students
grades.head()

                 college  test1  test2  test3  max_diff students
0  Michigan Data College     73     90     74      17.0     Zayn
1  Michigan Data College     87     93     92       6.0    Harry
2  Michigan Data College     99     78     77      22.0     Liam
3  Michigan Data College     94     94     81      13.0    Louis
4  Michigan Data College     80     78     91      13.0    Niall
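
One thing to watch for: the list must contain exactly one value per row. The following sketch, on a made-up three-row DataFrame, shows the error raised for a list that is too short, and how a Series can be used instead when values exist for only some rows (the remaining rows are left as NaN).

```python
import pandas as pd

# Hypothetical stand-in data with three rows
df = pd.DataFrame({"test1": [73, 87, 99]})

mismatch_raised = False
try:
    df["students"] = ["Zayn", "Harry"]  # two names for three rows
except ValueError as err:
    mismatch_raised = True
    print(f"Length mismatch: {err}")

# A Series is aligned on the index, so unmatched rows simply get NaN
df["students"] = pd.Series(["Zayn", "Liam"], index=[0, 2])

print(df)
```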

Removing the new column

You can also delete the empty column with ease. Here’s the Python code for that:

# inplace=True modifies grades directly and returns None; leave it out
# if you would rather get back a new DataFrame without the column
grades.drop("Avg", axis=1, inplace=True)
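
As an alternative sketch without inplace, drop can return a new DataFrame that you assign to a name of your choice. The example uses a made-up DataFrame and also shows the errors="ignore" option, which skips labels that are not present instead of raising an error.

```python
import pandas as pd

# Hypothetical stand-in data
df = pd.DataFrame({"test1": [73, 87], "Avg": [79.0, 90.0]})

# Returns a new DataFrame; df still contains the Avg column
trimmed = df.drop(columns="Avg")

# Dropping a column that is already gone is a no-op with errors="ignore"
trimmed = trimmed.drop(columns="Avg", errors="ignore")

print(df.columns.tolist())
print(trimmed.columns.tolist())
```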

