How to insert an empty column into a Pandas DataFrame with Python?

Here’s a question we got:

Is there a simple way to add an empty column to a DataFrame in Pandas, or should I add the new column to my source comma-separated values (CSV) file so that it is created when I import the file to create the DataFrame?

Inserting empty columns to Pandas DF

In a nutshell, it’s quite easy to add columns (empty or populated) to an existing DataFrame. Let’s see a few simple examples.

We’ll use a very simple test DataFrame. Run this code in your Data Analysis environment:

import pandas as pd
# NumPy is used here to generate a random matrix of integers that we'll convert to a DataFrame.
# If you are working with your own DataFrame, you can skip this part.
import numpy as np
np.random.seed(42)

# Let's create the dataframe
grades = pd.DataFrame(np.random.randint(70, 100, (5, 3)),
                      columns=["test1", "test2", "test3"])

grades.head()

Here’s the fictional grades DataFrame we just created:

[Image: Pandas DataFrame with empty column added]

Now let us assume that we want to add an empty Average column. Let’s call it Avg. Here’s the code you’ll need to run:

grades["Avg"] = ""

Here’s our DataFrame now. Note the empty column that was added:

grades.head()

We can also populate the new column with the same value across all the rows. In this example we’ll use NaN (missing) values. Note: np.nan comes from NumPy, so don’t forget to import numpy first.

grades["Avg"] = np.nan

[Image: DataFrame after adding multiple empty columns]
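If you need several empty columns at once, one option is the assign method, which returns a new DataFrame with one column per keyword argument. The snippet below is a minimal sketch: the grade values are typed in by hand for reproducibility, and the Rank column name is a hypothetical one chosen for illustration. It also shows that an empty-string column holds text, while a NaN column is numeric.

```python
import numpy as np
import pandas as pd

# Hand-typed example data (an assumption made so the result is reproducible)
grades = pd.DataFrame({
    "test1": [73, 87, 99, 94, 80],
    "test2": [90, 93, 78, 94, 78],
    "test3": [74, 92, 77, 81, 91],
})

# assign() leaves grades untouched and returns a copy with the extra columns.
# "Rank" is a hypothetical column name.
with_empty = grades.assign(Avg=np.nan, Rank=np.nan)

# An empty-string column stores text; a NaN column is stored as float64.
as_text = grades.assign(Avg="")

print(with_empty.dtypes)
print(as_text["Avg"].dtype)
```

A NaN column is usually the better choice if you plan to fill it with numbers later, since it is already numeric.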

Filling the empty column

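For completeness, here’s how to add a column with calculated values to our DataFrame. In this example we calculate the average grade per student from the three test columns.

# Average only the test columns, so that Avg itself (or any text column) is left out
grades["Avg"] = grades[["test1", "test2", "test3"]].mean(axis=1)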

Add new columns with lambda

Now let’s assume that we want to calculate the difference between the highest and lowest score for each row. For that we can use an anonymous lambda function and populate a new column with the calculated values as shown below.

grades['max_diff'] = grades.apply(lambda x: (x.max() - x.min()), axis=1)

Note the axis=1 argument above: it tells apply to pass each row to the function, so the maximum and minimum are computed across the columns of that row.
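The same row-wise difference can also be computed without apply, by subtracting the row minimum from the row maximum directly. The sketch below compares both approaches; the data is typed in by hand and the test_cols list is a helper name introduced here for illustration.

```python
import pandas as pd

# Hand-typed example data (an assumption made so the result is reproducible)
grades = pd.DataFrame({
    "test1": [73, 87, 99, 94, 80],
    "test2": [90, 93, 78, 94, 78],
    "test3": [74, 92, 77, 81, 91],
})
test_cols = ["test1", "test2", "test3"]  # hypothetical helper list

# Row by row with a lambda, as in the example above
diff_apply = grades[test_cols].apply(lambda row: row.max() - row.min(), axis=1)

# Same calculation using the built-in row-wise max and min
diff_vectorized = grades[test_cols].max(axis=1) - grades[test_cols].min(axis=1)

grades["max_diff"] = diff_vectorized
print(grades)
```

On large DataFrames the built-in methods are typically faster than a Python lambda, so they are worth trying first when a suitable one exists.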

We can also combine lambda functions with the apply method in order to create multiple columns at the same time.
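Here is one possible way to do that, shown as a sketch rather than the only approach: the lambda returns a small Series per row, apply collects those into a DataFrame, and join attaches the result to the original. The column names highest and lowest are hypothetical, and the data is typed in by hand.

```python
import pandas as pd

# Hand-typed example data (an assumption made so the result is reproducible)
grades = pd.DataFrame({
    "test1": [73, 87, 99, 94, 80],
    "test2": [90, 93, 78, 94, 78],
    "test3": [74, 92, 77, 81, 91],
})
test_cols = ["test1", "test2", "test3"]  # hypothetical helper list

# Each row yields a Series with two labelled values, so apply returns
# a DataFrame with one column per label ("highest" and "lowest").
stats = grades[test_cols].apply(
    lambda row: pd.Series({"highest": row.max(), "lowest": row.min()}),
    axis=1,
)

# join() matches rows on the index and returns a new DataFrame
grades = grades.join(stats)
print(grades)
```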

Add a new column with default value

Another possibility is to populate all the new column rows with the same value.

grades['college'] = 'Michigan Data College'

Insert a column at a specific position

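By default, new columns are added in the last position. Let’s now assume that we want to insert the column at a different position in the DataFrame.

We can use the DataFrame insert method. Note that insert raises a ValueError if a column with the same name already exists, so if you added the college column in the previous step, drop it first: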

grades.insert(loc=0, column='college',value = 'Michigan Data College')

Note that the loc parameter is the zero-based integer position of the new column. In this case we inserted the college column in the first position (loc=0).

	college	test1	test2	test3	max_diff
0	Michigan Data College	73	90	74	17.0
1	Michigan Data College	87	93	92	6.0
2	Michigan Data College	99	78	77	22.0
3	Michigan Data College	94	94	81	13.0
4	Michigan Data College	80	78	91	13.0
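If you would rather place the new column next to an existing one than count positions by hand, you can look the position up by name. The sketch below assumes hand-typed data and inserts college right after test1; columns.get_loc returns an integer here because the column names are unique.

```python
import pandas as pd

# Hand-typed example data (an assumption made so the result is reproducible)
grades = pd.DataFrame({
    "test1": [73, 87, 99, 94, 80],
    "test2": [90, 93, 78, 94, 78],
    "test3": [74, 92, 77, 81, 91],
})

# Find the integer position of an existing column, then insert right after it
position = grades.columns.get_loc("test1") + 1
grades.insert(loc=position, column="college", value="Michigan Data College")

print(list(grades.columns))
```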

Column values from a list

Now let’s assume that we want to populate the new column with values from a list:

students = ['Zayn', 'Harry', 'Liam', 'Louis', 'Niall']
grades['students'] = students
grades.head()

	college	test1	test2	test3	max_diff	students
0	Michigan Data College	73	90	74	17.0	Zayn
1	Michigan Data College	87	93	92	6.0	Harry
2	Michigan Data College	99	78	77	22.0	Liam
3	Michigan Data College	94	94	81	13.0	Louis
4	Michigan Data College	80	78	91	13.0	Niall
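Two details are worth knowing when filling a column from a sequence. A list is matched to rows by position, so its length has to equal the number of rows. A Pandas Series is matched by index label instead, and rows without a matching label get NaN. The sketch below illustrates both with hand-typed data; the short and bonus column names are hypothetical.

```python
import pandas as pd

# Hand-typed example data (an assumption made so the result is reproducible)
grades = pd.DataFrame({
    "test1": [73, 87, 99, 94, 80],
    "test2": [90, 93, 78, 94, 78],
    "test3": [74, 92, 77, 81, 91],
})

# A list with one value per row works
grades["students"] = ["Zayn", "Harry", "Liam", "Louis", "Niall"]

# A list of the wrong length is rejected with a ValueError
try:
    grades["short"] = ["a", "b"]
except ValueError as err:
    print("Assignment rejected:", err)

# A Series is aligned on the index: only rows 0 and 2 receive a value,
# the remaining rows are filled with NaN.
bonus = pd.Series([5, 3], index=[0, 2])
grades["bonus"] = bonus
print(grades)
```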

Removing the new column

You can also delete the empty column with ease. Here’s the Python code for that:

# Use inplace=True only if you want to modify the DataFrame itself;
# without it, drop returns a new DataFrame and leaves grades unchanged
grades.drop("Avg", axis=1, inplace=True)
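For reference, here is a short sketch of a few other ways to remove a column, assuming hand-typed data with an empty Avg column. The variable names without_avg, still_fine, and avg_values are introduced here for illustration.

```python
import pandas as pd

# Hand-typed example data (an assumption made so the result is reproducible)
grades = pd.DataFrame({
    "test1": [73, 87, 99, 94, 80],
    "test2": [90, 93, 78, 94, 78],
    "test3": [74, 92, 77, 81, 91],
})
grades["Avg"] = ""

# drop() without inplace returns a new DataFrame; grades keeps its Avg column
without_avg = grades.drop(columns=["Avg"])

# errors="ignore" avoids a KeyError when the column is already gone
still_fine = without_avg.drop(columns=["Avg"], errors="ignore")

# pop() removes the column from grades itself and returns its values
avg_values = grades.pop("Avg")

print(list(without_avg.columns))
print(list(grades.columns))
```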
