Here’s a question we got:
Is there a simple way to add an empty column to a DataFrame in Pandas, or should I add the new column to my source comma-separated values (CSV) file so that it is created when I import the file to build the DataFrame?
Inserting empty columns into a Pandas DataFrame
In a nutshell, it’s quite easy to append columns (empty or full) into your existing DF. Let’s see a few simple examples.
We’ll use a very simple test DataFrame. Run this code in your Data Analysis environment:
import pandas as pd
# NumPy is only needed here to generate random integers for the test DataFrame
# (and later for np.nan). Skip the generation step if you already have your own DataFrame.
import numpy as np
np.random.seed(42)
# Let's create the DataFrame
grades = pd.DataFrame(np.random.randint(70, 100, (5, 3)),
                      columns=["test1", "test2", "test3"])
grades.head()
Here’s the fictional grades DataFrame we just created:
Now let us assume that we want to add an empty Average column. Let’s call it Avg. Here’s the code you’ll need to run:
grades["Avg"] = ""
Here’s our DataFrame now, note the empty column that was added:
grades.head()
We can also populate the new column with the same value across all the rows. In this example we’ll use NaN (null) values, which pandas treats as missing data. Note: don’t forget to import NumPy first.
grades["Avg"] = np.nan
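The two kinds of "empty" column behave differently, which the following minimal sketch tries to illustrate. The demo frame and its values are hypothetical stand-ins, not the grades data above.

```python
import numpy as np
import pandas as pd

# Small stand-in frame with hypothetical values (not the article's data).
demo = pd.DataFrame({"test1": [73, 87], "test2": [90, 93]})

demo["blank_str"] = ""      # empty strings: the cells hold text, not missing values
demo["blank_nan"] = np.nan  # NaN: the column is float64 and counts as missing

str_missing = int(demo["blank_str"].isna().sum())  # empty strings are not NaN
nan_missing = int(demo["blank_nan"].isna().sum())  # every NaN cell is missing
nan_dtype = str(demo["blank_nan"].dtype)

print(str_missing, nan_missing, nan_dtype)
```

If the column will later hold numbers, starting from NaN is usually the more convenient choice, because methods such as isna() and fillna() recognize it as missing.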
Filling the empty column
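For completeness, here’s how to append a column with calculated values to our DataFrame. In this example we’ll calculate the average grade per student across the three test columns. Selecting the test columns explicitly keeps any other columns out of the calculation.
grades["Avg"] = grades[["test1", "test2", "test3"]].mean(axis=1)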
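Here is a small sketch of a row-wise mean limited to the score columns, so that a text column in the same frame does not interfere. The scores frame, its values, and the test_cols name are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical frame that mixes numeric scores with a text column.
scores = pd.DataFrame({
    "test1": [70, 80],
    "test2": [80, 90],
    "test3": [90, 100],
    "college": ["A", "B"],
})

# Average only the score columns; axis=1 computes one mean per row.
test_cols = ["test1", "test2", "test3"]
scores["Avg"] = scores[test_cols].mean(axis=1)

avg_values = scores["Avg"].tolist()
print(avg_values)
```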
Add new columns with lambda
Let’s assume that we now want to calculate the difference between the highest and lowest score for each observation. For that we can use an anonymous lambda function and populate a new column with the calculated values, as shown below.
grades['max_diff'] = grades.apply(lambda x: (x.max() - x.min()), axis=1)
Note the axis=1 argument above, which means that the maximum and minimum are calculated across the columns of each row.
We can also combine lambda functions with the apply method to create multiple columns at the same time.
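One possible way to create several columns in a single apply call is to return a list per row and pass result_type="expand". This is a sketch under assumptions: the scores frame and the lowest and highest column names are hypothetical.

```python
import pandas as pd

# Hypothetical scores, one row per student.
scores = pd.DataFrame({"test1": [73, 87], "test2": [90, 93], "test3": [74, 92]})
test_cols = ["test1", "test2", "test3"]

# The lambda returns two values per row; result_type="expand" turns each
# list element into its own column, which we assign to two new names.
scores[["lowest", "highest"]] = scores[test_cols].apply(
    lambda row: [row.min(), row.max()], axis=1, result_type="expand"
)

lowest = scores["lowest"].tolist()
highest = scores["highest"].tolist()
print(scores)
```

For simple reductions like these, calling scores[test_cols].min(axis=1) and max(axis=1) directly is typically faster than apply. The lambda form is mainly useful when the per-row logic is more involved.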
Add a new column with default value
Another possibility is to populate all the new column rows with the same value.
grades['college'] = 'Michigan Data College'
Insert a column at a specific position
By default, new columns are added at the last position. Let’s now assume that we want to insert the column at a different position in the DataFrame.
We can use the DataFrame insert method:
grades.insert(loc=0, column='college', value='Michigan Data College')
Note that the loc parameter is the integer position of the new column. In this case we inserted the college column in the first position (loc=0). If a college column already exists (for example, from the previous section), insert raises a ValueError, so drop the existing column first.
|   | college | test1 | test2 | test3 | max_diff |
|---|---|---|---|---|---|
| 0 | Michigan Data College | 73 | 90 | 74 | 17.0 |
| 1 | Michigan Data College | 87 | 93 | 92 | 6.0 |
| 2 | Michigan Data College | 99 | 78 | 77 | 22.0 |
| 3 | Michigan Data College | 94 | 94 | 81 | 13.0 |
| 4 | Michigan Data College | 80 | 78 | 91 | 13.0 |
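When the target position depends on another column rather than a fixed number, one option is to look the position up with columns.get_loc. The sketch below assumes a hypothetical scores frame and places the new column right after test1.

```python
import pandas as pd

# Hypothetical scores frame used only for this illustration.
scores = pd.DataFrame({"test1": [73, 87], "test2": [90, 93], "test3": [74, 92]})

# get_loc returns the integer position of an existing column label;
# adding 1 places the new column immediately after it.
position = scores.columns.get_loc("test1") + 1
scores.insert(loc=position, column="college", value="Michigan Data College")

column_order = list(scores.columns)
print(column_order)
```

Note that insert modifies the DataFrame in place and returns None, so its result should not be assigned back to the variable.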
Column values from a list
Now let’s assume that we want to populate the new column with values from a list:
students = ['Zayn', 'Harry', 'Liam', 'Louis', 'Niall']
grades['students'] = students
grades.head()
|   | college | test1 | test2 | test3 | max_diff | students |
|---|---|---|---|---|---|---|
| 0 | Michigan Data College | 73 | 90 | 74 | 17.0 | Zayn |
| 1 | Michigan Data College | 87 | 93 | 92 | 6.0 | Harry |
| 2 | Michigan Data College | 99 | 78 | 77 | 22.0 | Liam |
| 3 | Michigan Data College | 94 | 94 | 81 | 13.0 | Louis |
| 4 | Michigan Data College | 80 | 78 | 91 | 13.0 | Niall |
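Assigning a list works only when it has exactly one value per row. The following sketch, built on a hypothetical three-row frame, shows the error raised for a shorter list and how a Series is aligned on the index instead.

```python
import pandas as pd

# Hypothetical three-row frame.
scores = pd.DataFrame({"test1": [73, 87, 99]})

# A list with the wrong number of items is rejected with a ValueError.
try:
    scores["students"] = ["Zayn", "Harry"]
    length_error = False
except ValueError:
    length_error = True

# A Series is matched by index label; rows without a matching label get NaN.
scores["students"] = pd.Series(["Zayn", "Harry"], index=[0, 2])

missing_mask = scores["students"].isna().tolist()
print(scores)
```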
Removing the new column
You can also delete the empty column with ease. Here’s the Python code for that:
# Use inplace=True only if you want to modify the DataFrame itself;
# without it, drop returns a new DataFrame and leaves the original unchanged.
grades.drop("Avg", axis=1, inplace=True)
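As an alternative to inplace=True, the result of drop can be assigned to a new variable. This sketch uses a hypothetical scores frame and also shows the errors="ignore" option, which skips labels that are not present.

```python
import pandas as pd

# Hypothetical frame with an Avg column to remove.
scores = pd.DataFrame({"test1": [73, 87], "Avg": [79.0, 90.5]})

# Without inplace=True, drop returns a new DataFrame.
trimmed = scores.drop(columns="Avg")
original_keeps_avg = "Avg" in scores.columns
trimmed_has_avg = "Avg" in trimmed.columns

# errors="ignore" avoids a KeyError when the column is already gone.
trimmed_again = trimmed.drop(columns="Avg", errors="ignore")
remaining = list(trimmed_again.columns)
print(remaining)
```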