How to copy a Python Pandas dataframe?

Let’s take a closer look at this question and try to answer with an example:

I would like to clone my existing Pandas DataFrame to ensure that when I manipulate the data, the original version stays untouched. Is there a way to accomplish that?

Creating the DataFrame

Let’s create a very simple DataFrame:

# Import pandas and NumPy
import pandas as pd
import numpy as np
np.random.seed(42)

# Create the DataFrame
grades = pd.DataFrame(np.random.random((5, 3)), columns=["test1", "test2", "test3"])
grades.head()

Copying Pandas DataFrames

Deep DataFrame clone

The Pandas method DataFrame.copy() lets you make either a deep copy (clone) or a shallow copy of your DataFrame. A deep copy, which is the default (deep=True), creates a new DataFrame with its own copy of the data and index. Here’s an example:

Let us clone our grades data.

# Clone the dataset
grades_deep = grades.copy()
grades_deep.head()

We’ll now go ahead and modify the test1 column.

# Let's make a simple modification to one of the columns
grades_deep["test1"] = grades_deep["test1"] / 2

# Let's look at the cloned DataFrame after the change
grades_deep.head()

But when we look at the original DataFrame, we see that no changes were made. The reason, as mentioned above, is that the data and index were created anew for the copy; hence the original is not modified.

# Changes to the deep copied DF won't affect the original dataframe
grades.head()

Note: This works the other way around as well. Changes to the original data also won’t be reflected in the cloned DataFrame.
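To confirm this independence programmatically rather than by comparing printed tables, a minimal sketch like the following can help. It rebuilds the grades DataFrame from above; the names before, original_unchanged and grades_alias are introduced here purely for illustration. It also contrasts copy() with plain assignment, which only binds a second name to the same object and is therefore not a copy at all.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
grades = pd.DataFrame(np.random.random((5, 3)), columns=["test1", "test2", "test3"])

# Keep a snapshot of the original column values (illustrative name: before)
before = grades["test1"].to_numpy().copy()

# Deep copy, then modify only the copy
grades_deep = grades.copy()
grades_deep["test1"] = grades_deep["test1"] / 2

# The original column still matches the snapshot
original_unchanged = np.array_equal(before, grades["test1"].to_numpy())
print(original_unchanged)      # True

# Plain assignment does not copy: both names refer to one object
grades_alias = grades
print(grades_alias is grades)  # True
print(grades_deep is grades)   # False
```

If you only need protection against accidental edits, copy() is the safe default; assignment alone gives you no protection.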

Shallow DataFrame copy

If you would like changes to the copied data to also be reflected in the original DataFrame, you can make a shallow copy. A shallow copy creates a new DataFrame object that references the original’s data and index instead of copying them.

Here’s the Python syntax you should use:

grades_shallow = grades.copy(deep=False)

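Note the copy method parameter deep=False. It keeps the original and copied DataFrames connected, so changes to the shared data are reflected in both objects. Be aware that this holds only when Copy-on-Write is not enabled; with Copy-on-Write (the default starting with pandas 3.0), modifying a shallow copy no longer changes the original.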
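Because the behavior of shallow copies depends on your pandas version and on whether Copy-on-Write is enabled, it is worth checking what your own environment does. The sketch below rebuilds the grades DataFrame from above, writes one value through the shallow copy, and reports whether the original saw the change. The variable name propagated is illustrative, and no fixed output is shown because the result differs between setups.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
grades = pd.DataFrame(np.random.random((5, 3)), columns=["test1", "test2", "test3"])

# Shallow copy: a new DataFrame object that initially shares the data
grades_shallow = grades.copy(deep=False)

# Overwrite a single value in place through the shallow copy
grades_shallow.loc[0, "test1"] = 0.0

# Check whether the write reached the original DataFrame.
# Expect True without Copy-on-Write and False with it.
propagated = bool(grades.loc[0, "test1"] == 0.0)
print("pandas", pd.__version__, "- change visible in original:", propagated)
```

If the script reports False, your installation is using Copy-on-Write, and you should modify the original DataFrame directly when you want both names to see the change.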
