Let’s take a closer look at this question and try to answer it with an example:
I would like to clone my existing pandas DataFrame so that the original version stays untouched when I manipulate the data. Is there a way to accomplish that?
Creating the DataFrame
Let’s create a very simple DataFrame:
# Import pandas and NumPy
import pandas as pd
import numpy as np

np.random.seed(42)

# Create the DataFrame
grades = pd.DataFrame(np.random.random((5, 3)), columns=["test1", "test2", "test3"])
grades.head()
Copying Pandas DataFrames
Deep DataFrame clone
The pandas method DataFrame.copy lets you make either a deep or a shallow copy of a DataFrame. A deep copy (the default, deep=True) clones both the data and the index, so a new, independent DataFrame is created. Here’s an example:
Let us clone our grades data.
# Cloning the dataset
grades_deep = grades.copy()
grades_deep.head()
We’ll now go ahead and modify the test1 column.
# Let's make a simple modification to one of the columns
grades_deep["test1"] = grades_deep["test1"] / 2

# Let's look at the cloned DF after the change
grades_deep.head()
But when we look at the original DataFrame, we see that no changes were made. The reason, as mentioned previously, is that the data and index of the clone were created anew; hence the original is not modified.
# Changes to the deep-copied DF won't affect the original DataFrame
grades.head()
Note: This works the other way around as well. Changes to the original data also won’t be reflected in the cloned df.
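The independence described above can also be checked programmatically instead of by eye. The following is a minimal sketch that rebuilds the same grades DataFrame; the helper variable names (original_snapshot, clone_changed, and so on) are only illustrative and not part of pandas.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
grades = pd.DataFrame(np.random.random((5, 3)), columns=["test1", "test2", "test3"])

# Deep copy: deep=True is the default
grades_deep = grades.copy()

# Remember what the original column looked like before touching the clone
original_snapshot = grades["test1"].tolist()

# Modify the clone only
grades_deep["test1"] = grades_deep["test1"] / 2
clone_changed = not grades_deep["test1"].equals(grades["test1"])
original_untouched = grades["test1"].tolist() == original_snapshot

# The other direction: modify the original and check the clone
grades.loc[0, "test2"] = -1.0
clone_unaffected = grades_deep.loc[0, "test2"] != -1.0

# A deep copy owns its own buffer, so the two frames share no memory
shares_memory = np.shares_memory(
    grades["test3"].to_numpy(), grades_deep["test3"].to_numpy()
)

print(clone_changed, original_untouched, clone_unaffected, shares_memory)
```

Using np.shares_memory is one way to confirm that the clone does not merely look different but actually lives in its own memory.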
Shallow DataFrame copy
If you would like changes to the copied data to be reflected in the original DataFrame as well, make a shallow copy instead. A shallow copy does not duplicate the data and index; it keeps a reference to those of the original object.
Here’s the Python syntax you should use:
grades_shallow = grades.copy(deep=False)
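Note the deep=False parameter of the copy method. It keeps the original and the copied DataFrame connected, so changes are reflected in both objects. Be aware that this depends on your pandas setup: with Copy-on-Write enabled (the default from pandas 3.0 on), modifying a shallow copy no longer writes through to the original.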
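Whether an in-place change to a shallow copy reaches the original depends on the pandas version and on whether Copy-on-Write is active, so the sketch below measures the behavior rather than assuming it. It rebuilds the same grades DataFrame; the variable names shares_before and propagated are illustrative only.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
grades = pd.DataFrame(np.random.random((5, 3)), columns=["test1", "test2", "test3"])

# Shallow copy: no data is duplicated at this point
grades_shallow = grades.copy(deep=False)

# Right after the copy, both frames point at the same underlying buffer
shares_before = np.shares_memory(
    grades["test1"].to_numpy(), grades_shallow["test1"].to_numpy()
)

# Change a single cell in place through the shallow copy
grades_shallow.loc[0, "test1"] = 0.0

# Did the change reach the original?
# Classic behavior: yes. With Copy-on-Write enabled: no.
propagated = bool(grades.loc[0, "test1"] == 0.0)

print("shares memory after copy:", shares_before)
print("change visible in original:", propagated)
```

If the second line prints False on your installation, Copy-on-Write is in effect and the shallow copy behaves like a lazily made independent copy.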