In today’s recipe I’d like to walk through different methods for replacing values in a pandas Series. We’ll make extensive use of the Series replace method, which is very useful for quickly manipulating data.
Replacing data in a pandas Series
We’ll touch on several cases:
- Change Series values
- Replacing values by index
- Changing null/NaN occurrences
- Changing multiple occurrences of the same value
- Changing multiple values
- Using regex to replace partial values
- Replace by conditions
Creating the dataset
We’ll first import pandas and NumPy and create our sample series.
# Python 3
import numpy as np
import pandas as pd
names_list = ['John', 'Dorothy', np.nan, 'Eva', 'Harry', 'Liam']
names = pd.Series(names_list)
names.head()
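Here are the first five rows of our series (head() returns five rows by default, so the sixth name, Liam, is not shown):
0       John
1    Dorothy
2        NaN
3        Eva
4      Harry
dtype: object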
Replacing series data by value
We’ll do a simple replace:
names.replace(to_replace='John', value='Paul')
0       Paul
1    Dorothy
2        NaN
3        Eva
4      Harry
5       Liam
dtype: object
Change series values by index
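# look up the value stored at index 0, then replace every occurrence of that value
names.replace(to_replace=names[0], value='Paul')
Important note: replace returns a new Series and leaves the original untouched. Use the inplace=True parameter (all lowercase) to persist your modifications to the series: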
# un-comment to persist your changes
#names.replace(to_replace='John', value= 'Paul', inplace= True)
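The replace call above still matches by value: it reads the value at index 0 and then changes every row holding that value. To change one specific row regardless of what it contains, plain assignment through .iloc (by position) or .loc (by label) is one option. This is a minimal sketch; the names by_position and by_label, and the replacement 'Eve', are only illustrative.

```python
import numpy as np
import pandas as pd

names = pd.Series(['John', 'Dorothy', np.nan, 'Eva', 'Harry', 'Liam'])

# Position-based: change whatever sits in the first row.
by_position = names.copy()
by_position.iloc[0] = 'Paul'

# Label-based: change the row whose index label is 3.
by_label = names.copy()
by_label.loc[3] = 'Eve'

print(by_position.tolist())
print(by_label.tolist())
```

Working on copies keeps the original names series intact, much like calling replace without inplace=True.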
Modify nan values
# nan values
names.replace(to_replace=np.nan, value = 'new')
0       John
1    Dorothy
2        new
3        Eva
4      Harry
5       Liam
dtype: object
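For missing values specifically, fillna is a commonly used alternative to replace. The sketch below rebuilds the same sample series and fills the gap with the same placeholder string; the variable name filled is only illustrative.

```python
import numpy as np
import pandas as pd

names = pd.Series(['John', 'Dorothy', np.nan, 'Eva', 'Harry', 'Liam'])

# fillna only touches missing entries and returns a new Series.
filled = names.fillna('new')

print(filled.tolist())
# The original still contains its missing value.
print(names.isna().sum())
```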
Dealing with multiple values
Let’s now create a slightly longer Series containing duplicated values, then replace every occurrence of a value in one call.
#multiple values
dup_names_list = ['John', 'Dorothy', np.nan, 'Berry', 'Harry', 'John','Derryll', 'John']
dup_names = pd.Series(dup_names_list)
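# using forward fill: each 'John' takes the value of the previous row
# (rows 5 and 7 change; row 0 has no previous row, so it stays 'John')
# note: the method keyword of replace is deprecated as of pandas 2.1
dup_names.replace(to_replace='John', method='ffill')
0       John
1    Dorothy
2        NaN
3      Berry
4      Harry
5      Harry
6    Derryll
7    Derryll
dtype: object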
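To change several different values in one call, replace also accepts a list or a dict. The sketch below shows both forms, plus one possible way to get the forward-fill result without the method keyword by combining mask, ffill and where. The replacement 'Henry' and the variable names are illustrative assumptions, and the forward-fill variant is an approximation checked only against this sample data.

```python
import numpy as np
import pandas as pd

dup_names = pd.Series(
    ['John', 'Dorothy', np.nan, 'Berry', 'Harry', 'John', 'Derryll', 'John']
)

# Several values mapped to one replacement: pass a list to to_replace.
to_paul = dup_names.replace(to_replace=['John', 'Berry'], value='Paul')

# A different replacement per value: pass a dict of {old: new}.
mapped = dup_names.replace({'John': 'Paul', 'Harry': 'Henry'})

# Forward fill without the `method` keyword:
# 1. blank out the matches, 2. forward-fill, 3. fall back to the original
# value where nothing precedes a match, 4. copy the result back only
# into the rows that held 'John'.
is_john = dup_names == 'John'
filled = dup_names.mask(is_john).ffill().fillna(dup_names)
ffilled = dup_names.where(~is_john, filled)

print(to_paul.tolist())
print(mapped.tolist())
print(ffilled.tolist())
```

The final where step matters: without it, the existing NaN in row 2 would also be forward-filled, which the replace call above does not do.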
Using regex
In this quick snippet we’ll make a partial replacement within Series values. Say that we want to change all occurrences of the string ‘rry’ to ‘ddy’.
Note: By default, replace only matches whole values, so a partial (substring) change has no effect unless we pass regex=True.
# regex
dup_names.replace(to_replace='rry', value='ddy', regex=True)
0       John
1    Dorothy
2        NaN
3      Beddy
4      Haddy
5       John
6    Deddyll
7       John
dtype: object
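Because the pattern is a regular expression, it can be narrowed with the usual regex syntax. The sketch below anchors the match to the end of the string, so ‘Derryll’ is left alone. It also shows the string accessor str.replace as a possible alternative when only a literal substring is needed. The variable names anchored and literal are illustrative.

```python
import numpy as np
import pandas as pd

dup_names = pd.Series(
    ['John', 'Dorothy', np.nan, 'Berry', 'Harry', 'John', 'Derryll', 'John']
)

# '$' anchors the match to the end of each string.
anchored = dup_names.replace(to_replace=r'rry$', value='ddy', regex=True)

# Literal substring replacement through the .str accessor; NaN stays NaN.
literal = dup_names.str.replace('rry', 'ddy', regex=False)

print(anchored.tolist())
print(literal.tolist())
```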
Replace values by condition
The last topic in this recipe is modifying Series values according to a specific condition. In this case we’ll use a boolean condition to replace all occurrences of the strings John and Berry with Paul. Note that assigning through .loc changes dup_names itself.
dup_names.loc[(dup_names == 'John') | (dup_names == 'Berry')] = 'Paul'
dup_names
0       Paul
1    Dorothy
2        NaN
3       Paul
4      Harry
5       Paul
6    Derryll
7       Paul
dtype: object
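The same conditional replacement can also be written so that it leaves the original series untouched. One option is mask combined with isin, sketched here on a fresh copy of the sample data; the names targets and renamed are illustrative.

```python
import numpy as np
import pandas as pd

dup_names = pd.Series(
    ['John', 'Dorothy', np.nan, 'Berry', 'Harry', 'John', 'Derryll', 'John']
)

# isin builds the boolean condition; mask replaces values where it is True.
targets = ['John', 'Berry']
renamed = dup_names.mask(dup_names.isin(targets), 'Paul')

print(renamed.tolist())
# dup_names itself is unchanged.
print(dup_names.tolist())
```

With more than two or three target values, isin tends to read more easily than a chain of == comparisons joined with |.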
Questions or feedback? Kindly leave us a comment.