In today’s quick tutorial we’ll learn how to re-format your column contents so that you can split data located in a DataFrame column into one or more columns.
Most probably you’ll be acquiring your data from an API, database, text or comma separated value file. But in this example we’ll use a simple dataframe that we’ll define manually out of a dictionary.
# Python3
import pandas as pd
targets = pd.DataFrame({ "manager": ["Johns;Tim ", "Mcgregor; Dave", "DeRocca; Leo", "Haze; Jim"] ,
"target": [42000, 85000, 45000, 33000]})
Let’s look at the data:
targets.head()
manager | target | |
---|---|---|
0 | Johns;Tim | 42000 |
1 | Mcgregor; Dave | 85000 |
2 | DeRocca; Leo | 45000 |
3 | Haze; Jim | 33000 |
Split columns in Pandas dataframes
Splitting into multiple columns with str.split()
We would like to split the manager column into two. As you can see the name and family name are separated by a semi-colon.
manager = targets['manager']
targets[['last_name','first_name']] = manager.str.split(";", n=1, expand=True)
targets
Explanation:
- We user str.split() method to first convert the Series to a string.
- The str.split() method receives a couple of parameters:
- First (‘;’) is the split character, in this case a semi-colon.
- Second (N=1) is the number of splits we would like to define for the string. If you would to divide the column text into more than two columns, set your N parameter accordingly.
- Expand=True has to be specified, as otherwise the string will not be divided into different columns.
- Once split the strings are kept in two columns we’ll add to the dataframe: ‘last_name’,’first_name’
Here’s our DataFrame:
manager | target | last_name | first_name | |
---|---|---|---|---|
0 | Johns;Tim | 42000 | Johns | Tim |
1 | Mcgregor; Dave | 85000 | Mcgregor | Dave |
2 | DeRocca; Leo | 45000 | DeRocca | Leo |
3 | Haze; Jim | 33000 | Haze | Ji |
Dividing a DataFrame column by comma
If your separator is a comma, then we just need to adjust the separator parameter of the split method.
targets[['last_name','first_name']] = manager.str.split(",", n=1, expand=True)
Export DataFrame Column values to list
Couple of readers asked me about this topic. We have a detailed post on this topic. Look here.
Drop a redundant Pandas column
if you would like to keep only one of the new columns we just created, you can use the following code:
targets.drop('first_name', axis=1)
More on removing Pandas dataframe columns can be found here.