This quick tutorial shows how to write a pandas DataFrame column to a Python list.
Consider the following code snippet:
# import the pandas package
import pandas as pd
# initialize the DataFrame
data = pd.DataFrame({ "manager": ["Debbie", "Daisy", "Dorothy", "Tim"] * 2,
"target": [32000, 45000, 18000, 20000] * 2})
data.head()
Here are the first five rows of our DataFrame:
| | manager | target |
|---|---|---|
| 0 | Debbie | 32000 |
| 1 | Daisy | 45000 |
| 2 | Dorothy | 18000 |
| 3 | Tim | 20000 |
| 4 | Debbie | 32000 |
Using tolist() to write a column to a list
Here’s the Python code you’ll need to export column values to a Python list:
# DataFrame column values to a Python list (.values gives the underlying NumPy array)
data['manager'].values.tolist()
Here’s the list object we’ll get:
['Debbie', 'Daisy', 'Dorothy', 'Tim', 'Debbie', 'Daisy', 'Dorothy', 'Tim']
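As a small aside, tolist() can also be called on the column itself, without going through .values. Here's a minimal sketch using the same sample data. The variable names (managers, targets, same_result, target_types) are only for illustration:

```python
import pandas as pd

data = pd.DataFrame({
    "manager": ["Debbie", "Daisy", "Dorothy", "Tim"] * 2,
    "target": [32000, 45000, 18000, 20000] * 2,
})

# Call tolist() on the column (a pandas Series) directly
managers = data["manager"].tolist()
targets = data["target"].tolist()

# For this data, both spellings produce the same list
same_result = managers == data["manager"].values.tolist()

# Integer values come back as plain Python ints, not NumPy scalars
target_types = {type(t) for t in targets}
```

For this sample data, either spelling works. The shorter form simply saves a step.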
Pandas column index to list
data['manager'].index.tolist()
[0, 1, 2, 3, 4, 5, 6, 7]
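With the default index, the result is just the row numbers. The index becomes more interesting once it holds labels. Here's a rough sketch, assuming we want one target per manager and use the manager names as the index. The name targets_by_manager is hypothetical:

```python
import pandas as pd

data = pd.DataFrame({
    "manager": ["Debbie", "Daisy", "Dorothy", "Tim"] * 2,
    "target": [32000, 45000, 18000, 20000] * 2,
})

# Keep the first row per manager, then use the manager names as the index
targets_by_manager = (
    data.drop_duplicates(subset="manager").set_index("manager")["target"]
)

# Index labels and values as two separate Python lists
labels = targets_by_manager.index.tolist()
values = targets_by_manager.tolist()

# Pair each label with its value
pairs = list(zip(labels, values))
```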
Unique column values to list
As shown above, we exported the column values to a list, but that list contains duplicates. Here’s how to make sure the list has only unique values:
# Export unique column values only
data['manager'].unique().tolist()
['Debbie', 'Daisy', 'Dorothy', 'Tim']
Unique list values using set
Another way to ensure uniqueness is to convert the list to a set, which discards duplicate values. Note that a set does not preserve the original order of the values.
unique_managers = set(data['manager'].values.tolist())
list(unique_managers)
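Because a set has no guaranteed order, the list built from it may not match the order in the column. If the order matters, here's a minimal sketch of two plain-Python alternatives, using the same sample data. The names ordered_unique and sorted_unique are just illustrative:

```python
import pandas as pd

data = pd.DataFrame({
    "manager": ["Debbie", "Daisy", "Dorothy", "Tim"] * 2,
    "target": [32000, 45000, 18000, 20000] * 2,
})

managers = data["manager"].tolist()

# dict keys are unique and keep insertion order, so this keeps
# each name at the position where it first appeared
ordered_unique = list(dict.fromkeys(managers))

# Sorting the set gives a predictable alphabetical order instead
sorted_unique = sorted(set(managers))
```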
tolist() with condition
What if we just want to write values that satisfy a condition?
# Condition: only managers whose name starts with the letter D
cond = data['manager'].str.startswith('D')
data[cond]['manager'].unique().tolist()
Here’s the result:
['Debbie', 'Daisy', 'Dorothy']
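The same filter can also be written with .loc, which selects the rows and the column in a single step. Here's a short sketch of that variant. The second condition (targets above 30000) is an assumed example that is not part of the original walkthrough:

```python
import pandas as pd

data = pd.DataFrame({
    "manager": ["Debbie", "Daisy", "Dorothy", "Tim"] * 2,
    "target": [32000, 45000, 18000, 20000] * 2,
})

# Boolean mask: True for managers whose name starts with "D"
cond = data["manager"].str.startswith("D")

# Select matching rows and the 'manager' column in one .loc call
d_managers = data.loc[cond, "manager"].unique().tolist()

# The same pattern works with a numeric condition on another column
high_target = data.loc[data["target"] > 30000, "manager"].unique().tolist()
```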
Series values to list
For completeness, here’s a short snippet for writing Series values to a list:
manager_series = pd.Series(["Debbie", "Kim", "Dorothy", "Tim"])
manager_series.to_list()
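You may have noticed that this snippet uses to_list() while the earlier ones use tolist(). On a Series, both spellings are available in recent pandas versions and return the same list. Here's a quick sketch comparing them with the built-in list(). The via_* names are only for illustration:

```python
import pandas as pd

manager_series = pd.Series(["Debbie", "Kim", "Dorothy", "Tim"])

# Two spellings of the same Series method
via_to_list = manager_series.to_list()
via_tolist = manager_series.tolist()

# The built-in list() iterates over the Series values
via_builtin = list(manager_series)
```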