In today’s recipe we’ll show how to convert NumPy arrays to pandas Series or DataFrame objects in Python.
We’ll look into the following scenarios:
- Convert a NumPy array to a Series
- Convert an ndarray to a DataFrame
- Convert a NumPy array to a DataFrame with column names
- Insert an array as a row in an existing DataFrame
Data Preparation
Let’s define a simple two-dimensional array that we’ll use as an example throughout this recipe. Copy this Python code and paste it into your favorite development environment or a Jupyter Notebook.
# Python3
import numpy as np
import pandas as pd
np.random.seed(10)
my_array = np.random.randint(1000, 10000, (6,2))
Note: If you receive a ModuleNotFoundError when importing NumPy, look at how to fix it here.
Let’s take a look at the auto-generated dataset:
print(my_array)
[[2289 8293]
 [2344 8291]
 [5829 2520]
 [7400 6648]
 [5452 1239]
 [3443 3102]]
We have created an ndarray; let’s verify that:
type(my_array)
numpy.ndarray
Note: Use the shape attribute (my_array.shape) to find the number of rows and columns of the ndarray.
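As a quick illustration of the note above, here is a minimal sketch that inspects an array's dimensions. The `sample` array is a hypothetical stand-in with the same 6 x 2 layout as my_array, not the recipe's actual data.

```python
import numpy as np

# Hypothetical stand-in for my_array: 6 rows, 2 columns
sample = np.arange(12).reshape(6, 2)

# shape is a tuple of (rows, columns) for a 2-D array
n_rows, n_cols = sample.shape
print(n_rows, n_cols)

# ndim gives the number of dimensions
print(sample.ndim)
```

Unpacking the tuple this way is handy when you later need the row count, for example to check that a new column has the right length.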
Convert NumPy array to pandas Series/Column
The first case we’ll cover is writing one of the ndarray columns to a pandas Series object. That’s easy with the pd.Series constructor.
# 1. NumPy ndarray to pandas Series
actuals = pd.Series(data=my_array[:, 0], dtype='int32')
actuals.head()
As expected, we got a Series:
0    2289
1    2344
2    5829
3    7400
4    5452
dtype: int32
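A Series holds one-dimensional data, which is why the example above slices out a single column. The sketch below shows two related options under that constraint: naming the resulting Series, and flattening a whole 2-D array first. The array `arr` and the name "second" are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
arr = np.array([[10, 20], [30, 40], [50, 60]])

# Slice the second column (1-D) and give the Series a name
second_col = pd.Series(arr[:, 1], name="second")

# Flatten the whole 2-D array into one long 1-D Series, row by row
flat = pd.Series(arr.ravel())

print(second_col)
print(flat)
```

A named Series is convenient because the name becomes the column label if you later combine it into a DataFrame.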
Ndarray to pandas DataFrame
We’ll now write our NumPy ndarray directly to a pandas DataFrame.
Without column names
# 2. Array to DataFrame
revenue = pd.DataFrame(data=my_array)
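When no column names are passed, pandas labels the columns with integers starting at 0. The following sketch, using a hypothetical small array, shows those default labels and one way to assign names afterwards.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
arr = np.array([[1, 2], [3, 4], [5, 6]])

df = pd.DataFrame(arr)

# Default column labels are the integers 0 and 1
default_labels = list(df.columns)
print(default_labels)

# Names can be assigned after the DataFrame is created
df.columns = ["budget", "actual"]
print(df)
```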
With column names
For better readability and easier data analysis, define column headings as shown below:
# 3. NumPy array to pandas DataFrame with column names
revenue = pd.DataFrame(data=my_array, columns=['budget', 'actual'])
revenue.head()
Here are the first five rows of our DataFrame:
| | budget | actual |
---|---|---|
0 | 2289 | 8293 |
1 | 2344 | 8291 |
2 | 5829 | 2520 |
3 | 7400 | 6648 |
4 | 5452 | 1239 |
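An array can also be added as a new column to a DataFrame that already exists. This is a minimal sketch with hypothetical data; the `forecast` and `variance` column names are assumptions made for illustration and are not part of the recipe's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the revenue DataFrame
df = pd.DataFrame(np.array([[100, 110], [200, 190], [300, 320]]),
                  columns=["budget", "actual"])

# Hypothetical new column; its length must equal the number of rows
forecast = np.array([105, 195, 310])
df["forecast"] = forecast

# Columns can also be derived from existing ones
df["variance"] = df["actual"] - df["budget"]
print(df)
```

If the array length does not match the number of rows, pandas raises an error, so checking the array's shape first is a reasonable precaution.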
Add NumPy array to DataFrame row
We’ll now create a simple one-dimensional array and append it to the DataFrame as a new row.
# 4. Insert NumPy array as a row in an existing DataFrame
new_array = np.array([1850, 1950])
revenue.loc[len(revenue)] = new_array
revenue.tail()
Our new array was appended as the last row of the DataFrame.
Note: If required, you can reset the DataFrame index using the reset_index method, which returns a new DataFrame:
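The loc approach adds one row at a time. To add several rows from a 2-D array in one step, one option is pd.concat. The sketch below assumes hypothetical data in `base` and `extra`.

```python
import numpy as np
import pandas as pd

# Hypothetical existing DataFrame
base = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["budget", "actual"])

# Hypothetical 2-D array holding two new rows
extra = np.array([[5, 6], [7, 8]])

# Wrap the array in a DataFrame with matching column names, then concatenate
extra_df = pd.DataFrame(extra, columns=base.columns)
combined = pd.concat([base, extra_df], ignore_index=True)
print(combined)
```

Passing ignore_index=True renumbers the rows from 0, so the combined DataFrame has a continuous index.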
revenue.reset_index(drop=True)
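Resetting the index is mostly useful when rows have been removed or filtered and the labels are no longer consecutive. Here is a small sketch with hypothetical data showing that case; the result has to be assigned to a variable to be kept.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"budget": [1, 2, 3], "actual": [4, 5, 6]})

# Filtering keeps the original labels, leaving a gap in the index
filtered = df[df["budget"] != 2]
print(filtered.index.tolist())

# reset_index returns a new DataFrame; drop=True discards the old labels
renumbered = filtered.reset_index(drop=True)
print(renumbered.index.tolist())
```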