As part of your data analysis work you will often encounter the need to sort your data. The Pandas library provides the required capability to sort your dataframes by values or row indexes. Let’s take a look.
Creating your data
We’ll start by creating simple dataframe. Note that in addition to Pandas we’ll import the Numpy library as well as we are using it in order to define empty (NaN) values in our learning dataset.
Start by adding the snipet below to an empty Jupyter Notebook or other Python editor of choice.
# import Pandas and Numpy libraries
import pandas as pd
import numpy as np
data = ({"Year": [ 2018, 2017, 2016, 2019, 2020, 2020] ,
"Sales": [20000, 30000, 40000, 35000, 16000, np.nan]})
sales = pd.DataFrame (data=data)
Let’s take a quick look into our Sales dataset:
Year | Sales | |
---|---|---|
0 | 2018 | 20000.0 |
1 | 2017 | 30000.0 |
2 | 2016 | 40000.0 |
3 | 2019 | 35000.0 |
4 | 2020 | 16000.0 |
5 | 2020 | NaN |
Sort ascending
We’ll start by showing how to simply order the dataframe in ascending order. This is accomplish by the very handy sort_values dataframe method.
sales.sort_values(by="Sales", ascending=True)
Year | Sales | |
---|---|---|
4 | 2020 | 16000.0 |
0 | 2018 | 20000.0 |
1 | 2017 | 30000.0 |
3 | 2019 | 35000.0 |
2 | 2016 | 40000.0 |
5 | 2020 | NaN |
Sort descending column (reverse)
Reversing the dataset order is also pretty simple.
sales.sort_values(by="Sales", ascending=False)
Year | Sales | |
---|---|---|
2 | 2016 | 40000.0 |
3 | 2019 | 35000.0 |
1 | 2017 | 30000.0 |
0 | 2018 | 20000.0 |
4 | 2020 | 16000.0 |
5 | 2020 | NaN |
Sort by multiple columns
Chances are that you might want to use multiple columns to define your sort criteria. If so, simply pass a list of the relevant columns to the by parameter in the sort_value method. Note that we introduced the ignore_index switch to reset the index column after the datae is sorted.
sales.sort_values(by=["Year", "Sales"], ascending=True,ignore_index=True)
Year | Sales | |
---|---|---|
0 | 2016 | 40000.0 |
1 | 2017 | 30000.0 |
2 | 2018 | 20000.0 |
3 | 2019 | 35000.0 |
4 | 2020 | 16000.0 |
5 | 2020 | NaN |
Placing empty values first
As you can see above, empty values (marked by NaN are by default placed in the end of the sorted dataset. Pandas provide us the ability to place the NaN values at the beginning of the ordered dataframe.
sales.sort_values(by="Sales", ascending=True,ignore_index=True, na_position="first")
Sort by columns index / index
Arranging the dataset by index is accomplished with the sort_index dataframe method.
sales.sort_index()
Saving you changes
You are able to persist your changes to your dataframe by using the inplace=True parameter in the sort_value method as shown below.
sales.sort_values(by=”Sales”, ascending=True,ignore_index=True, inplace= True)