How to sort a Python dataframe by column values and index?

As part of your data analysis work you will often encounter the need to sort your data. The Pandas library provides the required capability to sort your dataframes by values or row indexes. Let’s take a look.

Creating your data

We’ll start by creating simple dataframe. Note that in addition to Pandas we’ll import the Numpy library as well as we are using it in order to define empty (NaN) values in our learning dataset.

Start by adding the snipet below to an empty Jupyter Notebook or other Python editor of choice.

# import Pandas and Numpy libraries
import pandas as pd
import numpy as np

data = ({"Year": [ 2018, 2017, 2016, 2019, 2020, 2020] ,
         "Sales": [20000, 30000, 40000, 35000, 16000, np.nan]})

sales = pd.DataFrame (data=data)

Let’s take a quick look into our Sales dataset:

	Year	Sales
0	2018	20000.0
1	2017	30000.0
2	2016	40000.0
3	2019	35000.0
4	2020	16000.0
5	2020	NaN

Sort ascending

We’ll start by showing how to simply order the dataframe in ascending order. This is accomplish by the very handy sort_values dataframe method.

sales.sort_values(by="Sales", ascending=True)

	Year	Sales
4	2020	16000.0
0	2018	20000.0
1	2017	30000.0
3	2019	35000.0
2	2016	40000.0
5	2020	NaN

Sort descending column (reverse)

Reversing the dataset order is also pretty simple.

sales.sort_values(by="Sales", ascending=False)

	Year	Sales
2	2016	40000.0
3	2019	35000.0
1	2017	30000.0
0	2018	20000.0
4	2020	16000.0
5	2020	NaN

Sort by multiple columns

Chances are that you might want to use multiple columns to define your sort criteria. If so, simply pass a list of the relevant columns to the by parameter in the sort_value method. Note that we introduced the ignore_index switch to reset the index column after the datae is sorted.

sales.sort_values(by=["Year", "Sales"], ascending=True,ignore_index=True)

	Year	Sales
0	2016	40000.0
1	2017	30000.0
2	2018	20000.0
3	2019	35000.0
4	2020	16000.0
5	2020	NaN

Placing empty values first

As you can see above, empty values (marked by NaN are by default placed in the end of the sorted dataset. Pandas provide us the ability to place the NaN values at the beginning of the ordered dataframe.

sales.sort_values(by="Sales", ascending=True,ignore_index=True, na_position="first")

Sort by columns index / index

Arranging the dataset by index is accomplished with the sort_index dataframe method.

sales.sort_index()

Saving you changes

You are able to persist your changes to your dataframe by using the inplace=True parameter in the sort_value method as shown below.

sales.sort_values(by=”Sales”, ascending=True,ignore_index=True, inplace= True)