How to sort a Python dataframe by column values and index?

As part of your data analysis work you will often encounter the need to sort your data. The Pandas library provides the required capability to sort your dataframes by values or row indexes. Let’s take a look.

Creating your data

We’ll start by creating simple dataframe. Note that in addition to Pandas we’ll import the Numpy library as well as we are using it in order to define empty (NaN) values in our learning dataset.

Start by adding the snipet below to an empty Jupyter Notebook or other Python editor of choice.

# import Pandas and Numpy libraries
import pandas as pd
import numpy as np

data = ({"Year": [ 2018, 2017, 2016, 2019, 2020, 2020] ,
         "Sales": [20000, 30000, 40000, 35000, 16000, np.nan]})

sales = pd.DataFrame (data=data)

Let’s take a quick look into our Sales dataset:

YearSales
0201820000.0
1201730000.0
2201640000.0
3201935000.0
4202016000.0
52020NaN

Sort ascending

We’ll start by showing how to simply order the dataframe in ascending order. This is accomplish by the very handy sort_values dataframe method.

sales.sort_values(by="Sales", ascending=True)
YearSales
4202016000.0
0201820000.0
1201730000.0
3201935000.0
2201640000.0
52020NaN

Sort descending column (reverse)

Reversing the dataset order is also pretty simple.

sales.sort_values(by="Sales", ascending=False)
YearSales
2201640000.0
3201935000.0
1201730000.0
0201820000.0
4202016000.0
52020NaN

Sort by multiple columns

Chances are that you might want to use multiple columns to define your sort criteria. If so, simply pass a list of the relevant columns to the by parameter in the sort_value method. Note that we introduced the ignore_index switch to reset the index column after the datae is sorted.

sales.sort_values(by=["Year", "Sales"], ascending=True,ignore_index=True)
YearSales
0201640000.0
1201730000.0
2201820000.0
3201935000.0
4202016000.0
52020NaN

Placing empty values first

As you can see above, empty values (marked by NaN are by default placed in the end of the sorted dataset. Pandas provide us the ability to place the NaN values at the beginning of the ordered dataframe.

sales.sort_values(by="Sales", ascending=True,ignore_index=True, na_position="first")

Sort by columns index / index

Arranging the dataset by index is accomplished with the sort_index dataframe method.

sales.sort_index()

Saving you changes

You are able to persist your changes to your dataframe by using the inplace=True parameter in the sort_value method as shown below.

sales.sort_values(by=”Sales”, ascending=True,ignore_index=True, inplace= True)

Leave a Comment