Pandas vs. Polars: The Battle of Performance

If you perform data analysis tasks, chances are you have encountered Pandas. It has been the predominant data analysis library for a long time. Polars, on the other hand, is a relatively new library that boasts high performance and memory efficiency. But which one is better?

Here, you will see a comparison of the performance between Pandas and Polars across a range of common data manipulation tasks.


Measuring Performance: Metrics and Benchmark Dataset

This comparison will take into account the ability of the Pandas and Polars libraries to manipulate the Black Friday Sale dataset from Kaggle. This dataset contains 550,068 rows of data. It includes information about customer demographics, purchase history, and product details.

To ensure fair performance measurements, the comparison will use execution time as a standard performance metric on each task. The platform to run the code for each comparison task will be Google Colab.
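The article does not show how execution time was recorded. One common way to do it, sketched here under the assumption that a simple wall-clock timer is enough, is to run each task several times with `time.perf_counter` and keep the fastest run. The helper name `time_call` is hypothetical and not taken from the original source code.

```python
import time


def time_call(func, repeats=5):
    """Return the best wall-clock time in seconds over several runs of func()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


# Stand-in workload; in the comparison this would be a Pandas or Polars call.
elapsed = time_call(lambda: sum(range(100_000)))
print(f"best of 5 runs: {elapsed:.6f} s")
```

Taking the minimum of several runs reduces the effect of noise from other processes on a shared platform such as Google Colab.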


The full source code that compares the Pandas and Polars libraries is available in a GitHub repository.

Reading Data From a CSV File

This task compares the time it takes for each library to read data from the Black Friday Sale dataset. The dataset is in CSV format. Pandas and Polars offer similar functionality for this task.

Pandas takes twice as long as Polars to read the data in the Black Friday Sale dataset.

A bar chart showing comparison between the time it takes for Pandas vs Polars to read a CSV file

Selecting Columns

This task measures the time it takes for each library to select columns from the dataset. It involves selecting the User_ID and Purchase columns.

Polars takes significantly less time than Pandas to select columns from the dataset.

A bar chart showing comparison between the execution time Pandas vs Polars takes in selecting columns

Filtering Rows

This task compares the performance of each library in filtering the dataset for rows where the Gender column is F.

Polars takes far less time than Pandas to filter the rows.

A bar chart showing comparison between the time it takes Pandas vs Polars to filter rows

Grouping and Aggregating Data

This task involves grouping data by one or more columns and then performing some aggregation functions on the groups. It measures the time it takes for each library to group the data by the Gender column and calculate the average purchase amount for each group.

Again, Polars outperforms Pandas, but the margin is not as large as it is for filtering rows.

Applying Functions to Data

This task involves applying a function to one or more columns. It measures the time it takes for each library to multiply the Purchase column by 2.

You can barely see the Polars bar. Polars once again outperforms Pandas.

Merging Data

This task involves merging two or more DataFrames on one or more columns they have in common. It measures the time it takes for each library to merge the User_ID and Purchase columns from two separate DataFrames.

It takes both libraries some time to complete this task. But Polars takes almost half the time Pandas takes to merge the data.

Why Is Polars Able to Outperform Pandas?

In all the data manipulation tasks above, Polars outperforms Pandas. There are several reasons why Polars may outperform Pandas in execution time.

Expand Your Data Science Skills

There are many Python libraries out there that can help you in data science. Pandas and Polars are just a small fraction. To improve your program’s performance, you should familiarize yourself with more data science libraries. This will help you compare and choose which library best suits your use case.
