pythonalgorithmbenchmarking

Why is my Python implementation of selection sort seemingly so fast?


I'm starting to study algorithms and their efficiency. I started by selection sort. For the sake of interest, I wanted to compare the running time of the same algorithm implementation in python and c++. I got an unexpected result for me.

All the tests were performed on the same dataset.

the benchmark for python:

------------------------------ benchmark: 1 tests -----------------------------
Name (time in ns)                     Min      Max     Mean  StdDev   Median     
-------------------------------------------------------------------------------
test_selection_sort_benchmark     55.5665  63.8664  57.6280  1.7687  56.9664 
-------------------------------------------------------------------------------

the benchmark for cpp:

Run on (10 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x10)
Load Average: 1.68, 1.72, 1.74
---------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations
---------------------------------------------------------------------
test_selection_sort_mean    118014759 ns    117982272 ns           30
test_selection_sort_median  117923875 ns    117889667 ns           30
test_selection_sort_stddev     266736 ns       258257 ns           30
test_selection_sort_cv           0.23 %          0.22 %            30

both have results in nanoseconds.

If I understood the results correctly, then the python implementation worked many times faster than in C++. I need help to figure out how this happened.

I published the source code in the repository https://github.com/Gwinkamp/sort-list-benchmark

python selection sort implementation:

def find_smallest_element(arr: list) -> int:
    smallest_item = arr[0]
    smallest_idx = 0

    for idx in range(1, len(arr)):
        item = arr[idx]
        if item < smallest_item:
            smallest_item = item
            smallest_idx = idx

    return smallest_idx


def selection_sort(arr: list) -> list:
    sorted_arr = []

    while len(arr) != 0:
        smallest_idx = find_smallest_element(arr)
        sorted_arr.append(arr.pop(smallest_idx))

    return sorted_arr

cpp selection sort implementation:

#include <vector>

int find_smallest_element(std::vector<int> &arr)
{
  int smallest_element = arr[0];
  size_t smallest_idx = 0;

  for (size_t i = 1; i < arr.size(); ++i)
  {
    int item = arr[i];
    if (item < smallest_element)
    {
      smallest_element = item;
      smallest_idx = i;
    }
  }

  return smallest_idx;
}

std::vector<int> selection_sort(std::vector<int> arr)
{
  if (arr.empty())
  {
    return arr;
  }

  std::vector<int> sorted_arr;
  sorted_arr.reserve(arr.size());

  while (!arr.empty())
  {
    int smallest_idx = find_smallest_element(arr);
    sorted_arr.push_back(arr[smallest_idx]);
    arr.erase(arr.begin() + smallest_idx);
  }

  return sorted_arr;
}

To create a benchmark in python, I used pytest-benchmark.
To create a benchmark in cpp, I used google benchmark

python benchmark test:

from pytest_benchmark.fixture import BenchmarkFixture
from selection_sort import selection_sort

# data for tests
UNSORTED_LIST = [3, 5, 7, 4, 1, 9]

def test_selection_sort_benchmark(benchmark: BenchmarkFixture) -> None:
    benchmark.pedantic(
        target=selection_sort,
        args=(UNSORTED_LIST,),
        rounds=100,
        warmup_rounds=5,
        iterations=30,
    )

cpp benchmark test:

#include <benchmark/benchmark.h>
#include <vector>

std::vector<int> selection_sort(std::vector<int> arr);

// data for tests
std::vector<int> unsorted_arr = {3, 5, 7, 4, 1, 9};

void test_selection_sort(benchmark::State &state)
{
  for (auto _ : state)
  {
    selection_sort(unsorted_arr);
  }
}

BENCHMARK(test_selection_sort);

BENCHMARK_MAIN();

The test data in the examples has been heavily truncated. I have attached the benchmark results for an array of 10,000 unsorted elements.

Is it possible that the benchmark tools work very differently?

For c++, I also tried measuring the execution time of a function using the chrono library. The results were almost the same as when using google benchmark


Solution

  • I really made a gross mistake in the python implementation by not first copying the source list, which I am modifying. After adding the copy, the python implementation started to run noticeably slower. Many times slower than the c++ implementation, as expected.

    corrected python implementation:

    from typing import List
    from copy import copy
    
    
    def find_smallest_element(arr: list) -> int:
        smallest_item = arr[0]
        smallest_idx = 0
    
        for idx in range(1, len(arr)):
            item = arr[idx]
            if item < smallest_item:
                smallest_item = item
                smallest_idx = idx
    
        return smallest_idx
    
    
    def selection_sort(arr: list) -> list:
        sorted_arr = []
        work_arr = copy(arr)  # <-- !!!
    
        while len(work_arr) != 0:
            smallest_idx = find_smallest_element(work_arr)
            sorted_arr.append(work_arr.pop(smallest_idx))
    
        return sorted_arr
    

    Thanks, jérôme-richard

    updated benchmark result for python:

    ------------------------------- benchmark: 1 tests ----------------------------------
    Name (time in s)                     Min     Max    Mean  StdDev  Median     IQR  
    -------------------------------------------------------------------------------------
    test_selection_sort_benchmark     1.1131  1.1131  1.1131  0.0000  1.1131  0.0000       
    -------------------------------------------------------------------------------------