Tags: python, performance, archlinux, f-string, python-decimal

Is there an efficient way to format Decimal?


Format strings in Python are nice, easy and fast to use, so I never considered the performance cost of this operation. Some time ago I switched my program from the float data type to Decimal to eliminate rounding errors. Some performance degradation was expected, and that is OK. But I'm surprised by how large the penalty is just from printing a log of formatted Decimal numbers.

Below I illustrate the difference with cProfile results. The question is: are there any efficient ways to format Decimal numbers in Python?

Here is a test for float-number formatting:

from cProfile import Profile

a_float = 1234567890.12345

def format_float(value: float) -> str:
    if value is None:
        return ''
    result = f"{value:+,.6f}"
    return result

def test_float():
    for i in range(1000000):
        b = format_float(a_float)

p = Profile()
p.runcall(test_float)
p.print_stats()

It gives the result: 1000002 function calls in 0.839 seconds

And here is the same test with Decimal:

from decimal import Decimal
from cProfile import Profile

a_decimal = Decimal('1234567890.12345')

def format_decimal(value: Decimal) -> str:
    if value is None:
        return ''
    result = f"{value:+,.6f}"
    return result

def test_decimal():
    for i in range(1000000):
        b = format_decimal(a_decimal)

p = Profile()
p.runcall(test_decimal)
p.print_stats()

which results in: 55000002 function calls in 27.739 seconds
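
As a cross-check: the Python documentation notes that the profilers are not meant for benchmarking, because they add overhead to Python code but not to C-level functions. A minimal timeit sketch for comparing the two cases without that overhead could look like the following. The names format_value and time_format are hypothetical helpers, and the loop count of 100,000 is an arbitrary choice.

```python
import timeit
from decimal import Decimal

a_float = 1234567890.12345
a_decimal = Decimal('1234567890.12345')

def format_value(value) -> str:
    """Apply the same format spec as the tests above (float or Decimal)."""
    if value is None:
        return ''
    return f"{value:+,.6f}"

def time_format(value, number: int = 100_000) -> float:
    """Return the total seconds for `number` calls, measured without a profiler."""
    return timeit.timeit(lambda: format_value(value), number=number)

if __name__ == "__main__":
    print("float  :", time_format(a_float))
    print("Decimal:", time_format(a_decimal))
```

The absolute numbers will differ from the cProfile figures; only the ratio between the two lines is of interest here.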

Is there any way to format Decimal nicely in shorter time?

The discussion in the comments raised a good point: others don't have this problem. So I think the profiler output is of some interest here.

For the float case it is pretty short and ends with {method 'disable' of '_lsprof.Profiler' objects}, which I think suggests some optimization:

>>> p.print_stats()
         1000002 function calls in 0.735 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1000000    0.564    0.000    0.564    0.000 <stdin>:1(format_float)
        1    0.171    0.171    0.735    0.735 <stdin>:1(test_float)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

while with Decimal there are a lot of lines:

>>> p.print_stats()
         47000029 function calls in 23.731 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1000000    0.625    0.000   23.395    0.000 <stdin>:1(format_decimal)
        1    0.335    0.335   23.731   23.731 <stdin>:1(test_decimal)
  1000000    0.947    0.000    1.795    0.000 _pydecimal.py:2622(_rescale)
  1000000    3.218    0.000   22.771    0.000 _pydecimal.py:3758(__format__)
  1000000    0.533    0.000    0.665    0.000 _pydecimal.py:3844(_dec_from_triple)
        1    0.000    0.000    0.000    0.000 _pydecimal.py:3902(__init__)
        5    0.000    0.000    0.000    0.000 _pydecimal.py:3938(_set_integer_check)
        2    0.000    0.000    0.000    0.000 _pydecimal.py:3952(_set_signal_dict)
        9    0.000    0.000    0.000    0.000 _pydecimal.py:3963(__setattr__)
  1000000    0.294    0.000    0.405    0.000 _pydecimal.py:448(getcontext)
  1000000    1.974    0.000    3.767    0.000 _pydecimal.py:6188(_parse_format_specifier)
  1000000    0.870    0.000    0.988    0.000 _pydecimal.py:6268(_format_align)
  1000000    1.734    0.000    1.822    0.000 _pydecimal.py:6295(_group_lengths)
  1000000    6.271    0.000   10.549    0.000 _pydecimal.py:6318(_insert_thousands_sep)
  1000000    0.236    0.000    0.236    0.000 _pydecimal.py:6355(_format_sign)
  1000000    1.390    0.000   13.162    0.000 _pydecimal.py:6365(_format_number)
  3000000    0.466    0.000    0.466    0.000 _pydecimal.py:820(__bool__)
  1000000    0.132    0.000    0.132    0.000 {built-in method __new__ of type object at 0x7e4f98d56d40}
        7    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
 16000000    0.979    0.000    0.979    0.000 {built-in method builtins.len}
  4000000    0.732    0.000    0.732    0.000 {built-in method builtins.max}
  4000000    0.524    0.000    0.524    0.000 {built-in method builtins.min}
        1    0.000    0.000    0.000    0.000 {built-in method fromkeys}
  4000000    0.277    0.000    0.277    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'copy' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  1000000    0.112    0.000    0.112    0.000 {method 'get' of '_contextvars.ContextVar' objects}
  1000000    0.613    0.000    0.613    0.000 {method 'groupdict' of 're.Match' objects}
  1000000    0.289    0.000    0.289    0.000 {method 'join' of 'str' objects}
  1000000    1.180    0.000    1.180    0.000 {method 'match' of 're.Pattern' objects}
        1    0.000    0.000    0.000    0.000 {method 'set' of '_contextvars.ContextVar' objects}

In the fast test run by Suramuthu R, I see that his Decimal test is similar to my float test and ends with the line:

{method 'disable' of '_lsprof.Profiler' objects}

It looks like some optimization is switched off for Decimal on my PC. Does anyone have an idea how to check this?


Solution

  • This happens with the Python build provided by the Arch repositories, because the package is compiled to use several libraries installed on the system (see the Python PKGBUILD), among them libmpdec.

    So the cause is the missing optional dependency mpdecimal: without it, Python falls back to the pure-Python implementation of Decimal, as yuri kilochek correctly pointed out in the comments:

    ... the problem is that your configuration is using a pure python decimal implementation for some reason (you can see _pydecimal.py in the stats) while it should be using a C implementation.

    Installing the mpdecimal package fixes the problem:

    sudo pacman -S mpdecimal
    

    With the C implementation in place, the Decimal test is even faster than the float case: 1000002 function calls in 1.104 seconds on this same system.
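
To check which implementation is active before and after installing the package, something along these lines may help. It is a minimal sketch that assumes CPython, where the decimal module imports the C accelerator _decimal and falls back to _pydecimal if that import fails. The function name decimal_backend is a hypothetical helper.

```python
import decimal

def decimal_backend() -> str:
    """Report which implementation the `decimal` module is using (CPython assumed)."""
    try:
        import _decimal  # C accelerator, linked against libmpdec
    except ImportError:
        # The C module is missing or could not be loaded: pure-Python fallback.
        return "_pydecimal"
    # `decimal` re-exports the C class when the accelerator was imported.
    return "_decimal" if decimal.Decimal is _decimal.Decimal else "_pydecimal"

print(decimal_backend())
```

If this prints _pydecimal, the profile should show _pydecimal.py entries like those in the question.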