>>> import sys
>>> sys.set_int_max_str_digits(4300)  # Illustrative, this is the default.
>>> _ = int('2' * 5432)
Traceback (most recent call last):
...
ValueError: Exceeds the limit (4300) for integer string conversion: value has 5432 digits.
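Python 3.10.7 introduced this breaking change to int/str type conversion. The same limit shipped in 3.11 and was backported to the 3.7.14, 3.8.14, and 3.9.14 security releases.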
Documentation: Integer string conversion length limitation
Actually I don't understand why
See the GitHub issue "CVE-2020-10735: Prevent DoS by large int<->str conversions" (#95778):
Problem
A Denial Of Service (DoS) issue was identified in CPython because we use binary bignum’s for our int implementation. A huge integer will always consume a near-quadratic amount of CPU time in conversion to or from a base 10 (decimal) string with a large number of digits. No efficient algorithm exists to do otherwise.
It is quite common for Python code implementing network protocols and data serialization to do int(untrusted_string_or_bytes_value) on input to get a numeric value, without having limited the input length or to do log("processing thing id %s", unknowingly_huge_integer) or any similar concept to convert an int to a string without first checking its magnitude. (http, json, xmlrpc, logging, loading large values into integer via linear-time conversions such as hexadecimal stored in yaml, or anything computing larger values based on user controlled inputs… which then wind up attempting to output as decimal later on). All of these can suffer a CPU consuming DoS in the face of untrusted data.
Everyone auditing all existing code for this, adding length guards, and maintaining that practice everywhere is not feasible nor is it what we deem the vast majority of our users want to do.
This issue has been reported to the Python Security Response Team multiple times by a few different people since early 2020, most recently a few weeks ago while I was in the middle of polishing up the PR so it’d be ready before 3.11.0rc2.
Mitigation
After discussion on the Python Security Response Team mailing list the conclusion was that we needed to limit the size of integer to string conversions for non-linear time conversions (anything not a power-of-2 base) by default. And offer the ability to configure or disable this limit.
The Python Steering Council is aware of this change and accepts it as necessary.
Further discussion can be found in the Python Core Developers Discuss thread "Int/str conversions broken in latest Python bugfix releases".
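The mitigation quoted above says the limit applies only to bases that are not a power of 2, and that it can be configured or disabled. A minimal sketch of both points follows. The helper name parse_big_decimal is hypothetical, and the hasattr check is only there so the snippet also runs on interpreters that predate the fix:

```python
import sys

# sys.set_int_max_str_digits exists only on interpreters that include the fix.
HAS_LIMIT = hasattr(sys, "set_int_max_str_digits")


def parse_big_decimal(text, max_digits=0):
    """Parse decimal text with a temporarily changed limit (0 disables it).

    Hypothetical helper for illustration; only use it on trusted input.
    """
    if not HAS_LIMIT:
        return int(text)
    previous = sys.get_int_max_str_digits()
    sys.set_int_max_str_digits(max_digits)
    try:
        return int(text)
    finally:
        # Restore the earlier limit even if int() raised.
        sys.set_int_max_str_digits(previous)


# Decimal parsing beyond the default limit, lifted just for this call.
value = parse_big_decimal("2" * 5432)

# Power-of-2 bases convert in linear time and are not subject to the limit.
hex_value = int("f" * 5432, 16)
hex_text = hex(value)
```

The setting is interpreter-wide, so changing it temporarily like this also affects other threads while the call runs. For a whole process, the limit can instead be set at startup with the PYTHONINTMAXSTRDIGITS environment variable or the -X int_max_str_digits command-line option.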
I found this comment by Steve Dower to be informative:
Our apologies for the lack of transparency in the process here. The issue was first reported to a number of other security teams, and converged in the Python Security Response Team where we agreed that the correct fix was to modify the runtime.
The delay between report and fix is entirely our fault. The security team is made up of volunteers, our availability isn’t always reliable, and there’s nobody “in charge” to coordinate work. We’ve been discussing how to improve our processes. However, we did agree that the potential for exploitation is high enough that we didn’t want to disclose the issue without a fix available and ready for use.
We did work through a number of alternative approaches, implementing many of them. The code doing int(gigabyte_long_untrusted_string) could be anywhere inside a json.load or HTTP header parser, and can run very deep. Parsing libraries are everywhere, and tend to use int indiscriminately (though they usually handle ValueError already). Expecting every library to add a new argument to every int() call would have led to thousands of vulnerabilities being filed, and made it impossible for users to ever trust that their systems could not be DoS’d.
We agree it’s a heavy hammer to do it in the core, but it’s also the only hammer that has a chance of giving users the confidence to keep running Python at the boundary of their apps.
Now, I’m personally inclined to agree that int->str conversions should do something other than raise. I was outvoted because it would break round-tripping, which is a reasonable argument that I accepted. We can still improve this over time and make it more usable. However, in most cases we saw, rendering an excessively long string isn’t desirable either. That should be the opt-in behaviour.
Raising an exception from str may prove to be too much, and could be reconsidered, but we don’t see a feasible way to push out updates to every user of int, so that will surely remain global.
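As the comment notes, parsing code usually handles ValueError already, so on a patched interpreter an oversized integer shows up as one more rejected input. A rough sketch of what that can look like at an application boundary, where load_untrusted_json and MAX_PAYLOAD_CHARS are hypothetical names and the cap is an arbitrary value chosen for illustration:

```python
import json

MAX_PAYLOAD_CHARS = 10_000  # hypothetical cap chosen for illustration


def load_untrusted_json(payload):
    """Return the decoded document, or None if the payload is rejected."""
    if len(payload) > MAX_PAYLOAD_CHARS:
        # Explicit length guard: cheap, and independent of the interpreter.
        return None
    try:
        return json.loads(payload)
    except ValueError:
        # json.JSONDecodeError subclasses ValueError, and so does the
        # "Exceeds the limit" error raised by int() on patched interpreters.
        return None


print(load_untrusted_json('{"id": 7}'))
print(load_untrusted_json("not json"))
```

The explicit length guard is the kind of per-call-site check the issue describes as infeasible to add everywhere. The except clause is what lets the runtime-level limit protect code that has no such guard.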