
Is Python memory-safe?


With Deno emerging as a rival to Node.js, Rust's memory safety has been mentioned in a lot of news articles. One piece in particular stated that Rust and Go are good for their memory-safe nature, as are Swift and Kotlin, although the latter two are not as widely used for systems programming.

Safe Rust is the true Rust programming language. If all you do is write Safe Rust, you will never have to worry about type-safety or memory-safety. You will never endure a dangling pointer, a use-after-free, or any other kind of Undefined Behavior.

This piqued my interest: can Python be regarded as memory-safe, and either way, how safe or unsafe is it?

At the outset, the Wikipedia article on memory safety does not even mention Python, and the article on Python seems to mention only memory management. The closest I've come to finding an answer was this one by Daniel:

The wikipedia article associates type-safe to memory-safe, meaning, that the same memory area cannot be accessed as e.g. integer and string. In this way Python is type-safe. You cannot change the type of a object implicitly.

But even this only implies a connection between the two aspects (using an association from Wikipedia, which is itself debatable) and gives no definitive answer on whether Python can be regarded as memory-safe.


Solution

  • Wikipedia lists the following examples of memory safety issues:

    Access errors: invalid read/write of a pointer

    • Buffer overflow - out-of-bound writes can corrupt the content of adjacent objects, or internal data (like bookkeeping information for the heap) or return addresses.
    • Buffer over-read - out-of-bound reads can reveal sensitive data or help attackers bypass address space layout randomization.

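    Python protects against these in ordinary code: indexing a built-in sequence out of bounds raises an IndexError rather than reading or writing adjacent memory. (C extensions and modules such as ctypes can still bypass those checks.)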
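    As a minimal sketch of that protection (the helper names read_at and write_at are made up for illustration), an out-of-bounds read or write on a list is rejected with an exception and the list is left untouched:

```python
# Out-of-bounds access on a built-in sequence raises IndexError
# instead of touching whatever happens to sit next to it in memory.
data = [10, 20, 30]


def read_at(seq, index):
    """Return seq[index], or a description of the error that was raised."""
    try:
        return seq[index]
    except IndexError as exc:
        return f"IndexError: {exc}"


def write_at(seq, index, value):
    """Try seq[index] = value and report whether the write succeeded."""
    try:
        seq[index] = value
        return True
    except IndexError:
        return False


in_bounds = read_at(data, 1)         # a valid read
over_read = read_at(data, 3)         # one past the end: rejected
overflow_ok = write_at(data, 3, 99)  # out-of-bounds write: rejected

print(in_bounds, over_read, overflow_ok, data)
```

    This only covers pure Python code operating on built-in containers; code that drops down to C is on its own.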

    • Race condition - concurrent reads/writes to shared memory

    That's actually not that hard to do in languages with mutable data structures. (Advocates of functional programming and immutable data structures often use this fact as an argument in their favor).
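    A rough sketch of how this can look in Python, assuming a hypothetical Counter class shared between threads: the unlocked read-modify-write can lose updates when threads interleave, while the locked version cannot. Whether the unlocked total actually falls short on a given run depends on the interpreter and on timing, so no particular result should be expected from it.

```python
import threading


class Counter:
    """Hypothetical shared counter, used only for this illustration."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Read and write are separate steps; another thread may run
        # between them, and its update is then overwritten.
        current = self.value
        self.value = current + 1

    def safe_increment(self):
        # The lock makes the read-modify-write a single critical section.
        with self._lock:
            self.value += 1


def run(method_name, threads=4, per_thread=10_000):
    """Call the named increment method from several threads; return the total."""
    counter = Counter()

    def work():
        increment = getattr(counter, method_name)
        for _ in range(per_thread):
            increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter.value


expected = 4 * 10_000
unsafe_total = run("unsafe_increment")  # may be lower than expected
safe_total = run("safe_increment")      # always equals expected

print(expected, unsafe_total, safe_total)
```

    Note that a race like this produces a wrong value, not corrupted memory, which is a much milder failure than the same bug in C.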

    • Invalid page fault - accessing a pointer outside the virtual memory space. A null pointer dereference will often cause an exception or program termination in most environments, but can cause corruption in operating system kernels or systems without memory protection, or when use of the null pointer involves a large or negative offset.

    • Use after free - dereferencing a dangling pointer storing the address of an object that has been deleted.

    • Uninitialized variables - a variable that has not been assigned a value is used. It may contain an undesired or, in some languages, a corrupt value.

    • Null pointer dereference - dereferencing an invalid pointer or a pointer to memory that has not been allocated

    • Wild pointers arise when a pointer is used prior to initialization to some known state. They show the same erratic behaviour as dangling pointers, though they are less likely to stay undetected.

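    There's no real way to prevent someone from trying to access a null pointer. In C# and Java, this results in an exception. In C++, this results in undefined behavior. In Python, using None where an object is expected also results in an exception (typically an AttributeError or TypeError).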
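    For comparison, here is a small sketch of what the Python equivalents of a null dereference and an uninitialized variable look like (the function names are made up for illustration). Both fail with a well-defined exception rather than reading garbage:

```python
def dereference_none():
    """Access an attribute on None, Python's closest analogue to null."""
    node = None
    try:
        return node.value
    except AttributeError as exc:
        return type(exc).__name__


def use_before_assignment(flag=False):
    """Read a local variable on a path where it was never assigned."""
    if flag:
        result = "assigned"
    try:
        return result
    except UnboundLocalError as exc:
        return type(exc).__name__


none_outcome = dereference_none()
unbound_outcome = use_before_assignment()

print(none_outcome, unbound_outcome)
```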

    • Memory leak - when memory usage is not tracked or is tracked incorrectly
    • Stack exhaustion - occurs when a program runs out of stack space, typically because of too deep recursion. A guard page typically halts the program, preventing memory corruption, but functions with large stack frames may bypass the page.

    Memory leaks in languages like C#, Java, and Python have different meanings than they do in languages like C and C++ where you manage memory manually. In C or C++, you get a memory leak by failing to deallocate allocated memory. In a language with managed memory, you don't have to explicitly de-allocate memory, but it's still possible to do something quite similar by accidentally maintaining a reference to an object somewhere even after the object is no longer needed.

    This is actually quite easy to do with things like event handlers in C# and long-lived collection classes; I've actually worked on projects where there were memory leaks in spite of the fact that we were using managed memory. In one sense, working with an environment that has managed memory can actually make these issues more dangerous because programmers can have a false sense of security. In my experience, even experienced engineers often fail to do memory profiling or write test cases to check for this (again, likely due to the environment giving them a false sense of security).
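    The same kind of leak can be sketched in Python. In this illustration the Report class, the module-level _cache list, and build_report are all hypothetical; the point is only that a forgotten reference in a long-lived collection keeps the object alive after the caller is done with it. A weak reference is used as a probe because it does not keep the object alive itself.

```python
import gc
import weakref


class Report:
    """Hypothetical object that the caller only needs briefly."""

    def __init__(self, name):
        self.name = name


_cache = []  # hypothetical long-lived collection


def build_report(name):
    report = Report(name)
    _cache.append(report)  # the accidental extra reference
    return report


report = build_report("q1")
probe = weakref.ref(report)  # observes the object without owning it

del report      # the caller is finished with the object
gc.collect()
alive_while_cached = probe() is not None  # still reachable via _cache

_cache.clear()  # drop the forgotten reference
gc.collect()
alive_after_clear = probe() is not None   # now it can be reclaimed

print(alive_while_cached, alive_after_clear)
```

    Probing with weakref like this is also one way to write the kind of test case mentioned above.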

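    Stack exhaustion is quite easy to trigger in Python too (e.g. with infinite recursion), although CPython normally stops the program with a RecursionError once its recursion limit is reached.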
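    A minimal sketch of the infinite-recursion case (recurse_forever is a made-up name): in CPython the interpreter's recursion limit normally turns this into a catchable RecursionError well before the underlying C stack is exhausted.

```python
import sys


def recurse_forever(depth=0):
    """Recursion with no base case."""
    return recurse_forever(depth + 1)


def try_infinite_recursion():
    """Run the runaway recursion and report how it ended."""
    try:
        recurse_forever()
    except RecursionError as exc:
        return type(exc).__name__
    return "no error"


outcome = try_infinite_recursion()
limit = sys.getrecursionlimit()  # the configured limit on nested calls

print(outcome, limit)
```

    Raising the limit with sys.setrecursionlimit removes that safety margin, so a very high limit can still crash the process.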

    • Heap exhaustion - the program tries to allocate more memory than the amount available. In some languages, this condition must be checked for manually after each allocation.

    Still quite possible - I'm rather embarrassed to admit that I've personally done that in C# by loading an enormous file into memory (although not in Python yet).

    • Double free - repeated calls to free may prematurely free a new object at the same address. If the exact address has not been reused, other corruption may occur, especially in allocators that use free lists.
    • Invalid free - passing an invalid address to free can corrupt the heap.
    • Mismatched free - when multiple allocators are in use, attempting to free memory with a deallocation function of a different allocator
    • Unwanted aliasing - when the same memory location is allocated and modified twice for unrelated purposes.

    Unwanted aliasing is actually quite easy to do in Python. Here's an example in Java (full disclosure: I wrote the accepted answer); you could just as easily do something quite similar in Python. The other items in this group (double free, invalid free, and mismatched free) are handled by the Python interpreter itself.
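    Two common ways to hit this in Python, shown as a short sketch (the names grid, independent, and add_tag are made up for illustration): repeating a nested list with the * operator, and using a mutable default argument. In both cases several names end up referring to the same object.

```python
# Case 1: the outer * copies references, so every row is the same list.
grid = [[0] * 3] * 3
grid[0][0] = 1  # intended to change one cell, changes a whole column

# A comprehension builds a fresh inner list for each row instead.
independent = [[0] * 3 for _ in range(3)]
independent[0][0] = 1


# Case 2: the default list is created once, when the function is defined,
# and is then shared by every call that does not pass its own list.
def add_tag(tag, tags=[]):
    tags.append(tag)
    return tags


first = add_tag("a")
second = add_tag("b")  # same list object as first

print(grid)
print(independent)
print(first, second, first is second)
```

    None of this corrupts memory; the program simply does something other than what was intended, which is why this is more of a correctness issue than a safety issue in Python.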

    So, it would seem that memory-safety is relative. Depending on exactly what you consider a "memory-safety issue," it can actually be quite difficult to entirely prevent. High-level languages like Java, C#, and Python can prevent many of the worst of these errors, but there are other issues that are difficult or impossible to completely prevent.