[SOLVED] Linux shared library loading and sharing the code with other processes

Suppose I have a shared library a.so which is loaded for the first time by my executable. My understanding is that the library's text sections are mapped somewhere in the middle of the process's virtual address space. I have two questions:

(1) Does ld.so load the pages of the library's text section into physical memory and then map them into the virtual address space of that process?

(2) Suppose a second executable that uses the same shared library a.so is started. Does ld.so detect that this library is already loaded in physical memory? How does it know?

Solution

To be precise, it's not ld.so's job to reserve physical memory or to manage or choose the mapping between virtual and physical memory; that is the kernel's job. When ld.so loads a shared library, it does so through the mmap syscall, and the kernel allocates the needed physical memory⁽¹⁾ and creates a mapping in the process's virtual address space that is backed by the library file. What mmap returns is the virtual base address of the mapped library, which the dynamic loader then uses as the base for resolving calls to that library's functions.
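
To make the open-then-mmap step concrete, here is a minimal sketch, assuming Linux with glibc. The helper name map_file_readonly is made up for illustration and is not part of ld.so. The real loader parses the ELF program headers and maps each loadable segment with its own permissions (the text segment as read + execute); this sketch just maps the whole file read-only so the bytes can be inspected.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not ld.so code): map a whole file read-only and
 * privately. Returns the virtual base address chosen by the kernel, or
 * NULL on failure; the mapped length is stored in *len_out. */
void *map_file_readonly(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    /* NULL as the first argument lets the kernel pick the address. */
    void *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return base;
}
```

Calling it on an ELF file, for example /proc/self/exe, should give a pointer whose first four bytes are 0x7f 'E' 'L' 'F'. The caller releases the mapping with munmap(base, len).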

Is ld.so going to identify that this shared library is already loaded to the physical memory? How does it work to understand that?

It is the kernel, not ld.so, that detects this. The full process is complicated, but in short: the kernel caches file contents in its page cache and keeps track of which pages of which file are already in memory. When a request arrives to map a file that is already mapped, it can reuse those pages and avoid allocating new physical memory where possible.

If the same file (i.e. a file with the same path, or more precisely the same inode) is mapped multiple times, the kernel will look at the existing mappings, and if possible it will reuse the same physical pages to avoid wasting memory. So ideally, if a shared library is loaded multiple times, it could be physically allocated only once.
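
The kernel's own bookkeeping is not visible from user space, but the file identity mentioned above (device plus inode number, rather than the path string) can be checked with stat. A small sketch; same_file is a hypothetical helper name, and this only illustrates the notion of "same file", not how the kernel implements the lookup.

```c
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if both paths resolve to the same file
 * (same device and same inode number), 0 if they resolve to different
 * files, and -1 if either stat() call fails. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) < 0 || stat(b, &sb) < 0)
        return -1;

    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Two different paths to one library, for instance a symlink and its target, compare equal here, which is why mappings made through either path can end up backed by the same physical pages.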

In practice it's not that simple though. Since memory can also be written to, this "sharing" of physical pages can only occur while the page to be shared is unchanged from the original content of the file (otherwise different processes mapping the same file or library would interfere with each other). This is basically always true for code sections (.text), since they are usually read-only, and for similar sections such as read-only data. It can also hold for writable (RW) sections as long as they are not modified⁽²⁾. So in short, the .text sections of already loaded libraries usually occupy physical memory only once.
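
The "no interference" guarantee can be seen with a private, writable file mapping. The sketch below assumes a writable /tmp and uses a scratch file instead of a real library; the function name is made up for illustration. It writes through a MAP_PRIVATE mapping and then re-reads the file to check that the write stayed local to the mapping.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes through a MAP_PRIVATE mapping of a scratch file, then re-reads the
 * file with pread(). Returns 1 if the file still holds its original first
 * byte while the mapping sees the modified one, 0 if not, and -1 if the
 * setup fails. */
int private_write_leaves_file_untouched(void)
{
    char path[] = "/tmp/cow-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file lives on until fd is closed */

    const char original[] = "original";
    if (write(fd, original, sizeof original) != (ssize_t)sizeof original) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, sizeof original, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    map[0] = 'X'; /* this write goes to a private copy of the page */

    char in_file = 0;
    ssize_t n = pread(fd, &in_file, 1, 0);
    int ok = (n == 1 && in_file == 'o' && map[0] == 'X');

    munmap(map, sizeof original);
    close(fd);
    return ok;
}
```

The function should return 1: the mapping sees 'X' while the file, and therefore any other process mapping it, still sees 'o'.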

(1) Actually, the kernel creates the mapping first, and allocates physical memory only when the process first reads from or writes to a page through the mapping (demand paging). This avoids wasting memory on pages that are never used.
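
This lazy allocation can be observed with mincore, which reports which pages of a mapping are currently resident. The following is a rough sketch assuming Linux with glibc; both function names are made up for illustration. It uses anonymous memory rather than a library file, because whether a file's pages are already in memory depends on what else on the system has read that file.

```c
#define _DEFAULT_SOURCE 1
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Counts how many pages of [addr, addr + len) mincore() reports as
 * resident. addr must be page-aligned. Returns -1 on error. */
long resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages);
    if (vec == NULL)
        return -1;

    long count = -1;
    if (mincore(addr, len, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < npages; i++)
            count += vec[i] & 1; /* bit 0 set means the page is resident */
    }
    free(vec);
    return count;
}

/* Maps `total` anonymous pages, writes to the first `touched` of them, and
 * stores the resident-page counts taken before and after the writes.
 * Returns 0 on success, -1 on failure. */
int demand_paging_demo(size_t total, size_t touched, long *before, long *after)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = total * page;

    char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    *before = resident_pages(mem, len);
    for (size_t i = 0; i < touched && i < total; i++)
        mem[i * page] = 1; /* first access to each page triggers a fault */
    *after = resident_pages(mem, len);

    munmap(mem, len);
    return (*before < 0 || *after < 0) ? -1 : 0;
}
```

On a typical Linux system one would expect demand_paging_demo(16, 4, &before, &after) to report no resident pages before the writes and about four afterwards, though the exact numbers depend on the kernel and its configuration.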

(2) This sharing of physical memory relies on copy-on-write: the library is mapped privately, and all processes initially share the same "clean" pages. When a process writes to one of them, the kernel gives that process its own private copy of the page and leaves the shared original untouched.