Edited on Oct 23
I would like to understand where the function getpwuid allocates memory from. I have some sample code that prints the username for the user id given to the program.
I read the manual page for getpwuid and it says:
The return value may point to a static area, and may be overwritten by subsequent calls to getpwent(3), getpwnam(), or getpwuid(). (Do not pass the returned pointer to free(3).)
I read that the static area in the memory layout of the process contains the text, initialized data and uninitialized data. But the returned address is not in any of these regions (as far as I can understand from looking at the boundaries of these regions given by etext, edata and end).
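I have the following questions:
- I'm unable to understand who is allocating memory for the username string (and the six other fields in struct passwd). Who is responsible for freeing it?
- How can the compiler possibly know how long the username, password and other fields are going to be, so it can allocate the memory statically?
- Why is pwd at 0x7f2b3c7aba60 and why is pwd->pw_name at 0x5646b174e2a0?
#include <pwd.h>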
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
extern char etext, edata, end;
char* userNameFromId(uid_t uid)
{
struct passwd *pwd;
pwd = getpwuid(uid);
if (pwd == NULL) /* check before dereferencing pwd below */
return NULL;
printf("pwd is located at %10p\n", (void *)pwd);
printf("pw_name is located at %10p\n", (void *)pwd->pw_name);
return pwd->pw_name;
}
int main(int argc, char** argv)
{
uid_t u;
char* endptr = NULL;
char* name;
if(argc!=2){
printf("Usage: %s [user_id]\n", argv[0]);
return -1;
}
u = strtol(argv[1], &endptr, 10);
if(*endptr!='\0') {
printf("%s is not a number\nUsage: %s [user_id]\n", argv[1], argv[0]);
return -1;
}
name = userNameFromId(u);
if(name == NULL) {
printf("No user was found with the given id: %s\n", argv[1]);
return -1;
}
printf("program text ends before %10p\n", &etext);
printf("initialized data ends before %10p\n", &edata);
printf("uninitializd data ends before %10p\n", &end);
printf("name is located at %10p\n", (void *)&name);
printf("program break is located at %10p\n", sbrk(0));
printf("User name for id %ld is %s\n", (long)u, name);
FILE *file = fopen("/proc/self/maps", "r");
if (file == NULL) {
perror("Error opening file");
return -1;
}
char buffer[1024];
while (fgets(buffer, sizeof(buffer), file) != NULL) {
printf("%s", buffer);
}
fclose(file);
return 0;
}
Upon executing this program, it prints something like the following:
pwd is located at 0x7f2b3c7aba60
pw_name is located at 0x5646b174e2a0
program text ends before 0x5646b1272555
initialized data ends before 0x5646b1275010
uninitializd data ends before 0x5646b1275018
name is located at 0x7ffe9895c0a0
program break is located at 0x5646b176f000
User name for id 1000 is rragavendrak
5646b1271000-5646b1272000 r--p 00000000 08:20 7068 /home/rranjithkuma/a.out
5646b1272000-5646b1273000 r-xp 00001000 08:20 7068 /home/rranjithkuma/a.out
5646b1273000-5646b1274000 r--p 00002000 08:20 7068 /home/rranjithkuma/a.out
5646b1274000-5646b1275000 r--p 00002000 08:20 7068 /home/rranjithkuma/a.out
5646b1275000-5646b1276000 rw-p 00003000 08:20 7068 /home/rranjithkuma/a.out
5646b174e000-5646b176f000 rw-p 00000000 00:00 0 [heap]
7f2b3c587000-7f2b3c58a000 rw-p 00000000 00:00 0
7f2b3c58a000-7f2b3c5b2000 r--p 00000000 08:20 2282 /usr/lib/x86_64-linux-gnu/libc.so.6
7f2b3c5b2000-7f2b3c747000 r-xp 00028000 08:20 2282 /usr/lib/x86_64-linux-gnu/libc.so.6
7f2b3c747000-7f2b3c79f000 r--p 001bd000 08:20 2282 /usr/lib/x86_64-linux-gnu/libc.so.6
7f2b3c79f000-7f2b3c7a0000 ---p 00215000 08:20 2282 /usr/lib/x86_64-linux-gnu/libc.so.6
7f2b3c7a0000-7f2b3c7a4000 r--p 00215000 08:20 2282 /usr/lib/x86_64-linux-gnu/libc.so.6
7f2b3c7a4000-7f2b3c7a6000 rw-p 00219000 08:20 2282 /usr/lib/x86_64-linux-gnu/libc.so.6
7f2b3c7a6000-7f2b3c7b3000 rw-p 00000000 00:00 0
7f2b3c7b8000-7f2b3c7ba000 rw-p 00000000 00:00 0
7f2b3c7ba000-7f2b3c7bc000 r--p 00000000 08:20 2279 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7f2b3c7bc000-7f2b3c7e6000 r-xp 00002000 08:20 2279 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7f2b3c7e6000-7f2b3c7f1000 r--p 0002c000 08:20 2279 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7f2b3c7f2000-7f2b3c7f4000 r--p 00037000 08:20 2279 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7f2b3c7f4000-7f2b3c7f6000 rw-p 00039000 08:20 2279 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7ffe9893d000-7ffe9895e000 rw-p 00000000 00:00 0 [stack]
7ffe989b1000-7ffe989b5000 r--p 00000000 00:00 0 [vvar]
7ffe989b5000-7ffe989b7000 r-xp 00000000 00:00 0 [vdso]
Here is some information about my system:
rragavendrak@DESKTOP-JJOG9GH:~$ uname -a
Linux DESKTOP-JJOG9GH 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- I'm unable to understand who is allocating memory for the username string (and the six other fields in struct passwd). Who is responsible for freeing it?
The library is responsible for any allocations and deallocations needed.
Use ldd --version to see which implementation of the C standard library your Linux system is using. Most Linux distributions use either glibc or musl, with glibc-based distributions in the vast majority.
glibc's getpwuid uses malloc and realloc in combination with a static struct passwd.
The actual code is full of macros but it looks something like this after preprocessing:
static char *buffer;
struct passwd *getpwuid(uid_t uid) {
static size_t buffer_size;
static struct passwd resbuf;
struct passwd *result;
if (buffer == NULL) { // first call to getpwuid, allocate 1024 bytes
buffer_size = 1024;
buffer = (char *)malloc(buffer_size);
}
while (
buffer != NULL &&
(__getpwuid_r(uid, &resbuf, buffer, buffer_size, &result) == ERANGE))
{
// not enough space in buffer, realloc:
char *new_buf;
buffer_size *= 2;
new_buf = (char *)realloc(buffer, buffer_size);
if (new_buf == NULL) {
free(buffer);
((*__errno_location()) = (ENOMEM));
}
buffer = new_buf;
}
if (buffer == NULL) result = NULL;
return result;
}
The __getpwuid_r function will use buffer to store the strings that resbuf points to if buffer_size, which is initially 1024, is large enough. If it's not large enough, __getpwuid_r fails with ERANGE, and the while loop in getpwuid then doubles the size (buffer_size *= 2;), calls realloc and calls __getpwuid_r again until it succeeds (or realloc fails).
The strings pointed to by resbuf (and result) are all stored on the heap (where buffer points). The same area will be used for all calls to getpwuid until you query for a uid where the strings exceed buffer_size. The buffer is then reallocated with a larger size. When the program exits, buffer_size will therefore be the largest size that was needed during the program run.
There's a weak_alias (buffer, FREEMEM_NAME) in the source that may indicate that it will actually call free when the program exits, but that's not very important. What's important is that only one area is ever allocated and it is reused for all calls, so it will not leak and run out of memory no matter how many times you call getpwuid.
I know of only very few Linux distributions that use musl instead of glibc. Although the code is very different from glibc's version, it uses the same approach, where line points at the heap-allocated string storage and size holds the number of bytes currently allocated:
static char *line;
static struct passwd pw;
static size_t size;
struct passwd *getpwuid(uid_t uid)
{
struct passwd *res;
__getpw_a(0, uid, &pw, &line, &size, &res);
return res;
}
Here __getpw_a internally calls __nscd_query to get the length of all the struct passwd strings and then calculates the exact length needed to store them. If the size is too small, it will call realloc to make room for this entry. At the end of the program run, only one heap-allocated buffer with the max size needed during the program run will be left.
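The grow-only pattern described above can be sketched as a small helper. This is not musl's code; ensure_capacity is a hypothetical name, and the sketch only illustrates a buffer that is reallocated when an entry needs more room and never shrinks.

```c
#include <stdlib.h>

/* Hypothetical helper, not musl's actual code: make sure *buf can hold at
 * least `needed` bytes (needed > 0). The buffer only ever grows.
 * Returns *buf on success, or NULL if realloc fails, in which case the
 * old buffer and size are left untouched. */
char *ensure_capacity(char **buf, size_t *size, size_t needed)
{
    if (needed > *size) {
        char *bigger = realloc(*buf, needed);
        if (bigger == NULL)
            return NULL;
        *buf = bigger;
        *size = needed;
    }
    return *buf;
}
```

Calling it with 10, then 5, then 100 bytes leaves a single 100-byte allocation, which matches the "max size needed during the program run" behaviour.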
- How can the compiler possibly know how long the username, password and other fields are going to be, so it can allocate the memory statically?
If the standard C library in use is written for an environment where there is a known upper limit to how long each of the strings can be, it can use a static char buffer[sum_of_the_longest_strings_allowed] for storage. Knowing the longest strings allowed at compile time may, however, not be possible when the database actually storing the information is located "elsewhere" (like NIS or LDAP). The getpwuid implementation would have to know the longest lengths permitted by all the possible backend systems, which may be hard on all but embedded systems.
- Why is pwd at 0x7f2b3c7aba60 and why is pwd->pw_name at 0x5646b174e2a0?
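pwd is pointing at a static struct passwd (resbuf in the above glibc implementation) while pwd->pw_name is pointing inside the heap-allocated buffer (where static char *buffer points in the glibc implementation).
In your /proc/self/maps output, 0x7f2b3c7aba60 falls inside the anonymous rw-p mapping 7f2b3c7a6000-7f2b3c7b3000 that directly follows the libc.so.6 data mapping, which is most likely where libc keeps its own zero-initialized static data. etext, edata and end only describe the segments of your executable, not those of the shared libraries, which is why the address is outside all of them. 0x5646b174e2a0 falls inside 5646b174e000-5646b176f000, the [heap] mapping.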
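To check which mapping an address belongs to without comparing hex ranges by eye, the /proc/self/maps scan from the question can be turned into a lookup. This is a Linux-only sketch; find_mapping is a hypothetical helper name, and it assumes each maps line fits in 1024 bytes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (Linux only): find the /proc/self/maps line whose
 * address range contains addr and copy it into out.
 * Returns 1 if found, 0 if no mapping contains addr,
 * -1 if the maps file cannot be opened. */
int find_mapping(const void *addr, char *out, size_t outlen)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[1024];
    unsigned long target = (unsigned long)addr;
    int found = 0;

    if (maps == NULL)
        return -1;

    while (!found && fgets(line, sizeof line, maps) != NULL) {
        unsigned long lo, hi;
        /* Each line starts with "start-end" in hexadecimal; end is exclusive. */
        if (sscanf(line, "%lx-%lx", &lo, &hi) == 2 &&
            target >= lo && target < hi) {
            snprintf(out, outlen, "%s", line);
            found = 1;
        }
    }
    fclose(maps);
    return found;
}
```

Passing pwd and pwd->pw_name to it should, on a setup like the one in the question, report the anonymous mapping next to libc.so.6 for the first and the [heap] line for the second.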