cpointersbuffer

How do I overwrite a string buffer without modifying earlier uses of it


I've recently started learning C, and I'm trying to implement a hash map datastructure. I'm currently implementing my buckets as dynamically resizing array-backed lists. When testing my bucket_remove function I made an error in my test code that I'm struggling to understand.

Here is my new_entry function:

struct Entry *new_entry(char *id)
{
    struct Entry *e = (struct Entry *)malloc(sizeof(struct Entry));
    e->id = id;
    return e;
}

And here is my test_remove function:

static char *test_remove()
{
    struct Bucket *b = new_bucket();

    for (int i = 0; i < 5; i++)
    {
        char id[8];
        snprintf(id, 8, "entry-%d", i);
        struct Entry *e = new_entry(id);
        bucket_add(b, e);
    }

    printf("Entry 0 id: %s.\n", b->_entries[0]->id);
    printf("Entry 1 id: %s.\n", b->_entries[1]->id);

    // ... rest of code omitted for clarity
}

Inside the loop at each iteration, a new entry is created with the correct id, but the previous entries' ids are then updated to the most recent id. So, the console prints:

Entry 0 id: entry-4.
Entry 1 id: entry-4.

I believe this is related to the way I am passing the local id buffer to my new_entity function. Each entry must be receiving a pointer to the same memory. But I can't quite figure out why this is the case.

This is how I understand the code in the loop:

  1. char id[8]; - memory for 8 characters is allocated in the stack and the address is assigned to id.
  2. snprintf(id, 8, "entry-%d", i); - the pointer to the id buffer is passed to snprintf which fills it with the characters of the created string.
  3. struct Entry *e = new_entry(id); - the pointer to the id buffer is passed to new_entry where a new Entry struct is allocated on the heap and the id (pointer) is assigned to its id member.
  4. bucket_add(b, e); - the pointer to a bucket and the pointer to the entry are passed in to bucket_add and the entry (pointer) is added to the bucket's array of entries.

When a new iteration occurs, it looks like a new section of memory is allocated and its new memory address is assigned to a new variable in local scope. So how does the previous memory get overwritten?

I would appreciate if someone could clarify my understanding, please.


Solution

  • Thanks to the comments to my question, I now understand the issue.

    The error I have is a "stack use after scope" error. This error occurs when a pointer to memory in the stack is used after the scope in which the memory was allocated ends.

    In the code below I have created a buffer char id[8] in the for loop, giving it local scope within the block. This means that not only does id go out of scope at the end of each iteration (which most programmers recognise from other programming languages), but, crucially, the underlying memory itself is also marked as available.

    for (int i = 0; i < 5; i++)
    {
        char id[8];
        snprintf(id, 8, "entry-%d", i);
        struct Entry *e = new_entry(id);
        bucket_add(b, e);
    }
    

    In each iteration of my for loop, a new 8 bytes is requested from the stack and assigned to a new variable with local scope called id. My compiler reuses the 8 bytes on the stack that were just made available from the previous iteration, so id is given the same memory address on the stack as the previous iteration. This is implementation specific though, it may not always work this way.

    This in itself is fine and expected as the normal rules for scoping. The error occurs because I take the pointer to stack memory id and copy this pointer in to heap memory here:

    struct Entry *new_entry(char *id)
    {
        struct Entry *e = (struct Entry *)malloc(sizeof(struct Entry));
        e->id = id;
        return e;
    }
    

    The compiler has no problem with a pointer in the heap pointing to memory on the stack. But it becomes an error when this pointer on the heap is used to access the memory in the stack after it goes out of scope and becomes available for reuse. This is undefined behaviour, which means we risk memory corruption, or pointing at unintended memory.

    So, when I accessed the data outside of the local scope in which it was declared:

    printf("Entry 0 id: %s.\n", b->_entries[0]->id);
    printf("Entry 1 id: %s.\n", b->_entries[1]->id);
    

    The output:

    Entry 0 id: entry-4.
    Entry 1 id: entry-4.
    

    The undefined behaviour manifested itself as every entry in my bucket having an id pointer to the same memory location, and thus the same value. This would get worse the later the pointer was used because this memory would be overwritten again shortly.

    This problem is resolved by ensuring when a pointer needs to outlive its scope, that the underlying memory it points to *also outlives its scope. In my case, it means I need to copy the string that id points to, to heap memory and save the new heap address instead.

    I've chosen to use the strdup function to do this:

    #include <string.h>
    
    struct Entry *new_entry(char *id)
    {
        struct Entry *e = (struct Entry *)malloc(sizeof(struct Entry));
        e->id = strdup(id);
        return e;
    }
    

    The output shows that the error is resolved:

    Entry 0 id: entry-0.
    Entry 1 id: entry-1.