csegmentation-faultmallocreallocbuffer-overflow

why is my malloc(1) segfaulting my program?


I have a problem in my C program that makes it so that it segfaults on malloc(1). I've spent many hours trying to find out why it segfaults but

I just can't figure it out. This portion of my program separates strings using identifiers. Example, string="12wo4ne53wone86wo99w5ne"; identifiers, start="wo", end="ne"; would return {"wo4ne", "wone", "wo99w5ne"}.

My code is as follows.

// basically like substring() in java
char* stringAt(char* str, int start, int end) {
    int length = end - start + 1;
    int temp = start;
    char* savedChar = NULL;
    savedChar = malloc(length + 1);
    for(int x = 0; x < length; x++) {
        savedChar[x] = str[temp];
        temp++;
    }
    savedChar[length] = '\0';
    return savedChar;
}

// finds "identifier" in string after certain position in that string
int findIdentifier(char* str, char* identifier, int pos, int isEnd) {

    // checks first character with first character in "identifier"
    for(int x = pos; x < strlen(str) - 1; x++) {
        if(str[x] == identifier[0]) {

            // if first characters match then check rest
            for(int i = 0; i < strlen(identifier); i++) {
                if(str[x + i] != identifier[i]) {
                    i = strlen(identifier);
                }
                if(i == strlen(identifier) - 1) {

                    // isEnd checks if you should count the last position of the identifier...
                    // -like ident="1234" then "end" is the position of "4".
                    // -not end be the position of "1"
                    if(isEnd == 1) {
                        return x + strlen(identifier) - 1;
                    }
                    return x;
                }
            }
        }
    }
    return -1;
}

char** separateStrings(char* str, char* identifier, char* lastIdentifier) {
    char** savedStr = NULL;

    // segfault here!
    savedStr = malloc(1);
    // segfualt here!

    int start = -2;
    int end = 0;
    int x = 0;
    do {

    // first run check/gets first and last positions of the identifier in the string
    if(start == -2) {
        start = findIdentifier(str, identifier, 0, 0);
    } else {
    start = findIdentifier(str, identifier, end + 1, 0);
    }
    end = findIdentifier(str, lastIdentifier, start + 1, 1);
    
    if(start != -1) {
        if(end == -1) {
            end = strlen(str);
        }
        char* newStr = stringAt(str, start, end);

        // dynamically increase the size of savedStr (x is iterated every run through here)
        realloc(savedStr, x + 1);
        savedStr[x] = malloc(strlen(newStr) + 1);
        strcpy(savedStr[x], newStr);
        x++;
    }

    } while(start != -1 && end != -1);
    return savedStr;
}

I have tried looking up why this could not be working and the only thing I can think of is a Buffer Overflow. So I then tried to run findIdentifier() many times but that works fine.

The program is also inconsistent and sometimes runs farther into the code than others but is accurate with its outcomes.


Solution

  • If savedStr is supposed to be a dynamically allocated array of strings, then allocating 1 byte of memory is not sufficient to store a pointer to a char (depending on your platform, this requires either 4 or 8 bytes).

    Even when you realloc it to be larger, you're only adding 1 byte at a time. This when you reference and assign with savedStr[x] = malloc(strlen(newStr) + 1); you're invoking undefined behavior, which in this case it would seem has manifested as a segmentation fault.

    You almost certainly want to allocate:

    savedStr = malloc(sizeof(char *));
    

    And then when you re-allocate:

    realloc(savedStr, (x + 1) * sizeof(char *));
    

    But you also want to capture the pointer realloc returns. If you're not too worried about allocation error checking:

    savedStr = realloc(savedStr, (x + 1) * sizeof(char *));
    

    Breaking out the dynamically resized string vector problem

    One of the core problems your code solves is having an array that grows dynamically. If you break this bit of code out, then you can approach the remainder of your program without worrying about the memory management.

    The size of the array doubles when needed rather than growing by a single entry to minimize allocations. There may be more ideal growth factors for efficiency.

    #include <string.h>
    #include <stdlib.h>
    #include <stdio.h>
    
    struct string_vec {
        char **data;
        size_t cap;
        size_t sz;
    };
    
    struct string_vec *string_vec_new(size_t cap) {
        struct string_vec *vec = malloc(sizeof(struct string_vec));
        if (!vec) return NULL;
    
        *vec = (struct string_vec){
            .data = malloc(sizeof(char *) * cap),
            .cap = cap,
            .sz = 0
        };
    
        return vec;
    }
    
    struct string_vec *string_vec_add(
        struct string_vec *vec,
        const char *str
    ) {
        if (vec->sz < vec->cap) {
            vec->data[vec->sz++] = strdup(str);
            return vec;
        }
    
        char **new_data = realloc(vec->data, sizeof(char *) * vec->cap * 2);
        if (!new_data) return NULL;
    
        vec->data = new_data;
        vec->cap *= 2;
        vec->data[vec->sz++] = strdup(str);
    
        return vec;
    }
    
    void string_vec_del(struct string_vec *vec) {
        for (size_t i = 0; i < vec->sz; i++) {
            free(vec->data[i]);
        }
    
        free(vec->data);
    }