
How does a strchr implementation work?


I tried to write my own implementation of the strchr() function.

It now looks like this:

char *mystrchr(const char *s, int c) {
    while (*s != (char) c) {
        if (!*s++) {
            return NULL;
        }
    }
    return (char *)s;
}

The last line originally was

return s;

But this didn't work because s is const. I found out that there needs to be this cast (char *), but I honestly don't know what I am doing there :( Can someone explain?


Solution

  • I believe this is actually a flaw in the C Standard's definition of the strchr() function. (I'll be happy to be proven wrong.) (Replying to the comments, it's arguable whether it's really a flaw; IMHO it's still poor design. It can be used safely, but it's too easy to use it unsafely.)

    Here's what the C standard says:

    char *strchr(const char *s, int c);
    

    The strchr function locates the first occurrence of c (converted to a char) in the string pointed to by s. The terminating null character is considered to be part of the string.

    Which means that this program:

    #include <stdio.h>
    #include <string.h>
    
    int main(void) {
        const char *s = "hello";
        char *p = strchr(s, 'l');
        *p = 'L';
        return 0;
    }
    

    even though it carefully defines the pointer to the string literal as a pointer to const char, has undefined behavior, since it modifies the string literal. gcc, at least, doesn't warn about this, and the program dies with a segmentation fault.

    The problem is that strchr() takes a const char* argument, which means it promises not to modify the data that s points to -- but it returns a plain char*, which permits the caller to modify the same data.

    Here's another example. It quietly modifies a const-qualified object without any casts, which (on further thought) I believe also has undefined behavior, since s is defined with a const-qualified type:

    #include <stdio.h>
    #include <string.h>
    
    int main(void) {
        const char s[] = "hello";
        char *p = strchr(s, 'l');
        *p = 'L';
        printf("s = \"%s\"\n", s);
        return 0;
    }
    

    Which means, I think, (to answer your question) that a C implementation of strchr() has to cast its result to convert it from const char* to char*, or do something equivalent.
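
    As a minimal sketch of what that could look like, the search below runs entirely through a pointer to const char, and the single cast is isolated at the point of return. The name my_strchr is hypothetical, and this is not how any particular C library implements strchr(); it only illustrates where the cast has to go.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical re-implementation for illustration only. */
    char *my_strchr(const char *s, int c)
    {
        assert(s != NULL);

        /* The standard says c is converted to char before comparing. */
        const char ch = (char)c;

        for (;; s++) {
            if (*s == ch) {
                /* The only place const is removed: needed to match the
                 * standard return type, not because we modify anything. */
                return (char *)s;
            }
            if (*s == '\0') {
                return NULL;    /* reached the terminator without a match */
            }
        }
    }
    ```

    Because the comparison is done before the terminator check, searching for '\0' returns a pointer to the terminating null character, which matches the standard's wording quoted above. The cast itself generates no code; it only tells the compiler that discarding const is intentional.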

    This is why C++, in one of the few changes it makes to the C standard library, replaces strchr() with two overloaded functions of the same name:

    const char * strchr ( const char * str, int character );
          char * strchr (       char * str, int character );
    

    Of course C can't do this, since it doesn't support function overloading.

    An alternative would have been to replace strchr by two functions, one taking a const char* and returning a const char*, and another taking a char* and returning a char*. Unlike in C++, the two functions would have to have different names, perhaps strchr and strcchr.
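
    A rough sketch of that two-function design is shown here. The names my_strcchr and my_strchr_mut are made up for this example (names beginning with str are reserved for the library, so the suggested strcchr is not used directly), and nothing like this exists in <string.h>.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* const in, const out: the caller cannot write through the result. */
    const char *my_strcchr(const char *s, int c)
    {
        assert(s != NULL);
        const char ch = (char)c;

        for (;; s++) {
            if (*s == ch) {
                return s;       /* no cast needed, const is preserved */
            }
            if (*s == '\0') {
                return NULL;
            }
        }
    }

    /* non-const in, non-const out. The cast only restores the
     * qualification that the caller's own pointer already had. */
    char *my_strchr_mut(char *s, int c)
    {
        return (char *)my_strcchr(s, c);
    }
    ```

    With this split, passing a const char * to my_strchr_mut, or assigning the result of my_strcchr to a plain char *, discards a qualifier and should draw a diagnostic from the compiler, so the mistake in the examples above would no longer compile quietly.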

    (Historically, const was added to C after strchr() had already been defined. This was probably the only way to keep strchr() without breaking existing code.)

    strchr() is not the only C standard library function that has this problem. The list of affected functions (I think this list is complete but I don't guarantee it) is:

    void *memchr(const void *s, int c, size_t n);
    char *strchr(const char *s, int c);
    char *strpbrk(const char *s1, const char *s2);
    char *strrchr(const char *s, int c);
    char *strstr(const char *s1, const char *s2);
    

    (all declared in <string.h>) and:

    void *bsearch(const void *key, const void *base,
        size_t nmemb, size_t size,
        int (*compar)(const void *, const void *));
    

    (declared in <stdlib.h>). All these functions take a pointer to const data that points to the initial element of an array, and return a non-const pointer to an element of that array.
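
    These functions can still be used safely if the caller puts the const back straight away. The sketch below does that for bsearch(); find_int and compare_ints are hypothetical helper names invented for this example, not library functions.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdlib.h>

    /* Comparison callback for bsearch(): orders two ints. */
    static int compare_ints(const void *a, const void *b)
    {
        const int x = *(const int *)a;
        const int y = *(const int *)b;
        return (x > y) - (x < y);
    }

    /* Hypothetical wrapper: searches a sorted const array and returns a
     * pointer to const, so the result cannot be used to modify the array. */
    const int *find_int(const int *sorted, size_t count, int key)
    {
        assert(sorted != NULL);

        /* bsearch() returns plain void *; converting it to const int *
         * here restores the qualifier that the library signature dropped. */
        return bsearch(&key, sorted, count, sizeof *sorted, compare_ints);
    }
    ```

    The same habit works for the <string.h> functions: when the argument is a const char *, store the result in a const char * as well, and the compiler will reject any later attempt to write through it.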