c linux glibc variable-length-array restrict-qualifier

Linux memcpy restrict keyword syntax


I know that the restrict qualifier in C specifies that the memory regions pointed to by two pointers should not overlap. It was my understanding that the Linux (not SUS) prototype for memcpy looks like this:

void* memcpy(void *restrict dest, const void *restrict src, size_t count);

However, when I looked at man7.org/memcpy, the declaration appears to be:

void *memcpy(void dest[restrict .n], const void src[restrict .n], size_t n);

My questions are -

  1. When did this syntax get introduced? C99 or later or is this some GNU extension?
  2. What does the . before n signify? I am familiar with the variable length array declaration. Is the . for the variable appearing after the array specification? Is this part of the standard?

Solution

  • TLDR: It is an ad hoc notation that came out of a discussion on the linux-man mailing list. It expresses the size of a VLA parameter before the size variable is declared: the . in .n means that n refers to a parameter of the current function declaration, even if that parameter appears later in the parameter list. The man pages also extend the usual int a[restrict n] parameter declaration to void. I have not found this syntax in any official documentation, but the mailing list has all the details.


    The change to the memcpy syntax in the Linux library functions manual was introduced by commit c64cd13e. The commit message is copied here verbatim for reference.

    Various pages: SYNOPSIS: Use VLA syntax in 'void *' function parameters

    Use VLA syntax also for void *, even if it's a bit more weird.

    Admittedly, it is weird enough from the C language perspective, because while void f(int n, int[restrict n]) is valid VLA syntax, void f(int n, void[restrict n]) is not because we are not allowed to have arrays of void.
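
    To make the distinction in the commit message concrete, here is a minimal sketch of the form that is valid C99: an int array parameter whose size is an earlier parameter, with restrict inside the brackets. The function name copy_ints is hypothetical and chosen only for illustration. The void variant appears only in a comment, because arrays of void do not exist and a compiler would reject it.

```c
#include <assert.h>
#include <stddef.h>

/* Valid C99: the size parameter n is declared before the arrays that use it.
 * `int dest[restrict n]` is adjusted to `int *restrict dest`; the n documents
 * the expected element count but is not enforced by the language. */
void copy_ints(size_t n, int dest[restrict n], const int src[restrict n])
{
    assert(dest != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dest[i] = src[i];
}

/* Not valid C (hypothetical, for comparison only): there are no arrays of
 * void, so the man-page style cannot be compiled as written.
 *
 * void copy_bytes(size_t n, void dest[restrict n], const void src[restrict n]);
 */
```

    This is why the real prototype in the headers still has to be spelled with void *restrict, while the manual page uses the bracket form purely as documentation.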

    For the . before n, if we dig deeper we can find this thread from the linux-man mailing list.

    Let's take an example:

        int getnameinfo(const struct sockaddr *restrict addr,
                        socklen_t addrlen,
                        char *restrict host, socklen_t hostlen,
                        char *restrict serv, socklen_t servlen,
                        int flags);
    

    and some transformations:

        int getnameinfo(const struct sockaddr *restrict addr,
                        socklen_t addrlen,
                        char host[restrict hostlen], socklen_t hostlen,
                        char serv[restrict servlen], socklen_t servlen,
                        int flags);
    
    
        int getnameinfo(socklen_t hostlen;
                        socklen_t servlen;
                        const struct sockaddr *restrict addr,
                        socklen_t addrlen,
                        char host[restrict hostlen], socklen_t hostlen,
                        char serv[restrict servlen], socklen_t servlen,
                        int flags);
    

    (I'm not sure if I used correct GNU syntax, since I never used that extension myself.)

    The first transformation above is non-ambiguous, as concise as possible, and its only issue is that it might complicate the implementation a bit too much. I don't think forward-using a parameter's size would be too much of a parsing problem for human readers.

    I personally find the second form not terrible. Being able to read code left-to-right, top-down is helpful in more complicated examples.

    The second one is unnecessarily long and verbose, and semicolons are not very distinguishable from commas, for human readers, which may be very confusing.

        int foo(int a; int b[a], int a);
        int foo(int a, int b[a], int o);
    

    Those two are very different to the compiler, and yet very similar to the human eye. I don't like it. The fact that it allows for simpler compilers isn't enough to overcome the readability issues.

    This is true, I would probably use it with a comma and/or syntax highlighting.

    I think I'd prefer having the forward-using syntax as a non-standard extension --or a standard but optional language feature-- to avoid forcing small compilers to implement it, rather than having the GNU extension standardized in all compilers.

    The problems with the second form are:

    • it is not 100% backwards compatible (which maybe ok though) as the semantics of the following code changes:

    int n; int foo(int a[n], int n); // refers to different n!

    Code written for new compilers could then be misunderstood by old compilers when a variable with 'n' is in scope.

    • it would generally be fundamentally new to C to have backwards references and parsers might need to be changed to allow this

    • a compiler or tool then has to deal also with ugly corner cases such as mutual references:

    int foo(int (*a)[sizeof(*b)], int (*b)[sizeof(*a)]);
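
    The backwards-compatibility point in the first bullet can be checked with current compilers. The sketch below assumes a compiler with VLA support (for example GCC or Clang in C99 mode or later) and uses a hypothetical function, pointee_size. It takes a pointer to an array rather than an array, so that sizeof can reveal which n the parameter type actually used.

```c
#include <assert.h>
#include <stddef.h>

/* File-scope n: this is the n that the first parameter's type refers to,
 * because the parameter n has not been declared yet at that point. */
static int n = 3;

/* a points to an array of (file-scope n) ints. The parameter n comes into
 * scope only afterwards and has nothing to do with the array size. */
size_t pointee_size(int (*a)[n], int n)
{
    assert(a != NULL);
    (void)n;            /* parameter n, deliberately unused */
    return sizeof *a;   /* computed from the file-scope n, not the parameter */
}
```

    With the file-scope n equal to 3, a call such as pointee_size(&buf, 10) on an int buf[3] returns 3 * sizeof(int), not 10 * sizeof(int). If the parameter list allowed forward references, the same source would silently change meaning.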

    We could consider new syntax such as

    int foo(char buf[.n], int n);

    Personally, I would prefer the conceptual simplicity of forward declarations and the fact that these exist already in GCC over any alternative. I would also not mind new syntax, but then one has to define the rules more precisely to avoid the aforementioned problems.

    According to my understanding, this basically means the . is a way to refer to a VLA size parameter before it is declared. Because it is new syntax, it can be limited to naming a parameter, which sidesteps corner cases such as the mutual references shown above.

    There is a follow-up thread that states,

    I am ok with the syntax, but I am not sure how this would work. If the type is determined only later you would still have to change parsers (some C compilers do type checking and folding during parsing, so need the types to be known during parsing) and you also still have the problem with the mutual dependencies.

    We thought about using this syntax

    int foo(char buf[.n], int n);

    because it is new syntax which means we can restrict the size to be the name of a parameter instead of allowing arbitrary expressions, which then makes forward references less problematic. It is also consistent with designators in initializers and could also be extended to annotate flexible array members or for storing pointers to arrays in structures:

    struct { int n; char buf[.n]; };

    struct { int n; char (*buf)[.n]; };
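
    For comparison, here is a minimal sketch of the two existing C99 features the quoted message refers to: a flexible array member and designators in initializers. The type names view and packet and the function packet_from_view are hypothetical. The [.n] spelling is not valid C today, so it appears only in a comment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A borrowed buffer and its length. It can be built with a designated
 * initializer, e.g. { .n = 5, .buf = "hello" }, where `.n` names a member. */
struct view {
    size_t n;
    const char *buf;
};

/* Standard flexible array member: the link between n and buf is only a
 * convention. The proposed notation would spell it `char buf[.n];`. */
struct packet {
    size_t n;
    char buf[];
};

/* Allocate a packet holding a copy of the n bytes described by v.
 * Returns NULL if allocation fails; the caller frees the result. */
struct packet *packet_from_view(struct view v)
{
    assert(v.buf != NULL || v.n == 0);
    struct packet *p = malloc(sizeof *p + v.n);
    if (p == NULL)
        return NULL;
    p->n = v.n;
    if (v.n > 0)
        memcpy(p->buf, v.buf, v.n);
    return p;
}
```

    Nothing in the struct packet declaration tells the compiler that buf holds n bytes. That missing annotation is what the [.n] notation is meant to supply.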

    Of course, there was also an objection, which I think many people in the SO community would agree with:

    the only point i strongly care about is this one:

    Manual pages should not use

    • non-standard syntax
    • non-portable syntax
    • ambiguous syntax (i.e. syntax that might have different meanings with different compilers or in different contexts)
    • syntax that might be invalid or dangerous with some widely used compiler collections like GCC or LLVM