Git HTTP error 'fatal: protocol error: bad line length character: '

I'm currently trying to write a simple Git HTTP server in C without relying on an existing web server. So far, all it does is create a server socket and execute the git-http-backend CGI script with the environment variables taken from the client request. Pulling already works, but only for empty repositories. When I try to clone a repository that has content, I get this error on the client side:

fatal: protocol error: bad line length character:

Here is the communication log between client and server:

C: GET /test.git/info/refs?service=git-upload-pack HTTP/1.1
C: Host: localhost:9000
C: User-Agent: git/2.20.1
C: Accept: */*
C: Accept-Encoding: deflate, gzip
C: Accept-Language: en-US, *;q=0.9
C: Pragma: no-cache
C:

S: HTTP/1.1 200 OK
S: Expires: Fri, 01 Jan 1980 00:00:00 GMT
S: Pragma: no-cache
S: Cache-Control: no-cache, max-age=0, must-revalidate
S: Content-Type: application/x-git-upload-pack-advertisement
S: 
S: 001e# service=git-upload-pack
S: 000000fadd3fba560f4afe000e70464ac3a7a9991ad13eb0
S: HEAD003fdd3fba560f4afe000e70464ac3a7a9991ad13eb0 refs/heads/master
S: 0000

A side note: the HTTP/1.1 200 OK line is added manually; everything else comes from the CGI script. You can also find my code here. My first theory was that the newlines in the server response were in the wrong places (e.g. HEAD should be one line higher), but that turned out not to be the case. So my question is: is there anything I can do? Reformatting the response is complicated in C, especially for longer responses.

Solution

First of all, please make sure you understand the security implications of passing data controlled by an outside party to a function like popen. Your current implementation is trivially exploitable through shell injection: shell special characters in the request line are interpreted by the shell. Even plain git with a specially crafted repository name lets anyone execute arbitrary commands on your server. Try this, for example:

git clone "$(echo -e 'http://localhost:9000/;echo\tunexpected\t>helloworld;cat\t/etc/passwd;exit;.git')"

This creates a file named helloworld in the server's working directory containing the string "unexpected", and sends the contents of /etc/passwd back to the client (use Wireshark to see it).

To avoid this, you need to properly escape the input data so that no shell injection can happen. Ideally, you would use a mechanism like execve, which lets you pass the environment variables and any command-line arguments as separate strings instead of building a possibly unsafe command line that is then parsed by a shell. Such a solution is of course more involved, as it means restructuring your program.
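A minimal sketch of that restructuring, assuming a POSIX system: instead of popen, create a pipe, fork, and call execve in the child with an explicit argument vector and environment. No shell is involved, so characters like `;`, `$` or tabs in the request stay inert data. The helper name `spawn_cgi` is hypothetical, not part of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), used by the caller to reap the child */
#include <unistd.h>

/* Hypothetical helper: start `path` with the given argv and environment,
   without involving a shell. Returns the read end of a pipe connected to
   the child's stdout (or -1 on error) and stores the child's pid. */
int spawn_cgi(const char *path, char *const argv[], char *const envp[],
              pid_t *pid_out)
{
    int out[2];
    if (pipe(out) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(out[0]);
        close(out[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: send stdout into the pipe, then replace the process image.
           Request data only appears as separate strings in envp, so ';',
           '$' or tabs in it are never interpreted by a shell. */
        dup2(out[1], STDOUT_FILENO);
        close(out[0]);
        close(out[1]);
        execve(path, argv, envp);
        _exit(127);             /* reached only if execve failed */
    }

    close(out[1]);              /* parent keeps only the read end */
    *pid_out = pid;
    return out[0];
}
```

For git http-backend the call might look like `spawn_cgi("/usr/bin/git", (char *[]){"git", "http-backend", NULL}, envp, &pid)`, where the path to git is an assumption about your system and `envp` is a NULL-terminated array of strings such as `"REQUEST_METHOD=GET"`, `"PATH_INFO=/test.git/info/refs"`, `"QUERY_STRING=service=git-upload-pack"`, `"GIT_PROJECT_ROOT=/srv/git"` (placeholder directory) and `"GIT_HTTP_EXPORT_ALL=1"`. Build those strings with a bounded function such as snprintf, read the returned descriptor until EOF, close it, and call waitpid to reap the child.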

Next, you concatenate strings unsafely. strcat has no way of knowing how large the destination buffer is, so given enough input it will happily write past the end of the buffer and overwrite the stack. This is a classic stack buffer overflow that can then be exploited. Use safer alternatives like strlcat (where available) or, better yet, a proper string library.
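A minimal sketch of bounds-checked concatenation that does not depend on strlcat, assuming the buffer size is known at the call site. `append_str` is a hypothetical helper. Failing outright instead of truncating is a design choice here: a silently shortened request line is usually worse than an error.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append `src` to the NUL-terminated string in `dst`,
   a buffer of `cap` bytes whose current string length is *len (< cap).
   Returns 0 on success, or -1 and leaves dst unchanged if it would not fit. */
int append_str(char *dst, size_t cap, size_t *len, const char *src)
{
    size_t n = strlen(src);
    if (n >= cap - *len)               /* need n bytes plus the terminator */
        return -1;
    memcpy(dst + *len, src, n + 1);    /* copies the terminating NUL too */
    *len += n;
    return 0;
}
```

Tracking `*len` also avoids rescanning the whole buffer on every call, which strcat has to do. Note that this is still a string function: `strlen(src)` stops at the first NUL, so it suits text such as headers, not binary output.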

Now on to your original problem:

The output of git http-backend is raw binary data that can include null bytes. In the example response there is indeed a null byte after HEAD, separating it from the list of supported features. You can see this by running your command manually and piping the output to a tool like xxd, or by dumping it to a file and opening it in a hex editor.
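To see why dropped bytes lead to this particular error, here is a sketch that walks the pkt-line framing used in the response: each packet starts with four hex digits giving its total length, including those four digits, and `0000` is a flush packet. Once bytes go missing, a packet's declared length no longer matches the bytes that follow it, so a reader ends up treating other bytes as a length prefix; non-hex characters there are what git reports as "bad line length character". `check_pkt_lines` is a hypothetical diagnostic, not git's own parser, and it ignores protocol v2's special short packets.

```c
#include <stddef.h>
#include <string.h>

/* Parse four ASCII hex digits; returns -1 if any of them is not hex. */
static int hex4(const unsigned char *p)
{
    int v = 0;
    for (int i = 0; i < 4; i++) {
        int c = p[i], d;
        if (c >= '0' && c <= '9')      d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return -1;
        v = v * 16 + d;
    }
    return v;
}

/* Hypothetical diagnostic: walk a buffer of pkt-lines. Returns the number
   of packets (flush packets included) and stores in *nul_packets how many
   payloads contain a NUL byte; returns -1 if a length prefix is not hex or
   a packet is truncated. */
int check_pkt_lines(const unsigned char *buf, size_t len, int *nul_packets)
{
    size_t pos = 0;
    int count = 0;
    *nul_packets = 0;
    while (pos < len) {
        if (len - pos < 4)
            return -1;                 /* incomplete length prefix */
        int n = hex4(buf + pos);
        if (n < 0)
            return -1;                 /* git: "bad line length character" */
        if (n == 0) {
            pos += 4;                  /* "0000" flush packet, no payload */
        } else {
            if (n < 4 || (size_t)n > len - pos)
                return -1;             /* impossible or truncated packet */
            if (memchr(buf + pos + 4, '\0', (size_t)n - 4) != NULL)
                (*nul_packets)++;
            pos += (size_t)n;
        }
        count++;
    }
    return count;
}
```

Running it over a captured response (for example, the bytes written by the hex-dump approach above) should report the HEAD packet as containing a NUL; over the truncated response it should fail.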

In the loop where you read from the pipe and then concatenate the output into your response buffer, you truncate the data because strcat operates on C strings that are terminated by a null byte. The rest of the HEAD line and the null byte itself never make it to the response, breaking the git protocol.

You can use fread to read raw data from the pipe directly into the buffer. Then you would need to copy that buffer to the response buffer with a function that doesn't stop at a null byte, like memcpy. For this to work you also need to keep track of the bytes already read and how much space still remains in the response buffer.
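A minimal sketch of that approach: fread into a chunk buffer, then memcpy into the response buffer while tracking how many bytes are already stored. `read_all` is a hypothetical name, and failing when the buffer is full (instead of growing it with realloc) is a simplification.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read everything from `in` into `dst` (capacity
   `cap`). Uses byte counts and memcpy, never string functions, so NUL
   bytes in the data are preserved. Returns the number of bytes stored,
   or -1 if the data did not fit or a read error occurred. */
long read_all(FILE *in, unsigned char *dst, size_t cap)
{
    unsigned char chunk[4096];
    size_t used = 0;                    /* bytes already in dst */
    size_t n;
    while ((n = fread(chunk, 1, sizeof chunk, in)) > 0) {
        if (n > cap - used)             /* remaining space exhausted */
            return -1;
        memcpy(dst + used, chunk, n);
        used += n;
    }
    if (ferror(in))
        return -1;
    return (long)used;
}
```

The returned count, not strlen, is what you then pass to send.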

Alternatively, since you do not do any processing on the final response buffer, you could send the data you read from the pipe directly to the client socket. That way you don't need to worry about the size of the response buffer or keep track of the offset and remaining space. Here is a version that works for the initial request git makes:

        const char *status_line = "HTTP/1.1 200 OK\r\n";
        send(client_socket, status_line, strlen(status_line), 0);

        /* Forward the raw CGI output unchanged. It may contain NUL bytes,
           so use the byte count from fread instead of string functions. */
        char buf[4096];
        size_t bytes_read;
        while ((bytes_read = fread(buf, 1, sizeof(buf), g)) > 0)
            send(client_socket, buf, bytes_read, 0);
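The loop above ignores the return value of send, which may transmit fewer bytes than requested (more likely with large buffers or non-blocking sockets). A minimal sketch of a helper that retries until everything is written; `send_all` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: keep calling send() until the whole buffer has been
   written. Returns 0 on success, -1 on error. */
int send_all(int sock, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(sock, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Inside the loop you would call `send_all(client_socket, buf, bytes_read)` and stop on -1. If the client disconnects, send can raise SIGPIPE; on Linux, passing MSG_NOSIGNAL instead of 0, or ignoring SIGPIPE, turns that into an EPIPE error.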

The subsequent POST request then fails.
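One likely cause of the failing POST (an assumption, not verified against your code): popen with mode "r" only connects the child's standard output, but under CGI the request body of `POST /test.git/git-upload-pack` has to be written to the backend's standard input, with its size given in CONTENT_LENGTH. A minimal sketch of the forwarding part, assuming the body arrives with a Content-Length header (chunked transfer encoding is not handled) and that the backend was started with a second pipe dup2'd onto its STDIN_FILENO. `forward_body` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all `len` bytes to `fd`, retrying after short writes and EINTR. */
static int write_all(int fd, const char *p, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Hypothetical helper: copy exactly `content_length` body bytes from the
   client socket to the CGI's stdin, then close that pipe so the backend
   sees end-of-file. Assumes the request headers were already consumed and
   no part of the body is left in an earlier read buffer.
   Returns 0 on success, -1 on error or if the client disconnects early. */
int forward_body(int client_fd, int cgi_stdin, size_t content_length)
{
    char buf[4096];
    int rc = 0;
    while (content_length > 0) {
        size_t want = content_length < sizeof buf ? content_length : sizeof buf;
        ssize_t n = read(client_fd, buf, want);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0 || write_all(cgi_stdin, buf, (size_t)n) != 0) {
            rc = -1;                   /* read error, early EOF or write error */
            break;
        }
        content_length -= (size_t)n;
    }
    close(cgi_stdin);
    return rc;
}
```

After forwarding the body, read the backend's stdout and send it to the client as before. Writing the whole body before reading any output is a simplification that could block for very large exchanges; a more robust server would interleave the two with poll.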