regex sed colorize

Trying to colorize Apache http wire logs using sed


We have log files containing output from the Apache HTTP client. The output shows traffic as it goes "over the wire" and includes lines like:

<< HTTP/1.1 200 The request has succeeded

The chevrons '<<' indicate incoming, in contrast to '>>' for outgoing content. Using 'tail -F' to follow these logs is entertaining enough but I thought it would be a useful exercise to use sed to colorize the output according to whether it is input or output.

A simple test will show you what I mean:

echo '<< HTTP/1.1 200 The request has succeeded' | sed -r -e 's_<<_\x1b[31;1m&\x1b[0m_i' -e 's_>>_\x1b[32;1m&\x1b[0m_i'

for input, and

echo '>> HTTP/1.1 200 The request has succeeded' | sed -r -e 's_<<_\x1b[31;1m&\x1b[0m_i' -e 's_>>_\x1b[32;1m&\x1b[0m_i'

for output.

So far, so good. The descent into regex madness began when it occurred to me that it would be even more useful to highlight the HTTP response codes and colorize them according to the class: green for 2xx and red for 5xx, for example.

So far I can match up to the first digit in the response code with:

echo '<< HTTP/1.1 200 The request has succeeded' | sed -r -e 's_<<_\x1b[31;1m&\x1b[0m_i' -e 's_>>_\x1b[32;1m&\x1b[0m_i' -e 's_HTTP[^[:alpha:]]*2\d*_\x1b[32;1m&\x1b[0m_g'

It only colorizes up to '<< HTTP/1.1 2'. My expectation was that HTTP[^[:alpha:]]*2\d* would match 'HTTP', followed by everything that is not alphabetic up to '2', followed by any number of digits. Ideally I would use '{2}' rather than '*', but that has the same effect.

Can any regex guru point out my mistake?


Solution

  • echo '<< HTTP/1.1 200 The request has succeeded' | \
    sed -r -e 's_<<_\x1b[31;1m&\x1b[0m_;t http
               s_>>_\x1b[32;1m&\x1b[0m_
    :http
               s_HTTP[^[:alpha:]]+2[0-9]+_\x1b[32;1m&\x1b[0m_g'
    

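    Try this. sed does not support the Perl-style \d shorthand for digits, so [0-9] is used instead; that is why your \d* matched nothing after the '2'.

    Also try -u (unbuffered), which works better on a live stream such as the output of tail -F.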
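The question also asks about coloring each response class differently. The following is a minimal sketch of that idea, not a tested production filter. It assumes GNU sed (for -u), and the function name colorize_wire, the yellow for 3xx, and the red for 4xx are illustrative choices that do not come from the original post. The ESC character is produced with printf so that the script does not rely on sed understanding \x1b.

```shell
#!/bin/sh
# Sketch: colorize Apache HttpClient wire-log lines read from stdin.
# Assumption: GNU sed, where -u means unbuffered and -E enables extended regex.

colorize_wire() {
    esc=$(printf '\033')        # literal ESC character
    red="${esc}[31;1m"
    green="${esc}[32;1m"
    yellow="${esc}[33;1m"       # hypothetical choice for 3xx
    reset="${esc}[0m"

    sed -u -E \
        -e "s_<<_${red}&${reset}_" \
        -e "s_>>_${green}&${reset}_" \
        -e "s_HTTP/[0-9.]+ 2[0-9]{2}_${green}&${reset}_" \
        -e "s_HTTP/[0-9.]+ 3[0-9]{2}_${yellow}&${reset}_" \
        -e "s_HTTP/[0-9.]+ [45][0-9]{2}_${red}&${reset}_"
}

# Example use on a live log (wire.log is a placeholder file name):
#   tail -F wire.log | colorize_wire
```

Matching the version as HTTP/[0-9.]+ followed by a space pins the class digit to the first digit of the status code, so a code such as 422 is not mistaken for a 2xx response the way it could be with [^[:alpha:]]+2[0-9]+.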