regex sed awk

Using a regular expression to extract a substring


I want to extract the text from < up to the next space in my log files.

$>cat messages.log
2013-03-24 19:32:37.231 <F280 [192.168.178.22]:5000 -- Unknown>, Msg:[Test1]
2013-03-24 19:32:37.547 <F281 [192.168.178.22]:5000 -- Unknown>, Msg:[Test2
Test3
Test4]
2013-03-24 19:32:38.833 <F280 [192.168.178.22]:5000 -- Unknown>, Msg:[Test5]
2013-03-24 19:32:42.222 <F281 [192.168.178.22]:5000 -- Unknown>, Msg:[Test6]
$>sed 's/.*\<\(.*\) \[.*/\1|/g' messages.log
F280|
F281|
Test3
Test4]
F280|
F281|

I almost got what I wanted except for the output with the newlines. So I'd like to have the following result:

F280|F281|F280|F281

What should the regular expression look like?


Solution

  • I wouldn't create an unreadable regexp to do this; I'd use awk here:

    $ awk -F'[< ]' '/^[0-9]+/{s?s=s"|"$4:s=s$4}END{print s}' file
    F280|F281|F280|F281
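
To see why the ID ends up in $4, it can help to print the fields that the separator [< ] produces for one sample line. This is only a small sketch; the function name show_fields is made up for illustration and is not part of the answer above.

```shell
# Hypothetical helper: print the first four fields of one sample log line,
# split with the same separator regex as the one-liner above.
show_fields() {
    echo '2013-03-24 19:32:37.231 <F280 [192.168.178.22]:5000 -- Unknown>, Msg:[Test1]' |
        awk -F'[< ]' '{ for (i = 1; i <= 4; i++) printf "$%d=[%s]\n", i, $i }'
}

show_fields
# $1=[2013-03-24]
# $2=[19:32:37.231]
# $3=[]
# $4=[F280]
```

Because every single space or < counts as a separator, the space directly followed by < yields an empty $3, which pushes the ID into $4.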
    

    To make the script more readable:

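    BEGIN { FS = "[< ]" }            # same separator as -F'[< ]'
    /^[0-9]+/ {                      # only lines that start with a timestamp
        s = (s ? s "|" : "") $4      # prepend "|" except before the first ID
    }
    END { print s }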
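
If you would rather stay with sed, one possible variant is sketched below. It assumes that dropping the continuation lines (Test3, Test4]) is what is wanted, and the function name extract_ids is hypothetical. Two details matter: in GNU sed, \< is a start-of-word anchor rather than a literal <, so the sketch uses a plain <; and sed prints lines that do not match unchanged unless -n is combined with the p flag.

```shell
# Hypothetical helper: read log lines on stdin, print the IDs joined by "|".
extract_ids() {
    # -n suppresses the default output; the trailing p prints only lines where
    # the substitution matched, so lines without a < are dropped.
    # [^ ]* captures everything after the literal < up to the next space.
    # paste -s joins the remaining lines into one, using | as the delimiter.
    sed -n 's/.*<\([^ ]*\) .*/\1/p' | paste -s -d '|' -
}

# Sample input copied from the question.
extract_ids <<'EOF'
2013-03-24 19:32:37.231 <F280 [192.168.178.22]:5000 -- Unknown>, Msg:[Test1]
2013-03-24 19:32:37.547 <F281 [192.168.178.22]:5000 -- Unknown>, Msg:[Test2
Test3
Test4]
2013-03-24 19:32:38.833 <F280 [192.168.178.22]:5000 -- Unknown>, Msg:[Test5]
2013-03-24 19:32:42.222 <F281 [192.168.178.22]:5000 -- Unknown>, Msg:[Test6]
EOF
# F280|F281|F280|F281
```

With a real file this would be called as extract_ids < messages.log.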