bash awk sed freebsd netscaler

bash script to extract data from large log file


I am using FreeBSD (on a Citrix NetScaler) and need to extract the Mbps values from a log file that has hundreds of thousands of lines.

The log looks something like this, where the Mbps number, including its decimal part, can range from 0.0 to 9999.99 or more, i.e.:

#>alphatext_anylength... (more_alphatext_in brackets)... Mbps (1.0)… alphatext_anylength... (more_alphatext_in brackets)... 
#>alphatext_anylength... (more_alphatext_in brackets)... Mbps (500.15)… alphatext_anylength... (more_alphatext_in brackets)... 
#>alphatext_anylength... (more_alphatext_in brackets)... Mbps (1500.01)… alphatext_anylength... (more_alphatext_in brackets)... 

The challenge is that I want to extract only the bracketed Mbps numbers that are A) greater than 500 Mbps, B) together with their line numbers. For the sample above, I want to see only the following:

#>[line number 20] 500.15
#>[line number 55] 1500.01

I have tried:

cat output.log | sed -n -e 's/^.*Mbps//p' |cut -c 3-10

This gives me the characters in columns 3 to 10 of whatever follows Mbps, but it is not smart enough to show only the bracketed decimal numbers that are greater than 500 Mbps.

I appreciate this might be a bit of a challenge, but I would be grateful to any bash scripting wizards out there who can create some magic!

Thanks in advance!


Solution

  • You can use awk to match the lines containing Mbps ( followed by any characters other than ) and then a closing ). On those lines, remove everything from the start of the line up to and including Mbps (, and then remove everything from the first remaining ) to the end of the line.

    If what is left of the line, converted to a number (+0), is greater than 500, print the line number and the remaining value.

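    awk '
      # Only lines that contain "Mbps (...)" are considered
      /Mbps \([^)]*\)/ {
        sub(/.*Mbps \(/, "")   # drop everything up to and including "Mbps ("
        sub(/\).*/, "")        # drop the closing ")" and the rest of the line
        if ($0 + 0 > 500) print FNR, $0
      }
    ' file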
    

    Edit: To also match lines where the space after Mbps is optional, and to report values > 50, use

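    awk '
      # " ?" makes the space between "Mbps" and "(" optional
      /Mbps ?\([^)]*\)/ {
        sub(/.*Mbps ?\(/, "")  # drop everything up to and including "Mbps(" or "Mbps ("
        sub(/\).*/, "")        # drop the closing ")" and the rest of the line
        if ($0 + 0 > 50) print FNR, $0
      }
    ' file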
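
The same approach can be wrapped in a small shell function so that the threshold is a parameter and the output follows the `[line number N] value` layout asked for in the question. This is only a sketch: the function name `mbps_over` and its argument order are made up for illustration, and it assumes an awk that provides the POSIX `match()` function with `RSTART`/`RLENGTH`. Unlike the commands above, it reports the first `Mbps (...)` occurrence on a line rather than the last.

```shell
# Hypothetical helper (the name is an assumption, not part of the original answer).
# Usage: mbps_over LIMIT FILE
mbps_over() {
  awk -v limit="$1" '
    # match() locates the first "Mbps (...)" or "Mbps(...)" on the line
    match($0, /Mbps ?\([^)]*\)/) {
      val = substr($0, RSTART, RLENGTH)   # e.g. "Mbps (500.15)"
      sub(/^Mbps ?\(/, "", val)           # strip the leading "Mbps (" or "Mbps("
      sub(/\)$/, "", val)                 # strip the trailing ")"
      # compare numerically; lines without an Mbps value never reach this point
      if (val + 0 > limit + 0)
        printf "[line number %d] %s\n", FNR, val
    }
  ' "$2"
}
```

With the log file from the question this would be called as `mbps_over 500 output.log`. Because the original line is left untouched and only a copy of the match is trimmed, further fields could be printed from the same line if needed.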