perl parsing logging mta exim4

Perl log parser for the Exim4 MTA


I'm going to write a log parser for the Exim4 MTA, and I have a couple of questions. (I know that there is an exilog program.)

Question 1: What is the best way to parse a line? (There are about 5 GB of such lines :D) I have this $line:

2011-12-24 12:32:12 MeSSag3-Id-Ye <hostname> (from@some.email) <to@some.email> => H=[321.123.321.123] T="Hello this is a test"

and I want to get all of these fields into variables. Right now I'm using something like ($var, [var2]) = ($line =~ /somecoolregexp/); Is that fast/good, or should I use something else?


Solution

  • Well, it depends on what you want to do with the data.

    Assuming you have a big while (<>) { ... } around this, you can get the simplest parsing by just using split:

    my @fields = split;
    

    The next level would be to add a bit of meaning:

    my ($date, $time, $id, $host, $from, $to, undef, $dest) = split;
    

    (Note, you can assign to undef if you want to ignore a result)
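
    As a quick check, here is a minimal sketch of that split applied to the sample line from the question. The line is passed explicitly instead of relying on $_, which is just a choice for this illustration. The fields keep their surrounding brackets, and the words of the T="..." subject are dropped because they fall past the eighth field.

    ```perl
    use strict;
    use warnings;

    # Sample line copied from the question.
    my $line = '2011-12-24 12:32:12 MeSSag3-Id-Ye <hostname> (from@some.email) <to@some.email> => H=[321.123.321.123] T="Hello this is a test"';

    # split ' ' splits on runs of whitespace. The 7th field ("=>") is thrown
    # away by assigning it to undef; fields after the 8th are silently dropped.
    my ($date, $time, $id, $host, $from, $to, undef, $dest) = split ' ', $line;

    print join(' | ', $date, $time, $id, $host, $from, $to, $dest), "\n";
    # prints: 2011-12-24 | 12:32:12 | MeSSag3-Id-Ye | <hostname> | (from@some.email) | <to@some.email> | H=[321.123.321.123]
    ```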

    Finally, you can clean up a lot of the cruft by using a regular expression. You can also combine the split above with smaller regexps to clean each field individually.

    my ($datetime, $id, $host, $from, $to, $dest) =
        /([\d-]+ \s+ [\d:]+) \s+ # date and time together; whitespace must be written as \s
         (\S+)           \s+     # message id, just a block of non-whitespace
         <(.*?)>         \s+     # hostname in angle brackets, .*? is non-greedy slurp
        \((.*?)\)        \s+     # from email in parens
         <(.*?)>         \s+     # to email in angle brackets
          \S+            \s+     # separator between to-email and dest
          (\S+)                  # last bit, could be improved to (\w)=\[(.*?)\]
         /x;                     # /x lets us break all of this up, so it's a bit readable
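
    With around 5 GB of input, it may help to compile the pattern once with qr// and reuse it for every line. The sketch below assumes the line layout shown in the question; parse_line and $LINE_RE are hypothetical names made up for this example, and the leading ^ anchor is an addition so that a non-matching line fails quickly.

    ```perl
    use strict;
    use warnings;

    # Compiled once, outside any read loop. Under the x flag literal
    # whitespace in the pattern is ignored, so spaces are matched with \s+.
    my $LINE_RE = qr/
        ^ ([\d-]+ \s+ [\d:]+) \s+   # date and time together
        (\S+)                 \s+   # message id
        <(.*?)>               \s+   # hostname in angle brackets
        \((.*?)\)             \s+   # from address in parens
        <(.*?)>               \s+   # to address in angle brackets
        \S+                   \s+   # the "=>" separator
        (\S+)                       # destination, e.g. H=[...]
    /x;

    # Hypothetical helper: returns a hash ref of fields, or undef if the
    # line does not match the pattern.
    sub parse_line {
        my ($line) = @_;
        my @f = $line =~ $LINE_RE or return;
        my %rec;
        @rec{qw(datetime id host from to dest)} = @f;
        return \%rec;
    }

    # In a real run this would be called inside while (my $line = <>) { ... }.
    my $sample = '2011-12-24 12:32:12 MeSSag3-Id-Ye <hostname> (from@some.email) <to@some.email> => H=[321.123.321.123] T="Hello this is a test"';
    my $rec = parse_line($sample);
    print "$rec->{datetime}: $rec->{from} to $rec->{to}\n" if $rec;
    ```

    Returning undef for lines that do not match lets the caller count or log them instead of silently working with empty variables.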
    

    Of course, you can keep on taking this to all sorts of silliness, but if you're going to start doing more specific parsing of these fields, I'd go with the initial split followed by broken-out field parsing. For example:

     my ($date, $time, ...) = split;
    
     my ($year, $month, $day)    = split(/-/, $date);
     my ($hour, $min,   $sec)    = split(/:/, $time);
     my ($from_user, $from_host) = ( $from =~ /\( ([^\@]+) \@ (.*) \)/x );  # from is wrapped in parens in the sample line
     ...etc...
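
    Put together, the split-then-refine approach might look like the sketch below. It assumes the sample line from the question; since the from address is wrapped in parentheses there and the to address in angle brackets, the two patterns differ accordingly. The $to_user, $to_host and $dest_ip variables are additions for illustration only.

    ```perl
    use strict;
    use warnings;

    my $line = '2011-12-24 12:32:12 MeSSag3-Id-Ye <hostname> (from@some.email) <to@some.email> => H=[321.123.321.123] T="Hello this is a test"';

    # First pass: cheap whitespace split into coarse fields.
    my ($date, $time, $id, $host, $from, $to, undef, $dest) = split ' ', $line;

    # Second pass: break individual fields down further.
    my ($year, $month, $day)    = split /-/, $date;
    my ($hour, $min,   $sec)    = split /:/, $time;
    my ($from_user, $from_host) = $from =~ /\( ([^\@]+) \@ (.*) \)/x;
    my ($to_user,   $to_host)   = $to   =~ /<  ([^\@]+) \@ (.*) >/x;
    my ($dest_ip)               = $dest =~ /\[ (.*?) \]/x;

    printf "%s-%s-%s %s:%s:%s %s at %s, delivered via %s\n",
        $year, $month, $day, $hour, $min, $sec,
        $from_user, $from_host, $dest_ip;
    ```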