Tags: perl, parsing, buffer, stdout, autoflush

Avoid buffering when parsing stdout with Perl


I want to parse the output of an external program (some shell command) line by line using Perl. The command runs continuously, so I put it into a thread and use shared variables to communicate with my main routine.

Up to now, my code looks similar to this:

#!/usr/bin/perl

use warnings;
use strict;
use threads;
use threads::shared;

my $var :shared; $var="";

threads->create(
    sub {
        # command writes to stdout each ~100ms
        my $cmd = "<long running command> |";
        open(README, $cmd) or die "Can't run program: $!\n";
        while(<README>) {
            my $line = $_;
            # extract some information from line
            $var = <some value>;
            print "Debug\n";
        }
        close(README);
    }
);

while(1) {
    # evaluate variable each ~second
    print "$var\n";
    sleep 1;
}

For some commands this works perfectly fine and the lines are processed just as they come in. Output would be similar to:

...
Debug
Debug
...
<value 1>
...
Debug
Debug
...
<value 2>
...

However, for other commands this behaves strangely and the lines are processed in blocks: $var doesn't get updated and Debug isn't printed for some time. Then, suddenly, the output is (similar to):

...
<value 1>
<value 1>
<value 1>
...
Debug
Debug
Debug
...
<value 20>

and $var is set to the last/current value. Then this repeats: the parsing is always delayed and done in blocks, and $var is not updated in between.

First of all: is there any better/proper way to parse the output of an external program (line by line!) besides using the pipe?

If not, how can I avoid this behaviour?

I've read that using autoflush(1); or $| = 1; might be a solution, but only for the "currently selected output channel". How would I use that in my context?
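
From what I understand, autoflush can be enabled for a specific handle instead of the currently selected one, roughly like in the sketch below, but I suspect this only affects my own Perl output (e.g. the Debug prints) and not the buffering that happens inside the external command itself:

use warnings;
use strict;
use IO::Handle;

# unbuffer a specific handle, e.g. STDOUT (where the Debug lines go)
STDOUT->autoflush(1);

# the equivalent using only $| and select()
my $old = select(STDOUT);
$| = 1;
select($old);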

Thank you in advance.


Solution

  • Thanks to ikegami and Calle Dybedahl, I found the following solution to my problem:

    #!/usr/bin/perl
    
    use warnings;
    use strict;
    use threads;
    use threads::shared;
    use sigtrap qw(handler exit_safely normal-signals stack-trace error-signals);
    use IPC::Run qw(finish pump start);
    
    # define shared variable
    my $var :shared; $var="";
    
    # define long running command
    my @cmd = ('<long running command>','with','arguments');
    my $in = '';
    my $out = '';
    # start harness
    my $h = start \@cmd, '<pty<', \$in, '>pty>', \$out;
    
    # create thread
    my $thr = threads->create(
        sub {
            while (1) {
                # pump harness
                $h->pump;
                # extract some information from $out
                $var = <some value>;
                # empty output
                $out = '';
            }
        }
    );
    
    while(1) {
        # evaluate variable each ~second
        print "$var\n";
        sleep 1;
    }
    
    sub exit_safely {
        my ($sig) = @_;
        print "Caught SIG $sig\n";
        # harness has to be killed, otherwise
        # it will continue to run in background
        $h->kill_kill;
        $thr->join();
        exit(0);
    }
    
    exit(0);
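
  • Why this helps, as far as I understand it: the delay does not come from Perl but from the external command. Programs that use the C stdio library line-buffer stdout when it is connected to a terminal, but switch to block buffering (typically a few kilobytes) when stdout is a pipe. The '<pty<' / '>pty>' options make IPC::Run attach the command to a pseudo-terminal, so the command keeps line-buffering and its output arrives as it is produced.

  • For the record, a lighter workaround that keeps the original pipe approach may also work. This is only a sketch and assumes that GNU coreutils' stdbuf is available and that the command really buffers through stdio:

    #!/usr/bin/perl

    use warnings;
    use strict;

    # stdbuf -oL asks the child process to line-buffer its stdout
    open(my $fh, '-|', 'stdbuf -oL <long running command>')
        or die "Can't run program: $!\n";
    while (my $line = <$fh>) {
        # process $line as soon as it arrives
    }
    close($fh);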