
How to open a filehandle with an existing variable in Perl?


In my Perl script, I would like to process lines from either STDIN or a given file, if specified, as is common with Linux/UNIX command line utilities.

To this end, I have the following section in my script (simplified for the post):

use strict;
use warnings;

my $in = \*STDIN;
open $in, '<', $ARGV[0] or die if (defined $ARGV[0]);
print while (<$in>);

Essentially, I define $in to be a reference to the STDIN typeglob, so normally, if no argument is specified, the script does print for each line of <STDIN>. So far, so good.

If $ARGV[0] is defined, however, I would like to read lines from that file. That is what the second meaningful line is meant to do. However, it seems that no lines are processed when the script is run with an argument.


I noticed that after my conditional call to open, $in does not change, even when I expect it to:

my $in = \*STDIN;
print $in, "\n";

open $in, '<', $ARGV[0] or die if (defined $ARGV[0]);
print $in, "\n";

yields

GLOB(0xaa08b2f4f28)
GLOB(0xaa08b2f4f28)

even when $ARGV[0] is defined. Does open not work when the first variable passed already refers to a filehandle?

The relevant documentation for open includes the following:

About filehandles

The first argument to open, labeled FILEHANDLE in this reference, is usually a scalar variable. (Exceptions exist, described in "Other considerations", below.) If the call to open succeeds, then the expression provided as FILEHANDLE will get assigned an open filehandle. That filehandle provides an internal reference to the specified external file, conveniently stored in a Perl variable, and ready for I/O operations such as reading and writing.

Based on this alone, I do not see why my code would not work.


Solution

  • That's precisely what the null filehandle <> does

    Input from <> comes either from standard input, or from each file listed on the command line.

    So all you need is

    while (<>) { 
        ...
    }
    

    (see the rest of what the docs say about it)
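    A self-contained sketch of that behavior follows. It assumes two throwaway files created with File::Temp in place of real arguments; the names one.txt and two.txt are made up. The script sets @ARGV itself, then shows <> reading the listed files in order while $ARGV names the file currently being read:

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Assumption: two throwaway files stand in for real command-line arguments.
my $dir = tempdir(CLEANUP => 1);
my @files;
for my $spec (['one.txt', "a\nb\n"], ['two.txt', "c\n"]) {
    my ($name, $content) = @$spec;
    my $path = File::Spec->catfile($dir, $name);
    open my $out, '>', $path or die "Can't write '$path': $!";
    print {$out} $content;
    close $out or die "Can't close '$path': $!";
    push @files, $path;
}

# Simulate running the script as: script.pl one.txt two.txt
@ARGV = @files;

my @seen;
while (my $line = <>) {
    chomp $line;
    # $ARGV holds the name of the file <> is currently reading.
    my (undef, undef, $base) = File::Spec->splitpath($ARGV);
    push @seen, "$base: $line";
}
print "$_\n" for @seen;
```

    If @ARGV is left empty, the same loop reads standard input instead.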

    Another option, safer in some cases, is the double diamond operator <<>> (available since Perl 5.22):

    while (<<>>) { } 
    

    Using double angle brackets inside of a while causes the open to use the three argument form (with the second argument being <), so all arguments in ARGV are treated as literal filenames (including "-"). (Note that for convenience, if you use <<>> and if @ARGV is empty, it will still read from the standard input.)

    (again, see the rest of what the docs say)
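    Here is a sketch that shows the difference. It assumes Perl 5.22 or later and uses a scratch directory from File::Temp. The script creates a file literally named - and passes it as the only argument. With <>, that argument would mean standard input. With <<>>, the file itself is read:

```perl
use 5.022;    # <<>> was added in Perl 5.22
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Assumption: a scratch directory holding a file literally named "-".
my $orig = getcwd();
my $dir  = tempdir(CLEANUP => 1);
chdir $dir or die "Can't chdir to '$dir': $!";

# Three-argument open: "-" is just an ordinary filename here.
open my $out, '>', '-' or die "Can't write './-': $!";
print {$out} "from the file named -\n";
close $out or die "Can't close './-': $!";

# With <> this argument would mean STDIN; <<>> opens it as a literal filename.
@ARGV = ('-');
my @lines;
while (my $line = <<>>) {
    push @lines, $line;
}
print @lines;

chdir $orig or die "Can't chdir back to '$orig': $!";
```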


    For the second part of the question, and following a discussion in the comments, it is worth noting that my $in = \*STDIN makes $in an alias to STDIN (not a copy); see this post. Opening a file with such a scalar (one holding a reference to a typeglob) as the filehandle then merely redirects the original typeglob. So here, once we open the $in filehandle, STDIN itself winds up connected to that file. This is also why printing $in shows the same GLOB(0x...) before and after the open: the reference never changes, only what the STDIN glob is connected to.

    This is easily checked:

    perl -wE'
        $in = \*STDIN; 
        say "\$in: $$in";                   #--> *main::STDIN
        print while <$in>;                  # type input, then Ctrl-D
        open $in, "<", $ARGV[0] or die $!; 
        say "\$in is: $$in";                #--> *main::STDIN
        print while <$in>;                  # but prints the file
        seek $in, 0, 0; 
        print while <STDIN>;                # prints the file
    ' file
    

    First we type some input, which is echoed back, and press Ctrl-D. Then we open the file. $in is still shown to be STDIN, yet reading from it prints the file. After rewinding with seek, reading from STDIN directly prints the file as well.
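    Here is a non-interactive variant of the same check, as a sketch in which a temp file stands in for $ARGV[0]. It compares the stringified reference before and after the open, then reads from STDIN directly:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumption: a temp file stands in for $ARGV[0].
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/input.txt";
open my $out, '>', $file or die "Can't write '$file': $!";
print {$out} "from the file\n";
close $out or die "Can't close '$file': $!";

my $in     = \*STDIN;
my $before = "$in";    # something like GLOB(0x...)

# Opening through the glob reference reuses the STDIN glob itself.
open $in, '<', $file or die "Can't open '$file': $!";
my $after = "$in";

print "same reference: ", ($before eq $after ? 'yes' : 'no'), "\n";
print "glob: $$in\n";

# STDIN itself now reads from the file.
my $line = <STDIN>;
print "read via STDIN: $line";
```

    The reference stays the same and still names *main::STDIN, yet reading STDIN yields the file's line.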

    STDIN has been reconnected to the file by open, and getting it back isn't simple. So if one actually wants a lexical filehandle for STDIN, it is better to dup it. See the docs and the linked post.
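    A minimal sketch of the dup approach follows, assuming a temp file in place of $ARGV[0]. Opening with the '<&' mode gives $in its own file descriptor, so reopening $in afterwards leaves STDIN alone:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumption: a temp file stands in for $ARGV[0].
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/input.txt";
open my $out, '>', $file or die "Can't write '$file': $!";
print {$out} "from the file\n";
close $out or die "Can't close '$file': $!";

# Dup STDIN: $in is a new handle on a new file descriptor, not an alias.
open my $in, '<&', \*STDIN or die "Can't dup STDIN: $!";

# Reopening the lexical handle now leaves STDIN untouched.
open $in, '<', $file or die "Can't open '$file': $!";
my $line = <$in>;

print "read via \$in: $line";
print "STDIN is still fd ", fileno(STDIN), "\n";
print "\$in is ", ($in == \*STDIN ? '' : 'not '), "STDIN\n";
```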


    As for the direct question -- yes, one can reassign a filehandle by open-ing it.

    But the ... or die if ... form is not the problem. It is hard to read, yet it is legal and well defined. A statement modifier such as if may follow any simple statement, and it applies to that whole statement.

    Indeed, I cannot reproduce the shown behavior, as your code actually works for me (on 5.16 and 5.30 on Linux).

    Consider

    E1 or E2 if E3;
    

    where Es stand for Expressions. (This is for open(...) or die($!) if COND;)

    What does if E3 apply to: the lone E2, or the whole E1 or E2? E2 if E3 is a statement rather than an expression, so it cannot be an operand of or. Reading it all as E1 or (E2 if E3); is therefore simply not valid syntax. The statement is always interpreted as

    (E1 or E2) if E3;
    

    which is fine and works as intended. So the way this line is parsed does not explain the failure on the OP's system; the cause must lie elsewhere.
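    The grouping can also be checked directly. In this sketch the hypothetical subs e1, e2 and e3 stand in for the open, the die and the condition. Each one records that it ran:

```perl
use strict;
use warnings;

# Hypothetical stand-ins: e1 for open(...), e2 for die, e3 for the condition.
# Each records that it ran; e1 and e3 return the value they were given.
my @log;
sub e1 { push @log, 'E1'; return $_[0] }
sub e2 { push @log, 'E2'; return 1 }
sub e3 { push @log, 'E3'; return $_[0] }

my %ran;
for my $case ([1, 1], [1, 0], [0, 1], [0, 0]) {
    my ($cond, $ok) = @$case;    # result of E3, result of E1
    @log = ();
    e1($ok) or e2() if e3($cond);
    $ran{"$cond$ok"} = "@log";
    print "E3=$cond E1=$ok ran: @log\n";
}
```

    E1 never runs when E3 is false. That is only possible if the modifier covers the whole E1 or E2.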

    Thus if you do need your own filehandle, at a minimum add parentheses so that the grouping is explicit:

    (open $in, '<', $ARGV[0] or die $!) if defined $ARGV[0];
    

    But I'd recommend writing a nice, readable if block instead of cramming it all into one statement (and dup-ing STDIN to start with).
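    Here is one way such a readable version might look, as a sketch. The helper name open_input is hypothetical, and a temp file stands in for $ARGV[0]. STDIN is handed back as-is rather than reopened, so no dup is needed on that path:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: a handle on $path, or STDIN itself when no path is given.
# STDIN is never reopened, so it cannot be redirected by accident.
sub open_input {
    my ($path) = @_;
    return \*STDIN unless defined $path;
    open my $fh, '<', $path or die "Can't open '$path': $!";
    return $fh;
}

# Assumption: a temp file stands in for $ARGV[0].
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/data.txt";
open my $out, '>', $file or die "Can't write '$file': $!";
print {$out} "line 1\nline 2\n";
close $out or die "Can't close '$file': $!";

my $in    = open_input($file);    # in the real script: open_input($ARGV[0])
my @lines = <$in>;
print for @lines;
```

    In the script itself, the loop then stays print while <$in>;.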