regex perl pcre

How can I use Perl to delete files matching a regex?


Due to a Makefile mistake, I have some fake files in my git repo...

$ ls
=0.1.1                  =4.8.0                  LICENSE
=0.5.3                  =5.2.0                  Makefile
=0.6.1                  =7.1.0                  pyproject.toml
=0.6.1,                 all_commands.txt        README_git_workflow.md
=0.8.1                  CHANGES.md              README.md
=1.2.0                  ciscoconfparse/         requirements.txt
=1.7.0                  configs/                sphinx-doc/
=2.0                    CONTRIBUTING.md         tests/
=2.2.0                  deploy_docs.py          tutorial/
=22.2.0                 dev_tools/              utils/
=22.8.0                 do.py
=2.7.0                  examples/
$

I tried this, but there may be a more efficient way to accomplish the task...

# glob "*" will list all files globbed against "*"
foreach my $filename (grep { /\W\d+\.\d+/ } glob "*") {
    my $cmd1 = "rm $filename";
    `$cmd1`;
}

Solution

  • Fetch a wider set of files and then filter through whatever you want

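    my @files_to_del = grep { basename($_) =~ /^\W[0-9]+\.[0-9]+/ and not -d } glob "$dir/*"; 
    

    I added an anchor (^) so that the regex can only match a name that begins with that pattern; otherwise this can blow away files other than those intended. Since glob "$dir/*" returns every name with the $dir/ prefix attached, the anchored regex is applied to the last path component, via basename from File::Basename. Reconsider what exactly you need.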
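
    To see what the anchor changes, here is a minimal sketch that runs both patterns over a few names. The first three names come from the listing in the question; notes-1.2.txt is a hypothetical name added only for illustration.

    ```perl
    use strict;
    use warnings;

    # The first three names are from the listing in the question;
    # 'notes-1.2.txt' is a hypothetical file, for illustration only.
    my @names = ('=0.1.1', '=22.8.0', 'README.md', 'notes-1.2.txt');

    my $unanchored = qr/\W\d+\.\d+/;          # pattern from the question
    my $anchored   = qr/^\W[0-9]+\.[0-9]+/;   # must match at the start of the name

    my @loose  = grep { $_ =~ $unanchored } @names;
    my @strict = grep { $_ =~ $anchored }   @names;

    print "unanchored: @loose\n";
    print "anchored:   @strict\n";
    ```

    The unanchored pattern also picks up notes-1.2.txt (the - is a non-word character followed by 1.2), while the anchored one keeps only the two names that start with =.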

    Altogether, perhaps (or see the one-liner below)

    use warnings;
    use strict;
    use feature 'say';
    
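    use File::Glob ':bsd_glob';      # for better glob()
    use File::Basename qw(basename); # to match on the name, not the whole path
    use Cwd qw(cwd);                 # current-working-directory
    
    my $dir = shift // cwd;      # cwd by default, or from input 
    
    my $re = qr/^\W[0-9]+\.[0-9]+/;  
    
    # glob returns "$dir/name", so apply the anchored regex to the basename
    my @files_to_del = grep { basename($_) =~ $re and not -d } glob "$dir/*"; 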
    
    say for @files_to_del;  # please inspect first
    
    #unlink or warn "Can't unlink $_: $!" for @files_to_del;
    

    where the * in glob could be narrowed to pre-select files, if suitable. In particular, if the = is a literal character in the names (and not an indicator printed by ls, see the note at the end) then glob "$dir/=*" will fetch only the files starting with it, and you can then pass those through the grep filter.

    I exclude directories, identified by the -d filetest, since we are looking for files (and to steer clear of the scary language about directories in the unlink documentation; thanks to brian d foy for the comment).

    If you need to scan subdirectories and do the same in them, perhaps recursively (which doesn't seem to be the case here), then we could employ this logic in File::Find::find (or File::Find::Rule, or yet others).
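
    A minimal sketch of that recursive variant, assuming the core File::Find module. The sub name find_matching is made up for this example, and the script only lists candidates; nothing is deleted.

    ```perl
    use strict;
    use warnings;
    use File::Find qw(find);

    my $re = qr/^\W[0-9]+\.[0-9]+/;

    # Hypothetical helper: collect plain files anywhere under $top
    # whose own name (not the whole path) matches $re.
    sub find_matching {
        my ($top) = @_;
        my @found;
        find(
            sub {
                # Inside the callback $_ is the bare file name, and
                # $File::Find::name is the path including $top.
                push @found, $File::Find::name if -f $_ and $_ =~ $re;
            },
            $top,
        );
        return sort @found;
    }

    # Dry run over the directories named on the command line.
    print "$_\n" for map { find_matching($_) } @ARGV;
    ```

    Once the printed list looks right, the same list can be handed to unlink, as in the script above.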

    Or read the directory any other way (opendir+readdir, libraries like Path::Tiny), and filter.
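
    For instance, the opendir+readdir route could look like the following sketch. The sub name matching_in_dir is made up for this example, and again it only prints what it finds.

    ```perl
    use strict;
    use warnings;

    my $re = qr/^\W[0-9]+\.[0-9]+/;

    # Hypothetical helper: paths of plain files directly inside $dir
    # whose names match $re.
    sub matching_in_dir {
        my ($dir) = @_;
        opendir(my $dh, $dir) or die "Can't open directory $dir: $!";
        # readdir returns bare names (including . and ..), so the anchored
        # regex sees just the name; the path is rebuilt for the -f test.
        my @hits = grep { -f $_ } map { "$dir/$_" } grep { $_ =~ $re } readdir $dh;
        closedir $dh;
        return sort @hits;
    }

    print "$_\n" for matching_in_dir('.');   # inspect before deleting anything
    ```

    Because readdir hands back names without the directory part, the anchored regex needs no basename step here.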


    Or, a quick one-liner: first print (to inspect) what's about to get blown away

    perl -wE'say for grep { /^\W[0-9]+\.[0-9]+/ and not -d } glob "*"'
    

    and then delete 'em

    perl -wE'unlink or warn "$_: $!" for grep /^\W[0-9]+\.[0-9]+/ && !-d, glob "*"'
    

    (I switched to a more compact syntax just because; it isn't necessary.)

    If you'd like to be able to pass a directory to it (optionally, or work in the current one) then do

    perl -wE'$d = shift//q(.); ...'  dirpath    # dirpath is optional; a relative path is fine
    

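    and then use glob "$d/*" in the code. The names that come back then carry the $d/ prefix, so the anchored regex has to be applied to the last path component (for instance via basename from File::Basename) rather than to the whole string. The shift works the same way as in the script above: it pulls the first element from @ARGV if anything was passed on the command line; if @ARGV is empty it returns undef, and then the // (defined-or) operator picks up the string q(.).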
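
    A small sketch of how that defaulting behaves. The sub pick_dir is a hypothetical stand-in for the shift on @ARGV, and configs is just a directory name borrowed from the listing in the question.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper mirroring `shift // q(.)`: take the first argument
    # if one was given, otherwise fall back to the current directory.
    sub pick_dir {
        my @args = @_;
        return shift(@args) // '.';
    }

    print pick_dir(), "\n";            # no argument: falls back to '.'
    print pick_dir('configs'), "\n";   # argument given: 'configs'
    print pick_dir('0'), "\n";         # '0' is defined, so // keeps it
    ```

    The last call is the reason to prefer // over || here: a directory literally named 0 is false but defined, so || would wrongly replace it with the default.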


    A note on that leading =: it may be an "indicator" of a file type if ls has been aliased to ls -F, which can be checked by running ls with aliases suppressed, one way being \ls (or check alias ls).

    If that is so, the = marks the file as a socket, which in Perl can be tested for with the -S filetest.

    Then the \W in the proposed regex may need to be changed to \W? to allow for no non-word character preceding the digits, along with a test for a socket. Like

    use File::Basename qw(basename);  # glob returns "$dir/name"; match on the name only

    my $re = qr/^\W? [0-9]+ \. [0-9]+/x;
    
    my @files_to_del = grep { basename($_) =~ $re and -S } glob "$dir/*";