Tags: shell, ibm-midrange, ibm-ifs

find files in huge directory - very slow


I have a directory with files. The archive is very big: it contains 1.5 million PDF files.
The directory is stored on an IBM i server running OS V7R1, and the machine is new and very fast.
The files are named like this:

invoice_[custno]_[year]_[invoice_number].pdf
invoice_081500_2013_7534435564.pdf
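Because the search below selects files by year, it helps to see which glob actually picks out the year field. A minimal POSIX sh sketch: the matches helper is hypothetical, and it relies on case, which applies essentially the same glob rules as find -name. Since the year is the third field, a pattern that starts with 'invoice_2013_' cannot match the sample name:

```shell
#!/bin/sh
# matches is a hypothetical helper: it reports whether a file name matches a
# glob. The unquoted $2 in the case pattern keeps * and ? active as wildcards.
matches() {
    case "$1" in
        $2) echo "match" ;;
        *) echo "no match" ;;
    esac
}

name='invoice_081500_2013_7534435564.pdf'
matches "$name" 'invoice_2013_*.pdf'     # prints "no match": 2013 is not the second field
matches "$name" 'invoice_*_2013_*.pdf'   # prints "match"
```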

Now I try to find the files with the find command in the shell:

find  . -name 'invoice_2013_*.pdf'  -type f | ls -l > log.dat

The command ran for a long time, so I aborted it without getting any result.

With smaller directories everything works fine.

Later I want a job that runs every day and finds the files created in the last 24 hours, but if it always runs this slowly, I can forget about that.


Solution

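  • That invocation would never work, because ls does not read filenames from stdin: it lists only the paths given as arguments (or the current directory if there are none), so the ls -l in your pipeline simply lists the entire current directory and ignores everything find writes. Note also that the year is the third field of your filenames, so the pattern for 2013 has to be 'invoice_*_2013_*.pdf'; 'invoice_2013_*.pdf' would only match files whose customer number is 2013.

    Possible solutions are:

    Use the find utility's built-in list option (-ls):

    find . -name 'invoice_*_2013_*.pdf' -type f -ls > log.dat

    Use the find utility's -exec option to execute ls -l for each matching file:

    find . -name 'invoice_*_2013_*.pdf' -type f -exec ls -l {} \; > log.dat

    Pipe the filenames to the xargs utility and let it execute ls -l with the filenames as parameters:

    find . -name 'invoice_*_2013_*.pdf' -type f | xargs ls -l > log.dat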
    

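    Even with a working command, a pattern search across 1.5 million files in a single directory is going to be slow on any filesystem, because every directory entry has to be read and compared against the pattern.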
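For the daily job mentioned in the question, here is a minimal POSIX sh sketch under a few assumptions. ARCHIVE and LOG are hypothetical names, and the script builds a throwaway test directory so it can be tried safely; point ARCHIVE at the real invoice directory instead. POSIX find has no creation-time test, so -mtime -1 (modified less than 24 hours ago) stands in for "created in the last 24 hours", which holds as long as invoices are not changed after they are written. The -exec ... {} + form is specified by POSIX and hands many names to a single ls call, but check that the find on your IBM i system supports it; otherwise fall back to the xargs variant above.

```shell
#!/bin/sh
# Sketch of the daily job, run against a throwaway directory.
# ARCHIVE and LOG are hypothetical; point ARCHIVE at the real archive.
ARCHIVE="${TMPDIR:-/tmp}/invoice_test_$$"
LOG="$ARCHIVE.log"
mkdir -p "$ARCHIVE"

# Sample data: one file written now, one back-dated to 1 January 2013.
touch "$ARCHIVE/invoice_081500_2013_7534435564.pdf"
touch -t 201301010000 "$ARCHIVE/invoice_081500_2013_0000000001.pdf"

# -mtime -1 selects files modified less than 24 hours ago.
# -exec ls -l {} + passes the matches to ls -l in batches instead of
# starting one ls process per file.
find "$ARCHIVE" -type f -name 'invoice_*.pdf' -mtime -1 -exec ls -l {} + > "$LOG"

cat "$LOG"
```

This only fixes the command itself: find still has to read every entry of the directory, so on 1.5 million files the job will not be fast either.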