Tags: linux, macos, bash, fat32, hfs+

Bash memory "leak" when recursing directories


I'm currently trying to write a script that does some post-processing after rsync --max-size=4000000000 has done its job, to allow for full backups to FAT32 (which is the only filesystem that is read/write on all of Windows, Mac and *nix).

I am writing in bash for Mac OS X and Linux; currently testing on OS X. Code is here

https://github.com/taikedz/fullsync32/blob/master/fullsync32

The script recurses through directories, looking for

  • files that have resource forks (HFS+ metadata), and
  • files larger than 4 GB (the FAT32 file-size limit, hence the --max-size above)

and upon finding such files it processes them via tar -cz or split as appropriate, before copying them over.
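
For illustration, the processing boils down to something like this (the helper name, split chunk size and output naming are just a sketch, not the exact code from the repo):

process_file() { # illustrative sketch, not the exact code from the repo
  local file="$1"
  if [ -s "$file/..namedfork/rsrc" ]; then
    # tar up files with resource forks, since a plain copy to FAT32 drops the fork
    tar -czf "$file.tar.gz" "$file"
  elif [ "$(stat -f %z "$file")" -gt 4000000000 ]; then
    # split files FAT32 can't hold into <4 GB chunks
    # (stat -f %z is the BSD/OS X form; Linux would need stat -c %s)
    split -b 3900m "$file" "$file.part."
  fi
}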

I use recursion instead of the find utility because of the test for the presence of a resource fork on a file: it involves checking the size of a special file. Say you have file foo.txt; its resource fork can be inspected by looking at ls -l foo.txt/..namedfork/rsrc and checking that the length is non-zero.
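
For example, a quick test in the shell (the -s test succeeds when a file exists and has non-zero size):

[ -s "foo.txt/..namedfork/rsrc" ] && echo "foo.txt has a resource fork"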

The basic structure is

recurse() {
  pushd "$1"
    for NODE in *; do
      if [ -d "$NODE" ]; then
        recurse "$NODE"
        continue
      fi
      # (process files here, with calls to split, tar and md5)
    done
  popd
}

recurse ./target/directory

Problem

I ran this against my backups the other day and left it running for a few hours. When I came back I found that my spare 11 GB of RAM had been used up, and it was ~248 MB into swap...

I looked around on Google for issues around bash memory leakage in recursion, and apart from a few tenuously answered forum posts, didn't find much...

The other odd result (which is Mac-specific) is that the "Inactive memory" stays inactive and the system runs slowly... Restart required.

Questions

  • Is such potentially deep recursion with bash a bad idea in itself?
  • If so, is there an ingenious way to iterate rather than recurse in this situation?
  • How can I use find to perform the test for resource forks?

Your inputs are much appreciated!


Solution

  • Is such potentially deep recursion with bash a bad idea in itself?

    Bash is not meant for recursion, but it has no problem recursing to a few thousand levels, which is more than sufficient for recursing through a filesystem.

    However, Bash, like all languages, is unable to do non-tail recursion to infinite depth, which is the risk you take by forgoing find's proven cycle detection.
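
    If you keep the recursive version anyway, bash 4.2 and later can at least bound the damage: setting FUNCNEST aborts any function call nested deeper than the given limit, so a filesystem loop fails fast instead of quietly eating memory (note that the stock /bin/bash on OS X is 3.2, which predates FUNCNEST). A minimal sketch:

    FUNCNEST=1000    # bash >= 4.2: abort calls nested deeper than 1000 levels
    recurse ./target/directory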

  • Is there an ingenious way to iterate rather than recurse in this situation?

    You can iterate over find output:

    find "$1" -print0 | while IFS= read -d '' -r filename
    do
      echo "Operating on $filename"
    done
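
    Note that because of the pipe the loop body runs in a subshell, so any variables you set inside it (counters, flags) are lost when the loop ends. If that matters, feed the loop with process substitution instead:

    # Same iteration, but the loop now runs in the current shell
    while IFS= read -r -d '' filename
    do
      echo "Operating on $filename"
    done < <(find "$1" -print0)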
    

  • How can I use find to perform the test?

    You can run arbitrary external tests with -exec, here invoking bash:

    find / -exec bash -c '[[ -s "$1/..namedfork/rsrc" ]]' _ {} \; -print
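
    Putting the pieces together, here is an untested sketch that finds both kinds of problem file; the -size +4000000000c threshold mirrors the rsync --max-size value above, and spawning a bash per file is slow but correct:

    find ./target/directory -type f \( -size +4000000000c \
        -o -exec bash -c '[[ -s "$1/..namedfork/rsrc" ]]' _ {} \; \) -print0 |
    while IFS= read -r -d '' filename
    do
      echo "needs special handling: $filename"
    done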