
Filter folders whose name is a timestamp - pattern matching vs. regex matching using the find utility


I am writing a generic shell script that filters files based on a given regex.

My shell script:

files=$(find $path -name $regex)

In one of the cases, I want to match folders inside a directory. The folder names have the following format:

20161128-20:34:33:432813246
YYYYMMDD-HH:MM:SS:NS

I am unable to arrive at the correct regex.

I am able to get the path of the files inside the folder using the regex '*data.txt', as I know the name of the file inside it.

But it gives me the full path of the file, something like

/path/20161128-20:34:33:432813246/data.txt

What I want is simply:

/path/20161128-20:34:33:432813246

Please help me identify the correct regex for my requirement.

NOTE:

I know how to process the data after

files=$(find $path -name $regex)

But since the script needs to be generic for many use cases, I only need the correct regex that needs to be passed.


Solution


  • find's -name primary takes a glob pattern, not a regex. Since your folder names follow a fixed-width naming scheme, a pattern would work:

    pattern='[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
    

    Of course, you can take a shortcut if you don't expect false positives:

    pattern='[0-9]*-[0-9]?:[0-9]?:[0-9]?:[0-9]*'
    

    Note how * and ?, unlike in a regex, are not duplication symbols (quantifiers) that refer to the preceding expression, but by themselves represent any sequence of characters (*) or any single character (?).
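To make the difference concrete, here is a small sketch that tests names against the shortcut pattern using the shell's own `case` statement, which applies essentially the same glob rules as -name. The helper name `matches_glob` and the second and third sample names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if name ($1) matches glob pattern ($2).
# The unquoted $2 in the case clause is interpreted as a pattern.
matches_glob() {
  case $1 in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}

pattern='[0-9]*-[0-9]?:[0-9]?:[0-9]?:[0-9]*'

# The first name is the sample from the question; the others are invented.
for name in '20161128-20:34:33:432813246' '1abc-1x:2y:3z:4whatever' 'data.txt'; do
  if matches_glob "$name" "$pattern"; then
    printf '%s: match\n' "$name"
  else
    printf '%s: no match\n' "$name"
  fi
done
```

The second name matches even though it is not a timestamp, because * and ? stand for arbitrary characters rather than repeating the preceding [0-9]. That is the kind of false positive the shortcut pattern allows.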

    If we put it all together:

    files=$(find "$path" -type d -name "$pattern")
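As a rough way to try the command in isolation, the following sketch builds a throwaway directory tree and runs the same find call against it. The temporary directory, the `not-a-timestamp` folder, and the helper variables `d`, `date`, `two` and `nanos` are assumptions for illustration; only the timestamp folder name and data.txt come from the question.

```shell
#!/bin/sh
# Throwaway sandbox (hypothetical layout).
tmp=$(mktemp -d)
mkdir -p "$tmp/20161128-20:34:33:432813246" "$tmp/not-a-timestamp"
: > "$tmp/20161128-20:34:33:432813246/data.txt"

# Build the fixed-width glob from smaller pieces for readability.
d='[0-9]'
date="${d}${d}${d}${d}${d}${d}${d}${d}"        # YYYYMMDD: 8 digits
two="${d}${d}"                                 # HH, MM, SS: 2 digits each
nanos="${d}${d}${d}${d}${d}${d}${d}${d}${d}"   # NS: 9 digits
pattern="${date}-${two}:${two}:${two}:${nanos}"

# -type d restricts matches to directories, so data.txt is never reported.
files=$(find "$tmp" -type d -name "$pattern")
printf '%s\n' "$files"

rm -rf "$tmp"
```

Only the timestamp-named folder should be printed, with no trailing /data.txt component.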
    

    Optional background information:

    Below is a regex feature matrix as of GNU find v4.6.0 / BSD find as found on macOS 10.12.1:

    For cross-platform use, sticking with POSIX EREs (extended regular expressions) is safe: use -regextype posix-extended with GNU find and -E with BSD find. Note, however, that not every feature you may expect is supported by both implementations, notably \b, \</\>, and character class shortcuts such as \d.

    =================== GNU find ===================
    == REGEX FEATURE: \{\}
    TYPE: awk:                                        -
    TYPE: egrep:                                      -
    TYPE: ed:                                         ✓
    TYPE: emacs:                                      -
    TYPE: gnu-awk:                                    -
    TYPE: grep:                                       ✓
    TYPE: posix-awk:                                  -
    TYPE: posix-basic:                                ✓
    TYPE: posix-egrep:                                -
    TYPE: posix-extended:                             -
    TYPE: posix-minimal-basic:                        ✓
    TYPE: sed:                                        ✓
    == REGEX FEATURE: {}
    TYPE: awk:                                        -
    TYPE: egrep:                                      ✓
    TYPE: ed:                                         -
    TYPE: emacs:                                      -
    TYPE: gnu-awk:                                    ✓
    TYPE: grep:                                       -
    TYPE: posix-awk:                                  ✓
    TYPE: posix-basic:                                -
    TYPE: posix-egrep:                                ✓
    TYPE: posix-extended:                             ✓
    TYPE: posix-minimal-basic:                        -
    TYPE: sed:                                        -
    == REGEX FEATURE: \+
    TYPE: awk:                                        -
    TYPE: egrep:                                      -
    TYPE: ed:                                         ✓
    TYPE: emacs:                                      -
    TYPE: gnu-awk:                                    -
    TYPE: grep:                                       ✓
    TYPE: posix-awk:                                  -
    TYPE: posix-basic:                                ✓
    TYPE: posix-egrep:                                -
    TYPE: posix-extended:                             -
    TYPE: posix-minimal-basic:                        -
    TYPE: sed:                                        ✓
    == REGEX FEATURE: +
    TYPE: awk:                                        ✓
    TYPE: egrep:                                      ✓
    TYPE: ed:                                         -
    TYPE: emacs:                                      ✓
    TYPE: gnu-awk:                                    ✓
    TYPE: grep:                                       -
    TYPE: posix-awk:                                  ✓
    TYPE: posix-basic:                                -
    TYPE: posix-egrep:                                ✓
    TYPE: posix-extended:                             ✓
    TYPE: posix-minimal-basic:                        -
    TYPE: sed:                                        -
    == REGEX FEATURE: \b
    TYPE: awk:                                        -
    TYPE: egrep:                                      ✓
    TYPE: ed:                                         ✓
    TYPE: emacs:                                      ✓
    TYPE: gnu-awk:                                    ✓
    TYPE: grep:                                       ✓
    TYPE: posix-awk:                                  -
    TYPE: posix-basic:                                ✓
    TYPE: posix-egrep:                                ✓
    TYPE: posix-extended:                             ✓
    TYPE: posix-minimal-basic:                        ✓
    TYPE: sed:                                        ✓
    == REGEX FEATURE: \< \>
    TYPE: awk:                                        -
    TYPE: egrep:                                      ✓
    TYPE: ed:                                         ✓
    TYPE: emacs:                                      ✓
    TYPE: gnu-awk:                                    ✓
    TYPE: grep:                                       ✓
    TYPE: posix-awk:                                  -
    TYPE: posix-basic:                                ✓
    TYPE: posix-egrep:                                ✓
    TYPE: posix-extended:                             ✓
    TYPE: posix-minimal-basic:                        ✓
    TYPE: sed:                                        ✓
    == REGEX FEATURE: [:digit:]
    TYPE: awk:                                        ✓
    TYPE: egrep:                                      ✓
    TYPE: ed:                                         ✓
    TYPE: emacs:                                      -
    TYPE: gnu-awk:                                    ✓
    TYPE: grep:                                       ✓
    TYPE: posix-awk:                                  ✓
    TYPE: posix-basic:                                ✓
    TYPE: posix-egrep:                                ✓
    TYPE: posix-extended:                             ✓
    TYPE: posix-minimal-basic:                        ✓
    TYPE: sed:                                        ✓
    == REGEX FEATURE: \d
    TYPE: awk:                                        -
    TYPE: egrep:                                      -
    TYPE: ed:                                         -
    TYPE: emacs:                                      -
    TYPE: gnu-awk:                                    -
    TYPE: grep:                                       -
    TYPE: posix-awk:                                  -
    TYPE: posix-basic:                                -
    TYPE: posix-egrep:                                -
    TYPE: posix-extended:                             -
    TYPE: posix-minimal-basic:                        -
    TYPE: sed:                                        -
    == REGEX FEATURE: \s
    TYPE: awk:                                        ✓
    TYPE: egrep:                                      ✓
    TYPE: ed:                                         -
    TYPE: emacs:                                      ✓
    TYPE: gnu-awk:                                    ✓
    TYPE: grep:                                       -
    TYPE: posix-awk:                                  ✓
    TYPE: posix-basic:                                -
    TYPE: posix-egrep:                                ✓
    TYPE: posix-extended:                             ✓
    TYPE: posix-minimal-basic:                        -
    TYPE: sed:                                        -
    =================== BSD find ===================
    == REGEX FEATURE: \{\}
    TYPE: basic:                                      ✓
    TYPE: extended:                                   -
    == REGEX FEATURE: {}
    TYPE: basic:                                      -
    TYPE: extended:                                   ✓
    == REGEX FEATURE: \+
    TYPE: basic:                                      -
    TYPE: extended:                                   -
    == REGEX FEATURE: +
    TYPE: basic:                                      -
    TYPE: extended:                                   ✓
    == REGEX FEATURE: \b
    TYPE: basic:                                      -
    TYPE: extended:                                   -
    == REGEX FEATURE: \< \>
    TYPE: basic:                                      -
    TYPE: extended:                                   -
    == REGEX FEATURE: [:digit:]
    TYPE: basic:                                      ✓
    TYPE: extended:                                   ✓
    == REGEX FEATURE: \d
    TYPE: basic:                                      -
    TYPE: extended:                                   -
    == REGEX FEATURE: \s
    TYPE: basic:                                      -
    TYPE: extended:                                   ✓
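
If true regex matching is needed, the cross-platform advice above could be sketched as follows. The function name `find_timestamp_dirs` and the GNU detection via `find --version` are assumptions of this sketch, not something the matrix prescribes. Other find implementations (for example BusyBox) are not handled. Note that -regex matches against the whole path, so the expression starts with `.*/`.

```shell
#!/bin/sh
# POSIX ERE for the folder name; uses only {} quantifiers and bracket
# expressions, which both GNU (posix-extended) and BSD (-E) find support.
regex='.*/[0-9]{8}-[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{9}'

# Hypothetical wrapper: print directories under $1 whose name is a timestamp.
find_timestamp_dirs() {
  if find --version 2>/dev/null | grep -q GNU; then
    # GNU find: select the regex dialect before -regex is parsed.
    find "$1" -regextype posix-extended -type d -regex "$regex"
  else
    # BSD find (assumed): -E enables extended regular expressions.
    find -E "$1" -type d -regex "$regex"
  fi
}
```

A call such as `files=$(find_timestamp_dirs "$path")` would then take the place of the -name based command.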