I'm trying to find a command that is flexible enough to allow some variations of a string, but not others.
For instance, I am looking for audio files that have some variation of "rain" in the filename only (rains, raining, rained, rainbow, rainfall, a dark rain cloud, etc), whether at the beginning, end or middle of the filename.
However, a plain match on "rain" also picks up words like "brain", "train", "grain", "drain", "Lorraine", etc., which are not wanted (basically any word that has nothing to do with the concept of rain).
Something like this fails:
find . -name '*rain*' ! -name '*brain*'| more
And I'm having no luck with even getting started on building a successful regex variant because I cannot wrap my mind around regex ... for instance, this doesn't do anything:
# this is incomplete, just a stub of where I was going
# -type f also still includes a directory name
find . -regextype findutils-default -iregex '\(*rain*\)' -type f
Any help would be greatly appreciated. If I could see a regex command that does everything I want it to do, with an explanation of each character in the command, it would help me learn more about regex with the find command in general.
edit 1:
Taking cues from all the feedback so far from jhnc and Seth Falco, I have tried this:
find . -type f | grep -Pi '(?<![a-zA-Z])rain'
I think this pretty much works (I don't think it is missing anything). My only issue with it is that it also matches occurrences of "rain" further up the path, not only in the file name. So I get example output like this:
./Linux/path/to/radiohead - 2007 - in rainbows/09 Jigsaw Falling Into Place.mp3
Since "rain" is not in the filename itself, this is a result I'd rather not see. So I tried this:
find . -type f -printf '%f\n' | grep -Pi '(?<![a-zA-Z])rain'
That does ensure that only filenames are matched, but it also does not output the paths to the filenames, which I would still like to see, so I know where the file is.
So I guess what I really need is a PCRE (PCRE2 ?) which can take the seemingly successful look-behind method, but only apply it after the last path delimiter (/ since I am on Linux), and I am still stumped.
6. prefix/name, where prefix can have one or more levels delimited by / and name does not contain /
7. find -iregex matches against entire path (-name only matches filename)
8. find -iregex must match entirety of path (eg. "c" is only a partial match and does not match path "a/b/c")

find can return matches against non-files (eg. directories). Given definition 6, we would be unable to tell if name is a directory or an ordinary file. To satisfy 2, we can exclude non-files using find's -type f predicate.
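
To see the whole-path matching and the effect of -type f in isolation, here is a minimal sketch. It assumes GNU find and a POSIX shell, and the directory tree it builds (a/b/c, a directory called rain) is made up purely for illustration.

```shell
# Throwaway tree; all names are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/a/b/rain"                      # a directory named "rain"
touch "$demo/a/b/c" "$demo/a/b/rain/drops.wav"

# -iregex has to match the whole path, so a bare "c" finds nothing.
partial=$(cd "$demo" && find . -iregex 'c')

# Spelling out the prefix makes the same file match.
whole=$(cd "$demo" && find . -iregex '.*/c')

# Without -type f the directory ./a/b/rain is reported; with it, nothing is.
any_type=$(cd "$demo" && find . -iregex '.*/rain')
files_only=$(cd "$demo" && find . -type f -iregex '.*/rain')

printf 'partial: [%s]\nwhole: [%s]\nany type: [%s]\nfiles only: [%s]\n' \
    "$partial" "$whole" "$any_type" "$files_only"
rm -rf "$demo"
```

The two empty results are the point: the first shows that a partial match is not enough, the second that -type f drops the directory.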
We can compare paths found by find against our specification by using find's case-insensitive regex matching predicate (-iregex). The "grep" flavour (-regextype grep) is sufficiently expressive.
Just using 1, a suitable regex is: rain

2+6+7 says we must forbid / after "rain": rain[^/]*$
- [/] matches character in set (ie. /)
- [^/] : ^ inverts match, ie. character that is not /
- * matches preceding match zero or more times
- $ constrains preceding match to occur at end of input

3+5 says there must be no immediately preceding word characters: [^a-z]rain[^/]*$
- a-z is a shortcut for the range a to z
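
The narrowing so far can be checked without touching the filesystem by feeding some paths to grep, whose basic regex syntax is the flavour selected by -regextype grep. This is only a sketch: the sample paths are invented, and unlike find -iregex, grep is happy with a match anywhere in the line, so no prefix handling is needed here.

```shell
# Hypothetical sample paths, one per line, in the form find prints them.
paths='./rain.wav
./heavy rain.wav
./brain.wav
./in rainbows/09 Jigsaw.mp3
./in rainbows/rainfall.ogg'

# Count how many sample paths each stage of the pattern keeps.
# LC_ALL=C keeps the a-z range limited to plain ASCII letters.
step1=$(printf '%s\n' "$paths" | LC_ALL=C grep -ci 'rain')
step2=$(printf '%s\n' "$paths" | LC_ALL=C grep -ci 'rain[^/]*$')
step3=$(printf '%s\n' "$paths" | LC_ALL=C grep -ci '[^a-z]rain[^/]*$')

printf 'rain: %s\nrain[^/]*$: %s\n[^a-z]rain[^/]*$: %s\n' \
    "$step1" "$step2" "$step3"
```

With these samples the counts go 5, 4, 3: forbidding / after "rain" drops the file that only has "rainbows" in its directory, and forbidding a preceding letter drops brain.wav.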
8 requires matching the prefix explicitly: ^.*[^a-z]rain[^/]*$
- ^ outside of [...] constrains subsequent match to occur at beginning of input
- . matches any single character
- [^a-z] matches a non-alphabetic character

find . -type f -regextype grep -iregex '^.*[^a-z]rain[^/]*$'
Note: The leading ^ and trailing $ are not actually required, given 8, and could be elided.
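
If you want to try this before running it on a real collection, the following sketch builds a small sandbox and runs the command both with and without the anchors. It assumes GNU find (for -regextype), and every file and directory name in it is invented for the test.

```shell
# Throwaway sandbox; all names below are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/in rainbows" "$demo/rain"
touch "$demo/Rainfall.ogg" "$demo/heavy rain.wav" "$demo/brain.wav" \
      "$demo/Lorraine.mp3" "$demo/in rainbows/09 Jigsaw.mp3" \
      "$demo/in rainbows/train rain.flac" "$demo/rain/thunder.wav"

# The command as given, run from inside the sandbox.
# LC_ALL=C sort only makes the output order predictable.
anchored=$(cd "$demo" && find . -type f -regextype grep \
    -iregex '^.*[^a-z]rain[^/]*$' | LC_ALL=C sort)

# Same pattern with the leading ^ and trailing $ dropped.
elided=$(cd "$demo" && find . -type f -regextype grep \
    -iregex '.*[^a-z]rain[^/]*' | LC_ALL=C sort)

printf 'anchored:\n%s\n\nelided:\n%s\n' "$anchored" "$elided"
rm -rf "$demo"
```

Both runs should list the same three files: ./Rainfall.ogg, ./heavy rain.wav and ./in rainbows/train rain.flac. The files under the "rain" and "in rainbows" directories only show up when "rain" is in the file name itself, and brain.wav and Lorraine.mp3 are skipped because a letter comes straight before "rain".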