regex bash awk inversion

Ignore line if any non-matching character is present


I'm trying to print lines from a file if, and only if, every character in a line meets a certain regex condition.

The problem is that any line that contains any character that meets the regex condition evaluates to true and gets printed, even if it also contains characters outside that range.

I'd prefer to use awk, since I already have additional conditions in place that the evaluated line must also meet. I'd also prefer a solution based on basic regex, so I can apply different matching conditions to future files. (The grep solution shown here focuses on identifying non-ASCII characters and seems to require --perl-regexp compatibility; my focus is on having an entire line meet a given regex condition.)

In the example, uppercase letters fall outside the regex condition and therefore the whole line where they appear should be ignored.

file.txt:
abc123
123abc
123ABC
AbCdEf

When I try...

awk '$0 ~ /[a-z]/ || $0 ~ /[0-9]/' < file.txt

...every line is printed, since the regex condition is met at least once in each line:

abc123
123abc
123ABC
AbCdEf

What I want is to skip a line if it contains any character outside the [a-z] and [0-9] ranges, so the desired output here would be:

abc123
123abc

The closest hits I could find when researching this are here and here, but I don't want to search-and-replace anything on the line, I just want to ignore the line and move on to the next one if any unwanted characters are present.


Solution

  • For the given sample input/output, try these:

    $ awk '!/[^a-z0-9]/' ip.txt
    abc123
    123abc
    
    $ grep -v '[^a-z0-9]' ip.txt
    abc123
    123abc
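
To make the negation easier to follow, here is a minimal reproducible sketch. It assumes a POSIX shell with `mktemp` available, and it forces the C locale because in some locales a range such as `[a-z]` may match more than the 26 lowercase ASCII letters. The helper name `only_allowed` is hypothetical and only wraps the awk command above. The bracket expression `[^a-z0-9]` matches one disallowed character, so negating the match keeps only the lines where no such character exists.

```shell
#!/bin/sh
# Assumption: force the C locale so [a-z] means exactly a..z.
export LC_ALL=C

# Hypothetical helper: print only lines that contain no character
# outside the lowercase-letter/digit set (empty lines pass too).
only_allowed() {
    awk '!/[^a-z0-9]/' "$@"
}

# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' 'abc123' '123abc' '123ABC' 'AbCdEf' > "$sample"

only_allowed "$sample"           # awk: negate "has a disallowed character"
grep -v '[^a-z0-9]' "$sample"    # grep: drop lines with a disallowed character

rm -f "$sample"
```

With the sample input, each of the two commands should print `abc123` and `123abc`.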
    

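    The above solutions will print empty lines as well, since an empty line contains no character outside the set. To avoid that, require at least one allowed character and anchor the match to the whole line: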

    awk '/^[a-z0-9]+$/' ip.txt
    grep -xE '[a-z0-9]+' ip.txt
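
Since the question mentions additional awk conditions and different character sets for future files, the anchored form could be parameterized roughly as follows. This is only a sketch: the helper name `whole_line_match`, its argument order, and the minimum-length check (a stand-in for whatever extra conditions are already in place) are assumptions, not part of the original answer. The bracket-expression body is passed in with `-v` and assembled into a dynamic regex.

```shell
#!/bin/sh
# Assumption: force the C locale so ranges like a-z behave predictably.
export LC_ALL=C

# Hypothetical helper:
#   $1 = body of the bracket expression, e.g. 'a-z0-9'
#   $2 = minimum line length (placeholder for other conditions)
#   remaining arguments = input files (stdin if none)
whole_line_match() {
    class=$1 min=$2
    shift 2
    awk -v class="$class" -v min="$min" '
        BEGIN { re = "^[" class "]+$" }    # anchored: every character must match
        $0 ~ re && length($0) >= min       # combine with any other awk condition
    ' "$@"
}

# Sample input from the question plus an empty line and a short line.
printf '%s\n' 'abc123' '123abc' '123ABC' 'AbCdEf' '' 'abc' |
    whole_line_match 'a-z0-9' 6
```

Here the empty line is rejected by the `+` quantifier and `abc` by the length check, so only `abc123` and `123abc` should be printed. Swapping the first argument, for example to `A-Z`, changes the allowed set without touching the awk program.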