Tags: linux, bash, unix, grep

Find files with non-printing characters (null bytes)


The log of my application has a field that contains strange characters. I only see these characters when I view the log with the less command.

I copied the output of my command into a text file, and this is what I see:

CTP_OUT=^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@

I'd like to know if there is a way to find these null characters. I tried a grep command, but it didn't show anything.


Solution

  • I can hardly believe it, but I am about to write an answer involving cat!

    The characters you are observing are non-printable characters, shown here in caret notation, which is a way to visualize non-printable characters. As mentioned in the question, ^@ represents the NUL character (byte value 0).

    If your file has non-printable characters, you can visualize them using cat -vET:

    -E, --show-ends: display $ at end of each line
    -T, --show-tabs: display TAB characters as ^I
    -v, --show-nonprinting: use ^ and M- notation, except for LFD and TAB

    source: man cat

    I've added the -E and -T flags so that line ends and tabs are made visible as well, not only the characters covered by -v.

    grep passes non-printable characters through to the terminal unchanged, so they usually stay invisible. To see them, pipe its output to cat.

    Show all lines with non-printable characters:

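    $ grep -aE '[^[:print:]]' --color=never file | cat -vET
    

    Here, the ERE [^[:print:]] matches any character outside the printable class, which includes tabs. The -a flag makes grep treat the file as text; without it, GNU grep may only report that a binary file matches when the file contains NUL bytes.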
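
To see which lines the [^[:print:]] class selects, here is a small sketch on made-up sample input (the three sample lines are an assumption, not taken from the question). It pins the locale to C so that the printable class is plain ASCII, and shows that a line containing only a tab is counted as well:

```shell
# Three sample lines: plain text, one with a TAB, one with a BEL (\007).
sample=$(printf 'plain text\nwith\ttab\nbell\007here\n')

# LC_ALL=C pins [:print:] to ASCII 0x20-0x7E; -a keeps grep in text mode.
nonprint_lines=$(printf '%s\n' "$sample" | LC_ALL=C grep -a -c '[^[:print:]]')
echo "lines with non-printable characters: $nonprint_lines"

# Adding [:blank:] to the negated set stops tabs from matching,
# so only the BEL line is counted.
no_tab_lines=$(printf '%s\n' "$sample" | LC_ALL=C grep -a -c '[^[:print:][:blank:]]')
echo "ignoring tabs: $no_tab_lines"
```

If tabs are expected in your log, the second form is probably the more useful filter.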

    Show all lines with NULL:

    $ grep -Pa '\x00' --color=never file | cat -vET
    

    Note that we need Perl-compatible regular expressions (the -P option) here, because they understand hexadecimal and octal escapes such as \x00.

    Various control characters can be written in C language style: \n matches a newline, \t a tab, \r a carriage return, \f a form feed, etc.

    More generally, \nnn, where nnn is a string of three octal digits, matches the character whose native code point is nnn. You can easily run into trouble if you don't have exactly three digits. So always use three, or since Perl 5.14, you can use \o{...} to specify any number of octal digits.

    Similarly, \xnn, where nn are hexadecimal digits, matches the character whose native ordinal is nn. Again, not using exactly two digits is a recipe for disaster, but you can use \x{...} to specify any number of hex digits.

    source: Perl 5 version 26.1 documentation
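
As a quick check of the notations quoted above, the following sketch writes a hypothetical scratch file and counts the lines matched by the hex, three-digit octal, and braced hex spellings of the NUL byte. It assumes a GNU grep built with -P support:

```shell
# Hypothetical scratch file: one line with a NUL byte, one without.
tmp=$(mktemp)
printf 'a\000b\nplain\n' > "$tmp"

# Hex (\x00), three-digit octal (\000) and braced hex (\x{0}) all name byte 0.
hex_count=$(grep -Pac '\x00' "$tmp")
oct_count=$(grep -Pac '\000' "$tmp")
brace_count=$(grep -Pac '\x{0}' "$tmp")
echo "$hex_count $oct_count $brace_count"

rm -f "$tmp"
```

All three counts should agree, since each pattern names the same byte.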

    An example:

    $ printf 'foo\012\011\011bar\014\010\012foobar\012\011\000\013\000car\012\011\011\011\012' > test.txt
    $ cat test.txt
    foo
                    bar
                       
    foobar
        
            car
    

    If we now use grep alone, we get the following:

    $ grep -Pa '\x00' --color=never test.txt
            
            car
    

    But piping it to cat allows us to visualize the control characters:

    $ grep -Pa '\x00' --color=never test.txt | cat -vET
    ^I^@^K^@car$
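
As a cross-check that does not depend on grep -P, the NUL bytes can also be counted with tr and wc. This is an alternative technique rather than part of the approach above; the sketch recreates the same sample data in a temporary file:

```shell
# Recreate the sample file from the answer (two NUL bytes on the fourth line).
tmp=$(mktemp)
printf 'foo\012\011\011bar\014\010\012foobar\012\011\000\013\000car\012\011\011\011\012' > "$tmp"

# tr -cd '\000' deletes every byte except NUL; wc -c counts what is left.
nul_count=$(tr -cd '\000' < "$tmp" | wc -c)
echo "NUL bytes: $((nul_count))"

rm -f "$tmp"
```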
    

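    Why --color=never: if your grep is configured with --color=auto or --color=always, it adds escape sequences that the terminal interprets as colors. With --color=auto these are dropped when the output goes to a pipe, but with --color=always they are sent through the pipe too, so cat -v displays them and clutters the output: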

    $ grep -Pa '\x00' --color=always test.txt | cat -vET
    ^I^[[01;31m^[[K^@^[[m^[[K^K^[[01;31m^[[K^@^[[m^[[Kcar$
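
Since the title asks about finding files rather than lines, here is a minimal sketch that lists only the names of files containing a NUL byte. The scratch directory and the file names clean.log and dirty.log are hypothetical, and a GNU grep with -P support is assumed:

```shell
# Hypothetical scratch directory: one clean file, one containing NUL bytes.
dir=$(mktemp -d)
printf 'CTP_OUT=ok\n' > "$dir/clean.log"
printf 'CTP_OUT=\000\000\000\n' > "$dir/dirty.log"

# -r recurses into the directory, -l prints only the names of matching files.
matches=$(grep -rlPa '\x00' "$dir")
echo "$matches"

rm -rf "$dir"
```

Because -l prints file names instead of matching lines, no pipe to cat is needed here.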