Finding human-readable files on Unix


I'd like to find human-readable files on my Linux machine without a file extension constraint. Those files should be readable by humans using a text editor, for example: text, configuration, HTML, and source code files.

Is there a way to filter and locate them?


Solution

  • find and file are your friends here:

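    find /dir/to/search -type f -exec sh -c 'file -b "$1" | grep -q text' sh {} \; -print
    

    This finds every regular file under /dir/to/search (NOTE: because of -type f it skips symlinks, directories, sockets, etc.) and runs sh -c 'file -b "$1" | grep -q text' sh {} \; on each one. file reports the type of the file, and grep looks for the word text in that description. If the pipeline returns true (i.e., text is in the description), find prints the filename. The filename is passed as "$1" instead of being pasted into the script as {}, so names containing spaces or quotes are handled correctly, and grep -q keeps grep quiet in any POSIX sh, whereas the bash-only &>/dev/null redirection is not understood by every sh.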

    NOTE: the -b flag tells file not to print the filename, so the name itself can never match the grep. Without -b, for example, the binary file gettext would be wrongly detected as a text file because its name contains text.

    For example,

    root@osdevel-pete# find /bin -exec sh -c 'file -b {} |  grep text &>/dev/null' \; -print
    /bin/gunzip
    /bin/svnshell.sh
    /bin/unicode_stop
    /bin/unicode_start
    /bin/zcat
    /bin/redhat_lsb_init
    root@osdevel-pete# find /bin -type f -name '*text*'
    /bin/gettext
    

    If you want to look inside compressed files, use file's --uncompress (-z) flag. For more information and other flags, see man file.
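
If you are searching a large tree, starting one sh per file gets slow. The following is a minimal sketch of the same file -b plus grep idea, wrapped in a function that hands filenames to sh in batches with -exec ... {} +. The function name find_text_files is made up for this example and is not part of find or file. It assumes, like the command above, that file describes every file you care about with the word text; depending on the version of file, some formats may be described differently.

```shell
#!/bin/sh
# find_text_files DIR
# Hypothetical helper (not a standard tool): prints every regular file under
# DIR whose `file -b` description contains the word "text".
find_text_files() {
    # "{} +" passes many paths to a single sh process instead of one per file.
    # The paths arrive as positional parameters, so spaces and quotes in
    # filenames are safe.
    find "$1" -type f -exec sh -c '
        for path in "$@"; do
            # -b keeps the filename out of the description, so a binary
            # named "gettext" cannot match on its name alone.
            if file -b "$path" | grep -q text; then
                printf "%s\n" "$path"
            fi
        done
    ' sh {} +
}
```

Call it as find_text_files /dir/to/search. The output is one path per line, so filenames that themselves contain newlines will still confuse anything that reads the list line by line.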