
Byte representation of a text file


I heard that everything on the hard disk is stored as blocks of bytes. If that is the case, what would be the byte representation of a text file?

For example, I have this text.txt on my Mac computer

Hello,
World!

What do the corresponding bytes look like? Should I expect every character to be translated to its ASCII code? Where on my machine can I find the bytes? It would be nice to have something pre-installed on a typical Mac/Linux machine to view the binary/hex representation of the text file.


Solution

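  • The character encoding used when writing a text file is between you and your text editor. It is probably UTF-8, an encoding of the Unicode character set, rather than plain ASCII (although for text made up only of ASCII characters, such as your example, the UTF-8 bytes are the same as the ASCII codes). Only you will know, because that metadata is not saved with the file.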

    When reading, a text editor will guess the encoding, but you should be able to correct it. Other programs either let you specify the encoding via a command-line argument or document which encoding you must give them.

    This effectively makes text files useless for casual users.
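
To make the encoding point concrete, here is a minimal Python sketch. It assumes the file from the question holds exactly the two lines shown, each ended by a single "\n" (an assumption; your editor may differ), and it compares the bytes produced by a few encodings.

```python
# Hypothetical contents of text.txt, assuming each line ends with "\n".
text = "Hello,\nWorld!\n"

utf8_bytes = text.encode("utf-8")
ascii_bytes = text.encode("ascii")

# For characters in the ASCII range, UTF-8 yields the same single bytes.
print(utf8_bytes.hex(" "))        # 48 65 6c 6c 6f 2c 0a 57 6f 72 6c 64 21 0a
print(utf8_bytes == ascii_bytes)  # True

# Outside the ASCII range, the bytes depend on the encoding chosen.
print("é".encode("utf-8").hex(" "))    # c3 a9 (two bytes)
print("é".encode("latin-1").hex(" "))  # e9 (one byte)
```

The last two lines show why a reader has to know the encoding: the same character is stored as different bytes, and nothing in the file says which convention was used.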

    To view the bytes of a file in hexadecimal (xxd is preinstalled on macOS and on most Linux distributions):

    xxd -g1 filepath
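
If you would rather inspect the bytes from a script, the following is a rough Python sketch of the same idea. The function name hex_dump and the temporary demo file are hypothetical, chosen only for illustration; the output layout loosely imitates a hex viewer and is not meant to match xxd exactly.

```python
import os
import tempfile

def hex_dump(path, width=16):
    """Return lines of offset, hex bytes, and printable ASCII for a file."""
    with open(path, "rb") as f:  # "rb" reads raw bytes with no decoding
        data = f.read()
    pad = width * 3 - 1          # width of a full row of hex digits
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as "."
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}: {hex_part:<{pad}}  {text_part}")
    return lines

# Demo on a temporary file standing in for text.txt (assumed contents).
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"Hello,\nWorld!\n")
    demo_path = tmp.name

dump = hex_dump(demo_path)
for line in dump:
    print(line)
os.remove(demo_path)
```

Opening the file in binary mode is the important part: it hands back the stored bytes untouched, so no encoding guess is involved.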
    

    The file system will store the name, location and size of a file. Programs reading files will stop at the end of the file rather than read all the allocated disk blocks. Also note that the file system doesn't store whether a file is a text file or not. Again, only you know that.
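
The difference between a file's recorded size and the disk space allocated to it can be seen with a short Python sketch. It assumes a Unix-like system where os.stat reports st_blocks, typically counted in 512-byte units; that unit and the temporary demo file are assumptions, and the allocated figure varies by file system.

```python
import os
import tempfile

# Temporary file standing in for text.txt (assumed contents).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Hello,\nWorld!\n")
    path = tmp.name

info = os.stat(path)
with open(path, "rb") as f:
    data = f.read()  # reading stops at end of file, not at end of block

size = info.st_size
blocks = getattr(info, "st_blocks", None)  # not available on every platform

print("size recorded by the file system:", size)
print("bytes actually read:", len(data))
if blocks is not None:
    # Assumes 512-byte units, which is common but not guaranteed.
    print("space allocated on disk (approx.):", blocks * 512)
os.remove(path)
```

The number of bytes read matches the recorded size, while the allocated space is usually a whole number of blocks and therefore larger for a file this small.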