linuxterminalasciicontrol-characters

Why does `^M` appear in terminal output when looking at some files?


I'm trying to send file using curl to an endpoint and save the file to the machine.

Sending curl from Linux and saving it on the machine works well, but doing the same curl from Windows is adding ^M character to every end of line.

I'm printing the file before saving it and can't see ^M. Only viewing the file on the remote machine after saving it shows me ^M.

A simple string replacement doesn't seem to work.

Why is ^M being added? How can I prevent this?


Solution

  • Quick Answer: That's a carriage return. They're a harmless but mildly irritating artifact of how Windows encodes text files. You can strip them out of your files with dos2unix. You can configure most text editors to use "Unix Line Endings" or "LF Line Endings" to prevent them from appearing in new files that you create from Windows PCs in the future.


    Long Answer (with some historical trivia):

    In a plain text file, when you create a new line (by pressing enter/return), a "line break" is embedded in the file. On Unix/Linux, this is a single character, '\n', the "line feed". On Windows, this is two sequential characters, '\r\n', the "carriage return" followed by the "line feed".

    When physical teletype terminals, which behaved much like typewriters, were still in use, the "line feed" character meant "move the paper up to the next line" and the "carriage return" character meant "slide the carriage all the way over so the typing head is on the far left". From the very beginning, nearly all teletype terminals supported implicit carriage return; i.e., triggering a line feed would automatically trigger a carriage return. The developers working on what later evolved into Windows decided that it would be best to include explicit carriage returns, just in case (for some reason) the teletype does not perform one implicitly. The Unix developers, on the other hand, chose to work with the assumption of implicit carriage return.

    The carriage return and line feed are ASCII Control Characters which means they do not have a visible representation as standalone printable characters, instead they affect the output cursor itself (in this case, the position of the output cursor).

    The "^M" you see is a stand-in representation for the carriage return character, used by programs that don't fully "cook" their output (i.e., don't apply the effects of some ASCII Control Characters). (Other control characters have other representations starting with "^", and the "^" character is also used to represent the "ctrl" keyboard key in some Unix programs like nano.)

    You can use dos2unix to convert the line endings from Windows-style to Unix-style.

    $ curl https://example.com/file_with_crlf.txt | dos2unix > file.txt
    

    On some distros, this tool is included by default, on others it can be installed via the package manager (e.g., on Ubuntu, sudo apt install dos2unix). There also exists a package, unix2dos, for the inverse.

    Most "smart" text editors for coding (Sublime, Atom, VS Code, Notepad++, etc.) will happily read and write with either Windows-style or Unix-style line endings (this might require changing some configuration options). Often, the line-endings are auto-detected by scanning the contents of a file, and usually new files are created with the Operating System's native line endings (by default). Even the new version of Notepad supports Unix-style line endings. On the other hand, some Unix tools will produce strange results in the presence of Windows-style line breaks. If your codebase will be used by people on both Unix and Windows operating systems, the nice thing to do is to use Unix-style line endings everywhere.

    Git on Windows also has an optional mode that checks out all files with Windows-style line breaks, but checks them back in with Unix-style line breaks.


    Side Notes (interesting, but not directly related to your question):

    What the carriage return actually does (on a modern virtual terminal, be it Windows or Unix) is move the output cursor to the beginning of the line. If you use the carriage return without a line feed, you can "overwrite" part of a string that has already been printed.

    $ printf "dogdog" ; printf "\rcat\n"
    catdog
    

    Some Unix programs use this to asynchronously update part of the last line of output, to implement things like a live-updating progress indicator. For example, curl, which shows download progress on stdout if the file contents are piped elsewhere.

    Also: If you had a tool that interpreted Windows-style line endings as literally as possible, and you fed it a string with Unix-style line endings such as "hello\nworld", you would get output like this:

    hello
         world
    

    Fortunately, such implementations are extremely rare and, in general, the vast majority of Windows tools can render Unix-style line-endings identically to Windows-style line endings without any problem.