emacs newline eol dos2unix

Why can't the Emacs Lisp function 'search-forward' find carriage return characters anymore?


This code for dos2unix conversion in an Emacs buffer stopped working at some point. I've been using it since the '90s (Emacs 19), and I don't know exactly when it broke.

;;; Dos to Unix conversion in buffer
(defun dos2unix ()
  "convert <CR><LF> characters in buffer to <LF>."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (while (search-forward "\015\012" nil t)
      (replace-match "\012" nil t))))

With Emacs 27.2 or 25.3 on Windows, in a file with known DOS line terminators (0x0d 0x0a), it simply doesn't find the carriage return characters.

I've tried searching separately for the carriage return and line feed characters. It finds the line feed characters just fine, but not the carriage returns. I've tried other syntax variations: \015, \x0d, \r, \C-m, (string ?\C-m) and it just doesn't find them. I've also tried using re-search-forward or search-forward-regexp in place of search-forward.


Solution

  • If a file consistently uses CRLF line endings, Emacs recognises that and decodes it transparently: the CR characters are not part of the buffer text, so there is nothing for search-forward to find. Similarly, if there are only CRs and no LFs, it recognises the old Mac OS format and displays the file as if LFs were used, rather than as one long line.
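
    To see why the search comes up empty, here is a minimal sketch that mimics the decoding step on a string. The helper name my-search-cr-after-decoding is hypothetical, and decode-coding-string with utf-8-dos is only a stand-in for what Emacs does when it visits a file it has identified as CRLF.

    ```elisp
    ;; Hypothetical helper, for illustration only.
    (defun my-search-cr-after-decoding (bytes coding-system)
      "Decode BYTES with CODING-SYSTEM and search the result for a CR.
    Return the position after the first carriage return, or nil if none."
      (with-temp-buffer
        (insert (decode-coding-string bytes coding-system))
        (goto-char (point-min))
        (search-forward "\r" nil t)))

    ;; Decoded with a DOS end-of-line type: each CRLF becomes a plain LF,
    ;; so no CR is left in the text for the search to find.
    (my-search-cr-after-decoding "one\r\ntwo\r\n" 'utf-8-dos)

    ;; Decoded with a Unix end-of-line type: the CRs stay in the text
    ;; and the search succeeds.
    (my-search-cr-after-decoding "one\r\ntwo\r\n" 'utf-8-unix)
    ```

    The first call should return nil and the second a buffer position, which matches the behaviour described in the question.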

    The mode line shows the recognised format. In your case you will see (DOS) in the bottom left of the mode line. Clicking it with the mouse cycles through the three options (the typical Unix style is displayed as just : for brevity).

    If you switch to Unix line endings, the buffer is marked as modified, because the CRs will be removed when the file is written (which you can confirm via M-x diff-buffer-with-file), so you can then just save the buffer.

    Clicking the mode line like that is calling this command:

    (defun mode-line-change-eol (event)
      "Cycle through the various possible kinds of end-of-line styles."
      (interactive "e")
      (with-selected-window (posn-window (event-start event))
        (let ((eol (coding-system-eol-type buffer-file-coding-system)))
          (set-buffer-file-coding-system
           (cond ((eq eol 0) 'dos) ((eq eol 1) 'mac) (t 'unix))))))
    

    Hence code to do this directly would be:

    (set-buffer-file-coding-system 'unix)
    

    Note that if the file's line endings are inconsistent, the stray CRs will be visible in the buffer (displayed as ^M), and your original code can act on them.

    For instance if there was one case of CRLF in an otherwise Unix-like file, you would see that CR and your function would be able to replace it.

    Hence you might want to update your function to call set-buffer-file-coding-system first and then continue as before.

    See C-h i g (emacs)Text Coding for more information.

    Edit: Per the comments, any call to that function marks the buffer as modified, whereas you might not actually be making a change, so we want to do this conditionally:

    (unless (eq 0 (coding-system-eol-type buffer-file-coding-system))
      (set-buffer-file-coding-system 'unix))
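
    Putting the pieces together, an updated version of the function from the question might look like the sketch below. It is only one way to combine the conditional end-of-line switch with the original search loop; the docstring and comments are mine rather than part of the original code.

    ```elisp
    ;;; Dos to Unix conversion in buffer (sketch, not a definitive version)
    (defun dos2unix ()
      "Convert the current buffer to Unix (LF) line endings.
    Switch the buffer's end-of-line format to Unix if it is not already,
    then delete any stray carriage return that precedes a line feed."
      (interactive)
      ;; `coding-system-eol-type' returns 0 for LF, 1 for CRLF and 2 for CR
      ;; (or a vector when the type is undecided).  Only switch when needed,
      ;; so an already-Unix buffer is not marked as modified for nothing.
      (unless (eq 0 (coding-system-eol-type buffer-file-coding-system))
        (set-buffer-file-coding-system 'unix))
      ;; Handle files with inconsistent endings, where CRs are real buffer text.
      (save-excursion
        (goto-char (point-min))
        (while (search-forward "\r\n" nil t)
          (replace-match "\n" nil t))))
    ```

    The first step covers files that Emacs decoded as DOS, and the loop covers the mixed case where the CR characters are actually present in the buffer.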