
How to save a data frame as a .csv file with UTF-8 encoding and LF line endings in R using RStudio?


I came across this weird situation:

I need to save a data frame to a .csv file with UTF-8 encoding and LF line endings. I'm using the latest versions of R and RStudio on a Windows 10 machine.

My first, naive attempt was:

write.csv(df, "filename.csv", fileEncoding="UTF-8", eol="\n")

Checking with Notepad++, the encoding appears to be UTF-8, but the line endings are CRLF, not LF. To double-check, I opened the file in Notepad: surprise, surprise, according to Notepad the encoding is ANSI. At this point I'm confused.
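Since each editor guesses the encoding and may normalize line endings when displaying a file, it can help to inspect the raw bytes from R itself. The following is a minimal sketch; `inspect_file` is a hypothetical helper name, not part of base R. It counts CRLF pairs and bare LF line endings, and reports whether the bytes are valid UTF-8 and whether they are pure ASCII (in which case the UTF-8 and ANSI encodings of the file are byte-for-byte identical).

```r
# Hypothetical helper: look at a file's bytes instead of trusting an editor's guess.
inspect_file <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  cr <- as.raw(0x0d)  # carriage return, "\r"
  lf <- as.raw(0x0a)  # line feed, "\n"
  lf_pos <- which(bytes == lf)
  # An LF belongs to a CRLF pair when the byte right before it is a CR.
  prev_is_cr <- vapply(lf_pos, function(i) i > 1 && bytes[i - 1] == cr, logical(1))
  list(
    crlf       = sum(prev_is_cr),                 # Windows-style line endings
    lf_only    = sum(!prev_is_cr),                # Unix-style line endings
    valid_utf8 = validUTF8(rawToChar(bytes)),     # TRUE for pure ASCII as well
    ascii_only = all(as.integer(bytes) < 128)     # if TRUE, UTF-8 and ANSI coincide
  )
}
```

For a file produced by the naive call above on Windows, `inspect_file("filename.csv")` should report a non-zero `crlf` count and `lf_only` equal to 0.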

After reading the documentation for write.csv, I found this note:

CSV files do not record an encoding

I'm not an expert on the topic, so I decided to fall back to saving the file as a plain .txt with write.table:

write.table(df, "filename.txt", fileEncoding="UTF-8", eol="\n")

Again, the same result as above, with no change whatsoever. I also tried the plain calls

write.csv(df, "filename.csv")
write.table(df, "filename.txt")

without specifying an encoding, but nothing changed. Then I set RStudio's default text encoding to UTF-8 and its line ending conversion to LF (Tools > Global Options > Code > Saving) and ran the tests again. No change. What am I missing?


Solution

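  • This is an odd one, at least for me. Nonetheless, reading the documentation for write.table, I found the solution. On Windows, when write.table (and hence write.csv) is given a file name, it opens the file in text mode, which translates every "\n" into "\r\n" (CRLF), so eol="\n" alone is not enough. To save files Unix-style, you have to open a binary connection to the file and then write using the desired eol: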

    f <- file("filename.csv", "wb")   # "wb": write in binary mode, so "\n" is not turned into "\r\n"
    write.csv(df, file=f, eol="\n")   # LF-only line endings
    close(f)
    

    As far as the UTF-8 encoding is concerned, the global settings should work fine.

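    You can check that the line endings are LF with Notepad++. UTF-8 is harder to verify: on Linux, isutf8 (from moreutils) reports that the files are indeed UTF-8, but Windows Notepad disagrees and offers ANSI when saving. If the data contain only ASCII characters, both are right: such a file is byte-for-byte identical in UTF-8 and ANSI, and since a CSV file does not record its encoding, Notepad can only guess.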
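The binary-connection approach above can be wrapped in a small helper so that the connection is always closed, even if writing fails. This is a sketch rather than a base R function: `write_csv_lf` is a hypothetical name. It assumes the strings in the data frame are already in the encoding you want on disk. The documentation says `fileEncoding` only applies when `file` is a file name, not a connection, so the text should come out in R's native encoding, which is UTF-8 on R 4.2 or later on recent Windows 10 builds and in most Linux and macOS locales.

```r
# Hypothetical helper wrapping the binary-connection approach.
write_csv_lf <- function(df, path, ...) {
  con <- file(path, open = "wb")     # binary mode: no "\n" -> "\r\n" translation on Windows
  on.exit(close(con), add = TRUE)    # close the connection even if write.csv() fails
  # fileEncoding would be ignored here because `con` is a connection, not a file name.
  write.csv(df, file = con, eol = "\n", ...)
  invisible(path)
}
```

For example, `write_csv_lf(df, "filename.csv", row.names = FALSE)` writes the data frame without row names and with LF-only line endings; any other `write.csv` argument except `eol` can be passed through `...`.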