
Encoding issue in .Rprofile at the startup of R


I use R (v3.5.1) on Windows 10, and there is a .Rprofile file in my working directory. The file contains non-ASCII letters and is saved in UTF-8. At startup, the non-ASCII letters come out garbled. For example, the code:

nth <- Sys.setlocale(locale = "Lithuanian")
print("Ą Ę Ė Į Š Č Ų")

when run at startup, prints:

[1] "Ä„ Ä\230 Ä– Ä® Å  Ä\214 Ų"

My questions are:

  1. Is it possible to configure R so that it sources .Rprofile as UTF-8 at startup?
  2. Is there another way to get non-ASCII letters encoded correctly at the startup?

Solution

  • Lots of possible answers:

    R will source .Rprofile using the current code page. I don't know what encoding locale "Lithuanian" implies, but if you saved the file in that encoding instead of UTF-8, it might work. (I'm not certain you can change the code page during an R session though.)
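
    As a rough illustration of what reading the file "using the current code page" does, the sketch below takes the first letter from the question, looks at its UTF-8 bytes, and reinterprets them as a single-byte code page. It assumes the "Lithuanian" locale maps to Windows-1257 (the Baltic code page; that is my guess, not something confirmed here) and that the local iconv accepts the name "CP1257". If the guess is right, the two bytes should come back as U+00C4 and U+201E, which is the "Ä„" pair in the question's output.

```r
# The first letter from the question, written as an escape so that this
# snippet itself is pure ASCII.
a_ogonek <- "\u0104"

# Its UTF-8 representation is the two bytes c4 84.
utf8_bytes <- charToRaw(enc2utf8(a_ogonek))
print(utf8_bytes)

# Reinterpret those same bytes as Windows-1257 (an assumed code page) and
# convert the result to UTF-8 so the misread characters can be inspected.
misread <- iconv(rawToChar(utf8_bytes), from = "CP1257", to = "UTF-8")
print(as.hexmode(utf8ToInt(misread)))

# What the running session actually treats as its native encoding.
str(l10n_info())
print(Sys.getlocale("LC_CTYPE"))
```

    On Windows, l10n_info() also reports the code page in use, which tells you which encoding the file would have to be saved in for this approach.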

    Every now and then I see that Windows claims to have a UTF-8 code page; maybe you can get that to work.

    If that fails, you could switch to a different OS that has proper UTF-8 support (Linux, macOS, etc.).

    Maybe you could set up two files: a pure ASCII .Rprofile that sources a second file, declaring the second file to be UTF-8. For example, put this in your .Rprofile:

    source(".RprofileUTF8.R", encoding="UTF-8")
    

    However, I have to warn you I couldn't get this to work.
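
    One way to narrow down where this goes wrong is to check whether the second file's bytes are being read as UTF-8 at all, separately from source(). The sketch below is only a diagnostic, not a fix: it writes a hypothetical stand-in for .RprofileUTF8.R to a temporary file, reads it back with the encoding declared, and inspects the result. The temporary file and the variable names are made up for the illustration.

```r
# Hypothetical stand-in for .RprofileUTF8.R, written as raw UTF-8 bytes.
# The content uses \u escapes so this snippet itself stays pure ASCII.
path <- tempfile(fileext = ".R")
writeLines(enc2utf8("x <- \"\u0104 \u0118\""), path, useBytes = TRUE)

# Declare the encoding on read. The strings are marked as UTF-8;
# their bytes are not converted.
lines <- readLines(path, encoding = "UTF-8")
print(Encoding(lines))

# If the bytes were read correctly, the code points U+0104 (260) and
# U+0118 (280) should be present.
print(utf8ToInt(lines))

unlink(path)
```

    If the code points look right here but the sourced file still prints garbage, the damage is happening when the text is parsed or printed in the native code page rather than when it is read.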

    You could use \uxxxx escapes for the non-ASCII characters. You can find the code points with code like

    as.hexmode(utf8ToInt("Ą Ę Ė Į Š Č Ų"))
    

    That shows

    [1] "104" "020" "118" "020" "116" "020" "12e" "020" "160" "020" "10c" "020" "172"
    

    so an equivalent string is "\u104 \u118 \u116 \u12e \u160 \u10c \u172". For me, putting this in the .Rprofile worked in a Windows session.
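
    If there are many such strings, the conversion can be scripted. The helper below is a minimal sketch (to_u_escapes is a name I made up, not a base R function): it leaves ASCII characters alone and replaces everything else with a zero-padded escape. Padding to four digits matters because \u reads up to four hex digits, so "\u104a" would be taken as U+104A rather than U+0104 followed by "a".

```r
# Hypothetical helper: turn a string into pure ASCII source text by
# replacing every non-ASCII character with a \uXXXX (or \UXXXXXXXX) escape.
to_u_escapes <- function(x) {
  cps   <- utf8ToInt(enc2utf8(x))            # Unicode code points
  chars <- intToUtf8(cps, multiple = TRUE)   # one string per character
  esc   <- ifelse(cps > 0xFFFF,
                  sprintf("\\U%08x", cps),   # beyond the BMP: 8 hex digits
                  sprintf("\\u%04x", cps))   # otherwise: 4 hex digits
  paste(ifelse(cps < 128L, chars, esc), collapse = "")
}

# Run this in a session where the letters display correctly, then paste
# the printed text between quotes in .Rprofile.
cat(to_u_escapes("\u0104 \u0118 \u0116"), "\n")   # prints \u0104 \u0118 \u0116
```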