
(R) Save data (vector or data frame) with Chinese characters as UTF-8 on Windows 10


I am trying to save some data downloaded from a website that includes Chinese characters. I have tried many things with no success. The RStudio default text encoding is set to UTF-8, and in the Windows 10 region settings the option "Beta: Use Unicode UTF-8 for worldwide language support" is also enabled. Here is the code to reproduce the problem:


```
## packages used
library(jiebaR)   ## for file_coding()
library(htm2txt)  ## for gettxt()
library(httr)     ## just in case
library(readtext) ## for readtext()

## get the original text with Chinese characters
mytxtC <- gettxt("https://archive.li/wip/kRknx")

## print to check that the Chinese characters appear
mytxtC

## try to save as UTF-8
write.csv(mytxtC, "csv_mytxtC.csv", row.names = FALSE, fileEncoding = "UTF-8")

## check whether it is readable
read.csv("csv_mytxtC.csv", encoding = "UTF-8")

## doesn't work; check the file encoding
file_coding("csv_mytxtC.csv")
## answer: "windows-1252"

## try with a plain text file
write(mytxtC, "txt_mytxtC.txt")
toto <- readtext("txt_mytxtC.txt")
toto[1, 2]

## still not readable; check the file encoding again
file_coding("txt_mytxtC.txt")
## "windows-1252"
```

For information:
```
Sys.getlocale()
[1] "LC_COLLATE=French_Switzerland.1252;LC_CTYPE=French_Switzerland.1252;LC_MONETARY=French_Switzerland.1252;LC_NUMERIC=C;LC_TIME=French_Switzerland.1252"
```
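
With a CP1252 locale like this one, my understanding is that R versions before 4.2 on Windows translate strings to the native encoding before `fileEncoding` is applied, so characters that CP1252 cannot represent may be lost on the way to the file. Below is a minimal sketch of one commonly suggested workaround, assuming the data is plain lines of text: convert to UTF-8 with `enc2utf8()` and write the bytes untouched with `useBytes = TRUE`. The sample vector `txt` and the file name `utf8_demo.txt` are made up for illustration and stand in for `mytxtC`.

```r
# Hypothetical sample text; "\u4e2d\u6587" is the escaped form of two Chinese characters
txt <- c("id,text", "1,\u4e2d\u6587")

# Write to a temporary file so the demo leaves no files behind
out_file <- file.path(tempdir(), "utf8_demo.txt")

# enc2utf8() makes sure the strings are UTF-8; useBytes = TRUE asks
# writeLines() to write those bytes as-is instead of translating them
# to the native encoding first
writeLines(enc2utf8(txt), out_file, useBytes = TRUE)

# Read the file back, declaring its bytes as UTF-8
back <- readLines(out_file, encoding = "UTF-8")

# Compare the round-tripped text with the original
identical(back, txt)
```

If this behaves as intended, the comparison at the end should come out `TRUE` whatever the session locale is, but it is worth checking on your own R version.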


Solution

  • I changed the locale and it seems to be working. I just added this line at the beginning of the code: Sys.setlocale("LC_CTYPE", "chinese")

    I just need to remember to change it back afterwards. Still, I find it odd that this line makes it possible to save as UTF-8 when it was not possible before.
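
To avoid forgetting to change the locale back, the switch can be wrapped in a small helper that restores the previous value on exit. This is only a sketch: `with_ctype()` is a hypothetical helper, not part of base R, and the locale name `"chinese"` is only understood on Windows, so the demo call uses `"C"` instead.

```r
# Hypothetical helper: evaluate `expr` with a temporary LC_CTYPE,
# then restore the previous setting even if `expr` fails
with_ctype <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  Sys.setlocale("LC_CTYPE", locale)
  force(expr)  # `expr` is lazy, so it is evaluated here, under the new locale
}

# "C" is used so the demo runs on any platform;
# on Windows one would pass "chinese" as in the solution above
before <- Sys.getlocale("LC_CTYPE")
inside <- with_ctype("C", Sys.getlocale("LC_CTYPE"))
after  <- Sys.getlocale("LC_CTYPE")
```

Here `inside` holds the locale that was active while the expression ran, and `before` and `after` should match, which shows the original setting was put back.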