I am trying to save data downloaded from a website that includes Chinese characters, and nothing I have tried works. The RStudio default text encoding is set to UTF-8, and the Windows 10 region option "Beta: Use Unicode UTF-8 for worldwide language support" is enabled. Here is the code to reproduce the problem:
```
##package used
library(jiebaR) ##here for file_coding
library(htm2txt) ## to get the text
library(httr) ## just in case
library(readtext)
##get original text with chinese character
mytxtC <- gettxt("https://archive.li/wip/kRknx")
##print to check that chinese characters appear
mytxtC
##try to save in UTF-8
write.csv(mytxtC, "csv_mytxtC.csv", row.names = FALSE, fileEncoding = "UTF-8")
##check if it is readable
read.csv("csv_mytxtC.csv", encoding = "UTF-8")
##doesn't work, check file encoding
file_coding("csv_mytxtC.csv")
## answer: "windows-1252"
##try with txt
write(mytxtC, "txt_mytxtC.txt")
toto <- readtext("txt_mytxtC.txt")
toto[1,2]
##still not, try file_coding
file_coding("txt_mytxtC.txt")
## "windows-1252"
```
For information:
```
Sys.getlocale()
[1] "LC_COLLATE=French_Switzerland.1252;LC_CTYPE=French_Switzerland.1252;LC_MONETARY=French_Switzerland.1252;LC_NUMERIC=C;LC_TIME=French_Switzerland.1252"
```
I changed the locale with `Sys.setlocale` and it now seems to be working.
I just added this line at the beginning of the code:
Sys.setlocale("LC_CTYPE","chinese")
I just need to remember to change it back afterwards. Still, I find it odd that this line makes it possible to save as UTF-8 when it was not possible before.
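
Since the fix depends on remembering to restore the locale, one option is to wrap the switch in a small helper that changes `LC_CTYPE` only while an expression runs. This is a minimal sketch, not a definitive solution: `with_ctype` is a hypothetical helper name (it is not part of base R or of the packages loaded above), and `"chinese"` is a Windows locale name that will probably not resolve on other platforms.

```r
# Hypothetical helper: evaluate `expr` under a temporary LC_CTYPE and
# restore the previous value afterwards, even if `expr` raises an error.
with_ctype <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)

  # Sys.setlocale() returns "" (with a warning) when the request fails.
  if (identical(Sys.setlocale("LC_CTYPE", locale), "")) {
    stop("locale not available on this system: ", locale)
  }

  # `expr` is a lazy argument, so it is only evaluated here,
  # after the locale has been switched.
  force(expr)
}

# Usage on Windows, assuming `mytxtC` from the code above:
# with_ctype("chinese",
#            write.csv(mytxtC, "csv_mytxtC.csv",
#                      row.names = FALSE, fileEncoding = "UTF-8"))
```

Because the restore is registered with `on.exit`, the original locale comes back as soon as the call returns, so there is nothing to undo by hand.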