I'm working in R with a dataframe containing a column with escaped unicode sequences:
d <- data.frame(id = 1, norm = "m\u0350pini\u0306\u030Ds\u0313u")
I ultimately need to import the dataframe into MongoDB (I'm using Compass) so that the corresponding unicode characters are displayed correctly. I first tried saving it to a simple tab-delimited text file, but MongoDB treats the unicode column as a plain string, so I then tried saving it as JSON:
library(jsonlite)
j <- toJSON(d, dataframe = "rows", pretty = TRUE)
write(j, "jstest.json")
However, this automatically adds another backslash to the escaped sequences, giving m\\u0350pini\\u0306\\u030Ds\\u0313u, which MongoDB again does not interpret as unicode.
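To see what actually ends up in the JSON string and in the file, as opposed to what the R console shows with its own backslash escaping, one can check (using the j and jstest.json from above):
cat(j)                                                      # the JSON string without console escaping
writeLines(readLines("jstest.json", encoding = "UTF-8"))    # raw contents of the file on disk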
If I manually insert a document into MongoDB with single backslashes, the unicode symbols appear correctly, but this is very impractical for me (thousands of documents).
What am I doing wrong?
Thanks for the help.
Edit: I tried using mongoimport:
d <- data.frame(id = c(1, 2), unicode = c("m\u0350pini\u030Ds\u03131u", "a\u0350mpi\u030D\u03B7"))
js <- toJSON(d, dataframe = "rows", pretty = TRUE)
write(js, "jstest.json")
mongoimport -d test -c newcoll --type json --file jstest.json --jsonArray
However, the documents still don't display the characters:
{ "_id" : ObjectId("666b08603505c9daeb20edc7"), "id" : 1, "unicode" : "m\\u0350pini\\u030Ds\\u03131u" }
{ "_id" : ObjectId("666b08603505c9daeb20edc8"), "id" : 2, "unicode" : "a\\u0350mpi\\u030D\\u03B7" }
The only way I can get the result I want, which is exactly what @Konrad Rudolph's second comment shows, is to manually insert a document with single backslashes.
Inside R, export the data to TSV (e.g. via ‘readr’):
readr::write_tsv(d, 'd.tsv')
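write_tsv always writes UTF-8, which is the encoding mongoimport expects, so the file contains the rendered characters rather than literal \u escape sequences. As a quick sanity check, reading the d.tsv written above back into R should round-trip the characters:
readr::read_tsv('d.tsv')   # the norm column should show the combining characters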
Then use mongoimport to import the data:
mongoimport --db mydb --collection mycollection --type tsv --file d.tsv --headerline
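As an optional check from R (just a sketch: it assumes the mongolite package and a MongoDB instance on the default localhost port, with the same db/collection names as in the mongoimport call above), querying the collection should now show the characters:
library(mongolite)
m <- mongo(collection = 'mycollection', db = 'mydb')   # connects to mongodb://localhost by default
m$find(limit = 5)                                      # the norm column should display the unicode characters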