I have a data frame with Cyrillic (Russian) letters in both the column names and the values, and it needs to be transformed using model.matrix.
model.matrix turns these letters into Unicode escape sequences such as <U+0442>. Is there any way to convert them back, or to avoid the conversion in the first place?
library(tibble)
x <- tribble(~"тест", ~value1, ~value2,
"тест", 5, 10,
"тест2", 6, 11)
m <- model.matrix(value1 ~ ., data = x)
The expected result is a model matrix whose column names contain the original characters in UTF-8, not escape sequences.
The problem was solved using stringi:
library(stringi)
# Rewrite each "<U+XXXX>" as "\uXXXX", then let stringi decode the escapes.
colnames(m) <- stri_unescape_unicode(gsub("<U\\+(....)>",
                                          "\\\\u\\1",
                                          colnames(m)))
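If adding a dependency is not an option, the same decoding can probably be done in base R. The following is only a sketch: unescape_u is a hypothetical helper name (not part of any package), and it assumes the escapes always have the form <U+XXXX> with four to six hex digits.

```r
# Hypothetical helper: decode "<U+XXXX>" escapes back into UTF-8 characters.
# Assumes every escape is "<U+" followed by 4-6 hex digits and ">".
unescape_u <- function(x) {
  pattern <- "<U\\+[0-9A-Fa-f]{4,6}>"
  matches <- gregexpr(pattern, x)
  regmatches(x, matches) <- lapply(regmatches(x, matches), function(tokens) {
    # Strip the "<U+" prefix and ">" suffix, leaving only the hex code point.
    hex <- gsub("^<U\\+|>$", "", tokens)
    # Convert each hex code point to its UTF-8 character.
    vapply(hex, function(h) intToUtf8(strtoi(h, 16L)),
           character(1), USE.NAMES = FALSE)
  })
  x
}

escaped <- c("<U+0442><U+0435><U+0441><U+0442>2", "value2")
decoded <- unescape_u(escaped)
```

With the matrix from the question this would be applied as colnames(m) <- unescape_u(colnames(m)). Strings without any escape are passed through unchanged.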