r encoding cyrillic model.matrix

Cyrillic letters encoded wrong using model.matrix


I have a data frame that contains Cyrillic (Russian) letters in both the column names and the values, and it needs to be transformed using model.matrix.

model.matrix turns these characters into Unicode escape sequences of the form <U+XXXX> (for example, <U+0442> for "т"). Is there any way to convert them back, or to avoid the conversion in the first place?

library(tibble)
x <- tribble(~"тест", ~value1, ~value2,
         "тест", 5, 10,
         "тест2", 6, 11)
m <- model.matrix(value1 ~ ., data = x)

The expected result is a model matrix in which the Cyrillic characters appear correctly, as UTF-8 text, rather than as escape sequences.


Solution

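  • The problem was solved using stringi. The gsub call rewrites each <U+XXXX> escape in the column names as \uXXXX, which stri_unescape_unicode then decodes into the actual character: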

    library(stringi)
    colnames(m) <- stri_unescape_unicode(gsub("<U\\+(....)>",
                                        "\\\\u\\1",
                                        colnames(m)))
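
For anyone who would rather not add a dependency, the same unescaping can be sketched in base R. This is not the method used above but a swapped-in alternative built on gregexpr, regmatches and intToUtf8. The helper name unescape_angle_unicode is made up for illustration, and the sketch assumes every escape has exactly four hex digits, as the pattern in the solution does.

```r
# Hypothetical helper (the name is illustrative, not from any package):
# replace each "<U+XXXX>" escape in a character vector with the
# character it denotes. Assumes exactly four hex digits per escape.
unescape_angle_unicode <- function(x) {
  pattern <- "<U\\+([0-9A-Fa-f]{4})>"
  vapply(x, function(s) {
    m <- gregexpr(pattern, s)
    esc <- regmatches(s, m)[[1]]
    # Strings without any escape are returned unchanged.
    if (length(esc) == 0) return(s)
    # Characters 4 to 7 of "<U+0442>" are the hex digits "0442".
    hex <- substr(esc, 4, 7)
    chars <- vapply(hex,
                    function(h) intToUtf8(strtoi(h, 16L)),
                    character(1), USE.NAMES = FALSE)
    # Put the decoded characters back in place of the escapes.
    regmatches(s, m) <- list(chars)
    s
  }, character(1), USE.NAMES = FALSE)
}

# Example call on strings shaped like the affected column names.
unescape_angle_unicode(c("(Intercept)", "<U+0442><U+0435><U+0441><U+0442>2"))
```

With the matrix from the question, it would presumably be applied as colnames(m) <- unescape_angle_unicode(colnames(m)). Whether the result then prints correctly may still depend on the console and locale in use.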