
Reading an RData file with a different encoding


I have some .RData files to read on my Linux (UTF-8) machine, and I know they are in Latin-1 because I created them myself on Windows. Unfortunately, I don't have access to the original source files or to a Windows machine, so I need to read these files on Linux.

To read an RData file, the normal procedure is to run load("file.Rdata"). Functions such as read.csv have an encoding argument that you can use to solve this kind of issue, but load has no such argument. If I try load("file.Rdata", encoding = "latin1"), I just get this (expected) error:

Error in load("file.Rdata", encoding = "latin1") : unused argument (encoding = "latin1")

What else can I do? My files contain text variables with accented characters, and these appear corrupted when loaded in a UTF-8 environment.
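A quick way to confirm that this is the problem is to check the encoding mark and the byte validity of the strings. The sketch below assumes the loaded strings still hold their original Latin-1 bytes but carry no encoding mark ("unknown"), so a UTF-8 session misreads them; the string x and the helper find.bad.cols are hypothetical stand-ins, not part of the original files.

```r
# Hypothetical stand-in for a string loaded from the Windows-created .RData:
# "café" stored as Latin-1 bytes (0xE9 is "é" in Latin-1).
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

Encoding(x)    # "unknown": no mark, so R assumes the native (UTF-8) encoding
validUTF8(x)   # FALSE: a lone 0xE9 byte is not a valid UTF-8 sequence
print(x)       # typically shows "caf\xe9" in a UTF-8 session instead of "café"

# Hypothetical helper: names of character columns containing invalid UTF-8
find.bad.cols <- function(df) {
  bad <- vapply(df, function(v) is.character(v) && !all(validUTF8(v), na.rm = TRUE),
                logical(1))
  names(df)[bad]
}

df <- data.frame(city = x, n = 1L, stringsAsFactors = FALSE)
find.bad.cols(df)   # "city"
```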


Solution

  • Thanks to 42's comment, I've managed to write a function that fixes the encoding of a loaded data frame:

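    fix.encoding <- function(df, originalEncoding = "latin1") {
      # Encoding<- only accepts character vectors: re-mark character columns and
      # factor levels, leaving numeric, logical, etc. columns untouched.
      for (col in seq_along(df)) {
        if (is.character(df[[col]])) Encoding(df[[col]]) <- originalEncoding
        else if (is.factor(df[[col]])) Encoding(levels(df[[col]])) <- originalEncoding
      }
      return(df)
    }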
    

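    The key step is the assignment Encoding(...) <- "latin1", which declares that the strings in a column of the data frame are Latin-1 encoded. It does not change the bytes themselves; it only tells R how to interpret them, so the accented characters display correctly in a UTF-8 session. Because Encoding<- works on a single character vector at a time, I had to write a function that loops over all the columns of the data frame and applies the mark to each one.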
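To illustrate the difference between marking and converting, here is a small sketch using a hypothetical Latin-1 string: Encoding<- leaves the bytes alone, while enc2utf8 (or iconv) actually rewrites them as UTF-8. Converting is optional, but it can be useful if the data will later be written out or passed to code that expects real UTF-8 bytes.

```r
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))  # "café" as Latin-1 bytes

Encoding(x) <- "latin1"   # only attaches a mark; the bytes are unchanged
charToRaw(x)              # 63 61 66 e9

y <- enc2utf8(x)          # re-encodes the marked string into UTF-8 bytes
charToRaw(y)              # 63 61 66 c3 a9
Encoding(y)               # "UTF-8"

# iconv() does the same conversion when the source encoding is given explicitly
z <- iconv(x, from = "latin1", to = "UTF-8")
```

Note that purely ASCII strings are never marked, so Encoding() keeps reporting "unknown" for them; that is expected and harmless.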

    Of course, if your problem is in just a couple of columns, you're better off applying Encoding to those columns only instead of the whole data frame (you can modify the function above to take a set of columns as input). Also, if you're facing the inverse problem, i.e. reading an R object created on Linux or macOS into Windows, you should use originalEncoding = "UTF-8".
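One possible way to do both things mentioned above (restrict the fix to chosen columns, and work around load() having no encoding argument) is sketched below. Both function names are hypothetical. The loader relies only on load(file, envir = ...) returning the names of the objects it restored, and it assumes every data frame in the file should be re-marked with the same encoding.

```r
# Hypothetical variant: re-mark only the given columns (names or positions).
fix.encoding.cols <- function(df, cols, originalEncoding = "latin1") {
  for (col in cols) {
    # Encoding<- only accepts character vectors, so other types are skipped
    if (is.character(df[[col]])) Encoding(df[[col]]) <- originalEncoding
  }
  df
}

# Hypothetical loader: restore an .RData file into a fresh environment,
# re-mark the character columns of every data frame found there, and
# return all restored objects as a named list.
load.with.encoding <- function(file, originalEncoding = "latin1") {
  env <- new.env()
  objs <- load(file, envir = env)   # load() returns the restored object names
  for (nm in objs) {
    obj <- get(nm, envir = env)
    if (is.data.frame(obj)) {
      assign(nm, fix.encoding.cols(obj, names(obj), originalEncoding), envir = env)
    }
  }
  as.list(env)
}
```

Usage would look like res <- load.with.encoding("file.Rdata"), after which each data frame is available as res$<name>. Returning a list rather than writing into the global environment keeps the loaded objects from silently overwriting existing variables.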