rcharacter-encodingwritexl

Can't open exported Excel file due to problematic text string


I have the following data:

structure(list(QB5B_2 = structure("Car les GAFA sont des sociétés Américaines et de plus les gouvernements qui composent l'Union Européenne ne sont pas d'accord entre elles sur la stratégie à adopter en ce qui les concerne . Exemple les Gafa payent des impots en Irlande car leurs si<ef>", label = "test", format.spss = "A255", display_width = 0L)), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame"))

When I look at this data in RStudios View pane, it looks like proper French text:

View(problem) shows:

enter image description here

However, when looking at the data in the console it gives me:

# A tibble: 1 x 1
  QB5B_2                                                                                                                 
  <chr>                                                                                                                  
1 "Car les GAFA sont des soci\xc3\xa9t\xc3\xa9s Am\xc3\xa9ricaines et de plus les gouvernements qui composent l'Union Eu~

So it's clear there is some character encoding problem.

Now, when I try to export the file to Excel with:

library(writexl)
write_xlsx(problem, "test.xlsx")

it does the exporting but I can't open the file in Excel and instead get an error message that a problem has been encountered. Side note: I can import the Excel file perfectly fine with e.g. readxl::read_xlsx("test.xlsx")

So two questions:


Solution

  • Something is quite strange since your input shows a double quote before the text, which usually does not happen when displaying the content of a character-column in a tibble. Look right after the "1":

    # A tibble: 1 x 1
     QB5B_2                                                                                                                 
     <chr>                                                                                                                  
    1 "Car les GAFA sont des soci\xc3\xa9t\xc3\xa9s Am\xc3\xa9ricaines et de plus les gouvernements qui composent l'Union Eu~
    

    Perhaps a solution is to reencode the variable using iconv():

    problem$QB5B_2 <- iconv(problem$QB5B_2, sub = "byte")
    problem
    # A tibble: 1 x 1
       QB5B_2                                                                                                                
       <chr>                                                                                                                 
    1 Car les GAFA sont des sociétés Américaines et de plus les gouvernements qui composent l'Union Européenne ne sont pas …
    

    Another would be to remove the first character:

    problem$QB5B_2 <- str_remove(problem$QB5B_2, pattern = "$.")
    problem
    # A tibble: 1 x 1
       QB5B_2                                                                                                                
       <chr>                                                                                                                 
    1 Car les GAFA sont des sociétés Américaines et de plus les gouvernements qui composent l'Union Européenne ne sont pas …
    

    This does not show how to avoid the issue in the first place, but it should sort you out.

    One difficulty in doing the debugging here is that dput(), which you probably used to replicate the content does not keep the problem...