I am using the foreign package to read in 8 SPSS files. When they are read in, some are re-encoded with UTF-8 and some with CP1252.
In my R script I want to compare an SPSS level with a piece of text. The test fails because of the "wrong" kind of dash.
> "Not working - long term sick or disabled" == "Not working – long term sick or disabled"
[1] FALSE
> "-" == "–"
[1] FALSE
Every time I re-open the R script in RStudio I have to change the dashes back to the longer versions. Can I save the R script so that the dashes are consistent with the levels in the SPSS file text?
> getOption("encoding")
[1] "native.enc"
Find out which character you are dealing with:
Unicode::as.u_char(utf8ToInt("-"))
#[1] U+002D
Unicode::as.u_char(utf8ToInt("–"))
#[1] U+2013
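If the Unicode package is not installed, base R can give the same information. The sketch below is only an illustration: `lvl` is a made-up stand-in for one of the SPSS levels, and it assumes the string can be converted to UTF-8 with `enc2utf8()`.

```r
# Hypothetical level text, written with an escape so this example is pure ASCII
lvl <- "Not working \u2013 long term sick or disabled"

# Code points of every character in the string
codes <- utf8ToInt(enc2utf8(lvl))

# Keep only the non-ASCII ones and print them in U+XXXX form
non_ascii <- codes[codes > 127L]
sprintf("U+%04X", non_ascii)
#[1] "U+2013"
```

Running this on the actual level (for example an element of `levels()` of the factor) should show whether it contains U+2013 or something else.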
Then use that in your script for comparisons:
"-" == "\u002D"
#[1] TRUE
"\u2013" == "–"
#[1] TRUE
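Writing the dash as an escape also means the script itself contains only ASCII characters, so it should no longer matter which encoding the editor uses when it saves or re-opens the file. A minimal sketch, where `lvl`, `target` and `normalize_dash()` are hypothetical names used only for illustration:

```r
# Stand-in for a level read from the SPSS file (contains an en dash, U+2013)
lvl <- "Not working \u2013 long term sick or disabled"

# Option 1: spell the dash as an escape in the comparison text
target <- "Not working \u2013 long term sick or disabled"
lvl == target

# Option 2: replace en and em dashes with a plain hyphen before comparing
normalize_dash <- function(x) {
  x <- enc2utf8(x)                           # assumes x is correctly marked
  x <- gsub("\u2013", "-", x, fixed = TRUE)  # en dash
  gsub("\u2014", "-", x, fixed = TRUE)       # em dash
}
normalize_dash(lvl) == "Not working - long term sick or disabled"
```

Option 2 is more forgiving if the 8 files do not all use the same dash, but it changes the level text, so it is probably best applied to a copy used only for the comparison.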