I have been given an SPSS system file that I would like to analyse using R. I am using the following code to read the file into R.
library(memisc)
foo <- spss.system.file("foobar.sav")
bar <- subset(foo, select=c(var1,var2,var3))
When I look at the parsed data, I get the following:
> bar
Data set with 379 observations and 3 variables
var1 var2 var3
1 gut weiblich Herbst
2 gut mnlich Sommer
3 gut mnlich Sommer
4 gut mnlich Winter
5 gut mnlich Fr�hling
6 gut mnlich Fr�hling
7 gut weiblich Fr�hling
.
.
.
25 gut weiblich Fr�hling
.. ........ ........... ...........
(27 of 379 observations shown)
I guess you get the idea: the umlauts are not displayed correctly. I am fairly sure that the .sav file was saved using the latin1 encoding. How can I tell spss.system.file() to use this encoding when parsing the SPSS file?
Thank you, everyone, for your help. I am answering my own question. spss.system.file() reads the strings contained in SPSS files as-is, without any translation, so the resulting strings carry no encoding information. The memisc package, however, provides a function Iconv() that converts them in the same way the Unix tool iconv would.
> library(memisc)
> foo <- spss.system.file("foobar.sav")
> foo <- Iconv(foo,from="Latin1",to="UTF-8")
> foo <- as.data.frame(as.data.set(foo))
> head(foo$Geschlecht)
[1] weiblich männlich männlich männlich männlich männlich
Levels: männlich weiblich
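For anyone wondering what the conversion does at the string level, here is a minimal sketch using only base R's iconv() rather than memisc. The byte sequence is made up for illustration (it is not taken from the actual .sav file), and it assumes that the strings really are Latin-1.

```r
# Build the Latin-1 bytes for "männlich" by hand: 0xe4 is "ä" in Latin-1.
# (Illustrative bytes only, not read from the real file.)
latin1_bytes <- as.raw(c(0x6d, 0xe4, 0x6e, 0x6e, 0x6c, 0x69, 0x63, 0x68))

# rawToChar() yields a string with no declared encoding,
# much like the strings read as-is from the SPSS file.
raw_string <- rawToChar(latin1_bytes)

# Re-encode from Latin-1 to UTF-8; the result is marked as UTF-8.
fixed <- iconv(raw_string, from = "latin1", to = "UTF-8")

Encoding(fixed)   # "UTF-8"
```

After the conversion, the string compares equal to "m\u00e4nnlich", which is why the factor levels above print correctly in a UTF-8 session.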
All the best.