[SOLVED] Control encoding when parsing SPSS file using package memisc

I have been given an SPSS system file that I would like to analyse using R. I am using the following code to parse the file into R:

library(memisc)
foo <- spss.system.file("foobar.sav")
bar <- subset(foo, select=c(var1,var2,var3))

Looking at the parsed data, I get the following:

> bar
Data set with 379 observations and 3 variables

var1       var2        var3
1      gut    weiblich      Herbst
2      gut mnlich      Sommer
3      gut mnlich      Sommer
4      gut mnlich      Winter
5      gut mnlich Fr�hling
6      gut mnlich Fr�hling
7      gut    weiblich Fr�hling
.
.
.
25      gut    weiblich Fr�hling
.. ........ ........... ...........
(27 of 379 observations shown)

I guess you get the idea: the umlauts are garbled. I am fairly sure that the .sav file was saved in the Latin-1 encoding. How can I tell spss.system.file() to use this encoding when parsing the SPSS file?

Solution

Thank you everyone for your help. I will answer my own question. spss.system.file() reads the strings contained in SPSS files as-is, without any translation, so the resulting strings carry no encoding information. However, the memisc package provides a function Iconv() that does for the imported object what the Unix tool iconv does for text: it converts the strings from one encoding to another.

> library(memisc)
> foo <- spss.system.file("foobar.sav")
> foo <- Iconv(foo,from="Latin1",to="UTF-8")
> foo <- as.data.frame(as.data.set(foo))
> head(foo$Geschlecht)
[1] weiblich männlich männlich männlich männlich männlich
Levels: männlich weiblich
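
To see why the conversion is needed, here is a minimal sketch in base R only (no memisc, no .sav file). It assumes the problem is what is described above: Latin-1 bytes that R holds without any encoding mark. The byte vector is written out by hand purely for illustration; it is not taken from the original file.

```r
# Latin-1 bytes for "männlich", written by hand so the example does not
# depend on the encoding of this script (0xE4 is "ä" in Latin-1).
latin1_bytes <- as.raw(c(0x6d, 0xe4, 0x6e, 0x6e, 0x6c, 0x69, 0x63, 0x68))
x <- rawToChar(latin1_bytes)

Encoding(x)    # "unknown": the string carries no encoding information
validUTF8(x)   # FALSE: a lone 0xE4 byte is not valid UTF-8

# Translate the bytes from Latin-1 to UTF-8 with base R's iconv()
y <- iconv(x, from = "latin1", to = "UTF-8")

Encoding(y)    # "UTF-8"
validUTF8(y)   # TRUE
charToRaw(y)   # 6d c3 a4 6e 6e 6c 69 63 68 ("ä" is now the two bytes c3 a4)
```

In a UTF-8 session, printing x typically shows a replacement character or an escape in place of the umlaut, much like the output in the question, while y prints correctly.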

All the best.