I am using constituency data from the 1994 German election. Some observations contain strings indicating that the value is given in a different row (following the scheme "siehe Wkr xxx", i.e. "see constituency xxx"). For example, the non-employment rate for Hamburg-Altona is only collected for Hamburg as a whole, so the constituency Hamburg-Altona should take the value of the observation Hamburg-Mitte.
example_data <- data.frame(
  constituency_no = c("001", "002", "003", "004", "005"),
  constituency_name = c("Hamburg-Mitte", "Hamburg-Altona", "Hamburg-Nord", "Lübeck", "Pinneberg"),
  nonemployementrate = c(0.04, "siehe Wkr 001", "siehe Wkr 001", 0.03, 0.02))
I want a function that automatically detects strings beginning with "siehe Wkr " and replaces each one with the value from the constituency number it refers to. In the example, nonemployementrate for Hamburg-Altona and Hamburg-Nord should become 0.04, because both strings refer to constituency_no "001". The desired result:
result <- data.frame(
  constituency_no = c("001", "002", "003", "004", "005"),
  constituency_name = c("Hamburg-Mitte", "Hamburg-Altona", "Hamburg-Nord", "Lübeck", "Pinneberg"),
  nonemployementrate = c(0.04, 0.04, 0.04, 0.03, 0.02))
At the risk of overlooking something relevant, here is a base R approach using within():
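within(example_data, {
  # rows whose value is a reference such as "siehe Wkr 001"
  i = startsWith(nonemployementrate, "siehe")
  # strip the leading non-digits, find the referenced row, copy its value
  nonemployementrate[i] = nonemployementrate[
    match(sub("\\D+", "", nonemployementrate[i]), constituency_no)]
  rm(i)  # drop the helper so it does not become a column
})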
giving
  constituency_no constituency_name nonemployementrate
1             001     Hamburg-Mitte               0.04
2             002    Hamburg-Altona               0.04
3             003      Hamburg-Nord               0.04
4             004            Lübeck               0.03
5             005         Pinneberg               0.02
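To make the lookup in the within() call easier to follow, here is a small sketch of the same steps on plain vectors. The names x, ids, ref and pos are only illustrative and do not appear in the original data.

```r
# Illustrative vectors mirroring the two columns of example_data
x   <- c("0.04", "siehe Wkr 001", "siehe Wkr 001", "0.03", "0.02")
ids <- c("001", "002", "003", "004", "005")

i   <- startsWith(x, "siehe")   # FALSE TRUE TRUE FALSE FALSE
ref <- sub("\\D+", "", x[i])    # "001" "001": the leading non-digits are removed
pos <- match(ref, ids)          # 1 1: rows holding the referenced constituencies
x[pos]                          # "0.04" "0.04": the values to copy in
```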
Edit: a simple function, since you asked for one.
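f = \(X) {
  # fail early if X is not a data frame or lacks the two columns used below
  stopifnot(is.data.frame(X),
            c("nonemployementrate", "constituency_no") %in% names(X))
  # rows whose value is a reference such as "siehe Wkr 001"
  i = startsWith(X$nonemployementrate, "siehe")
  # row positions of the referenced constituencies
  r = match(sub("\\D+", "", X$nonemployementrate[i]), X$constituency_no)
  X$nonemployementrate[i] = X$nonemployementrate[r]
  X
}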
f(example_data)
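One thing to be aware of: because c() mixes numbers and strings in example_data, the column is character, and f() leaves it as character after the replacement. If a numeric column is wanted, as in the desired result, a variant along the following lines might work. This is only a sketch: resolve_refs and its arguments col and key are hypothetical names, and it assumes references are never chained (a reference pointing at another reference would end up as NA with a coercion warning).

```r
# Hypothetical variant of f(): resolves the references, then converts to numeric
resolve_refs <- function(X, col = "nonemployementrate", key = "constituency_no") {
  stopifnot(is.data.frame(X), c(col, key) %in% names(X))
  v <- as.character(X[[col]])
  # startsWith() returns NA for missing values; %in% TRUE turns those into FALSE
  i <- startsWith(v, "siehe Wkr ") %in% TRUE
  # keep only the digits of each reference and find the row it points to
  r <- match(gsub("\\D+", "", v[i]), X[[key]])
  v[i] <- v[r]
  X[[col]] <- as.numeric(v)
  X
}

example_data <- data.frame(
  constituency_no = c("001", "002", "003", "004", "005"),
  constituency_name = c("Hamburg-Mitte", "Hamburg-Altona", "Hamburg-Nord", "Lübeck", "Pinneberg"),
  nonemployementrate = c(0.04, "siehe Wkr 001", "siehe Wkr 001", 0.03, 0.02))

out <- resolve_refs(example_data)
str(out$nonemployementrate)
```

Passing the column names as arguments is just a convenience so the same helper could be reused for other columns that follow the "siehe Wkr xxx" scheme.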