r string function replace data-wrangling

Is there an R function that detects a specific string and replaces it with the value of another observation, based on a number within the string?


I am using constituency data from the 1994 German election, and some observations contain strings indicating that the value is given in a different row (following the pattern "siehe Wkr xxx", i.e. "see constituency xxx"). For example, the non-employment rate is only collected for Hamburg as a whole, so the constituency Hamburg-Altona should take the value of the observation Hamburg-Mitte.

# Mixing numbers and strings in c() coerces the whole column to character
example_data <- data.frame(constituency_no = c("001", "002", "003", "004", "005"),
                           constituency_name = c("Hamburg-Mitte", "Hamburg-Altona", "Hamburg-Nord", "Lübeck", "Pinneberg"),
                           nonemployementrate = c(0.04, "siehe Wkr 001", "siehe Wkr 001", 0.03, 0.02))

I want a function that automatically detects strings beginning with "siehe Wkr " and replaces each of them with the value from the constituency number it refers to. In the example, the function should replace the value of nonemployementrate for Hamburg-Altona and Hamburg-Nord with 0.04, because both strings refer to constituency_no "001".

result <- data.frame(constituency_no = c("001", "002", "003", "004", "005"),
                     constituency_name = c("Hamburg-Mitte", "Hamburg-Altona", "Hamburg-Nord", "Lübeck", "Pinneberg"),
                     nonemployementrate = c(0.04, 0.04, 0.04, 0.03, 0.02))

Solution

  • At the risk of overlooking something relevant, here is a base R approach using within():

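    within(example_data, {
      # rows whose value is a "siehe Wkr xxx" reference
      i = startsWith(nonemployementrate, "siehe")
      # strip the leading non-digits, then look up the referenced constituency_no
      nonemployementrate[i] = nonemployementrate[
        match(sub("\\D+", "", nonemployementrate[i]), constituency_no)]
      rm(i)  # drop the helper so it does not end up as a column
    })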
    

    giving

      constituency_no constituency_name nonemployementrate
    1             001     Hamburg-Mitte               0.04
    2             002    Hamburg-Altona               0.04
    3             003      Hamburg-Nord               0.04
    4             004            Lübeck               0.03
    5             005         Pinneberg               0.02
    

    Edit: a simple function, since you asked for one.

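    f = \(X) {  # \(X) is shorthand for function(X), available from R 4.1.0
      stopifnot(is.data.frame(X), 
                c("nonemployementrate", "constituency_no") %in% names(X))
      # rows whose value is a "siehe Wkr xxx" reference
      i = startsWith(X$nonemployementrate, "siehe")
      # row index of the constituency each reference points to
      r = match(sub("\\D+", "", X$nonemployementrate[i]), X$constituency_no)
      X$nonemployementrate[i] = X$nonemployementrate[r]
      X
    }
    f(example_data)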
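
Two things to watch for in real data: the column stays character after the replacement (whereas the desired result in the question is numeric), and startsWith() returns NA for missing values, which breaks the logical subscript in the assignment. Below is a minimal sketch of a variant that handles both. The name resolve_refs and its arguments value_col, id_col and prefix are hypothetical and not part of the answer above. It also assumes references are only one level deep, i.e. a "siehe Wkr xxx" entry never points at another "siehe Wkr xxx" entry.

```r
# Hypothetical helper: resolves "siehe Wkr xxx" references in one column
# and returns that column as numeric.
resolve_refs <- function(X,
                         value_col = "nonemployementrate",
                         id_col = "constituency_no",
                         prefix = "siehe Wkr ") {
  stopifnot(is.data.frame(X), c(value_col, id_col) %in% names(X))
  v <- as.character(X[[value_col]])
  # startsWith() gives NA for missing values; %in% TRUE maps NA to FALSE
  i <- startsWith(v, prefix) %in% TRUE
  # everything after the prefix is the referenced constituency number
  ref <- trimws(substring(v[i], nchar(prefix) + 1L))
  # row index of each referenced constituency (NA if there is no match)
  r <- match(ref, X[[id_col]])
  v[i] <- v[r]
  X[[value_col]] <- as.numeric(v)
  X
}

# Copy of the example data with one value set to NA for illustration
example_na <- data.frame(
  constituency_no = c("001", "002", "003", "004", "005"),
  constituency_name = c("Hamburg-Mitte", "Hamburg-Altona", "Hamburg-Nord",
                        "Lübeck", "Pinneberg"),
  nonemployementrate = c(0.04, "siehe Wkr 001", "siehe Wkr 001", NA, 0.02)
)
out <- resolve_refs(example_na)
str(out$nonemployementrate)
```

A reference to a constituency number that does not exist ends up as NA rather than raising an error, so it may be worth checking for unexpected NAs afterwards.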