How to make an incumbent variable in R


I have a dataset of electoral candidates, and I want to create a new variable that indicates whether the previous winner of a riding is running again in the same riding. In other words, I want an incumbent variable that takes the value 1 if the candidate won the previous election in that riding and is running again in the next election, and 0 if the candidate is either new or did not win the previous election. For example, in the sample of my dataset, DICKSON, JOE in AJAX-PICKERING won election_id 3900 and is back in election 4000, so he should be marked as an incumbent there. In OSHAWA, by contrast, there is never an incumbent, so all candidates get the value 0.

Here is a sample of my dataset:

structure(list(full_name = c("WILLERT, CECILE", "CARVALHO, ANDREW", 
"ASHE, KEVIN", "DICKSON, JOE", "THAVARAJASOORIER, BALA", "DELIS, ANDREW", 
"TOMAN, STEVEN", "MCCARTHY, TODD", "DICKSON, JOE", "WISEMAN, EVAN", 
"NARRAWAY, ADAM", "MCCARTHY, TODD", "DICKSON, JOE", "STEWART, KYLE", 
"KING, JERMAINE", "RHODES, BRENDA", "HALL, SARA", "RICHTER, MATT", 
"MILLER, NORM", "WATERS, CINDY", "RICHTER, MATT", "MILLER, NORM", 
"ZYGANIUK, ALEX", "RICHTER, MATT", "MILLER, NORM", "WATERS, DAN", 
"MOBBLEY, CLYDE", "STIVRINS, ANDY", "KEMP, ALEXANDER", "STREUTKER, JEFFREY", 
"OUELLETTE, JERRY", "RYAN, SID", "MENEZES, JACQUIE", "LEADBETTER, STACEY", 
"BELANGER, MATTHEW", "SHIELDS, MIKE", "FUDGE, BEN", "SMIT, BECKY", 
"FRENCH, JENNIFER"), gender = c("female", "male", "male", "male", 
"male", "male", "male", "male", "male", "male", "male", "male", 
"male", "male", "male", "female", "female", "male", "male", "female", 
"male", "male", "male", "male", "male", "male", "male", "male", 
"male", "male", "male", "male", "female", "female", "male", "male", 
"male", "female", "female"), gender_manual = c("", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", ""), gender_probability = c(1, 1, 1, 0.99, 0.92, 1, 1, 
1, 0.99, 0.97, 1, 1, 0.99, 0.99, 0.98, 0.99, 0.99, 1, 1, 1, 1, 
1, 0.87, 1, 1, 0.98, 0.97, 0.95, 1, 1, 0.98, 0.92, 1, 0.98, 1, 
1, 0.99, 1, 1), gender_count = c(427L, 5168L, 5362L, 3679L, 37L, 
5168L, 2600L, 847L, 3679L, 657L, 3957L, 847L, 3679L, 1944L, 80L, 
1816L, 4435L, 4915L, 60L, 2477L, 4915L, 60L, 5856L, 4915L, 60L, 
3240L, 36L, 3139L, 1645L, 932L, 1031L, 71L, 78L, 1034L, 3338L, 
5595L, 3363L, 1447L, 6717L), ballots = c(3067L, 368L, 13898L, 
19857L, 3275L, 299L, 843L, 14718L, 19606L, 5952L, 1589L, 14999L, 
26257L, 301L, 8274L, 9819L, 5015L, 4557L, 17348L, 6537L, 3251L, 
19417L, 6527L, 7484L, 15761L, 10158L, 4999L, 296L, 2474L, 253L, 
15977L, 13482L, 6921L, 1035L, 435L, 14316L, 147L, 1785L, 22232L
), election_date = c("2007-10-10", "2007-10-10", "2007-10-10", 
"2007-10-10", "2007-10-10", "2011-10-06", "2011-10-06", "2011-10-06", 
"2011-10-06", "2011-10-06", "2014-06-12", "2014-06-12", "2014-06-12", 
"2014-06-12", "2014-06-12", "2007-10-10", "2007-10-10", "2007-10-10", 
"2007-10-10", "2011-10-06", "2011-10-06", "2011-10-06", "2011-10-06", 
"2014-06-12", "2014-06-12", "2014-06-12", "2014-06-12", "2014-06-12", 
"2007-10-10", "2007-10-10", "2007-10-10", "2007-10-10", "2011-10-06", 
"2011-10-06", "2011-10-06", "2011-10-06", "2011-10-06", "2014-06-12", 
"2014-06-12"), election_id = c(3900L, 3900L, 3900L, 3900L, 3900L, 
4000L, 4000L, 4000L, 4000L, 4000L, 4100L, 4100L, 4100L, 4100L, 
4100L, 3900L, 3900L, 3900L, 3900L, 4000L, 4000L, 4000L, 4000L, 
4100L, 4100L, 4100L, 4100L, 4100L, 3900L, 3900L, 3900L, 3900L, 
4000L, 4000L, 4000L, 4000L, 4000L, 4100L, 4100L), party = c("GREEN", 
"FAMILY COALITION PARTY OF ONTARIO", "PROGRESSIVE CONSERVATIVE", 
"LIBERAL", "NEW DEMOCRATIC", "ONTARIO LIBERTARIAN PARTY", "GREEN", 
"PROGRESSIVE CONSERVATIVE", "LIBERAL", "NEW DEMOCRATIC", "THE GREEN PARTY OF ONTARIO", 
"PROGRESSIVE CONSERVATIVE PARTY OF ONTARIO", "ONTARIO LIBERAL PARTY", 
"ONTARIO LIBERTARIAN PARTY", "NEW DEMOCRATIC PARTY OF ONTARIO", 
"LIBERAL", "NEW DEMOCRATIC", "GREEN", "PROGRESSIVE CONSERVATIVE", 
"LIBERAL", "GREEN", "PROGRESSIVE CONSERVATIVE", "NEW DEMOCRATIC", 
"THE GREEN PARTY OF ONTARIO", "PROGRESSIVE CONSERVATIVE PARTY OF ONTARIO", 
"ONTARIO LIBERAL PARTY", "NEW DEMOCRATIC PARTY OF ONTARIO", "FREEDOM", 
"GREEN", "FAMILY COALITION PARTY OF ONTARIO", "PROGRESSIVE CONSERVATIVE", 
"NEW DEMOCRATIC", "LIBERAL", "GREEN", "ONTARIO LIBERTARIAN PARTY", 
"NEW DEMOCRATIC", "FREEDOM", "THE GREEN PARTY OF ONTARIO", "NEW DEMOCRATIC PARTY OF ONTARIO"
), party_code = c("", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", ""), party_manual = c("", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", ""), riding = c("AJAX-PICKERING", "AJAX-PICKERING", 
"AJAX-PICKERING", "AJAX-PICKERING", "AJAX-PICKERING", "AJAX-PICKERING", 
"AJAX-PICKERING", "AJAX-PICKERING", "AJAX-PICKERING", "AJAX-PICKERING", 
"AJAX-PICKERING", "AJAX-PICKERING", "AJAX-PICKERING", "AJAX-PICKERING", 
"AJAX-PICKERING", "PARRY SOUND-MUSKOKA", "PARRY SOUND-MUSKOKA", 
"PARRY SOUND-MUSKOKA", "PARRY SOUND-MUSKOKA", "PARRY SOUND-MUSKOKA", 
"PARRY SOUND-MUSKOKA", "PARRY SOUND-MUSKOKA", "PARRY SOUND-MUSKOKA", 
"PARRY SOUND-MUSKOKA", "PARRY SOUND-MUSKOKA", "PARRY SOUND-MUSKOKA", 
"PARRY SOUND-MUSKOKA", "PARRY SOUND-MUSKOKA", "OSHAWA", "OSHAWA", 
"OSHAWA", "OSHAWA", "OSHAWA", "OSHAWA", "OSHAWA", "OSHAWA", "OSHAWA", 
"OSHAWA", "OSHAWA"), riding_id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 69L, 69L, 69L, 69L, 69L, 69L, 
69L, 69L, 69L, 69L, 69L, 69L, 69L, 61L, 61L, 61L, 61L, 61L, 61L, 
61L, 61L, 61L, 61L, 61L), vote_share = c(7.57938959594711, 0.909427900654887, 
34.345730878537, 49.0720375633263, 8.09341406153466, 0.721908349026993, 
2.03534695060119, 35.5352745183254, 47.3369066589406, 14.3705635231059, 
3.09023726176585, 29.1695838195255, 51.0637884091793, 0.5853753403345, 
16.0910151691949, 26.7263670758594, 13.6503443207491, 12.403712675903, 
47.2195759274885, 18.2945259151461, 9.09828724952424, 54.3406470390686, 
18.2665397962611, 19.3395007493927, 40.7282030079074, 26.2494185746033, 
12.9179802573776, 0.7648974107189, 7.68657180140434, 0.786056049213944, 
49.6395948549059, 41.8877772944759, 30.2835389866107, 4.52874770280914, 
1.90338671567341, 62.6411131530585, 0.643213441848254, 7.43223549985427, 
92.5677645001457), won = c(FALSE, FALSE, FALSE, TRUE, FALSE, 
FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, 
FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, 
FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, 
FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE), winner = c(0, 
0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 
0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1)), row.names = c(NA, 
-39L), groups = structure(list(election_id = c(3900L, 3900L, 
3900L, 4000L, 4000L, 4000L, 4100L, 4100L, 4100L), riding_id = c(1L, 
61L, 69L, 1L, 61L, 69L, 1L, 61L, 69L), .rows = structure(list(
    1:5, 29:32, 16:19, 6:10, 33:37, 20:23, 11:15, 38:39, 24:28), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, -9L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame")) 

Thanks!


Solution

  • I think this is what you're after. I tried to select only relevant columns to verify the result (next time please share only relevant columns to focus on the problem!)

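    library(dplyr)

    # df is the data frame from the question's dput() output
    df %>%
      # one group per candidate within a riding
      group_by(riding_id, full_name) %>%
      # did this candidate win the last time they ran here? (0 on first appearance)
      mutate(incumbent = lag(winner, default = 0, order_by = election_date)) %>%
      ungroup() %>%
      ## delete this last select after verifying this works
      select(riding, election_date, full_name, winner, incumbent)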
    # # A tibble: 39 × 5
    #    riding         election_date full_name              winner incumbent
    #    <chr>          <chr>         <chr>                   <dbl>     <dbl>
    #  1 AJAX-PICKERING 2007-10-10    WILLERT, CECILE             0         0
    #  2 AJAX-PICKERING 2007-10-10    CARVALHO, ANDREW            0         0
    #  3 AJAX-PICKERING 2007-10-10    ASHE, KEVIN                 0         0
    #  4 AJAX-PICKERING 2007-10-10    DICKSON, JOE                1         0
    #  5 AJAX-PICKERING 2007-10-10    THAVARAJASOORIER, BALA      0         0
    #  6 AJAX-PICKERING 2011-10-06    DELIS, ANDREW               0         0
    #  7 AJAX-PICKERING 2011-10-06    TOMAN, STEVEN               0         0
    #  8 AJAX-PICKERING 2011-10-06    MCCARTHY, TODD              0         0
    #  9 AJAX-PICKERING 2011-10-06    DICKSON, JOE                1         1
    # 10 AJAX-PICKERING 2011-10-06    WISEMAN, EVAN               0         0
    # # … with 29 more rows
    

    This checks whether the candidate won the previous election they ran in within the same riding. There's a potential bug: if a candidate won, then skipped an election, and then ran again in the same riding, the code above would flag them as an incumbent because they won the previous time they ran, even though that was not the previous election. Correcting for that is trickier; let me know if that's a concern.
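
If the skipped-election case does matter, one possible approach (a rough sketch, not tested on your full data) is to look up the election held immediately before each one in the same riding and check whether the candidate won that specific election. The `toy` data below is made up for illustration, and the sketch assumes that `election_id` increases over time (as 3900, 4000, 4100 do in your sample), that names are spelled identically across elections, and that each riding has one winner per election.

```r
library(dplyr)

# Hypothetical toy data: A wins election 1, skips election 2, runs again in 3.
toy <- tibble(
  riding_id   = c(1, 1, 1, 1, 1, 1),
  election_id = c(1, 1, 2, 2, 3, 3),
  full_name   = c("A", "B", "B", "C", "A", "B"),
  winner      = c(1, 0, 1, 0, 0, 1)
)

# One row per riding and election, with the id of the election just before it.
elections <- toy %>%
  distinct(riding_id, election_id) %>%
  group_by(riding_id) %>%
  arrange(election_id, .by_group = TRUE) %>%
  mutate(prev_election_id = lag(election_id)) %>%
  ungroup()

# Winners of each election, keyed so they match rows of the following election.
prev_winners <- toy %>%
  filter(winner == 1) %>%
  transmute(riding_id, prev_election_id = election_id, full_name, incumbent = 1)

result <- toy %>%
  left_join(elections, by = c("riding_id", "election_id")) %>%
  left_join(prev_winners, by = c("riding_id", "prev_election_id", "full_name")) %>%
  # no match means the candidate did not win the immediately preceding election
  mutate(incumbent = coalesce(incumbent, 0)) %>%
  select(-prev_election_id)
```

In this toy example, A gets `incumbent = 0` in election 3 because B won election 2, whereas the `lag()` version would have flagged A. B gets `incumbent = 1` in election 3. On your real data you would replace `toy` with `df` (after `ungroup()`), and you could order by `election_date` instead if the ids are not guaranteed to be chronological.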