I am having trouble with pattern matching within a data frame. I am working with the grepl function in R.
I have a data frame of 5 local districts in two years (2000 and 2001). I want to check whether each district's ruling party or parties align with the party or parties ruling nationally.
My model should assume that there is alignment when at least one of the nationally ruling parties also rules at the district level.
My initial data looks like this
df.1 <- data.frame(district = rep(c(1000:1004), times = 2),
                   district.party = rep(c("PartyA-PartyB", "PartyA", "PartyB", "PartyC", "PartyA-PartyC"), times = 2),
                   year = rep(2000:2001, each = 5),
                   national.party = rep(c("PartyA|PartyB", "PartyA"), each = 5))
> df.1
district district.party year national.party
1 1000 PartyA-PartyB 2000 PartyA|PartyB
2 1001 PartyA 2000 PartyA|PartyB
3 1002 PartyB 2000 PartyA|PartyB
4 1003 PartyC 2000 PartyA|PartyB
5 1004 PartyA-PartyC 2000 PartyA|PartyB
6 1000 PartyA-PartyB 2001 PartyA
7 1001 PartyA 2001 PartyA
8 1002 PartyB 2001 PartyA
9 1003 PartyC 2001 PartyA
10 1004 PartyA-PartyC 2001 PartyA
Ideally, I want my new data frame to look like this
df.1.neat <- data.frame(district= rep(c(1000:1004), times=2),
district.party= rep(c("PartyA-PartyB", "PartyA", "PartyB", "PartyC", "PartyA-PartyC"), times=2),
year= rep(2000:2001, each=5),
national.party= rep(c("PartyA|PartyB", "PartyA"), each=5),
alignment= c("TRUE", "TRUE", "TRUE", "FALSE", "TRUE", "TRUE", "TRUE", "FALSE", "FALSE", "TRUE"))
> df.1.neat
district district.party year national.party alignment
1 1000 PartyA-PartyB 2000 PartyA|PartyB TRUE
2 1001 PartyA 2000 PartyA|PartyB TRUE
3 1002 PartyB 2000 PartyA|PartyB TRUE
4 1003 PartyC 2000 PartyA|PartyB FALSE
5 1004 PartyA-PartyC 2000 PartyA|PartyB TRUE
6 1000 PartyA-PartyB 2001 PartyA TRUE
7 1001 PartyA 2001 PartyA TRUE
8 1002 PartyB 2001 PartyA FALSE
9 1003 PartyC 2001 PartyA FALSE
10 1004 PartyA-PartyC 2001 PartyA TRUE
I am using grepl and dplyr:
df.1.neat.OP <- df.1 %>%
  mutate(alignment = grepl(national.party, district.party))
> df.1.neat.OP
district district.party year national.party alignment
1 1000 PartyA-PartyB 2000 PartyA|PartyB TRUE
2 1001 PartyA 2000 PartyA|PartyB TRUE
3 1002 PartyB 2000 PartyA|PartyB TRUE
4 1003 PartyC 2000 PartyA|PartyB FALSE
5 1004 PartyA-PartyC 2000 PartyA|PartyB TRUE
6 1000 PartyA-PartyB 2001 PartyA TRUE
7 1001 PartyA 2001 PartyA TRUE
8 1002 PartyB 2001 PartyA TRUE
9 1003 PartyC 2001 PartyA FALSE
10 1004 PartyA-PartyC 2001 PartyA TRUE
Note how my command works well for the year 2000 but computes the wrong outcome for district 1002 in 2001. There are loads of mistakes like this in my wider data frame.
Any suggestions?
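grepl() is not the right function for this use case: it is not vectorized over its pattern argument, so when it is given a whole column of patterns it uses only the first one (here "PartyA|PartyB") for every row and emits a warning. That is why district 1002 wrongly matches in 2001. A native tidyverse solution using stringr::str_detect(), which is vectorized over both the string and the pattern: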
library(dplyr)
library(stringr)
df.1 <- data.frame(district = rep(c(1000:1004), times=2),
district.party = rep(c("PartyA-PartyB", "PartyA", "PartyB", "PartyC", "PartyA-PartyC"), times=2),
year = rep(2000:2001, each=5),
national.party = rep(c("PartyA|PartyB", "PartyA"), each=5))
df.1.neat <- df.1 %>%
  mutate(alignment = str_detect(district.party, national.party))
df.1.neat
# district district.party year national.party alignment
# 1 1000 PartyA-PartyB 2000 PartyA|PartyB TRUE
# 2 1001 PartyA 2000 PartyA|PartyB TRUE
# 3 1002 PartyB 2000 PartyA|PartyB TRUE
# 4 1003 PartyC 2000 PartyA|PartyB FALSE
# 5 1004 PartyA-PartyC 2000 PartyA|PartyB TRUE
# 6 1000 PartyA-PartyB 2001 PartyA TRUE
# 7 1001 PartyA 2001 PartyA TRUE
# 8 1002 PartyB 2001 PartyA FALSE
# 9 1003 PartyC 2001 PartyA FALSE
# 10 1004 PartyA-PartyC 2001 PartyA TRUE
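As a small illustration of why the pairing matters, here is a minimal base R sketch using two made-up vectors (not the original data frame). It assumes the usual behaviour of grepl() when its pattern has more than one element, namely that only the first element is used, and compares that with pairing each pattern to its own string via mapply(). The variable names are hypothetical.

```r
# Two patterns, mirroring the two national coalitions in the question,
# each meant to be tested against the string in the same position.
pattern <- c("PartyA|PartyB", "PartyA")
x <- c("PartyB", "PartyB")

# grepl() accepts a single pattern: only pattern[1] is used (R warns about
# this), so both strings are tested against "PartyA|PartyB".
first_only <- suppressWarnings(grepl(pattern, x))

# mapply() calls grepl() once per (pattern, string) pair, which is the
# element-wise comparison the question is after.
paired <- unname(mapply(grepl, pattern, x))

print(first_only)
print(paired)
```

The second element differs between the two results: tested against the first pattern it matches, tested against its own pattern ("PartyA") it does not.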
Alternatively, to make grepl() work, apply it row by row with rowwise():
df.1.neat <- df.1 |>
  rowwise() |>
  mutate(alignment = grepl(national.party, district.party)) |>
  ungroup()
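One caveat with any regex-based approach is that the party names are treated as patterns, so "PartyA" would also match a district ruled by a hypothetical "PartyAB". If that matters in the wider data, a possible alternative is to split both columns into party names and compare them exactly. The sketch below is base R only; shares_party() is a hypothetical helper, and it assumes "-" always separates district parties and "|" always separates national parties, as in the example data.

```r
# Hypothetical helper (not part of the original posts): TRUE when the
# district and national coalitions share at least one identical party name.
shares_party <- function(district, national) {
  # as.character() guards against factor columns (the default in R < 4.0).
  district_parties <- strsplit(as.character(district), "-", fixed = TRUE)
  national_parties <- strsplit(as.character(national), "|", fixed = TRUE)
  # Compare the two name sets pair by pair, one pair per row.
  unname(mapply(function(d, n) any(d %in% n),
                district_parties, national_parties))
}

df.1 <- data.frame(district = rep(1000:1004, times = 2),
                   district.party = rep(c("PartyA-PartyB", "PartyA", "PartyB",
                                          "PartyC", "PartyA-PartyC"), times = 2),
                   year = rep(2000:2001, each = 5),
                   national.party = rep(c("PartyA|PartyB", "PartyA"), each = 5))

df.1$alignment <- shares_party(df.1$district.party, df.1$national.party)
print(df.1)
```

On the example data this should give the same alignment column as the desired output in the question, while shares_party("PartyAB", "PartyA") returns FALSE where a regex match would return TRUE.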