rdataframe

Making five data frames of different lengths all the same length based on same values in one column across data frames


I have five data frames in R, each with annual crime data for the years 2018-2022. There are 40 columns and thousands of observations. Each observation is one county in the U.S. 39 columns have crime counts for their respective county (different crime types), and 1 column has a 5-digit "FIPS" code for the county. Each data frame is of a different length, indicating that the same set of counties did not submit crime data for each year.

I need to determine which observations/counties submitted data for all five years, and filter each of the data frames to include only those observations. I would, in short, like each data frame to have only the observations with FIPS codes that are contained in all of the other data frames. I do not want to merge these data frames.

Thanks for any advice.

I tried setdiff(df1$FIPS , df2$FIPS), and then subset(df1, FIPS!= . . .) but with five data frames and dozens of relevant FIPS codes, this brute force method proved unworkable.

I tried filter(ifelse()), but this didn't seem to be a solution.

I tried intersect(), but that requires that values in every column be identical, not just one column. Is there an intersect() type of function but keyed to just one value in one column, not every value in every column? Or some other solution?

I also used this code for the 2022 data frame, which is not efficient at all:

CrimeAll <- merge(Crime2022 , Crime2021 , by = "FIPS")

CrimeAll2 <- merge(CrimeAll , Crime2020 , by = "FIPS")

CrimeAll3 <- merge(CrimeAll2 , Crime2019 , by ="FIPS")

CrimeAll4 <- merge(CrimeAll3 , Crime2018 , by = "FIPS")

setdiff(Crime2022$FIPS , CrimeAll4$FIPS)

Crime2022_1 <- subset(Crime2022 , FIPS!="01003" & FIPS!="01009" & FIPS!="01015" & FIPS!="01021"& FIPS!= "01027"& FIPS!= "01039"& FIPS!= "01045" & FIPS!="01049"& FIPS!= "01051" & FIPS!="01053"& FIPS!= "01055" & FIPS!="01059" & FIPS!="01069" & FIPS!="01071" & FIPS!="01081" & FIPS!="01083"& FIPS!= "01089" & FIPS!="01095" & FIPS!="01109" & FIPS!="01115" & FIPS!="01117" & FIPS!="01119" & FIPS!="01121" & FIPS!="01123" & FIPS!="01125" & FIPS!="01127" & FIPS!="01133" & FIPS!="04003"& FIPS!= "05059"& FIPS!= "05099" & FIPS!="08061"& FIPS!= "16073"& FIPS!= "18039" & FIPS!="18057" & FIPS!="19003" & FIPS!="19005" & FIPS!="19011" & FIPS!="19043" & FIPS!="19051" & FIPS!="19069"& FIPS!= "19091" & FIPS!="19107" & FIPS!="19115" & FIPS!="19119" & FIPS!="19133" & FIPS!="19151" & FIPS!="19177" & FIPS!="20043" & FIPS!="20063" & FIPS!="20101" & FIPS!="20107" & FIPS!="20147" & FIPS!="20183" & FIPS!="20197" & FIPS!="21063" & FIPS!="22001" & FIPS!="22035" & FIPS!="22047" & FIPS!="22055" & FIPS!="22097" & FIPS!="22111" & FIPS!="22117" & FIPS!="28073" & FIPS!="29015" & FIPS!="30001" & FIPS!="30045" & FIPS!="30069" & FIPS!="30075" & FIPS!="31073" & FIPS!="31161" & FIPS!="38087" & FIPS!="39059" & FIPS!="39067" & FIPS!="39125" & FIPS!="46009" & FIPS!="46015"& FIPS!= "46037" & FIPS!="46067" & FIPS!="46085" & FIPS!="46089" & FIPS!="46109" & FIPS!="48197" & FIPS!="48299" & FIPS!="49029" & FIPS!="53069" & FIPS!="54073" & FIPS!="54091" & FIPS!="54105" & FIPS!="55059")

Is there a better way?


Solution

  • Using the test data in the Note at the end, collect the data frames into a list dfs. Then extract the FIPS column from each and intersect them giving FIPSi (i stands for intersect) which is a vector of common FIPS values. Finally reduce each data frame to just those rows with common FIPS values giving dfsi which is a list like dfs except each data frame in it only has the common FIPS values.

    dfs <- list(df1, df2, df3, df4, df5)
    FPSi <- Reduce(intersect, lapply(dfs, "[[", "FIPS"))
    dfsi <- lapply(dfs, subset, FIPS %in% FIPSi)
    

    Note

    # test data
    df1 <- setNames(BOD, c("FIPS", "x"))
    df2 <- df1[1:3,]
    df3 <- df4 <- df5 <- df1[2:4,]