I have a dataframe that looks like this:
Family | Order | Class | Presence | Year | Site | Location | Lat | Long |
---|---|---|---|---|---|---|---|---|
Aeshnidae | Odonata | Insecta | 0 | 2021 | KAV01 | NASS | -17.4 | 18.5 |
Aeshnidae | Odonata | Insecta | 0 | 2023 | KAV01 | NASS | -17.4 | 18.5 |
Aeshnidae | Odonata | Insecta | 1 | 2021 | KAV02 | NASS | -17.7 | 18.7 |
Aeshnidae | Odonata | Insecta | 0 | 2023 | KAV02 | NASS | -17.7 | 18.7 |
Aeshnidae | Odonata | Insecta | 0 | 2021 | KAV03 | NASS | -17.8 | 19.1 |
Aeshnidae | Odonata | Insecta | 0 | 2023 | KAV03 | NASS | -17.8 | 19.1 |
There are a number of unique families, and I have presence values of 0 and 1, but the family value is not fully covered. This is, not all family values have assigned a value of 0 or 1 of presence. Let me explain more, in all the dataset there is a unique number of families = 100, yet the site KAV03 has values of presence/absence for 90. What I want is to make sure that the 10 remaining families are also present, with, of course, a presence value of 0. With this expansion of the data I also want to keep the remaining variables as they are.
I hope I explained well enough, let me know if you need more info. Here is a code I have tried but failed:
MorphoData <- expand.grid(
Site = unique(MorphoData$Site),
Family = unique(MorphoData$Family),
Year = unique(MorphoData$Year)
) %>%
left_join(MorphoData, by = c("Site", "Family", "Year")) %>%
group_by(Site, Family, Year) %>%
mutate(
Presence = replace_na(Presence, 0) # Ensure missing Presence values are 0
) %>%
group_by(Site, Year) %>%
fill(everything(.), .direction = "downup") %>% # Fill missing taxonomy/spatial data
ungroup()
Here is an hypothetical example of what I have:
data <- data_frame(Site = c("KAV01", "KAV01", "KAV01", "KAV01", "KAV01", "KAV01", "KAV01",
"KAV02", "KAV02", "KAV02", "KAV02", "KAV02", "KAV02", "KAV02",
"KAV03", "KAV03", "KAV03", "KAV03", "KAV03", "KAV03", "KAV03"),
Family = sample(c("Fam1", "Fam2", "Fam3", "Fam4", "Fam5", "Fam6"), 21, replace = TRUE),
Year = sample(c(2021, 2022, 2023), 21, replace = TRUE),
Presence = sample(c(0, 1), 21, , replace = TRUE),
Lon = rnorm(n = 21, mean = 5, sd = 1),
Lat = rnorm(n = 21, mean = 2, sd = 0.3))
You will see that the sites will not have values for all families. I want the families that are not included for a given site to be so with a presence value of 0.
Thank you,
Try complete
library(tidyr)
complete(data, Site, Family, Year, fill = list(Presence = 0))