I have a large dataset of different sites inside and out of Natura 2000 network.
An example:
df
Name | Total_Surface | N2000 | SurfaceN2000 |
---|---|---|---|
A | 1 | Yes | 0.5 |
B | 5 | No | NA |
C | 11 | No | NA |
D | 10 | Yes | 5 |
I want to have the sum of surfaces grouped by the N2000 factor (one total surface for "Yes", another for "No"). However, if the site is inside the N2000 network (N2000=Yes), I want to take the data from the column "SurfaceN2000".
How can I make a sum that depending on the N2000 factor it takes the values from Total_Surface or SurfaceN2000?
I know I could create a column with the sum of Total_Surface and another one with SurfaceN2000 but I was wondering if there was a more automatic solution.
I tried a number of things but they didn't work. The last thing I tried was
df %>%
group_by(N2000) %>%
summarise(surface=
case_when(N2000== "No" ~ sum(Total_Surface,na.rm = TRUE),
N2000== "Yes" ~ sum(SurfaceN2000,na.rm = TRUE)))
I also tried with filter() or rbind() but it didn't work either.
I was expecting an output table like:
N2000 | Surface |
---|---|
Yes | 5.5 |
No | 16 |
Thank you very much in advance! :)
Consider pivoting your data (according to a more sophisticated rule than demonstatred here).
tidyr::pivot_longer(d0, cols = c(Total_Surface, SurfaceN2000)) |>
dplyr::summarise(Surface = sum(value, na.rm = TRUE), .by = (c(N2000, name)))
# A tibble: 4 × 3
N2000 name Surface
<chr> <chr> <dbl>
1 Yes Total_Surface 11
2 Yes SurfaceN2000 5.5
3 No Total_Surface 16
4 No SurfaceN2000 0
You could add a filter
to return the desired subset.
Data.
> dput(d0)
structure(list(Name = c("A", "B", "C", "D"), Total_Surface = c(1L,
5L, 11L, 10L), N2000 = c("Yes", "No", "No", "Yes"), SurfaceN2000 = c(0.5,
NA, NA, 5)), class = "data.frame", row.names = c(NA, -4L))