During fieldwork, we collected data for counts of dolphins at coral reefs per month, per year. I have split my data into seasons for winter and summer.
This is my method for dplyr:
Step 1: Calculate the total sightings and average group size per reef and season
Step 2: Normalise the total sightings by season length, because the seasons cover an uneven number of months. Winter is 7 months and summer is 5 months.
We should obtain these two new columns
I can't share the original dataframe due to ownership issues, so a dummy dataframe is included below.
Many thanks if you can help.
However, when I run my code, I get this error message:
`summarise()` has grouped output by 'Reef_Code'. You can override using the `.groups` argument.
Error in `mutate()`:
ℹ In argument: `Normalized_Sightings = Total_Sightings/season_months[Season]`.
ℹ In group 1: `Reef_Code = 1`.
Caused by error in `Total_Sightings / season_months[Season]`:
! non-numeric argument to binary operator
Run `rlang::last_trace()` to see where the error occurred.
R-code:
library(dplyr)
#Normalization for season, simple normalization based on length, 7 months for Winter and 5 for Summer
# Define months per season
season_months <- list("Winter" = 7, "Summer" = 5)
#Group by reef and season, then calculate total sightings,
#normalize these for each season and calculate the average group size
# Group by Reef_Code and Season, then normalize
result <- MyDf %>%
  group_by(Reef_Code, Season) %>%
  summarize(
    Total_Sightings = n(), # Count of sightings per reef and season
    Avg_Group_Size = mean(Group_Size, na.rm = TRUE)) %>% # Average group size
  mutate(Normalized_Sightings = Total_Sightings / season_months[Season]) # Normalize by season length
Dummy Dataframe
structure(list(Reef_Code = c(1L, 2L, 3L, 1L, 1L, 3L, 2L, 4L,
2L, 5L, 4L, 2L, 3L, 6L, 5L, 3L, 6L, 6L, 4L, 2L, 5L, 4L, 1L, 2L,
3L, 4L, 6L, 1L, 1L, 2L, 3L, 6L, 5L, 3L, 6L, 6L, 4L, 2L, 5L, 4L,
3L, 1L, 1L, 3L, 2L, 4L, 2L, 5L, 4L, 2L, 3L, 6L, 5L, 3L, 5L, 4L,
2L, 3L, 6L), Season = c("Summer", "Summer", "Summer", "Summer",
"Summer", "Summer", "Summer", "Summer", "Winter", "Winter", "Winter",
"Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter",
"Summer", "Summer", "Summer", "Summer", "Summer", "Summer", "Winter",
"Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Winter",
"Winter", "Summer", "Summer", "Summer", "Summer", "Summer", "Summer",
"Winter", "Summer", "Summer", "Summer", "Summer", "Summer", "Winter",
"Winter", "Winter", "Winter", "Winter", "Winter", "Winter", "Summer",
"Summer", "Summer", "Summer", "Summer", "Summer", "Winter"),
Group_Size = c(7L, 11L, 1L, 14L, 16L, 2L, 5L, 5L, 5L, 8L,
8L, 6L, 6L, 1L, 8L, 8L, 4L, 5L, 1L, 5L, 5L, 14L, 8L, 7L,
7L, 18L, 25L, 2L, 5L, 5L, 8L, 8L, 6L, 6L, 1L, 8L, 8L, 5L,
14L, 8L, 7L, 7L, 18L, 25L, 2L, 5L, 5L, 8L, 8L, 6L, 6L, 1L,
8L, 7L, 8L, 8L, 6L, 6L, 1L)), class = "data.frame", row.names = c(NA,
-59L))
I'd suggest using these two lines in place of the last mutate line:
...
left_join(data.frame(Season = c("Winter", "Summer"),
season_months = c(7,5))) |>
mutate(Normalized_Sightings = Total_Sightings / season_months)
or
mutate(Normalized_Sightings = Total_Sightings / if_else(Season == "Winter", 7, 5))
or
mutate(season_months = case_match(Season,
"Winter" ~ 7,
"Summer" ~ 5)) |>
mutate(Normalized_Sightings = Total_Sightings / season_months)
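For context on why the original line fails: season_months was created with list(), and single-bracket indexing on a list returns another list, which `/` cannot divide by. The following is a minimal sketch of that difference, assuming a named numeric vector would be acceptable in place of the list. The names `months_list`, `months_vec`, `season` and `totals` are made up for illustration and are not from the question.

```r
# Illustrative names and values; only the list-vs-vector distinction matters.
months_list <- list(Winter = 7, Summer = 5)
months_vec  <- c(Winter = 7, Summer = 5)

season <- c("Summer", "Winter", "Winter")
totals <- c(10, 14, 21)

# `[` on a list returns a list, so the division throws an error.
# tryCatch() captures the message instead of stopping the script.
list_result <- tryCatch(
  totals / months_list[season],
  error = function(e) conditionMessage(e)
)

# `[` on a named numeric vector returns numbers, so the division works.
# unname() drops the season names carried over from the lookup.
vec_result <- totals / unname(months_vec[season])
```

With a named vector like this, the original mutate line should work unchanged. Starting from the existing list, unlist(season_months)[Season] is another way to get a numeric lookup.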
Note also that summarize()'s default just removes the most recent grouping, so the output is still grouped by Reef_Code. This could lead to unexpected results later if you expect the calculations to be done in the context of the whole ungrouped data.
To remove that grouping, you could add |> ungroup(), or add .groups = "drop" as the last argument of summarize(). Or, my preference, skip the group_by() and instead use .by = c(Reef_Code, Season) as an argument to summarize(). This applies the grouping to the summarize() step alone, saving you the need to keep track of it.
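Putting the pieces together, here is a rough sketch of the whole pipeline with per-step grouping. It assumes dplyr 1.1.0 or later (needed for `.by` and case_match()) and uses a tiny made-up data frame called `toy` rather than the question's data, so the numbers are illustrative only.

```r
library(dplyr)  # `.by` and case_match() need dplyr >= 1.1.0

# Tiny made-up data frame with the same columns as the dummy data.
toy <- data.frame(
  Reef_Code  = c(1L, 1L, 1L, 2L, 2L, 2L, 2L),
  Season     = c("Summer", "Summer", "Winter", "Summer",
                 "Winter", "Winter", "Winter"),
  Group_Size = c(4L, 6L, 3L, 10L, 2L, 4L, 6L)
)

result <- toy |>
  summarize(
    Total_Sightings = n(),                            # sightings per reef and season
    Avg_Group_Size  = mean(Group_Size, na.rm = TRUE), # average group size
    .by = c(Reef_Code, Season)                        # grouping applies to this step only
  ) |>
  mutate(
    season_months        = case_match(Season, "Winter" ~ 7, "Summer" ~ 5),
    Normalized_Sightings = Total_Sightings / season_months
  )

# The result carries no grouping, so later steps see the whole table.
is_grouped_df(result)
```

Because the result is not grouped, a later step such as mutate(Share = Total_Sightings / sum(Total_Sightings)) would divide by the overall total rather than by each reef's total.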