I have a set of data that I am attempting to manipulate to generate occurrence data that has associated time bins and latitude bins. My data consists of 19 latitudinal bins in intervals of 10 from -90 to 90. (Labeled at midpoints so latbin 15 = latitudinal range 10-20) There are also 54 time bins in intervals of 10 from 0 to 540. (Labeled at midpoints so timebin 25 = time range 20-30)
I have a number of genera that each have an associated latitude bin and time bin currently formatted as such:
genus | latbin | timebin |
---|---|---|
genus a | 15 | 95 |
genus a | 20 | 85 |
genus a | 20 | 95 |
genus b | 10 | 75 |
genus b | 10 | 75 |
genus b | 15 | 75 |
genus c | 10 | 85 |
genus c | 15 | 75 |
I would like to somehow transform this so that both the latbin and timebin columns are now rows and read across from 5-535 for timebin, latbin reads across from -85 - 85, and genera are still in column format. The table would then be filled with the count of occurrences a specific genus has in time bins and latitude bins.
(I would also like no header, but I had to include one for stackoverflow to let me post this.)
This is the desired structure of my dataset:
header | header | header | header | header | header | header | header | header | header | header |
---|---|---|---|---|---|---|---|---|---|---|
timebin | ... | 75 | 75 | 75 | 85 | 85 | 85 | 95 | 95 | 95 |
latbin | ... | 10 | 15 | 20 | 10 | 15 | 20 | 10 | 15 | 20 |
genus a | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 |
genus b | ... | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
genus c | ... | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
I have looked through dcast and melt documentation, but I don't believe it will get the data structure I need. I am not trying to just reshape from long to wide, I am trying to produce summary counts for each genera that occur in each latitude bin and during each time bin.
If anyone has any ideas, that would be great.
You can count, complete, and pivot using the usual tidyverse tools, then transpose:
library(dplyr)
library(tidyr)
dat |>
count(genus, timebin, latbin) |>
complete(
genus,
timebin = seq(75, 95, by = 10),
latbin = seq(10, 20, by = 5),
fill = list(n = 0)
) |>
pivot_wider(names_from = genus, values_from = n) |>
t()
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
timebin 75 75 75 85 85 85 95 95 95
latbin 10 15 20 10 15 20 10 15 20
genus a 0 0 0 0 0 1 0 1 1
genus b 2 1 0 0 0 0 0 0 0
genus c 0 1 0 1 0 0 0 0 0
This gives you a matrix, which you can pass to as.data.frame()
if desired.