I have been looking for a solution to my problem but have not been successful. I assume because it is an uncommon wish in terms of data analysis, good practice, and how to properly store data. I need the data in the specified format for further analysis in Primer. With small datasets, the second step (adding the factors) is easy to do by hand but with bigger ones it becomes more than tedious. Therefore, I would like to automate it.
The overall goal is to take long data and transform it to wide data, while additionally adding rows at the bottom which represent factors and are based on columns in the long data.
I have a data frame that looks like the following:
data <- data.frame(nr = c(1, 2, 3, 4, 5, 6, 7, 8),
year = c(2013, 2013, 2013, 2013, 2022, 2022, 2022, 2022),
depth = c(35, 35, 50, 50, 35, 35, 50, 50),
species = c("A", "B", "A", "D", "C", "B", "D", "A"),
area = c(1.0, 0.5, 3.2, 4.3, 2.0, 5.6, 1.8, 2.3))
The output that I am trying to achieve should look like this:
The first row represents names that are a combination of the factors used (year, depth) and the replicate number (nr). The following rows are the wide format of the species with their respective values (area). At the bottom are rows with the factors (year and depth), as well as the interaction between the two (year x depth).
With the following code, I transformed the data to wide format and included the correct names per column. It is missing the factors at the bottom though.
primer <- data %>%
pivot_wider(names_from = c(year, depth, nr), values_from = area) %>%
mutate(across(.cols = everything(), ~replace_na(.x, 0))) %>%
as.data.frame()
Since the factors are also included in the names, I was thinking to extract them and save them as a vector. Afterwards you could extract only the parts that you need (one time the year, and one time the depth) and row bind it to the "primer" data frame. While this might work (not sure how), it does not seem like the best option since the function would have to be adjusted each time to accommodate for different factors and also different numbers of factors.
I am looking for a more universal function to solve my problem.
Thank you for your help!
The trick is to mutate
"further" columns, pivot_wider
area and than transpose the dataframe.
Caution: all column types are character
library(dplyr)
library(tidyr)
data %>%
mutate(col_name=paste(year,depth,nr,sep="_"),
year_x_depth=paste(year,depth,sep="_")) %>%
pivot_wider(names_from=species,
values_from=area,
values_fill=0) %>%
tibble::column_to_rownames(var="col_name") %>%
t %>%
as.data.frame
2013_35_1 2013_35_2 2013_50_3 2013_50_4 2022_35_5 2022_35_6
nr 1 2 3 4 5 6
year 2013 2013 2013 2013 2022 2022
depth 35 35 50 50 35 35
year_x_depth 2013_35 2013_35 2013_50 2013_50 2022_35 2022_35
A 1.0 0.0 3.2 0.0 0.0 0.0
B 0.0 0.5 0.0 0.0 0.0 5.6
D 0.0 0.0 0.0 4.3 0.0 0.0
C 0 0 0 0 2 0
2022_50_7 2022_50_8
nr 7 8
year 2022 2022
depth 50 50
year_x_depth 2022_50 2022_50
A 0.0 2.3
B 0.0 0.0
D 1.8 0.0
C 0 0
Updates (reorder column):
data %>%
mutate(col_name=paste(year,depth,nr,sep="_"),
year_x_depth=paste(year,depth,sep="_")) %>%
pivot_wider(names_from=species,
values_from=area,
values_fill=0) %>%
relocate(any_of(c("A","B","D","C")))%>%
tibble::column_to_rownames(var="col_name") %>%
t %>%
as.data.frame
2013_35_1 2013_35_2 2013_50_3 2013_50_4 2022_35_5 2022_35_6 2022_50_7 2022_50_8
A 1.0 0.0 3.2 0.0 0.0 0.0 0.0 2.3
B 0.0 0.5 0.0 0.0 0.0 5.6 0.0 0.0
D 0.0 0.0 0.0 4.3 0.0 0.0 1.8 0.0
C 0 0 0 0 2 0 0 0
nr 1 2 3 4 5 6 7 8
year 2013 2013 2013 2013 2022 2022 2022 2022
depth 35 35 50 50 35 35 50 50
year_x_depth 2013_35 2013_35 2013_50 2013_50 2022_35 2022_35 2022_50 2022_50