I'm trying to transform a complex dataframe such as (:
olddata_long =
A B C SAMPLE_ID 1 2 3 4 5
X1 Y1 Z1 SAMPLE1 G1 H1 J1 K1 L1
X1 Y1 Z1 SAMPLE2 G2 H2 J2 K2 L2
X2 Y2 Z2 SAMPLE1 G3 H3 J3 K3 L3
X2 Y2 Z2 SAMPLE2 G4 H4 J4 K4 L4
Where 1-5 are unique to each sample. I'm trying to transform this to a "wide" format, such as:
A B C SAMPLE1 SAMPLE2
X1 Y1 Z1 1=G1;2=H1;3=J1;4=K1;5=L1 1=G2;2=H2;3=J2;4=K2;5=L2
X2 Y2 Z2 1=G3;2=H3;3=J3;4=K3;5=L3 1=G4;2=H4;3=J4;4=K4;5=L4
I've messed with transforming to long formats, but can't think of anyway of doing this for multiple variables.
data_wide <- spread(olddata_long, SAMPLE_ID, c(1,2,3,4,5))
Maybe it is easier to concatenate 1:5 then transform to wide format?
Edit: Thank you for your responses!
I tested all three version and all of them worked fine. I also microbenchmarked them in a real dataset which is way bigger (13mb txt file)
min lq mean median uq max neval
MrFlick 161.9813 229.5798 256.5784 240.3421 250.6077 511.7215 20
stefan 108.9270 120.2335 160.4594 174.6441 185.0129 196.7983 20
TarJae 149.7109 163.1959 226.3037 207.8249 238.0539 420.1584 20
This is just to have an idea. I will give the correct answer to the one that was faster (to respond)
You can first make it longer, then do the transformation, then make it wider. For example
library(dplyr, warn=FALSE)
library(tidyr)
dd %>%
pivot_longer(`1`:`5`) %>%
mutate(val=paste0(name, "=", value), name=NULL, value=NULL) %>%
pivot_wider(names_from=SAMPLE_ID, values_from=val, values_fn=function(x) paste(x, collapse=";"))
# A B C SAMPLE1 SAMPLE2
# <chr> <chr> <chr> <chr> <chr>
# 1 X1 Y1 Z1 1=G1;2=H1;3=J1;4=K1;5=L1 1=G2;2=H2;3=J2;4=K2;5=L2
# 2 X2 Y2 Z2 1=G3;2=H3;3=J3;4=K3;5=L3 1=G4;2=H4;3=J4;4=K4;5=L4
Tested with
dd <- read.table(text="A B C SAMPLE_ID 1 2 3 4 5
X1 Y1 Z1 SAMPLE1 G1 H1 J1 K1 L1
X1 Y1 Z1 SAMPLE2 G2 H2 J2 K2 L2
X2 Y2 Z2 SAMPLE1 G3 H3 J3 K3 L3
X2 Y2 Z2 SAMPLE2 G4 H4 J4 K4 L4", header=T, check.names=F)