I want to convert a pairwise distance table (observations in 2 columns) into a table with individuals listed instead (observations in 1 column). Essentially the info on pairwise relationships will be lost (which is irrelevant to my analysis anyways) and the distance value will need to be doubled for their respective rows.
I can separate the strings with this code:
pairwise_readout <- str_split_fixed(pairwise[,1], " ", 4) #splits strings apart
pairwise_readout <- data.frame(pairwise_readout,pairwise$dist) #places distance again
But have no idea how to continue with re-arranging the table into fewer columns. All search results bring up only pairwise table related solutions.
Here is an example dataset:
Important to note is that I'm also interested in the 'gr#' contained within the string for each observation.
pairwise <- data.frame(ind_comp = c("OP2645ii_d gr3 OP5048___g gr2","OP5046___e gr5 OP5048___g gr2","OP2413iiia gr1 OP5048___g gr2","OP5043___b gr1 OP5048___g gr2", "OP3088i___a gr1 OP5048___g gr2","OP5046___a gr5 OP5048___g gr2", "OP5048___b gr5 OP5048___g gr2", "OP5043___a gr3 OP5048___g gr2", "OP2645ii_d gr3 OP5048___g gr2", "OP2645ii_d gr3 OP5044___c gr2", "OP2413iiib gr4 OP5048___g gr2", "OP5046___c gr1 OP5048___g gr2"), dist = c(7.590363,6.449676,6.419955,6.349918,6.182623,6.162655,6.154232,6.140147,6.058633,5.962923,5.943956,5.863753))
Essentially I want a table that follows this form:
pairwise_table_less_columns <- data.frame(ind_comp = c("OP2645ii_d","OP5048___g","OP5046___e", "OP5048___g", "OP2413iiia", "OP5048___g", "OP5043___b", "OP5048___g", "OP3088i___a", "OP5048___g", "OP5046___a", "OP5048___g", "OP5048___b", "OP5048___g", "OP5043___a", "OP5048___g", "OP2645ii_d", "OP5048___g", "OP2645ii_d", "OP5044___c", "OP2413iiib", "OP5048___g", "OP5046___c", "OP5048___g"), gr = c("gr3","gr2","gr5", "gr2", "gr1", "gr2", "gr1", "gr2", "gr1", "gr2", "gr5", "gr2", "gr5", "gr2", "gr3", "gr2", "gr3", "gr2", "gr3", "gr2", "gr4", "gr2", "gr1", "gr2"), dist = c(7.590363,7.590363,6.449676,6.449676,6.419955,6.419955,6.349918,6.349918,6.182623,6.182623,6.162655,6.162655,6.154232,6.154232,6.140147,6.140147,6.058633,6.058633,5.962923,5.962923,5.943956,5.943956,5.863753,5.863753))
Another idea is to replace the second space with another delimeter, and split on that, i.e.
library(dplyr)
library(tidyr)
pairwise %>%
mutate(ind_comp = gsub('([^ ]+ [^ ]+) ', '\\1|', ind_comp)) %>%
separate_rows(ind_comp, sep = '[|]')
which gives,
ind_comp dist 1 OP2645ii_d gr3 7.590363 2 OP5048___g gr2 7.590363 3 OP5046___e gr5 6.449676 4 OP5048___g gr2 6.449676 5 OP2413iiia gr1 6.419955 6 OP5048___g gr2 6.419955 7 OP5043___b gr1 6.349918 8 OP5048___g gr2 6.349918 9 OP3088i___a gr1 6.182623 10 OP5048___g gr2 6.182623 11 OP5046___a gr5 6.162655 12 OP5048___g gr2 6.162655 13 OP5048___b gr5 6.154232 14 OP5048___g gr2 6.154232 15 OP5043___a gr3 6.140147 16 OP5048___g gr2 6.140147 17 OP2645ii_d gr3 6.058633 18 OP5048___g gr2 6.058633 19 OP2645ii_d gr3 5.962923 20 OP5044___c gr2 5.962923 21 OP2413iiib gr4 5.943956 22 OP5048___g gr2 5.943956 23 OP5046___c gr1 5.863753 24 OP5048___g gr2 5.863753