This is what the data I work with looks like (it is SNP data):
AA CC CA GG
GA CA CC GG
GG CCCC CAA GG
CA GG CC GC
This is how I want it to look after case 2 (row 3 is removed because some of its cells contain more than two characters, and every column is split into two):
A A C C C A G G
G A C A C C G G
C A G G C C G C
Case 1
What I use at the moment:
mydata <- mydata[which(!nchar(as.character(mydata[,5]))>2),]
mydata <- mydata[which(!nchar(as.character(mydata[,6]))>2),]
mydata <- mydata[which(!nchar(as.character(mydata[,7]))>2),]
What I want it to be:
mydata <- mydata[which(!nchar(as.character(mydata[,5:7]))>2),]
The problem is that the function evaluates columns 5:7 as a whole and deletes every row. I want the same filtering, but applied to each column separately, not to all of them together.
Case 2
My code (it uses these libraries):
library(dplyr)
library(splitstackshape)
I run the following for each column to split its cells; this is the version for column 6:
data2$V6 = as.character(data2$V6)
data2 <- cSplit(data.frame(data2 %>% rowwise() %>%
mutate(V6 = V6, V6n = paste(unlist(strsplit(V6, "")),
collapse = ','))), "V6n", ",")
data2$V5 <- NULL
I do the same for all columns. The problem is that I want to do it for all columns at once. A potential solution would be some kind of loop, but I can't make it work. Any help will be appreciated.
Here's a fully vectorized solution that produces your desired output:
## Paste each row into a single string
tmp <- do.call(paste0, mydata)
## Remove too long rows, split and rbind
do.call(rbind, strsplit(tmp[nchar(tmp) == 2 * ncol(mydata)], "", fixed = TRUE))
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,] "A"  "A"  "C"  "C"  "C"  "A"  "G"  "G"
# [2,] "G"  "A"  "C"  "A"  "C"  "C"  "G"  "G"
# [3,] "C"  "A"  "G"  "G"  "C"  "C"  "G"  "C"
This results in a matrix, but it can easily be converted to a data.frame if needed.
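One caveat: the length check above compares the total number of characters per row, so a row such as `A CCC CA GG` (1 + 3 + 2 + 2 = 8 characters) would slip through. If that can happen in your data, a per-column check may be safer. The following is only a minimal sketch under some assumptions: the column names `V1` to `V4` are made up to rebuild the sample data, the check is "exactly two characters" (stricter than the `> 2` test in case 1), and on the real data you would restrict the matrix to the relevant columns, e.g. `mydata[, 5:7]`.

```r
# Sample data from the question; the column names V1..V4 are assumed
mydata <- data.frame(
  V1 = c("AA", "GA", "GG", "CA"),
  V2 = c("CC", "CA", "CCCC", "GG"),
  V3 = c("CA", "CC", "CAA", "CC"),
  V4 = c("GG", "GG", "GG", "GC"),
  stringsAsFactors = FALSE
)

# Character matrix of the columns to check
m <- as.matrix(mydata)

# Case 1: keep only rows in which every single cell has exactly two characters
keep <- rowSums(nchar(m) != 2) == 0
m <- m[keep, , drop = FALSE]

# Case 2: split every column into its first and second character
# and bind the resulting column pairs side by side
alleles <- do.call(cbind, lapply(seq_len(ncol(m)), function(j) {
  cbind(substr(m[, j], 1, 1), substr(m[, j], 2, 2))
}))

# Optional: convert the character matrix to a data.frame
result <- as.data.frame(alleles, stringsAsFactors = FALSE)
```

Here `nchar()` is applied to the whole matrix at once and `rowSums()` counts the offending cells per row, so each column is tested separately without an explicit loop over rows.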