r csv dataframe binary-matrix

How would I find (and output) the position of the first value of 1 and the last value of 1 by row in a number of csv files at once?


I am trying to output the position of the first value of 1 and the last value of 1, by row, in a number of binary matrices stored in multiple csv files at once.

I have used the following to read in all tab-delimited csv files in the working directory:

csvs <- list.files(pattern="*.csv")
files <- lapply(csvs, read.delim)

First of all, I have tried...

first_1 <- sapply(files, function(x) min(which(x == 1)))

But this isn't giving me the right answer. For example, in a csv file with a binary matrix of

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   1   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   1   1   1   0   0   1   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   1   1   1   1   1   0   0   0   1   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   1   0   1   1   0   0   0   0   0   0

0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   1   1   1   1   0   0   0   0   0

0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0

0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0

0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0

0   0   0   0   0   0   0   1   0   0   0   1   1   1   1   0   0   0   1   0   0   0   0   0   0

0   0   0   0   0   0   0   1   0   0   0   1   0   0   0   1   0   1   1   0   0   0   0   0   0

0   0   0   0   0   0   0   1   0   0   0   1   0   0   0   1   1   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

The sapply call outputs 152 when it should output 135. Can someone help?

50 x 50 data frame


Solution

  • You are reading the data into data frames, not matrices. That could affect your results down the line, but it is not the problem here: R processes both data frames and matrices column-wise, so you are getting a correct answer, just not to the question you meant to ask. The simplest fix is to transpose with t(). I've created a data frame from your example called dta:

    min(which(dta == 1))
    # [1] 159
    min(which(t(dta) == 1))
    # [1] 135
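
    The question also asks for the last 1. A minimal sketch, assuming the same transposition trick: max() in place of min() should give the last position in row order. The small matrix m is made up for illustration and is not the data from the question. Note that min() and max() return Inf and -Inf with a warning if the matrix contains no 1s at all.

    ```r
    # Hypothetical 3 x 4 binary matrix, written row by row for readability
    m <- matrix(c(0, 0, 0, 0,
                  0, 1, 1, 0,
                  1, 0, 0, 0), nrow = 3, byrow = TRUE)

    # which() counts down the columns, so transpose to count along the rows
    pos <- which(t(m) == 1)

    first_1 <- min(pos)  # 6: row 2, column 2 -> (2 - 1) * 4 + 2
    last_1  <- max(pos)  # 9: row 3, column 1 -> (3 - 1) * 4 + 1

    # Convert each row-order position back to a (row, column) pair
    row_of <- (pos - 1) %/% ncol(m) + 1
    col_of <- (pos - 1) %% ncol(m) + 1
    ```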
    

    Larger matrices work just fine (in response to the comment below). First, create a reproducible matrix:

    dta <- matrix(0, 50, 50)
    ones <- structure(c(25L, 22L, 27L, 9L, 31L, 38L, 32L, 2L, 9L, 50L, 7L, 
    19L, 40L, 47L, 26L, 1L, 47L, 34L, 16L, 23L, 39L, 3L, 30L, 50L, 
    11L, 3L, 41L, 28L, 22L, 15L, 50L, 31L, 28L, 38L, 16L, 25L, 14L, 
    22L, 12L, 11L, 40L, 44L, 1L, 38L, 7L, 39L, 1L, 39L, 33L, 50L, 
    16L, 15L, 4L, 37L, 25L, 25L, 18L, 9L, 21L, 32L, 47L, 49L, 17L, 
    48L, 26L, 7L, 4L, 47L, 16L, 11L, 35L, 17L, 25L, 23L, 24L, 4L, 
    12L, 23L, 8L, 38L, 19L, 32L, 8L, 35L, 1L, 48L, 42L, 45L, 43L, 
    45L, 30L, 41L, 5L, 5L, 49L, 37L, 19L, 20L, 48L, 43L), .Dim = c(50L, 
    2L), .Dimnames = list(NULL, c("row", "col")))
    dta[ones] <- 1
    dim(dta)  # Show the number of rows and columns
    # [1] 50 50
    

    You can browse the matrix with View(dta) before you use the following code:

    min(which(dta == 1))  # By columns
    # [1] 16
    min(which(t(dta) == 1))  # By rows
    # [1] 5
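
    To run this over every file at once, a sketch along these lines may help. It assumes the files have no header row, which is a guess: read.delim() defaults to header = TRUE, so the first row of each file would otherwise be used as column names and dropped from the data. That may be why the question reports 152 rather than the 159 above, since a 25-row matrix read as 24 rows gives (7 - 1) * 24 + 8 = 152. The temporary directory, the file names, and the first_last() helper are made up for illustration.

    ```r
    # Write two small tab-delimited files to a temporary directory (made-up data)
    dir <- tempfile("csvs")
    dir.create(dir)
    a <- matrix(c(0, 0, 0,
                  0, 1, 1,
                  1, 0, 0), nrow = 3, byrow = TRUE)
    b <- matrix(c(0, 0, 1,
                  0, 0, 0,
                  0, 1, 0), nrow = 3, byrow = TRUE)
    write.table(a, file.path(dir, "a.csv"), sep = "\t",
                row.names = FALSE, col.names = FALSE)
    write.table(b, file.path(dir, "b.csv"), sep = "\t",
                row.names = FALSE, col.names = FALSE)

    # list.files() takes a regular expression, hence "\\.csv$".
    # header = FALSE keeps the first row as data instead of column names.
    csvs  <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
    files <- lapply(csvs, read.delim, header = FALSE)

    # First and last 1 in row order for one data frame; NA if it has no 1s
    first_last <- function(x) {
      pos <- which(t(as.matrix(x)) == 1)
      if (length(pos) == 0) return(c(first = NA, last = NA))
      c(first = min(pos), last = max(pos))
    }

    # One row per file, with columns "first" and "last"
    res <- t(sapply(files, first_last))
    rownames(res) <- basename(csvs)
    res
    ```

    Returning both values from one helper keeps the result as a single matrix, which can then be saved with write.csv(res, ...) if an output file is needed.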