Tags: r, dataframe, csv, expss

How to count the values in a .csv column that fall within a range in R


I have a file whose second column contains the values of interest. Using R, I'm trying to loop through a set of numbers (n) and, for each one, count how many values in that column fall within the range n-5 to n+5. I then want to write the results to a .csv (I haven't added that to the code yet).

I've been using:

library(expss)
SNP_file <- read.csv("testdata.csv", header = FALSE, sep = "\t")
for (n in 31130:31150) {
  SNP_Number <- 0
  SNP_Number <- count_if(n-5:n+5, SNP_file$V2)
  df <- data.frame(column1 = c(n), column2 = c(SNP_Number))
  print(df)
  
}

In testdata.csv there are values of 31140 and 31141 in the second column.

This returns an output like:


  column1 column2
1   31130       1
  column1 column2
1   31131       1
  column1 column2
1   31132       1
  column1 column2
1   31133       1
  column1 column2
1   31134       1
  column1 column2
1   31135       1
  column1 column2
1   31136       1
  column1 column2
1   31137       1
  column1 column2
1   31138       1
  column1 column2
1   31139       1
  column1 column2
1   31140       2
  column1 column2
1   31141       3
  column1 column2
1   31142       3
  column1 column2
1   31143       3
  column1 column2
1   31144       3
  column1 column2
1   31145       3
  column1 column2
1   31146       3
  column1 column2
1   31147       3
  column1 column2
1   31148       3
  column1 column2
1   31149       3
  column1 column2
1   31150       3

But this starts by recording a count of 1 where it should be 0. The count then simply increases each time n passes another value in testdata.csv, and it never drops back to 0 once no values lie within the range n-5:n+5.

So it should look like:


  column1 column2
1   31130       0
  column1 column2
1   31131       0
  column1 column2
1   31132       0
  column1 column2
1   31133       0
  column1 column2
1   31134       0
  column1 column2
1   31135       1
  column1 column2
1   31136       2
  column1 column2
1   31137       2
  column1 column2
1   31138       2
  column1 column2
1   31139       2
  column1 column2
1   31140       2
  column1 column2
1   31141       2
  column1 column2
1   31142       2
  column1 column2
1   31143       2
  column1 column2
1   31144       2
  column1 column2
1   31145       2
  column1 column2
1   31146       1
  column1 column2
1   31147       0
  column1 column2
1   31148       0
  column1 column2
1   31149       0
  column1 column2
1   31150       0

What am I doing wrong here?


Solution

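  • The problem is operator precedence: in R, : binds more tightly than binary + and -, so n-5:n+5 is parsed as n - (5:n) + 5 rather than (n-5):(n+5). Comparing against both bounds explicitly avoids the issue. Try this: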

    vec <- c(22761L, 31140L, 31141L, 36701L, 44108L, 46917L, 51958L, 53661L,
             119844L, 119845L, 184836L, 195026L, 249733L, 251024L, 271357L,
             287257L, 360638L, 382559L, 384590L, 399027L)
    
    sapply(31130:31150, function(z) sum( (z-5) <= vec & vec <= (z+5) ))
    #  [1] 0 0 0 0 0 1 2 2 2 2 2 2 2 2 2 2 1 0 0 0 0
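
For anyone who wants to see the precedence issue directly, and to collect the counts into a single data frame for the .csv output the question mentions, here is a minimal base-R sketch. The short vec is a stand-in for SNP_file$V2 (using a few of the values above), and the column names and the out_path file name are hypothetical, chosen only for illustration.

```r
# `:` binds more tightly than binary + and -, so the two expressions differ.
n <- 10L
wrong <- n-5:n+5        # parsed as n - (5:n) + 5, i.e. n, n-1, ..., 5
right <- (n-5):(n+5)    # the intended window n-5, ..., n+5
print(wrong)
print(right)

# Stand-in for SNP_file$V2 (assumption: a few of the values from the answer).
vec <- c(22761L, 31140L, 31141L, 36701L)

# Count, for each centre value, how many entries lie within +/- 5 of it.
centres <- 31130:31150
counts <- sapply(centres, function(z) sum(vec >= z - 5 & vec <= z + 5))

# Build one data frame instead of printing a one-row frame per iteration.
result <- data.frame(n = centres, count = counts)
print(result)

# Hypothetical output location; a temp directory keeps the example harmless.
out_path <- file.path(tempdir(), "snp_counts.csv")
write.csv(result, out_path, row.names = FALSE)
```

With real data, vec would be replaced by SNP_file$V2 and out_path by wherever the file should go. Building the whole data frame first and calling write.csv once is usually simpler than appending to a file inside the loop.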