shell, extract, file-handling

How to extract specific rows based on row number from a file


I am working on an RNA-Seq data set consisting of around 24,000 rows (genes) and 1,100 columns (samples), stored as a tab-separated file. For the analysis, I need to choose a specific gene set. Is there a method to extract rows based on row number? That would be easier for me than selecting by gene name.

I need to select 230 rows that are scattered across the 24,000 rows.

Below is an example of the data (4 x 4):

gene    Sample1    Sample2    Sample3
A1BG     5658    5897      6064
AURKA    3656    3484      3415
AURKB    9479    10542    9895

From this, for example, I want rows 1, 3, and 4, which follow no specific pattern.

I have also asked on biostars.org.


Solution

  • You can use a for loop to build the sed options, as shown below:

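    var=-n   # -n suppresses sed's default output; only explicit p commands print
    for i in 1 3,4 # Put your space-separated line numbers or ranges (start,end) here
    do
      var="${var} -e ${i}p"   # append one print command per entry
    done
    # $var is deliberately unquoted so the shell splits it into separate arguments
    sed $var filename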
    

    Note: In any case, the requirement described here will still be a pain, as it involves a lot of typing.
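
If the row numbers are already stored in a file, one way to avoid typing them is to let awk read that list first and then filter the data. The following is only a sketch under assumptions: the file names rows.txt (one row number per line) and data.tsv are hypothetical, and the sample data is copied from the question so the example can run on its own.

```shell
# Work in a scratch directory so nothing existing is overwritten.
tmpdir=$(mktemp -d)

# Hypothetical input files: the 4 x 4 example table from the question,
# and a list of wanted row numbers, one per line.
printf 'gene\tSample1\tSample2\tSample3\n'  > "$tmpdir/data.tsv"
printf 'A1BG\t5658\t5897\t6064\n'          >> "$tmpdir/data.tsv"
printf 'AURKA\t3656\t3484\t3415\n'         >> "$tmpdir/data.tsv"
printf 'AURKB\t9479\t10542\t9895\n'        >> "$tmpdir/data.tsv"
printf '1\n3\n4\n' > "$tmpdir/rows.txt"

# While reading the first file (NR==FNR), remember each wanted row number.
# While reading the second file, print a line only if its line number
# (FNR) was remembered.
selected=$(awk 'NR==FNR { want[$1]; next } (FNR in want)' \
    "$tmpdir/rows.txt" "$tmpdir/data.tsv")

printf '%s\n' "$selected"

rm -rf "$tmpdir"
```

With the sample data this prints the header line followed by the AURKA and AURKB lines. Rows come out in the order they appear in the data file, not in the order listed in rows.txt, and the idiom assumes rows.txt is not empty.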