I am working on an RNA-Seq data set consisting of around 24000 rows (genes) and 1100 columns (samples), stored as a tab-separated file. For the analysis, I need to choose a specific gene set. Is there a method to extract rows based on row number? That would be easier for me than selecting by gene name.
I need to select 230 rows that are scattered across the 24000 rows.
Below is an example of the data (4x4):
gene Sample1 Sample2 Sample3
A1BG 5658 5897 6064
AURKA 3656 3484 3415
AURKB 9479 10542 9895
From this, say for example, I want rows 1, 3 and 4, which follow no specific pattern.
I have also asked on biostars.org.
You can use a for loop to build the sed options, like below:
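# Start with -n so sed prints only the lines we ask for
var=-n
for i in 1 3,4 # Put your space-separated line numbers or ranges here
do
    var="${var} -e ${i}p" # Append one print command per entry
done
# $var is deliberately unquoted so the shell splits it into separate options
sed $var filename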
Note: For the requirement described here this is still a pain, since all 230 row numbers have to be typed into the loop by hand.
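If the row numbers are already in a file, one per line, the typing can be avoided. The following is a minimal sketch of that idea using awk instead of sed. The file names data.tsv and rows.txt are hypothetical, and the sample data is just the example table from the question. Line numbers count the header as line 1.

```shell
# Hypothetical file names: data.tsv is the expression table,
# rows.txt holds the wanted line numbers, one per line.
tmpdir=$(mktemp -d)
printf 'gene\tSample1\tSample2\tSample3\nA1BG\t5658\t5897\t6064\nAURKA\t3656\t3484\t3415\nAURKB\t9479\t10542\t9895\n' > "$tmpdir/data.tsv"
printf '1\n3\n4\n' > "$tmpdir/rows.txt"

# While reading the first file (NR==FNR), remember each wanted line number.
# While reading the second file, print a line only if its number was remembered.
result=$(awk 'NR==FNR { want[$1]; next } (FNR in want)' "$tmpdir/rows.txt" "$tmpdir/data.tsv")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

The rows come out in file order, not in the order listed in rows.txt, and this assumes rows.txt is not empty.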