I am working on a project that requires me to take some .bed files as input, extract one column from each file, keep only the values that satisfy a condition, and count how many there are for each file. I am extremely inexperienced with bash, so I don't know most of the commands, but this line of code should do the trick:
for FILE in *; do cat $FILE | awk '$9>1.3'| wc -l ; done>/home/parallels/Desktop/EP_Cell_Type.xls
I saved those values in an .xls file since I need to make some graphs with them. Now I would like to get the filenames with ls and save them in the first column of my .xls, while the counts should go in the second column. I managed to save everything in one column with the command:
ls>/home/parallels/Desktop/EP_Cell_Type.xls | for FILE in *; do cat $FILE | awk '$9>1.3'-x| wc -l ; done >>/home/parallels/Desktop/EP_Cell_Type.xls
My sample files are: A549.bed, GM12878.bed, H1.bed, HeLa-S3.bed, HepG2.bed, Ishikawa.bed, K562.bed, MCF-7.bed, SK-N-SH.bed. They are in a folder that contains only those files.
The output is the list of all filenames followed by the values, all in the same column, like this:
Column 1 |
---|
A549.bed |
GM12878.bed |
H1.bed |
HeLa-S3.bed |
HepG2.bed |
Ishikawa.bed |
K562.bed |
MCF-7.bed |
SK-N-SH.bed |
4536 |
8846 |
6754 |
14880 |
25440 |
14905 |
22721 |
8760 |
28286 |
But what I need is something like this:
Filenames | #BS |
---|---|
A549.bed | 4536 |
GM12878.bed | 8846 |
H1.bed | 6754 |
HeLa-S3.bed | 14880 |
HepG2.bed | 25440 |
Ishikawa.bed | 14905 |
K562.bed | 22721 |
MCF-7.bed | 8760 |
SK-N-SH.bed | 28286 |
Assuming OP's `awk` program (correctly) finds all of the desired rows, an easier (and faster) solution can be written completely in `awk`.
One `awk` solution that keeps track of the number of matching rows and then prints the filename and line count:
awk '
FNR==1 { if ( count >= 1 )                      # first line of new file? if line counter > 0
             printf "%s\t%d\n", prevFN, count   # then print previous FILENAME + tab + line count
         count=0                                # then reset our line counter
         prevFN=FILENAME                        # and save the current FILENAME for later printing
       }
$9>1.3 { count++ }                              # if field #9 > 1.3 then increment line counter
END    { if ( count >= 1 )                      # flush last FILENAME/line counter to stdout
             printf "%s\t%d\n", prevFN, count
       }
' *                                             # * ==> pass all files as input to awk
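One thing to be aware of: because of the `count >= 1` checks, a file with no matching rows produces no output line at all. If every file should appear in the table, a variant along the following lines may help. It is only a sketch: `count_bs` is a hypothetical helper name, and it assumes every argument is a plain file name (no `var=value` arguments) and that no file is listed twice.

```shell
# count_bs: hypothetical helper, prints "<file><TAB><count>" for every file given,
# including files where no row has field 9 greater than 1.3.
count_bs() {
    awk '
    $9 > 1.3 { count[FILENAME]++ }      # tally matching rows per input file
    END {
        for (i = 1; i < ARGC; i++)      # walk the file arguments in the order given
            printf "%s\t%d\n", ARGV[i], count[ARGV[i]]   # an unset count prints as 0
    }' "$@"
}

# Usage (run inside the folder holding the .bed files):
#   count_bs *.bed
```

Keeping one counter per `FILENAME` and walking `ARGV` in `END` avoids the "print the previous file when a new one starts" bookkeeping, at the cost of holding one array entry per file.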
For testing purposes I replaced `$9>1.3` with `/do/` (match any line containing the string 'do') and ran this against a directory containing an assortment of scripts and data files. This generated the following tab-delimited output:
bigfile.txt 7
blocker_tree.sql 4
git.bash 2
hist.bash 4
host.bash 2
lines.awk 2
local.sh 3
multi_file.awk 2
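To get the two-column layout from the question into a file, the same program can be wrapped with a header line and a redirect. The following is a minimal sketch rather than a definitive recipe: `write_bs_table` is a hypothetical helper name, the header text is taken from the desired table in the question, and the output path in the usage comment is only an example.

```shell
# write_bs_table: hypothetical helper.
#   $1        = output file (tab-delimited text)
#   remaining = input files to scan
write_bs_table() {
    out=$1
    shift
    {
        printf 'Filenames\t#BS\n'       # header row, as in the desired table
        awk '
        FNR==1 { if (count >= 1) printf "%s\t%d\n", prevFN, count
                 count = 0; prevFN = FILENAME }
        $9>1.3 { count++ }
        END    { if (count >= 1) printf "%s\t%d\n", prevFN, count }
        ' "$@"
    } > "$out"
}

# Usage (example path, adjust as needed):
#   write_bs_table ../EP_Cell_Type.tsv *.bed
```

The output file is best kept outside the folder being scanned when a bare `*` is used as the file list; otherwise a second run would read the previous results as one more input file. The result is plain tab-delimited text, which spreadsheet programs can import as two columns.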