I have two lists of IDs that I am comparing with the comm command. My problem is that the output looks like this:
YAL002W
YAL003W
	YAL004W
	YAL005C
		YAL008W
	YAL011W
All I want is to pipe it somehow so that the file is written without the empty spaces, which turn into blank cells when I open these files in Excel. I have tried every combination of grep, awk and sed I could find to remove the blank spaces, without luck.
So I have come to the conclusion that the columns are separated by one or two tabs respectively, so I cannot remove them as easily as removing blank spaces without destroying the formatting of the file.
Any help or suggestion would be welcome. Thanks.
EDIT:
I want my output to be three columns, tab delimited, without the blank spaces:
YAL002W YAL004W YAL008W
YAL003W YAL005C
YAL011W
EDIT2, to avoid the XY Problem as referenced:
Original problem (X): I have two lists and I want to find the common and unique words between both lists (to generate a Venn diagram later on). So comm seemed like the perfect solution, since I get all three lists at the same time, which I can later import into Excel easily.
Secondary problem (Y): The three columns that are generated are not three columns (or so I am starting to think), since I can't cut -f them, nor can I remove the blank spaces with the usual awk 'NF' or grep . (for example).
Given this input and comm output:
$ cat file1
YAL002W
YAL003W
YAL008W
$ cat file2
YAL004W
YAL005C
YAL008W
YAL011W
$ comm file1 file2
YAL002W
YAL003W
	YAL004W
	YAL005C
		YAL008W
	YAL011W
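To see what the "blank spaces" in that output really are, it can help to make the whitespace visible. The following is only a sketch: it recreates the sample files shown above and pipes the comm output through sed's l command, which prints tabs as \t and marks each line end with $. The file name comm_visible.txt is made up for this example.

```shell
# Recreate the two sample files from above (one ID per line, already sorted).
printf '%s\n' YAL002W YAL003W YAL008W > file1
printf '%s\n' YAL004W YAL005C YAL008W YAL011W > file2

# sed's "l" command prints each line unambiguously:
# tabs are shown as \t and every line end is marked with $.
comm file1 file2 | sed -n l > comm_visible.txt
cat comm_visible.txt
```

Lines that belong to column 2 start with one \t and lines that belong to column 3 start with two, so every line is non-empty and there is nothing for awk 'NF' or grep . to drop. cut -f also passes lines that contain no tab through unchanged unless -s is given, which is probably why it looked as if it did nothing.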
This will do what you asked for:
$ cat tst.awk
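BEGIN { FS=OFS="\t" }
{
    # comm prefixes column 2 with one tab and column 3 with two,
    # so the number of tab-separated fields is the column number.
    colNr = NF
    rowNr = ++rowNrs[colNr]     # next free row in that column
    val[rowNr,colNr] = $NF
    numCols = (colNr > numCols ? colNr : numCols)
    numRows = (rowNr > numRows ? rowNr : numRows)
}
END {
    # Print the packed table; cells that were never filled print as empty fields.
    for (rowNr=1; rowNr<=numRows; rowNr++) {
        for (colNr=1; colNr<=numCols; colNr++) {
            printf "%s%s", val[rowNr,colNr], (colNr<numCols ? OFS : ORS)
        }
    }
}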
$ comm file1 file2 | awk -f tst.awk
YAL002W	YAL004W	YAL008W
YAL003W	YAL005C	
	YAL011W	
but of course you could just skip the call to comm and use awk right off the bat:
$ cat tst.awk
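BEGIN { FS=OFS="\t" }
NR==FNR {
    # First file: remember every line as an array index.
    file1[$0]
    next
}
{
    # Second file: a line also seen in file1 is common (column 3),
    # otherwise it is unique to file2 (column 2).
    if ($0 in file1) {
        colNr = 3
        delete file1[$0]
    }
    else {
        colNr = 2
    }
    rowNr = ++rowNrs[colNr]
    val[rowNr,colNr] = $0
}
END {
    # Whatever is left in file1[] is unique to file1 (column 1).
    # Note: "for (v in array)" visits elements in an unspecified order,
    # so column 1 is not guaranteed to keep the input order.
    for (v in file1) {
        colNr = 1
        rowNr = ++rowNrs[colNr]
        val[rowNr,colNr] = v
    }
    # The longest of the three columns determines the number of rows.
    numRows = (rowNrs[1] > rowNrs[2] ? rowNrs[1] : rowNrs[2])
    numRows = (numRows > rowNrs[3] ? numRows : rowNrs[3])
    numCols = 3
    for (rowNr=1; rowNr<=numRows; rowNr++) {
        for (colNr=1; colNr<=numCols; colNr++) {
            printf "%s%s", val[rowNr,colNr], (colNr<numCols ? OFS : ORS)
        }
    }
}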
$ awk -f tst.awk file1 file2
YAL002W	YAL004W	YAL008W
YAL003W	YAL005C	
	YAL011W	
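If you would rather stay with comm, another option (a sketch, not part of the scripts above) is to let comm emit one column at a time with its -1, -2 and -3 suppression options and then glue the three columns together with paste. The file names only1, only2 and both are made up for this example; file1 and file2 are recreated from the sample input and must be sorted, as comm requires.

```shell
# Recreate the two sample files from above (one ID per line, already sorted).
printf '%s\n' YAL002W YAL003W YAL008W > file1
printf '%s\n' YAL004W YAL005C YAL008W YAL011W > file2

# Each call suppresses two of comm's three columns, so the
# remaining column comes out with no leading tabs at all.
comm -23 file1 file2 > only1   # lines only in file1
comm -13 file1 file2 > only2   # lines only in file2
comm -12 file1 file2 > both    # lines common to both files

# paste joins the files line by line with tabs and pads the
# shorter files with empty fields.
paste only1 only2 both
```

This keeps each column in sorted order because every column comes straight from comm, at the cost of running comm three times and leaving three intermediate files behind.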