awk protein-database

Keeping the format of a PDB file with conditionals


I'm new to awk, and I'm trying to append a number (something like NR) to column 3 when column 1 is HETATM.

My input file is:

HETATM   25  O   UNL     1      86.047  83.059 103.165  1.00  0.00           O
HETATM   26  N   UNL     1      87.071  82.457 102.433  1.00  0.00           N
HETATM   27  C   UNL     1      91.764  77.729  97.523  1.00  0.00           C
HETATM   28  O   UNL     1      92.740  78.174  98.137  1.00  0.00           O
HETATM   29  H   UNL     1      90.477  80.552  97.677  1.00  0.00           H
CONECT    1    2
CONECT    2    1    3
CONECT    3    2    4    7

The output I want is:

HETATM   25  O25   UNL     1      86.047  83.059 103.165  1.00  0.00           O
HETATM   26  N26   UNL     1      87.071  82.457 102.433  1.00  0.00           N
HETATM   27  C27   UNL     1      91.764  77.729  97.523  1.00  0.00           C
HETATM   28  O28   UNL     1      92.740  78.174  98.137  1.00  0.00           O
HETATM   29  H29   UNL     1      90.477  80.552  97.677  1.00  0.00           H
CONECT    1    2
CONECT    2    1    3
CONECT    3    2    4    7

I tried the following command, but it doesn't keep the file's format. Can you help me, please?

awk 'BEGIN{FS=OFS="\t";}{if($1=="HETATM"){$3=$3NR};print $0}' file.pdb

Thanks a lot.


Solution

  • Using any sed:

    $ sed 's/^HETATM *\([^ ]*\) *[^ ]*/&\1/' file
    HETATM   25  O25   UNL     1      86.047  83.059 103.165  1.00  0.00           O
    HETATM   26  N26   UNL     1      87.071  82.457 102.433  1.00  0.00           N
    HETATM   27  C27   UNL     1      91.764  77.729  97.523  1.00  0.00           C
    HETATM   28  O28   UNL     1      92.740  78.174  98.137  1.00  0.00           O
    HETATM   29  H29   UNL     1      90.477  80.552  97.677  1.00  0.00           H
    CONECT    1    2
    CONECT    2    1    3
    CONECT    3    2    4    7

    Here & is the whole match (HETATM, the serial number and the atom name, with their original spacing) and \1 is the captured serial number, so the number is appended right after the atom name and the rest of the line is left untouched.
    

    Original answer:

    Assuming your input really is tab-separated as you indicate in your script, you were very, very close:

    $ awk 'BEGIN{FS=OFS="\t"} $1=="HETATM"{$3=$3 $2} 1' file
    HETATM  25      O25     UNL     1       86.047  83.059  103.165 1.00    0.00    O
    HETATM  26      N26     UNL     1       87.071  82.457  102.433 1.00    0.00    N
    HETATM  27      C27     UNL     1       91.764  77.729  97.523  1.00    0.00    C
    HETATM  28      O28     UNL     1       92.740  78.174  98.137  1.00    0.00    O
    HETATM  29      H29     UNL     1       90.477  80.552  97.677  1.00    0.00    H
    CONECT  1       2
    CONECT  2       1       3
    CONECT  3       2       4       7
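
If the file is space-separated, as the sample in the question appears to be, the same idea can be written in awk without assigning to any field, so awk never rebuilds the record and the original spacing survives. This is a minimal sketch, assuming a POSIX awk; the file names sample.pdb and sample_out.pdb are illustrative and not from the question:

```shell
# Sample input, space-separated as in the question (file names are illustrative).
cat > sample.pdb <<'EOF'
HETATM   25  O   UNL     1      86.047  83.059 103.165  1.00  0.00           O
HETATM   26  N   UNL     1      87.071  82.457 102.433  1.00  0.00           N
CONECT    1    2
EOF

# match() finds the span "HETATM <serial> <name>" at the start of the line.
# $2 (the serial number) is then spliced in right after that span by
# rewriting $0 as a string, instead of assigning to $3, which would
# rebuild the record with single OFS separators.
awk '
$1 == "HETATM" && match($0, /^HETATM +[^ ]+ +[^ ]+/) {
    $0 = substr($0, 1, RLENGTH) $2 substr($0, RLENGTH + 1)
}
{ print }
' sample.pdb > sample_out.pdb

cat sample_out.pdb
```

Lines that do not start with HETATM (such as the CONECT records) fall through to the print unchanged.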