I'm looking to generate a folder containing a PDB file for every peptide of length 7 built from a specific set of amino acids. My first thought was to write a simple Linux script that generates a file with all 7-letter combinations, like this:
AAAAAAA
AAAAAAB
AAAAABA
AAAABAA
AAABAAA
AABAAAA
ABAAAAA
BAAAAAA
AAAAABB
AAAABAB
...
I think this script can work, but I'm not sure:
for c1 in {A,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y}
do
    for c2 in {A,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y}
    do
        for c3 in {A,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y}
        do
            for c4 in {A,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y}
            do
                for c5 in {A,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y}
                do
                    printf "%s\n" "$c1$c2$c3$c4$c5"
                done
            done
        done
    done
done
Then I would use another simple script that, for every row of that file, generates a peptide with PyMOL using this command:
for aa in "row1": cmd._alt(string.lower(aa))
save row1.pdb, all
I'm new to scripting on Linux. Can anyone help me, please? Thanks.
Here's a technique which produces the answer 'fairly fast'. Basically, it starts with a file containing a single newline, and the list of amino acid letters.
It generates a sed script (using sed, of course) that successively adds an amino acid letter to the end of a line, prints it, removes it, and moves on to the next letter.
printf "%s\n" A D E F G H I K L M N P Q R S T V W Y |
sed 's%.%s/$/&/p;s/&$//%' > peptides.sed
echo > peptides.0A # Bootstrap the process
sed -n -f peptides.sed peptides.0A > peptides.1A
sed -n -f peptides.sed peptides.1A > peptides.2A
sed -n -f peptides.sed peptides.2A > peptides.3A
timecmd sed -n -f peptides.sed peptides.3A > peptides.4A
timecmd sed -n -f peptides.sed peptides.4A > peptides.5A
timecmd sed -n -f peptides.sed peptides.5A > peptides.6A
timecmd sed -n -f peptides.sed peptides.6A > peptides.7A
You can think of 'timecmd' as a variant of time. It prints the start time and the command, runs it, and then prints the end time and the elapsed time (wall-clock time only).
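If you don't have 'timecmd', a rough stand-in can be written as a bash function. This is only a sketch that imitates the output shown below, not the actual tool: the function body, the date formats, and the reading of the three elapsed fields as hours, minutes, and seconds are all assumptions. The messages go to standard error so that they stay out of the redirected peptide files.

```shell
# Hypothetical stand-in for 'timecmd' (the real implementation is not shown here).
# Prints the start time and the command, runs it, then prints the end time and
# the elapsed wall-clock time. All messages go to stderr so that
# 'timecmd cmd > file' leaves only the command's own output in the file.
timecmd() {
    local start end elapsed status
    start=$(date +%s)
    {
        date '+%Y-%m-%d %H:%M:%S'
        printf '+ exec %s\n' "$*"
    } >&2
    "$@"
    status=$?
    end=$(date +%s)
    elapsed=$((end - start))
    # Assumed layout of the elapsed fields: hours, minutes, seconds
    printf '%s - elapsed: %02d %02d %02d\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" \
        $((elapsed / 3600)) $((elapsed % 3600 / 60)) $((elapsed % 60)) >&2
    return "$status"
}
```

With this defined at the top of the script, the 'timecmd sed -n -f peptides.sed ...' lines run unchanged.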
Sample output:
$ bash peptides-A.sh
2015-10-16 15:25:24
+ exec sed -n -f peptides.sed peptides.3A
2015-10-16 15:25:24 - elapsed: 00 00 00
2015-10-16 15:25:24
+ exec sed -n -f peptides.sed peptides.4A
2015-10-16 15:25:27 - elapsed: 00 00 03
2015-10-16 15:25:27
+ exec sed -n -f peptides.sed peptides.5A
2015-10-16 15:26:16 - elapsed: 00 00 49
2015-10-16 15:26:16
+ exec sed -n -f peptides.sed peptides.6A
2015-10-16 15:42:47 - elapsed: 00 16 31
$ ls -l peptides.?A; rm -f peptides-?A
-rw-r--r-- 1 jleffler staff 1 Oct 16 15:25 peptides.0A
-rw-r--r-- 1 jleffler staff 38 Oct 16 15:25 peptides.1A
-rw-r--r-- 1 jleffler staff 1083 Oct 16 15:25 peptides.2A
-rw-r--r-- 1 jleffler staff 27436 Oct 16 15:25 peptides.3A
-rw-r--r-- 1 jleffler staff 651605 Oct 16 15:25 peptides.4A
-rw-r--r-- 1 jleffler staff 14856594 Oct 16 15:25 peptides.5A
-rw-r--r-- 1 jleffler staff 329321167 Oct 16 15:26 peptides.6A
-rw-r--r-- 1 jleffler staff 7150973912 Oct 16 15:42 peptides.7A
$
I used the script from the question to create peptides.5B (the script was called peptides-B.sh on my disk), and checked that peptides.5A and peptides.5B were identical.
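The same comparison can be tried at a scale that finishes instantly. The sketch below assumes a three-letter alphabet (A, D, E) and length 3, so it produces 27 lines instead of millions; the temporary directory and the file names inside it are made up for this illustration. It also prints the generated sed script so you can see the add, print, remove pattern for each letter.

```shell
# Small-scale check: 3 letters, length 3, sed approach versus nested loops.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Generate the sed script the same way as above, but for A, D, E only
printf '%s\n' A D E | sed 's%.%s/$/&/p;s/&$//%' > "$tmp/peptides.sed"
cat "$tmp/peptides.sed"        # one 's/$/X/p;s/X$//' line per letter

echo > "$tmp/p0"               # bootstrap: a single empty line
sed -n -f "$tmp/peptides.sed" "$tmp/p0" > "$tmp/p1"
sed -n -f "$tmp/peptides.sed" "$tmp/p1" > "$tmp/p2"
sed -n -f "$tmp/peptides.sed" "$tmp/p2" > "$tmp/p3"

# Nested loops in the style of the question, cut down to three levels
for c1 in A D E; do
    for c2 in A D E; do
        for c3 in A D E; do
            printf '%s\n' "$c1$c2$c3"
        done
    done
done > "$tmp/loops"

# cmp is silent and returns 0 when the two files are byte-for-byte identical
cmp "$tmp/p3" "$tmp/loops" && echo "identical"
```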
Test environment: 13" MacBook Pro, 2.7 GHz Intel Core i5, 8 GiB RAM, SSD storage.
Editing the start of the line instead of the end of the line yields approximately a 20% performance improvement.
Code:
printf "%s\n" A D E F G H I K L M N P Q R S T V W Y |
sed 's%.%s/^/&/p;s/^&//%' > peptides.sed
echo > peptides.0A # Bootstrap the process
sed -n -f peptides.sed peptides.0A > peptides.1A
sed -n -f peptides.sed peptides.1A > peptides.2A
sed -n -f peptides.sed peptides.2A > peptides.3A
timecmd sed -n -f peptides.sed peptides.3A > peptides.4A
timecmd sed -n -f peptides.sed peptides.4A > peptides.5A
timecmd sed -n -f peptides.sed peptides.5A > peptides.6A
timecmd sed -n -f peptides.sed peptides.6A > peptides.7A
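One thing to be aware of with the prepend variant: it produces the same set of peptides, but the lines come out in a different order, because the newest letter now varies fastest at the front of the line rather than at the end. The sketch below shows this on an assumed three-letter alphabet and length 2; the file names are hypothetical.

```shell
# Compare the append and prepend variants on a tiny alphabet (A, D, E), length 2.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' A D E | sed 's%.%s/$/&/p;s/&$//%' > "$tmp/append.sed"
printf '%s\n' A D E | sed 's%.%s/^/&/p;s/^&//%' > "$tmp/prepend.sed"
echo > "$tmp/empty"

# Two passes each, chained with a pipe instead of an intermediate file
sed -n -f "$tmp/append.sed"  "$tmp/empty" | sed -n -f "$tmp/append.sed"  > "$tmp/append.2"
sed -n -f "$tmp/prepend.sed" "$tmp/empty" | sed -n -f "$tmp/prepend.sed" > "$tmp/prepend.2"

head -n 3 "$tmp/append.2"      # AA, AD, AE: last letter varies fastest
head -n 3 "$tmp/prepend.2"     # AA, DA, EA: first letter varies fastest

# After sorting, the prepend output matches the append output exactly
LC_ALL=C sort "$tmp/prepend.2" | cmp - "$tmp/append.2" && echo "same set of peptides"
```

If the order matters for later steps, sort the final file or keep the append variant.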
Timing:
$ bash peptides-A.sh; ls -l peptides.?A; wc peptides.?A; rm -f peptides.?A
2015-10-16 16:05:48
+ exec sed -n -f peptides.sed peptides.3A
2015-10-16 16:05:48 - elapsed: 00 00 00
2015-10-16 16:05:48
+ exec sed -n -f peptides.sed peptides.4A
2015-10-16 16:05:50 - elapsed: 00 00 02
2015-10-16 16:05:50
+ exec sed -n -f peptides.sed peptides.5A
2015-10-16 16:06:28 - elapsed: 00 00 38
2015-10-16 16:06:28
+ exec sed -n -f peptides.sed peptides.6A
2015-10-16 16:18:51 - elapsed: 00 12 23
-rw-r--r-- 1 jleffler staff 1 Oct 16 16:05 peptides.0A
-rw-r--r-- 1 jleffler staff 38 Oct 16 16:05 peptides.1A
-rw-r--r-- 1 jleffler staff 1083 Oct 16 16:05 peptides.2A
-rw-r--r-- 1 jleffler staff 27436 Oct 16 16:05 peptides.3A
-rw-r--r-- 1 jleffler staff 651605 Oct 16 16:05 peptides.4A
-rw-r--r-- 1 jleffler staff 14856594 Oct 16 16:05 peptides.5A
-rw-r--r-- 1 jleffler staff 329321167 Oct 16 16:06 peptides.6A
-rw-r--r-- 1 jleffler staff 7150973912 Oct 16 16:18 peptides.7A
1 0 1 peptides.0A
19 19 38 peptides.1A
361 361 1083 peptides.2A
6859 6859 27436 peptides.3A
130321 130321 651605 peptides.4A
2476099 2476099 14856594 peptides.5A
47045881 47045881 329321167 peptides.6A
893871739 893871739 7150973912 peptides.7A
943531280 943531279 7495831836 total
$
I tarted up the output from wc so it was 'properly columnar' (adding spaces, in other words). The original started going wonky when the numbers contained 8 digits.