I have a list of chromosomes and positions obtained from the MAC output for my sample. The format looks like this:
chr1.12661672.G.A,chr1.12661721.C.T 1/6 11 2 Mutant
chr1.157640161.C.T,chr1.157640277.G.A 1/6 11 2 Mutant
chr1.180806049.T.A,chr1.180806061.G.C 1/6 11 11 Mutant
chr1.205929901.G.A,chr1.205930053.C.T 1/7 10 5 Mutant
...
For example, in the first row I have G at position 12661672 and C at position 12661721, but I want to find out which bases lie between 12661672 and 12661721. Are there any tools for extracting the reference bases at custom positions? I am using the GRCh38 reference genome.
The best I could think of was looking through Ensembl and finding the bases manually. Obviously this is incredibly time-consuming, so I would like something more automated.
I am looking to build a VCF of the MNVs, so something along the lines of:
chr1 180806049 TAGTGAACAAGG AAGTGAACAAGC . PASS
Where all the positions between the identified bases are filled in.
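Extract the chromosome, start and end into a BED file. If this is a one-off analysis I would use Excel; otherwise I would use Python. Note that BED starts are 0-based and BED ends are exclusive, so subtract 1 from the first position if you want the first variant base included in the extracted sequence. The result should look like this:
chr1 12661671 12661721
chr1 157640160 157640277
chr1 180806048 180806061
chr1 205929900 205930053
Remember that a BED file is tab-separated, so the columns must be separated by tabs, not spaces.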
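If you go the Python route, the conversion could look roughly like the sketch below. It assumes every row starts with a comma-separated list of `chrom.pos.ref.alt` entries with 1-based positions (as in your example) and that all variants in a row sit on the same chromosome. The function names `mac_line_to_bed` and `write_bed` are made up for illustration. The start is shifted by 1 because BED starts are 0-based and BED ends are exclusive.

```python
def mac_line_to_bed(line):
    """Turn one MAC output row into a (chrom, start, end) BED interval."""
    # The first whitespace-separated field holds the variants, e.g.
    # "chr1.12661672.G.A,chr1.12661721.C.T"
    variants = line.split()[0].split(",")
    # rsplit from the right so a contig name containing dots still parses
    parsed = [v.rsplit(".", 3) for v in variants]
    chrom = parsed[0][0]
    positions = [int(fields[1]) for fields in parsed]
    # BED start is 0-based, BED end is exclusive, so only the start shifts
    return chrom, min(positions) - 1, max(positions)


def write_bed(mac_lines, out_path):
    """Write one tab-separated BED line per non-empty MAC row."""
    with open(out_path, "w") as out:
        for line in mac_lines:
            if not line.strip():
                continue
            chrom, start, end = mac_line_to_bed(line)
            out.write(f"{chrom}\t{start}\t{end}\n")
```

For instance, `write_bed(open("mac_output.txt"), "mnv.bed")` would write the intervals to `mnv.bed`; both file names are placeholders.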
Then one option is bedtools getfasta, which extracts the sequence of each BED interval from a FASTA file (in your case, the GRCh38 reference):
$ cat test.fa
>chr1
AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG
$ cat test.bed
chr1 5 10
$ bedtools getfasta -fi test.fa -bed test.bed
>chr1:5-10
AAACC
# optionally write to an output file
$ bedtools getfasta -fi test.fa -bed test.bed -fo test.fa.out
$ cat test.fa.out
>chr1:5-10
AAACC
Example copied from the bedtools documentation: https://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html
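Once you have the reference bases for each interval, filling in the REF and ALT alleles for the MNV is a small string operation. Here is a minimal sketch, assuming the variants are all single-base substitutions and that the sequence you pass in runs from the first to the last variant position inclusive. `build_mnv` is a hypothetical helper, not part of bedtools or MAC.

```python
def build_mnv(mac_field, ref_seq):
    """Return (chrom, pos, ref, alt) for the span covering all variants.

    mac_field: e.g. "chr1.6.A.G,chr1.10.C.T" (1-based positions)
    ref_seq:   reference bases from the first to the last variant
               position, inclusive (e.g. taken from getfasta output)
    """
    variants = []
    for item in mac_field.split(","):
        chrom, pos, ref, alt = item.rsplit(".", 3)
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError(f"only single-base substitutions handled: {item}")
        variants.append((chrom, int(pos), ref.upper(), alt.upper()))
    variants.sort(key=lambda v: v[1])

    chrom, first_pos = variants[0][0], variants[0][1]
    last_pos = variants[-1][1]
    ref_seq = ref_seq.upper()
    if len(ref_seq) != last_pos - first_pos + 1:
        raise ValueError("sequence length does not match the variant span")

    alt_bases = list(ref_seq)
    for _, pos, ref, alt in variants:
        offset = pos - first_pos
        # Sanity check: the fetched base must agree with the reported REF
        if ref_seq[offset] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        alt_bases[offset] = alt
    return chrom, first_pos, ref_seq, "".join(alt_bases)
```

With the toy FASTA above, the BED interval `chr1 5 10` covers 1-based positions 6 to 10, so `build_mnv("chr1.6.A.G,chr1.10.C.T", "AAACC")` gives `("chr1", 6, "AAACC", "GAACT")`. The reference check also catches an off-by-one in the BED start, since the first fetched base would then no longer match the reported REF allele.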