I have a multi-sample VCF file and I want a table with sample IDs in the left column, followed by the variants at which each sample carries an alternate allele. It should look like this:
ID1 chr2:87432:A:T_0/1 chr10:43234:C:G_1/1
ID2 chr2:87432:A:T_1/1
ID3 chr11:432434:T:G chr14:34234234:C:G chr20:34324234:T:C
The table will then be read into R.
I have tried combinations of:
bcftools query -f '[%SAMPLE\t] %CHROM:%POS:%REF:%ALT[%GT]\n'
but I keep getting sample IDs overlapping on the same line and I can't quite figure out the syntax.
Your help would be much appreciated.
You cannot achieve what you want with a single BCFtools command. BCFtools parses one VCF variant at a time, so its output is organized per variant rather than per sample. However, you can use a command like this to split the file by sample first:
bcftools +split -i 'GT="0/1" | GT="1/1"' -Ob -o DIR input.vcf
This will create one small .bcf file for each sample, and you can then run multiple instances of bcftools query, one per file, to get what you want.
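As a rough sketch of that second step, a shell loop along these lines could run bcftools query on each per-sample file and join the variants into one row per sample. It assumes the split files sit in DIR and are named <sample>.bcf. The helper collapse_row is a made-up name for illustration, not part of BCFtools.

```shell
#!/bin/sh
# collapse_row (hypothetical helper): read one variant string per line on stdin
# and print a single row "<sample> <variant> <variant> ...".
collapse_row() {
    # $1 = sample ID; paste -s joins all input lines with a space
    printf '%s %s\n' "$1" "$(paste -sd' ' -)"
}

# Assumption: DIR is the output directory given to bcftools +split,
# and each file inside it is named <sample>.bcf.
dir=DIR
for f in "$dir"/*.bcf; do
    [ -e "$f" ] || continue            # nothing to do if the glob matched no files
    sample=$(basename "$f" .bcf)       # derive the sample ID from the file name
    # Each file holds a single sample, so [_%GT] appends just that sample's genotype
    bcftools query -f '%CHROM:%POS:%REF:%ALT[_%GT]\n' "$f" | collapse_row "$sample"
done
```

Redirecting the loop's output to a file (for example by appending > table.txt after done) gives something R can read line by line. Since rows have different numbers of fields, readLines() plus strsplit() is probably easier than read.table().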