vcftools bcftools

Creating a per-sample table from a VCF using bcftools


I have a multi-sample VCF file and I want a table with the sample IDs in the left column, each followed by the variants for which that sample carries an alternate allele. It should look like this:

ID1 chr2:87432:A:T_0/1 chr10:43234:C:G_1/1
ID2 chr2:87432:A:T_1/1
ID3 chr11:432434:T:G chr14:34234234:C:G chr20:34324234:T:C

I will then read this table into R.

I have tried combinations of:

bcftools query -f '[%SAMPLE\t] %CHROM:%POS:%REF:%ALT[%GT]\n' but I keep getting sample IDs overlapping on the same line and I can't quite figure out the syntax.

Your help would be much appreciated.


Solution

  • You cannot achieve what you want with a single BCFtools command, because BCFtools processes one VCF record (variant) at a time. However, you can use a command like this to split the file by sample first:

    bcftools +split -i 'GT="0/1" | GT="1/1"' -Ob -o DIR input.vcf
    

    This will create one small .bcf file for each sample. You can then run bcftools query on each of these files to get what you want.
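
The second step could be sketched roughly as follows. This is only an illustration under some assumptions: the function name `per_sample_table` is made up here, the split files are assumed to be named `SAMPLE.bcf` (which I believe is the default naming of `bcftools +split` when no sample-renaming file is given), and each file is assumed to contain exactly one sample, so that `[_%GT]` prints a single genotype.

```shell
#!/bin/sh
# per_sample_table DIR
# Hypothetical helper: prints one line per sample in the form
#   SAMPLE CHROM:POS:REF:ALT_GT CHROM:POS:REF:ALT_GT ...
# Assumes DIR holds the per-sample files written by `bcftools +split`,
# each named SAMPLE.bcf and containing a single sample.
per_sample_table() {
    dir=$1
    for f in "$dir"/*.bcf; do
        # Skip the literal pattern if the directory has no .bcf files.
        [ -e "$f" ] || continue
        # Take the sample ID from the file name (an assumption, see above).
        sample=$(basename "$f" .bcf)
        # One variant per line with the genotype after an underscore,
        # then join all lines of this sample with single spaces.
        variants=$(bcftools query -f '%CHROM:%POS:%REF:%ALT[_%GT]\n' "$f" \
            | paste -s -d ' ' -)
        printf '%s %s\n' "$sample" "$variants"
    done
}
```

After sourcing this, something like `per_sample_table DIR > table.txt` should give a space-separated file with a varying number of fields per line. Reading such ragged rows into R may need `readLines()` plus `strsplit()`, or `read.table(..., fill = TRUE)`.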