rread.tablevcf-variant-call-format

Read table with comment lines starting with "##"


I'm struggling to read my tables in Variant Call Format (VCF) with R. Each file has some comment lines starting with ##, and then the header starting with #.

## contig=<ID=OTU1431,length=253>
## contig=<ID=OTU915,length=253>
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  /home/sega/data/bwa/reads/0015.2142.fastq.q10sorted.bam
Eubacterium_ruminantium_AB008552    56  .   C   T   228 .   DP=212;AD=0,212;VDB=0;SGB=-0.693147;MQ0F=0;AC=2;AN=2;DP4=0,0,0,212;MQ=59    GT:PL   1/1:255,255,0

How can I read such table without missing a header? Using read.table() with comment.char = "##" returns an error: "invalid 'comment.char' argument"


Solution

  • If you want to read VCF, you can also just try to use readVcf from VariantAnnotation in Bioconductor. https://bioconductor.org/packages/release/bioc/html/VariantAnnotation.html

    Otherwise, I can highly recommend fread function in data.table package. It allows you to use the skip argument to allow it to start importing when a substring has been found.

    e.g.

    fread("test.vcf", skip = "CHROM")
    

    should work.