I am trying to merge 3000 bacterial BCF files using bcftools. The VCF files were generated with GATK, then converted to BCF and indexed with bcftools. bcftools gets through about 20% of the data, but it keeps terminating prematurely and produces a merged BCF file covering only a portion of the variants (up to 500 kb of the 2 Mb bacterial genome). The command I am using is:
bcftools1.7/bcftools merge -l VarList.txt -0 --missing-to-ref --threads 1 -O b > CombinedVCF
The error output is:
/bin/sh: line 1: 17041 Segmentation fault (core dumped) bcftools/bcftools merge -l VarList.txt -0 --missing-to-ref --threads 1 -O b > CombinedVCF
Previously I tried the same command for 400 samples without any problem.
Searching online, I found this explanation: "A segfault occurs when a reference to a variable falls outside the segment where that variable resides, or when a write is attempted to a location that is in a read-only segment". The command is running on a cluster with 80 GB of RAM available to the job. I am not sure whether this error is due to a problem in bcftools itself or to a limitation of the system running the command.
Here are sample BCF files to reproduce the error (https://figshare.com/articles/BCF_file_segfault/7412864). The error appears only with large numbers of samples, so I could not reduce the dataset any further.
It turned out to be a bug in bcftools, and the author kindly fixed it after being notified:
https://github.com/samtools/bcftools/issues/929#issuecomment-443614761
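For anyone who cannot upgrade to a fixed build, one possible workaround is to merge in two stages, since the 400-sample merge worked. I have not checked that this avoids the bug, and it is not what the bcftools author suggested. The sketch below splits VarList.txt into batches, merges and indexes each batch, then merges the batch outputs. The batch size, the batch_* and stage2.list file names, and the DRY_RUN switch are my own illustrative choices. I assume -0 (the short form of --missing-to-ref) gives the same genotypes when applied at both stages, which is worth verifying on a small subset, especially at multiallelic sites.

```shell
#!/bin/sh
# Sketch: two-stage merge of a long list of indexed BCF files.
# Assumptions (not from the original post): batch_* / stage2.list names,
# the DRY_RUN switch, and that bcftools is on PATH.

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# merge_in_batches LIST BATCH_SIZE OUTPUT
merge_in_batches() {
    list=$1
    size=$2
    out=$3

    # Split the file list into chunks named batch_aa, batch_ab, ...
    split -l "$size" "$list" batch_

    # Stage 1: merge each chunk and index the result.
    : > stage2.list
    for b in batch_??; do
        run bcftools merge -l "$b" -0 -O b -o "$b.bcf"
        run bcftools index "$b.bcf"
        echo "$b.bcf" >> stage2.list
    done

    # Stage 2: merge the per-batch BCFs into the final file.
    run bcftools merge -l stage2.list -0 -O b -o "$out"
}
```

Calling merge_in_batches VarList.txt 400 CombinedVCF.bcf prints the commands it would run; setting DRY_RUN=0 in the environment executes them instead. Run it in an empty working directory so that leftover batch_* files from an earlier attempt are not picked up.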