
Phasing genomes with shapeit5


I intend to use shapeit5 for phasing. I downloaded the reference dataset from 1000G:

https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.html

However, while running shapeit5 I got this error: Opening 1000GP_Phase3_chr22.hap.gz: file format not supported by HTSlib.

Could you please help with this issue? Thanks!


Solution

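  • The dataset you downloaded is a 'Phased haplotype file in IMPUTE -h format (compressed by gzip software)'. SHAPEIT5, however, reads its input through HTSlib and works with VCF and/or BCF files, which is why it cannot open the .hap.gz file. You need the reference panel in VCF or BCF format instead.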

    If you want to download VCF files from 1000G, you can find them here: https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/
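If you are unsure what format a downloaded file is, a quick check of its first bytes can help before you run the phasing step. The following is only a minimal sketch: the function name `sniff_variant_format` is made up for this example and is not part of SHAPEIT5 or HTSlib. It relies on a VCF starting with the header line `##fileformat=VCF` and a BCF starting with the magic bytes `BCF\2` once decompressed. It does not validate the rest of the file.

```python
import gzip
import sys


def sniff_variant_format(path):
    """Return 'vcf', 'bcf', or 'other' based on the first bytes of the file.

    Hypothetical helper for illustration; it only inspects the start of
    the file and does not guarantee the file is valid.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)

    # Plain gzip and bgzip (BGZF) files both start with the bytes 1f 8b,
    # and Python's gzip module can decompress either of them.
    opener = gzip.open if magic == b"\x1f\x8b" else open
    with opener(path, "rb") as fh:
        head = fh.read(16)

    if head.startswith(b"##fileformat=VCF"):
        return "vcf"
    if head.startswith(b"BCF\x02"):
        return "bcf"
    return "other"


if __name__ == "__main__":
    # Usage: python sniff.py FILE [FILE ...]
    for file_path in sys.argv[1:]:
        print(file_path, sniff_variant_format(file_path))
```

An IMPUTE-format .hap.gz file is gzip-compressed text containing rows of 0/1 alleles, so a check like this would report it as 'other', which matches the HTSlib error above.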