rbioinformaticsvcf-variant-call-format

R: (Pegas) problems with haplotypes - (error: 'h' must be of class 'haplotype')


I've recently started looking in to haplotype data and I'm messing around with data from the 1000 genomes project and trying to manipulate it with the Pegas package in R. So far I've come this far:

library(pegas)
a <- "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502"
b <- "ALL.chrY.phase3_integrated_v1b.20130502.genotypes.vcf.gz"
url <- paste(a, b, sep = "/")
download.file(url, "chrY.vcf.gz")
(info <- VCFloci("chrY.vcf.gz"))
SNP <- is.snp(info)
X.SNP <- read.vcf("chrY.vcf.gz", which.loci = which(SNP))
h <- haplotype(X.SNP, 6020:6030)
net <- haploNet(h)
plot(net)

I would like to plot a haplotype net but it doesn't execute it. I get the following message: 'h' must be of class 'haplotype'

If I print out h I get:

> h
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19]
. "C"  "C"  "C"  "C"  "C"  "C"  "C"  "C"  "C"  "C"   "C"   "C"   "C"   "C"   "C"   "C"   "T"   "C"   "C"  
. "G"  "G"  "G"  "G"  "G"  "G"  "G"  "G"  "G"  "G"   "G"   "G"   "G"   "G"   "G"   "G"   "G"   "A"   "G"  
. "C"  "C"  "C"  "C"  "C"  "C"  "T"  "C"  "C"  "C"   "C"   "C"   "C"   "C"   "C"   "C"   "C"   "C"   "C"  
. "T"  "T"  "T"  "T"  "T"  "T"  "T"  "T"  "T"  "T"   "C"   "T"   "T"   "T"   "T"   "T"   "T"   "T"   "T"  
. "G"  "G"  "G"  "G"  "G"  "G"  "G"  "G"  "G"  "G"   "G"   "G"   "G"   "A"   "G"   "G"   "G"   "G"   "G"  
. "T"  "T"  "T"  "T"  "T"  "T"  "T"  "T"  "T"  "T"   "T"   "C"   "T"   "T"   "T"   "T"   "T"   "T"   "T"  
. "A"  "A"  "A"  "A"  "A"  "A"  "A"  "A"  "A"  "C"   "A"   "A"   "A"   "A"   "A"   "A"   "A"   "A"   "A"  
. "G"  "G"  "G"  "."  "G"  "G"  "G"  "G"  "G"  "G"   "G"   "G"   "A"   "G"   "G"   "G"   "G"   "G"   "G"  
. "."  "T"  "C"  "T"  "T"  "C"  "T"  "."  "."  "."   "T"   "T"   "T"   "T"   "C"   "T"   "T"   "T"   "T"  
. "."  "A"  "."  "A"  "."  "C"  "A"  "A"  "C"  "."   "A"   "A"   "A"   "A"   "A"   "C"   "A"   "A"   "A"  
. "T"  "T"  "T"  "T"  "T"  "T"  "T"  "T"  "T"  "T"   "T"   "T"   "T"   "T"   "T"   "T"   "T"   "T"   "C"  
attr(,"class")
[1] "haplotype.loci"
attr(,"freq")
 [1]   18 1142    2    5   25    6    1    4    2    1    2    5    1    9    1    3    1    4    1

It obviously assigned 19 haplotypes. Something must be wrong with the way the data is presented. Any advice? Also there is very little material on Pegas and how to manipulate with VCF files with the use of Pegas. Does anybody know a good resource (web page or book) for getting information on how to manipulate with haplotypes from VCF files, it doesn't even have to be for Pegas, any R library will do, or Python... anything really.

Thank you for the help, Peter


Solution

  • That's an expected result: for the moment haploNet() works only for the class "haplotype" which is generated from DNA seqs (class "DNAbin"). The output of read.vcf() is of class "loci" and haplotype() is a generic function working on both classes.

    If you work on SNPs only, you can avoid this with:

    class(h) <- NULL
    h <- as.DNAbin(h)
    

    The (ultimate) goal is to have haploNet() works also with the class "haplotype.loci" (which is still in development) and maybe others.

    Cheers, Emmanuel