r ggplot2 bioinformatics genetics

Good way to graph allele frequency of different SNPs along chromosomes


I have a set of SNPs from different parts of the genome and their allele frequencies in various populations and metapopulations of interest. I want to plot the allele frequencies along the SNPs' genomic coordinates for all 22 autosomes.

Basically, I want to generate something like Figure 1A from Sankararaman et al. (2014) (http://www.nature.com/nature/journal/v507/n7492/fig_tab/nature12961_F1.html), except that the Y-axis would be frequency, all populations would be on the same graph (not separated), and I would have colored points instead of spikes.

My data is formatted as follows (MAF = minor allele frequency, which is what I want to graph):

CHR  SNP    COORD  CLST    A1  A2  MAF  MAC  NCHROBS
1    rs###  2####  Region  G   A   0.4  400  1000

(It goes through all the SNPs for one region, then all the SNPs for the next region, and so forth.)

Any suggestions on how to do this in R? Thanks!


Solution

  • For a simple plot of coordinate versus frequency, here's an example:

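    # Example data: 1000 random frequencies and coordinates
    MAF <- runif(1000, min = 0, max = 1)
    COORD <- runif(1000, min = 0, max = 100000)
    test.df <- data.frame(COORD, MAF)

    # Scatter plot of frequency (y) against coordinate (x)
    plot(test.df$COORD, test.df$MAF)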
    

    With your own data you won't need the example data; just substitute the name of your table for test.df.

    If you want to dress it up with colors, point shapes, labels, etc., that can be done too:

    plot(test.df$COORD,test.df$MAF, col="red", pch=18)
    

    OR

    library(ggplot2)
    p <- ggplot(test.df, aes(x = COORD, y = MAF))
    p + geom_point()
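
To get closer to what the question asks for (all 22 autosomes, with populations distinguished by color), one possible extension of the ggplot2 approach is to map CLST to point color and give each chromosome its own panel. This is only a sketch under some assumptions: the data below are simulated, the population labels PopA/PopB/PopC and the name maf.df are placeholders, and the table is assumed to have one row per SNP per population with the CHR, COORD, CLST and MAF columns shown in the question.

```r
library(ggplot2)

# Simulated stand-in for the real table (all values are made up).
set.seed(1)
n_snps <- 20                           # SNPs per chromosome (assumption)
pops   <- c("PopA", "PopB", "PopC")    # placeholder population labels

snps <- data.frame(
  CHR   = rep(1:22, each = n_snps),
  COORD = round(runif(22 * n_snps, min = 1, max = 1e8))
)

# merge() with no shared columns returns every SNP x population pair,
# giving one row per SNP per population (long format).
maf.df <- merge(snps, data.frame(CLST = pops))
maf.df$MAF <- runif(nrow(maf.df), min = 0, max = 1)

# Make CHR a factor so the panels are ordered 1..22, not alphabetically.
maf.df$CHR <- factor(maf.df$CHR, levels = 1:22)

# One panel per chromosome, colored points per population.
p <- ggplot(maf.df, aes(x = COORD / 1e6, y = MAF, colour = CLST)) +
  geom_point(size = 0.8, alpha = 0.7) +
  facet_wrap(~ CHR, ncol = 4, scales = "free_x") +
  labs(x = "Position (Mb)", y = "Allele frequency", colour = "Population")

print(p)
```

With real data, the simulation part can be dropped and maf.df replaced by your own table. scales = "free_x" lets each panel span its own chromosome's coordinate range; remove it if you would rather compare positions on a common axis.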