I have a set of SNPs from different parts of the genome and their allele frequencies in various populations and metapopulations of interest. I want to plot the allele frequencies along the SNPs' genomic coordinates for all 22 autosomes.
Basically, I want to generate something like Figure 1A from Sankararaman et al. (2014) (http://www.nature.com/nature/journal/v507/n7492/fig_tab/nature12961_F1.html), except that the Y-axis would be frequency, all populations would be on the same graph (not separated), and I would have colored points instead of spikes.
My data is formatted as follows (MAF = minor allele frequency, which is what I want to graph):
CHR SNP COORD CLST A1 A2 MAF MAC NCHROBS
1 rs### 2#### Region G A 0.4 400 1000
(It goes through all the SNPs for one region, then all the SNPs for the next region, and so forth.)
Any suggestions on how to do this in R? Thanks!
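For a simple plot of the coordinates versus frequency, here's an example:
# Example data: 1000 random frequencies and positions
MAF <- runif(1000, min = 0, max = 1)
COORD <- runif(1000, min = 0, max = 100000)
test.df <- data.frame(COORD, MAF)
# Scatter plot of frequency against genomic coordinate
plot(test.df$COORD, test.df$MAF)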
For your own plot you won't need the example data; just substitute the name of your own data frame for test.df.
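To go from the toy data to a table in the layout shown in the question, here is a minimal sketch. It assumes the file is whitespace-delimited with the header line from the question. The inline text stands in for the real file, and the rows and region names in it are made up for illustration.

```r
# Made-up rows in the question's column layout. With a real file you would
# call read.table("your_file.txt", header = TRUE) instead (file name is hypothetical).
txt <- "CHR SNP COORD CLST A1 A2 MAF MAC NCHROBS
1 rs1 20000 RegionA G A 0.40 400 1000
1 rs2 35000 RegionA T C 0.10 100 1000
1 rs1 20000 RegionB G A 0.25 250 1000
2 rs3 15000 RegionB C T 0.30 300 1000"
freq.df <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Keep one chromosome and plot position against allele frequency
chr1.df <- freq.df[freq.df$CHR == 1, ]
plot(chr1.df$COORD, chr1.df$MAF,
     xlab = "Position on chr1 (bp)", ylab = "MAF", ylim = c(0, 1))
```

Subsetting by CHR first keeps positions from different chromosomes from overlapping on the same x-axis.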
If you need to beautify it with colors/labels etc. that can be done too:
plot(test.df$COORD,test.df$MAF, col="red", pch=18)
Or, using ggplot2:
library(ggplot2)
p=ggplot(test.df,aes(COORD,MAF))
p + geom_point()
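To get closer to what the question asks for (all 22 autosomes, with points colored by population), one possible approach is to map CLST to colour and give each chromosome its own panel with facet_wrap. This is only a sketch: sim.df is simulated stand-in data using the question's column names, and the population labels PopA and PopB are made up.

```r
library(ggplot2)

# Simulated stand-in for the real table: 22 autosomes, 2 made-up populations
set.seed(1)
n <- 2200
sim.df <- data.frame(
  CHR   = rep(1:22, each = 100),
  COORD = runif(n, min = 0, max = 1e8),
  CLST  = sample(c("PopA", "PopB"), n, replace = TRUE),
  MAF   = runif(n, min = 0, max = 1)
)

# One panel per chromosome, points colored by population
p <- ggplot(sim.df, aes(x = COORD, y = MAF, colour = CLST)) +
  geom_point(size = 0.6, alpha = 0.6) +
  facet_wrap(~ CHR, ncol = 4, scales = "free_x") +
  labs(x = "Position (bp)", y = "Allele frequency", colour = "Population")
print(p)
```

With scales = "free_x" each panel gets its own coordinate range, which helps because chromosomes differ in length. With real data, replace sim.df with your own data frame.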