Does anyone know how to extract minor allele data from grch38 ? the following gives NAs: library("biomaRt"); snp.db3 <- useMart(host = "https://www.ensembl.org/", biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp"); nt.biomart3 <- getBM(attributes = c("refsnp_id", "minor_allele", "minor_allele_freq", "chrom_start", "chrom_strand", "associated_gene"), filters = c("snp_filter"), values = c("rs3762444", "rs284262", "rs655598", "rs12089815", "rs12140153", "rs788163", "rs1064213", "rs1106090", "rs7557796", "rs16825008"), mart = snp.db3, uniqueRows = TRUE); nt.biomart3
The data is apparently available on Ensembl (e.g. https://www.ensembl.org/Homo_sapiens/Variation/Explore?db=core;r=10:120308754-120309754;v=rs10788066;vdb=variation;vf=654619804) but the data extraction method only works if I use host = "https://grch37.ensembl.org" - otherwise it gives NAs for minor allele and minor allele frequency - is there something else I should do ? Or is there a better method ? Or is the minor allele data from grch37 actually up-to-date ?
Online Biomart (Martview) from Ensembl (https://www.ensembl.org/) doesn't show either this information :
That explains why you're getting "NA" with "biomaRt".
Online Biomart (Martview) from Ensembl GRCh37 (https://grch37.ensembl.org/) displays it :
Grch37 seems up to date (Release 112 (May 2024)).
To produce the preceding screenshots :
If you want data from Grch38
, you can try to scrape the data directly from "Ensembl.org".
### Packages
library(rvest)
library(purrr)
library(dplyr)
### SNPs to look for
look=c("rs3762444", "rs284262", "rs655598", "rs12089815", "rs12140153", "rs788163", "rs1064213", "rs1106090", "rs7557796", "rs16825008")
### Function to get the data
biomart=function(x){
z=read_html(paste0("http://www.ensembl.org/homo_sapiens/Variation/Explore?v=",x))
a= z %>%
html_element(xpath = '//span[.="Highest population MAF"]/following-sibling::span/b') %>%
html_text2()
if (is.na(a)) {b<-NA_character_}else{
b = z %>%
html_element(xpath = '//span[.="Highest population MAF"]/following-sibling::span') %>%
html_attr("title") %>%
read_html() %>% html_text2()}
data.frame(SNP=x,MAF=a,source=b)
}
### Map operation
out=map(look,biomart,.progress = TRUE)
### Build the dataframe
output=bind_rows(out)
Output :
SNP MAF source
1 rs3762444 <NA> <NA>
2 rs284262 0.49 T in 1000GENOMES:phase_3:SAS
3 rs655598 0.49 A in 1000GENOMES:phase_3:FIN
4 rs12089815 0.50 G in 1000GENOMES:phase_3:EUR
5 rs12140153 0.18 T in gnomADg:ami
6 rs788163 0.42 C in 1000GENOMES:phase_3:LWK
7 rs1064213 0.50 A in gnomADe:nfe
8 rs1106090 0.50 A in gnomADg:amr
9 rs7557796 0.48 T in 1000GENOMES:phase_3:SAS
10 rs16825008 0.49 A in 1000GENOMES:phase_3:ITU
Notes :