
How to retrieve data using the rentrez package by giving a list of query names instead of a single one?


I'm trying to use the rentrez package to retrieve DNA sequence data from GenBank for a list of species. I create a vector, species, with the species I want to query. For each one I build term, which specifies the types of sequence data I want, then search, which retrieves all the records that match the query, and finally data, which holds the actual sequence data in FASTA format.

library(rentrez)

species<-c("Ablennes hians","Centrophryne spinulosa","Doratonotus megalepis","Entomacrodus cadenati","Katsuwonus pelamis","Lutjanus fulgens","Pagellus erythrinus")

for (x in species){
  term<-paste(x,"[Organism] AND (((COI[Gene] OR CO1[Gene] OR COXI[Gene] OR COX1[Gene]) AND (500[SLEN]:3000[SLEN])) OR complete genome[All Fields] OR mitochondrial genome[All Fields])",sep='',collapse = NULL)
  search<-entrez_search(db="nuccore",term=term,retmax=99999)
  data<-entrez_fetch(db="nuccore",id=search$ids,rettype="fasta")
}

Basically, what I'm trying to do is concatenate the results of the queries for each species into a single variable. I started with a for loop, but it makes no sense in this form, because the result for each newly queried species just replaces the previous one in data.

For some elements of species, there will be no data to retrieve and R shows this error:

Error: Vector of IDs to send to NCBI is empty, perhaps entrez_search or entrez_link found no hits?

When this error occurs because there is no data for a particular species, I want the code to ignore it and keep going.

The output should be a single variable, data, containing the sequence data retrieved for all the names in species.


Solution

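Create data as an empty list before the loop, store each species' result under its own name, and wrap entrez_fetch in tryCatch so that a species with no hits yields NA instead of stopping the loop:

    library(rentrez)

    species <- c("Ablennes hians","Centrophryne spinulosa","Doratonotus megalepis","Entomacrodus cadenati","Katsuwonus pelamis","Lutjanus fulgens","Pagellus erythrinus")

    # One list element per species, filled inside the loop
    data <- list()

    for (x in species) {
      # Build the search term for this species
      term <- paste0(x, "[Organism] AND (((COI[Gene] OR CO1[Gene] OR COXI[Gene] OR COX1[Gene]) AND (500[SLEN]:3000[SLEN])) OR complete genome[All Fields] OR mitochondrial genome[All Fields])")
      search <- entrez_search(db = "nuccore", term = term, retmax = 99999)
      # entrez_fetch() stops with an error when search$ids is empty; store NA instead
      data[[x]] <- tryCatch(
        entrez_fetch(db = "nuccore", id = search$ids, rettype = "fasta"),
        error = function(e) NA
      )
    }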
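
The loop leaves one element per species in data, with NA where the fetch failed, whereas the question asks for everything in a single variable. One way to get there is sketched below. The helper name combine_fasta and the placeholder sequences in example are made up for illustration and are not part of rentrez; the sketch also assumes each successful element is a character string of FASTA text, which is what entrez_fetch is expected to return with rettype="fasta".

```r
# Hypothetical helper (not part of rentrez): merge per-species FASTA strings
# into one string, skipping species whose fetch returned NA.
combine_fasta <- function(results) {
  # TRUE only for elements that are character and contain no NA
  ok <- vapply(results, function(r) is.character(r) && !anyNA(r), logical(1))
  # Drop the names and glue the remaining FASTA blocks together
  paste(unlist(results[ok]), collapse = "")
}

# Stand-in for the `data` list built by the loop; the sequences are
# placeholders, not real GenBank records.
example <- list(
  "Ablennes hians"         = ">seq1 placeholder\nACGT\n\n",
  "Centrophryne spinulosa" = NA,
  "Katsuwonus pelamis"     = ">seq2 placeholder\nTTGA\n\n"
)

all_fasta <- combine_fasta(example)
cat(all_fasta)
```

With the real results, combine_fasta(data) would give one string that can be written to disk with writeLines(). Note that tryCatch as used in the loop turns every error into NA, including network failures, so it may be worth checking length(search$ids) before fetching if the two cases need to be told apart.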