rpubmed

Problems with RISmed and large(ish) data sets


RISmed (well, technically EUtilsGet) has decided to be problematic. Trying to download the titles from ~3000 PMIDs results in an error:

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'object' in selecting a method for function 'YearPubmed': cannot open the connection to 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?

I'm not including example code because the problem is sporadic (and if the issue is due to too many queries, I don't want to trigger an onslaught of folks trying to replicate it and saturating the NCBI servers!)

I suspect it -might- be due to trying to push through so many PMIDs in a single request, but like I said, it's sporadic. It does work sometimes, though less frequently now than in the past.

An alternative might be to download in chunks, or to use Entrez and its web history function, but without knowing exactly why the RISmed EUtilsGet is choking I have no confidence that those approaches wouldn't fail too.
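For reference, the chunking idea could look roughly like the sketch below. It is untested against the real servers: `chunk_ids` and `fetch_in_chunks` are hypothetical helper names (not part of RISmed), the chunk size of 200 and the half-second pause are assumptions, and `fetch_fun` stands in for whatever call actually does the download.

```r
# Hypothetical helpers, not part of RISmed.
# Split a vector of IDs into consecutive chunks of at most `chunk_size`.
chunk_ids <- function(ids, chunk_size = 200) {
  unname(split(ids, ceiling(seq_along(ids) / chunk_size)))
}

# Apply `fetch_fun` to each chunk, pausing before every request so the
# calls are spread out. `pause` is in seconds (an assumed value).
fetch_in_chunks <- function(ids, fetch_fun, chunk_size = 200, pause = 0.5) {
  lapply(chunk_ids(ids, chunk_size), function(chunk) {
    Sys.sleep(pause)
    fetch_fun(chunk)
  })
}

# Placeholder IDs only; these are not real PMIDs.
pmids <- as.character(seq_len(3000))
chunks <- chunk_ids(pmids)
length(chunks)  # 15 chunks of 200 IDs each
```

With real PMIDs, `fetch_fun` could be something like `function(x) RISmed::EUtilsGet(x)`, with the per-chunk results combined afterwards.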


Solution

  • The API doc says:

    There is no set maximum for the number of UIDs that can be passed to ESummary, but if more than about 200 UIDs are to be provided, the request should be made using the HTTP POST method.

    But the RISmed package does not honor that. The EUtilsGet function calls EUtilsSubGet, which simply collapses all the IDs into the URL of a single request. Here's the beginning of the code:

    RISmed:::EUtilsSubGet
    # the ::: operator will sometimes retrieve non-exported code.
    function (ids, type = "efetch", db = "pubmed") 
    {
        FetchURL <- EUtilsURL(type, db = db)
        IDStr <- collapse("id=", paste(ids, collapse = ","))
        EUtilsFetch <- collapse(FetchURL, IDStr)
        # Remainder snipped
    

    So you have, perhaps unwittingly, been sending requests carrying more than 10 times as many UIDs (~3000 versus about 200) as the API documentation recommends for a request that is not sent via POST. You can either ask the maintainer for an enhancement that follows the API or search for another package that adheres to the published parameters.
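If you would rather bypass RISmed for the download step, a minimal sketch of a POST request follows. It assumes the httr package is installed; `build_efetch_body` and `post_efetch` are hypothetical names introduced here, and the `retmode = "xml"` setting is an assumption about the output you want. The result is raw XML text, so you would still need to parse it yourself.

```r
# Hypothetical helpers, not part of RISmed. Requires the httr package.
efetch_url <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Build the form fields for the request; the IDs go into the POST body
# instead of being appended to the URL.
build_efetch_body <- function(ids, db = "pubmed") {
  list(db = db, id = paste(ids, collapse = ","), retmode = "xml")
}

# Send the IDs with HTTP POST and return the response as a character string.
post_efetch <- function(ids, db = "pubmed") {
  if (!requireNamespace("httr", quietly = TRUE)) {
    stop("This sketch needs the 'httr' package.")
  }
  resp <- httr::POST(efetch_url,
                     body = build_efetch_body(ids, db),
                     encode = "form")
  httr::stop_for_status(resp)  # raise an R error on HTTP failure
  httr::content(resp, as = "text", encoding = "UTF-8")
}

# Usage (not run here, to avoid hitting the NCBI servers):
# xml_text <- post_efetch(my_pmids)
```

Even with POST, it is probably wise to keep the request rate low, since NCBI limits clients without an API key to three requests per second.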