Obtain all attributes from an rgbif occurrence download


I have downloaded occurrence data using the rgbif package successfully. However, the attributes that are available seem restricted. Specifically, I would like: Habitat, Identification remarks and Occurrence remarks. Is there a way to obtain that information via R?

library(rgbif)
# Request data 
occ_download(pred_and(pred("phylumKey", 35), pred("gadm","ETH")),
             format = "SIMPLE_CSV")

# Wait for the download to finish, using the download key
# printed to the console by the previous step
occ_download_wait("download occurrence number")

# Fetch the finished download and load it into a data frame
# (nested call, so no pipe package is needed)
df <- occ_download_import(occ_download_get("download occurrence number"))

Explore the data: there are no attributes for habitat, etc.

names(df)
 [1] "gbifID"                           "datasetKey"                       "occurrenceID"                     "kingdom"                         
 [5] "phylum"                           "class"                            "order"                            "family"                          
 [9] "genus"                            "species"                          "infraspecificEpithet"             "taxonRank"                       
[13] "scientificName"                   "verbatimScientificName"           "verbatimScientificNameAuthorship" "countryCode"                     
[17] "locality"                         "stateProvince"                    "occurrenceStatus"                 "individualCount"                 
[21] "publishingOrgKey"                 "decimalLatitude"                  "decimalLongitude"                 "coordinateUncertaintyInMeters"   
[25] "coordinatePrecision"              "elevation"                        "elevationAccuracy"                "depth"                           
[29] "depthAccuracy"                    "eventDate"                        "day"                              "month"                           
[33] "year"                             "taxonKey"                         "speciesKey"                       "basisOfRecord"                   
[37] "institutionCode"                  "collectionCode"                   "catalogNumber"                    "recordNumber"                    
[41] "identifiedBy"                     "dateIdentified"                   "license"                          "rightsHolder"                    
[45] "recordedBy"                       "typeStatus"                       "establishmentMeans"               "lastInterpreted"                 
[49] "mediaType"                        "issue"     

Solution

  • Thanks to the comment by @Grzegorz Sapijaszko, I was able to resolve the issue. I had requested SIMPLE_CSV as the download format (as in many of the tutorials I had followed), but a SIMPLE_CSV download contains only a limited set of attributes. To get all available data, request the DWCA (Darwin Core Archive) format instead. This downloads the data as multiple separate text files in a folder. Here is the updated code. I also made two small tweaks: the download is saved into a res object so that the example is easier to reproduce, and the phylum is changed to one with fewer species to speed up the example.

    library(rgbif)

    # Request data and keep the returned download object in `res`
    res <- occ_download(pred_and(pred("phylumKey", 13), pred("gadm", "ETH")),
                        format = "DWCA")  # changed from SIMPLE_CSV to DWCA

    # Wait until the download is ready; `res` carries the download key,
    # so nothing needs to be copied from the console
    occ_download_wait(res)

    # Fetch the archive and load the data
    # (nested call, so no pipe package is needed)
    df <- occ_download_import(occ_download_get(res))
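
Once the DWCA download has been imported, a quick check can confirm whether the three fields from the question are actually there. The following is only a minimal sketch. It assumes the imported columns use the Darwin Core term names habitat, identificationRemarks and occurrenceRemarks, and it uses a tiny made-up data frame (df_example, with hypothetical values) in place of the real df so that it runs without a GBIF account.

```r
# Fields requested in the question, written as Darwin Core term names
# (assumption: the imported DWCA table uses these exact column names).
wanted <- c("habitat", "identificationRemarks", "occurrenceRemarks")

# Made-up stand-in for the imported data frame; replace with the real `df`.
df_example <- data.frame(
  gbifID = c("1", "2"),
  habitat = c("montane forest", NA),
  occurrenceRemarks = c(NA, "collected at light trap"),
  stringsAsFactors = FALSE
)

# Which of the wanted fields are present, and which are missing?
present <- intersect(wanted, names(df_example))
missing <- setdiff(wanted, names(df_example))

# Keep the record identifier plus whichever wanted fields exist.
remarks <- df_example[, c("gbifID", present), drop = FALSE]

print(present)
print(missing)
```

With a real download, replace df_example with df. Any name reported in missing is not a column of the imported table.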