rkmlr-sf

How to read in KML file properly in R, or separate out lumped variables into columns


I read in a KML file, using the following:

clinics = st_read(dsn = "Data/clinics-kml.kml","CLINICS")

However, all my variables (except for the coordinates) got lumped into 1 column under Description (see below link).

What's the best way to separate the variables out? Alternatively, is there a way to import KML files properly to avoid this issue? You may view the screenshot of the problem here.


Solution

  • The problem (or maybe not) is that the Description column has an html table as a string for each observation. That is fine if you want to parse that html string and get a pretty table for example when creating an interactive web map. But it can be a headache if you just want the data inside.

    So, it's possible to do all the process within R just following these steps:

    1. Download the KML file from internet
    2. Unzip the downloaded file
    3. Read the KML file as a Spatial object
    4. Get the attributes for each observation
    5. Bind the attributes to each observation as new columns

    All the code is commented, see below:

    library(tidyverse)
    library(sf)
    library(mapview)
    library(rvest)
    library(httr)
    
    # 1) Download the kml file
    moh_chas_clinics <- GET("https://data.gov.sg/dataset/31e92629-980d-4672-af33-cec147c18102/download",
                            write_disk(here::here("moh_chas_clinics.zip"), overwrite = TRUE))
    
    # 2) Unzip the downloaded zip file 
    unzip(here::here("moh_chas_clinics.zip"))
    
    # 3) Read the KML file as a Spatial object
    moh_chas_clinics <- read_sf(here::here("chas-clinics-kml.kml"))
    
    # Watch data
    moh_chas_clinics %>%
      glimpse()
    
    # See map
    mapview(moh_chas_clinics)
    
    # 4) Get the attributes for each observation
    
    # Option a) Using a simple lapply
    attributes <- lapply(X = 1:nrow(moh_chas_clinics), 
                         FUN = function(x) {
    
                           moh_chas_clinics %>% 
                             slice(x) %>%
                             pull(Description) %>%
                             read_html() %>%
                             html_node("table") %>%
                             html_table(header = TRUE, trim = TRUE, dec = ".", fill = TRUE) %>%
                             as_tibble(.name_repair = ~ make.names(c("Attribute", "Value"))) %>% 
                             pivot_wider(names_from = Attribute, values_from = Value)
    
                         })
    
    # Option b) Using a Parallel lapply (faster)
    future::plan("multisession")
    
    attributes <- future.apply::future_lapply(X = 1:nrow(moh_chas_clinics), 
                                              FUN = function(x) {
    
                                                moh_chas_clinics %>% 
                                                  slice(x) %>%
                                                  pull(Description) %>%
                                                  read_html() %>%
                                                  html_node("table") %>%
                                                  html_table(header = TRUE, trim = TRUE, dec = ".", fill = TRUE) %>%
                                                  as_tibble(.name_repair = ~ make.names(c("Attribute", "Value"))) %>% 
                                                  pivot_wider(names_from = Attribute, values_from = Value)
    
                                              })
    
    # 5) Bind the attributes to each observation as new columns
    moh_chas_clinics_attr <- 
      moh_chas_clinics %>%
      bind_cols(bind_rows(attributes)) %>%
      select(-Description)
    
    # Watch new data
    moh_chas_clinics_attr %>%
      glimpse()
    
    # New map
    mapview(moh_chas_clinics_attr, 
            zcol = "CLINIC_PROGRAMME_CODE", 
            layer.name = "Clinic Programme Code")
    

    A final map as an example showing all the attributes for a point and coloured by "Clinic Programme Code":

    enter image description here