rggplot2giskml

Mapping data from Open Tracks KML file using sf


I'm using OpenTracks on my phone to record business driving for tax reporting. I'm exporting the KML files and then trying to create a map using ggplot2, and the sf package which I think is the right tool (although I can be convinced to change).

The data looks something like this:

<?xml version="1.0" encoding="UTF-8"?>

<kml xmlns="http://www.opengis.net/kml/2.3"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:opentracks="http://opentracksapp.com/xmlschemas/v1"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.opengis.net/kml/2.3 http://schemas.opengis.net/kml/2.3/ogckml23.xsd
                         http://opentracksapp.com/xmlschemas/v1 http://opentracksapp.com/xmlschemas/OpenTracks_v1.xsd">

  <Document>
    <open>1</open>
    <visibility>1</visibility>
    <name><![CDATA[2025-04-10T15:46-04]]></name>
    <atom:generator><![CDATA[OpenTracks]]></atom:generator>
    <Style id="track">
      <LineStyle><color>7f0000ff</color><width>4</width></LineStyle>
      <IconStyle>
        <scale>1.3</scale>
        <Icon />
      </IconStyle>
    </Style>
    <Style id="waypoint"><IconStyle>
      <Icon />
    </IconStyle></Style>
    <Schema id="schema">
      <SimpleArrayField name="speed" type="float">
        <displayName><![CDATA[Speed (m/s)]]></displayName>
      </SimpleArrayField>
      <SimpleArrayField name="power" type="float">
        <displayName><![CDATA[Power (W)]]></displayName>
      </SimpleArrayField>
      <SimpleArrayField name="cadence" type="float">
        <displayName><![CDATA[Cadence (rpm)]]></displayName>
      </SimpleArrayField>
      <SimpleArrayField name="heartrate" type="float">
        <displayName><![CDATA[Heart rate (bpm)]]></displayName>
      </SimpleArrayField>
    </Schema>
    <Placemark>
      <name><![CDATA[2025-04-10T15:46-04]]></name>
      <description><![CDATA[]]></description>
      <opentracks:trackid>30846d55-38a5-44b0-82ab-604bdb15df1f</opentracks:trackid>
      <styleUrl>#track</styleUrl>
      <ExtendedData>
        <Data name="activityType"><value><![CDATA[driving]]></value></Data>
      </ExtendedData>
      <ExtendedData>
        <Data name="type"><value><![CDATA[driving]]></value></Data>
      </ExtendedData>
      <MultiTrack>
        <altitudeMode>absolute</altitudeMode>
        <interpolate>1</interpolate>
        <Track>
          <when>2025-04-10T15:46:50.282-04:00</when>
          <coord/>
          <when>2025-04-10T17:34:56.425-04:00</when>
          <coord>-85.996391 39.897292 192.9928436279297</coord>
          <when>2025-04-10T17:34:57.42-04:00</when>
          <coord>-85.996294 39.897434 193.75526428222656</coord>
          <when>2025-04-10T17:34:58.427-04:00</when>
          <coord>-85.996197 39.897569 193.9382781982422</coord>
          <when>2025-04-10T17:34:59.419-04:00</when>
          <coord>-85.996101 39.897698 194.16091918945312</coord>
          <ExtendedData>
            <SchemaData schemaUrl="#schema">
              <SimpleArrayData name="trackpoint_type">
                <value>SEGMENT_START_MANUAL</value>
                <value>TRACKPOINT</value>
                <value>TRACKPOINT</value>
                <value>IDLE</value>
                <value>TRACKPOINT</value>
              </SimpleArrayData>
              <SimpleArrayData name="speed">
                <value />
                <value>0</value>
                <value>0</value>
                <value>0</value>
                <value>0.6</value>
               </SimpleArrayData>
              <SimpleArrayData name="accuracy_horizontal">
                <value />
                <value>1.8</value>
                <value>3.4</value>
                <value>3.4</value>
                <value>3</value>
               </SimpleArrayData>
            </SchemaData>
          </ExtendedData>
        </Track>
      </MultiTrack>
    </Placemark>
  </Document>
</kml>

I've truncated overwhelmingly most data points, and chose just a random segment. I am most interested in the Track data. I need to strip the xml namespace just to be able to read this.

library(xml2)
library(clock)
library(ggplot2)
library(sf) # has a lot of system dependencies, see docs

xmldoc <- xml_ns_strip(read_xml("~/Drives/2025-04-10.kml"))

df <- data.frame(when = xml_text(xml_find_all(xmldoc, "//when")),
                 coord = xml_text(xml_find_all(xmldoc, "//coord")))

df <- subset(df, df$when != "" & df$coord != "")

# Convert char datetime to precise datetime
df$when <- sys_time_parse(df$when, format = "%Y-%m-%dT%H:%M:%S%Ez", precision = "millisecond")

#df$when <- as_date_time(df$when, zone = "America/Indianapolis") # will this lose precision? Is this necessary for sf processing?

# Split coord data into numeric columns
df[c('lon', 'lat', 'alt')] <- t(do.call("cbind", strsplit(df$coord, split = " ")))
df$lon <- as.numeric(df$lon)
df$lat <- as.numeric(df$lat)
df$alt <- as.numeric(df$alt)

# Testing creating points
ggplot(data = df) + geom_sf(aes(geometry = coord))

This does not output anything. It provides the error:

Warning message: Computation failed in stat_sf(). Caused by error in if (is.na(x)) ...: ! the condition has length > 1

I've tried various different sf functions (including using the lon, lat, alt columns I extracted), with different sorts of errors.

I think sf wants some sort of tuple for the geometry, but I cannot find explicit documentation on how to construct the data. I believe the format from the original kml file is close, but I haven't been able to use it successfully.

How can I plot this data (longitude, latitude, altitude) onto a 2D map using ggplot2?

BONUS: how can I calculate the total distance from the interpolated line? Ideally expressed in miles - I assume converting units would be trivial though.


Solution

  • Regarding st_read(), sf file I/O goes though GDAL and I don't believe GDAL KML drivers support Tracks and Multitracks, this goes for both LIBKML & KML (examples proving me wrong here are more than welcome). This being KML 2.3 instead of more common 2.2 might also play some role.

    Between parsing XML and plotting you seem to skip a step where you'd convert a frame with coordinates to an actual sf object of points ( st_as_sf() ), which you could then turn into a linestring to get its length.

    Following assumes there's a single track per file, as in included example. If there could be more, you'd need to adjust XML parsing a bit to include a grouping variable for each individual track or segment and build a linestring for each group.

    library(xml2)
    library(dplyr)
    library(tidyr)
    library(sf)
    #> Linking to GEOS 3.13.1, GDAL 3.10.2, PROJ 9.5.1; sf_use_s2() is TRUE
    library(ggplot2)
    
    # kml <- "some.kml"
    # or 
    # kml <- '<?xml version="1.0" encoding="UTF-8"?> <kml ...> ...  </kml>`
    
    # assume a single track, get all non-empty <coord>s
    coord_nodes <- 
      read_xml(kml) |> 
      xml_ns_strip() |>
      xml_find_all("//coord[string-length() > 0]")
    
    coords_df <-
      tibble(
        coord = xml_text(coord_nodes),
        # extract timestamps from closest <when> nodes
        ts = xml_find_first(coord_nodes, "./preceding-sibling::when[1]") |> xml_text() |> lubridate::as_datetime() 
      ) |> 
      separate_wider_delim(coord, names = c("lon", "lat", "alt"), delim = " ")
    coords_df
    #> # A tibble: 4 × 4
    #>   lon        lat       alt                ts                 
    #>   <chr>      <chr>     <chr>              <dttm>             
    #> 1 -85.996391 39.897292 192.9928436279297  2025-04-10 21:34:56
    #> 2 -85.996294 39.897434 193.75526428222656 2025-04-10 21:34:57
    #> 3 -85.996197 39.897569 193.9382781982422  2025-04-10 21:34:58
    #> 4 -85.996101 39.897698 194.16091918945312 2025-04-10 21:34:59
    
    # frame to sf object (of points), use WGS84 for coordinate reference system (safe guess for KML);
    # summarise points to a multipoint, keep first and last timestamp; 
    # convert to linestring, get duration and linestring lenght 
    routes_sf <- 
      st_as_sf(coords_df, coords = c("lon", "lat"), crs = "WGS84") |> 
      summarise(
        start = first(ts), 
        stop = last(ts),
        do_union = FALSE
      ) |> 
      st_cast("LINESTRING") |>
      mutate(
        len = st_length(geometry) |> units::set_units(mile),
        duration = stop - start
      )
    routes_sf
    #> Simple feature collection with 1 feature and 4 fields
    #> Geometry type: LINESTRING
    #> Dimension:     XY
    #> Bounding box:  xmin: -85.99639 ymin: 39.89729 xmax: -85.9961 ymax: 39.8977
    #> Geodetic CRS:  WGS 84
    #> # A tibble: 1 × 5
    #>   start               stop                                       geometry    len
    #> * <dttm>              <dttm>                             <LINESTRING [°]> [mile]
    #> 1 2025-04-10 21:34:56 2025-04-10 21:34:59 (-85.99639 39.89729, -85.99629… 0.0320
    #> # ℹ 1 more variable: duration <drtn>
    
    # check with mapview
    mapview::mapview(routes_sf)
    

    
    # plot line + points with ggplot
    routes_sf |> 
      ggplot() +
      geom_sf() +
      geom_sf(data = ~ st_cast(., "POINT")) +
      theme_minimal() +
      theme(axis.text.x = element_text(angle = 45))
    #> Warning in st_cast.sf(., "POINT"): repeating attributes for all sub-geometries
    #> for which they may not be constant
    

    Created on 2025-05-20 with reprex v2.1.1