I'm using OpenTracks on my phone to record business driving for tax reporting. I'm exporting the KML files and then trying to create a map using ggplot2
, and the sf
package which I think is the right tool (although I can be convinced to change).
The data looks something like this:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.3"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:opentracks="http://opentracksapp.com/xmlschemas/v1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.opengis.net/kml/2.3 http://schemas.opengis.net/kml/2.3/ogckml23.xsd
http://opentracksapp.com/xmlschemas/v1 http://opentracksapp.com/xmlschemas/OpenTracks_v1.xsd">
<Document>
<open>1</open>
<visibility>1</visibility>
<name><![CDATA[2025-04-10T15:46-04]]></name>
<atom:generator><![CDATA[OpenTracks]]></atom:generator>
<Style id="track">
<LineStyle><color>7f0000ff</color><width>4</width></LineStyle>
<IconStyle>
<scale>1.3</scale>
<Icon />
</IconStyle>
</Style>
<Style id="waypoint"><IconStyle>
<Icon />
</IconStyle></Style>
<Schema id="schema">
<SimpleArrayField name="speed" type="float">
<displayName><![CDATA[Speed (m/s)]]></displayName>
</SimpleArrayField>
<SimpleArrayField name="power" type="float">
<displayName><![CDATA[Power (W)]]></displayName>
</SimpleArrayField>
<SimpleArrayField name="cadence" type="float">
<displayName><![CDATA[Cadence (rpm)]]></displayName>
</SimpleArrayField>
<SimpleArrayField name="heartrate" type="float">
<displayName><![CDATA[Heart rate (bpm)]]></displayName>
</SimpleArrayField>
</Schema>
<Placemark>
<name><![CDATA[2025-04-10T15:46-04]]></name>
<description><![CDATA[]]></description>
<opentracks:trackid>30846d55-38a5-44b0-82ab-604bdb15df1f</opentracks:trackid>
<styleUrl>#track</styleUrl>
<ExtendedData>
<Data name="activityType"><value><![CDATA[driving]]></value></Data>
</ExtendedData>
<ExtendedData>
<Data name="type"><value><![CDATA[driving]]></value></Data>
</ExtendedData>
<MultiTrack>
<altitudeMode>absolute</altitudeMode>
<interpolate>1</interpolate>
<Track>
<when>2025-04-10T15:46:50.282-04:00</when>
<coord/>
<when>2025-04-10T17:34:56.425-04:00</when>
<coord>-85.996391 39.897292 192.9928436279297</coord>
<when>2025-04-10T17:34:57.42-04:00</when>
<coord>-85.996294 39.897434 193.75526428222656</coord>
<when>2025-04-10T17:34:58.427-04:00</when>
<coord>-85.996197 39.897569 193.9382781982422</coord>
<when>2025-04-10T17:34:59.419-04:00</when>
<coord>-85.996101 39.897698 194.16091918945312</coord>
<ExtendedData>
<SchemaData schemaUrl="#schema">
<SimpleArrayData name="trackpoint_type">
<value>SEGMENT_START_MANUAL</value>
<value>TRACKPOINT</value>
<value>TRACKPOINT</value>
<value>IDLE</value>
<value>TRACKPOINT</value>
</SimpleArrayData>
<SimpleArrayData name="speed">
<value />
<value>0</value>
<value>0</value>
<value>0</value>
<value>0.6</value>
</SimpleArrayData>
<SimpleArrayData name="accuracy_horizontal">
<value />
<value>1.8</value>
<value>3.4</value>
<value>3.4</value>
<value>3</value>
</SimpleArrayData>
</SchemaData>
</ExtendedData>
</Track>
</MultiTrack>
</Placemark>
</Document>
</kml>
I've truncated overwhelmingly most data points, and chose just a random segment. I am most interested in the Track
data. I need to strip the xml
namespace just to be able to read this.
library(xml2)
library(clock)
library(ggplot2)
library(sf) # has a lot of system dependencies, see docs
xmldoc <- xml_ns_strip(read_xml("~/Drives/2025-04-10.kml"))
df <- data.frame(when = xml_text(xml_find_all(xmldoc, "//when")),
coord = xml_text(xml_find_all(xmldoc, "//coord")))
df <- subset(df, df$when != "" & df$coord != "")
# Convert char datetime to precise datetime
df$when <- sys_time_parse(df$when, format = "%Y-%m-%dT%H:%M:%S%Ez", precision = "millisecond")
#df$when <- as_date_time(df$when, zone = "America/Indianapolis") # will this lose precision? Is this necessary for sf processing?
# Split coord data into numeric columns
df[c('lon', 'lat', 'alt')] <- t(do.call("cbind", strsplit(df$coord, split = " ")))
df$lon <- as.numeric(df$lon)
df$lat <- as.numeric(df$lat)
df$alt <- as.numeric(df$alt)
# Testing creating points
ggplot(data = df) + geom_sf(aes(geometry = coord))
This does not output anything. It provides the error:
Warning message: Computation failed in
stat_sf()
. Caused by error inif (is.na(x)) ...
: ! the condition has length > 1
I've tried various different sf
functions (including using the lon
, lat
, alt
columns I extracted), with different sorts of errors.
I think sf
wants some sort of tuple for the geometry, but I cannot find explicit documentation on how to construct the data. I believe the format from the original kml
file is close, but I haven't been able to use it successfully.
How can I plot this data (longitude, latitude, altitude) onto a 2D map using ggplot2
?
BONUS: how can I calculate the total distance from the interpolated line? Ideally expressed in miles - I assume converting units would be trivial though.
Regarding st_read()
, sf
file I/O goes though GDAL and I don't believe GDAL KML drivers support Track
s and Multitrack
s, this goes for both LIBKML & KML (examples proving me wrong here are more than welcome). This being KML 2.3 instead of more common 2.2 might also play some role.
Between parsing XML and plotting you seem to skip a step where you'd convert a frame with coordinates to an actual sf
object of points ( st_as_sf()
), which you could then turn into a linestring to get its length.
Following assumes there's a single track per file, as in included example. If there could be more, you'd need to adjust XML parsing a bit to include a grouping variable for each individual track or segment and build a linestring for each group.
library(xml2)
library(dplyr)
library(tidyr)
library(sf)
#> Linking to GEOS 3.13.1, GDAL 3.10.2, PROJ 9.5.1; sf_use_s2() is TRUE
library(ggplot2)
# kml <- "some.kml"
# or
# kml <- '<?xml version="1.0" encoding="UTF-8"?> <kml ...> ... </kml>`
# assume a single track, get all non-empty <coord>s
coord_nodes <-
read_xml(kml) |>
xml_ns_strip() |>
xml_find_all("//coord[string-length() > 0]")
coords_df <-
tibble(
coord = xml_text(coord_nodes),
# extract timestamps from closest <when> nodes
ts = xml_find_first(coord_nodes, "./preceding-sibling::when[1]") |> xml_text() |> lubridate::as_datetime()
) |>
separate_wider_delim(coord, names = c("lon", "lat", "alt"), delim = " ")
coords_df
#> # A tibble: 4 × 4
#> lon lat alt ts
#> <chr> <chr> <chr> <dttm>
#> 1 -85.996391 39.897292 192.9928436279297 2025-04-10 21:34:56
#> 2 -85.996294 39.897434 193.75526428222656 2025-04-10 21:34:57
#> 3 -85.996197 39.897569 193.9382781982422 2025-04-10 21:34:58
#> 4 -85.996101 39.897698 194.16091918945312 2025-04-10 21:34:59
# frame to sf object (of points), use WGS84 for coordinate reference system (safe guess for KML);
# summarise points to a multipoint, keep first and last timestamp;
# convert to linestring, get duration and linestring lenght
routes_sf <-
st_as_sf(coords_df, coords = c("lon", "lat"), crs = "WGS84") |>
summarise(
start = first(ts),
stop = last(ts),
do_union = FALSE
) |>
st_cast("LINESTRING") |>
mutate(
len = st_length(geometry) |> units::set_units(mile),
duration = stop - start
)
routes_sf
#> Simple feature collection with 1 feature and 4 fields
#> Geometry type: LINESTRING
#> Dimension: XY
#> Bounding box: xmin: -85.99639 ymin: 39.89729 xmax: -85.9961 ymax: 39.8977
#> Geodetic CRS: WGS 84
#> # A tibble: 1 × 5
#> start stop geometry len
#> * <dttm> <dttm> <LINESTRING [°]> [mile]
#> 1 2025-04-10 21:34:56 2025-04-10 21:34:59 (-85.99639 39.89729, -85.99629… 0.0320
#> # ℹ 1 more variable: duration <drtn>
# check with mapview
mapview::mapview(routes_sf)
# plot line + points with ggplot
routes_sf |>
ggplot() +
geom_sf() +
geom_sf(data = ~ st_cast(., "POINT")) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45))
#> Warning in st_cast.sf(., "POINT"): repeating attributes for all sub-geometries
#> for which they may not be constant
Created on 2025-05-20 with reprex v2.1.1