r regex string coordinates

Convert latitude and longitude string vector into data frame


I am struggling to parse the location strings I have in my data.

Each location is stored as a single string with the latitude and longitude bundled together, and I want to extract them into two separate variables, with one value per observation.

The data I'm trying to parse looks like this:

ID <- c(1, 2, 3)
location_1 <- c("lat:10.1234567,lng:-70.1234567", "lat:20.1234567891234,lng:-80.1234567891234", "lat:30.1234567,lng:-90.1234567")

df <- data.frame(ID, location_1)

ID   location_1
1     lat:10.1234567,lng:-70.1234567                                                
2     lat:20.1234567891234,lng:-80.1234567891234
3     lat:30.1234567,lng:-90.1234567

I'm trying to get them to look like this:

ID  latitude            longitude
1   10.1234567          -70.1234567
2   20.1234567891234    -80.1234567891234
3   30.1234567          -90.1234567

I've tried a few different solutions but I can't quite figure out the right phrasing to extract the coordinates.

One I tried was

f <- data.frame(Latitude = str_extract_all(df$location_1, "\\d+\\.\\d+")[[1]], 
                Longitude = str_extract_all(df$location_1, "(?<=,\\s)\\d+\\.\\d+(?=\\))")[[1]])

another was

strcapture("\\(([-0-9.]+)\\s+([-0-9.]+)", location_1, proto = list(lon = 1,lat = 1))

but neither quite fit my original data so I keep getting NAs.


Solution

  • Use tidyr::separate_wider_delim to split the single column into two at the comma. Then use dplyr::across to apply readr::parse_number to both new columns, which pulls the number out of each string:

    library(tidyr)
    library(dplyr)
    library(readr)
    df |>
      separate_wider_delim(location_1, delim = ",", names = c("lat", "lon")) |>
      mutate(across(c(lat, lon), parse_number))
    # # A tibble: 3 × 3
    #      ID   lat   lon
    #   <dbl> <dbl> <dbl>
    # 1     1  10.1 -70.1
    # 2     2  20.1 -80.1
    # 3     3  30.1 -90.1
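
  • If you would rather stay in base R, the strcapture attempt from the question can probably be adapted to this format. The sketch below assumes every string looks exactly like "lat:<number>,lng:<number>" with no spaces or parentheses; the pattern and the column names latitude and longitude are my own choices, not something from your original data. It also prints with extra digits, since the tibble above only displays three significant digits even though the full values are stored.

```r
# Sample data from the question
ID <- c(1, 2, 3)
location_1 <- c("lat:10.1234567,lng:-70.1234567",
                "lat:20.1234567891234,lng:-80.1234567891234",
                "lat:30.1234567,lng:-90.1234567")
df <- data.frame(ID, location_1)

# Capture the (optionally negative) decimal after "lat:" and after "lng:".
# Assumption: the strings contain no spaces or parentheses.
pattern <- "lat:(-?[0-9.]+),lng:(-?[0-9.]+)"

# proto supplies the names and types of the output columns;
# numeric() makes strcapture convert the captured text to numbers.
coords <- strcapture(pattern, df$location_1,
                     proto = data.frame(latitude = numeric(),
                                        longitude = numeric()))

result <- cbind(df["ID"], coords)

# Print with more significant digits to check that no precision was lost
print(result, digits = 15)
```

    Rows that do not match the pattern come back as NA, which is most likely what happened with the original attempts: both of those patterns expect parentheses or whitespace that these strings do not contain.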