I'm trying to add a "distance" column to a huge (nearly 6 million rows) data frame with coordinate information in the start_lng, start_lat, end_lng, and end_lat columns.
I have tried the following:
trips$distance <- distm(c(trips$start_lng, trips$start_lat), c(trips$end_lng, trips$end_lat), fun = distHaversine)
to which I get:
"Error in .pointsToMatrix(x) : Wrong length for a vector, should be 2"
I checked the answers here, and the solution should be:
trips %>%
rowwise() %>%
mutate(distance = distHaversine(c(trips$start_lng, trips$start_lat), c(trips$end_lng, trips$end_lat)))
but I still get the same error: "base::stop("Wrong length for a vector, should be 2")"
I have also tried using cbind() instead of c(), but then I get: "cannot allocate vector of size 123096.7 Gb"
Using c() joins the two vectors end to end, so c(trips$end_lng, trips$end_lat) isn't of length 2; its length is twice the number of rows in your data set. This is why the approach isn't working.
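To make the difference concrete, here is a minimal base R sketch using made-up coordinate vectors (the values are illustrative only). It shows that c() produces one long vector, while cbind() produces a two-column matrix with one row per point:

```r
# Illustrative coordinates for four points (not real trip data)
start_lng <- c(56.2, 57.3, 56.2, 58.3)
start_lat <- c(76.2, 73.3, 76.2, 78.3)

# c() concatenates: all longitudes followed by all latitudes
flat <- c(start_lng, start_lat)
length(flat)  # twice the number of points

# cbind() binds column-wise: one row per point, columns are lng and lat
pts <- cbind(start_lng, start_lat)
dim(pts)      # number of points by 2
```

A single point passed as c(lng, lat) has length 2, which is what the error message is asking for.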
Your second approach is almost correct (although you don't need to use trips$ inside mutate()); see this small example:
trips <- tibble::tibble(
start_lng = c(56.2, 57.3, 56.2, 58.3),
start_lat = c(76.2, 73.3, 76.2, 78.3),
end_lng = c(56.3, 57.1, 56.5, 58.2),
end_lat = c(75.2, 74.3, 75.3, 77.3)
)
library(dplyr)  # provides %>%, rowwise() and mutate()

trips %>%
  rowwise() %>%
  mutate(distance = geosphere::distHaversine(c(start_lng, start_lat),
                                             c(end_lng, end_lat)))
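With millions of rows, a row-by-row call may be slow. As a possible alternative, distHaversine() also accepts two-column matrices (longitude first, latitude second) and returns one distance per pair of corresponding rows, so the whole column can be computed in one call. A minimal sketch, assuming the geosphere package is installed and reusing the same small example data:

```r
library(geosphere)

trips <- data.frame(
  start_lng = c(56.2, 57.3, 56.2, 58.3),
  start_lat = c(76.2, 73.3, 76.2, 78.3),
  end_lng   = c(56.3, 57.1, 56.5, 58.2),
  end_lat   = c(75.2, 74.3, 75.3, 77.3)
)

# Row i of the first matrix is paired with row i of the second,
# so the result has one distance (in meters by default) per trip.
trips$distance <- distHaversine(
  cbind(trips$start_lng, trips$start_lat),
  cbind(trips$end_lng, trips$end_lat)
)
```

Unlike distm(), this does not build a matrix of all start points against all end points, so the result is only as long as the data frame.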
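The "cannot allocate vector of size 123096.7 Gb" message is an error rather than a warning, and it is due to insufficient RAM. The most likely cause is that distm() computes the distance between every start point and every end point, so with cbind() it tries to build a matrix with one row and one column per trip.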
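A rough back-of-the-envelope check of that allocation size, assuming the vector holds 8-byte doubles in a square matrix with one row and one column per trip, and that R's "Gb" means 2^30 bytes (both are assumptions, not something stated in the error message):

```r
# Size reported in the error message, in R's "Gb" units
reported_gb <- 123096.7

# Assumption: "Gb" is 2^30 bytes and each element is an 8-byte double
n_elements <- reported_gb * 2^30 / 8

# Assumption: the elements form a square n-by-n distance matrix
n_implied <- sqrt(n_elements)
n_implied  # a few million, the same order of magnitude as the data set

# Memory needed for the same kind of matrix at a hypothetical row count
pairwise_gb <- function(n) n^2 * 8 / 2^30
pairwise_gb(1e5)  # even 100,000 rows would need tens of Gb
```

Memory for an all-pairs matrix grows with the square of the row count, which is why this approach stops being feasible long before millions of rows.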