I'm getting an error when using the zip(*map(...)) call. A longer explanation follows below.
TypeError: zip argument #1 must support iteration
Here's what I have: a DataFrame containing cities and their locations as longitude and latitude. Now I want to calculate the distance between the cities using the haversine formula.
Starting point is this Pandas DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame([{'city':"Berlin", 'lat':52.5243700, 'lng':13.4105300},
                   {'city':"Potsdam", 'lat':52.3988600, 'lng':13.0656600},
                   {'city':"Hamburg", 'lat':53.5753200, 'lng':10.0153400}])
df
Then I'm joining the dataframe with itself in order to get pairs of cities:
df['tmp'] = 1
df2 = pd.merge(df,df,on='tmp')
df2 = df2[df2.city_x != df2.city_y]
Which gives me this:
    city_x     lat_x     lng_x  tmp   city_y     lat_y     lng_y
1   Berlin  52.52437  13.41053    1  Potsdam  52.39886  13.06566
2   Berlin  52.52437  13.41053    1  Hamburg  53.57532  10.01534
3  Potsdam  52.39886  13.06566    1   Berlin  52.52437  13.41053
5  Potsdam  52.39886  13.06566    1  Hamburg  53.57532  10.01534
6  Hamburg  53.57532  10.01534    1   Berlin  52.52437  13.41053
7  Hamburg  53.57532  10.01534    1  Potsdam  52.39886  13.06566
Now for the important part. The haversine formula is put into a function:
def haversine_distance(lng1: float, lat1: float, lng2: float, lat2: float) -> float:
    """
    Computes the distance in kilometers between two points on a sphere given their longitudes and latitudes
    based on the Haversine formula. https://en.wikipedia.org/wiki/Haversine_formula
    """
    from math import radians, cos, sin, asin, sqrt
    R = 6371  # Radius of earth in kilometers. Use 3956 for miles
    lng1, lat1, lng2, lat2 = map(radians, [lng1, lat1, lng2, lat2])
    # haversine formula
    dlng = lng2 - lng1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlng/2)**2
    c = 2 * asin(sqrt(a))
    distance = c * R
    return distance
This function should then be called on the joined dataframe:
def get_haversine_distance(lng1: pd.Series, lat1: pd.Series, lng2: pd.Series, lat2: pd.Series) -> pd.Series:
    dist = zip(*map(haversine_distance, lng1, lat1, lng2, lat2))
    return dist
# now invoke the method in order to get a new column (series) back
get_haversine_distance(df2['lng_x'], df2['lat_x'], df2['lng_y'], df2['lat_y'])
Problem/Error: This gives me the following error:
TypeError: zip argument #1 must support iteration
Remark: What I don't get is why I'm getting this error, since the other method (see below) works perfectly fine. It does basically the same thing!
def lat_lng_to_cartesian(lat: float, lng: float) -> (float, float, float):
    from math import radians, cos, sin
    R = 6371  # Radius of earth in kilometers. Use 3956 for miles
    lat_, lng_ = map(radians, [lat, lng])
    x = R * cos(lat_) * cos(lng_)
    y = R * cos(lat_) * sin(lng_)
    z = R * sin(lat_)
    return x, y, z

def get_cartesian_coordinates(lat: pd.Series, lng: pd.Series) -> (pd.Series, pd.Series, pd.Series):
    if lat is None or lng is None:
        return
    x, y, z = zip(*map(lat_lng_to_cartesian, lat, lng))
    return x, y, z
get_cartesian_coordinates(df2['lat_x'], df2['lng_x'])
As I mentioned in the comments, to use haversine_distance the way you've defined it, you can zip those columns first and then map over the result. The outer zip(*...) in your version fails because haversine_distance returns a single float per row, which zip cannot iterate over, whereas lat_lng_to_cartesian returns a tuple. In essence, edit the get_haversine_distance function so that it zips the corresponding rows into tuples and then unpacks each tuple into the arguments of haversine_distance. The following is an illustration, using the provided data:
import pandas as pd
import numpy as np
df = pd.DataFrame([{'city':"Berlin", 'lat':52.5243700, 'lng':13.4105300},
                   {'city':"Potsdam", 'lat':52.3988600, 'lng':13.0656600},
                   {'city':"Hamburg", 'lat':53.5753200, 'lng':10.0153400}])
df
#       city       lat       lng
# 0   Berlin  52.52437  13.41053
# 1  Potsdam  52.39886  13.06566
# 2  Hamburg  53.57532  10.01534
# Make sure to reset the index after you filter out the unneeded rows
df['tmp'] = 1
df2 = pd.merge(df,df,on='tmp')
df2 = df2[df2.city_x != df2.city_y].reset_index(drop=True)
#     city_x     lat_x     lng_x  tmp   city_y     lat_y     lng_y
# 0   Berlin  52.52437  13.41053    1  Potsdam  52.39886  13.06566
# 1   Berlin  52.52437  13.41053    1  Hamburg  53.57532  10.01534
# 2  Potsdam  52.39886  13.06566    1   Berlin  52.52437  13.41053
# 3  Potsdam  52.39886  13.06566    1  Hamburg  53.57532  10.01534
# 4  Hamburg  53.57532  10.01534    1   Berlin  52.52437  13.41053
# 5  Hamburg  53.57532  10.01534    1  Potsdam  52.39886  13.06566
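A note on why the index reset matters: a Series built from a plain list gets a fresh 0..n-1 index, and assigning it as a column aligns on index labels rather than on position. The following is only a small sketch of that behaviour, assuming a toy frame (`toy`, with made-up values) whose index has gaps like the filtered df2:

```python
import pandas as pd

# Toy frame with made-up values; the index has gaps, like df2 after filtering.
toy = pd.DataFrame({"value": [10, 20, 30]}, index=[1, 2, 5])

# A Series built from a plain list gets a fresh 0..n-1 index.
fresh = pd.Series([1.0, 2.0, 3.0])

# Column assignment aligns on index labels: labels 1 and 2 pick up the values
# stored under labels 1 and 2 of `fresh`, and label 5 finds no match (NaN).
toy["misaligned"] = fresh

# Reusing the frame's own index keeps the rows in step without a reset.
toy["aligned"] = pd.Series([1.0, 2.0, 3.0], index=toy.index)
```

So either reset the index as done here, or pass the original index when building the result Series.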
def get_haversine_distance(lng1: pd.Series, lat1: pd.Series, lng2: pd.Series, lat2: pd.Series) -> pd.Series:
    dist = pd.Series(map(lambda x: haversine_distance(*x), zip(lng1, lat1, lng2, lat2)))
    return dist

def haversine_distance(lng1: float, lat1: float, lng2: float, lat2: float) -> float:
    """
    Computes the distance in kilometers between two points on a sphere given their longitudes and latitudes
    based on the Haversine formula. https://en.wikipedia.org/wiki/Haversine_formula
    """
    from math import radians, cos, sin, asin, sqrt
    R = 6371  # Radius of earth in kilometers. Use 3956 for miles
    lng1, lat1, lng2, lat2 = map(radians, [lng1, lat1, lng2, lat2])
    # haversine formula
    dlng = lng2 - lng1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlng/2)**2
    c = 2 * asin(sqrt(a))
    distance = c * R
    return distance
df2['distance'] = get_haversine_distance(df2['lng_x'], df2['lat_x'], df2['lng_y'], df2['lat_y'])
#     city_x     lat_x     lng_x  tmp   city_y     lat_y     lng_y    distance
# 0   Berlin  52.52437  13.41053    1  Potsdam  52.39886  13.06566   27.215704
# 1   Berlin  52.52437  13.41053    1  Hamburg  53.57532  10.01534  255.223782
# 2  Potsdam  52.39886  13.06566    1   Berlin  52.52437  13.41053   27.215704
# 3  Potsdam  52.39886  13.06566    1  Hamburg  53.57532  10.01534  242.464120
# 4  Hamburg  53.57532  10.01534    1   Berlin  52.52437  13.41053  255.223782
# 5  Hamburg  53.57532  10.01534    1  Potsdam  52.39886  13.06566  242.464120
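To see the difference between your two functions in isolation, here is a minimal sketch without pandas. `scalar_fn` and `tuple_fn` are hypothetical stand-ins for haversine_distance and lat_lng_to_cartesian, and the inputs are made-up numbers:

```python
def scalar_fn(a, b):
    # Hypothetical stand-in for haversine_distance: one float per row.
    return a + b

def tuple_fn(a, b):
    # Hypothetical stand-in for lat_lng_to_cartesian: one tuple per row.
    return a + b, a - b

xs = [1.0, 2.0, 3.0]
ys = [0.5, 0.5, 0.5]

# map() with several iterables already walks them row by row.
scalar_results = list(map(scalar_fn, xs, ys))

# Tuples can be transposed with zip(*...): one output sequence per tuple slot.
sums, diffs = zip(*map(tuple_fn, xs, ys))

# Floats cannot: zip(*...) passes each float to zip as its own argument,
# and zip raises TypeError because a float does not support iteration.
try:
    zip(*map(scalar_fn, xs, ys))
    zip_failed = False
except TypeError:
    zip_failed = True
```

The transposing zip(*...) only makes sense when each call returns several values, which is why it works for the cartesian conversion and not for the distance.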
Let me know if this is what you expect the output to look like.
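As an aside, the lambda and the inner zip are not strictly required, because map accepts several iterables and passes one element from each per call. A small sketch of that variant, assuming plain lists in place of the DataFrame columns (the formula is repeated only so the snippet runs on its own):

```python
from math import radians, cos, sin, asin, sqrt

def haversine_distance(lng1, lat1, lng2, lat2):
    # Same formula as above, repeated so this snippet is self-contained.
    R = 6371  # Radius of earth in kilometers
    lng1, lat1, lng2, lat2 = map(radians, [lng1, lat1, lng2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * asin(sqrt(a)) * R

# Plain lists standing in for the columns (Berlin -> Potsdam, Berlin -> Hamburg).
lng_x = [13.41053, 13.41053]
lat_x = [52.52437, 52.52437]
lng_y = [13.06566, 10.01534]
lat_y = [52.39886, 53.57532]

# map() takes one element from each list per call, so no zip or lambda is needed.
distances = list(map(haversine_distance, lng_x, lat_x, lng_y, lat_y))
```

Inside get_haversine_distance this would presumably become pd.Series(list(map(haversine_distance, lng1, lat1, lng2, lat2)), index=lng1.index), which also carries over the index of the input column.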