I have created the following pandas dataframe:
import pandas as pd
import numpy as np
from math import sin, cos, sqrt, atan2, radians
ds1 = {'Longitude':[-46.6736,-46.50926,-46.75166,-46.54743], "Latitude" : [-23.69057,-23.41165,-23.51482,-23.42598]}
df1 = pd.DataFrame(data=ds1)
which looks like this:
print(df1)
   Longitude  Latitude
0  -46.67360 -23.69057
1  -46.50926 -23.41165
2  -46.75166 -23.51482
3  -46.54743 -23.42598
I need to calculate the distance in km between each record and each city in a list of Brazilian cities, for which I have the latitude and longitude, as follows:
coordinates = {
    "rio": [-23.02, -43.474889],
    "curitiba": [-25.38792, -49.27741],
    "portoAlegre": [-29.98115, -51.19597],
    "salvador": [-12.97369, -38.43908],
    "manaus": [-3.012972, -59.926802],
    "campoGrande": [-20.52243, -54.58743],
    "beloHorizonte": [-19.79722, -43.95691],
    "portoVelho": [-8.774148, -63.851237],
    "recife": [-8.12673, -34.90491],
    "boaVista": [2.844999, -60.718089],
    "fortaleza": [-3.76489, -38.51496],
    "rioBranco": [-9.972341, -67.801294],
    "palmas": [-10.165953, -48.880833],
    "natal": [-5.79861, -35.18398],
    "aracaju": [-10.972717, -37.068985],
    "teresina": [-5.10247, -42.79552]
}
The distance in km is calculated by the following function:
def radius(latitude1, longitude1, latitude2, longitude2):
    R = 6373.0  # approximate Earth radius in km
    # Convert all coordinates from degrees to radians
    lat1 = radians(latitude1)
    lon1 = radians(longitude1)
    lat2 = radians(latitude2)
    lon2 = radians(longitude2)
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # Haversine formula
    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    distance = R * c
    return distance
Now I calculate the distance of each record in dataframe df1 from Rio de Janeiro ("rio" : [-23.02,-43.474889]):
df1['distanceFromRio'] = radius(coordinates["rio"][0],coordinates["rio"][1],df1['Latitude'],df1['Longitude'])
And I get the following error:
TypeError: cannot convert the series to <class 'float'>
I would prefer to avoid arrays/lists, since I need to do the same for every city listed in coordinates, so I will also need to calculate:
- distanceFromCuritiba
- distanceFromPortoAlegre
- etc.
Does anyone know a way to do it in Python, please?
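Your function radius expects four floats as arguments, but df1['Latitude'] and df1['Longitude'] are pandas Series, each containing multiple floats. The functions from math (radians, sin, cos and so on) accept only a single number, which is why you get the TypeError.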
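To make the cause concrete, here is a minimal sketch that reproduces the problem in isolation. The variable names (latitudes, single, raised) are only illustrative and are not part of your code:

```python
import math

import pandas as pd

# Two of the latitudes from df1, held in a Series for illustration
latitudes = pd.Series([-23.69057, -23.41165])

# A single value works: math.radians converts one scalar at a time
single = math.radians(latitudes.iloc[0])

# A whole Series does not: math.radians needs one float, not many
try:
    math.radians(latitudes)
    raised = False
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")
```

The same thing happens inside radius as soon as it reaches the first radians call on a Series.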
Instead of trying to pass Series as function arguments, you need to apply your function to each row of the DataFrame using the apply method.
Your radius function looks good, so I suggest adding one small intermediate function that handles the application, to keep things readable:
def apply_radius(row, coords):
    return radius(coords[0], coords[1], row["Latitude"], row["Longitude"])

df1["distanceFromRio"] = df1.apply(
    lambda row: apply_radius(row, coordinates["rio"]),
    axis=1
)
df1["distanceFromRio"]
Output:
0    335.038388
1    313.220640
2    339.317187
3    317.290975
Name: distanceFromRio, dtype: float64
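Since you need the same column for every city, one option is to loop over the coordinates dictionary and build each column name from its key. The following is a minimal sketch under a few assumptions: it uses only three of your cities to stay short, it derives names like distanceFromPortoAlegre by capitalising the first letter of the key, and it adds a hypothetical radius_np function (not in your code) that swaps the math functions for their NumPy equivalents, so the Series can be passed directly without apply:

```python
from math import sin, cos, sqrt, atan2, radians

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Longitude": [-46.6736, -46.50926, -46.75166, -46.54743],
    "Latitude": [-23.69057, -23.41165, -23.51482, -23.42598],
})

# Subset of the question's dictionary; values are [latitude, longitude]
coordinates = {
    "rio": [-23.02, -43.474889],
    "curitiba": [-25.38792, -49.27741],
    "portoAlegre": [-29.98115, -51.19597],
}


def radius(latitude1, longitude1, latitude2, longitude2):
    """Haversine distance in km between two points given as scalars."""
    R = 6373.0
    lat1, lon1 = radians(latitude1), radians(longitude1)
    lat2, lon2 = radians(latitude2), radians(longitude2)
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    return R * 2 * atan2(sqrt(a), sqrt(1 - a))


def apply_radius(row, coords):
    return radius(coords[0], coords[1], row["Latitude"], row["Longitude"])


# One new column per city, e.g. "curitiba" -> "distanceFromCuritiba"
for city, coords in coordinates.items():
    column = "distanceFrom" + city[0].upper() + city[1:]
    df1[column] = df1.apply(apply_radius, axis=1, args=(coords,))


def radius_np(latitude1, longitude1, latitude2, longitude2):
    """Hypothetical NumPy variant: same formula, but accepts whole Series."""
    R = 6373.0
    lat1, lon1 = np.radians(latitude1), np.radians(longitude1)
    lat2, lon2 = np.radians(latitude2), np.radians(longitude2)
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2)**2
    return R * 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))


# Element-wise over the Series, no apply needed
vectorized_rio = radius_np(*coordinates["rio"], df1["Latitude"], df1["Longitude"])
```

Both approaches should give the same numbers. The apply version keeps your original function untouched, while the NumPy version is likely to be noticeably faster on a large DataFrame because it avoids calling a Python function once per row.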