I have the following dataframe in pandas:
import pandas as pd
df = pd.DataFrame({
"CityId": {
"0": 0,
"1": 1,
"2": 2,
"3": 3,
"4": 4
},
"X": {
"0": 316.83673906150904,
"1": 4377.40597216624,
"2": 3454.15819771172,
"3": 4688.099297634771,
"4": 1010.6969517482901
},
"elevation_meters": {
"0": 1,
"1": 2,
"2": 3,
"3": 4,
"4": 5
},
"Y": {
"0": 2202.34070733524,
"1": 336.602082171235,
"2": 2820.0530112481106,
"3": 2935.89805580997,
"4": 3236.75098902635
}
})
I am trying to create a distance matrix that represents the cost of moving between each of these CityIds. Using pdist and squareform from scipy.spatial.distance, I can do the following:
from scipy.spatial.distance import pdist, squareform
df_m = pd.DataFrame(
squareform(
pdist(
df[['X', 'Y']],  # only the coordinate columns feed into the distance
metric='euclidean')
),
index=df.CityId.unique(),
columns= df.CityId.unique()
)
This gives me a distance matrix between all the CityIds, using the pairwise distances calculated by pdist.
I would like to incorporate elevation_meters into this distance matrix. What is an efficient way to do so?
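You can try scipy.spatial.distance_matrix, passing elevation_meters alongside X and Y so that it is treated as a third coordinate in the Euclidean distance: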
from scipy.spatial import distance_matrix
# Treat elevation as a third coordinate next to X and Y
xx = df[['X', 'elevation_meters', 'Y']]
pd.DataFrame(distance_matrix(xx, xx), columns=df['CityId'],
index=df['CityId'])
Output:
CityId 0 1 2 3 4
CityId
0 0.000000 4468.691544 3197.555070 4432.386687 1245.577226
1 4468.691544 0.000000 2649.512402 2617.799439 4443.602402
2 3197.555070 2649.512402 0.000000 1239.367465 2478.738402
3 4432.386687 2617.799439 1239.367465 0.000000 3689.688537
4 1245.577226 4443.602402 2478.738402 3689.688537 0.000000
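As a cross-check, the pdist and squareform approach from the question should give the same matrix once the elevation column is included, since both compute a plain Euclidean distance over three coordinates. The sketch below assumes that is the cost you want; the elevation_weight factor is a hypothetical knob, not something from the question, for cases where a metre of climb should cost more than a metre of horizontal travel. The coordinates are rounded copies of the question's data.

```python
import numpy as np
import pandas as pd
from scipy.spatial import distance_matrix
from scipy.spatial.distance import pdist, squareform

# Same five cities as in the question, with coordinates rounded for brevity.
df = pd.DataFrame({
    "CityId": [0, 1, 2, 3, 4],
    "X": [316.84, 4377.41, 3454.16, 4688.10, 1010.70],
    "Y": [2202.34, 336.60, 2820.05, 2935.90, 3236.75],
    "elevation_meters": [1, 2, 3, 4, 5],
})

coords = df[["X", "Y", "elevation_meters"]].to_numpy(dtype=float)

# pdist computes each pair once; squareform expands the result
# into the full symmetric matrix.
d_pdist = squareform(pdist(coords, metric="euclidean"))

# distance_matrix computes every pair directly.
d_full = distance_matrix(coords, coords)
assert np.allclose(d_pdist, d_full)

# Hypothetical: scale elevation before measuring distance so that
# vertical differences count more heavily (assumption, tune as needed).
elevation_weight = 10.0
weighted = coords * np.array([1.0, 1.0, elevation_weight])

df_m = pd.DataFrame(
    squareform(pdist(weighted, metric="euclidean")),
    index=df["CityId"],
    columns=df["CityId"],
)
```

With elevation_weight set to 1.0 this reduces to the unweighted matrix. For a large number of cities, pdist does roughly half the work of distance_matrix because it evaluates each pair only once.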