pythonpandasdistance-matrix

Exporting distance matrix pairs into columns in .xlsx/.csv in python


Probably a pretty basic export but I didn't manage to extract the values for every combination in the distance matrix.

The code to create the distance matrix is very basic and looks as follows:

dist = DistanceMetric.get_metric('haversine')
output = pd.DataFrame(dist.pairwise(df[['latitude', 'longitude']].to_numpy())*6367.45,  columns=df.id.unique(), index=df.id.unique())

resulting in a distance matrix of form:

           1       2       3
  1   0.0000  1.4072  0.5405
  2   1.4072  0.0000  1.8499
  3   0.5405  1.8499  0.0000

I didn't find a way to export it to an xlsx or csv in this format:

1  1  0.0000
1  2  1.4072
1  3  0.5405
2  1  1.4072
2  2  0.0000
2  3  1.8499
... 

Any tips how to retrieve the values for each combination?


Solution

  • You should start by unstacking your dataframe: result = df.unstack().

    Now you have a DataFrame with this shape :

    1  1    0.0000
       2    1.4072
       3    0.5405
    2  1    1.4072
       2    0.0000
       3    1.8499
    ...
    

    Now, if you want to save it in a csv file, just call to_csv() method with appropriate arguments: result.to_csv('result.csv', header=False, index=True, index_label=[0,1])

    Note that index_label argument allows you to define the sequence in which you MultiIndex (from unstack()) is written in the csv columns.