python dataframe distance pyproj geographic-distance

Calculating geographic distances between each row of X,Y points in a Python dataframe


Python beginner here. I have a Python dataframe of X,Y points that looks similar to this:

XY TABLE

What I want to do is take row 1, find the distance between row 1 and row 2, and write that distance between the two X,Y locations to a new column called "dist". Then do the same for rows 2 and 3, and so on. My X,Y data is much larger than this, but this is the basis of my problem. The points make up a larger polyline, so where the data stops, the end point will have a distance of zero.

I'm aware I can use geopy, numpy, and pyproj, to name a few. I initially tried the haversine distance but had issues importing the Python module. I'm not sure how to approach this problem using those modules: do I need a search cursor applied to each row? In other words, I have a polyline with nodes and want to calculate the distances between consecutive nodes. These are coordinates of real locations on Earth, so not a Cartesian coordinate system.


Solution

  • To calculate distances between consecutive points, you can use the approach below. For testing purposes I defined the corners of a rectangle.

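    import numpy as np
    import pandas as pd
    
    X = [0, 1, 1, 0, 0]
    Y = [0, 0, 1, 1, 0]
    
    df = pd.DataFrame({"X": X, "Y": Y})
    
    # Previous row's coordinates; the first row has no predecessor, so it gets NaN
    df["X_lag"] = df["X"].shift(1)
    df["Y_lag"] = df["Y"].shift(1)
    
    # Planar (Euclidean) distance from the previous point to the current one
    df["distance"] = np.sqrt((df["X"] - df["X_lag"])**2 + (df["Y"] - df["Y_lag"])**2)
    print(df["distance"])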
    

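    This gives a pandas Series with the following values: [nan, 1.0, 1.0, 1.0, 1.0]

    You can now drop the lag columns with df.drop(["X_lag", "Y_lag"], axis=1, inplace=True) (keep them if you go on to the geographic version further down, which reuses them). With the Series stored in a "distance" column, you get: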

    X  Y    distance
    0  0       NaN
    1  0       1.0
    1  1       1.0
    0  1       1.0
    0  0       1.0
    

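    For a geographic distance, you can import geopy.distance and apply the following code, which interprets the same numbers as degrees. Note that geopy expects each point as (latitude, longitude), so the (X, Y) tuples here treat X as latitude and Y as longitude; if your X is longitude, zip the columns in the opposite order. The code also relies on the X_lag and Y_lag columns, so they must not have been dropped yet.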

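    import geopy.distance
    
    def calc_orthodromic(row):
        # Geodesic (ellipsoidal) distance in meters between a point and its predecessor
        try:
            return geopy.distance.geodesic(row["XY"], row["XY_lag"]).m
        except ValueError:
            # Raised for the first row, whose lag coordinates are NaN
            return np.nan
    
    # geopy reads each tuple as (latitude, longitude)
    df['XY'] = list(zip(df["X"], df["Y"]))
    df['XY_lag'] = list(zip(df["X_lag"], df["Y_lag"]))
    
    df['distance'] = df.apply(calc_orthodromic, axis=1)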
    

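    This gives the distances in meters: [nan, 110574.3885578, 111302.64933943, 110574.3885578, 111319.49079327, 156899.56829134]

    The list has one more value than the five-row example. The last value is roughly the length of the diagonal of the one-degree square, so it appears to come from an extra test point and has no counterpart in the DataFrame above.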
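
The question asks for the distance from each row to the next one, with zero on the last row, and mentions trouble importing a haversine module. Below is a minimal sketch of that layout with the haversine formula written directly in NumPy. It assumes X is longitude and Y is latitude, both in degrees. The helper name haversine_m and the mean Earth radius constant are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Assumption: a spherical Earth with the mean radius, in meters
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


# Assumption: X is longitude and Y is latitude, both in degrees
df = pd.DataFrame({"X": [0, 1, 1, 0, 0], "Y": [0, 0, 1, 1, 0]})

# shift(-1) pairs each row with the NEXT row; the last row has no successor
df["dist"] = haversine_m(df["Y"], df["X"], df["Y"].shift(-1), df["X"].shift(-1))

# The end of the polyline gets a distance of zero, as the question describes
df["dist"] = df["dist"].fillna(0.0)
print(df)
```

The columns are processed as whole arrays, so no per-row loop or cursor is needed. Because this treats the Earth as a sphere, the results can differ from geopy's geodesic distances by a few tenths of a percent.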