pythonexcelpandas

Grouping Lats and Lons within a distance of 0.00004 of each other using Pandas


I have an excel file with 2 columns titled Lat and Lon with some data such as:

Lat          Lon
36.19553° N 95.90918° W
36.19550° N 95.93592° W
36.20277° N 95.94484° W
36.20277° N 95.95381° W
36.22436° N 95.98023° W
36.21005° N 95.94487° W
36.21006° N 95.93594° W
35.99968° N 96.09681° W
35.97043° N 95.98949° W
35.96317° N 95.98951° W
35.99968° N 96.11459° W
35.99967° N 96.10568° W
35.96318° N 95.99839° W
35.96315° N 96.00728° W
35.99239° N 96.13247° W

I am trying to read in the excel file and group each one that is within a distance of 0.00004 in the last place of each other in either the lat or the lon column into a group. It should start with the first row and check each row looking for another one that is within a distance of 0.00004 and insert a number starting a one in a column called 'Drive' for each grouping.

the expected output is supposed to be:

  Lat          Lon       Drive
0  36.19553  95.90918      1
1  36.19550  95.93592      1
2  36.20277  95.94484      2
3  36.20277  95.95381      2
4  36.22436  95.98023      3
5  36.21005  95.94487      2
6  36.21006  95.93594      1
7  35.99968  96.09681      4
8  35.97043  95.98949      5
9  35.96317  95.98951      5
10  35.99968  96.11459      4
11  35.99967  96.10568      4
12  35.96318  95.99839      5
13  35.96315  96.00728      5
14  35.99239  96.13247      6

I have made several attempt with no success.

here is the latest attempt:
# Read the data into a pandas DataFrame
df = pd.read_excel('data.xlsx')

# Convert Lat and Lon to absolute values for easy comparison
df['Lat'] = df['Lat'].abs()
df['Lon'] = df['Lon'].abs()

# Initialize the counter and group column
counter = 1
df['Drive'] = 0

# Loop over the DataFrame rows
for i in range(len(df)):
    if df['Drive'][i] == 0:
        df.loc[(df['Lat'].between(df['Lat'][i] - 4, df['Lat'][i] + 4)) &
               (df['Lon'].between(df['Lon'][i] - 4, df['Lon'][i] + 4)), 'Drive'] = counter
        counter += 1

# Print the result
print(df)

I get the error bad operand type for abs(): 'str'


Solution

  • # Read the data into a pandas DataFrame
    df = pd.read_excel(workbook_path)
    
    # Extract the degrees from the string value
    df['Lat'] = df['Lat'].str.extract(r'(\d+\.\d+)')
    df['Lon'] = df['Lon'].str.extract(r'(\d+\.\d+)')
    
    df['Lat'] = df['Lat'].astype(float)
    df['Lon'] = df['Lon'].astype(float)
    
    df['Drive'] = 0
    drive = 1
    for i in range(len(df)):
        if df.loc[i, 'Drive'] == 0:
            df.loc[i, 'Drive'] = drive
            for j in range(i + 1, len(df)):
                if (abs(df.loc[i, 'Lat'] - df.loc[j, 'Lat']) <= 0.00004) or (abs(df.loc[i, 'Lon'] - df.loc[j, 'Lon']) <= 0.00004):
                    df.loc[j, 'Drive'] = drive
            drive += 1
    
    print(df)