I have an excel file with 2 columns titled Lat and Lon with some data such as:
Lat Lon
36.19553° N 95.90918° W
36.19550° N 95.93592° W
36.20277° N 95.94484° W
36.20277° N 95.95381° W
36.22436° N 95.98023° W
36.21005° N 95.94487° W
36.21006° N 95.93594° W
35.99968° N 96.09681° W
35.97043° N 95.98949° W
35.96317° N 95.98951° W
35.99968° N 96.11459° W
35.99967° N 96.10568° W
35.96318° N 95.99839° W
35.96315° N 96.00728° W
35.99239° N 96.13247° W
I am trying to read in the excel file and group each one that is within a distance of 0.00004 in the last place of each other in either the lat or the lon column into a group. It should start with the first row and check each row looking for another one that is within a distance of 0.00004 and insert a number starting a one in a column called 'Drive' for each grouping.
the expected output is supposed to be:
Lat Lon Drive
0 36.19553 95.90918 1
1 36.19550 95.93592 1
2 36.20277 95.94484 2
3 36.20277 95.95381 2
4 36.22436 95.98023 3
5 36.21005 95.94487 2
6 36.21006 95.93594 1
7 35.99968 96.09681 4
8 35.97043 95.98949 5
9 35.96317 95.98951 5
10 35.99968 96.11459 4
11 35.99967 96.10568 4
12 35.96318 95.99839 5
13 35.96315 96.00728 5
14 35.99239 96.13247 6
I have made several attempt with no success.
here is the latest attempt:
# Read the data into a pandas DataFrame
df = pd.read_excel('data.xlsx')
# Convert Lat and Lon to absolute values for easy comparison
df['Lat'] = df['Lat'].abs()
df['Lon'] = df['Lon'].abs()
# Initialize the counter and group column
counter = 1
df['Drive'] = 0
# Loop over the DataFrame rows
for i in range(len(df)):
if df['Drive'][i] == 0:
df.loc[(df['Lat'].between(df['Lat'][i] - 4, df['Lat'][i] + 4)) &
(df['Lon'].between(df['Lon'][i] - 4, df['Lon'][i] + 4)), 'Drive'] = counter
counter += 1
# Print the result
print(df)
I get the error bad operand type for abs(): 'str'
# Read the data into a pandas DataFrame
df = pd.read_excel(workbook_path)
# Extract the degrees from the string value
df['Lat'] = df['Lat'].str.extract(r'(\d+\.\d+)')
df['Lon'] = df['Lon'].str.extract(r'(\d+\.\d+)')
df['Lat'] = df['Lat'].astype(float)
df['Lon'] = df['Lon'].astype(float)
df['Drive'] = 0
drive = 1
for i in range(len(df)):
if df.loc[i, 'Drive'] == 0:
df.loc[i, 'Drive'] = drive
for j in range(i + 1, len(df)):
if (abs(df.loc[i, 'Lat'] - df.loc[j, 'Lat']) <= 0.00004) or (abs(df.loc[i, 'Lon'] - df.loc[j, 'Lon']) <= 0.00004):
df.loc[j, 'Drive'] = drive
drive += 1
print(df)