I have some Python code where, given a list of points (X and Y coordinate pairs) in a dataframe and a specified circle radius, I would like to go through each point, treat it as the center of a circle, and find how many other points fall inside that circle.
Looking around, I've seen people suggest that a similar problem using longitudes and latitudes can be solved with BallTree from scikit-learn, so I've tried that, but I'm not getting the answers I'm expecting. My code is below:
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree
df = pd.DataFrame({'id': list('abcde'),
                   'X': [10, 1000, 1010, 5000, 5100],
                   'Y': [10, 1000, 1010, 5000, 5100]})
radius = int(input('Enter the selection radius of the circle:'))
coords = df[["X","Y"]]
tree = BallTree(coords, metric='haversine')
answers = tree.query_radius(coords, r=radius, count_only=True)
print(answers)
For example, when I enter radius = 100, I get an answer of
[1 1 1 1 1]
which is not correct. Any ideas how to get this working in the simplest way in Python?
The issue is that you're building the BallTree with the haversine metric, which computes great-circle distances on a sphere and expects [latitude, longitude] coordinates in radians. Your data are plain Cartesian (X, Y) coordinates, so the Euclidean metric is the right choice.
You should replace this line:
tree = BallTree(coords, metric='haversine')
With:
tree = BallTree(coords, metric='euclidean')
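Putting it together, here's a minimal sketch using your sample data (with the radius hard-coded to 100 instead of read from input, just to keep the example self-contained):
import pandas as pd
from sklearn.neighbors import BallTree

df = pd.DataFrame({'id': list('abcde'),
                   'X': [10, 1000, 1010, 5000, 5100],
                   'Y': [10, 1000, 1010, 5000, 5100]})
coords = df[["X", "Y"]]

# Euclidean distance treats X and Y as flat Cartesian coordinates
tree = BallTree(coords, metric='euclidean')

# For each point, count how many points lie within the radius.
# Note: the count includes the query point itself.
counts = tree.query_radius(coords, r=100, count_only=True)
print(counts)      # [1 2 2 1 1] for radius 100
print(counts - 1)  # [0 1 1 0 0] — only the *other* points, per your question
Since every point lies inside its own circle, query_radius counts it too; subtract 1 from each count if, as in your original question, you only want the number of other points inside the selection.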