I have a large number of latitude and longitude values that I would like to bin together in order to display them on a heatmap (ipyleaflet
only seems to allow 2000 or so points in the heatmap and this would also be much more efficient when using big data).
I am actually using vaex
, but a answer in terms of pandas
is also fine.
The pandas
pd.cut
function seems to be helpful in terms of binning, however it produces a categorical column (category
dtype
) which looks like a list of all the values in the bin. Is there some way of changing this to just be an incremental number identifying each bin (thanks jezreal for that part of the answer)? I just need a bin number in order to then groupby
the bin number and average (mean
) on the latitude and longitude columns. I also need a count for the intensity of the heatmap entry.
For example:
dft = pd.DataFrame({
'latitude': [1.5, 0.5, 1.2, 0.9, 3],
'longitude': [3, 0.2, 2, 0.2, 1.1]
})
dft['bin'] = pd.cut(dft['latitude'], bins=3, labels=False).astype(str) + "_" + pd.cut(dft['longitude'], bins=3, labels=False).astype(str)
dft.groupby('bin').agg(['mean', 'count']).unstack()
Almost gives me the answer, but I think I want this output instead:
bin latitude_mean longitude_mean count
0_0 0.7 0.2 2
0_1 1.2 2.0 1
1_2 1.5 3.0 1
2_0 3.0 1.1 1
It would be helpful if the count can be normalized between 1 and 1000.
How can I use pandas
pd.cut
or something else to groupby
the bin in the rows, average on the latitude and longitude and (heatmap intensity) count in the columns?
The pandas pd.cut function seems to be helpful in terms of binning, however it produces a categorical column (category dtype) which looks like a list of all the values in the bin. Is there some way of changing this to just be an incremental number identifying each bin
Yes, use label=False
parameter in cut
:
labels array or False, default None
Specifies the labels for the returned bins. Must be the same length as the resulting bins. If False, returns only integer indicators of the bins.
Last use GroupBy.agg
for aggregation and last normalize count
column:
df = dft.groupby('bin').agg(latitude_mean=('latitude','mean'),
longitude_mean=('longitude','mean'),
count=('latitude','count'))
#https://stackoverflow.com/a/50028155/2901002
a, b = 1, 1000
x, y = df['count'].min(),df['count'].max()
df['count'] = (df['count'] - x) / (y - x) * (b - a) + a
print (df)
latitude_mean longitude_mean count
bin
0_0 0.7 0.2 1000.0
0_1 1.2 2.0 1.0
1_2 1.5 3.0 1.0
2_0 3.0 1.1 1.0