I am trying to use GeoMesa with Redis. I thought that the Redis data store enables statistics in GeoMesa by default.
My Redis GeoMesa schema:
./geomesa-redis describe-schema -u localhost:6379 -c geomesa -f SignalBuilder
INFO Describing attributes of feature 'SignalBuilder'
geo | Point (Spatio-temporally indexed) (Spatially indexed)
time | Date (Spatio-temporally indexed) (Attribute indexed)
cam | String (Attribute indexed) (Attribute indexed)
imei | String
dir | Double
alt | Double
vlc | Double
sl | Integer
ds | Integer
dir_y | Double
poi_azimuth_x | Double
poi_azimuth_y | Double
User data:
geomesa.attr.splits | 0
geomesa.feature.expiry | time(30 days)
geomesa.id.splits | 0
geomesa.index.dtg | time
geomesa.indices | z3:7:3:geo:time,z2:5:3:geo,attr:8:3:time,attr:8:3:cam,attr:8:3:cam:time
geomesa.stats.enable | true
geomesa.table.partition | time
geomesa.z.splits | 0
geomesa.z3.interval | week
From the docs: https://www.geomesa.org/documentation/stable/user/datastores/query_planning.html#stats-collected
Stat generation can be enabled or disabled through the simple feature type user data using the key geomesa.stats.enable
Cached statistics, and thus cost-based query planning, are currently only implemented for the Accumulo and Redis data stores.
* Total count
* Min/max (bounds) for the default geometry, default date and any indexed attributes
* Histograms for the default geometry, default date and any indexed attributes
* Frequencies for any indexed attributes...
Why does the response time increase as the amount of data increases?
./geomesa-redis export -u localhost:6379 -c geomesa -f SignalBuilder -q "cam like '%' and bbox(geo,38,56,39,57)" --hints STATS_STRING='Enumeration(cam)'
INFO Running export - please wait...
id,stats:String,*geom:Geometry
stat,"{""5798a065-d51e-47a1-b04b-ab48df9f1324"":203215}",POINT (0 0)
INFO Feature export complete to standard out for 1 features in 2056ms
Next request, after more data was added:
./geomesa-redis export -u localhost:6379 -c geomesa -f SignalBuilder -q "cam like '%' and bbox(geo,38,56,39,57)" --hints STATS_STRING='Enumeration(cam)'
INFO Running export - please wait...
id,stats:String,*geom:Geometry
stat,"{""5798a065-d51e-47a1-b04b-ab48df9f1324"":595984}",POINT (0 0)
INFO Feature export complete to standard out for 1 features in 3418ms
How can I verify that statistics are being collected and saved, and that they are used when returning stats through hints such as STATS_STRING='MinMax(time)' or STATS_STRING='Enumeration(cam)'?
And how do I use sampling with GeoTools? I tried the following:
geomesa-cassandra export -P 10.200.217.24:9042 -u cassandra -p cassandra \
-k geomesa -c gsm_events -f SignalBuilder \
-q "cam like '%' and time DURING 2021-12-27T16:50:38.004Z/2022-01-26T16:50:38.004Z" \
--hints SAMPLE_BY='cam';SAMPLING=0.000564
but it does not work. Thank you for any answer.
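One thing that stands out in the sampling command, independent of GeoMesa: the `;` between the two hints is not quoted, so the shell treats it as the end of the command. The export then only receives `SAMPLE_BY=cam`, and `SAMPLING=0.000564` is run as a separate shell variable assignment. A minimal sketch of the effect, using a hypothetical `show_args` function as a stand-in for the real CLI:

```shell
# Hypothetical stand-in for the real CLI: prints each argument it receives.
show_args() { printf '[%s]\n' "$@"; }

# Unquoted ';' ends the command, so only the first hint is passed along;
# SAMPLING=0.000564 becomes a plain shell variable assignment.
unquoted=$(show_args --hints SAMPLE_BY='cam';SAMPLING=0.000564)

# Quoting the whole value keeps both hints in a single argument.
quoted=$(show_args --hints "SAMPLE_BY=cam;SAMPLING=0.000564")

echo "unquoted:"; echo "$unquoted"
echo "quoted:";   echo "$quoted"
```

If the hints are meant to be passed together, quoting the whole value, e.g. `--hints "SAMPLE_BY=cam;SAMPLING=0.000564"`, keeps them in one argument. That the tool expects this exact format is an assumption, and the quoting may not be the only reason the command fails.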
When you run an export with a query hint for stats, GeoMesa will always run a query and compute the stat from the matching features, which is why the response time grows with the amount of data. If you want to use the cached statistics, use the stats-* commands instead. In code, you'd use the stats method, which all GeoMesa data stores implement.
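As a rough sketch, the cached statistics could be read with commands along these lines. The connection parameters are taken from the question. The exact sub-commands and flags (`stats-count`, `stats-bounds`, `-a`, `--no-cache`) are assumptions based on the standard GeoMesa command-line tools and may differ between versions, so check `./geomesa-redis help` for your install.

```shell
# Count taken from the cached statistics (assumed not to scan the data)
./geomesa-redis stats-count -u localhost:6379 -c geomesa -f SignalBuilder

# Cached min/max bounds for the 'time' attribute
./geomesa-redis stats-bounds -u localhost:6379 -c geomesa -f SignalBuilder -a time

# For comparison: --no-cache is assumed to force an exact result by running a query
./geomesa-redis stats-count -u localhost:6379 -c geomesa -f SignalBuilder --no-cache
```

If the cached variants return values, and do so without slowing down as more data is written, that would suggest the statistics are being collected and stored.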