cassandrageomesa

Geomesa: Radius query DWITHIN(geo, POINT (80.57944937597278 69.63928679990498), 130.0, meters) is very long


I have cassandra with geomesa, in there I have next schema

 ~  bin/geomesa-cassandra_2.11-3.3.0/bin/geomesa-cassandra describe-schema -P localhost:9042 -u cassandra -p cassandra -k geomesa -c gsm_events -f SignalBuilder 
INFO  Describing attributes of feature 'SignalBuilder'
geo           | Point   (Spatio-temporally indexed)
time          | Date    (Spatio-temporally indexed) (Attribute indexed)
cam           | String  (Attribute indexed) (Attribute indexed)
imei          | String  (Attribute indexed)
dir           | Double  
alt           | Double  
vlc           | Double  
sl            | Integer 
ds            | Integer 
dir_y         | Double  
poi_azimuth_x | Double  
poi_azimuth_y | Double  

User data:
  geomesa.attr.splits     | 4
  geomesa.feature.expiry  | time(30 days)
  geomesa.index.dtg       | time
  geomesa.indices         | z3:7:3:geo:time,attr:8:3:time,attr:8:3:cam,attr:8:3:cam:time,attr:8:3:imei
  geomesa.stats.enable    | true
  geomesa.table.partition | time
  geomesa.z.splits        | 4
  geomesa.z3.interval     | week

when I do the next request, it is really fast:

 ~  date                                                                                                                                                                                                                                                                                      
Tues 11 Jan 2022 18:24:24 MSK
 ~  bin/geomesa-cassandra_2.11-3.3.0/bin/geomesa-cassandra export -P 10.200.217.24:9042 -u cassandra -p cassandra -k geomesa -c gsm_events -f SignalBuilder -q "cam='5798a065-d51e-47a1-b04b-ab48df9f1324'" -m 10                                                                             
INFO  Running export - please wait...
id,*geo:Point,time:Date,cam:String,imei:String,dir:Double,alt:Double,vlc:Double,sl:Integer,ds:Integer,dir_y:Double,poi_azimuth_x:Double,poi_azimuth_y:Double
5798a065-d51e-47a1-b04b-ab48df9f1324-1638361381456,POINT (37.715528122 55.713859037),2021-12-01T12:23:01.456Z,5798a065-d51e-47a1-b04b-ab48df9f1324,1234-2222-3333-4444-55555,10.8,0.0,60.0,5,1,0.0,0.0,0.0
5798a065-d51e-47a1-b04b-ab48df9f1324-1640706820000,POINT (31.970000000000002 56.88),2021-12-28T15:53:40.000Z,5798a065-d51e-47a1-b04b-ab48df9f1324,1234-2222-3333-4444-55555,10.8,0.0,1.23,1,1,0.0,0.0,0.0
5798a065-d51e-47a1-b04b-ab48df9f1324-1640797130000,POINT (31.39951572344312 68.98547135697989),2021-12-29T16:58:50.000Z,5798a065-d51e-47a1-b04b-ab48df9f1324,1234-2222-3333-4444-55555,10.8,,1.23,5,1,0.0,,
5798a065-d51e-47a1-b04b-ab48df9f1324-1640797434000,POINT (31.399484359174057 69.07515114832239),2021-12-29T17:03:54.000Z,5798a065-d51e-47a1-b04b-ab48df9f1324,1234-2222-3333-4444-55555,10.8,,1.23,5,1,0.0,,
5798a065-d51e-47a1-b04b-ab48df9f1324-1640797627000,POINT (31.39948363272515 69.09555283117254),2021-12-29T17:07:07.000Z,5798a065-d51e-47a1-b04b-ab48df9f1324,1234-2222-3333-4444-55555,10.8,,1.23,5,1,0.0,,
5798a065-d51e-47a1-b04b-ab48df9f1324-1640797784000,POINT (31.39948340746396 69.10451379934774),2021-12-29T17:09:44.000Z,5798a065-d51e-47a1-b04b-ab48df9f1324,1234-2222-3333-4444-55555,10.8,,1.23,5,1,0.0,,
5798a065-d51e-47a1-b04b-ab48df9f1324-1640797993000,POINT (31.39948291171159 69.11711406549725),2021-12-29T17:13:13.000Z,5798a065-d51e-47a1-b04b-ab48df9f1324,1234-2222-3333-4444-55555,10.8,,1.23,5,1,0.0,,
5798a065-d51e-47a1-b04b-ab48df9f1324-1640798006000,POINT (31.399482907612764 69.11813926100234),2021-12-29T17:13:26.000Z,5798a065-d51e-47a1-b04b-ab48df9f1324,1234-2222-3333-4444-55555,10.8,,1.23,5,1,0.0,,
5798a065-d51e-47a1-b04b-ab48df9f1324-1640798020000,POINT (31.39948286587965 69.12141054010591),2021-12-29T17:13:40.000Z,5798a065-d51e-47a1-b04b-ab48df9f1324,1234-2222-3333-4444-55555,10.8,,1.23,5,1,0.0,,
5798a065-d51e-47a1-b04b-ab48df9f1324-1640798414000,POINT (31.399453814973754 69.20771971657824),2021-12-29T17:20:14.000Z,5798a065-d51e-47a1-b04b-ab48df9f1324,1234-2222-3333-4444-55555,10.8,,1.23,5,1,0.0,,
 ~  date                                                                                                                                                                                                         
Tues 11 Jan 2022 18:24:31 MSK

But when I make next request by radius DWITHIN(geo, POINT (80.57944937597278 69.63928679990498), 130.0, meters) It takes more than 4 minutes

✘  ~  date                     
Tues 11 Jan 2022 17:42:10 MSK
 ~  bin/geomesa-cassandra_2.11-3.3.0/bin/geomesa-cassandra export -P 10.200.217.24:9042 -u cassandra -p cassandra -k geomesa -c gsm_events -f SignalBuilder -q "DWITHIN(geo, POINT (80.57944937597278 69.63928679990498), 130.0, meters)"
INFO  Running export - please wait...
id,*geo:Point,time:Date,cam:String,imei:String,dir:Double,alt:Double,vlc:Double,sl:Integer,ds:Integer,dir_y:Double,poi_azimuth_x:Double,poi_azimuth_y:Double
5798a065-d51e-47a1-b04b-ab48df9f1324-1641900233000,POINT (80.57844937597278 69.63828679990498),2022-01-11T11:23:53.000Z,5798a065-d51e-47a1-b04b-ab48df9f1324,1234-2222-3333-4444-55555,10.8,,1.23,5,1,0.0,,
 ~  date                                                                                                                                                                                                                                 
Tues 11 Jan 2022 17:46:49 MSK

Explain shows it uses Z3Index:

 ✘  ~  bin/geomesa-cassandra_2.11-3.3.0/bin/geomesa-cassandra explain -P 10.200.217.24:9042 -u cassandra -p cassandra -k geomesa -c gsm_events -f SignalBuilder -q "DWITHIN(geo, POINT (80.57944937597278 69.63928679990498), 130.0, meters)"
Planning 'SignalBuilder' DWITHIN(geo, POINT (80.57944937597279 69.63928679990498), 130.0, meters)
  Original filter: DWITHIN(geo, POINT (80.57944937597279 69.63928679990498), 130.0, meters)
  Hints: bin[false] arrow[false] density[false] stats[false] sampling[none]
  Sort: none
  Transforms: none
  Max features: none
  Strategy selection:
    Query processing took 19ms for 1 options
    Filter plan: FilterPlan[Z3Index(geo,time)[INCLUDE][DWITHIN(geo, POINT (80.57944937597279 69.63928679990498), 130.0, meters)](Infinity)]
    Strategy selection took 3ms for 1 options
  Strategy 1 of 1: Z3Index(geo,time)
    Strategy filter: Z3Index(geo,time)[INCLUDE][DWITHIN(geo, POINT (80.57944937597279 69.63928679990498), 130.0, meters)](Infinity)
    Plan: org.locationtech.geomesa.cassandra.data.StatementPlan
      Tables: gsm_events_SignalBuilder_z3_geo_time_v7_02576, gsm_events_SignalBuilder_z3_geo_time_v7_02579, gsm_events_SignalBuilder_z3_geo_time_v7_02708, gsm_events_SignalBuilder_z3_geo_time_v7_02711, gsm_events_SignalBuilder_z3_geo_time_v7_02712, gsm_events_SignalBuilder_z3_geo_time_v7_02713, gsm_events_SignalBuilder_z3_geo_time_v7_02714, gsm_events_SignalBuilder_z3_geo_time_v7_02715
      Ranges (8): SELECT * FROM geomesa.gsm_events_SignalBuilder_z3_geo_time_v7_02576;, SELECT * FROM geomesa.gsm_events_SignalBuilder_z3_geo_time_v7_02579;, SELECT * FROM geomesa.gsm_events_SignalBuilder_z3_geo_time_v7_02708;, SELECT * FROM geomesa.gsm_events_SignalBuilder_z3_geo_time_v7_02711;, SELECT * FROM geomesa.gsm_events_SignalBuilder_z3_geo_time_v7_02712;
      Client-side filter: DWITHIN(geo, POINT (80.57944937597279 69.63928679990498), 130.0, meters)
      Reduce: class:LocalTransformReducer, state:{name=SignalBuilder, tnam=, tsft=, tdef=, hint=RETURN_SFT,"SignalBuilder,""*geo:Point,time:Date,cam:String,imei:String,dir:Double,alt:Double,vlc:Double,sl:Integer,ds:Integer,dir_y:Double,poi_azimuth_x:Double,poi_azimuth_y:Double;geomesa.stats.enable='true',geomesa.z.splits='4',geomesa.feature.expiry='time(30 days)',geomesa.table.partition='time',geomesa.index.dtg='time',geomesa.indices='z3:7:3:geo:time,attr:8:3:time,attr:8:3:cam,attr:8:3:cam:time',geomesa.attr.splits='4',geomesa.z3.interval='week'""", spec=*geo:Point,time:Date,cam:String,imei:String,dir:Double,alt:Double,vlc:Double,sl:Integer,ds:Integer,dir_y:Double,poi_azimuth_x:Double,poi_azimuth_y:Double;geomesa.stats.enable='true',geomesa.z.splits='4',geomesa.feature.expiry='time(30 days)',geomesa.table.partition='time',geomesa.index.dtg='time',geomesa.indices='z3:7:3:geo:time,attr:8:3:time,attr:8:3:cam,attr:8:3:cam:time',geomesa.attr.splits='4',geomesa.z3.interval='week', filt=DWITHIN(geo, POINT (80.57944937597279 69.63928679990498), 130.0, meters)}
    Plan creation took 87ms
  Query planning took 263ms

Why this last query is so long? And is there any way to make faster the request?

P.s. All db has 4_754_928 records

P.s.s Found that any query by geo figure takes about five minutes.

BBox has the same behavior, and takes something like 5 minutes, on cassandra cluster

 ~  bin/geomesa-cassandra_2.11-3.3.0/bin/geomesa-cassandra export -P 10.200.217.24:9042 -u cassandra -p cassandra -k geomesa -c gsm_events -f SignalBuilder -q "BBOX(geo, 79.01240030621715, 68.63828679990498, 81.01240030621715, 70.63828679990498)"
INFO  Running export - please wait...
id,*geo:Point,time:Date,cam:String,imei:String,dir:Double,alt:Double,vlc:Double,sl:Integer,ds:Integer,dir_y:Double,poi_azimuth_x:Double,poi_azimuth_y:Double
5798a065-d51e-47a1-b04b-ab48df9f1324-1641895055000,POINT (80.01240030621715 69.63828679990498),2022-01-11T09:57:35.000Z,5798a065-d51e-47a1-b04b-ab48df9f1324,1234-2222-3333-4444-55555,10.8,,1.23,5,1,0.0,,
5798a065-d51e-47a1-b04b-ab48df9f1324-1641895317000,POINT (80.13819338413361 69.63828679990498),2022-01-11T10:01:57.000Z,5798a065-d51e-47a1-b04b-ab48df9f1324,1234-2222-3333-4444-55555,10.8,,1.23,5,1,0.0,,
5798a065-d51e-47a1-b04b-ab48df9f1324-1641900233000,POINT (80.57844937597278 69.63828679990498),2022-01-11T11:23:53.000Z,5798a065-d51e-47a1-b04b-ab48df9f1324,1234-2222-3333-4444-55555,10.8,,1.23,5,1,0.0,,

P.S.S.S For geom like BBOX I needed to have z2 index.

Why when I have z3 (POINT(lat,lon),time) index this query is long too

bin/geomesa-cassandra_2.11-3.3.0/bin/geomesa-cassandra export -P 10.200.217.24:9042 -u cassandra -p cassandra -k geomesa -c gsm_events -f SignalBuilder -q "BBOX(geo, 79.0, 30.0, 81.0, 40.0) AND time AFTER 2021-09-23T13:05:00.0Z"
^C%                                                                              ✘  ~  bin/geomesa-cassandra_2.11-3.3.0/bin/geomesa-cassandra explain -P 10.200.217.24:9042 -u cassandra -p cassandra -k geomesa -c gsm_events -f SignalBuilder -q "BBOX(geo, 79.0, 30.0, 81.0, 40.0) AND time AFTER 2021-09-23T13:05:00.0Z"
Planning 'SignalBuilder' BBOX(geo, 79.0,30.0,81.0,40.0) AND time AFTER 2021-09-23T13:05:00+00:00
  Original filter: BBOX(geo, 79.0,30.0,81.0,40.0) AND time AFTER 2021-09-23T13:05:00+00:00
  Hints: bin[false] arrow[false] density[false] stats[false] sampling[none]
  Sort: none
  Transforms: none
  Max features: none
  Strategy selection:
    Query processing took 35ms for 2 options
    Filter plan selected: FilterPlan[AttributeIndex(time)[time AFTER 2021-09-23T13:05:00+00:00][BBOX(geo, 79.0,30.0,81.0,40.0)](2.5)] (Cost 250 in 0ms)
    Filter plans not selected: FilterPlan[Z3Index(geo,time)[BBOX(geo, 79.0,30.0,81.0,40.0) AND time AFTER 2021-09-23T13:05:00+00:00][None](3.0)] (Cost 300 in 0ms)
    Strategy selection took 12ms for 2 options
  Strategy 1 of 1: AttributeIndex(time)
    Strategy filter: AttributeIndex(time)[time AFTER 2021-09-23T13:05:00+00:00][BBOX(geo, 79.0,30.0,81.0,40.0)](2.5)
    Plan: org.locationtech.geomesa.cassandra.data.StatementPlan
      Tables: gsm_events_SignalBuilder_attr_time_v8_02576, gsm_events_SignalBuilder_attr_time_v8_02579, gsm_events_SignalBuilder_attr_time_v8_02708, gsm_events_SignalBuilder_attr_time_v8_02711, gsm_events_SignalBuilder_attr_time_v8_02712, gsm_events_SignalBuilder_attr_time_v8_02713, gsm_events_SignalBuilder_attr_time_v8_02714, gsm_events_SignalBuilder_attr_time_v8_02715
      Ranges (32): SELECT * FROM geomesa.gsm_events_SignalBuilder_attr_time_v8_02576 WHERE shard=0 AND attrVal>'8000017c12c25c60';, SELECT * FROM geomesa.gsm_events_SignalBuilder_attr_time_v8_02576 WHERE shard=1 AND attrVal>'8000017c12c25c60';, SELECT * FROM geomesa.gsm_events_SignalBuilder_attr_time_v8_02576 WHERE shard=2 AND attrVal>'8000017c12c25c60';, SELECT * FROM geomesa.gsm_events_SignalBuilder_attr_time_v8_02576 WHERE shard=3 AND attrVal>'8000017c12c25c60';, SELECT * FROM geomesa.gsm_events_SignalBuilder_attr_time_v8_02579 WHERE shard=0 AND attrVal>'8000017c12c25c60';
      Client-side filter: BBOX(geo, 79.0,30.0,81.0,40.0)
      Reduce: class:LocalTransformReducer, state:{name=SignalBuilder, tnam=, tsft=, tdef=, hint=RETURN_SFT,"SignalBuilder,""*geo:Point,time:Date,cam:String,imei:String,dir:Double,alt:Double,vlc:Double,sl:Integer,ds:Integer,dir_y:Double,poi_azimuth_x:Double,poi_azimuth_y:Double;geomesa.stats.enable='true',geomesa.z.splits='4',geomesa.feature.expiry='time(30 days)',geomesa.table.partition='time',geomesa.index.dtg='time',geomesa.indices='z3:7:3:geo:time,attr:8:3:time,attr:8:3:cam,attr:8:3:cam:time',geomesa.attr.splits='4',geomesa.z3.interval='week'""", spec=*geo:Point,time:Date,cam:String,imei:String,dir:Double,alt:Double,vlc:Double,sl:Integer,ds:Integer,dir_y:Double,poi_azimuth_x:Double,poi_azimuth_y:Double;geomesa.stats.enable='true',geomesa.z.splits='4',geomesa.feature.expiry='time(30 days)',geomesa.table.partition='time',geomesa.index.dtg='time',geomesa.indices='z3:7:3:geo:time,attr:8:3:time,attr:8:3:cam,attr:8:3:cam:time',geomesa.attr.splits='4',geomesa.z3.interval='week', filt=BBOX(geo, 79.0,30.0,81.0,40.0)}
    Plan creation took 9061ms
  Query planning took 9324ms

It is return nothing, but takes about 5 minutes

~  bin/geomesa-cassandra_2.11-3.3.0/bin/geomesa-cassandra export -P 10.200.217.24:9042 -u cassandra -p cassandra -k geomesa -c gsm_events -f SignalBuilder -q "BBOX(geo, 79.0, 30.0, 81.0, 40.0) AND time AFTER 2021-09-23T13:05:00.0Z";                                    
INFO  Running export - please wait...
id,*geo:Point,time:Date,cam:String,imei:String,dir:Double,alt:Double,vlc:Double,sl:Integer,ds:Integer,dir_y:Double,poi_azimuth_x:Double,poi_azimuth_y:Double

Thank you!


Solution

  • The query is slow because it's doing a full scan of all your data. If you look at the explain output, you'll see:

    Ranges (8): SELECT * FROM geomesa.gsm_events_SignalBuilder_z3_geo_time_v7_02576;...
    Client-side filter: DWITHIN(geo, POINT (80.57944937597279 69.63928679990498), 130.0, meters)
    

    If you want to query just by spatial predicates, you should re-enable the z2 index, which is designed for this use case. See Index Overview and Customizing Index Creation in the GeoMesa documentation for details.