crate

How to avoid data query latency in Crate


Crate version: 1.1.3. After inserting data into Crate, I send a key to ActiveMQ so that the data can be queried immediately. Unfortunately, the query fails every time. If I sleep the thread for 2000 ms first, it works, so I guess the cluster needs some time to sync the data. Here is my crate.yaml:

psql.enabled: true
psql.port: 33892
prepareThreshold: 0

http.max_content_length: 150mb
indices.store.throttle.max_bytes_per_sec: 150mb
threadpool.bulk.type: fixed
threadpool.bulk.size: 128
threadpool.bulk.queue_size: 5000

cluster.name: EIn_Cluster
node.name: dscn1
index.number_of_replicas: 2
path.conf: /home/hadmin/crate/config
path.data: /home/hadmin/data/crate
path.work: /home/hadmin/data/crate/tmp
path.logs: /home/hadmin/data/crate/logs
path.plugins: /home/hadmin/crate/plugins
blobs.path: /home/hadmin/data/crate/crate_blob_data/disk
network.host: 192.168.13.50
gateway.recover_after_nodes: 3
discovery.zen.minimum_master_nodes: 3
gateway.expected_nodes: 3
discovery.zen.ping.timeout: 10s
discovery.zen.fd.ping_interval: 10s
#transport.tcp.port: 4399
discovery.zen.ping.unicast.hosts:
  - dscn1:4300
  - dscn2:4300
  - dscn3:4300

Is this related to the multi-zone setup, or am I missing some settings? How can I avoid this?

thanks


Solution

  • As Crate is eventually consistent, inserted documents are not immediately available for querying. How quickly new or changed documents become visible depends on several factors, the most important being the table's configured refresh_interval (see https://crate.io/docs/reference/sql/reference/create_table.html#sql-ref-refresh-interval). Be aware that lowering this value reduces ingest performance.

    You can also force a refresh with the REFRESH TABLE command (see https://crate.io/docs/reference/sql/refresh.html). This is the recommended approach when inserts do not happen constantly (e.g. refresh after the insert has completed, before issuing the next statement).
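
    A minimal sketch of both options, written against any Python DB-API style connection. The helper names (insert_then_refresh, set_refresh_interval), the table name "events", and its columns are hypothetical and not from the question; the "?" placeholder style is assumed to match the driver in use. The first helper inserts rows and then issues REFRESH TABLE so that the next query sees them, replacing the 2000 ms sleep. The second lowers refresh_interval on an existing table, with the ingest cost mentioned above.

    ```python
    def insert_then_refresh(conn, table, columns, rows):
        """Insert rows, then force a refresh so they are visible to the next query.

        `table` and `columns` are interpolated into the SQL text, so they must
        come from trusted code, never from user input.
        """
        placeholders = ", ".join("?" for _ in columns)
        insert_sql = "INSERT INTO {} ({}) VALUES ({})".format(
            table, ", ".join(columns), placeholders
        )
        cursor = conn.cursor()
        try:
            # Write the whole batch first ...
            cursor.executemany(insert_sql, rows)
            # ... then refresh once, before any reader is notified.
            cursor.execute("REFRESH TABLE {}".format(table))
        finally:
            cursor.close()


    def set_refresh_interval(conn, table, millis):
        """Alternative: change how often the table is refreshed automatically.

        Lower values make new rows visible sooner but slow down ingest.
        """
        cursor = conn.cursor()
        try:
            cursor.execute(
                "ALTER TABLE {} SET (refresh_interval = {})".format(table, int(millis))
            )
        finally:
            cursor.close()
    ```

    In the scenario from the question, the producer would call insert_then_refresh(conn, "events", ["id", "payload"], rows) and only afterwards send the key to ActiveMQ. Refreshing once per batch rather than once per row keeps the overhead small.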