elasticsearch, cassandra, elassandra

Elasticsearch query from JSON formatted string?


I am working with Elasticsearch and I need to query a value inside a document field that is stored as a JSON-formatted string.

Is there any way to query Elasticsearch on a field that is stored as a JSON-formatted string?

Please see my use case:

I am saving my application data in Cassandra and replicating it to Elasticsearch (I am using the bundled Elassandra distribution for this). In Cassandra, I have a field of type list<text> that holds a JSON array with nested JSON objects.

When I map the Cassandra table to Elasticsearch (as suggested by the Elassandra documentation), the Cassandra field name becomes the JSON key in Elasticsearch and the entire JSON array is treated as a JSON-formatted string.

Now I need to query on a key inside the JSON that is stored as a JSON string in Elasticsearch.

Here is a sample of my data as stored in Elasticsearch:

    {
        "status": "{\"visibilityStatus\": true, \"deleteStatus\": true}"
    }

Here status is the Cassandra field name and the rest is the value of one record.

Now I need to find the records with deleteStatus=true. Any clue, please?

Thanks in advance


Solution

  • You should store your status object as an Elasticsearch object backed by a UDT (a Cassandra User Defined Type); you will then be able to search it with an Elasticsearch nested query.

    You can create your Cassandra schema with a UDT for the status column and auto-discover the mapping, or specify the Elasticsearch mapping to generate the CQL schema. The optional cql_udt_name setting lets you name the UDT, as shown below:

    XContentBuilder mapping = XContentFactory.jsonBuilder()
        .startObject()
            .startObject("properties")
                // Partition key column (first component of the primary key).
                .startObject("id")
                    .field("type", "keyword")
                    .field("cql_collection", "singleton")
                    .field("cql_primary_key_order", 0)
                    .field("cql_partition_key", true)
                .endObject()
                .startObject("event_timestamp")
                    .field("type", "date")
                    .field("format", "strict_date_hour_minute_second||epoch_millis")
                    .field("cql_collection", "singleton")
                .endObject()
                // Nested object backed by the Cassandra UDT named event_info_udt.
                .startObject("event_info")
                    .field("type", "nested")
                    .field("cql_collection", "singleton")
                    .field("cql_udt_name", "event_info_udt")
                    .field("dynamic", "false")
                    // Fields of the UDT.
                    .startObject("properties")
                        .startObject("event_timestamp")
                            .field("type", "date")
                            .field("format", "strict_date_hour_minute_second||epoch_millis")
                            .field("cql_collection", "singleton")
                        .endObject()
                    .endObject()
                .endObject()
            .endObject()
        .endObject();
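
Once status is mapped as a nested object, the search for deleteStatus=true could look roughly like the sketch below. It only builds the JSON body of a nested term query using plain Java strings. The class NestedQueryExample and the method nestedTermQuery are hypothetical names used for illustration, and the sketch assumes that status is mapped with type nested and contains a boolean deleteStatus field, which the mapping above does not show.

```java
// Hypothetical helper: builds the request body for an Elasticsearch nested query.
// Assumes "status" is mapped as a nested object with a boolean "deleteStatus" field.
final class NestedQueryExample {

    // Returns the JSON body of a nested term query on <path>.<field>.
    // path and field are assumed to be plain identifiers that need no JSON escaping.
    static String nestedTermQuery(String path, String field, boolean value) {
        return "{\"query\":{\"nested\":{"
                + "\"path\":\"" + path + "\","
                + "\"query\":{\"term\":{\"" + path + "." + field + "\":" + value + "}}"
                + "}}}";
    }

    public static void main(String[] args) {
        // Print the body that would be sent to the index's _search endpoint.
        System.out.println(nestedTermQuery("status", "deleteStatus", true));
    }
}
```

The resulting body can be sent to the _search endpoint of the index with any HTTP client. Inside a nested query the field is referenced by its full path (status.deleteStatus), not by the bare field name.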