[SOLVED] Indexing Cassandra using Elassandra

I'm trying to run Elassandra as a standalone instance locally. Using bin/cqlsh, I've created a keyspace and added a test table to it. I want to create an index on this table so I can run Elasticsearch queries, but I'm not sure how to go about it. I found this information, but it's just one example and doesn't really go through the options or what they mean. Can anyone point me in the right direction for indexing my table? I've also gone through the Elasticsearch documentation with no luck. Thanks in advance.

Solution

Yes, I admit the Elassandra documentation is far from perfect and is hard for newcomers.

Let's create a keyspace and a table, and insert some rows:

CREATE KEYSPACE ks WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 1};
CREATE TABLE ks.t (id int PRIMARY KEY, name text);
INSERT INTO ks.t (id, name) VALUES (1, 'foo');
INSERT INTO ks.t (id, name) VALUES (2, 'bar');

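NetworkTopologyStrategy is mandatory; SimpleStrategy is not supported. The datacenter name used in the replication map (DC1 here) has to match the datacenter name of your node.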

Mapping every CQL type to an ES type by hand is tedious, so there is a discover option that generates the mapping:

curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/myindex' -d '{
    "settings": { "keyspace":"ks" },
    "mappings": {
        "t" : {
            "discover":".*"
        }
    }
}'

This creates an index named myindex with a type named t (the Cassandra table).

The keyspace name must be specified in settings.keyspace here, because the index name (myindex) and the keyspace name (ks) are different.

The discover field contains a regex. Every Cassandra column whose name matches this regex is indexed automatically, with type inference.
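To get a feel for what a given regex would pick up, here is a small sketch that previews the match against the columns of ks.t. The helper discover_preview is hypothetical and only runs grep locally. It assumes the regex has to match the whole column name, and grep -E is not the same regex dialect as the Java one Elassandra uses, so treat it as an approximation.

```shell
# Columns of ks.t from the example above.
columns="id name"

# Hypothetical helper: print the column names that a discover regex would select.
# grep -x requires the whole name to match; that Elassandra applies the regex
# in the same way is an assumption.
discover_preview() {
    printf '%s\n' $columns | grep -Ex -- "$1"
}

discover_preview '.*'     # every column
discover_preview 'n.*'    # only columns whose name starts with n
```

With '.*' as in the curl command above, both columns are selected.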

Let's look at the generated mapping:

{
  "myindex": {
    ...
    "mappings": {
      "t": {
        "properties": {
          "id": {
            "type": "integer",
            "cql_collection": "singleton",
            "cql_partition_key": true,
            "cql_primary_key_order": 0
          },
          "name": {
            "type": "keyword",
            "cql_collection": "singleton"
          }
        }
      }
    },
    ...
  }
}
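The mapping shown above can be fetched with the standard Elasticsearch get-mapping endpoint. A minimal sketch, assuming the node listens on http://localhost:9200 as in the curl command earlier; show_mapping and ES_URL are names made up for this example.

```shell
# Assumed base URL, matching the curl example above.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: print the mapping of an index as pretty-printed JSON.
show_mapping() {
    curl -s -XGET "$ES_URL/$1/_mapping?pretty"
}
```

Calling show_mapping myindex prints the mapping of the index created above.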

There are several special cql_* options here.

For cql_collection, singleton means that the index field is backed by a Cassandra scalar column, neither a list nor a set. This option is mandatory because Elasticsearch fields are multi-valued.

cql_partition_key and cql_primary_key_order tell the index which columns to use to build the _id field.
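Once the index exists, you can query it like any other Elasticsearch index. The sketch below is only an illustration: search_by_name, get_by_id and ES_URL are hypothetical names, the URL assumes a local node on port 9200, and the /myindex/t/<id> path assumes an Elasticsearch version that still exposes mapping types in the URL, as the mapping above suggests.

```shell
# Assumed base URL, matching the curl example above.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: term query on the keyword field "name" of myindex.
# The value is inserted into the JSON body as-is, without escaping.
search_by_name() {
    curl -s -XPOST -H 'Content-Type: application/json' \
        "$ES_URL/myindex/_search?pretty" \
        -d "{\"query\": {\"term\": {\"name\": \"$1\"}}}"
}

# Hypothetical helper: fetch a single document by its _id, which is built
# from the Cassandra primary key (the id column for ks.t).
get_by_id() {
    curl -s -XGET "$ES_URL/myindex/t/$1?pretty"
}
```

For the rows inserted earlier, you would run search_by_name foo or get_by_id 1.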