multiple-columnssecondary-indexestarantoolnosql

how to create spaces in tarantool searchable on multiple criteria


Our use case is as below

We will be getting articles from a source.
Each article has certain metadata.The metadata has a field categories which is a list of category names(mobiles,laptops etc) that the article might fall into.Similarly the metadata has a keywords field with a list of keywords that the article might fall into.

Below are the query's that we might need

a) Get all articles for a given category and are from a certain source,location.Source and location are metadata fields of the article.
b) Get all articles for a given category,source,location. c) Get all articles for a given category,location.
d) Get all articles for a given keyword,category,location.

Below is how we are designing the spaces.So upon getting each article,we insert the article into articles space and also insert into category_articles and keyword_articles space based on which categories and keywords this article falls into.

Below are few queries I have on how to efficiently create the secondary indexes for the given query patterns.

a)For a secondary key {category,source,location},if we miss certain parts in the input value ,then will the rows match if other parts of key are equal ? https://tarantool.org/doc/singlehtml.html#lua-data.index_object.parts
I know that we cant specify 1st and 3rd part in input i.e {'category','location'} for the key ? or shud we have a nil at second position.

b)Can we model the schema in a better way so as to efficiently query for the given patterns.

articles

urlhash
title
content
url
date

primary key : {urlhash}

--
category_articles

id
category
source
location
{urlhash,..}

primary key : {id}
secondary key : {category,source,location}

--
keyword_articles

id
keyword
category
source
location
{urlhash,..}

primary key : {id}
secondary key : {keyword,category,source,location}

Updates:
Query :
box.space.category_articles.index.secondary_key:select{'bollywoord',nil,'india'}
Ouput :

[1, 'bollywoord', 'ht', 'india', ['url1', 'url2']]
[3, 'bollywoord', 'ht', 'us', ['url3', 'url4']]


Solution

  • You need to include all fields you're going to use for search into an index. You can define multiple secondary indexes. E.g. you can define an index on keyword, category, source and another one on location, category, source. These are all fine combinations in addition to the primary key. You can not search in Tarantool on an arbitrary field combination, unless you build an index on it.