
Approach for associated model searching with Elasticsearch [Ruby, ActiveRecord, elasticsearch-model]


I have a question about choosing the best approach for searching with Elasticsearch.

I have a Ruby on Rails API with ActiveRecord and Elasticsearch (the elasticsearch-model gem).

It's a simple API that returns projects (the Project AR model), for which I set up this mapping:

mapping dynamic: false do
  indexes :created_at, type: 'date'
end

Then I simply run a search against Elasticsearch and return the AR relation directly from the controller. It works perfectly.

Now I am trying to add categories to projects: a category has_many projects, and a project belongs_to a category. I am wondering about two things:

How should I now query for projects from specific categories? Should I reimplement the search to do result = Category.search(...) and return result.jobs, or keep searching projects, but filter by category_id?

How do I combine Category and Project in Elasticsearch so that I can search for projects from one specific category, and from several categories at once? Merge the mappings?

Thanks in advance!


Solution

  • My first thought here is that your use case is very simple. You don't need Elasticsearch at all. You can simplify and use just ActiveRecord: select with SQL and return the records. You don't need any of the features Elasticsearch provides for this use case, so using it just creates more work for yourself.

    However, I'll assume you're developing iteratively and that your usage will get significantly more complex, which would justify the use of Elasticsearch.

    Relational and Non-Relational Data

    ActiveRecord is an ORM for relational data. Typically it sits on top of a relational database powered by Structured Query Language (SQL). It supports relations (associations in Rails parlance) really well.

    Elasticsearch is a non-relational document store that stores information as JSON in an inverted index (see https://www.elastic.co/guide/en/elasticsearch/reference/current/documents-indices.html). This allows for very quick full-text search (amongst other uses). It doesn't support relations between documents very well. It is designed not to relate data! It wants you to skip relations and instead repeat data in every document that needs it, the opposite of the SQL approach. This is called denormalization; more on this below.
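    To make denormalization concrete, here is a minimal sketch in plain Ruby of what a denormalized project document could look like. The Struct stand-ins, the `title` and `name` fields, and the `project_document` helper are all assumptions for illustration, not part of your code. With elasticsearch-model, a hash like this is the kind of thing an `as_indexed_json` override would return.

```ruby
require 'json'

# Hypothetical plain-Ruby stand-ins for the ActiveRecord models.
Category = Struct.new(:id, :name)
Project  = Struct.new(:id, :title, :created_at, :category)

# Build the JSON-ready hash that would be indexed for one project.
# The category's fields are copied into the document (denormalized)
# instead of being looked up through a join at query time.
def project_document(project)
  {
    id: project.id,
    title: project.title,
    created_at: project.created_at,
    category_id: project.category.id,
    category_name: project.category.name
  }
end

category = Category.new(7, 'Open Source')
project  = Project.new(1, 'Search API', '2020-01-15T10:00:00Z', category)

puts JSON.pretty_generate(project_document(project))
```

    The cost of this layout is that renaming a category means reindexing every project that copies its name.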

    These are very different ways of thinking about data storage! That doesn't mean they can't play well together; they can, if used correctly. But each requires a different mindset, and in my opinion it's important to have a good grasp of the fundamentals behind each in order to make sound judgements about how to use them.

    Elasticsearch has great documentation. I recommend you spend a couple hours reading up on it.

    How does Elasticsearch do associations?

    Your question is about a basic has-many association, so how does Elasticsearch handle these? What are the options?

    There are 4 major options: application-side joins, denormalization, nested objects, and parent-child relationships. I think you should denormalize your data here.

    I think you want to return all projects from certain categories. So you should create an index which contains one document per category. Each category document should contain a complete list of its projects. You can then query for a category and return all its projects.
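    The one-document-per-category layout described above can be sketched in plain Ruby as follows. The helper names, the `name` field, and the sample data are assumptions, and the response hash is built by hand in the shape Elasticsearch returns (`hits.hits[]._source`) rather than fetched from a real cluster. The `terms` query also assumes the category `id` field is mapped.

```ruby
require 'json'

# Hypothetical: one document per category, embedding all of its projects.
def category_document(category, projects)
  {
    id: category[:id],
    name: category[:name],
    projects: projects.map { |p| { id: p[:id], created_at: p[:created_at] } }
  }
end

# Query body selecting categories by id. A `terms` query accepts one or
# many values, so the same body serves single and multiple categories.
def categories_query(category_ids)
  { query: { terms: { id: Array(category_ids) } } }
end

# Collect the embedded projects from a search response hash.
def projects_from(response)
  response.dig('hits', 'hits').flat_map { |hit| hit.dig('_source', 'projects') }
end

web = category_document(
  { id: 1, name: 'Web' },
  [{ id: 10, created_at: '2020-01-15' }, { id: 11, created_at: '2020-02-01' }]
)
ops = category_document({ id: 2, name: 'Ops' }, [{ id: 20, created_at: '2020-03-09' }])

# Hand-built stand-in for what a search for categories 1 and 2 might return.
response = {
  'hits' => {
    'hits' => [web, ops].map { |doc| { '_source' => JSON.parse(JSON.generate(doc)) } }
  }
}

p categories_query([1, 2])
p projects_from(response).map { |project| project['id'] }
```

    Note that every change to a project means rewriting its whole category document, which is the usual price of this layout.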

    This is one approach. Your use case is so simple that it feels like overkill to me; you could solve this using any of the 4 major options for Elasticsearch, or ideally by not using Elasticsearch at all.
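    For comparison, here is a rough sketch of the other route raised in the question: keep one document per project and filter on a `category_id` field. The method name is hypothetical, and it assumes `category_id` has been added to the Project mapping and indexed with each project.

```ruby
require 'json'

# Build a search body for the projects index: keep only projects whose
# category_id is in the given list, newest first. Filters skip relevance
# scoring, which fits an exact-match lookup like this one.
def projects_in_categories(category_ids)
  {
    query: { bool: { filter: [{ terms: { category_id: Array(category_ids) } }] } },
    sort: [{ created_at: { order: 'desc' } }]
  }
end

puts JSON.pretty_generate(projects_in_categories([3, 7]))
```

    With elasticsearch-model, a hash like this could be passed to `Project.search(...)`, and calling `.records` on the result should give back ActiveRecord objects, so the controller would change very little.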

    If you provide more detail on your use case, I can say more about which approach would fit.