Both Elasticsearch and Pinot use Apache Lucene internally. In what ways do they differ in their indexing strategies?
Apache Pinot and Elasticsearch solve distinct problems. (Pinot uses Lucene only for its optional text index; its core analytical indexes are its own.)
Elasticsearch is a search engine used for full-text search, fuzzy queries, auto-completion of search terms, and similar tasks. It achieves this with an inverted index. A conventional forward index stores each document as the key and its keywords as the value. To answer "which documents contain this keyword?", every document has to be scanned, so query latency grows with the size of the corpus. An inverted index flips this around: each keyword is the key and the IDs of the documents containing it are the value. A query only needs to look up its search keywords, so latency stays low. This is why Elasticsearch uses inverted indices for its core purpose, which is 'search'.
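To make the forward vs. inverted contrast concrete, here is a minimal sketch in plain Python. It assumes naive lowercase whitespace tokenization (Lucene's analyzers are far more elaborate) and uses hypothetical helper names; it only illustrates the data-structure difference, not how Lucene stores postings.

```python
from collections import defaultdict

def build_forward_index(docs):
    # Forward index: document id -> list of terms in that document
    return {doc_id: text.lower().split() for doc_id, text in docs.items()}

def build_inverted_index(docs):
    # Inverted index: term -> sorted list of ids of documents containing it
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search_forward(forward, term):
    # Must scan every document's term list: cost grows with the corpus size
    return sorted(doc_id for doc_id, terms in forward.items() if term in terms)

def search_inverted(inverted, term):
    # One dictionary lookup: cost depends on the size of the postings list only
    return inverted.get(term, [])

docs = {
    1: "Pinot serves real-time analytics",
    2: "Elasticsearch serves full-text search",
    3: "Kafka feeds both Pinot and Elasticsearch",
}
forward = build_forward_index(docs)
inverted = build_inverted_index(docs)
print(search_forward(forward, "pinot"))    # [1, 3]
print(search_inverted(inverted, "pinot"))  # [1, 3]
```

Both searches return the same answer; the difference is that the forward version touches every document, while the inverted version jumps straight to the matching IDs.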
Apache Pinot was not built for 'search'; it was built for real-time analytics. Its signature index is the Star-Tree index, which stores pre-aggregated values for combinations of the configured dimensions, where a "star" node stands for "all values" of a dimension. How much gets pre-aggregated is configurable, so it does not have to materialize every combination. In other words, Pinot is optimized for aggregate derivations/reductions over the data (sums, counts, and so on) rather than for retrieving individual records, and it uses these pre-aggregated values to serve low-latency, real-time analytics.
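The idea of pre-aggregating over dimension combinations can be sketched as a small data cube. This is a deliberately simplified illustration, not Pinot's star-tree implementation: it materializes every combination of "specific value or `*`" for each dimension, whereas a real star-tree prunes this space using its configured split order and leaf-size threshold. The function name `build_cube` and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import product

STAR = "*"  # stands for "any value" of a dimension, like a star-tree's star node

def build_cube(rows, dimensions, metric):
    # Pre-aggregate SUM(metric) for every combination of (value or STAR) per dimension.
    # Simplification: a real star-tree does not materialize all of these combinations.
    cube = defaultdict(int)
    for row in rows:
        choices = [(row[d], STAR) for d in dimensions]
        for key in product(*choices):
            cube[key] += row[metric]
    return dict(cube)

rows = [
    {"user": "alice", "country": "US", "views": 3},
    {"user": "alice", "country": "DE", "views": 2},
    {"user": "bob", "country": "US", "views": 5},
]
cube = build_cube(rows, ["user", "country"], "views")
print(cube[("alice", STAR)])  # 5  (all of alice's views, any country)
print(cube[(STAR, "US")])     # 8  (all US views, any user)
print(cube[(STAR, STAR)])     # 10 (grand total)
```

A per-user query such as "total views for alice" becomes a single lookup into precomputed values instead of a scan over raw rows, which is the property that makes this approach attractive for per-user dashboards.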
A key use case for Apache Pinot is computing real-time, per-user analytics and rendering live, user-facing dashboards. Elasticsearch can also power real-time dashboards through Kibana, but its inverted index is designed for finding documents rather than pre-aggregating metrics. Serving per-user aggregations to a large number of users at once would therefore put a heavy load on the cluster and require a large number of Elasticsearch instances. Because of this scaling cost, Elasticsearch is not well suited to per-user-level analytics.
So, if your application needs both search and per-user analytics, a good approach is to have Elasticsearch and Pinot ingest data from the same Kafka topic through parallel pipelines (for example, a Pinot real-time table consuming the topic directly, and a Kafka Connect sink or Logstash feeding Elasticsearch). Elasticsearch then indexes the data for search, while Pinot processes the same data for per-user analytics.
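The parallel-pipeline design relies on Kafka consumer groups: each group tracks its own offset, so two groups reading one topic each receive every record. The sketch below simulates that behaviour with a toy in-memory log; `InMemoryTopic` and the group names are hypothetical and this is not the Kafka client API.

```python
from collections import defaultdict

class InMemoryTopic:
    """Toy stand-in for a Kafka topic: an append-only log plus per-group offsets."""

    def __init__(self):
        self.log = []
        self.offsets = defaultdict(int)  # consumer group id -> next offset to read

    def produce(self, record):
        self.log.append(record)

    def poll(self, group_id):
        # Each group keeps its own offset, so every group sees every record.
        start = self.offsets[group_id]
        batch = self.log[start:]
        self.offsets[group_id] = len(self.log)
        return batch

topic = InMemoryTopic()
for event in [
    {"user": "alice", "text": "late delivery"},
    {"user": "bob", "text": "great service"},
]:
    topic.produce(event)

search_docs = topic.poll("search-indexer")       # hypothetical group feeding Elasticsearch
analytics_rows = topic.poll("analytics-ingest")  # hypothetical group feeding Pinot
print(len(search_docs), len(analytics_rows))     # 2 2
```

Because the pipelines do not share offsets, one system falling behind (or being re-indexed from the start) does not affect what the other consumes.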