databasetime-seriesvictoriametrics

How MergeSet works in VictoriaMetrics?


I met a research paper referred MergeSet from VictoriaMetrics.

It says

MergeSet is a simplified one-level LSM-tree implemented by VictoriaMetrics [6], which concatenates a tag and a posting ID together in a key, and the posting list is naturally formed because of the sorted order maintained by the LSM-tree.

Although I tried to read the source code from VM yet not sure about what MergeSet is.

For a common LSM-tree, each element is a key-value pair sorted by the key. If MergeSet follows this paradigm, then my questions are as follows:

  1. Does MergeSet concatenate each tag key-value pair with related TSID and store it as the key of a tuple in the MergeSet.table?
  2. If the tag key-value pair and TSID are both embedded in keys, what does the MergeSet.table value represent?

If not, should I consider MergeSet as a (partially) sorted string set inspired by the LSM-tree, utilizing on-disk capacity?


Solution

  • The github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset package implements LSM-like data structure with the following properties:

    The mergeset is used by VictoriaMetrics for storing various indexes, which are collectively named indexdb. These indexes include the following entries:

    How does VictoriaMetrics store these entries in the mergeset, which works only with sorted byte slices? It marshals entries into byte slices in the way they can be searched via prefix scan. It also prepends byte slices for every entry type with an individual prefix, so they do not clash with each other. See currently supported prefixes.