How to store index in ELKI?


I am using ELKI 0.7.2 (master) to run DBSCAN with an R*-tree on a large data set. Afterwards, I need to store the tree persistently so that it can be reloaded into memory later, when new data points have to be classified as noise or not. To this end, I tried PersistentPageFileFactory and got the following error:

java.lang.ClassCastException: de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.rstar.RStarTreeNode cannot be cast to de.lmu.ifi.dbs.elki.persistent.ExternalizablePage

Simply revising RStarTreeNode to implement the ExternalizablePage interface did not help. When I used OnDiskArrayPageFileFactory instead, I got a different error:

java.lang.RuntimeException: IOException occurred during reading of page 0
at de.lmu.ifi.dbs.elki.persistent.OnDiskArrayPageFile.readPage(OnDiskArrayPageFile.java:113)

Is there a way to store an index, e.g. an R*-tree, in a file and load it back from that file?

Many thanks in advance!


Solution

  • The disk deserialization code has been unused for years, and thus is likely broken.

    I am not even sure whether it ever fully supported reading an index back from disk on its own; I assume it was only implemented to simulate an on-disk index for benchmarking purposes (i.e., it reads and writes the data on disk during a run, but it probably cannot open an existing index).

    This is just not functionality I needed, so I never worked on this code much beyond refactoring. In fact, I have been slowly trying to remove much of this code (in particular ExternalizablePage), because I did not have the impression that it is usable.

    I have a rewritten version of the R-tree somewhere that is better suited for actual on-disk usage. But it is not finished: it does not support R*-tree reinsertions yet. So the code is not published yet (and may never get finished, unfortunately).

    So you may need to rewrite large parts of that code to make it usable.

    If you do so, please share your modifications on GitHub.
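
If the goal is only to decide whether new points are noise, one possible workaround avoids the page file code entirely: persist the DBSCAN result instead of the index. The sketch below is plain Java and uses no ELKI classes. `CorePointStore`, `Model`, and `isNoise` are hypothetical names introduced for illustration, and it assumes Euclidean distance, a fixed existing clustering, and that you have already extracted the core points and epsilon from your DBSCAN run. Under those assumptions, a new point counts as noise when no stored core point lies within epsilon of it.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not part of ELKI): stores DBSCAN core points and tests new points. */
final class CorePointStore {

    /** Core points plus the epsilon they were found with. */
    static final class Model {
        final double epsilon;
        final double[][] corePoints;

        Model(double epsilon, double[][] corePoints) {
            this.epsilon = epsilon;
            this.corePoints = corePoints;
        }

        /** Assumption: Euclidean distance; noise means no core point within epsilon. */
        boolean isNoise(double[] p) {
            double eps2 = epsilon * epsilon;
            for (double[] c : corePoints) {
                double sum = 0.0;
                for (int d = 0; d < p.length; d++) {
                    double diff = p[d] - c[d];
                    sum += diff * diff;
                }
                if (sum <= eps2) {
                    return false; // reachable from a core point, so not noise
                }
            }
            return true;
        }
    }

    /** Writes epsilon and the core points using standard Java serialization. */
    static void save(Path file, double epsilon, double[][] corePoints) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeDouble(epsilon);
            out.writeObject(corePoints);
        }
    }

    /** Reads back what save() wrote, in the same order. */
    static Model load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            double epsilon = in.readDouble();
            double[][] corePoints = (double[][]) in.readObject();
            return new Model(epsilon, corePoints);
        }
    }
}
```

The linear scan in `isNoise` is only there to keep the sketch short. For a large data set you would more likely reload the stored points and rebuild an in-memory index over them at startup, which sidesteps deserializing the tree itself. Note also that this ignores the case where enough new points arrive to form new core points; handling that would require re-running or incrementally updating the clustering.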