java cluster-analysis elki

Showing progress of the ELKI DBSCAN clustering model while the model is running?


I am using ELKI's implementation of DBSCAN to cluster datasets of various sizes (ranging from thousands to millions of observations). Since a run can take quite a long time, I was wondering whether it is possible to show the progress of the algorithm (or a good estimate of it) while it runs.

I looked through the ELKI documentation for the Clustering class but could not find anything.

private static Clustering<Model> runModel(double eps, int minpts, Database db){

    //double eps = 10;
    //int minpts = 5;
    //db = data in a double[][] format;

    Clustering<Model> c = new DBSCAN<NumberVector>(
            EuclideanDistanceFunction.STATIC, eps, minpts).run(db);

    return c;
}

I would like this method to write to the console at regular intervals, or to report the progress of the algorithm in some other way.


Solution

  • Yes.

    If you use the -verbose flag, logging will include progress.

    Programmatically, you can use LoggingConfiguration to set the verbosity level.

    This is not available for all algorithms, but it is for many, including DBSCAN. The progress logging also includes an estimate of the remaining time.

    Note that logging is not free: it takes extra effort and may therefore make the program run longer. The verbose level should be reasonable (progress logging includes rate control to bound the cost), but at DEBUG level it may become too expensive.

    To reduce the runtime, make sure to add an index to your database.
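
To illustrate the idea of rate-controlled progress logging with a remaining-time estimate, here is a minimal plain-Java sketch. It is not ELKI's implementation and does not use the ELKI API: the class `ThrottledProgress`, its constructor parameters, and the message format are all hypothetical names chosen for this example. It assumes the total number of work items is known up front and estimates the remaining time by simple linear extrapolation.

```java
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Hypothetical helper (not an ELKI class): rate-limited progress reports with a time estimate. */
class ThrottledProgress {
    private final long total;            // number of work items expected
    private final long minIntervalNanos; // minimum time between two reports
    private final LongSupplier clock;    // nanosecond clock, injectable for testing
    private final Consumer<String> sink; // where messages go, e.g. System.out::println
    private final long startNanos;
    private long lastReportNanos;
    private long done;

    public ThrottledProgress(long total, long minIntervalMillis,
                             LongSupplier clock, Consumer<String> sink) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        this.total = total;
        this.minIntervalNanos = minIntervalMillis * 1_000_000L;
        this.clock = clock;
        this.sink = sink;
        this.startNanos = clock.getAsLong();
        this.lastReportNanos = startNanos;
    }

    /** Call once per processed item; emits a message only if enough time has passed. */
    public void increment() {
        done++;
        long now = clock.getAsLong();
        boolean finished = done >= total;
        if (!finished && now - lastReportNanos < minIntervalNanos) {
            return; // rate control: skip this report to bound the logging cost
        }
        lastReportNanos = now;
        // Assumption: remaining items take as long on average as the ones done so far.
        double nanosPerItem = (double) (now - startNanos) / done;
        long etaMillis = Math.max(0L, Math.round(nanosPerItem * (total - done) / 1e6));
        sink.accept(done + "/" + total + " (" + (100 * done / total) + "%), about "
                + etaMillis + " ms remaining");
    }

    public static void main(String[] args) {
        long n = 5_000_000L;
        // Report at most once every 200 ms, regardless of how fast items are processed.
        ThrottledProgress progress =
                new ThrottledProgress(n, 200, System::nanoTime, System.out::println);
        double sum = 0;
        for (long i = 0; i < n; i++) {
            sum += Math.sqrt(i); // stand-in for real work
            progress.increment();
        }
        System.out.println("done, checksum " + sum);
    }
}
```

Because the check is a single clock read and comparison per item, the cost of reporting stays bounded even for millions of items. The linear estimate will be off when the cost per item varies, for example when some points have far more neighbors than others.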