java cluster-analysis elki clique

Subspace clustering using CLIQUE in ELKI


I am trying to detect dense subspaces in a high-dimensional dataset, and I want to use the ELKI library for this. However, there is very little documentation and there are few examples for ELKI.

I tried the following:

    Database db = makeSimpleDatabase("D:/sample.csv", 600);

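    ListParameterization params = new ListParameterization();
    // TAU: density threshold, as a fraction of all points
    params.addParameter(CLIQUE.TAU_ID, "0.1");
    // XSI: number of intervals (grid units) per dimension
    params.addParameter(CLIQUE.XSI_ID, 20);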

    // setup algorithm
    CLIQUE<DoubleVector> clique = ClassGenericsUtil.parameterizeOrAbort(CLIQUE.class, params);

    // run CLIQUE on database
    Clustering<SubspaceModel<DoubleVector>> result = clique.run(db);

    for(Cluster<?> cl : result.getToplevelClusters()) {
        System.out.println(cl.getIDs());
    }

I gave the following input:

    2,2
    2,3
    5,2
    5,3
    8,4

and the result was:

    [2, 1]
    [4, 3]
    [5]
    [3, 1]
    [4, 2]
    [5]
    [1]
    [2]
    [3]
    [4]
    [5]

I expected the output to be the input data points grouped into subspace clusters. Maybe I am picking the wrong values, or setting the parameters the wrong way.

Please help. Thanks in advance.


Solution

  • Note that CLIQUE produces overlapping clusters.

    An element can belong to any number of clusters at the same time, including none. If you choose your parameters badly (and CLIQUE parameters seem to be really hard to choose), you will get weird results. In your case, you get 11 clusters, even though your data set has only 5 elements.

    Essentially what the clustering tells you is:

    Elements [2,1] cluster (they both have x=2)

    Elements [4,3] cluster (they both have x=5)

    Element [5] is a cluster (only element with x=8)

    Elements [3,1] cluster (they both have y=2)

    Elements [4,2] cluster (they both have y=3)

    Element [5] is a cluster (only element with y=4)

    In the x,y subspace, every element is separate, and its own cluster.

    Choose better parameters for this fragile algorithm.

    With TAU = 0.1 (10% of 5 points), any grid unit containing more than 0.5 points is dense and becomes a cluster; in other words, every non-empty unit. That is why you get this result: you asked for it.
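
To make the threshold arithmetic concrete, here is a minimal sketch of the first step of a CLIQUE-style search: finding dense one-dimensional grid units. It is not ELKI's implementation. The class `DenseUnitSketch`, the method `denseUnits`, and the equal-width binning between the observed minimum and maximum are assumptions made for illustration, and ELKI may place its interval boundaries differently. IDs are 1-based row numbers, as in the output above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of ELKI.
class DenseUnitSketch {
    // The five points from the question, in input order.
    static final double[][] DATA = { {2, 2}, {2, 3}, {5, 2}, {5, 3}, {8, 4} };

    /** Returns the ID lists of all dense one-dimensional grid units. */
    static List<List<Integer>> denseUnits(double[][] data, int xsi, double tau) {
        int n = data.length;
        List<List<Integer>> dense = new ArrayList<>();
        for (int d = 0; d < data[0].length; d++) {
            // Value range of this dimension (assumed grid extent).
            double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
            for (double[] p : data) {
                min = Math.min(min, p[d]);
                max = Math.max(max, p[d]);
            }
            // Assign each point to one of xsi equal-width intervals.
            Map<Integer, List<Integer>> units = new TreeMap<>();
            for (int i = 0; i < n; i++) {
                int bin = max > min ? (int) ((data[i][d] - min) / (max - min) * xsi) : 0;
                bin = Math.min(bin, xsi - 1); // the maximum value goes into the last interval
                units.computeIfAbsent(bin, k -> new ArrayList<>()).add(i + 1);
            }
            // A unit is dense if it holds more than tau * n points.
            for (List<Integer> ids : units.values()) {
                if (ids.size() > tau * n) {
                    dense.add(ids);
                }
            }
        }
        return dense;
    }

    public static void main(String[] args) {
        System.out.println("tau=0.1: " + denseUnits(DATA, 20, 0.1));
        System.out.println("tau=0.3: " + denseUnits(DATA, 20, 0.3));
    }
}
```

With tau = 0.1 the threshold is 0.5 points, so all six non-empty one-dimensional units qualify, including the two singletons containing element 5. With tau = 0.3 the threshold is 1.5 points, and only the four two-element units remain. The sketch stops at one dimension; combining dense units into the x,y subspace is left out.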