[SOLVED] Keeping instance IDs during mcl clustering

I am trying to cluster points using mcl. The points have indices ind (e.g. ind = [4, 54, 3, etc.]). I converted my graph to .abc format and ran mcl on this file (following the instructions provided by micans). The output gives me clusters using the canonical domain (that is, for the example above, 3 would be represented by 0, 4 by 1, 54 by 3). Is there a way to get the output using the indices I gave as input?

Solution

This is the basic workflow, using an example file named 'f.abc' in abc format:

mcxload -abc f.abc --stream-mirror -o f.mci -write-tab f.tab
mcl f.mci
mcxdump -icl out.f.mci.I20 -tabr f.tab -o dump.f.mci.I20

The file dump.f.mci.I20 should now contain the labels that were used in the 'abc' file. However, if you just do

mcl f.abc --abc

then you should get the exact same result, although now in the (default output) file out.f.abc.I20. By default mcl expects an 'mcl graph file' (in the documentation this is often called matrix format or referred to as a matrix file, as graphs and sparse matrices are the same thing in the mcl software). You can give mcl a file in abc format, but it will not detect on its own that the format is different, hence the --abc option.
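
To make the label mapping in the three-step workflow more concrete, here is a minimal sketch of the translation that the -tabr step performs. It is only an illustration, not a replacement for mcxdump. The function name relabel_clusters is hypothetical, and the sketch assumes that the tab file holds one "index<TAB>label" pair per line and that the clusters are available as plain text, one cluster per line, with whitespace-separated canonical indices.

```shell
# Hypothetical helper: relabel_clusters TAB_FILE CLUSTER_FILE
# TAB_FILE:     assumed layout "<index><TAB><label>", one pair per line
# CLUSTER_FILE: assumed layout of one cluster per line, canonical indices
relabel_clusters() {
  awk '
    NR == FNR {
      # First file: build the index -> label map, skipping comment lines.
      if ($0 !~ /^#/ && NF >= 2) label[$1] = $2
      next
    }
    {
      # Second file: replace every canonical index by its original label.
      out = ""
      for (i = 1; i <= NF; i++) {
        name = ($i in label) ? label[$i] : $i
        out = (i == 1) ? name : out "\t" name
      }
      print out
    }
  ' "$1" "$2"
}
```

With a tab file that maps 0 to 3, 1 to 4 and 2 to 54, a cluster line containing the indices 0 and 2 would be printed with the labels 3 and 54. Indices missing from the tab file are passed through unchanged.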