I am using Mahout for k-means clustering on a directory containing 12 documents, with the following commands:
mahout seq2sparse -i /user/manisha1414/dir_001-seqfiles -o /user/manisha1414/dir_001-vectors --maxDFPercent 85 --namedVector
mahout seqdumper -i /user/manisha1414/dir_001-kmeans-clusters/clusteredPoints/part-m-00000 > ./dir_001-cluster-docs.txt
I get the following output:
Key: 0: Value: wt: 1.0 distance: 47.44299700930014 vec: [{"0":2.386},{"2":1.875},{"9":2.386},{"14":2.386.........
Key: 11: Value: wt: 1.0 distance: 217.4603558919857 vec: [{"0":2.386},{"2":1.875},{".........
The output above does not include the vector IDs.
How can I get the vector IDs in the output as well?
Use "--namedVector true" when converting your sequence files to vectors.
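After re-running the pipeline, a quick way to check whether names made it into the dump is to print whatever sits between "vec:" and the opening "[" on each record. This is only a rough sketch. It assumes each record is on one line in the shape shown in the question, and that a vector name, when present, is printed in that position; that placement is an assumption, not something confirmed in this thread. The file name sample-cluster-docs.txt and the shortened vectors are made up for illustration.

```shell
# Hypothetical sample in the shape of the seqdumper output from the question
# (vectors shortened for illustration; replace with your real dump file).
cat > sample-cluster-docs.txt <<'EOF'
Key: 0: Value: wt: 1.0 distance: 47.44299700930014 vec: [{"0":2.386},{"2":1.875}]
Key: 11: Value: wt: 1.0 distance: 217.4603558919857 vec: [{"0":2.386},{"2":1.875}]
EOF

# For each record print: cluster id, distance, and the text found between
# "vec:" and the first "[" (assumed to be where a vector name would appear).
summary=$(awk '{
  key = $2; sub(/:$/, "", key)            # "0:" -> "0"
  dist = ""
  for (i = 1; i < NF; i++)                # value following the "distance:" token
    if ($i == "distance:") dist = $(i + 1)
  name = $0
  sub(/^.*vec: /, "", name)               # keep everything after "vec: "
  sub(/\[.*$/, "", name)                  # drop the vector itself
  sub(/ *= *$/, "", name)                 # drop a trailing " = " separator, if any
  if (name == "") name = "(unnamed)"
  print key "\t" dist "\t" name
}' sample-cluster-docs.txt)

printf '%s\n' "$summary"
```

If every row ends in "(unnamed)", the vectors in clusteredPoints still carry no names, so the vectorization step is the place to look.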