k-means mahout

Mahout clustering: names of named vectors missing from seqdumper output


I am using Mahout for k-means clustering on a directory containing 12 documents, with the following commands:

mahout seq2sparse -i /user/manisha1414/dir_001-seqfiles -o /user/manisha1414/dir_001-vectors --maxDFPercent 85 --namedVector
mahout seqdumper -i /user/manisha1414/dir_001-kmeans-clusters/clusteredPoints/part-m-00000 > ./dir_001-cluster-docs.txt

I get the following output:

Key: 0: Value: wt: 1.0 distance: 47.44299700930014  vec: [{"0":2.386},{"2":1.875},{"9":2.386},{"14":2.386......... 
Key: 11: Value: wt: 1.0 distance: 217.4603558919857  vec: [{"0":2.386},{"2":1.875},{".........

The output above does not contain the vector IDs (the names of the named vectors).

How can I get the vector IDs in the output as well?


Solution

  • Use "--namedVector true" when converting your sequence files to vectors (the seq2sparse step).
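
To check whether the names actually made it into the dump after re-running the pipeline, a small script can scan the seqdumper text file. This is only a rough sketch in Python: the line pattern is taken from the output quoted in the question, and it assumes (not confirmed here) that a named vector prints its name between "vec:" and the opening "[". The function name parse_line is made up for illustration.

```python
import re

# Pattern for one seqdumper line as quoted in the question, e.g.
#   Key: 0: Value: wt: 1.0 distance: 47.44  vec: [{"0":2.386},...
# ASSUMPTION: if the vector is named, its name appears between
# "vec:" and the first "[" (possibly followed by "=" or ":").
LINE_RE = re.compile(
    r"^Key:\s*(?P<key>\d+):\s*Value:\s*wt:\s*(?P<wt>\S+)\s+"
    r"distance:\s*(?P<dist>\S+)\s+vec:\s*(?P<prefix>[^\[]*)\["
)


def parse_line(line):
    """Return key, weight, distance and (possibly missing) name for one line."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # not a "Key: ... Value: ..." record line
    # Drop a trailing separator such as "=" or ":" around the assumed name.
    name = m.group("prefix").strip().rstrip("=:").strip()
    return {
        "key": int(m.group("key")),
        "weight": float(m.group("wt")),
        "distance": float(m.group("dist")),
        "name": name or None,  # None means no name was printed
    }


if __name__ == "__main__":
    # File name taken from the seqdumper command in the question.
    with open("dir_001-cluster-docs.txt") as fh:
        for raw in fh:
            rec = parse_line(raw)
            if rec is not None:
                print(rec["key"], rec["name"], rec["distance"])
```

If every record comes back with name None, the vectors were most likely written without names, and the vectorization step is the place to look.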