I'm trying to determine the similarity between two documents using Carrot2. Is it possible to get this similarity directly from the framework?
Additionally, I've been studying the tf-idf matrix and noticed that the rows correspond to all the stemmed words and the columns to the documents. However, how can I tell which document corresponds to which column?
For example, given a list of documents, will the column order be the same as the order of the documents in the list?
For example:
List docs = {doc1, doc2, doc3}
gives
Column 0 = doc1, Column 1 = doc2, ...
Is that right?
Carrot2 does not use the conventional notion of document-document similarity, so you won't find it in the framework. You can, however, use the term-document matrix to compute many kinds of document-document similarity, for example the cosine similarity between two of its columns.
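As a minimal sketch of the cosine-similarity idea, assuming you have already copied the term-document matrix into a plain `double[][]` (rows are stemmed terms, columns are documents). How to pull the matrix out of Carrot2 is not shown here, and the class name `TermDocSimilarity` and the toy values are hypothetical:

```java
/**
 * Sketch: cosine similarity between two documents, assuming the
 * term-document matrix is available as a plain double[][] where
 * rows are (stemmed) terms and columns are documents.
 * This does not use any Carrot2 API; the representation is an assumption.
 */
public class TermDocSimilarity {

    /** Cosine similarity between columns a and b of the term-document matrix. */
    static double cosine(double[][] tdm, int a, int b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (double[] termRow : tdm) {          // one row per term
            dot   += termRow[a] * termRow[b];
            normA += termRow[a] * termRow[a];
            normB += termRow[b] * termRow[b];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;                          // an all-zero column has no direction
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Hypothetical 3-term x 3-document tf-idf matrix (made-up values).
        double[][] tdm = {
            {1.0, 1.0, 0.0},
            {0.0, 1.0, 0.0},
            {0.0, 0.0, 2.0},
        };
        System.out.println(cosine(tdm, 0, 1)); // doc1 vs doc2
        System.out.println(cosine(tdm, 0, 2)); // doc1 vs doc3 (no shared terms)
    }
}
```

Cosine similarity is only one choice; because it divides by the column norms, it ignores document length, which is usually what you want with tf-idf weights. Other measures (Euclidean distance, Jaccard on non-zero terms) can be computed from the same columns.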
You are correct in assuming that the columns of the term-document matrix are in the same order as the documents in the input list. You can check the source code to resolve any remaining doubts.
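Given that ordering, mapping a document back to its column is just a lookup of its position in the input list. The sketch below assumes the same hypothetical `double[][]` matrix and uses plain strings as stand-ins for Carrot2 document objects; `DocumentColumns` and `columnFor` are illustrative names, not part of Carrot2:

```java
import java.util.Arrays;
import java.util.List;

/**
 * Sketch: fetch a document's column from the term-document matrix,
 * assuming column j corresponds to docs.get(j) (same order as the input list).
 * Strings stand in for documents; the double[][] layout is an assumption.
 */
public class DocumentColumns {

    /** Returns the column vector (one entry per term) for the given document. */
    static double[] columnFor(double[][] tdm, List<String> docs, String doc) {
        int col = docs.indexOf(doc);             // column index == position in the input list
        if (col < 0) {
            throw new IllegalArgumentException("Unknown document: " + doc);
        }
        double[] column = new double[tdm.length];
        for (int term = 0; term < tdm.length; term++) {
            column[term] = tdm[term][col];       // copy this document's weight for each term
        }
        return column;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("doc1", "doc2", "doc3");
        double[][] tdm = {
            {1.0, 1.0, 0.0},
            {0.0, 1.0, 0.0},
            {0.0, 0.0, 2.0},
        };
        // Prints [1.0, 1.0, 0.0]: column 1 belongs to doc2.
        System.out.println(Arrays.toString(columnFor(tdm, docs, "doc2")));
    }
}
```

Note that `indexOf` returns the first match, so if the same document appears twice in the input list you should track positions yourself rather than look them up by equality.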