xml carrot2

What should be specified as the URL in a Carrot2 XML file?


I have a set of documents (multi-line text passages) that I would like to cluster using Carrot2. According to the XML file format specified in the documentation, there has to be a query, plus documents with snippets, URLs, and titles.

My questions are:

  1. What should be written in the query element of the XML file?
  2. What should be given as the URL and title for the documents? I have neither. I only have the documents themselves (multi-line texts), which I extracted from a dataset.

I think the answer to the first question is *:*. Is that correct?

Edit:

The Carrot2 Workbench throws a java.lang.NullPointerException after I specify the XML file and press Process.

I am confident that the error is caused by the XML file I am giving as input.

Does anyone know what could be wrong with the XML that would cause the program to throw this exception?

I have been stuck on this for a long time.


Solution

  • You can leave the title and URL fields empty. Title content, if present, is given more weight during clustering. The URL field is used only for display purposes.
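
As a rough illustration of the point above, here is a minimal Python sketch that writes plain texts into a Carrot2-style input file with the title and URL left empty. The element names (`searchresult`, `query`, `document`, `title`, `url`, `snippet`) follow the Carrot2 3.x XML input format as I understand it, so check them against the documentation for your version. The function name `build_carrot2_xml` and the sample texts are made up for this example, and passing an empty query is an assumption, since the answer above does not cover the query element.

```python
import xml.etree.ElementTree as ET


def build_carrot2_xml(texts, query=""):
    """Return a Carrot2-style XML string for a list of plain-text documents.

    Hypothetical helper; element names assume the Carrot2 3.x input format.
    """
    root = ET.Element("searchresult")

    # Assumption: the query is only a hint for the algorithm, so an empty
    # value is used when the documents did not come from a search.
    ET.SubElement(root, "query").text = query

    for i, text in enumerate(texts):
        doc = ET.SubElement(root, "document", id=str(i))
        # Title and URL are left empty, as the answer above suggests.
        ET.SubElement(doc, "title")
        ET.SubElement(doc, "url")
        # The full multi-line text goes into the snippet; ElementTree
        # escapes characters such as '<' and '&' when serializing.
        ET.SubElement(doc, "snippet").text = text

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample_texts = [
        "First document.\nIt spans several lines.",
        "Second document with <special> & characters.",
    ]
    print(build_carrot2_xml(sample_texts))
```

Generating the file with an XML library rather than by string concatenation keeps it well-formed even when the texts contain markup characters, which rules out one possible source of parsing problems.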