I came across several methods for measuring semantic similarity that use the structure and hierarchy of WordNet, e.g. Jiang and Conrath measure (JNC), Resnik measure(RES), Lin measure (LIN) etc.
The way they are measured using NLTK is:
sim2=wn.jcn_similarity(entry1,entry2,brown_ic)
sim3=entry1.res_similarity(entry2, brown_ic)
sim4=entry1.lin_similarity(entry2,brown_ic)
If WordNet is the basis of calculating semantic similarity, what is the use of Brown Corpus here?
Take a look at the explanation at the NLTK howto for wordnet.
Specifically, the *_ic notation is information content.
synset1.res_similarity(synset2, ic): Resnik Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node). Note that for any similarity measure that uses information content, the result is dependent on the corpus used to generate the information content and the specifics of how the information content was created.
A bit more info on information content from here:
The conventional way of measuring the IC of word senses is to combine knowledge of their hierarchical structure from an ontology like WordNet with statistics on their actual usage in text as derived from a large corpus