hunspell thesaurus

numbering words in thesaurus index file


I would like to know how the thesaurus dictionaries are built. What is the relation between the .dat file and the .idx index file? For example, the relevant entry from the th_en_CA_v2.dat file looks like this:

ploy|2
(noun)|gambit|remark (generic term)|comment (generic term)
(noun)|gambit|stratagem|maneuver (generic term)|manoeuvre (generic term)|tactical maneuver (generic term)|tactical manoeuvre (generic term)

The relevant entry from the th_en_CA_v2.idx file:

ploy|12626348

What is that number (12626348) next to the word ploy?


Solution

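  • It's the byte offset of the entry for ploy in the .dat file, i.e. the position at which the line ploy|2 starts. A reader can seek straight to that position instead of scanning the whole file.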
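
To illustrate how the offset is used, here is a minimal Python sketch that looks a word up in the index and then seeks to that position in the data file. It assumes the layout shown in the question (index lines of the form word|offset, a data entry of the form word|count followed by count meaning lines). The function names load_index and read_entry and the default encoding are assumptions made for illustration; they are not part of Hunspell or of any thesaurus library.

```python
def load_index(idx_path, encoding="utf-8"):
    """Map each word to its byte offset, assuming 'word|offset' lines."""
    offsets = {}
    with open(idx_path, "rb") as f:
        for raw in f:
            line = raw.decode(encoding).rstrip("\r\n")
            word, sep, number = line.rpartition("|")
            # Lines without a '|' (e.g. any header lines) are skipped.
            if sep and number.isdigit():
                offsets[word] = int(number)
    return offsets


def read_entry(dat_path, offset, encoding="utf-8"):
    """Seek to a byte offset in the .dat file and read one entry."""
    # Binary mode matters: the offset counts bytes, not decoded characters.
    with open(dat_path, "rb") as f:
        f.seek(offset)
        header = f.readline().decode(encoding).rstrip("\r\n")
        word, _, count = header.rpartition("|")
        # Each meaning line is '(pos)|synonym|synonym|...'.
        meanings = [
            f.readline().decode(encoding).rstrip("\r\n").split("|")
            for _ in range(int(count))
        ]
    return word, meanings
```

With the files from the question, load_index("th_en_CA_v2.idx")["ploy"] should give 12626348, and passing that number to read_entry along with "th_en_CA_v2.dat" should return the ploy entry, provided the index was generated from that exact data file and the encoding argument matches the one the files use.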