Assume I have the following example hierarchy:
I see two ways that I could index a “Grand Rapids, Michigan” document with prefixed terms:
XFIRSTLEVELus
XSECONDLEVELmichigan
XTHIRDLEVELgrandrapids
or
XFIRSTLEVELus
XSECONDLEVELus_michigan
XTHIRDLEVELus_michigan_grandrapids
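To make the difference concrete, here is a minimal Python sketch that builds the term strings for both schemes. The function names and the `LEVEL_PREFIXES` list are made up for illustration, and nothing here calls a real indexing API; it only shows which terms each document would end up with.

```python
# Hypothetical per-level prefixes, taken from the examples above.
LEVEL_PREFIXES = ["XFIRSTLEVEL", "XSECONDLEVEL", "XTHIRDLEVEL"]


def independent_terms(path):
    """First approach: each level is indexed with its own value only."""
    return [prefix + value for prefix, value in zip(LEVEL_PREFIXES, path)]


def concatenated_terms(path):
    """Second approach: each level carries the values of all its ancestors."""
    return [
        prefix + "_".join(path[: depth + 1])
        for depth, prefix in enumerate(LEVEL_PREFIXES[: len(path)])
    ]


michigan = ["us", "michigan", "grandrapids"]
minnesota = ["us", "minnesota", "grandrapids"]

# With independent terms the two cities share a third-level term,
# so a search on that term alone matches both documents.
shared_independent = set(independent_terms(michigan)) & set(independent_terms(minnesota))

# With concatenated terms only the first-level term is shared.
shared_concatenated = set(concatenated_terms(michigan)) & set(concatenated_terms(minnesota))
```

In this sketch `shared_independent` contains both `XFIRSTLEVELus` and `XTHIRDLEVELgrandrapids`, while `shared_concatenated` contains only `XFIRSTLEVELus`.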
I’m inclined to use the second approach, thinking that it will return more intuitive results. That is, a search for Grand Rapids, Michigan is less likely to return documents from Minnesota and Ohio.
However, two aspects of this approach bother me. First, the creation and maintenance of term prefixes for each level of the hierarchy feels wrong. Second, the concatenation of values seems like a surrogate for using weights.
So, what is the best way to represent a hierarchy with term prefixes?
As with all these things, it might be best to think about how you want to use the data, rather than what the 'best' way of storing it is.
In the past, I have stored location data like you describe as if they were URL paths, converting each place name into a slug, so your example above would look something like:
us
us/michigan
us/michigan/detroit
us/michigan/grand-rapids
us/michigan/lansing
us/minnesota
us/minnesota/grand-rapids
us/minnesota/minneapolis
us/minnesota/st-paul
us/ohio
us/ohio/columbus
us/ohio/grand-rapids
us/ohio/sandusky
Give each document a prefixed term with one of those paths, and use an exact term search to get only the documents in a single place (location:us/minnesota/minneapolis) or a wildcard search to get all children of a location (location:us/minnesota/*).
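As a rough sketch of the idea, the following Python builds the slug paths and mimics the two kinds of lookup with plain string comparisons. The helper names are hypothetical, the slug rule is just one reasonable choice, and a real search engine would do the matching through its own query syntax rather than in application code.

```python
import re


def slugify(name):
    """Lowercase a place name and replace runs of other characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def location_path(*names):
    """Join the slugs of each hierarchy level into a URL-like path."""
    return "/".join(slugify(name) for name in names)


def is_exact(doc_path, query_path):
    """Stand-in for an exact term search such as location:us/minnesota/minneapolis."""
    return doc_path == query_path


def is_descendant(doc_path, query_path):
    """Stand-in for a wildcard search such as location:us/minnesota/*."""
    return doc_path.startswith(query_path + "/")


paths = [
    location_path("US", "Michigan", "Grand Rapids"),
    location_path("US", "Minnesota", "Grand Rapids"),
    location_path("US", "Minnesota", "Minneapolis"),
]

# Only the two Minnesota cities fall under us/minnesota.
in_minnesota = [p for p in paths if is_descendant(p, "us/minnesota")]
```

Appending the trailing slash before the prefix check keeps a sibling whose slug merely starts with the same characters from matching, and it also excludes the parent location itself.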
This may or may not be the 'best' solution, but it might work for some applications :)