I'm using Zend Lucene, but I don't think the question is specific to that library.
Say I want to provide full-text search for a database of books. Assume the following models:
Model 1:
TABLE: book
- book_id
- name
TABLE: book_author
- book_author_id
- book_id
- author_id
TABLE: author
- author_id
- name
(a book can have 0 or more authors)
Model 2:
TABLE: book
- book_id
- name
TABLE: book_eav
- book_eav_id
- book_id
- attribute (e.g. "author")
- value (e.g. "Tom Clancy")
(a book can have 0 or more authors + information about publisher, number of pages, etc.)
What do I need to do to include all the authors associated with a particular book in the document to be indexed? Do I put all the authors in one field of the document? Would I use some sort of delimiter to separate the author entries? I'm looking for general strategies for this kind of data.
Put all the authors in one field of the document, separated by a delimiter. The document schema would then be:
book_id
name
author: |author 1|author 2|...|author n|
other_attribute_1: |val 1|val 2|
other_attribute_2: |val 1|val 2|
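As a rough illustration of how such documents could be assembled from Model 1, here is a minimal Python sketch. It is not Zend Lucene code: the function name build_documents and the dict-per-row input shape are assumptions made for the example, and the resulting dicts would still have to be passed to whatever indexing API you use.

```python
from collections import defaultdict


def build_documents(books, book_authors, authors, delimiter="|"):
    """Flatten book / book_author / author rows into one dict per book.

    Hypothetical helper: rows are plain dicts keyed by column name.
    """
    # Look up author names by their primary key.
    author_names = {a["author_id"]: a["name"] for a in authors}

    # Collect the author names for each book via the join table.
    names_by_book = defaultdict(list)
    for link in book_authors:
        names_by_book[link["book_id"]].append(author_names[link["author_id"]])

    docs = []
    for book in books:
        names = names_by_book.get(book["book_id"], [])
        if names:
            # Wrap every name in delimiters: |author 1|author 2|
            author_field = delimiter + delimiter.join(names) + delimiter
        else:
            # A book with no authors gets an empty author field.
            author_field = ""
        docs.append({
            "book_id": book["book_id"],
            "name": book["name"],
            "author": author_field,
        })
    return docs
```

For Model 2 the same idea applies, except that the values would be grouped by (book_id, attribute) so that each attribute becomes its own delimited field.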
With this schema you can search by author and boost each kind of match differently, with a query like:
(author:"|Tom Clancy|")^10 OR
(author:"Tom Clancy")^5 OR
(author:(Tom Clancy))^1
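This query ranks exact matches first, then phrase matches, and finally other matches. Note that the exact-match clause only behaves differently from the phrase clause if the analyzer keeps the | delimiter when it tokenizes the field.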
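If the query is built from user input, the three clauses can be generated from a single author name. The following Python sketch only builds the query string. The function name build_author_query is hypothetical, and it assumes Lucene-style query syntax, with field grouping in the last clause so that every word is scoped to the author field. Only backslashes and double quotes are escaped here, so other special query characters in a name would need additional handling.

```python
def build_author_query(name, field="author", delimiter="|"):
    """Build a three-tier boosted query string for one author name.

    Hypothetical helper; assumes Lucene-style query syntax.
    """
    # Escape the characters that would break out of a quoted phrase.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')

    # Exact match on the delimited value gets the highest boost.
    exact = f'({field}:"{delimiter}{escaped}{delimiter}")^10'
    # Phrase match: the words appear next to each other, in order.
    phrase = f'({field}:"{escaped}")^5'
    # Loose match: the individual words, grouped under the field.
    terms = f'({field}:({escaped}))^1'

    return " OR ".join([exact, phrase, terms])
```

Calling build_author_query("Tom Clancy") produces a single-line query with the same three tiers and boosts as the query above.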