
Is it possible to obtain real-time search results sorted by a frequently updated field with Lucene 3.0 in Java?


Consider the following assumptions:

  1. I have a Java 5.0 web application for which I'm considering using Lucene 3.0 for full-text searching
  2. There will be more than 1,000K (i.e. over one million) Lucene documents, each with 100 words on average
  3. New documents must be searchable immediately after they are created (real-time search)
  4. Lucene documents have a frequently updated integer field named quality

Where can I find code examples (simple but as complete as possible) of near-real-time search with Lucene 3.0?

Is it possible to obtain query results sorted by one of the document fields (quality), which may be updated frequently for documents that are already indexed? Would such an update of a document field trigger a rebuild of the Lucene index? What is the performance of such a rebuild? How can it be done efficiently? I need some examples or documentation of a complete solution.
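For context: in Lucene, changing a field of an already indexed document normally means re-indexing that one document, not rebuilding the whole index. Lucene 3.0's `IndexWriter.updateDocument(Term, Document)` deletes the document(s) matching the term and then adds the new version. For a value that changes very often, another approach is to keep it outside the index and consult it only at sort time (in Lucene 2.9/3.0 that would be wired in through a custom `FieldComparatorSource`). The plain-Java sketch below only models such an external store; the class `QualityStore` and its methods are hypothetical and are not part of Lucene or Compass.

```java
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical in-memory store for the frequently changing "quality" value.
 * It is keyed by an application-level document id (for example the value of
 * a stored "id" field) rather than Lucene's internal doc number, because
 * internal doc numbers can change when segments are merged.
 * Updating a value here never touches the full-text index.
 */
class QualityStore {
    private final ConcurrentHashMap<String, Integer> quality =
            new ConcurrentHashMap<String, Integer>();

    /** Records a new quality value; safe to call from many threads. */
    void setQuality(String docId, int value) {
        quality.put(docId, value);
    }

    /** Returns the current quality, or defaultValue if none is recorded. */
    int getQuality(String docId, int defaultValue) {
        Integer q = quality.get(docId);
        return q == null ? defaultValue : q;
    }

    public static void main(String[] args) {
        QualityStore store = new QualityStore();
        store.setQuality("doc-1", 10);
        store.setQuality("doc-1", 42); // a later update simply overwrites
        System.out.println(store.getQuality("doc-1", 0)); // prints 42
        System.out.println(store.getQuality("doc-2", 0)); // prints 0
    }
}
```

The trade-off is that the values live only in memory here, so they must be persisted and reloaded separately (e.g. from the database) and kept consistent with the indexed documents.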

If, however, rebuilding the index is not necessary in this case, how can I sort search results efficiently? Some queries may return many documents (>50K), so I consider it inefficient to retrieve them unsorted from Lucene, then sort them by the quality field, and finally split the sorted list into pages for pagination.
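Sorting every hit and then slicing out one page is indeed wasteful. Lucene's own sorted search (for example `IndexSearcher.search(query, filter, n, sort)` in 3.0) instead keeps only the best `n` hits in a bounded priority queue while it collects matches, so showing page `p` of size `pageSize` needs `n = (p + 1) * pageSize`. The plain-Java sketch below illustrates that bounded-heap technique; the `hits` array and the `quality` array indexed by doc id are hypothetical stand-ins for Lucene's collector and field cache, not its actual API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopKByQuality {

    /**
     * Returns one page of hit ids ordered by descending quality.
     * Only (page + 1) * pageSize ids are held in the heap at any time,
     * instead of sorting all hits.
     */
    static List<Integer> page(int[] hits, final int[] quality, int page, int pageSize) {
        int k = (page + 1) * pageSize;
        if (page < 0 || k <= 0) {
            return new ArrayList<Integer>();
        }
        // Min-heap by rank: the root is the weakest of the current top k.
        PriorityQueue<Integer> heap = new PriorityQueue<Integer>(k,
                new Comparator<Integer>() {
                    public int compare(Integer a, Integer b) {
                        return compareHits(a, b, quality);
                    }
                });
        for (int doc : hits) {
            if (heap.size() < k) {
                heap.add(doc);
            } else if (compareHits(doc, heap.peek(), quality) > 0) {
                heap.poll(); // evict the weakest hit
                heap.add(doc);
            }
        }
        // Drain the heap (weakest first), then reverse to get best-first order.
        List<Integer> top = new ArrayList<Integer>(heap.size());
        while (!heap.isEmpty()) {
            top.add(heap.poll());
        }
        Collections.reverse(top);
        int from = Math.min(page * pageSize, top.size());
        return new ArrayList<Integer>(top.subList(from, top.size()));
    }

    /** Positive if a ranks above b: higher quality first, ties go to the lower id. */
    static int compareHits(int a, int b, int[] quality) {
        if (quality[a] != quality[b]) {
            return quality[a] < quality[b] ? -1 : 1;
        }
        return a > b ? -1 : (a < b ? 1 : 0);
    }

    public static void main(String[] args) {
        int[] quality = {5, 9, 1, 9, 7, 3}; // quality of doc ids 0..5
        int[] hits = {0, 1, 2, 3, 4, 5};    // all six documents match
        System.out.println(page(hits, quality, 0, 2)); // prints [1, 3]
        System.out.println(page(hits, quality, 1, 2)); // prints [4, 0]
    }
}
```

This costs O(N log k) time and O(k) memory for N hits, rather than O(N log N) for a full sort, although very deep pages still make k large.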

Is Lucene 3.0 my best choice in Java, or should I consider other frameworks or solutions? Maybe the full-text search provided by the database server itself (I'm using PostgreSQL 8.3)?


Solution

  • The Lucene API is capable of everything you're asking, but it won't be easy. It's a fairly low-level API, and making it do complicated things is quite an exercise in itself.

    I can highly recommend Compass, a search/indexing framework built on top of Lucene. Besides a much friendlier API, it provides functionality such as object/XML/JSON mapping to Lucene indexes and fully transactional behaviour. It should have no trouble with your requirements, such as real-time sorting of transactionally updated documents.

    Compass 2.2.0 is built upon Lucene 2.4.1, but a Lucene 3.0-based version is in the works. It's sufficiently abstracted from the Lucene API that the transition should be seamless, though.