
Strategies for search across disparate data sources


I am building a tool that searches people based on a number of attributes. The values for these attributes are scattered across several systems.

As an example, dateOfBirth is stored in a SQL Server database as part of system ABC. That person's sales region assignment is stored in some horrible legacy database. Other attributes are stored in a system only accessible over an XML web service.

To make matters worse, the legacy database and the web service can be really slow.

What strategies and tips should I consider for implementing a search across all these systems?

Note: Although I posted an answer, I'm not confident it's a great answer. I don't intend to accept my own answer unless no one else offers better insight.


Solution

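  • You could use an indexing mechanism that retrieves the data from all the systems ahead of time and stores it in a local index, and then run your searches against that index. Searches would be much faster and more reliable, because they no longer wait on the slow legacy database or web service while the user is waiting for results.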

    Of course, this just shifts the problem from one part of your system to another - now your indexing mechanism has to handle failures and heterogeneous systems, but that may be an easier problem to solve.

    Another factor is how often the data changes. If you need to query data in real time and that data goes stale very quickly, indexing may not be practical.
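A minimal Python sketch of the indexing approach, assuming each system can be wrapped in an adapter function that returns (person_id, attributes) pairs. The adapters (`sql_abc`, `legacy_regions`, `xml_service`), the `PeopleIndex` class and its methods are all hypothetical, and the in-memory inverted index only stands in for whatever real index you would use. It shows the two concerns above: a source that fails during a refresh keeps its last successful snapshot, and `stale_sources` reports which sources have not refreshed recently, so staleness is visible instead of silent.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timedelta, timezone
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical adapter signature: each source yields (person_id, {attribute: value}).
# Real adapters would wrap the SQL Server query, the legacy database and the XML web service.
Fetcher = Callable[[], Iterable[Tuple[str, Dict[str, object]]]]


@dataclass
class PeopleIndex:
    sources: Dict[str, Fetcher]
    # Last successful snapshot per source: source -> person_id -> attributes.
    snapshots: Dict[str, Dict[str, Dict[str, object]]] = field(default_factory=dict)
    last_ok: Dict[str, datetime] = field(default_factory=dict)
    errors: Dict[str, str] = field(default_factory=dict)
    # Inverted index: (attribute, value) -> ids of people having that value.
    postings: Dict[Tuple[str, object], Set[str]] = field(default_factory=dict)

    def refresh(self) -> None:
        """Re-pull every source; a source that fails keeps its previous snapshot."""
        for name, fetch in self.sources.items():
            try:
                rows = list(fetch())  # materialize so a mid-stream failure changes nothing
            except Exception as exc:  # slow or flaky system: record the error, keep old data
                self.errors[name] = repr(exc)
                continue
            # Replace the whole snapshot so records deleted at the source disappear too.
            self.snapshots[name] = {pid: dict(attrs) for pid, attrs in rows}
            self.last_ok[name] = datetime.now(timezone.utc)
            self.errors.pop(name, None)
        self._rebuild_postings()

    def _rebuild_postings(self) -> None:
        postings: Dict[Tuple[str, object], Set[str]] = {}
        for snapshot in self.snapshots.values():
            for pid, attrs in snapshot.items():
                for attr, value in attrs.items():
                    postings.setdefault((attr, value), set()).add(pid)
        self.postings = postings

    def search(self, **criteria: object) -> List[str]:
        """Ids of people matching every attribute=value criterion (exact match only)."""
        if not criteria:
            return []
        matches = [self.postings.get((k, v), set()) for k, v in criteria.items()]
        return sorted(set.intersection(*matches))

    def stale_sources(self, max_age: timedelta) -> List[str]:
        """Sources never refreshed, or last refreshed longer ago than max_age."""
        now = datetime.now(timezone.utc)
        return sorted(n for n in self.sources
                      if n not in self.last_ok or now - self.last_ok[n] > max_age)


# Hypothetical stand-ins for the three systems in the question.
def sql_abc():
    return [("p1", {"dateOfBirth": date(1980, 5, 1)}),
            ("p2", {"dateOfBirth": date(1975, 2, 3)})]

def legacy_regions():
    return [("p1", {"salesRegion": "North"}), ("p2", {"salesRegion": "South"})]

def xml_service():
    raise TimeoutError("web service too slow")


index = PeopleIndex({"abc": sql_abc, "legacy": legacy_regions, "xml": xml_service})
index.refresh()
print(index.search(salesRegion="North", dateOfBirth=date(1980, 5, 1)))  # ['p1']
print(index.stale_sources(timedelta(minutes=15)))  # ['xml']
```

Searches here only touch the local postings, so the slow systems affect how fresh the results are rather than how long a search takes. If two sources disagree on the same attribute, this sketch indexes both values; a real indexer would need a rule for which source wins.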