sql-server-2008full-text-searchfull-text-catalog

SQL Server 2k8 Fulltext search on "unrelated" tables using view or?


I'm pretty new to Fulltext searches and I'd really like to know the best approach to perform a "site search" style fulltext search across multiple unrelated tables (I am planning to do this across 4 tables). I was thinking of using a view like so:

CREATE VIEW [dbo].[Search] WITH SCHEMABINDING
    AS

        SELECT   p.ProductId AS ItemId
            ,'Product' AS ItemType
            ,p.Title AS ItemTitle
            ,p.LongDescription AS LongDescription
            ,p.Price AS Price
    FROM dbo.Product AS p
    WHERE p.IsActive = 1

    UNION

    SELECT   a.ArticleId AS ItemId
            ,'Article' AS ItemType
            ,a.ArticleTitle AS ItemTitle
            ,a.Contents AS LongDescription
            ,NULL AS Price
    FROM dbo.Article AS a
    WHERE a.IsActive = 1

But whilst researching the correct syntax for the index I realised that "a" I need a unique index and "b" apparently views with Unions can't be used to create a Fulltext index...

An alternative approach I saw was to create an FTI for each table, then in a Stored Proc, UNION them into a tmp table and then select the tmp table with with an Order By rank.

I'd really appreciate some guidance on this, most of what I've found related to multiple related tables where joins to a view are suffice to overcome the problem.

EDIT:

@Joe kindly answered this question which I'd forgotten about and actually solved in a fashion but was concerned that it was a bit long winded, it seems that it may be the most logical of the two ways he's suggested and here is what I'm using - I completely forgot that I had to paginate it to... I don't think the client would have been thrilled with an endless list of results...

A colleague of mine also suggested another technique he has seen implemented which is to throw metadata at the tables and basically cache the results in another table that you then do your fulltext search on, this isn't too bad an approach if you know what your metadata is going to be plus you also still need to key it up to your original tables to get actual result either immediately or to display (the full article if need be so-to-speak)

    CREATE PROCEDURE [dbo].[up_Search]
     @Term VARCHAR(100)
    ,@Skip INT = 0
    ,@Take INT = 10
AS
DECLARE @Search TABLE
(
     ItemId INT
    ,ItemType VARCHAR(50)
    ,ItemTitle VARCHAR(100)
    ,LongDescription VARCHAR(MAX)
    ,Price DECIMAL(10,2)
    ,SearchRank INT
)

INSERT INTO @Search SELECT * FROM (

    SELECT   p.ProductId AS ItemId
            ,'Product' AS ItemType
            ,p.Title AS ItemTitle
            ,p.LongDescription AS LongDescription
            ,p.Price AS Price
            ,KEY_TBL.RANK AS SearchRank
    FROM dbo.Product AS p
    INNER JOIN CONTAINSTABLE(dbo.Product, Title, @Term) AS KEY_TBL ON p.ProductId = KEY_TBL.[KEY]
    WHERE p.IsActive = 1

    UNION

    SELECT   a.ArticleId AS ItemId
            ,'Article' AS ItemType
            ,a.ArticleTitle AS ItemTitle
            ,a.Contents AS LongDescription
            ,NULL AS Price
            ,KEY_TBL.RANK AS SearchRank
    FROM dbo.Article AS a
    INNER JOIN CONTAINSTABLE(dbo.Article, ArticleTitle, @Term) AS KEY_TBL ON a.ArticleId = KEY_TBL.[KEY]
    WHERE a.IsActive = 1

    UNION    

    SELECT   n.NewsId AS ItemId
            ,'News' AS ItemType
            ,n.NewsTitle AS ItemTitle
            ,n.Contents AS LongDescription
            ,NULL AS Price
            ,KEY_TBL.RANK AS SearchRank
    FROM dbo.News AS n
    INNER JOIN CONTAINSTABLE(dbo.News, NewsTitle, @Term) AS KEY_TBL ON n.NewsId = KEY_TBL.[KEY]
    WHERE n.IsActive = 1

    UNION

    SELECT   b.BusinessId AS ItemId
            ,bt.Title AS ItemType
            ,b.Title AS ItemTitle
            ,b.LongDescription AS LongDescription
            ,NULL AS Price
            ,KEY_TBL.RANK AS SearchRank
    FROM dbo.Business AS b
    INNER JOIN CONTAINSTABLE(dbo.Business, Title, @Term) AS KEY_TBL ON b.BusinessId = KEY_TBL.[KEY]
    INNER JOIN dbo.BusinessType AS bt ON b.BusinessTypeId = bt.BusinessTypeId
    WHERE b.IsActive = 1
) AS tmp;

WITH SearchCT AS
(
    SELECT   ItemId
            ,ItemType
            ,ItemTitle
            ,LongDescription
            ,Price
            ,SearchRank
            ,ROW_NUMBER() OVER (ORDER BY SearchRank DESC) AS RowNumber
            ,COUNT(*) OVER () AS RecordCount
    FROM @Search
)
SELECT ItemId, ItemType, ItemTitle, LongDescription, SearchRank, RowNumber, RecordCount
FROM SearchCT
WHERE RowNumber BETWEEN @Skip + 1 AND (@Skip + @Take)
ORDER BY RowNumber

RETURN 0


Solution

  • I think you have two basic approaches you can take here:

    1) Aggregate the four tables into a single table and search on that table. You will need to have a unique identifier for the primary key on this table. So the table structure would be similar to the indexed view you were considering, and would look something like this:

    CREATE TABLE AggregatedTable
    (
        Id int IDENTITY(1,1) primary key,
        ItemId int,
        ItemType nvarchar(50),
        ItemTitle nvarchar(255),
        LongDescription nvarchar(max),
        IsActive int
    )
    

    You would then need to create a full-text index on the LongDescription column.

    The advantages of this approach are that you can do your full-text search on a single table in a single query, something like:

    SELECT Id, ItemId, ItemType, ct.RANK      
        FROM dbo.AggregateTable AS a INNER JOIN 
        CONTAINSTABLE (AggregateTable , *, '(light NEAR aluminum)',   1033) AS ct
            ON a.ItemId= ct.[KEY]
    WHERE IsActive = 1
    ORDER BY ct.RANK desc
    

    The disadvantages of this approach are: 1. You will have to periodically run a job to load the data from the 4 base tables into your aggregated table 2. You will be using twice the disk space

    A second approach is to keep the data in the four separate tables, and then write FTS queries that UNION the results from the four tables. You should be able to rank the results by relevance and then take the top N most relevant results. You would have to write queries something as follows:

    SELECT   p.ProductId AS ItemId, 'Product' AS ItemType, ct.RANK 'Rank'       
        FROM dbo.Product AS p INNER JOIN 
        CONTAINSTABLE (Product, *, '(light NEAR aluminum)',   1033) AS ct
            ON p.ProductId = ct.[KEY]
    WHERE p.IsActive = 1
    UNION
    SELECT   a.ArticleId AS ItemId, 'Article' AS ItemType, ct.RANK  
          CONTAINSTABLE (Article, *, '(light NEAR aluminum)',   1033) AS ct
            ON p.ProductId = ct.[KEY]  
        FROM dbo.Article AS a
        WHERE a.IsActive = 1
        ORDER BY 'Rank' DESC
    UNION ... other two tables
    

    The advantage of this approach is that you don't need to have jobs that aggregate content from the four tables into a single table.

    The disadvantage is that your querys are more complex, since they need to UNION results from four queries.

    I would lean toward the second approach. I think that is more straight-forward and easier to maintain, and the UNION queries are straight-forward to construct.