
What happens in SQL Server if a full-text search uses a full-text index defined for a different language?


I am new to SQL Server full text search.

I have a table in which a column titled description has type ntext and it can contain data in any language.

I am about to implement full-text search, and after some googling I found that there is currently no easy way to use the same index for multiple languages.

I was wondering what will happen if I create a full-text index for English (LCID 1033) in the database and then use that same index to search with a non-English string, when some records contain non-English data.

Will it fail completely, or will it return some data? What exactly will the behavior be?


Solution

  • It won't completely fail, but you will get unwanted behavior for some searches. Here are the areas I can think of where you will run into problems, though this is probably not a complete list.

    1. Words will be broken apart in the index according to English rules. (ex: dog-catcher is split, so you can match it by searching on dog or catcher. But dog's is treated as one word and will not match dog.) I'm sure there are other languages where these rules differ or where certain punctuation symbols play a different role, so words will not be broken apart as expected.
    2. If you are using an English stop list, any non-English words with the same spelling as common English words (ex: is, at, as, can) will be removed from your index.
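    3. You won't be able to make meaningful use of FREETEXT/FREETEXTTABLE or FORMSOF, because they will apply English synonyms and English inflectional forms to your non-English text.
    4. When using NEAR, word distance is determined from the English word-breaking rules, so the distances between non-English words could differ from what you expect.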
    5. Searches for quoted phrases (ex: CONTAINS(*, '"planet earth"')) may have unpredictable results, because the full-text engine applies English rules when parsing words and handling punctuation. For example, when searching for "a. lincoln" in an English index, the parser will treat a. as the end of a sentence and thus may not match the text a lincoln in the index. If you're dealing with a language that has different rules about how sentences end or how periods are used with abbreviations, you could run into problems. (That's just one example. There are likely more potential issues.)
    6. Searches on numbers may have unpredictable results. For example, in English you can use a comma separator in large numbers (ex: 1,234,567), and full-text search will match this to 1234567 and vice versa. If you're dealing with a language that has different number formatting rules, you could run into problems.
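
    One way to check the word-breaking points above against your own data is the sys.dm_fts_parser function, which returns the tokens the full-text engine produces for a given string, language, and stop list. The following is only a minimal sketch: the sample strings are taken from the examples in this list, and calling the function requires sysadmin membership. No expected output is shown, since the results depend on the word breakers installed on your server.

    ```sql
    -- Tokenize sample text with the English word breaker (LCID 1033)
    -- and the system stop list (stoplist_id = 0), accent-insensitive.
    SELECT display_term, special_term, occurrence
    FROM sys.dm_fts_parser(N'"dog-catcher dog''s 1,234,567"', 1033, 0, 0);

    -- Same text with the neutral word breaker (LCID 0) and no stop list
    -- (stoplist_id = NULL), to compare how the tokens differ.
    SELECT display_term, special_term, occurrence
    FROM sys.dm_fts_parser(N'"dog-catcher dog''s 1,234,567"', 0, NULL, 0);
    ```

    Running the same calls with a few representative non-English strings from your description column should show which of the issues above actually affect you.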

    You may be best off using the Neutral language without a stop list.
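
    As a rough sketch of that suggestion, the index could be created with the neutral language (LCID 0) and the stop list turned off. The description column comes from the question; the table name dbo.Items, the key index PK_Items, and the catalog name ItemsCatalog are hypothetical placeholders, so substitute your own.

    ```sql
    -- Hypothetical catalog name; skip this if you already have a catalog.
    CREATE FULLTEXT CATALOG ItemsCatalog;

    -- dbo.Items and PK_Items are assumed names. The key index must be a
    -- unique, single-column, non-nullable index on the table.
    CREATE FULLTEXT INDEX ON dbo.Items
    (
        description LANGUAGE 0      -- 0 = neutral word breaker
    )
    KEY INDEX PK_Items
    ON ItemsCatalog
    WITH STOPLIST = OFF;            -- index every word, no stop words removed

    -- Query with the same neutral language so the search string is
    -- broken into words the same way the indexed text was.
    SELECT *
    FROM dbo.Items
    WHERE CONTAINS(description, N'"planet earth"', LANGUAGE 0);
    ```

    The trade-off is that the neutral word breaker does no language-specific stemming, so inflectional matching is not available for any language, English included.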