I am new to SQL Server full text search.
I have a table in which a column titled description has type ntext
and it can contain data in any language.
I am about to implement full-text search, and after googling a bit I found that it is currently not easy to use the same index for multiple languages.
I was wondering what will happen if I create a full-text index for English (LCID 1033) in the database and then use that same index to search with a non-English string, when some records contain non-English data.
Will it fail completely, or will it return some data? What exactly will the behavior be?
It won't completely fail, but you will get unwanted behavior for some searches. Here are the areas I can think of where you will run into problems, though it is probably not a complete list.
- Word breaking: the rules for splitting text into words differ between languages. (In English, dog-catcher is split so that you can search on dog or catcher, but dog's is treated as one word and will not match dog.) I'm sure there are other languages where these rules are not the same, or where certain punctuation symbols play a different role, so the words will not be broken apart as expected.
- FREETEXT/FREETEXTTABLE and FORMSOF are unlikely to be useful, because they will use English synonyms and English inflectional forms.
- With NEAR, the rules for determining word distance could vary.
- Phrase searches (e.g. CONTAINS(*, '"planet earth"')) may have unpredictable results. The full-text engine will apply English language rules to how the words are parsed and how punctuation is handled. For example, when searching for "a. lincoln" in an English index, the parser will think that "a." is the end of a sentence and thus may not match the text "a lincoln" in the index. If you're dealing with a language that has different rules about how sentences end or how periods are used with abbreviations, you could run into problems. (That's just one example. There are likely more potential issues.)
- Numbers: an English index recognizes comma-grouped numbers (e.g. 1,234,567), and full text will match this to 1234567 and vice versa. If you're dealing with a language that has different number formatting rules, you could run into problems.

You may be best off using the Neutral language without a stop list.
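
As a rough sketch of how the points above could be checked and how the Neutral suggestion might look in T-SQL: the table name dbo.Items, the key index PK_Items, and the catalog ItemsCatalog are hypothetical placeholders (only the description column comes from the question). The first statement uses sys.dm_fts_parser to show how the English (1033) and Neutral (0) word breakers tokenize the same string, so you can compare them on your own data; as far as I know it needs elevated (sysadmin) permissions. The rest assumes no full-text index exists on the table yet.

```sql
-- 1. Compare tokenization under English (1033) and Neutral (0).
--    Arguments: query_string, lcid, stoplist_id, accent_sensitivity.
--    stoplist_id = NULL means no stop list is applied.
SELECT 1033 AS lcid, display_term, special_term
FROM sys.dm_fts_parser(N'"dog-catcher 1,234,567"', 1033, NULL, 0)
UNION ALL
SELECT 0 AS lcid, display_term, special_term
FROM sys.dm_fts_parser(N'"dog-catcher 1,234,567"', 0, NULL, 0);

-- 2. Create a full-text index that uses the Neutral word breaker
--    and no stop list. All object names here are hypothetical.
CREATE FULLTEXT CATALOG ItemsCatalog;

CREATE FULLTEXT INDEX ON dbo.Items
(
    [description] LANGUAGE 0      -- 0 = Neutral
)
KEY INDEX PK_Items                -- must be a unique, single-column, non-nullable index
ON ItemsCatalog
WITH STOPLIST = OFF;

-- 3. Query with the same language the index was built with,
--    so the search string is parsed by the same word breaker.
SELECT [description]
FROM dbo.Items
WHERE CONTAINS([description], N'"planet earth"', LANGUAGE 0);
```

Running the first statement with strings from the languages you actually store is probably the quickest way to see whether the Neutral word breaker splits them acceptably before you commit to it.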