Tags: .net, text, unicode, diacritics

How can I "flatten" text that contains macrons and umlauts in .NET?


Possible Duplicate:
How to convert a Unicode character to its ASCII equivalent
How do I remove diacritics (accents) from a string in .NET?

I need to make a search form insensitive to diacritics such as macrons and umlauts.

For example, "ŌōṒṓṐṑȪȫ" should be considered equal to "oooooooo".

In T-SQL I can get it partially working with:

select Cast('ŌōṒṓṐṑȪȫ' as varchar)

which returns "Oo??????". SQL Server is smart enough to map the first two characters to "O" and "o", but not the rest.

I tried to use this C# code to "flatten" the text, but it doesn't work at all: the result is "????????".

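using System.Text;

var text = "ŌōṒṓṐṑȪȫ";
// Re-encode the UTF-16 bytes as ASCII; every non-ASCII character becomes '?'.
var buffer = Encoding.Convert(Encoding.Unicode, Encoding.ASCII, Encoding.Unicode.GetBytes(text));

var result = Encoding.ASCII.GetString(buffer); // "????????"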

Is there a way to do this in .NET? I know I could build a map from characters such as "ŌōṒṓṐṑȪȫ" to "o" (and so on for other letters), but I'm hoping there is already a built-in way to do this.
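A common way to "flatten" text like this in .NET is to decompose it with Unicode normalization form D, which splits a character such as Ō into O plus a combining macron (U+0304), and then drop the combining marks. The sketch below assumes only the base class library; the `TextFlattener` class and `RemoveDiacritics` method names are hypothetical. It does not help with letters that have no canonical decomposition, such as ø, ł or ß.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper: strips combining diacritics after canonical decomposition.
static class TextFlattener
{
    public static string RemoveDiacritics(string text)
    {
        // FormD: 'Ō' -> 'O' + U+0304, 'Ȫ' -> 'O' + U+0308 + U+0304, and so on.
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var builder = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Combining accents (macron, acute, grave, diaeresis...) are NonSpacingMark.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                builder.Append(c);
            }
        }

        // Recompose whatever remains into its canonical composed form.
        return builder.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

With this helper, `TextFlattener.RemoveDiacritics("ŌōṒṓṐṑȪȫ")` yields "OoOoOoOo", which lower-cases to "oooooooo". Both the indexed text and the search input would need to be flattened the same way before comparing.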


Solution

  • You don't need to normalize the text: normalization is time-consuming, and there is a better option.

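    Most string comparison methods have an overload that takes a CompareOptions value. For a case- and accent-insensitive comparison, combine IgnoreCase with IgnoreNonSpace (C++/CLI syntax below; in C# it is CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace):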

    static_cast<CompareOptions>(CompareOptions::IgnoreCase | CompareOptions::IgnoreNonSpace)
    

    See the documentation for the CompareInfo class: http://msdn.microsoft.com/en-us/library/2z428sw8.aspx
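In C#, the options from this answer might be applied as below. This is a sketch that assumes the invariant culture is acceptable for your search (a specific culture, such as the user's current culture, may treat some accented letters as distinct letters); the `AccentInsensitiveSearch` class and its method names are hypothetical. `CompareInfo.IndexOf` with the same options covers "contains"-style matching, which is usually what a search form needs.

```csharp
using System.Globalization;

// Hypothetical helper: case- and accent-insensitive equality and substring search.
static class AccentInsensitiveSearch
{
    // IgnoreNonSpace ignores nonspacing combining characters such as diacritics.
    private const CompareOptions Options =
        CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace;

    // Assumption: the invariant culture's collation rules suit the data.
    private static readonly CompareInfo Collation = CultureInfo.InvariantCulture.CompareInfo;

    // True when the strings differ only in case and diacritics.
    public static bool AreEquivalent(string a, string b) =>
        Collation.Compare(a, b, Options) == 0;

    // True when 'text' contains 'query', ignoring case and diacritics.
    public static bool Contains(string text, string query) =>
        Collation.IndexOf(text, query, Options) >= 0;
}
```

Unlike the normalization approach, this builds no flattened copy of either string. Results come from the platform's collation data, so edge cases may differ between operating systems and .NET versions, and an app running in globalization-invariant mode falls back to ordinal comparison, in which case accents are no longer ignored.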