objective-c string url

Remove apostrophe in CFStringTransform results


I'm converting a Russian (or any other language) string into a good-looking Latin string to use in a URL like example.com/obezd-pedestala.

I use this code:

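// buffer is an NSMutableString holding the input text
CFMutableStringRef bufferRef = (__bridge CFMutableStringRef)buffer;
// Transliterate to the Latin script
CFStringTransform(bufferRef, NULL, kCFStringTransformToLatin, false);
// Strip combining marks and diacritics left over from transliteration
CFStringTransform(bufferRef, NULL, kCFStringTransformStripCombiningMarks, false);
CFStringTransform(bufferRef, NULL, kCFStringTransformStripDiacritics, false);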

If I pass a string like Объезд пьедестала in buffer, I get Obʺezd pʹedestala. The letter ъ is replaced by ʺ and ь is replaced by ʹ.

I could use stringByAddingPercentEscapesUsingEncoding to get a valid URL, of course, but that would not give me the good-looking URL I want.

How can I remove these quote-like marks, and any other stray characters, from the resulting string?


Solution

  • The docs for CFStringTransform() note that it can take "any valid ICU transform ID defined in the ICU User Guide for Transforms". From that and a bit of knowledge about Unicode categories, I came up with the following, which will strip such odd characters from the string:

    CFStringTransform(bufferRef, NULL, CFSTR("[^[:Latin:][:space:][:number:]] Remove"), false);
    

    Apparently, kCFStringTransformToLatin does not restrict its output to characters in the Latin script, which is why ʺ and ʹ survive it. The above transform removes any character that is not in the union of the Latin, space, and number sets. Note that this also removes punctuation such as hyphens, so you could customize it further with different character sets if you have different needs.
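
Putting the question's transforms and the removal transform together, a complete slug helper might look like the sketch below. The function name URLSlugFromString is hypothetical, and the lowercasing and hyphen-joining steps are assumptions about the desired URL format rather than part of the original answer. It also assumes ARC is enabled.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: the name and the lowercase/hyphen steps are
// illustrative assumptions, not part of the original answer.
static NSString *URLSlugFromString(NSString *input) {
    NSMutableString *buffer = [input mutableCopy];
    CFMutableStringRef bufferRef = (__bridge CFMutableStringRef)buffer;

    // Transliterate to Latin, then drop combining marks such as accents.
    CFStringTransform(bufferRef, NULL, kCFStringTransformToLatin, false);
    CFStringTransform(bufferRef, NULL, kCFStringTransformStripCombiningMarks, false);

    // Remove everything outside the Latin, space, and number sets.
    CFStringTransform(bufferRef, NULL,
                      CFSTR("[^[:Latin:][:space:][:number:]] Remove"), false);

    // Lowercase, split on whitespace, and join the non-empty words with hyphens.
    NSMutableArray<NSString *> *words = [NSMutableArray array];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    for (NSString *word in [[buffer lowercaseString]
                               componentsSeparatedByCharactersInSet:whitespace]) {
        if (word.length > 0) {
            [words addObject:word];
        }
    }
    return [words componentsJoinedByString:@"-"];
}
```

Given the intermediate result reported in the question, calling URLSlugFromString(@"Объезд пьедестала") should produce obezd-pedestala. Splitting on whitespace after the removal step keeps runs of spaces from turning into repeated hyphens.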