Transliterator::listIDs() will list IDs, but apparently it's not a complete list.
In the example from this page, the ID looks like:
Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove; Lower();
which is kind of weird, because IDs are supposed to be unique. This looks more like a rule, but it doesn't work if I pass it to the createFromRules method :)
Anyway, I'm trying to remove any punctuation from the string, except the dash (-), or characters from a specific list.
Do you know if that's possible? Or is there some documentation that better explains the syntax for the transliterator?
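The IDs that Transliterator::listIDs() returns are the "basic IDs". The example you gave is a "compound ID": several basic IDs chained with semicolons, each optionally prefixed with a filter set such as [:Punctuation:]. You can see the ICU docs on this.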
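For the punctuation case in the question, a compound ID with a narrower filter may be enough. The following is only a sketch: it assumes the intl extension is loaded and that ICU accepts the set difference [[:Punctuation:] - [\-]] as a filter in front of Remove; the sample string is made up.

```php
<?php
// Filter: every punctuation character except the ASCII hyphen-minus.
// To keep more characters, list them in the inner set, e.g. [\-_.]
$id = '[[:Punctuation:] - [\-]] Remove';

$tr = Transliterator::create($id);
if ($tr === null) {
    // create() returns null when ICU rejects the ID
    exit('Could not create transliterator: ' . intl_get_error_message() . "\n");
}

// Made-up sample input for illustration
$input  = 'Hello, world! A well-known (simple) example.';
$result = $tr->transliterate($input);
echo $result, "\n";
```

The same filter can be dropped into the longer compound ID from the question in place of [:Punctuation:].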
You can also create your own rules with Transliterator::createFromRules().
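A compound ID is not valid rule syntax as it stands, which is probably why passing it to createFromRules() fails. In a rule set, each transform has to be written as a ":: ID ;" line. Here is a rough sketch of the question's ID rewritten that way, with the punctuation filter changed to spare the dash; the sample input is made up, and I'm assuming the intl extension is available.

```php
<?php
// The compound ID from the question, one ":: transform ;" rule per step.
// Nowdoc syntax keeps the backslash in [\-] literal.
$rules = <<<'RULES'
:: Any-Latin;
:: NFD;
:: [:Nonspacing Mark:] Remove;
:: NFC;
:: [[:Punctuation:] - [\-]] Remove;
:: Lower();
RULES;

$tr = Transliterator::createFromRules($rules, Transliterator::FORWARD);
if ($tr === null) {
    // createFromRules() returns null when the rules do not parse
    exit('Could not parse rules: ' . intl_get_error_message() . "\n");
}

// Made-up sample input for illustration
$result = $tr->transliterate('Well-Known, Really?!');
echo $result, "\n";
```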
You can take a look at the predefined rules:
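<?php
// Open ICU's transliteration resource bundle (falling back to the default data)
$a = new ResourceBundle(NULL, sprintf('icudt%dl-translit', INTL_ICU_VERSION), true);
foreach ($a['RuleBasedTransliteratorIDs'] as $name => $v) {
    // Each entry has either a 'file' or an 'internal' sub-resource
    $file = @$v['file'];
    if (!$file) {
        $file = $v['internal'];
        echo $name, " (direction $file[direction]; internal)\n";
    } else {
        echo $name, " (direction: $file[direction])\n";
        // Print the rule source for this transliterator
        echo $file['resource'];
    }
    echo "\n--------------\n";
}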
After formatting, the result looks like this.