php transliteration intl

Where can I find a list of IDs or rules for the PHP transliterator (Intl)?


Transliterator::listIDs() will list IDs, but apparently it's not a complete list.

In the example from this page, the ID looks like:

Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove; Lower();

which is kind of weird, because IDs are supposed to be unique. This looks more like a rule, but it doesn't work if I pass it to the createFromRules method :)

Anyway, I'm trying to remove all punctuation from the string, except the dash (-) or characters from a specific list.

Do you know if that's possible? Or is there some documentation that better explains the syntax for the transliterator?


Solution

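  • The IDs that Transliterator::listIDs() returns are the "basic IDs". The example you gave is a "compound ID": several basic IDs, each optionally preceded by a filter in square brackets, joined with semicolons. You can see the ICU docs on this.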
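
    For the specific case in the question (remove punctuation but keep the dash), a compound ID with a narrower filter may be enough. The following is a minimal sketch, assuming the intl extension is loaded; the filter `[[:Punctuation:]-[\-]]` is my own illustration of ICU's set-difference syntax, not something taken from the linked example.

    ```php
    <?php
    // A filter in square brackets limits which characters the following
    // transform may touch. This one is a set difference:
    // all punctuation, minus the hyphen-minus.
    $id = '[[:Punctuation:]-[\-]] Remove';

    $keepDash = Transliterator::create($id);
    if ($keepDash === null) {
        // create() returns null when ICU cannot parse the ID
        throw new RuntimeException(intl_get_error_message());
    }

    $result = $keepDash->transliterate('Hello, World - foo!');
    echo $result, "\n";
    ```

    To spare more characters, it should be enough to list them in the inner set, for example `[\-_.]`, again assuming standard ICU set syntax.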

    You can also create your own rules with Transliterator::createFromRules().
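
    A compound ID is not valid rule syntax as-is, which is probably why passing it to createFromRules() failed. In a rule set, each transform has to be introduced with `::`. A minimal sketch of that conversion, assuming UTF-8 source encoding and the intl extension; the sample string and the hyphen-preserving filter are illustrative assumptions:

    ```php
    <?php
    // The same pipeline as a compound ID, rewritten as rules:
    // every step becomes a ":: <id> ;" transform rule.
    $rules = ':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC; '
           . ':: [[:Punctuation:]-[\-]] Remove; :: Lower();';

    $t = Transliterator::createFromRules($rules, Transliterator::FORWARD);
    if ($t === null) {
        // createFromRules() returns null when the rules do not parse
        throw new RuntimeException(intl_get_error_message());
    }

    $clean = $t->transliterate('Déjà vu, re-run!');
    echo $clean, "\n";
    ```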

    You can take a look at the predefined rules:

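    <?php
    // Open ICU's bundled transliteration data. The bundle name assumes a
    // little-endian data package ("l") and uses the integer part of INTL_ICU_VERSION.
    $a = new ResourceBundle(null, sprintf('icudt%dl-translit', INTL_ICU_VERSION), true);

    foreach ($a['RuleBasedTransliteratorIDs'] as $name => $v) {
        // An entry holds either a 'file' or an 'internal' sub-table;
        // the @ keeps a missing 'file' entry quiet.
        $file = @$v['file'];
        if (!$file) {
            $file = $v['internal'];
            echo $name, " (direction: $file[direction]; internal)\n";
        } else {
            echo $name, " (direction: $file[direction])\n";
            // Print the rule source itself
            echo $file['resource'];
        }
        echo "\n--------------\n";
    }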
    

    After formatting, the result looks like this.