I'm implementing an alphabetic search based on telephone keypad, like Phone keypad1
When user types , say 2, I get {A, B, C} in the combination. When user types 23, I get {AD, AE, AF, BD, BE, BF, CD, CE, CF} in the combinations, and so on. If I keep typing and make combinations I get thousands of combinations which make search process quite slow. So now I want to implement an algorithm which delete illogical combinations like CF BD CD, I mean logically no one's name starts with these combinations, perhaps two consonants without vowel. So this way I want to narrow down my search. Anyone knowing about such state machine, implemented in C?
Keep in mind that when it comes to linguistic data "illogical" is not a good proxy for "unlikely." This is particularly true when it comes to names. As an example, according to a standard definition of "consonant" in English, my last name starts with four consonants. If it were to be written after a German fashion, it would start with five. When thinking about such issues it is useful to keep in mind that:
m
plays the role of the syllabic nucleous, like the vowel generally do in English. So a Berber name can look like CCmC
(where C stands for consonants) and be perfect in that language. It is not unlikely that a person of Berber origin would then use the similar orthography in English, which a naive system would rule out as "illogical"sh
represents one sound, see point 1), but this is particularly true when transcribing foreign words.So unless you know very well the orthographic rules for the names you are expecting, you are likely to rule out legitimate names using a naive system.