Tags: unicode, character, letters, alphabet, latin

Unicode letters with more than one alphabetic Latin character?


I'm not really sure how to express it, but I'm searching for Unicode characters that look like more than one Latin letter.

I found this in Word so far:

Any others?


Solution

  • Here are some of the characters I've found. I first did this manually by looking through some likely blocks, but I later wrote a Python script to do it automatically; you can find it at the end of this answer.

    Digraphs

    Two glyphs | Digraph | Unicode code point | HTML
    DZ, Dz, dz | DZ, Dz, dz | U+01F1 U+01F2 U+01F3 | DZ Dz dz
    DŽ, Dž, dž | DŽ, Dž, dž | U+01C4 U+01C5 U+01C6 | DŽ Dž dž
    IJ, ij | IJ, ij | U+0132 U+0133 | IJ ij
    LJ, Lj, lj | LJ, Lj, lj | U+01C7 U+01C8 U+01C9 | LJ Lj lj
    NJ, Nj, nj | NJ, Nj, nj | U+01CA U+01CB U+01CC | NJ Nj nj
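The three forms in each digraph row are the uppercase, titlecase and lowercase variants of one letter. A minimal sketch, assuming Python 3 and its bundled Unicode database, that derives all three from the case mappings (the variable names are only illustrative):

```python
# Case forms of the DZ digraph, written with escapes so the single-character
# digraphs cannot be confused with the two-letter ASCII sequences.
upper, title, lower = "\u01F1", "\u01F2", "\u01F3"  # DZ, Dz, dz digraphs

case_forms = {
    "upper": lower.upper(),   # lowercase digraph -> uppercase digraph
    "title": lower.title(),   # lowercase digraph -> titlecase digraph
    "lower": upper.lower(),   # uppercase digraph -> lowercase digraph
}

for form, ch in case_forms.items():
    print(f"{form}: U+{ord(ch):04X} {ch}")
```

Each result is still a single code point, which is what distinguishes these digraphs from the ordinary two-letter spellings.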

    Ligatures

    Non-ligature | Ligature | Unicode | HTML
    AA, aa | Ꜳ, ꜳ | U+A732, U+A733 | Ꜳ ꜳ
    AE, ae | Æ, æ | U+00C6, U+00E6 | Æ æ
    AO, ao | Ꜵ, ꜵ | U+A734, U+A735 | Ꜵ ꜵ
    AU, au | Ꜷ, ꜷ | U+A736, U+A737 | Ꜷ ꜷ
    AV, av | Ꜹ, ꜹ | U+A738, U+A739 | Ꜹ ꜹ
    AV, av (with bar) | Ꜻ, ꜻ | U+A73A, U+A73B | Ꜻ ꜻ
    AY, ay | Ꜽ, ꜽ | U+A73C, U+A73D | Ꜽ ꜽ
    et | 🙰 | U+1F670 | 🙰
    ff | ff | U+FB00 | ff
    ffi | ffi | U+FB03 | ffi
    ffl | ffl | U+FB04 | ffl
    fi | fi | U+FB01 | fi
    fl | fl | U+FB02 | fl
    OE, oe | Œ, œ | U+0152, U+0153 | Œ œ
    OO, oo | Ꝏ, ꝏ | U+A74E, U+A74F | Ꝏ ꝏ
    ſs, ſz | ẞ, ß | U+1E9E, U+00DF | ß
    st | st | U+FB06 | st
    ſt | ſt | U+FB05 | ſt
    TZ, tz | Ꜩ, ꜩ | U+A728, U+A729 | Ꜩ ꜩ
    ue | ᵫ | U+1D6B | ᵫ
    VY, vy | Ꝡ, ꝡ | U+A760, U+A761 | Ꝡ ꝡ

    There are a few other ligatures that are used for phonetic transcription but look like Latin characters:

    Non-ligature | Ligature | Unicode | HTML
    db | ȸ | U+0238 | ȸ
    dz | ʣ | U+02A3 | ʣ
    IJ, ij | IJ, ij | U+0132, U+0133 | IJ ij
    ls | ʪ | U+02AA | ʪ
    lz | ʫ | U+02AB | ʫ
    qp | ȹ | U+0239 | ȹ
    ts | ʦ | U+02A6 | ʦ
    ui | ꭐ | U+AB50 | ꭐ
    turned ui | ꭑ | U+AB51 | ꭑ

    https://en.wikipedia.org/wiki/List_of_precomposed_Latin_characters_in_Unicode#Digraphs_and_ligatures
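One way to cross-check the tables above is to search by character name instead of by block. The sketch below is only an illustration: the helper name `latin_ligatures_and_digraphs` is made up for this example, and it relies on the character names in Python's bundled Unicode database, which are not fully consistent (æ is named LATIN SMALL LETTER AE, and dz is LATIN SMALL LETTER DZ, so both are missed).

```python
import unicodedata

def latin_ligatures_and_digraphs():
    """Return characters whose name starts with LATIN and mentions
    LIGATURE or DIGRAPH (hypothetical helper, name-based heuristic)."""
    found = []
    for cp in range(0x110000):
        # The second argument is the default for unnamed code points.
        name = unicodedata.name(chr(cp), "")
        if name.startswith("LATIN") and ("LIGATURE" in name or "DIGRAPH" in name):
            found.append(chr(cp))
    return found

matches = latin_ligatures_and_digraphs()
for ch in matches:
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")
```

Because it depends on naming rather than on decompositions, this picks up œ and the phonetic digraphs such as ȸ, so it complements a decomposition-based search rather than replacing it.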


    Edit:

    There are more letterlike symbols besides ℻ and ℡, like the ones the OP found in the comments:

    ℀ ℁ ⅍ ℅ ℆ ℔ ℠ ™

    Characters made of more letters are mainly found in the CJK Compatibility block:

    [Code chart: U+3380 to U+33DF]

    Among the 3-letter-like symbols are ㎈ ㎑ ㎒ ㎓ ㎔ ㏒ ㏕ ㏖ ㏙ ㎪ ㎫ ㎬ ㎭ ㏆ ㏿ ㍱... The ones with the most characters are probably ㎉ and ㎯.
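The "most characters" guess can be checked by expanding each symbol with compatibility normalization and ranking by length. This is a rough sketch assuming Python 3; the range U+3380 to U+33DF is my choice of the Latin-looking part of the block, and the variable names are illustrative.

```python
import unicodedata

# Expand every squared abbreviation in U+3380..U+33DF with NFKD and rank
# the results by how many characters the expansion contains.
expansions = {
    chr(cp): unicodedata.normalize("NFKD", chr(cp))
    for cp in range(0x3380, 0x33E0)
}
ranked = sorted(expansions.items(), key=lambda kv: len(kv[1]), reverse=True)

for ch, text in ranked[:5]:
    print(f"U+{ord(ch):04X} {ch} -> {text!r} ({len(text)} characters)")
```

Note that some expansions contain non-letters such as the division slash or a digit, so a count of characters is not always a count of Latin letters.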

    Unicode even has code points for Roman numerals. Another 4-letter-like character can be found here: Ⅷ

    [Code chart: U+2150 to U+218F]
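As a small illustration of how the Roman numeral characters relate to plain letters, the sketch below (assuming Python 3's `unicodedata` module) prints the compatibility expansion and the numeric value for ROMAN NUMERAL ONE through TWELVE:

```python
import unicodedata

# U+2160..U+216B are ROMAN NUMERAL ONE through ROMAN NUMERAL TWELVE.
for cp in range(0x2160, 0x216C):
    ch = chr(cp)
    letters = unicodedata.normalize("NFKD", ch)   # plain Latin letters
    value = unicodedata.numeric(ch)               # numeric property
    print(f"U+{cp:04X} {ch} -> {letters} = {value:g}")

# The 4-letter-like example from the text, written as an escape.
roman_eight = unicodedata.normalize("NFKD", "\u2167")
```

Single-letter numerals such as Ⅰ, Ⅴ and Ⅹ expand to one letter only, which is why a search for multi-letter expansions skips them.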

    If ordinary digits count as well, then there are other code points for multiple digits, like ⒆ ⒇ ⓳ ⓴ in Enclosed Alphanumerics

    [Code chart: U+2460 to U+24FF]

    and in Enclosed Alphanumeric Supplement

    🅫, 🅪, 🆋, 🆌, 🆍, 🄭, 🄮, 🅊, 🅋, 🅌, 🅍, 🅎, 🅏
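Not every enclosed number expands to plain digits, which matters if you search by decomposition. A minimal sketch, assuming Python 3, comparing three different "twenty" characters (the sample list and the `rows` name are just for illustration):

```python
import unicodedata

# Circled 20, parenthesized 20, negative circled 20.
samples = ["\u2473", "\u2487", "\u24F4"]

rows = []
for ch in samples:
    expansion = unicodedata.normalize("NFKD", ch)  # unchanged if no decomposition
    value = unicodedata.numeric(ch)                # numeric property of the character
    rows.append((ch, expansion, value))
    print(f"U+{ord(ch):04X} {ch} -> {expansion!r}, numeric value {value:g}")
```

The negative circled form has a numeric value but no decomposition, so a decomposition-based search will not report it even though it visually contains two digits.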

    A few more:

    Currency symbol group

    ₧ ₨ ₶ ₯ ₠ ₢ ₷

    Miscellaneous technical group

    ⎂ ⏨

    Control pictures (you'll probably need to zoom in to see them)

    [Code chart: U+2400 to U+242F]

    Alchemical Symbols

    🜀 🜅 🜆 🜇 🜈 🝪 🝫 🝬 🝛 🝜 🝝

    Musical Symbols

    𝄶 𝄷 𝄸 𝄹 𝄉 𝄊 𝄫

    And there are the emojis 🔟 💤🆔🚾🆖🆗🔢🔡🔠 💯🆘🆎🆑™🔙🔚🔜🔝🔛📆🗓🔞

    Vertical bars may be considered uppercase i or lowercase L (like your 〷 example, which is actually the IDEOGRAPHIC TELEGRAPH LINE FEED SEPARATOR SYMBOL) and we have


    Here's the automatic script to find the multi-character letters

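    import unicodedata

    # Scan every code point and keep those whose compatibility decomposition
    # (NFKD) is a run of two or more ASCII letters.
    # str.isascii() requires Python 3.7 or later.
    for c in range(0x10FFFF + 1):
        d = unicodedata.normalize('NFKD', chr(c))
        if len(d) > 1 and d.isascii() and d.isalpha():
            print("U+%04X (%s): %s" % (c, chr(c), d))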
    

    It won't find ligatures like æ or œ, because Unicode treats them as letters in their own right rather than as typographic ligatures, so they have no compatibility decomposition. Here's the result with Unicode 11.0.0 (checked with unicodedata.unidata_version):

    U+0132 (IJ): IJ
    U+0133 (ij): ij
    U+01C7 (LJ): LJ
    U+01C8 (Lj): Lj
    U+01C9 (lj): lj
    U+01CA (NJ): NJ
    U+01CB (Nj): Nj
    U+01CC (nj): nj
    U+01F1 (DZ): DZ
    U+01F2 (Dz): Dz
    U+01F3 (dz): dz
    U+20A8 (₨): Rs
    U+2116 (№): No
    U+2120 (℠): SM
    U+2121 (℡): TEL
    U+2122 (™): TM
    U+213B (℻): FAX
    U+2161 (Ⅱ): II
    U+2162 (Ⅲ): III
    U+2163 (Ⅳ): IV
    U+2165 (Ⅵ): VI
    U+2166 (Ⅶ): VII
    U+2167 (Ⅷ): VIII
    U+2168 (Ⅸ): IX
    U+216A (Ⅺ): XI
    U+216B (Ⅻ): XII
    U+2171 (ⅱ): ii
    U+2172 (ⅲ): iii
    U+2173 (ⅳ): iv
    U+2175 (ⅵ): vi
    U+2176 (ⅶ): vii
    U+2177 (ⅷ): viii
    U+2178 (ⅸ): ix
    U+217A (ⅺ): xi
    U+217B (ⅻ): xii
    U+3250 (㉐): PTE
    U+32CC (㋌): Hg
    U+32CD (㋍): erg
    U+32CE (㋎): eV
    U+32CF (㋏): LTD
    U+3371 (㍱): hPa
    U+3372 (㍲): da
    U+3373 (㍳): AU
    U+3374 (㍴): bar
    U+3375 (㍵): oV
    U+3376 (㍶): pc
    U+3377 (㍷): dm
    U+337A (㍺): IU
    U+3380 (㎀): pA
    U+3381 (㎁): nA
    U+3383 (㎃): mA
    U+3384 (㎄): kA
    U+3385 (㎅): KB
    U+3386 (㎆): MB
    U+3387 (㎇): GB
    U+3388 (㎈): cal
    U+3389 (㎉): kcal
    U+338A (㎊): pF
    U+338B (㎋): nF
    U+338E (㎎): mg
    U+338F (㎏): kg
    U+3390 (㎐): Hz
    U+3391 (㎑): kHz
    U+3392 (㎒): MHz
    U+3393 (㎓): GHz
    U+3394 (㎔): THz
    U+3396 (㎖): ml
    U+3397 (㎗): dl
    U+3398 (㎘): kl
    U+3399 (㎙): fm
    U+339A (㎚): nm
    U+339C (㎜): mm
    U+339D (㎝): cm
    U+339E (㎞): km
    U+33A9 (㎩): Pa
    U+33AA (㎪): kPa
    U+33AB (㎫): MPa
    U+33AC (㎬): GPa
    U+33AD (㎭): rad
    U+33B0 (㎰): ps
    U+33B1 (㎱): ns
    U+33B3 (㎳): ms
    U+33B4 (㎴): pV
    U+33B5 (㎵): nV
    U+33B7 (㎷): mV
    U+33B8 (㎸): kV
    U+33B9 (㎹): MV
    U+33BA (㎺): pW
    U+33BB (㎻): nW
    U+33BD (㎽): mW
    U+33BE (㎾): kW
    U+33BF (㎿): MW
    U+33C3 (㏃): Bq
    U+33C4 (㏄): cc
    U+33C5 (㏅): cd
    U+33C8 (㏈): dB
    U+33C9 (㏉): Gy
    U+33CA (㏊): ha
    U+33CB (㏋): HP
    U+33CC (㏌): in
    U+33CD (㏍): KK
    U+33CE (㏎): KM
    U+33CF (㏏): kt
    U+33D0 (㏐): lm
    U+33D1 (㏑): ln
    U+33D2 (㏒): log
    U+33D3 (㏓): lx
    U+33D4 (㏔): mb
    U+33D5 (㏕): mil
    U+33D6 (㏖): mol
    U+33D7 (㏗): PH
    U+33D9 (㏙): PPM
    U+33DA (㏚): PR
    U+33DB (㏛): sr
    U+33DC (㏜): Sv
    U+33DD (㏝): Wb
    U+33FF (㏿): gal
    U+FB00 (ff): ff
    U+FB01 (fi): fi
    U+FB02 (fl): fl
    U+FB03 (ffi): ffi
    U+FB04 (ffl): ffl
    U+FB05 (ſt): st
    U+FB06 (st): st
    U+1F12D (🄭): CD
    U+1F12E (🄮): WZ
    U+1F14A (🅊): HV
    U+1F14B (🅋): MV
    U+1F14C (🅌): SD
    U+1F14D (🅍): SS
    U+1F14E (🅎): PPV
    U+1F14F (🅏): WC
    U+1F16A (🅪): MC
    U+1F16B (🅫): MD
    U+1F190 (🆐): DJ
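To see why the script reports ﬁ and Ĳ but not æ or œ, you can look at the raw decomposition field of each character. A minimal sketch, assuming Python 3's `unicodedata` module (the `fields` dictionary is just an illustrative name):

```python
import unicodedata

# fi ligature, IJ ligature, ae, oe, written as escapes to avoid ambiguity.
candidates = ["\uFB01", "\u0132", "\u00E6", "\u0153"]

# unicodedata.decomposition returns the decomposition mapping as a string,
# or an empty string when the character has none.
fields = {ch: unicodedata.decomposition(ch) for ch in candidates}

for ch, field in fields.items():
    print(f"U+{ord(ch):04X} {ch}: {field!r}")
```

Characters with a `<compat>` mapping are the ones NFKD expands into separate letters; an empty field means normalization leaves the character untouched.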