pythonvariablesunicodesyntaxpython-3.x

What Unicode symbols are accepted in Python 3 variable names?


I want to use a larger variety of Unicode symbols for variable names in my Python 3 scripts. What characters are acceptable to use in Python 3 variable names?

I recently started using Unicode symbols (such as Greek and Asian symbols) for code obfuscation.


Solution

  • According to PEP 3131, the first character of an identifier needs to belong to ID_Start, the rest to ID_Continue, defined as follows:

    ID_Start is defined as all characters having one of the general categories uppercase letters (Lu), lowercase letters (Ll), titlecase letters (Lt), modifier letters (Lm), other letters (Lo), letter numbers (Nl), the underscore, and characters carrying the Other_ID_Start property. XID_Start then closes this set under normalization, by removing all characters whose NFKC normalization is not of the form ID_Start ID_Continue* anymore.

    ID_Continue is defined as all characters in ID_Start, plus nonspacing marks (Mn), spacing combining marks (Mc), decimal number (Nd), connector punctuations (Pc), and characters carryig the Other_ID_Continue property. Again, XID_Continue closes this set under NFKC-normalization; it also adds U+00B7 to support Catalan.

    That's a long list (currently around 120.000 characters) - fortunately there is a helpful project on GitHub that contains the list and a script to generate it.