I want to use a larger variety of Unicode symbols for variable names in my Python 3 scripts. What characters are acceptable to use in Python 3 variable names?
I recently started using Unicode symbols (such as Greek and Asian symbols) for code obfuscation.
According to PEP 3131, the first character of an identifier needs to belong to ID_Start, the rest to ID_Continue, defined as follows:
ID_Start is defined as all characters having one of the general categories uppercase letters (Lu), lowercase letters (Ll), titlecase letters (Lt), modifier letters (Lm), other letters (Lo), letter numbers (Nl), the underscore, and characters carrying the Other_ID_Start property. XID_Start then closes this set under normalization, by removing all characters whose NFKC normalization is not of the form ID_Start ID_Continue* anymore.
ID_Continue is defined as all characters in ID_Start, plus nonspacing marks (Mn), spacing combining marks (Mc), decimal numbers (Nd), connector punctuations (Pc), and characters carrying the Other_ID_Continue property. Again, XID_Continue closes this set under NFKC normalization; it also adds U+00B7 to support Catalan.
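If you just want to test a particular character rather than read the category tables, you can let the interpreter decide. The following is a minimal sketch, not an official recipe: the helper names can_start_identifier and can_continue_identifier are made up for illustration, and the check relies on the built-in str.isidentifier() together with unicodedata.category() to show the general category mentioned above.

```python
import unicodedata


def can_start_identifier(ch: str) -> bool:
    """Return True if the single character ch is accepted as the first character of a name."""
    return ch.isidentifier()


def can_continue_identifier(ch: str) -> bool:
    """Return True if ch is accepted after the first character of a name."""
    # Prefix a known-valid start character and let str.isidentifier() apply the rule.
    return ("a" + ch).isidentifier()


if __name__ == "__main__":
    # Greek letter, CJK ideograph, underscore, digit, middle dot (U+00B7), dollar sign
    for ch in ["α", "变", "_", "9", "\u00b7", "$"]:
        print(
            repr(ch),
            unicodedata.category(ch),
            "start:", can_start_identifier(ch),
            "continue:", can_continue_identifier(ch),
        )
```

Keep in mind that Python normalizes identifiers to NFKC while parsing, so two names that look different in the source can end up as the same variable. For example, unicodedata.normalize("NFKC", "ﬁle") gives "file".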
That's a long list (currently around 120,000 characters). Fortunately, there is a helpful project on GitHub that contains the list and a script to generate it.
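You can also build the list locally with nothing but the standard library. This is only a rough sketch and is not the script from that GitHub project: it walks every code point up to sys.maxunicode and asks str.isidentifier() whether the character is accepted. The variable names start_chars and continue_only_chars are my own, and the exact counts depend on the Unicode version bundled with your Python build.

```python
import sys

start_chars = []          # characters usable anywhere in a name, including first
continue_only_chars = []  # characters usable only after the first character

for code_point in range(sys.maxunicode + 1):
    ch = chr(code_point)
    if ch.isidentifier():
        start_chars.append(ch)
    elif ("a" + ch).isidentifier():
        # Not valid on its own, but accepted after a valid start character.
        continue_only_chars.append(ch)

print("valid as first character:", len(start_chars))
print("valid only as later character:", len(continue_only_chars))
```

The loop covers a bit over a million code points, so it takes a moment to run but needs no external data files.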