python string strip removing-whitespace

Full list of characters stripped by str.strip() by default


As the documentation says:

str.strip([chars])
Return a copy of the string with the leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace.

What is whitespace?

   import string
   print(repr(string.whitespace))

gives something like ' \t\n\r\x0b\x0c'.

At the same time,

'\t\n\r\f\x85\x1c\x1d\v\u2028\u2029'.strip() 

also gives '', even though \x85, \x1c, \x1d, \u2028 and \u2029 are not in string.whitespace.

So the question is: what is the full list of characters stripped by default by str.strip()?
Sorry, but GPT gives rubbish answers on this.


Solution

  • string.whitespace is documented as

    A string containing all ASCII characters that are considered whitespace. This includes the characters space, tab, linefeed, return, formfeed, and vertical tab.

    It only includes ASCII whitespace, not all whitespace.

    As documented under str.isspace,

    A character is whitespace if in the Unicode character database (see unicodedata), either its general category is Zs (“Separator, space”), or its bidirectional class is one of WS, B, or S.

    This is the definition of whitespace used by str.strip. All characters with the listed Unicode properties will be stripped from the ends of the string. The code to check for this is generated from the Unicode character database and hardcoded into the Python interpreter, so it will reflect whatever version of the Unicode character database was used to build the Python version you're running.
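    Since the exact list depends on the Unicode database your interpreter was built with, one way to get it is to ask the interpreter itself. The following is a minimal sketch, not an official recipe: it scans every code point with str.isspace, then rebuilds the same list from the properties quoted above using unicodedata. The variable names (by_isspace, by_properties, not_in_string_whitespace) are made up for illustration, and the expectation that the two lists match assumes CPython, where both come from the same database version.

```python
import string
import sys
import unicodedata

# Every code point that str.isspace() treats as whitespace on this build.
by_isspace = [chr(cp) for cp in range(sys.maxunicode + 1) if chr(cp).isspace()]

# The same list, rebuilt from the documented Unicode properties:
# general category Zs, or bidirectional class WS, B, or S.
by_properties = [
    chr(cp)
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Zs"
    or unicodedata.bidirectional(chr(cp)) in ("WS", "B", "S")
]

# Characters stripped by default that string.whitespace does not list.
not_in_string_whitespace = [c for c in by_isspace if c not in string.whitespace]

print([f"U+{ord(c):04X}" for c in by_isspace])
print([f"U+{ord(c):04X}" for c in not_in_string_whitespace])
print("matches documented properties:", by_isspace == by_properties)
print("all stripped by default:", "".join(by_isspace).strip() == "")
```

    The printed list is specific to the Python version you run it on, so it may differ slightly between releases built against different Unicode versions.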