
Python encoding errors from comments containing Windows paths


I want to include Windows paths in Python script comments without causing an encoding error.

If I include a Windows path in a comment, I sometimes get an encoding error, e.g., "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa6 in position 4612: invalid start byte".
I found one "article" which indicated that including a Windows path in a comment can trigger a unicode error, https://programmersought.com/article/28013377080/.

On the other hand, sometimes I can include a Windows path in a comment without triggering a unicode error. I don't understand why some Windows paths trigger errors while others do not.

Here are a few examples of Windows paths that do or do not cause encoding errors, labelled OK or ERROR:

'''

OK      # E:\Apps\ParticlesByMarc\regularexpression_info_SAVE_aaa_.py
ERROR   # E:\Apps\UnitiesByMarc\regularexpression_info_SAVE_aaa_.py
OK      # E:\Apps\ UnitiesByMarc\regularexpression_info_SAVE_aaa_.py# File 
ERROR   # E:\ Apps\ UnitiesByMarc\xxx\regularexpression_info_SAVE_aaa_py
OK      # E:\ Apps\ UnitiesByMarc\ xxx\regularexpression_info_SAVE_aaa_py
OK      # File E:\ Apps\ UnitiesByMarc\x123x\regularexpression_info_SAVE_aaa_py

'''
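One way to reproduce the OK/ERROR labels above is to compile each line as the body of an ordinary (non-raw) triple-quoted string, which is what the lines become when they sit between ''' delimiters. This is a minimal sketch assuming Python 3; parses_as_string is a hypothetical helper name, and warnings about unrecognized escapes such as \A are silenced because they are warnings, not errors.

```python
import warnings

# The six example lines from the question (without the OK/ERROR labels).
# r"..." keeps every backslash exactly as typed in this test harness.
PATHS = [
    r"# E:\Apps\ParticlesByMarc\regularexpression_info_SAVE_aaa_.py",
    r"# E:\Apps\UnitiesByMarc\regularexpression_info_SAVE_aaa_.py",
    r"# E:\Apps\ UnitiesByMarc\regularexpression_info_SAVE_aaa_.py# File ",
    r"# E:\ Apps\ UnitiesByMarc\xxx\regularexpression_info_SAVE_aaa_py",
    r"# E:\ Apps\ UnitiesByMarc\ xxx\regularexpression_info_SAVE_aaa_py",
    r"# File E:\ Apps\ UnitiesByMarc\x123x\regularexpression_info_SAVE_aaa_py",
]


def parses_as_string(text: str) -> bool:
    """Hypothetical helper: does text compile inside a plain '''...''' literal?"""
    source = "'''" + text + "'''"
    with warnings.catch_warnings():
        # Unknown escapes such as \A or "\ " only trigger a warning; hide it.
        warnings.simplefilter("ignore")
        try:
            compile(source, "<example>", "eval")
        except SyntaxError as exc:
            print("ERROR", text, "->", exc.msg)
            return False
    print("OK   ", text)
    return True


results = [parses_as_string(p) for p in PATHS]
```

The two failures are exactly the lines containing \U followed by non-hex text (\UnitiesByMarc) and \x followed by non-hex text (\xxx); \x123x passes because \x12 is a complete escape.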

I cannot figure out what makes four of those Windows path formats OK to include in a comment, and the other two not OK.

My questions:

  1. Is there something I could do to format the comment so that I would not have to insert a space after each backslash?
  2. If there are other limits on text that can be included in a comment, where can I find a list of those limits?
  3. Where can I find the rules that identify and explain the reason for the limitations?

Any suggestions about how to find the answer would be very welcome.

Thanks, Marc


Solution

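  • A triple-quoted string isn't a comment; it's a string literal, and Python processes backslash escape sequences inside it. Lines starting with # inside the string are still part of the string, not comments. \U must be followed by exactly eight hex digits and \x by exactly two, so \UnitiesByMarc and \xxx raise a SyntaxError ("truncated \UXXXXXXXX escape" or "truncated \xXX escape"). \x123x happens to be valid (\x12 followed by "3x"), a backslash followed by a space is not a recognized escape and is kept as typed, and the \r in \regularexpression silently becomes a carriage return. Prefixing the literal with r turns it into a raw string, in which every backslash is kept as typed, so no spaces are needed.

    Such a string can also become a docstring (PEP 257):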

    A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition. Such a docstring becomes the __doc__ special attribute of that object.

    Example:

    # r''' makes the docstring a raw string, so \U and \x are kept as typed
    def somefunc(somepar):
      r'''
    This is a docstring
    
      E:\Apps\ParticlesByMarc\regularexpression_info_SAVE_aaa_.py
      E:\Apps\UnitiesByMarc\regularexpression_info_SAVE_aaa_.py
    # E:\Apps\UnitiesByMarc\regularexpression_info_SAVE_aaa_.py # File 
    # E:\Apps\UnitiesByMarc\xxx\regularexpression_info_SAVE_aaa_py
      E:\Apps\UnitiesByMarc\xxx\regularexpression_info_SAVE_aaa_py
    # File E:\Apps\UnitiesByMarc\x123x\regularexpression_info_SAVE_aaa_py
    
      '''
      print('supplied:', somepar, end='\n\n')
      # A string that is not the first statement is evaluated and discarded
      '''
    This isn't recognized as a docstring (i.e. not assigned to __doc__)
      '''
    
    
    somefunc('par')
    help(somefunc)
    

    Result of running .\SO\68553726.py:

    supplied: par
    
    Help on function somefunc in module __main__:
    
    somefunc(somepar)
        This is a docstring
    
          E:\Apps\ParticlesByMarc\regularexpression_info_SAVE_aaa_.py
          E:\Apps\UnitiesByMarc\regularexpression_info_SAVE_aaa_.py
        # E:\Apps\UnitiesByMarc\regularexpression_info_SAVE_aaa_.py # File
        # E:\Apps\UnitiesByMarc\xxx\regularexpression_info_SAVE_aaa_py
          E:\Apps\UnitiesByMarc\xxx\regularexpression_info_SAVE_aaa_py
        # File E:\Apps\UnitiesByMarc\x123x\regularexpression_info_SAVE_aaa_py
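The example above relies on the r prefix. The same idea can be checked by embedding one of the failing paths from the question in three ways: a real # comment, a plain triple-quoted string, and a raw one. This is a sketch assuming Python 3; compiles and documented are hypothetical names used only for illustration.

```python
import warnings

# One of the paths that failed in the question.
PATH = r"E:\Apps\UnitiesByMarc\xxx\regularexpression_info_SAVE_aaa_py"

# The same path embedded in source code three different ways.
SOURCES = {
    "comment": "# " + PATH + "\n",
    "plain string": "'''" + PATH + "'''\n",
    "raw string": "r'''" + PATH + "'''\n",
}


def compiles(source: str) -> bool:
    """Return True if source compiles as a module (warnings silenced)."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        try:
            compile(source, "<example>", "exec")
        except SyntaxError:
            return False
    return True


outcome = {name: compiles(src) for name, src in SOURCES.items()}
print(outcome)  # {'comment': True, 'plain string': False, 'raw string': True}


def documented():
    r"""E:\Apps\UnitiesByMarc\xxx\regularexpression_info_SAVE_aaa_py"""


# The raw docstring keeps every backslash and is stored in __doc__.
print(documented.__doc__ == PATH)  # True
```

Only the plain string fails: escape sequences are interpreted only inside non-raw string literals, so a genuine # comment or a raw string can hold backslashes without extra spaces. Comments are not completely unrestricted, though: the file as a whole must still decode in its source encoding, which is UTF-8 by default (PEP 3120) unless a coding declaration says otherwise (PEP 263). A "'utf-8' codec can't decode byte" message usually points to that kind of problem rather than to backslashes.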