Stripping all but visible characters from copied text (Invisible control characters corrupting code)


I've copied some code from a Kindle e-book to paste into a Jupyter notebook, but Python reports errors when I try to run it. For context, I'm running the notebook in VS Code, though that is not in itself the issue. The Chrome extension I'm using to facilitate the copying is here

Here's an example of what I see in the editor after pasting text from the Kindle e-book into the notebook:

housing["income_cat"] = pd.cut(housing["median_income"], bins=[0., 1.5, 3.0, 4.5, 6., np.inf], labels=[1, 2, 3, 4, 5]) 
housing["income_cat"].hist()

The Jupyter notebook reports SyntaxError: invalid character in identifier

When I inspect the encoding in Notepad++, I see the encoding reported as UTF-8.

If I convert to UTF-8 and view as ANSI, I see the string:

housing["income_cat"] = pd.cut(housing["median_income"], bins=[0., 1.5, 3.0, 4.5, 6., np.inf], labels=[1, 2, 3, 4, 5]) housing["income_cat"].hist()

If I convert to ANSI and view as UTF-8, I see the Â replaced with the symbol xA0.

So there appears to be a control character being copied along with the text.

Is there a tool I can paste into, or a way to use Notepad++, that will strip everything except visible whitespace and text?


Solution

  • Update

    I need to apply the resolution below often enough that I made a small VS Code extension for replacing non-printing control characters (NPC):

    https://github.com/appsoftwareltd/no-control

    Hope it helps!


    The character according to this website is

    Character: Â    
    ANSI Number: 194    
    Unicode Number: 194 
    ANSI Hex: 0xC2  
    Unicode Hex: U+00C2 
    HTML 4.0 Entity: &Acirc;
    Unicode Name: Latin capital letter A with circumflex    
    Unicode Range: Latin-1 Supplement
    

    The resolution was to replace every regex match for [^\x00-\x7f] (that is, any character outside the 7-bit ASCII range) with a whitespace character.

    As found here:

    https://weblogs.asp.net/kon/finding-those-pesky-unicode-characters-in-visual-studio
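
For anyone who would rather do the same clean-up in Python than in an editor, here is a minimal sketch of the regex replacement described above. The function names `strip_non_ascii` and `describe_non_ascii` and the sample string `pasted` are made up for illustration; they do not come from the linked extension or article. The sketch also assumes the stray character is U+00A0 (no-break space), which would be consistent with the Â / xA0 observations in the question, since U+00A0 is encoded as the bytes C2 A0 in UTF-8 and byte C2 read as ANSI (Windows-1252) is Â.

```python
import re
import unicodedata

# Same pattern as the resolution above: any character outside 7-bit ASCII.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def strip_non_ascii(text, replacement=" "):
    """Replace every non-ASCII character in `text` with `replacement`."""
    return NON_ASCII.sub(replacement, text)


def describe_non_ascii(text):
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 0x7F
    ]


# Hypothetical pasted line; \u00a0 stands in for the invisible character
# (an assumption, based on the xA0 seen in Notepad++).
pasted = 'housing["income_cat"].hist()\u00a0'

# Show which characters would be replaced, then the cleaned line.
print(describe_non_ascii(pasted))
print(repr(strip_non_ascii(pasted)))

# U+00A0 as UTF-8 bytes, and how those bytes look when read as Windows-1252.
utf8_bytes = "\u00a0".encode("utf-8")
print(utf8_bytes, repr(utf8_bytes.decode("cp1252")))
```

Note that this pattern replaces every non-ASCII character, so it would also clobber legitimate accented letters or symbols inside string literals and comments. Running `describe_non_ascii` first gives a chance to check what will be changed.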