
Wrong output from code using isspace() test over index values


For some reason this code is not working correctly. I am trying to replace only the dashes that do not have whitespace around them. However, dashes that do have whitespace around them are still getting replaced.

    ls = []
    for idx, letter in enumerate(line):
        if letter == '-':
            ls.append(idx)
    for m in ls:
        if line[m-1].isspace() == True and line[m+1].isspace() == True:
            line = line[m].replace('-', ' @-@ ')

For example:

    If thieves came to you, if robbers by night -- oh, what disaster awaits you -- wouldn't they only steal until they had enough? If grape pickers came to you, wouldn't they leave some gleaning grapes?
    How Esau will be ransacked! How his hidden treasures are sought out example-case!

Gives:

    If thieves came to you , if robbers by night  @-@  @-@  oh , what disaster awaits you  @-@  @-@  wouldn ' t they only steal until they had enough ? If grape pickers came to you , wouldn ' t they leave some gleaning grapes ?
    How Esau will be ransacked ! How his hidden treasures are sought out example @-@ case !

Note: other tokenization is also applied to the data here, which is why punctuation and apostrophes are split off in the output.

The desired output is:

    If thieves came to you , if robbers by night -- oh , what disaster awaits you -- wouldn ' t they only steal until they had enough ? If grape pickers came to you , wouldn ' t they leave some gleaning grapes ?
    How Esau will be ransacked ! How his hidden treasures are sought out example @-@ case !

Thank you for your help!


Solution

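  • You're reassigning line inside a loop that walks over indices computed from the original string, so your indices will be wrong after the first replacement unless you fix them up manually. On top of that, line[m] is just the single character '-', so line = line[m].replace('-', ' @-@ ') throws away the rest of the line, and the condition checks that both neighbors are whitespace, which is the opposite of what you describe wanting.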

    This really is a case where you'll want to use a regular expression with a lookbehind and a lookahead:

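    import re
    
    line = "How his hidden treasures -- oh, what was the line again -- are sought out example-case!"
    # Match a '-' only when the character before it (lookbehind) and the
    # character after it (lookahead) are both non-whitespace (\S).
    # re.sub scans the original string, so replacements never shift positions.
    fixed_line = re.sub(r"(?<=\S)-(?=\S)", " @-@ ", line)
    print(fixed_line)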
    

    This outputs:

    How his hidden treasures -- oh, what was the line again -- are sought out example @-@ case!
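If you'd rather keep the index-based loop from the question, the fix is to read only from the original string and build the result separately, so no index ever shifts. A minimal sketch follows; mark_inner_dashes and its marker parameter are hypothetical names chosen for illustration. Like the regex above, it treats the start and end of the string as whitespace, so a leading or trailing dash is left alone instead of raising an IndexError or wrapping around to line[-1].

```python
def mark_inner_dashes(line: str, marker: str = " @-@ ") -> str:
    """Replace each '-' whose neighbors are both non-whitespace with marker.

    Hypothetical helper: it reads only from the original `line` and writes
    into a separate list, so indices never shift while scanning.
    """
    out = []
    for idx, ch in enumerate(line):
        if ch == "-":
            # Treat positions outside the string as whitespace, mirroring
            # the regex lookbehind/lookahead failing at the string edges.
            before = line[idx - 1] if idx > 0 else " "
            after = line[idx + 1] if idx + 1 < len(line) else " "
            if not before.isspace() and not after.isspace():
                out.append(marker)
                continue
        out.append(ch)
    return "".join(out)


line = "How his hidden treasures -- oh, what was the line again -- are sought out example-case!"
print(mark_inner_dashes(line))
# How his hidden treasures -- oh, what was the line again -- are sought out example @-@ case!
```

Collecting pieces in a list and joining once at the end also avoids rebuilding the immutable string on every replacement.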