python regex calibre

Regex Pattern to Match Character Not Surrounded by Particular Characters


I'm trying to use a regular expression to match a certain character only when it isn't immediately adjacent to certain other characters. (This is for an eBook in Calibre.)

Specifically, I want to match all ” that aren't at the end of a sentence, which means they will be between regular characters, not next to an angle bracket or space. I thought ”[^<] would work, but that selects both the quotation mark and the next character, not just the quotation mark itself. I'm also not sure how to do an OR to check for a space. I'm assuming it would be something like ”[^<]|[^ ] but that's not right either.

Here's an example of what I would like to match:

Beside angle bracket: <p class="calibre1">“I”m tired!”</p>

Beside space: <p class="calibre1">“I”m tired!” he said</p>

Only the quotation mark within I”m should be selected (and only the quotation mark itself).

I'm sorry if there's an obvious answer for this, but I've been reading over Python's regex documentation and I can't figure it out. :(


Solution

  • You could use a negative lookahead, (?! ... ), like this:

    ”(?!<)
    

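    This matches ” only when it is not followed by <. A lookahead checks the next character without consuming it, so only the quotation mark itself is selected.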

    To also rule out a following space, put both characters in a character class:

    ”(?![< ])
    

    That one matches unless ” is followed by < or a space.
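
To check the second pattern outside Calibre, here is a minimal Python sketch run against the two sample lines from the question. The replacement with ’ is an assumption about the end goal (the question only asks to select the character), and the names `pattern`, `samples`, and `fixed` are made up for illustration.

```python
import re

# Match a right double quotation mark only when the next character
# is neither "<" nor a space. The lookahead consumes nothing, so each
# match is exactly one character long.
pattern = re.compile(r"”(?![< ])")

# The two example lines from the question.
samples = [
    '<p class="calibre1">“I”m tired!”</p>',
    '<p class="calibre1">“I”m tired!” he said</p>',
]

# Assumption: the misplaced ” should become an apostrophe.
fixed = [pattern.sub("’", line) for line in samples]

for line in fixed:
    print(line)
```

One thing to keep in mind: a negative lookahead also succeeds when nothing follows at all, so a ” that is the very last character of the text would be matched too.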