pythonregexcalibre

Python look-behind regex issue: Invalid regular expression: look-behind requires fixed-width pattern


I need to match a linebreak in-between double quotes, as in:

<p class="calibre1">“This is the first sentence.</p>
<p class="calibre1">And this is the second!”</p>

This would match </p> <p class="calibre1">

Now, I got this working with the regex (?<=“[^”]*)</p>\s*<p[^>]*>(?!“) but I get the error described in the title: "Invalid regular expression: look-behind requires fixed-width pattern" when I try to use it non-manually. I need this regex for the eBook management/editing program, Calibre, which uses Python for its regex engine. The regex above works for manually searching a book, but when I try to include the regex as a "common option" (run on each eBook conversion) I get that error.

I don't see how it's possible to do this without a variable width look-behind, since you can't know how long it will be from the left doublequote to the linebreak. Help would be much appreciated!


Solution

  • Python re module, as most languages (with the notable exception of .NET), doesn't support variable length lookbehind.

    Can't you use a capturing group instead ?

    “[^”]*(</p>\s*<p[^>]*>)
    

    Data in the first capturing group.