Regex is not working in Python Playwright page.wait_for_url()?


I found a strange difference between the Python and JavaScript versions of page.wait_for_url() / page.waitForURL() when matching a URL with a regex.

In the Python version, this code doesn't work:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://playwright.dev/python/docs/api/class-page")
    page.wait_for_url(r"docs/api")
    browser.close()

In the JavaScript version, it works fine:

// @ts-check
const playwright = require('playwright');

(async () => {
  // Try to add 'playwright.firefox' to the list ↓
  for (const browserType of [playwright.chromium, playwright.webkit]) {
    const browser = await browserType.launch();
    const context = await browser.newContext();
    const page = await context.newPage();
    await page.goto('https://playwright.dev/python/docs/api/class-page');
    await page.waitForURL(/docs\/api/);
    await browser.close();
  }
})();

Solution

  • r"" strings are raw strings in Python, not regex strings. Many functions in the re library accept plain strings as patterns, and those patterns often contain backslashes, so it's common to see r"" strings alongside regex. The r prefix only makes every \ character literal, so you can write \b rather than \\b; the result is still an ordinary string.

    Instead of using a compiled regex as described in hardcoded's answer, I'd suggest using a glob pattern such as **/docs/api/*, which works in both Python

    page.wait_for_url("**/docs/api/*")
    

    and JS

    await page.waitForURL("**/docs/api/*");
    

    To me, this reads more easily: in Python there's no re.compile() call or r prefix to remember, and in JS there's no backslash escaping. It's also less prone to false positives caused by unintended regex metacharacters.

    If you pass in a plain string (raw or otherwise) rather than a regex, Playwright assumes you're using this "glob" syntax, which is mostly a normal string with asterisks for skipping one or more path segments. Because r"docs/api" contains no asterisks, page.wait_for_url(r"docs/api") waits for an exact match that never happens, hence the timeout.
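
To make the distinction concrete, here is a minimal sketch of both options side by side. The names URL, DOCS_API, and wait_for_docs_api are made up for illustration, and the helper assumes it receives a Playwright sync-API page that has already navigated to the URL from the question.

```python
import re

# URL taken from the question.
URL = "https://playwright.dev/python/docs/api/class-page"

# The r prefix only changes how backslashes are read; the value is a plain str.
assert r"docs/api" == "docs/api"
assert isinstance(r"docs/api", str)

# re.compile() returns a pattern object, which is what marks the argument as a regex.
DOCS_API = re.compile(r"docs/api")
assert isinstance(DOCS_API, re.Pattern)
assert DOCS_API.search(URL) is not None


def wait_for_docs_api(page):
    """Hypothetical helper: wait for the docs URL using each working form.

    `page` is assumed to be a playwright.sync_api Page already on URL.
    """
    page.wait_for_url(DOCS_API)         # compiled regex
    page.wait_for_url("**/docs/api/*")  # glob string
```

Inside the question's with sync_playwright() block, calling wait_for_docs_api(page) after page.goto(URL) would exercise both forms; either call alone should be enough in practice.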