Grabbing a specific url from a webpage with re and requests


import requests, re

r = requests.get('https://example.com')
p = re.compile(r'\d')

print(p.match(str(r.text)))

This always prints None, even though r.text definitely contains digits, yet print(p.match('12345')) works. What do I need to do to r.text to make it work with p.match()? Casting it to str clearly isn't enough.


Solution

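  • This happens because match (both re.match and Pattern.match) only checks for a match at the beginning of the string, and r.text does not start with a digit. Casting is not the issue: r.text is already a str, so wrapping it in str() changes nothing.

    If you want the first match anywhere in the string, use search instead: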

    import requests, re
    
    r = requests.get('https://example.com')
    p = re.compile(r'\d')
    
    print(p.search(r.text))
    

    Output:

    <re.Match object; span=(88, 89), match='8'>
    

    From the docs:

    Pattern.match: If zero or more characters at the beginning of string match this regular expression, return a corresponding Match.

    Pattern.search: Scan through string looking for the first location where this regular expression produces a match, and return a corresponding Match.
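To see the difference between the two methods without a network request, here is a small offline sketch. The string `text` is a made-up stand-in for `r.text`, not the real contents of example.com; `findall` is included to show how to collect every digit rather than just the first.

```python
import re

# Raw string so the backslash in \d reaches the regex engine unchanged.
p = re.compile(r'\d')

# Hypothetical stand-in for r.text: HTML that does not start with a digit.
text = '<!doctype html><title>Example</title><p>Call 555-0100</p>'

# match() only tries position 0, where '<' is not a digit.
print(p.match(text))              # None

# search() scans forward and returns the first position that matches.
m = p.search(text)
print(m.group(), m.start())       # 5 45

# match() succeeds when the string really does start with a digit.
print(p.match('12345').group())   # 1

# findall() returns every non-overlapping match as a list of strings.
print(p.findall(text))            # ['5', '5', '5', '0', '1', '0', '0']
```

In short, match is only the right choice when you want the pattern anchored at the start of the string; for "find it somewhere in the page", use search (first hit) or findall/finditer (all hits).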