How to use a Python regex to match a URL


I have a string:

test_string="lots of other html tags ,'https://news.sky.net/upload_files/image/2022/202209_166293.png',and still 'https://news.sky.net/upload_files/image/2022/202209_166293.jpg'"

How can I extract both complete URLs from the string using a Python regex?

I tried:

pattern = 'https://news.sky.net/upload_files/image'
result = re.findall(pattern, test_string)

This gives me the list:

['https://news.sky.net/upload_files/image','https://news.sky.net/upload_files/image']

but not the whole URLs, so I tried:

pattern = 'https://news.sky.net/upload_files/image...$png'
result = re.findall(pattern, test_string)

But that returns an empty list.


Solution

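  • You could lazily match as few characters as possible after image, up to a literal . followed by either png or jpg. (Your second attempt returns nothing because ... matches exactly three characters and $ matches only at the end of the string, so png can never match after it.)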

    import re

    test_string = "lots of other html tags ,'https://news.sky.net/upload_files/image/2022/202209_166293.png',and still 'https://news.sky.net/upload_files/image/2022/202209_166293.jpg'"
    # Dots in the domain are escaped; .*? is lazy, so each match stops at the first .png or .jpg
    pattern = r'https://news\.sky\.net/upload_files/image.*?\.(?:png|jpg)'
    print(re.findall(pattern, test_string))

    Output:

    [
     'https://news.sky.net/upload_files/image/2022/202209_166293.png',
     'https://news.sky.net/upload_files/image/2022/202209_166293.jpg'
    ]
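The lazy .*? can still run past a closing quote. If a URL under image/ ends in some other extension (say .gif) and a .png URL follows later in the text, the match starts at the first URL and keeps going until it reaches the second. A slightly tighter sketch replaces .*? with [^'\s]*?, so a match cannot cross a quote or whitespace. This assumes the URLs are always wrapped in single quotes and never contain spaces. The tricky string below is a made-up example, not from the question.

```python
import re

# Same input as in the question (split only for line length)
test_string = ("lots of other html tags ,"
               "'https://news.sky.net/upload_files/image/2022/202209_166293.png',"
               "and still "
               "'https://news.sky.net/upload_files/image/2022/202209_166293.jpg'")

# [^'\s]*? cannot consume a quote or whitespace, so a match never spills
# out of one quoted URL into the next. \b requires the extension to end at
# a word boundary (e.g. it will not stop inside ".pngx").
pattern = re.compile(r"https://news\.sky\.net/upload_files/image[^'\s]*?\.(?:png|jpg)\b")

urls = pattern.findall(test_string)
print(urls)

# Hypothetical input: a .gif URL followed by a .png URL
tricky = ("'https://news.sky.net/upload_files/image/2022/a.gif', "
          "'https://news.sky.net/upload_files/image/2022/b.png'")
loose = r"https://news\.sky\.net/upload_files/image.*?\.(?:png|jpg)"

print(re.findall(loose, tricky))   # one bogus match spanning both URLs
print(pattern.findall(tricky))     # only the .png URL
```

With the looser pattern the first match begins at the .gif URL and only ends at b.png. The bounded character class makes that attempt fail at the quote, so the regex engine moves on and finds just the .png URL.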