python web-scraping beautifulsoup python-requests brave

How to web-scrape image src values from Brave image search


I'm trying to get a list of the src values, and the page source, from a https://search.brave.com/images?q= image search. I can't figure out the problem, because the same code works on other sites. Below you can see the code and the HTML tag I'm trying to scrape.

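import urllib.request
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = "https://search.brave.com/images?q=lfc"
r = requests.get(url)
content = r.content
# Parse the HTML exactly as the server sent it (no JavaScript is run)
soup = BeautifulSoup(content, "html.parser")

print("\n 1) Insert into .txt\n")
# Save the raw page source to a text file
with urllib.request.urlopen(url) as fp:
    mystr = fp.read().decode("utf8")
with open("txt.txt", "w", encoding="utf8") as textFile:
    textFile.write(mystr)

print("\n 2) Check if src == true \n")
with open("txt.txt", encoding="utf8") as f:
    if 'src' in f.read():
        print(" 2) True \n")

print(" 3) Find All Img")
anchors = soup.find_all('img')
all_links = set()
with open("imgUrls.txt", "w", encoding="utf8") as textFile_1:
    for link in anchors:
        src = link.get('src')
        # Skip <img> tags without a src or with a '#' placeholder
        if src and src != '#':
            # Resolve relative src values against the page URL
            linkText = urljoin(url, src)
            all_links.add(linkText)
            print(linkText)
            textFile_1.write(linkText + '\n')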

Below is the relevant markup as shown in Brave's developer tools: an img tag with the class "image svelte-qd248k" whose src attribute contains a link. I want to gather all the src links from the img tags with the class "image svelte-qd248k".

[Screenshot: Brave browser tags]
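For reference, here is a minimal sketch of selecting those tags with BeautifulSoup once you have HTML that actually contains them. The markup below is made up to mirror the screenshot, and the function name image_srcs is hypothetical. The class svelte-qd248k looks like a scoped class generated by the Svelte framework, so it may change whenever the site is rebuilt; matching on it is fragile.

```python
from bs4 import BeautifulSoup

# Made-up markup mimicking what the browser's dev tools show after JavaScript has run.
html = """
<div>
  <img class="image svelte-qd248k" src="https://example.com/a.jpg">
  <img class="image svelte-qd248k" src="https://example.com/b.jpg">
  <img class="logo" src="/logo.svg">
</div>
"""


def image_srcs(html: str) -> list:
    """Return the src of every <img> that has both the 'image' and 'svelte-qd248k' classes."""
    soup = BeautifulSoup(html, "html.parser")
    # The CSS selector only matches <img> elements carrying both classes
    return [img["src"] for img in soup.select("img.image.svelte-qd248k") if img.get("src")]


print(image_srcs(html))
```

If passing r.text from the requests call above to image_srcs returns an empty list, that suggests the images are added by JavaScript after the page loads, since requests never runs scripts.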


Solution

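  • The image results aren't in the HTML that requests receives; the page fetches them from an API with JavaScript after it loads, so BeautifulSoup never sees the img tags you see in the browser. You can call that API directly and get the info you need like so: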

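    import requests
    import pandas as pd
    
    # Send a browser-like User-Agent with the request
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.60 Safari/537.17'}
    
    # Query the JSON endpoint the image-search page loads its results from
    r = requests.get('https://search.brave.com/api/images?q=lfc&source=web', headers=headers, timeout=10)
    r.raise_for_status()
    # Each entry in 'results' becomes one DataFrame row
    df = pd.DataFrame(r.json()['results'])
    print(df)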
    

    This returns a DataFrame with 150 rows x 7 columns:

    title   url page_age    safe    source  thumbnail   properties
    0   Lfc Images Free : 546 Lfc Photos Free Royalty ...   https://fanniefrenzel.blogspot.com/2021/04/lfc...   2021-05-07T01:04:00.0000000Z    True    fanniefrenzel.blogspot.com  {'src': 'https://imgs.search.brave.com/GvA-lkD...   {'url': 'https://i.pinimg.com/originals/22/90/...
    1   [76+] Lfc Wallpaper on WallpaperSafari  https://wallpapersafari.com/lfc-wallpaper/  2021-05-19T00:22:00.0000000Z    True    wallpapersafari.com {'src': 'https://imgs.search.brave.com/H1oCsoq...   {'url': 'https://cdn.wallpapersafari.com/90/14...
    2   The LFC Review - YouTube    https://www.youtube.com/channel/UChf7tE8oAh4UK...   2020-05-28T10:59:00.0000000Z    True    YouTube {'src': 'https://imgs.search.brave.com/uuT_1hI...   {'url': 'https://yt3.ggpht.com/a/AATXAJx70Gsn7...
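Since the thumbnail and properties columns hold dicts, here is a small sketch that pulls out just the links, which is what the question asked for. It assumes, based only on the output above, that each result has thumbnail['src'] (an imgs.search.brave.com link) and properties['url'] (which appears to be the original image URL). The payload is a made-up, trimmed example and the function name extract_image_links is hypothetical.

```python
import json
from typing import List, Optional, Tuple


def extract_image_links(payload: dict) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (thumbnail src, properties url) pairs for each entry in payload['results']."""
    links = []
    for item in payload.get("results", []):
        # `or {}` guards against a missing or null thumbnail/properties field
        thumb = (item.get("thumbnail") or {}).get("src")
        full = (item.get("properties") or {}).get("url")
        links.append((thumb, full))
    return links


# Made-up payload shaped like the API output shown above; the real response
# has more fields and its structure may change without notice.
sample = json.loads("""
{"results": [
  {"title": "A", "thumbnail": {"src": "https://imgs.example/t1"},
   "properties": {"url": "https://cdn.example/full1.jpg"}},
  {"title": "B", "thumbnail": {"src": "https://imgs.example/t2"},
   "properties": {}}
]}
""")

for thumb, full in extract_image_links(sample):
    print(thumb, full)
```

With the live API you would call extract_image_links(r.json()). If you'd rather stay in pandas, df['thumbnail'].str.get('src') should give the same thumbnail links, since Series.str.get also looks up dict keys. To search for other terms, passing params={'q': ..., 'source': 'web'} to requests.get lets requests handle the URL encoding.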