I'm trying to get a list of the src values and the page source from a Brave image search (https://search.brave.com/images?q=). I can't tell what the problem is, because the same code works on other sites. Below you can see the code and the HTML tag that I'm trying to scrape.
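import urllib.request
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = "https://search.brave.com/images?q=lfc"
r = requests.get(url)
content = r.content
soup = BeautifulSoup(content, "html.parser")

print("\n 1) Insert into .txt\n")
# Download the page source again and save it to a text file
fp = urllib.request.urlopen(url)
mybytes = fp.read()
mystr = mybytes.decode("utf8")
fp.close()
with open("txt.txt", "w", encoding="utf8") as textFile:
    textFile.write(mystr)

print("\n 2) Check if src == true \n")
# Check whether the saved source contains the string 'src' at all
with open('txt.txt', encoding="utf8") as f:
    if 'src' in f.read():
        print(" 2) True \n")

print(" 3) Find All Img")
anchors = soup.find_all('img')
all_links = set()
with open("imgUrls.txt", "w") as textFile_1:
    for link in anchors:
        src = link.get('src')
        # Skip img tags without a src and placeholder '#' values
        if src and src != '#':
            # Resolve relative paths against the page URL
            linkText = urljoin(url, src)
            all_links.add(linkText)
            print(linkText)
            textFile_1.write(linkText + '\n')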
Below is the tag as shown in the Brave browser. It is the img tag with the class name image svelte-qd248k, and its src attribute contains a link. I want to gather all the src links from the img tags with the class name image svelte-qd248k.
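The image data is retrieved from an API rather than being part of the page's HTML, which is presumably why parsing the page itself does not find those img tags. You can get the information you need like so: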
import requests
import pandas as pd
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.60 Safari/537.17'}
r = requests.get('https://search.brave.com/api/images?q=lfc&source=web', headers=headers)
df = pd.DataFrame(r.json()['results'])
print(df)
This prints a DataFrame with 150 rows and 7 columns:
   title                                              url                                                page_age                      safe  source                      thumbnail                                          properties
0  Lfc Images Free : 546 Lfc Photos Free Royalty ...  https://fanniefrenzel.blogspot.com/2021/04/lfc...  2021-05-07T01:04:00.0000000Z  True  fanniefrenzel.blogspot.com  {'src': 'https://imgs.search.brave.com/GvA-lkD...  {'url': 'https://i.pinimg.com/originals/22/90/...
1  [76+] Lfc Wallpaper on WallpaperSafari             https://wallpapersafari.com/lfc-wallpaper/         2021-05-19T00:22:00.0000000Z  True  wallpapersafari.com         {'src': 'https://imgs.search.brave.com/H1oCsoq...  {'url': 'https://cdn.wallpapersafari.com/90/14...
2  The LFC Review - YouTube                           https://www.youtube.com/channel/UChf7tE8oAh4UK...  2020-05-28T10:59:00.0000000Z  True  YouTube                     {'src': 'https://imgs.search.brave.com/uuT_1hI...  {'url': 'https://yt3.ggpht.com/a/AATXAJx70Gsn7...
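Since the question asks for the src links specifically, here is a minimal sketch of pulling them out of the result entries. It assumes each entry is a dict whose thumbnail field holds a 'src' key and whose properties field holds a 'url' key, as the printed output above suggests. The helper name extract_image_links and the sample rows (including their placeholder URLs) are made up for illustration and are not part of the API.

```python
def extract_image_links(results):
    """Collect thumbnail and original image URLs from API-style result dicts."""
    links = []
    for item in results:
        # `or {}` guards against a missing or null field (an assumption,
        # in case some entries come back without a thumbnail).
        thumbnail = item.get("thumbnail") or {}
        properties = item.get("properties") or {}
        links.append({
            "title": item.get("title"),
            "thumbnail_src": thumbnail.get("src"),
            "original_url": properties.get("url"),
        })
    return links


# Hypothetical sample rows shaped like the output above (URLs are placeholders).
sample_results = [
    {"title": "Example A",
     "thumbnail": {"src": "https://imgs.search.brave.com/placeholder-a"},
     "properties": {"url": "https://example.com/a.jpg"}},
    {"title": "Example B", "thumbnail": None, "properties": {}},
]

for row in extract_image_links(sample_results):
    print(row["thumbnail_src"])
```

With the real response, the same helper would be called as extract_image_links(r.json()['results']), and the resulting links could then be written to a file one per line, as in the question's code.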