I'm trying to read a picture from a website. This is my code so far:
from bs4 import BeautifulSoup
import requests
url = 'https://www.basketball-reference.com/players/h/hardeja01.html'
page_request = requests.get(url)
soup = BeautifulSoup(page_request.text,"lxml")
img_src = soup.find("div", {"class": "media-item"})
print(img_src)
# <div class="media-item"><img alt="Photo of James Harden" itemscope="image" src="https://d2cwpp38twqe55.cloudfront.net/req/201804182/images/players/hardeja01.jpg"/>\n</div>
I'm interested in the URL of the jpg image. I could write a regular expression to pull it out of the tag, but there must be an easier way.
What is the best way to extract the url of the jpg?
You can do that in several ways. This is one such approach:
import requests
from bs4 import BeautifulSoup
page = requests.get("https://www.basketball-reference.com/players/h/hardeja01.html")
soup = BeautifulSoup(page.text, 'html.parser')
image = soup.find(itemscope="image")['src']
print(image)
Output:
https://d2cwpp38twqe55.cloudfront.net/req/201804182/images/players/hardeja01.jpg
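Another way, closer to the code in the question, is to keep the lookup of the "media-item" div and then read the src attribute of the img tag inside it. The sketch below is only an illustration: it parses the markup printed in the question instead of making a network request, and the helper name extract_image_url is made up for this example. It returns None when the div or the image is missing, which may help if the page layout changes.

```python
from bs4 import BeautifulSoup

# Markup copied from the output printed in the question, so no request is needed here.
html = (
    '<div class="media-item">'
    '<img alt="Photo of James Harden" itemscope="image" '
    'src="https://d2cwpp38twqe55.cloudfront.net/req/201804182/images/players/hardeja01.jpg"/>\n'
    '</div>'
)


def extract_image_url(soup):
    """Hypothetical helper: return the src of the first img inside div.media-item, or None."""
    # CSS selector: an <img> nested anywhere under <div class="media-item">.
    img = soup.select_one("div.media-item img")
    if img is None:
        return None
    # .get() returns None instead of raising KeyError if the attribute is absent.
    return img.get("src")


soup = BeautifulSoup(html, "html.parser")
image_url = extract_image_url(soup)
print(image_url)
```

With the live page, you would build soup from page.text as in the snippet above and call the same helper.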