
Python web scraper with BeautifulSoup: my link is not going to the headline story but redirecting to the archives page

The link redirects me to an archives page with other top stories. The news segment in the link, between .com and babel, should not be there; I believe that is what sends the headline link to another page.

from bs4 import BeautifulSoup
import requests

base_url =''

source = requests.get(base_url).text

soup = BeautifulSoup(source, "html.parser")       
articles = soup.find_all(class_ = 'list-item-card post')

for article in articles:
    headline = article.h4.text.strip()
    link = base_url + article.find_all("a")[1]["href"]
    text = article.find(class_="card-text").text.strip()
    img_url = base_url+article.picture.img['src']
    print("Image " + img_url)


  • The error happens because you are concatenating your base link (which already includes /news/) to an absolute URL

    To prevent this, you can use urllib.parse.urljoin(), which resolves the href against the base URL instead of appending it to the end

    In your example this should fix the issue:

    from urllib.parse import urljoin
    link = urljoin(base_url, article.find_all("a")[1]["href"])
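
To see the difference, here is a minimal sketch comparing plain concatenation with urljoin(). The question does not show the real base_url or hrefs, so the example.com URL and the /babel/some-headline path below are made-up stand-ins; only urljoin() itself comes from the answer above.

```python
from urllib.parse import urljoin

# Hypothetical values: the real base_url and hrefs are not shown in the question.
base_url = "https://example.com/news/"
href = "/babel/some-headline"

# Plain concatenation keeps the /news/ segment and doubles the slash.
concatenated = base_url + href

# urljoin resolves the href against the base; an href with a leading
# slash replaces the whole path of the base URL.
joined = urljoin(base_url, href)

# An href without a leading slash is resolved inside /news/.
relative = urljoin(base_url, "some-headline")

# An href that is already a full URL is returned unchanged.
absolute = urljoin(base_url, "https://example.com/babel/some-headline")

print(concatenated)  # https://example.com/news//babel/some-headline
print(joined)        # https://example.com/babel/some-headline
print(relative)      # https://example.com/news/some-headline
print(absolute)      # https://example.com/babel/some-headline
```

The img_url line in the question builds its URL the same way (base_url + src), so the same urljoin() call would likely be needed there as well.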