
Error in code that automatically opens a website and copies text from it

I have this code:

import pyperclip
import requests
from bs4 import BeautifulSoup

base_url = ""
url = base_url + "/news/world"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
articles = soup.find_all('div', class_='gs-c-promo-body')
text = ''
for article in articles:
    headline = article.find('h3', class_='gs-c-promo-heading__title')
    if headline:
        text += headline.text + '\n'
    summary = article.find('p', class_='gs-c-promo-summary')
    if summary:
        text += summary.text + '\n'
    link = article.find('a', class_='gs-c-promo-heading')
    if link:
        href = link['href']
        if href.startswith('//'):
            article_url = 'https:' + href
            article_url = base_url + href
        article_response = requests.get(article_url)
        article_soup = BeautifulSoup(article_response.text, 'html.parser')
        article_text = article_soup.find('div', class_='story-body__inner')
        if article_text:
            text += article_text.get_text() + '\n\n'

The code above is just an example. Let's say I need to copy the text of each headline and the content that belongs to it. I want to write Python code that automatically goes to a website, reproduces the headline, adds an empty line, and then adds another line with the content found under that headline.

When I run it, I get the following error:

Traceback (most recent call last):
  File "C:\Users\msala\PycharmProjects\learnPython\venv\pythonProject1\lib\site-packages\urllib3\", line 200, in _new_conn
    sock = connection.create_connection(
  File "C:\Users\msala\PycharmProjects\learnPython\venv\pythonProject1\lib\site-packages\urllib3\util\", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "C:\Users\msala\AppData\Local\Programs\Python\Python39\lib\", line 954, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\msala\PycharmProjects\learnPython\venv\pythonProject1\lib\site-packages\urllib3\", line 790, in urlopen
    response = self._make_request(
  File "C:\Users\msala\PycharmProjects\learnPython\venv\pythonProject1\lib\site-packages\urllib3\", line 491, in _make_request
    raise new_e
  File "C:\Users\msala\PycharmProjects\learnPython\venv\pythonProject1\lib\site-packages\urllib3\", line 467, in _make_request
  File "C:\Users\msala\PycharmProjects\learnPython\venv\pythonProject1\lib\site-packages\urllib3\", line 1092, in _validate_conn
  File "C:\Users\msala\PycharmProjects\learnPython\venv\pythonProject1\lib\site-packages\urllib3\", line 604, in connect
    self.sock = sock = self._new_conn()
  File "C:\Users\msala\PycharmProjects\learnPython\venv\pythonProject1\lib\site-packages\urllib3\", line 207, in _new_conn
    raise NameResolutionError(, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x000001AF8CB443D0>: Failed to resolve '' ([Errno 11001] getaddrinfo failed)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\msala\PycharmProjects\learnPython\venv\pythonProject1\lib\site-packages\requests\", line 486, in send
    resp = conn.urlopen(
  File "C:\Users\msala\PycharmProjects\learnPython\venv\pythonProject1\lib\site-packages\urllib3\", line 844, in urlopen
    retries = retries.increment(
  File "C:\Users\msala\PycharmProjects\learnPython\venv\pythonProject1\lib\site-packages\urllib3\util\", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='', port=443): Max retries exceeded with url: // (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x000001AF8CB443D0>: Failed to resolve '' ([Errno 11001] getaddrinfo failed)"))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\msala\PycharmProjects\pythonProject1\", line 26, in <module>
    article_response = requests.get(article_url)
  File "C:\Users\msala\PycharmProjects\learnPython\venv\pythonProject1\lib\site-packages\requests\", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "C:\Users\msala\PycharmProjects\learnPython\venv\pythonProject1\lib\site-packages\requests\", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\msala\PycharmProjects\learnPython\venv\pythonProject1\lib\site-packages\requests\", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\msala\PycharmProjects\learnPython\venv\pythonProject1\lib\site-packages\requests\", line 701, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\msala\PycharmProjects\learnPython\venv\pythonProject1\lib\site-packages\requests\", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='', port=443): Max retries exceeded with url: // (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x000001AF8CB443D0>: Failed to resolve '' ([Errno 11001] getaddrinfo failed)"))

Process finished with exit code 1

I have tried multiple times to fix the code, but without success.


  • You try to fetch the URL

    which is clearly wrong. It is caused by the following statement:

    article_url = base_url + href

    You should not prefix an already absolute URL with base_url. Check whether href is already a URL you can fetch directly. You can use the validators package or write your own check.
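
As a minimal sketch of the "write your own logic" option, the standard library's urllib.parse can do both the joining and a rough validity check. The helper names resolve_article_url and is_fetchable are hypothetical, and https://example.com is only a placeholder for whatever base_url you actually use:

```python
from urllib.parse import urljoin, urlparse


def resolve_article_url(base_url: str, href: str) -> str:
    """Return an absolute URL for href without double-prefixing it."""
    # urljoin leaves href untouched when it already has a scheme,
    # borrows the base's scheme for protocol-relative links ("//host/path"),
    # and resolves plain relative paths ("/news/1") against base_url.
    return urljoin(base_url, href)


def is_fetchable(url: str) -> bool:
    """Rough check: the URL needs an http(s) scheme and a non-empty host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    base = "https://example.com"  # placeholder, not a real target site
    for href in ("/news/world-1", "//example.org/a", "https://other.example/b"):
        url = resolve_article_url(base, href)
        print(href, "->", url, "| fetchable:", is_fetchable(url))
```

Inside the loop from the question, this would replace the startswith branch: build article_url with resolve_article_url(base_url, link['href']) and only call requests.get(article_url) when is_fetchable(article_url) is true. Note that an empty base_url, as in the posted code, leaves relative links unresolved, so the check would skip them instead of raising a ConnectionError.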
