
ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))


I have a column called url in a DataFrame. I'm trying to send a request to each of these URLs and get the <p> elements of the content. The problem occurs every time I run my script, and ALWAYS on the 7th request. If I use k+=5, the URL that failed on the previous run succeeds, but the 7th URL counting from 5 again raises this error:

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

I wish the error message were more precise; I have no clue what causes it.

This is my code:

    import requests
    from bs4 import BeautifulSoup

    # Elements to skip (not used in the loop below)
    blocklist = [
        'style',
        'script',
        'meta',
        'head',
        # other elements
    ]

    # Assumes df is an existing DataFrame with a 'url' column and a default integer index
    for k, i in enumerate(df['url']):
        # k += 5
        website_text = []
        print(df.at[k, 'url'])
        response = requests.get(i)
        soup = BeautifulSoup(response.content, 'html.parser')
        # Collect the text of every <p> element on the page
        for data in soup.find_all('p'):
            website_text.append(data.get_text())
        if website_text:
            df.at[k, 'text'] = website_text

    df.head()

This is the full error message:

---------------------------------------------------------------------------
RemoteDisconnected                        Traceback (most recent call last)
File c:\Users\user\anaconda3\envs\GDELT\Lib\site-packages\urllib3\connectionpool.py:790, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, preload_content, decode_content, **response_kw)
    789 # Make the request on the HTTPConnection object
--> 790 response = self._make_request(
    791     conn,
    792     method,
    793     url,
    794     timeout=timeout_obj,
    795     body=body,
    796     headers=headers,
    797     chunked=chunked,
    798     retries=retries,
    799     response_conn=response_conn,
    800     preload_content=preload_content,
    801     decode_content=decode_content,
    802     **response_kw,
    803 )
    805 # Everything went great!

File c:\Users\user\anaconda3\envs\GDELT\Lib\site-packages\urllib3\connectionpool.py:536, in HTTPConnectionPool._make_request(self, conn, method, url, body, headers, retries, timeout, chunked, response_conn, preload_content, decode_content, enforce_content_length)
    535 try:
--> 536     response = conn.getresponse()
    537 except (BaseSSLError, OSError) as e:

File c:\Users\user\anaconda3\envs\GDELT\Lib\site-packages\urllib3\connection.py:454, in HTTPConnection.getresponse(self)
    453 # Get the response from http.client.HTTPConnection
--> 454 httplib_response = super().getresponse()
    456 try:

File c:\Users\user\anaconda3\envs\GDELT\Lib\http\client.py:1375, in HTTPConnection.getresponse(self)
   1374 try:
-> 1375     response.begin()
   1376 except ConnectionError:

File c:\Users\user\anaconda3\envs\GDELT\Lib\http\client.py:318, in HTTPResponse.begin(self)
    317 while True:
--> 318     version, status, reason = self._read_status()
    319     if status != CONTINUE:

File c:\Users\user\anaconda3\envs\GDELT\Lib\http\client.py:287, in HTTPResponse._read_status(self)
    284 if not line:
    285     # Presumably, the server closed the connection before
    286     # sending a valid response.
--> 287     raise RemoteDisconnected("Remote end closed connection without"
...
    503 except MaxRetryError as e:
    504     if isinstance(e.reason, ConnectTimeoutError):
    505         # TODO: Remove this in 3.0.0: see #2811

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

Solution

  • I found two answers to similar questions: one for a different error message and one for the same error message.

    The issue is that the website filters out requests without a proper User-Agent, so just use a random one from MDN:

    requests.get("https://apis.digital.gob.cl/fl/feriados/2020", headers={
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
    })
    

    Or try this:

    import requests

    url = 'https://dictionary.cambridge.org/us/dictionary/english-arabic/hi'

    # Start from the default headers and override only the User-Agent
    headers = requests.utils.default_headers()
    headers.update({'User-Agent': 'My User Agent 1.0'})

    response = requests.get(url, headers=headers)
    print(response.text)
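
As a rough sketch of how either answer could be applied to the loop in the question: by default, requests identifies itself as python-requests/<version>, which some servers reject. Setting the header once on a Session sends it with every request, and catching the exception per URL keeps one bad server from stopping the whole run. The helper names (make_session, extract_paragraphs, fetch_paragraphs) and the 10-second timeout are assumptions for illustration, not part of the original answers, and there is no guarantee that every site accepts this particular User-Agent.

```python
import requests
from bs4 import BeautifulSoup

# Assumption: any browser-like User-Agent; this string is copied from the answer above.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
)


def make_session(user_agent=USER_AGENT):
    """Return a Session that sends the given User-Agent with every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def extract_paragraphs(html):
    """Return the text of every <p> element in an HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text() for p in soup.find_all("p")]


def fetch_paragraphs(session, url, timeout=10):
    """Fetch one URL; return its paragraph texts, or None if the request fails."""
    try:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()
    except requests.exceptions.RequestException as exc:
        # requests.exceptions.ConnectionError (the error in the question)
        # is a subclass of RequestException, so it is caught here too.
        print(f"Skipping {url}: {exc}")
        return None
    return extract_paragraphs(response.content)


# Hypothetical usage with the DataFrame from the question:
# session = make_session()
# df["text"] = [fetch_paragraphs(session, url) for url in df["url"]]
```

Rows whose request failed end up with None in the text column, so the failing URLs can be inspected or retried afterwards.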