
Feedparser errors during SSL read operation when accessing NASDAQ RSS feeds


Environment: Python 3.12, feedparser 6.0.11, with ca-certificates installed.

When I try to read this RSS feed, https://www.nasdaq.com/feed/rssoutbound?category=Financial+Advisors, feedparser blocks inside an SSL read and never returns. After interrupting it with Ctrl+C I get this traceback:

https://www.nasdaq.com/feed/rssoutbound?category=Innovation
^CTraceback (most recent call last):
  File "/home/nckr/kiwichi/read.py", line 78, in <module>
    NewsFeed = feedparser.parse(url)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/home/nckr/kiwichi/venv/lib/python3.12/site-packages/feedparser/api.py", line 216, in parse
    data = _open_resource(url_file_stream_or_string, etag, modified, agent, referrer, handlers, request_headers, result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nckr/kiwichi/venv/lib/python3.12/site-packages/feedparser/api.py", line 115, in _open_resource
    return http.get(url_file_stream_or_string, etag, modified, agent, referrer, handlers, request_headers, result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nckr/kiwichi/venv/lib/python3.12/site-packages/feedparser/http.py", line 171, in get
    f = opener.open(request)
        ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/urllib/request.py", line 515, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/urllib/request.py", line 532, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/urllib/request.py", line 492, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/lib/python3.12/urllib/request.py", line 1392, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/urllib/request.py", line 1348, in do_open
    r = h.getresponse()
        ^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/http/client.py", line 1428, in getresponse
    response.begin()
  File "/usr/lib/python3.12/http/client.py", line 331, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/http/client.py", line 292, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/socket.py", line 707, in readinto
    return self._sock.recv_into(b)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/ssl.py", line 1252, in recv_into
    return self.read(nbytes, buffer)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/ssl.py", line 1104, in read
    return self._sslobj.read(len, buffer)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt

Following the answer to "Feedparser.parse() 'SSL: CERTIFICATE_VERIFY_FAILED'", I set an unverified SSL context at the beginning of the script, but I still get this error:

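import ssl

# Disable certificate verification for every HTTPS connection that uses the default context
if hasattr(ssl, '_create_unverified_context'):
    ssl._create_default_https_context = ssl._create_unverified_context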

I'm assuming feedparser picks up this global SSL setting automatically, so I don't need to pass a context to it explicitly.
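For reference, a minimal sketch of passing the SSL settings explicitly instead of patching the `ssl` module globally. It relies on feedparser's `handlers` argument, which is handed to `urllib.request.build_opener`; the helper name `make_https_handler` is hypothetical. Note that the traceback stops in `getresponse()`, i.e. after the TLS handshake and the request have already gone through, so certificate verification is probably not the culprit and disabling it is not expected to help. Keeping verification on is the safer default.

```python
import ssl
import urllib.request


def make_https_handler(verify: bool = True) -> urllib.request.HTTPSHandler:
    """Return an HTTPSHandler bound to an explicit SSLContext.

    Hypothetical helper: passing the handler to feedparser.parse(..., handlers=[...])
    scopes the SSL settings to that call instead of changing them process-wide.
    """
    context = ssl.create_default_context()  # system CA store, verification on
    if not verify:
        # Roughly what ssl._create_unverified_context() gives you. check_hostname
        # must be turned off before verify_mode can be set to CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return urllib.request.HTTPSHandler(context=context)
```

Usage would look like `feedparser.parse(url, handlers=[make_https_handler()])`.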

Is there another workaround, or any way to get more information about the actual error than just an SSL read error? It could also be a timeout. However, I can fetch the URL successfully with curl on the command line.


Solution

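  • Based on my repro steps, I found that urllib, which feedparser uses to download feeds, only speaks HTTP/1.1, so it does not work for feeds whose server requires HTTP/2 and does not answer HTTP/1.1 requests. That would also explain why curl succeeds: for HTTPS it negotiates HTTP/2 by default when built with HTTP/2 support.

    So I switched to fetching the feed with pycurl and passing the response body to feedparser, and everything works. Hope it helps someone.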
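A minimal sketch of that approach, assuming pycurl is installed and linked against a libcurl built with HTTP/2 (nghttp2) support. The function name `fetch_feed_http2` and the 30-second timeout are hypothetical choices, not the exact code used above. It asks libcurl for HTTP/2, collects the body in memory and returns it as bytes, which `feedparser.parse()` accepts directly.

```python
import io


def fetch_feed_http2(url: str, timeout: int = 30) -> bytes:
    """Fetch url with libcurl, preferring HTTP/2, and return the raw body.

    Assumes pycurl is available and libcurl was built with HTTP/2 support.
    """
    import pycurl  # imported here so the helper can be defined without pycurl

    buf = io.BytesIO()
    curl = pycurl.Curl()
    try:
        curl.setopt(pycurl.URL, url)
        curl.setopt(pycurl.WRITEDATA, buf)        # collect the response body
        curl.setopt(pycurl.FOLLOWLOCATION, True)  # follow redirects
        curl.setopt(pycurl.TIMEOUT, timeout)      # overall limit in seconds
        # Request HTTP/2; over HTTPS libcurl negotiates it via ALPN and falls
        # back to HTTP/1.1 if the server does not offer it.
        curl.setopt(pycurl.HTTP_VERSION, pycurl.CURL_HTTP_VERSION_2_0)
        curl.perform()
        status = curl.getinfo(pycurl.RESPONSE_CODE)
    finally:
        curl.close()

    if not 200 <= status < 300:
        raise RuntimeError(f"unexpected HTTP status {status} for {url}")
    return buf.getvalue()
```

Usage would then be something like `feed = feedparser.parse(fetch_feed_http2("https://www.nasdaq.com/feed/rssoutbound?category=Innovation"))`. Keeping the download separate from parsing also means the timeout and status check live in one place, independent of feedparser's own HTTP handling.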