python rss feedparser

Output of feedparser in Python unexpectedly truncated


I'm writing a piece of code that parses information from an RSS feed and stores it for later research. In the case at hand I'd like to store info such as [Name, Surname, Type of Insider Trade, Price, ...].

My Problem

The string I'm trying to parse has more than 1800 characters, but the string my parser outputs only has around 330 and ends with "...". My question is: how can I adjust the maximum length of the string feedparser parses in Python? Or: why is my output truncated and not shown in its entirety when printing or storing it?

What I've tried

import feedparser
InsiderFeed = feedparser.parse("https://www.finanztreff.de/rdf_news_category-insidertrades.rss")
summary = InsiderFeed.entries[0].summary # just to give one example here instead of looping through full list
print(summary)

Output

Looks like:

Notification and public disclosure of transactions by persons discharging managerial responsibilities and persons closely associated with them 23.06.2020 / 18:37 The issuer is solely responsible for the content of this announcement. *1. Details of the person discharging managerial responsibilities / person closely associated*...

but should look like this (ignoring the line breaks \n, which feedparser seems to sanitize by default):

Notification and public disclosure of transactions by persons discharging
managerial responsibilities and persons closely associated with them

23.06.2020 / 18:37
The issuer is solely responsible for the content of this announcement.

*1. Details of the person discharging managerial responsibilities / person
closely associated*

a) Name

+++
|Name and legal form:|Krüper + Krüper Hochallee 60 GbR|
+++
*2. Reason for the notification*

a) Position / status

+++
|Person closely associated with: |
+++
|Title: |Dr. |
+++
|First name: |Manfred |
+++
|Last name(s): |Krüper |
+++
|Position: |Member of the administrative or supervisory |
| |body |
+++
b) Initial notification

*3. Details of the issuer, emission allowance market participant, auction
platform, auctioneer or auction monitor*

a) Name

++
|ENCAVIS AG|
++
b) LEI

++
|391200ECRGNL09Y2KJ67|
++
*4. Details of the transaction(s)*

a) Description of the financial instrument, type of instrument,
identification code

+++
|Type:|Share |
+++
|ISIN:|DE0006095003|
+++
b) Nature of the transaction

++
|Erwerb von neuen Aktien durch die Ausübung von 10.363 |
|Bezugsrechten im Rahmen der Aktiendividende der Encavis AG. |
|10.363 : 60,25 = 172 neue Aktien. |
++
c) Price(s) and volume(s)

+++
|Price(s) |Volume(s) |
+++
|10.845 EUR|1865.34 EUR|
+++
d) Aggregated information

+++
|Price |Aggregated volume|
+++
|10.8450 EUR|1865.3400 EUR |
+++
e) Date of the transaction

++
|2020-06-19; UTC+2|
++
f) Place of the transaction

++
|Outside a trading venue|
++

23.06.2020 The DGAP Distribution Services include Regulatory Announcements,
Financial/Corporate News and Press Releases.
Archive at www.dgap.de
Language: English
Company: ENCAVIS AG
Große Elbstraße 59
22767 Hamburg
Germany
Internet: www.encavis.com

End of News DGAP News Service

60877 23.06.2020



(END) Dow Jones Newswires

June 23, 2020 12:38 ET ( 16:38 GMT) 

using this example here http://www.finanztreff.de/news/dgap-dd-encavis-ag-english/20845911.

I have also tried to find a suitable flag / keyword to define the max length of my parsed string in the feedparser documentation but with no luck.

Looking forward to your help, it's much appreciated!


Solution

  • Got it

    So it turns out there is no issue with feedparser. The content of the website's RSS feed is simply a truncated version of the content shown on the website, as the extract of the feed below clearly shows in the <description> of each item.

    Looks like I'll have to follow the links that come with the RSS feed to get the complete content and parse that for the information I need.

    <?xml version='1.0' encoding='UTF-8'?>
    <?xml-stylesheet href='https://www.w3.org/2000/08/w3c-synd/style.css' type='text/css'?>
    <rss version='2.0' xmlns:media="https://search.yahoo.com/mrss/">
      <channel>
        <title>finanztreff.de / INSIDERTRADES </title>
        <description>News und Berichte aus der Finanzwelt von finanztreff.de</description>
        <language>de-de</language>
        <copyright>Copyright 2020 vwd netsolutions GmbH</copyright>
        <lastBuildDate>2020-06-25T12:26:48+02:00</lastBuildDate>
        <link>https://www.finanztreff.de</link>
        <image>
          <title>finanztreff.de-Logo</title>
          <url>https://www.finanztreff.de/images/finanztreff.jpg</url>
          <link>https://www.finanztreff.de</link>
        </image>
      <item>
        <title>EANS-DD: Oberbank AG / Mitteilung über Eigengeschäfte von Führungskräften gemäß Artikel 19 MAR - ANHANG</title>
        <link>http://www.finanztreff.de/news/eans-dd-oberbank-ag+mitteilung-ueber-eigengeschaefte-von-fuehrungskraeften-gemaess-artikel/20867797</link>
        <description>Directors&apos; Dealings-Mitteilung gemäß Artikel 19 MAR übermittelt durch euro adhoc mit dem Ziel einer europaweiten Verbreitung. Für den Inhalt ist der Emittent verantwortlich. Personenbezogene Daten: Mitteilungspflichtige Person: Name: Elfriede Höchtel (Natürliche Person) Grund der Mitteilungspflicht: Grund: Meldepflichtige...</description>
        <enclosure url='https:' length='' type='image/' />
        <media:keywords></media:keywords>
        <media:thumbnail url='https:' width='' height='' />
        <media:thumbnail url='https:' width='' height='' />
        <pubDate>2020-06-25T11:59:05+02:00</pubDate>
        <guid>20867797</guid>
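
One way to confirm that the truncation comes from the feed itself rather than from feedparser is to read the raw <description> elements without feedparser at all. The sketch below uses only the standard library. The helper names raw_descriptions and fetch_feed are illustrative assumptions, not part of feedparser or of the original code.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Feed URL taken from the question
FEED_URL = "https://www.finanztreff.de/rdf_news_category-insidertrades.rss"


def raw_descriptions(rss_xml):
    """Return the text of every <item>/<description> exactly as the feed delivers it."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("description") or "" for item in root.iter("item")]


def fetch_feed(url=FEED_URL):
    """Download the feed and return its raw bytes (hypothetical helper, needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()
```

Calling raw_descriptions(fetch_feed()) and printing the length of each string should then show the same short texts ending in "..." that feedparser returns, which points at the feed and not the parser.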
    

    Edit 1: The solution

    The code below gets the complete string from the website that was truncated in the RSS feed.

    import requests
    from bs4 import BeautifulSoup

    # Fetch the article page that the RSS item links to
    html_text = requests.get("http://www.finanztreff.de/news/dgap-dd-encavis-ag-english/20845911").text
    soup = BeautifulSoup(html_text, 'html.parser')
    # On this page the full announcement text sits in the element with id "newsSource56"
    print(soup.find(id="newsSource56").text)
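
To apply this to the whole feed rather than one hard-coded URL, one option is to first pick out the entries whose summary was cut off and then fetch only those pages. This is a minimal sketch, assuming a trailing "..." marks a truncated summary (as in the output shown in the question). is_truncated and links_to_fetch are hypothetical helper names. The entries only need dict-style access to "link" and "summary", which feedparser entries provide.

```python
def is_truncated(summary, marker="..."):
    """Return True if the summary ends with the truncation marker (assumed to be '...')."""
    return summary.rstrip().endswith(marker)


def links_to_fetch(entries):
    """Collect the links of all entries whose summary looks truncated."""
    return [
        entry["link"]
        for entry in entries
        if is_truncated(entry.get("summary", ""))
    ]


# Offline example: plain dicts stand in for feedparser entries
sample_entries = [
    {"link": "http://example.com/a", "summary": "Cut off text..."},
    {"link": "http://example.com/b", "summary": "Complete text."},
]
print(links_to_fetch(sample_entries))
```

With the real feed, links_to_fetch(InsiderFeed.entries) would give the URLs to pass to requests.get. The element id used above (newsSource56) has only been checked for that one article, so the lookup may need adjusting for other pages.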