python rss feedparser

Identify itunes:keywords and itunes:category individually with feedparser?


I'm using feedparser to parse RSS feeds such as https://www.relay.fm/analogue/feed and can't work out how to explicitly identify the itunes:category values.

Looking at the feedparser iTunes tests, it appears that both the itunes:keywords and itunes:category values are put into the same feed['tags'] list (a list of dictionaries with 'term', 'scheme' and 'label' keys).

From the tests for category:

    <!--
    Description: iTunes channel category
    Expect:      not bozo and feed['tags'][0]['term'] == 'Technology'
    -->
    <rss xmlns:itunes="http://www.itunes.com/DTDs/Podcast-1.0.dtd">
        <channel>
            <itunes:category text="Technology"></itunes:category>
        </channel>
    </rss>

and then keywords:

    <!--
    Description: iTunes channel keywords
    Expect:      not bozo and feed['tags'][0]['term'] == 'Technology' and 'itunes_keywords' not in feed
    -->
    <rss xmlns:itunes="http://www.itunes.com/DTDs/Podcast-1.0.dtd">
        <channel>
            <itunes:keywords>Technology</itunes:keywords>
        </channel>
    </rss>

For the example feed above, the relevant entries are:

    <itunes:keywords>Hurley, Liss, feelings</itunes:keywords>

and

    <itunes:category text="Society &amp; Culture"/>
    <itunes:category text="Technology"/>

resulting in feed['tags'] being populated like so:

    [{'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Hurley'},
     {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Liss'},
     {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'feelings'},
     {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Society & Culture'},
     {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Technology'}]

Is there any way to uniquely identify the values that came from the itunes:category tag?


Solution

  • I couldn't find a way to do this with feedparser alone, so I used BeautifulSoup as well:

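    import urllib.request

    import bs4

    # raw_data is the feed's raw XML; fetched here with urllib for completeness
    with urllib.request.urlopen("https://www.relay.fm/analogue/feed") as resp:
        raw_data = resp.read()

    # With the lxml HTML parser the "itunes:" prefix stays part of the tag name
    soup = bs4.BeautifulSoup(raw_data, "lxml")

    def is_itunes_category(tag):
        return tag.name == 'itunes:category'

    categories = [tag.attrs['text'] for tag in soup.find_all(is_itunes_category)]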
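If you'd rather avoid the extra dependency, the standard library's xml.etree.ElementTree can also tell the two tags apart, because it keeps each element's namespace and local name. This is a minimal sketch, not feedparser's API: the helper names `split_tag` and `itunes_categories_and_keywords` are hypothetical, it assumes the iTunes namespace should be matched case-insensitively (the canonical URI is lowercase, while the feedparser tests above use a mixed-case variant), and it only looks at channel-level tags, including nested subcategories.

```python
import xml.etree.ElementTree as ET

# Assumption: canonical iTunes namespace, compared case-insensitively below
ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"


def split_tag(tag):
    """Split an ElementTree '{namespace}local' tag into (namespace, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag


def itunes_categories_and_keywords(raw_xml):
    """Hypothetical helper: return (categories, keywords) for the channel.

    Categories come from itunes:category text="..." attributes (nested
    subcategories included); keywords from the comma-separated
    itunes:keywords text.
    """
    root = ET.fromstring(raw_xml)
    channel = root.find("channel")
    categories, keywords = [], []
    if channel is None:
        return categories, keywords
    for child in channel:
        ns, local = split_tag(child.tag)
        if ns.lower() != ITUNES_NS:
            continue
        if local == "category":
            # iter() yields the element itself first, then nested subcategories
            for cat in child.iter():
                if split_tag(cat.tag)[1] == "category" and cat.get("text") is not None:
                    categories.append(cat.get("text"))
        elif local == "keywords" and child.text:
            keywords.extend(k.strip() for k in child.text.split(",") if k.strip())
    return categories, keywords


sample = """<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <itunes:keywords>Hurley, Liss, feelings</itunes:keywords>
    <itunes:category text="Society &amp; Culture"/>
    <itunes:category text="Technology"/>
  </channel>
</rss>"""

print(itunes_categories_and_keywords(sample))
# (['Society & Culture', 'Technology'], ['Hurley', 'Liss', 'feelings'])
```

Unlike feedparser's merged feed['tags'], the two lists stay separate, so a keyword that happens to match a category name (e.g. "Technology" in both) is no longer ambiguous.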