I'm using feedparser to parse RSS feeds such as https://www.relay.fm/analogue/feed and can't work out how to explicitly identify the itunes:category values.
Looking at the feedparser iTunes tests, it appears that both the itunes:keywords and itunes:category values are put into the feed['tags'] list, as dictionaries with the same scheme.
From the tests for category:
<!--
Description: iTunes channel category
Expect: not bozo and feed['tags'][0]['term'] == 'Technology'
-->
<rss xmlns:itunes="http://www.itunes.com/DTDs/Podcast-1.0.dtd">
<channel>
<itunes:category text="Technology"></itunes:category>
</channel>
</rss>
and then for keywords:
<!--
Description: iTunes channel keywords
Expect: not bozo and feed['tags'][0]['term'] == 'Technology' and
'itunes_keywords' not in feed
-->
<rss xmlns:itunes="http://www.itunes.com/DTDs/Podcast-1.0.dtd">
<channel>
<itunes:keywords>Technology</itunes:keywords>
</channel>
</rss>
For the example feed above the entries are:
<itunes:keywords>Hurley, Liss, feelings</itunes:keywords>
and
<itunes:category text="Society & Culture"/>
<itunes:category text="Technology"/>
resulting in feed['tags'] being populated like so:
[{'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Hurley'},
{'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Liss'},
{'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'feelings'},
{'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Society & Culture'},
{'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Technology'}]
Is there any way to tell which of these values came from the itunes:category tag rather than from itunes:keywords?
I couldn't find a way to do this with feedparser alone, so I used Beautiful Soup as well to read the text attribute of each itunes:category tag directly from the raw feed:
import bs4

# raw_data holds the raw feed XML (for example, the HTTP response body).
soup = bs4.BeautifulSoup(raw_data, "lxml")

def is_itunes_category(tag):
    return tag.name == 'itunes:category'

categories = [tag.attrs['text'] for tag in soup.find_all(is_itunes_category)]
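If adding Beautiful Soup and lxml as dependencies is undesirable, the same idea can be sketched with the standard library's xml.etree.ElementTree. This is only a minimal sketch under a few assumptions: the helper name itunes_categories and the sample string are made up for illustration, the feed is well-formed XML, and the iTunes namespace URI is matched case-insensitively because the feedparser tests quoted above declare it with different capitalisation than the lowercase form commonly seen in real feeds. Nested subcategories, if present, would be returned alongside the top-level ones.

```python
import xml.etree.ElementTree as ET

# Lowercase form of the iTunes podcast namespace URI (assumed here; the
# comparison below ignores case so mixed-case declarations also match).
ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"


def itunes_categories(raw_xml):
    """Return the text attribute of every itunes:category element."""
    root = ET.fromstring(raw_xml)
    categories = []
    for element in root.iter():
        # Namespaced ElementTree tags look like "{namespace}localname".
        if not isinstance(element.tag, str) or not element.tag.startswith("{"):
            continue
        namespace, _, local_name = element.tag[1:].partition("}")
        if namespace.lower() == ITUNES_NS and local_name == "category":
            text = element.get("text")
            if text is not None:
                categories.append(text)
    return categories


# Hypothetical sample modelled on the entries shown in the question.
sample = """<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
<channel>
<itunes:keywords>Hurley, Liss, feelings</itunes:keywords>
<itunes:category text="Society &amp; Culture"/>
<itunes:category text="Technology"/>
</channel>
</rss>"""

print(itunes_categories(sample))
```

Because this reads the XML directly, the keywords never enter the result, so there is nothing to filter out afterwards. Unlike feedparser, ElementTree raises an error on malformed XML, so a lenient parser may still be the better fit for feeds in the wild.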