python xml elementtree atom-feed

How to get an href attribute value from XML content (Atom feed)?


I'm saving the body of a GET request (an Atom feed, i.e. XML) as content = response.text, and the content looks like this:

<feed xmlns="http://www.w3.org/2005/Atom">
    <title type="text">title-a</title>
    <subtitle type="text">content: application/abc</subtitle>
    <updated>2021-08-05T16:29:20.202Z</updated>
    <id>tag:tag-a,2021-08:27445852</id>
    <generator uri="uri-a" version="v-5.1.0.3846329218047">abc</generator>
    <author>
        <name>name-a</name>
        <email>email-a</email>
    </author>
    <link href="url-a" rel="self"/>
    <link href="url-b" rel="next"/>
    <link href="url-c" rel="previous"/>
</feed>

How can I get the value "url-b" of the href attribute with rel="next"?

I tried it with the ElementTree module, for example:

import requests
from xml.etree import ElementTree

response = requests.get("myurl", headers={"Authorization": f"Bearer {my_access_token}"})
content = response.text

tree = ElementTree.fromstring(content)

tree.find('.//link[@rel="next"]')
# or
tree.find('./link').attrib['href']

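but neither worked: the first find() returns None, and the second raises an AttributeError because find() returns None there as well.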

I appreciate any help and thank you in advance.

If there is an easier, simpler solution (maybe feedparser) I welcome that too.


Solution

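  • How can I get the value "url-b" of the href attribute with rel="next"?

    The feed declares a default namespace (xmlns="http://www.w3.org/2005/Atom"), so every element in it, including link, belongs to that namespace. ElementTree matches on the fully qualified tag name, which is why the unqualified 'link' paths find nothing. Put the namespace URI in braces in front of the tag: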

    from xml.etree import ElementTree as ET
    
    xml = '''<feed xmlns="http://www.w3.org/2005/Atom">
        <title type="text">title-a</title>
        <subtitle type="text">content: application/abc</subtitle>
        <updated>2021-08-05T16:29:20.202Z</updated>
        <id>tag:tag-a,2021-08:27445852</id>
        <generator uri="uri-a" version="v-5.1.0.3846329218047">abc</generator>
        <author>
            <name>name-a</name>
            <email>email-a</email>
        </author>
        <link href="url-a" rel="self"/>
        <link href="url-b" rel="next"/>
        <link href="url-c" rel="previous"/>
    </feed>'''
    
    root = ET.fromstring(xml)
    # The tag must carry the Atom namespace URI in braces to match.
    links = root.findall('.//{http://www.w3.org/2005/Atom}link[@rel="next"]')
    for link in links:
        print(link.attrib["href"])
    

    Output:

    url-b
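
As a variation on the same idea, the namespace URI can be kept out of the path by passing a prefix map to find(). This is only a sketch: the helper name find_link_href, the prefix "atom", and the shortened sample feed are illustrative choices, not part of the original question or answer.

```python
from xml.etree import ElementTree as ET

# Prefix-to-URI map; the prefix "atom" is an arbitrary label chosen here.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def find_link_href(xml_text, rel):
    """Return the href of the first <link> with the given rel, or None."""
    root = ET.fromstring(xml_text)
    # "atom:link" is expanded to "{http://www.w3.org/2005/Atom}link".
    link = root.find(f'atom:link[@rel="{rel}"]', ATOM_NS)
    return link.get("href") if link is not None else None


# Shortened version of the feed from the question (hypothetical sample).
sample = """<feed xmlns="http://www.w3.org/2005/Atom">
    <link href="url-a" rel="self"/>
    <link href="url-b" rel="next"/>
    <link href="url-c" rel="previous"/>
</feed>"""

print(find_link_href(sample, "next"))     # url-b
print(find_link_href(sample, "missing"))  # None
```

Returning None when no matching link exists is convenient for pagination, since a feed's last page typically has no rel="next" link. With the real response, the call would be find_link_href(response.text, "next").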