xpath xml-namespaces xmllint

xmllint --xpath for <rdf:RDF><channel><title>


I have looked through every answer on the first three pages of search results and cannot find a solution; after page 2 the questions aren't even relevant.

In this RSS feed:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
        xmlns="http://purl.org/rss/1.0/">

<channel rdf:about="https://www.myfeed.tld/">
  <title>My Feed</title>
  <link>https://www.myfeed.tld/</link>
</channel>

<item rdf:about="https://www.myfeed.tld/mypost">
  <title><![CDATA[Posting to SO SO Good]]></title>
  <link>https://www.myfeed.tld/mypost</link>
  <dc:date>2024-06-19T07:48:00-05:00</dc:date>
</item>

</rdf:RDF>

I need to get the text content of the channel title, the item title, and the item dc:date.

Based on this answer, I think I should be able to use:

xmllint --xpath "//*[local-name()='rdf:RDF']/channel/title/text()" feed.rss
xmllint --xpath "//*[local-name()='rdf:RDF']/item/title/text()" feed.rss
xmllint --xpath "//*[local-name()='rdf:RDF']/item/*[local-name()='dc:date']/text()" feed.rss

I've tried every variant, but I only get: XPath set is empty


Solution

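  • A namespace prefix is not needed when using local-name(): it returns only the part of the name after the prefix, so the root element matches 'RDF', never 'rdf:RDF'. Also, the elements without a prefix are still in the default namespace xmlns="http://purl.org/rss/1.0/", so a bare step such as /channel does not match them and local-name() must be used there too: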

    //*[local-name()='RDF']/*[local-name()='channel']/*[local-name()='title']/text()
    

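    Alternatively, the namespaces can be used directly in xmllint's shell mode. The setrootns command registers the namespaces declared on the root element and exposes the default namespace under the prefix defaultns: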

    printf "%s\n" "setrootns" "cat //rdf:RDF/defaultns:channel/defaultns:title/text()" "cat //rdf:RDF/defaultns:item/dc:date/text()" | xmllint --shell feed.rss | grep -Ev '\/ >| --'
    

    Result

    My Feed
    2024-06-19T07:48:00-05:00
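
For completeness, here is a minimal sketch of the local-name() approach applied to all three values from the question. The file name sample_feed.rss and the variable names are made up for illustration, and the feed is a trimmed copy of the one in the question. It also wraps each path in string() instead of ending it with text(). That is a substitution on my part, not what the answer above uses: string() returns the plain character data, so the CDATA title should come back without its CDATA wrapper.

```shell
# Hypothetical file name; a trimmed copy of the feed from the question.
cat > sample_feed.rss <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns="http://purl.org/rss/1.0/">
<channel rdf:about="https://www.myfeed.tld/">
  <title>My Feed</title>
</channel>
<item rdf:about="https://www.myfeed.tld/mypost">
  <title><![CDATA[Posting to SO SO Good]]></title>
  <dc:date>2024-06-19T07:48:00-05:00</dc:date>
</item>
</rdf:RDF>
EOF

# Every step uses local-name(), because every element is in some namespace
# (rdf:, dc:, or the default RSS 1.0 namespace).
root="//*[local-name()='RDF']"

# string() yields the text value of the first matching element.
channel_title=$(xmllint --xpath "string($root/*[local-name()='channel']/*[local-name()='title'])" sample_feed.rss)
item_title=$(xmllint --xpath "string($root/*[local-name()='item']/*[local-name()='title'])" sample_feed.rss)
item_date=$(xmllint --xpath "string($root/*[local-name()='item']/*[local-name()='date'])" sample_feed.rss)

printf '%s\n' "$channel_title" "$item_title" "$item_date"
```

Note that the last query matches on 'date', not 'dc:date', for the same reason that 'rdf:RDF' fails. If the feed has several items, string() only returns the first match, so the text() form or the --shell approach above is probably the better fit there.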