xslt atom-feed

Trying to parse the XHTML summary section of an Atom feed using XSLT


I am trying to take a NOAA feed (the NOAA site says it uses ATOM and CAPS) and convert it for SharePoint using XSLT. I am new to this and have limited experience with XSLT. Here is a sample of the feed:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" 
xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" 
xmlns:georss="http://www.georss.org/georss">
<id>urn:uuid:9ae4ae29-830f-4870-bace-0f70984b76bd</id><title>        
TSUNAMI INFORMATION STATEMENT NUMBER   1        </title>
<updated>2022-01-29T03:00:32Z</updated>
<author>
  <name>NWS PACIFIC TSUNAMI WARNING CENTER HONOLULU HI</name>
 <uri>http://ntwc.arh.noaa.gov/</uri>
 <email>ntwc@noaa.gov</email>
 </author>
 <icon>http://ntwc.arh.noaa.gov/images/favicon.ico</icon>
 <link type="application/atom+xml" rel="self" title="self" 
 href="http://ntwc.arh.noaa.gov/events/xml/PAAQAtom.xml"/>
 <link rel="related" title="Energy Map"  
 <entry>
 <title>KERMADEC ISLANDS REGION</title><updated>2022-01-29T03:00:32Z</updated>
 <geo:lat>-29.751</geo:lat>
 <geo:long>-174.709</geo:long>
 <summary type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
    <strong>Category:</strong> Information<br/>
    <strong>Bulletin Issue Time: </strong> 2022.01.29 03:00:32 UTC 
    <br/><strong>Preliminary Magnitude: </strong>6.6(Mwp)<br/> 
    <strong>Lat/Lon: </strong>-29.751 / -174.709<br/>
    <strong>Affected Region: </strong>KERMADEC ISLANDS REGION<br/>
</div>
</summary>
</entry>
</feed>

My problem is converting the <summary type="xhtml"> section into a readable format like the one below, with one field per line, instead of one long run-on sentence.

CATEGORY: Information
BULLETIN ISSUE TIME: 
PRELIMINARY MAGNITUDE:

Can someone provide me with some suggestions on how to parse the information in XSLT?

Thank you in advance.


Solution

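  • AFAIK there is no standard format for the contents of an Atom summary, so any solution depends on the layout your provider happens to use. If your data provider adheres to the format shown in your example (each label in a strong element, followed by its value as a text node), then, given a well-formed XML input (your sample with the truncated link rel="related" element left out) such as: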

    XML

    <?xml version="1.0" encoding="UTF-8"?>
    <feed xmlns="http://www.w3.org/2005/Atom" 
    xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" 
    xmlns:georss="http://www.georss.org/georss">
    <id>urn:uuid:9ae4ae29-830f-4870-bace-0f70984b76bd</id><title>        
    TSUNAMI INFORMATION STATEMENT NUMBER   1        </title>
    <updated>2022-01-29T03:00:32Z</updated>
    <author>
      <name>NWS PACIFIC TSUNAMI WARNING CENTER HONOLULU HI</name>
     <uri>http://ntwc.arh.noaa.gov/</uri>
     <email>ntwc@noaa.gov</email>
     </author>
     <icon>http://ntwc.arh.noaa.gov/images/favicon.ico</icon>
     <link type="application/atom+xml" rel="self" title="self" 
     href="http://ntwc.arh.noaa.gov/events/xml/PAAQAtom.xml"/>
     <entry>
     <title>KERMADEC ISLANDS REGION</title><updated>2022-01-29T03:00:32Z</updated>
     <geo:lat>-29.751</geo:lat>
     <geo:long>-174.709</geo:long>
     <summary type="xhtml">
        <div xmlns="http://www.w3.org/1999/xhtml">
        <strong>Category:</strong> Information<br/>
        <strong>Bulletin Issue Time: </strong> 2022.01.29 03:00:32 UTC 
        <br/><strong>Preliminary Magnitude: </strong>6.6(Mwp)<br/> 
        <strong>Lat/Lon: </strong>-29.751 / -174.709<br/>
        <strong>Affected Region: </strong>KERMADEC ISLANDS REGION<br/>
    </div>
    </summary>
    </entry>
    </feed>
    

    you could do something like:

    XSLT 1.0

    <xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="http://www.w3.org/2005/Atom"
    xmlns:x="http://www.w3.org/1999/xhtml">
    <xsl:output method="text" encoding="UTF-8" />
    
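    <xsl:template match="/a:feed">
        <!-- each label is a strong element inside the XHTML div -->
        <xsl:for-each select="a:entry/a:summary/x:div/x:strong">
            <!-- the label, e.g. "Category:" -->
            <xsl:value-of select="." />
            <!-- the value: first text node after the label, trimmed -->
            <xsl:value-of select="normalize-space(following-sibling::text()[1])" />
            <!-- line feed after each label/value pair -->
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>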
    
    </xsl:stylesheet>
    

    to get:

    Result

    Category:Information
    Bulletin Issue Time: 2022.01.29 03:00:32 UTC
    Preliminary Magnitude: 6.6(Mwp)
    Lat/Lon: -29.751 / -174.709
    Affected Region: KERMADEC ISLANDS REGION
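
    The result above differs slightly from the layout asked for in the question: "Category:" has no space before its value, and the labels are not upper-cased. A possible variant, offered as a sketch rather than a definitive fix, trims each label, upper-cases it with translate() (XSLT 1.0 has no upper-case() function, and this mapping covers ASCII letters only), and prints the entry title before each block. The variable names lower and upper are illustrative, and printing the title first is an assumption about what is useful on the SharePoint side:

    ```xml
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:a="http://www.w3.org/2005/Atom"
        xmlns:x="http://www.w3.org/1999/xhtml">
    <xsl:output method="text" encoding="UTF-8"/>

    <!-- illustrative names: alphabets used by translate() to upper-case ASCII letters -->
    <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
    <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

    <xsl:template match="/a:feed">
        <xsl:for-each select="a:entry">
            <!-- assumption: the entry title makes a useful heading line -->
            <xsl:value-of select="normalize-space(a:title)"/>
            <xsl:text>&#10;</xsl:text>
            <xsl:for-each select="a:summary/x:div/x:strong">
                <!-- label: trimmed and upper-cased, e.g. "CATEGORY:" -->
                <xsl:value-of select="translate(normalize-space(.), $lower, $upper)"/>
                <xsl:text> </xsl:text>
                <!-- value: the node right after the label, only if it is text,
                     so an empty field does not pick up the next field's value -->
                <xsl:value-of select="normalize-space(following-sibling::node()[1][self::text()])"/>
                <xsl:text>&#10;</xsl:text>
            </xsl:for-each>
            <!-- blank line between entries -->
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>

    </xsl:stylesheet>
    ```

    With the sample input this should give:

    ```
    KERMADEC ISLANDS REGION
    CATEGORY: Information
    BULLETIN ISSUE TIME: 2022.01.29 03:00:32 UTC
    PRELIMINARY MAGNITUDE: 6.6(Mwp)
    LAT/LON: -29.751 / -174.709
    AFFECTED REGION: KERMADEC ISLANDS REGION
    ```

    Like the first stylesheet, this still relies on the provider keeping the strong/text/br pattern. If the markup inside the summary changes, the select expressions will need adjusting.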