Tags: vb.net, rss, extract, atom-feed, selectnodes

How to extract Atom/RSS


Given a URL, I add it to the database if the feed it points to has any RSS nodes.

e.g.:

For this URL, rssDoc.SelectNodes("rss/channel/item").Count is greater than zero.

But for the Atom URL, rssDoc.SelectNodes("rss/channel/item").Count is equal to zero.

How can I check whether the Atom/RSS URL has any nodes? I have tried rssDoc.SelectNodes("feed/entry").Count, but it also gives me a count of zero.

Public Shared Function HasRssItems(ByVal url As String) As Boolean
    Dim myRequest As WebRequest
    Dim myResponse As WebResponse = Nothing
    Try
        myRequest = System.Net.WebRequest.Create(url)
        myRequest.Timeout = 5000
        myResponse = myRequest.GetResponse()

        Dim rssStream As Stream = myResponse.GetResponseStream()
        Dim rssDoc As New XmlDocument()
        rssDoc.Load(rssStream)

        Return rssDoc.SelectNodes("rss/channel/item").Count > 0
    Catch ex As Exception
        Return False
    Finally
        ' GetResponse may have thrown before myResponse was assigned.
        If myResponse IsNot Nothing Then
            myResponse.Close()
        End If
    End Try
End Function


Solution

  • Your main problem here is that the XML "node path" on this line:

    Return rssDoc.SelectNodes("rss/channel/item").Count > 0

    is only valid for RSS feeds, not ATOM feeds.
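
    As for why "feed/entry" also returns zero: Atom elements are namespace-qualified, and an unprefixed name in an XPath expression only matches elements that are in no namespace. If all you need is a count, a rough sketch along the following lines may be enough. The module name FeedItemCounter, the function name CountFeedItems and the prefixes a10 and a03 are made up for illustration; the two namespace URIs are the Atom 1.0 and Atom 0.3 ones.

    ```vbnet
    Imports System.Xml

    Public Module FeedItemCounter
        ' Hypothetical helper (not part of the original post): counts RSS items
        ' or Atom entries in a feed document that has already been loaded.
        Public Function CountFeedItems(ByVal feedDoc As XmlDocument) As Integer
            ' RSS 2.0 elements are in no namespace, so a plain path matches them.
            Dim rssItems As XmlNodeList = feedDoc.SelectNodes("/rss/channel/item")
            If rssItems.Count > 0 Then
                Return rssItems.Count
            End If

            ' Atom elements live in a namespace, so every step of the path needs
            ' a prefix that the namespace manager maps to that namespace URI.
            Dim mgr As New XmlNamespaceManager(feedDoc.NameTable)
            mgr.AddNamespace("a10", "http://www.w3.org/2005/Atom") ' Atom 1.0
            mgr.AddNamespace("a03", "http://purl.org/atom/ns#")    ' Atom 0.3

            ' The union (|) matches entries from either Atom version.
            Const atomEntries As String = "/a10:feed/a10:entry | /a03:feed/a03:entry"
            Return feedDoc.SelectNodes(atomEntries, mgr).Count
        End Function
    End Module
    ```

    With something like this in place, the last line of the Try block in HasRssItems could become Return CountFeedItems(rssDoc) > 0.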

    One way I've got around this in the past is to use a simple function that converts an Atom feed into an RSS feed. Of course, you could go the other way, or not convert at all. However, converting to a single format lets you write one "generic" chunk of code that pulls out whichever elements of a feed's items you are interested in (e.g. date, title).

    There is an ATOM to RSS Converter article on Code Project that provides such a conversion, but it is written in C#. I have previously converted it to VB.NET by hand, so here's the VB.NET version:

    ' Requires Imports System.IO, System.Text and System.Xml at the top of the file.
    Private Function AtomToRssConverter(ByVal atomDoc As XmlDocument) As XmlDocument
        Dim xmlDoc As XmlDocument = atomDoc
        Dim xmlNode As XmlNode = Nothing
        Dim mgr As New XmlNamespaceManager(xmlDoc.NameTable)
        mgr.AddNamespace("atom", "http://purl.org/atom/ns#")
        Const rssVersion As String = "2.0"
        Const rssLanguage As String = "en-US"
        Dim rssGenerator As String = "RDFFeedConverter"
        Dim memoryStream As New MemoryStream()
        Dim xmlWriter As New XmlTextWriter(memoryStream, Nothing)
        xmlWriter.Formatting = Formatting.Indented
        Dim feedTitle As String = ""
        Dim feedLink As String = ""
        Dim rssDescription As String = ""
    
        xmlNode = xmlDoc.SelectSingleNode("//atom:title", mgr)
        If xmlNode Is Nothing Then
            ' This looks like an ATOM v1.0 format, rather than ATOM v0.3.
            mgr.RemoveNamespace("atom", "http://purl.org/atom/ns#")
            mgr.AddNamespace("atom", "http://www.w3.org/2005/Atom")
        End If
    
        xmlNode = xmlDoc.SelectSingleNode("//atom:title", mgr)
        If Not xmlNode Is Nothing Then
            feedTitle = xmlNode.InnerText
        End If
        xmlNode = xmlDoc.SelectNodes("//atom:link/@href", mgr)(2)
        If Not xmlNode Is Nothing Then
            feedLink = xmlNode.InnerText
        End If
        xmlNode = xmlDoc.SelectSingleNode("//atom:tagline", mgr)
        If Not xmlNode Is Nothing Then
            rssDescription = xmlNode.InnerText
        End If
        xmlNode = xmlDoc.SelectSingleNode("//atom:subtitle", mgr)
        If Not xmlNode Is Nothing Then
            rssDescription = xmlNode.InnerText
        End If
    
        xmlWriter.WriteStartElement("rss")
        xmlWriter.WriteAttributeString("version", rssVersion)
        xmlWriter.WriteStartElement("channel")
        xmlWriter.WriteElementString("title", feedTitle)
        xmlWriter.WriteElementString("link", feedLink)
        xmlWriter.WriteElementString("description", rssDescription)
        xmlWriter.WriteElementString("language", rssLanguage)
        xmlWriter.WriteElementString("generator", rssGenerator)
        Dim items As XmlNodeList = xmlDoc.SelectNodes("//atom:entry", mgr)
        If items Is Nothing Then
            Throw New FormatException("Atom feed is not in expected format. ")
        Else
            Dim title As String = [String].Empty
            Dim link As String = [String].Empty
            Dim description As String = [String].Empty
            Dim author As String = [String].Empty
            Dim pubDate As String = [String].Empty
            For i As Integer = 0 To items.Count - 1
                Dim nodTitle As XmlNode = items(i)
                xmlNode = nodTitle.SelectSingleNode("atom:title", mgr)
                If Not xmlNode Is Nothing Then
                    title = xmlNode.InnerText
                End If
                Try
                    link = items(i).SelectSingleNode("atom:link[@rel='alternate']", mgr).Attributes("href").InnerText
                Catch ex As Exception
                    link = items(i).SelectSingleNode("atom:link", mgr).Attributes("href").InnerText
                End Try
                xmlNode = items(i).SelectSingleNode("atom:content", mgr)
                If Not xmlNode Is Nothing Then
                    description = xmlNode.InnerText
                End If
                xmlNode = items(i).SelectSingleNode("atom:author/atom:name", mgr)
                If Not xmlNode Is Nothing Then
                    author = xmlNode.InnerText
                End If
                xmlNode = items(i).SelectSingleNode("atom:issued", mgr)
                If Not xmlNode Is Nothing Then
                    pubDate = xmlNode.InnerText
                End If
                xmlNode = items(i).SelectSingleNode("atom:updated", mgr)
                If Not xmlNode Is Nothing Then
                    pubDate = xmlNode.InnerText
                End If
                xmlWriter.WriteStartElement("item")
                xmlWriter.WriteElementString("title", title)
                xmlWriter.WriteElementString("link", link)
                If pubDate.Length < 1 Then
                    pubDate = Date.MinValue.ToString()
                End If
                xmlWriter.WriteElementString("pubDate", Convert.ToDateTime(pubDate).ToUniversalTime().ToString("ddd, dd MMM yyyy HH:mm:ss G\MT"))
                xmlWriter.WriteElementString("author", author)
                xmlWriter.WriteElementString("description", description)
                xmlWriter.WriteEndElement()
            Next
            xmlWriter.WriteEndElement()
            xmlWriter.Flush()
            xmlWriter.Close()
        End If
        Dim retDoc As New XmlDocument()
        Dim outStr As String = Encoding.UTF8.GetString(memoryStream.ToArray())
        retDoc.LoadXml(outStr)
        Return retDoc
    End Function
    

    Usage is fairly straightforward. Simply load your Atom feed into an XmlDocument object and pass it to this function, and you'll get an XmlDocument object back in RSS format!
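
    As a rough illustration of that usage, the sketch below loads an Atom feed, converts it, and then applies the same RSS path as in the question. HasAtomItems and its atomUrl parameter are hypothetical names, and the function is assumed to sit in the same class as AtomToRssConverter, in a file that imports System.Xml.

    ```vbnet
    ' Hypothetical wrapper around the converter above (not part of the
    ' original post). It relies on AtomToRssConverter from the same class.
    Private Function HasAtomItems(ByVal atomUrl As String) As Boolean
        ' XmlDocument.Load accepts a URL or file path as well as a stream.
        Dim atomDoc As New XmlDocument()
        atomDoc.Load(atomUrl)

        ' Convert to RSS so the RSS-only node path can be reused unchanged.
        Dim rssDoc As XmlDocument = AtomToRssConverter(atomDoc)
        Return rssDoc.SelectNodes("rss/channel/item").Count > 0
    End Function
    ```

    Error handling is left out here; in practice you would probably wrap the call in the same kind of Try/Catch as HasRssItems.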

    If you're interested, I've put an entire RSSReader class up on pastebin.com