sql-serverxmlsqlxmlsql-server-openxml

Unable to parse XML with namespace


Attempting to parse the XML from treasury.gov, and it uses two namespaces. However, I can't seem to extract the data from the XML. Nothing returns from my query. I have tried adding the namespaces to the preparedoc, but no data is being returned. The namespace is the issue, as if I remove them, my code works. As a workaround, using replace to eliminate the namespaces to "clean up" the XML. This eliminates the namespace entirely, presenting straight XML. While this works, it is very ugly, and obviously not the proper way to resolve.

    DECLARE @xml AS XML;
    DECLARE @idoc INT;

SET @xml = '
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<feed xml:base="http://data.treasury.gov/Feed.svc/" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns="http://www.w3.org/2005/Atom">
  <title type="text">DailyTreasuryYieldCurveRateData</title>
  <id>http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData</id>
  <updated>2019-05-30T20:27:58Z</updated>
  <link rel="self" title="DailyTreasuryYieldCurveRateData" href="DailyTreasuryYieldCurveRateData" />
  <entry>
    <id>http://data.treasury.gov/Feed.svc/DailyTreasuryYieldCurveRateData(1)</id>
    <title type="text"></title>
    <updated>2019-05-30T20:27:58Z</updated>
    <author>
      <name />
    </author>
    <link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(1)" />
    <category term="TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
    <content type="application/xml">
      <m:properties>
        <d:Id m:type="Edm.Int32">1</d:Id>
        <d:NEW_DATE m:type="Edm.DateTime">1997-01-02T00:00:00</d:NEW_DATE>
        <d:BC_1MONTH m:type="Edm.Double" m:null="true" />
        <d:BC_2MONTH m:type="Edm.Double" m:null="true" />
        <d:BC_3MONTH m:type="Edm.Double">5.190000057220459</d:BC_3MONTH>
        <d:BC_6MONTH m:type="Edm.Double">5.3499999046325684</d:BC_6MONTH>
        <d:BC_1YEAR m:type="Edm.Double">5.630000114440918</d:BC_1YEAR>
        <d:BC_2YEAR m:type="Edm.Double">5.96999979019165</d:BC_2YEAR>
        <d:BC_3YEAR m:type="Edm.Double">6.130000114440918</d:BC_3YEAR>
        <d:BC_5YEAR m:type="Edm.Double">6.3000001907348633</d:BC_5YEAR>
        <d:BC_7YEAR m:type="Edm.Double">6.4499998092651367</d:BC_7YEAR>
        <d:BC_10YEAR m:type="Edm.Double">6.5399999618530273</d:BC_10YEAR>
        <d:BC_20YEAR m:type="Edm.Double">6.8499999046325684</d:BC_20YEAR>
        <d:BC_30YEAR m:type="Edm.Double">6.75</d:BC_30YEAR>
        <d:BC_30YEARDISPLAY m:type="Edm.Double">0</d:BC_30YEARDISPLAY>
      </m:properties>
    </content>
  </entry>
</feed>
'

EXEC sp_XML_PrepareDocument @iDoc OUTPUT, @xml, '<feed xml:base="http://data.treasury.gov/Feed.svc/" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns:x="http://www.w3.org/2005/Atom">';

SELECT *
FROM OPENXML(@iDoc, 'feed/entry/content/m:properties/' )
WITH
(
   [Id] VARCHAR(100) 'd:Id'                                                                                             
  ,[Date] VARCHAR(100) 'd:NEW_DATE'
)


EXEC sp_xml_removedocument @iDoc

The Id and Date columns return no data. If I eliminate the namespace, it will return the data as expected.


Solution

  • The approach with FROM OPENXML together with the stored procedures to prepare and to remove a document is outdated and should not be used anymore. Rather use the native XML methods:

    Your XML

    DECLARE @xml XML =
    '<?xml version="1.0" encoding="utf-8" standalone="yes"?>
    <feed xml:base="http://data.treasury.gov/Feed.svc/" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns="http://www.w3.org/2005/Atom">
      <title type="text">DailyTreasuryYieldCurveRateData</title>
      <id>http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData</id>
      <updated>2019-05-30T20:27:58Z</updated>
      <link rel="self" title="DailyTreasuryYieldCurveRateData" href="DailyTreasuryYieldCurveRateData" />
      <entry>
        <id>http://data.treasury.gov/Feed.svc/DailyTreasuryYieldCurveRateData(1)</id>
        <title type="text"></title>
        <updated>2019-05-30T20:27:58Z</updated>
        <author>
          <name />
        </author>
        <link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(1)" />
        <category term="TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
        <content type="application/xml">
          <m:properties>
            <d:Id m:type="Edm.Int32">1</d:Id>
            <d:NEW_DATE m:type="Edm.DateTime">1997-01-02T00:00:00</d:NEW_DATE>
            <d:BC_1MONTH m:type="Edm.Double" m:null="true" />
            <d:BC_2MONTH m:type="Edm.Double" m:null="true" />
            <d:BC_3MONTH m:type="Edm.Double">5.190000057220459</d:BC_3MONTH>
            <d:BC_6MONTH m:type="Edm.Double">5.3499999046325684</d:BC_6MONTH>
            <d:BC_1YEAR m:type="Edm.Double">5.630000114440918</d:BC_1YEAR>
            <d:BC_2YEAR m:type="Edm.Double">5.96999979019165</d:BC_2YEAR>
            <d:BC_3YEAR m:type="Edm.Double">6.130000114440918</d:BC_3YEAR>
            <d:BC_5YEAR m:type="Edm.Double">6.3000001907348633</d:BC_5YEAR>
            <d:BC_7YEAR m:type="Edm.Double">6.4499998092651367</d:BC_7YEAR>
            <d:BC_10YEAR m:type="Edm.Double">6.5399999618530273</d:BC_10YEAR>
            <d:BC_20YEAR m:type="Edm.Double">6.8499999046325684</d:BC_20YEAR>
            <d:BC_30YEAR m:type="Edm.Double">6.75</d:BC_30YEAR>
            <d:BC_30YEARDISPLAY m:type="Edm.Double">0</d:BC_30YEARDISPLAY>
          </m:properties>
        </content>
      </entry>
    </feed>';
    

    --There are several namespaces involved. The one xmlns is the so called default namespace while the other namespaces have got a prefix. You can use whatever prefix you like, but - for easy reading - I use the same as in the original XML:

    --The following code will demonstrate, how to read some top-level elements directly, how to use .nodes() to dive into a nested element and how to use another .nodes() to dive into a relative sub-node.

    WITH XMLNAMESPACES(DEFAULT 'http://www.w3.org/2005/Atom'
                              ,'http://schemas.microsoft.com/ado/2007/08/dataservices/metadata' AS m
                              ,'http://schemas.microsoft.com/ado/2007/08/dataservices' AS d
                              ,'http://data.treasury.gov/Feed.svc/' AS base )
    SELECT @xml.value('(/feed/title/text())[1]','varchar(100)') AS title
          ,@xml.value('(/feed/title/@type)[1]','varchar(100)') AS title_type
          ,@xml.value('(/feed/id/text())[1]','varchar(100)') AS id
          ,@xml.value('(/feed/updated/text())[1]','datetime') AS updated --type-safe
          --pick more elements in top level
          ,ent.value('(id/text())[1]','varchar(100)') AS entry_id
          --pick more elements from entry-level
          ,prp.value('(d:Id/text())[1]','int') AS Prop_id
          --pick more elements below <m:properties>
    FROM @xml.nodes('/feed/entry') A(ent)
    CROSS APPLY A.ent.nodes('content/m:properties') B(prp);
    

    --If the properties might not be known in advance, you can use the * in the path and return a generic EAV-list:

    WITH XMLNAMESPACES(DEFAULT 'http://www.w3.org/2005/Atom'
                              ,'http://schemas.microsoft.com/ado/2007/08/dataservices/metadata' AS m
                              ,'http://schemas.microsoft.com/ado/2007/08/dataservices' AS d
                              ,'http://data.treasury.gov/Feed.svc/' AS base )
    SELECT @xml.value('(/feed/title/text())[1]','varchar(100)') AS title
          ,@xml.value('(/feed/title/@type)[1]','varchar(100)') AS title_type
          ,@xml.value('(/feed/id/text())[1]','varchar(100)') AS id
          ,@xml.value('(/feed/updated/text())[1]','datetime') AS updated --type-safe
          --pick more elements in top level
          ,ent.value('(id/text())[1]','varchar(100)') AS entry_id
          --pick more elements from entry-level
          ,AllPrps.value('local-name(.)','varchar(100)') AS Prop_Name
          ,AllPrps.value('@m:type','varchar(100)') AS Prop_Value
          ,AllPrps.value('text()[1]','varchar(100)') AS Prop_Value
          --pick more elements below <m:properties>
    FROM @xml.nodes('/feed/entry') A(ent)
    CROSS APPLY A.ent.nodes('content/m:properties/d:*') B(AllPrps)