So, I am working on a project that scrapes and collects data from many different sources around the internet with many different methods depending on each source's characteristics.
The most recent addition is a web API
call which returns the following XML
as a response:
<?xml version="1.0"?>
<Publication_MarketDocument xmlns="urn:iec62325.351:tc57wg16:451-3:publicationdocument:7:0">
<mRID>29b526a69b9445a7bb507ba446e3e8f9</mRID>
<revisionNumber>1</revisionNumber>
<type>A44</type>
<sender_MarketParticipant.mRID codingScheme="A01">10X1001A1001A450</sender_MarketParticipant.mRID>
<sender_MarketParticipant.marketRole.type>A32</sender_MarketParticipant.marketRole.type>
<receiver_MarketParticipant.mRID codingScheme="A01">10X1001A1001A450</receiver_MarketParticipant.mRID>
<receiver_MarketParticipant.marketRole.type>A33</receiver_MarketParticipant.marketRole.type>
<createdDateTime>2019-09-19T11:28:51Z</createdDateTime>
<period.timeInterval>
<start>2019-09-18T22:00Z</start>
<end>2019-09-19T22:00Z</end>
</period.timeInterval>
<TimeSeries>
<mRID>1</mRID>
<businessType>A62</businessType>
<in_Domain.mRID codingScheme="A01">10YCS-SERBIATSOV</in_Domain.mRID>
<out_Domain.mRID codingScheme="A01">10YCS-SERBIATSOV</out_Domain.mRID>
<currency_Unit.name>EUR</currency_Unit.name>
<price_Measure_Unit.name>MWH</price_Measure_Unit.name>
<curveType>A01</curveType>
<Period>
<timeInterval>
<start>2019-09-18T22:00Z</start>
<end>2019-09-19T22:00Z</end>
</timeInterval>
<resolution>PT60M</resolution>
<Point>
<position>1</position>
<price.amount>44.08</price.amount>
</Point>
<Point>
<position>2</position>
<price.amount>37.14</price.amount>
</Point>
<Point>
<position>3</position>
<price.amount>32.21</price.amount>
</Point>
<Point>
<position>4</position>
<price.amount>31.44</price.amount>
</Point>
<Point>
<position>5</position>
<price.amount>32.48</price.amount>
</Point>
<Point>
<position>6</position>
<price.amount>45.52</price.amount>
</Point>
<Point>
<position>7</position>
<price.amount>56.05</price.amount>
</Point>
<Point>
<position>8</position>
<price.amount>74.96</price.amount>
</Point>
<Point>
<position>9</position>
<price.amount>74.08</price.amount>
</Point>
<Point>
<position>10</position>
<price.amount>69.03</price.amount>
</Point>
<Point>
<position>11</position>
<price.amount>72.89</price.amount>
</Point>
<Point>
<position>12</position>
<price.amount>68.91</price.amount>
</Point>
<Point>
<position>13</position>
<price.amount>74.95</price.amount>
</Point>
<Point>
<position>14</position>
<price.amount>72.91</price.amount>
</Point>
<Point>
<position>15</position>
<price.amount>75.97</price.amount>
</Point>
<Point>
<position>16</position>
<price.amount>76.49</price.amount>
</Point>
<Point>
<position>17</position>
<price.amount>59.08</price.amount>
</Point>
<Point>
<position>18</position>
<price.amount>60.19</price.amount>
</Point>
<Point>
<position>19</position>
<price.amount>64.69</price.amount>
</Point>
<Point>
<position>20</position>
<price.amount>69.18</price.amount>
</Point>
<Point>
<position>21</position>
<price.amount>64.97</price.amount>
</Point>
<Point>
<position>22</position>
<price.amount>63.38</price.amount>
</Point>
<Point>
<position>23</position>
<price.amount>52.92</price.amount>
</Point>
<Point>
<position>24</position>
<price.amount>48.08</price.amount>
</Point>
</Period>
</TimeSeries>
</Publication_MarketDocument>
Having dealt successfully with situations like that using Microsoft XML, v6.0
I tried the following:
Dim respXML As New MSXML2.DOMDocument60
respXML.LoadXML (ThisWorkbook.Worksheets("Sheet2").Range("A1")) 'for the sake of the post's simplicity I'm loading the xml from excel
Debug.Print respXML.getElementsByTagName("price.amount").Length
This should be returning 24
but instead it returns 0
.
Indeed the following:
Debug.Print respXML.getElementsByTagName("price.amount")(1) Is Nothing
returns True
, which means that the <price.amount></price.amount>
elements are not being found. However, Debug.Print respXML.XML
yields the expected results.
I read somewhere that early binding could be causing problems so I tried the following as well:
Dim respXML As Object
Set respXML = CreateObject("MSXML2.DOMDocument.6.0")
respXML.LoadXML (ThisWorkbook.Worksheets("Sheet2").Range("A1"))
Debug.Print respXML.getElementsByTagName("price.amount").Length
Debug.Print respXML.getElementsByTagName("price.amount")(1) Is Nothing
Still the results are the same.
Switching to Microsoft XML, v3.0
resolves the issue completely.
However, I would prefer sticking to v6.0 since it's the one that is more actively being maintained and supported.
Why does this happen? Does it have to do with the XML itself? Does it have to do with my code? Am I missing something? Is there a way to make it work with Microsoft XML, v6.0
?
Any input would be appreciated.
To extend @CindyMeister's answer, the issue does appear to be namespace handling between the MSXML versions using getElementsByTagName()
. Specifically, your XML maintains an xmlns
attribute without colon identified prefix which requires DOM libraries to assign a prefix when parsing content:
<Publication_MarketDocument xmlns="urn:iec62325.351:tc57wg16:451-3:publicationdocument:7:0" ...
However, using SelectionNamespaces
+ SelectNodes
to define a temporary alias, such as doc, to default namespace prefix, both libraries print out expected results. And MS docs even advises the latter method (emphasis added):
The
getElementsByTagName
method simulates the matching of the provided argument against the result of thetagName
property ofIXMLDOMElement
. When executed, it does not recognize or support namespaces. Instead, you should use theselectNodes
method, which is faster in some cases and can support more complex searches.
MXSML v3.0 (prints unexpected getElementsByTagName
result)
Sub ParseXMLv3()
Dim respXML As New MSXML2.DOMDocument30
respXML.Load "C:\Path\To\Input.xml"
respXML.setProperty "SelectionLanguage", "XPath"
respXML.setProperty "SelectionNamespaces", "xmlns:doc='urn:iec62325.351:tc57wg16:451-3:publicationdocument:7:0'"
Debug.Print respXML.SelectNodes("//doc:price.amount").Length ' PRINTS 24
Debug.Print respXML.SelectNodes("//price.amount").Length ' PRINTS 0
Debug.Print respXML.getElementsByTagName("price.amount").Length ' PRINTS 24
Set respXML = Nothing
End Sub
MSXML v6.0
Sub ParseXMLv6()
Dim respXML As New MSXML2.DOMDocument60
respXML.Load "C:\Path\To\Input.xml"
respXML.setProperty "SelectionLanguage", "XPath"
respXML.setProperty "SelectionNamespaces", "xmlns:doc='urn:iec62325.351:tc57wg16:451-3:publicationdocument:7:0'"
Debug.Print respXML.SelectNodes("//doc:price.amount").Length ' PRINTS 24
Debug.Print respXML.SelectNodes("//price.amount").Length ' PRINTS 0
Debug.Print respXML.getElementsByTagName("price.amount").Length ' PRINTS 0
Set respXML = Nothing
End Sub