php xml xpath xmlreader

PHP - Fetching XML values by looping over n unbounded elements


I'm fetching some XML and converting it to CSV, similar to the example below. Some of the records have n additional (unbounded) elements ("EntityEvents"). How can I fetch them as well and write them into a second (mm) CSV file?

This is my structure:

XML File:

<abc:ABCData xmlns:abc="http://www.abc-example.com" xmlns:xyz="http:/www.xyz-example.com">
<abc:ABCRecords>
  <abc:ABCRecord>
    <abc:ABC>5EXZX4LPK</abc:ABC>
    <abc:Entity>
      <abc:Name>Bornheim</abc:Name>
      <abc:EntityEvents>
        <abc:EntityEvent>
          <abc:EntityEventType>TypeA</abc:EntityEventType>
          <abc:EntityEventName>EventA</abc:EntityEventName> 
        </abc:EntityEvent>
      </abc:EntityEvents>    
    </abc:Entity>
  </abc:ABCRecord>
  <abc:ABCRecord>
    <abc:ABC>5967007LI</abc:ABC>
    <abc:Entity>
      <abc:Name>MOON BANK</abc:Name>
      <abc:EntityEvents>
        <abc:EntityEvent>
          <abc:EntityEventType>TypeB</abc:EntityEventType>
          <abc:EntityEventName>EventB</abc:EntityEventName>         
        </abc:EntityEvent>
        <abc:EntityEvent>
          <abc:EntityEventType>TypeC</abc:EntityEventType>
          <abc:EntityEventName>EventC</abc:EntityEventName>         
        </abc:EntityEvent>
      </abc:EntityEvents>                   
    </abc:Entity>
  </abc:ABCRecord>
  <abc:ABCRecord>
    <abc:ABC>2792340TZ</abc:ABC>
    <abc:Entity>
      <abc:Name>SUN BANK</abc:Name>
      <abc:EntityEvents>
        <abc:EntityEvent>
          <abc:EntityEventType>TypeD</abc:EntityEventType>
          <abc:EntityEventName>EventD</abc:EntityEventName>         
        </abc:EntityEvent>
        <abc:EntityEvent>
          <abc:EntityEventType>TypeF</abc:EntityEventType>
          <abc:EntityEventName>EventF</abc:EntityEventName>         
        </abc:EntityEvent>
        <abc:EntityEvent>
          <abc:EntityEventType>TypeG</abc:EntityEventType>
          <abc:EntityEventName>EventG</abc:EntityEventName>         
        </abc:EntityEvent>
      </abc:EntityEvents>                   
    </abc:Entity>
  </abc:ABCRecord>   
</abc:ABCRecords>
</abc:ABCData>

PHP file:

<?php

$reader = new XMLReader();
$reader->open('php://stdin');

$output = fopen('php://stdout', 'w');
fputcsv($output, ['id', 'name']);

$xmlns = [
  'abc' => 'http://www.abc-example.com'
];

$dom   = new DOMDocument;
$xpath = new DOMXpath($dom);
foreach ($xmlns as $prefix => $namespaceURI) {
  $xpath->registerNamespace($prefix, $namespaceURI);
}

while (
  $reader->read() && 
  (
    $reader->localName !== 'ABCRecord' || 
    $reader->namespaceURI !== $xmlns['abc']
  )
) {
  continue;
}

while ($reader->localName === 'ABCRecord') {
  if ($reader->namespaceURI === 'http://www.abc-example.com') {
    $node = $reader->expand($dom);
    fputcsv(
      $output, 
      [
        $xpath->evaluate('string(abc:ABC)', $node),
        $xpath->evaluate('string(abc:Entity/abc:Name)', $node)
      ]
    );
  }

  $reader->next('ABCRecord');
}     

Output 1 (CSV):

5EXZX4LPK,Bornheim
5967007LI,"MOON BANK"
2792340TZ,"SUN BANK"  

Desired Output 2 (CSV):

5EXZX4LPK,TypeA,EventA
5967007LI,TypeB,EventB
5967007LI,TypeC,EventC
2792340TZ,TypeD,EventD
2792340TZ,TypeF,EventF
2792340TZ,TypeG,EventG

How can I accomplish this? I thought of writing them into a separate file, but I'm open to other approaches. I'm also open to doing it in two steps, meaning in a separate PHP file.


Solution

  • Open a secondary file handle. Then after expanding the node into DOM, use an expression to fetch the events and write them to the second file.

    //... ($detailOutput is the secondary file handle, opened once before the loop)
    $node = $reader->expand($dom);
    // store the identifier
    $identifier = $xpath->evaluate('string(abc:ABC)', $node);
    fputcsv(
      $output, 
      [
        $identifier,
        $xpath->evaluate('string(abc:Entity/abc:Name)', $node)
      ]
    );
    // iterate the EntityEvent elements
    foreach ($xpath->evaluate('abc:Entity/abc:EntityEvents/abc:EntityEvent', $node) as $event) {
      fputcsv(
        $detailOutput, 
        [
          $identifier,
          $xpath->evaluate('string(abc:EntityEventType)', $event),
          $xpath->evaluate('string(abc:EntityEventName)', $event)
        ]
      ); 
    }
    //...
    

    The code in your question implements the first node list iteration in XMLReader to avoid loading the whole document into memory. After XMLReader::expand() you have a DOM node.
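
    Put together, the approach can be sketched as a complete script. This is a minimal sketch under a few assumptions: the XML is inlined (shortened to two records) instead of read from php://stdin, and two php://memory streams stand in for the two CSV files so the result can be printed. The constant ABC_NS and the variables $main and $details are illustrative names, not part of the original code.

```php
<?php
// Inline sample input (assumption); the original reads from php://stdin.
$xml = <<<'XML'
<abc:ABCData xmlns:abc="http://www.abc-example.com">
  <abc:ABCRecords>
    <abc:ABCRecord>
      <abc:ABC>5EXZX4LPK</abc:ABC>
      <abc:Entity>
        <abc:Name>Bornheim</abc:Name>
        <abc:EntityEvents>
          <abc:EntityEvent>
            <abc:EntityEventType>TypeA</abc:EntityEventType>
            <abc:EntityEventName>EventA</abc:EntityEventName>
          </abc:EntityEvent>
        </abc:EntityEvents>
      </abc:Entity>
    </abc:ABCRecord>
    <abc:ABCRecord>
      <abc:ABC>5967007LI</abc:ABC>
      <abc:Entity>
        <abc:Name>MOON BANK</abc:Name>
        <abc:EntityEvents>
          <abc:EntityEvent>
            <abc:EntityEventType>TypeB</abc:EntityEventType>
            <abc:EntityEventName>EventB</abc:EntityEventName>
          </abc:EntityEvent>
          <abc:EntityEvent>
            <abc:EntityEventType>TypeC</abc:EntityEventType>
            <abc:EntityEventName>EventC</abc:EntityEventName>
          </abc:EntityEvent>
        </abc:EntityEvents>
      </abc:Entity>
    </abc:ABCRecord>
  </abc:ABCRecords>
</abc:ABCData>
XML;

const ABC_NS = 'http://www.abc-example.com';

$reader = new XMLReader();
$reader->XML($xml);

// In-memory streams stand in for the two CSV files (assumption for this demo).
$output       = fopen('php://memory', 'w+');
$detailOutput = fopen('php://memory', 'w+');

$dom   = new DOMDocument();
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('abc', ABC_NS);

// Skip forward to the first abc:ABCRecord element.
while ($reader->read() && ($reader->localName !== 'ABCRecord' || $reader->namespaceURI !== ABC_NS)) {
    continue;
}

while ($reader->localName === 'ABCRecord') {
    if ($reader->namespaceURI === ABC_NS) {
        // Expand only the current record into DOM.
        $node       = $reader->expand($dom);
        $identifier = $xpath->evaluate('string(abc:ABC)', $node);
        // Explicit CSV arguments (empty escape needs PHP 7.4+) avoid relying on defaults.
        fputcsv($output, [$identifier, $xpath->evaluate('string(abc:Entity/abc:Name)', $node)], ',', '"', '');
        // One detail row per EntityEvent, repeated with the record identifier.
        foreach ($xpath->evaluate('abc:Entity/abc:EntityEvents/abc:EntityEvent', $node) as $event) {
            fputcsv($detailOutput, [
                $identifier,
                $xpath->evaluate('string(abc:EntityEventType)', $event),
                $xpath->evaluate('string(abc:EntityEventName)', $event),
            ], ',', '"', '');
        }
    }
    // Jump to the next sibling record, skipping the subtree just handled.
    $reader->next('ABCRecord');
}

rewind($output);
rewind($detailOutput);
$main    = stream_get_contents($output);
$details = stream_get_contents($detailOutput);
echo $main, "---\n", $details;
```

    For real use, the two fopen() calls would presumably point at actual file paths, and the reader would be opened on the input stream as in the question.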

    Reading DOM with Xpath always takes one of two forms. A basic location path returns a node list (example: ancestor/parent/child). The result will always be a list; if the expression does not match, it will be an empty list. Xpath expressions can get a lot more complex: they allow for conditions, nesting and alternatives.
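
    To illustrate the node list behaviour, here is a small sketch using a made-up document that mirrors the ancestor/parent/child example. The element names and values are assumptions for illustration only.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<ancestor><parent><child>one</child><child>two</child></parent></ancestor>');
$xpath = new DOMXPath($dom);

// A location path always yields a DOMNodeList.
$children = $xpath->evaluate('/ancestor/parent/child');
// No match still yields a list, just an empty one.
$missing = $xpath->evaluate('/ancestor/parent/missing');
// A condition (predicate) narrows the list.
$filtered = $xpath->evaluate('/ancestor/parent/child[. = "two"]');
// Alternatives are combined with the union operator.
$union = $xpath->evaluate('/ancestor/parent | /ancestor/parent/child');

printf(
    "%d %d %d %d\n",
    $children->length,
    $missing->length,
    $filtered->length,
    $union->length
); // prints "2 0 1 3"
```

    Because an empty result is still a list, a foreach over it simply runs zero times, so no extra existence check is needed before iterating.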

    If you need a single value, you can cast the location path using an Xpath function (example: string(ancestor/parent/child)). Functions like string() or number() will cast the first node from the node list or return a default value. string() will return an empty string if the expression itself did not match. Other functions or the use of an operator can result in a type cast as well (example: count(ancestor/parent/child) > 0).
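
    The casts can be tried out in isolation. The following sketch uses a hypothetical order document (element and attribute names are assumptions) and shows the PHP types that DOMXPath::evaluate() returns for scalar expressions.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<order><item price="1.5">apple</item><item price="2">pear</item></order>');
$xpath = new DOMXPath($dom);

// string() casts the first matching node: "apple"
$first = $xpath->evaluate('string(/order/item)');
// No match gives the default value, an empty string.
$empty = $xpath->evaluate('string(/order/missing)');
// number() casts the first matching node to a float: 1.5
$price = $xpath->evaluate('number(/order/item/@price)');
// No match gives NaN for number().
$nan = $xpath->evaluate('number(/order/missing)');
// A comparison operator turns the result into a boolean.
$hasItems = $xpath->evaluate('count(/order/item) > 0');

var_dump($first, $empty, $price, is_nan($nan), $hasItems);
```

    Note that number() signals a missing node with NaN rather than 0, so is_nan() is the check to use when a numeric value is optional.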

    However, if you can read the values from the current node using DOM methods or properties, I would suggest doing so. Xpath is unnecessary overhead in these cases.

    // fetch and iterate nodes
    foreach ($xpath->evaluate($expression, $contextNode) as $childNode) {
      var_dump(
        // reading an attribute 
        $childNode->getAttribute('attribute-one'),
        // the node name (without the namespace prefix)
        $childNode->localName,
        // using Xpath for nested data
        $xpath->evaluate('string(child)', $childNode)
      );
    }
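
    As a runnable counterpart to the snippet above, here is a sketch that reads values from a single EntityEvent node using only DOM methods and properties. The id attribute is hypothetical (the original XML has no attributes), and ABC_NS is an illustrative constant name.

```php
<?php
const ABC_NS = 'http://www.abc-example.com';

$dom = new DOMDocument();
$dom->loadXML(
    '<abc:EntityEvent xmlns:abc="http://www.abc-example.com" id="e1">'
    . '<abc:EntityEventType>TypeA</abc:EntityEventType>'
    . '<abc:EntityEventName>EventA</abc:EntityEventName>'
    . '</abc:EntityEvent>'
);
$event = $dom->documentElement;

// Attribute, local name and namespace are available directly on the node.
$id        = $event->getAttribute('id');
$localName = $event->localName;
$namespace = $event->namespaceURI;

// Descendant lookup by namespace and local name, without Xpath.
$typeNode = $event->getElementsByTagNameNS(ABC_NS, 'EntityEventType')->item(0);
// item(0) is null when nothing matched, so fall back to an empty string.
$type = $typeNode !== null ? $typeNode->textContent : '';

var_dump($id, $localName, $namespace, $type);
```

    getElementsByTagNameNS() searches all descendants, not just direct children, so for deeper or conditional lookups an Xpath expression is still the more precise tool.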