I'm fetching some XML and converting it to CSV, similar to the example below. Some of the records have an additional, unbounded number (n) of elements ("EntityEvents"). How can I fetch them as well and write them into a second (mm) CSV file?
This is my structure:
XML File:
<abc:ABCData xmlns:abc="http://www.abc-example.com" xmlns:xyz="http:/www.xyz-example.com">
  <abc:ABCRecords>
    <abc:ABCRecord>
      <abc:ABC>5EXZX4LPK</abc:ABC>
      <abc:Entity>
        <abc:Name>Bornheim</abc:Name>
        <abc:EntityEvents>
          <abc:EntityEvent>
            <abc:EntityEventType>TypeA</abc:EntityEventType>
            <abc:EntityEventName>EventA</abc:EntityEventName>
          </abc:EntityEvent>
        </abc:EntityEvents>
      </abc:Entity>
    </abc:ABCRecord>
    <abc:ABCRecord>
      <abc:ABC>5967007LI</abc:ABC>
      <abc:Entity>
        <abc:Name>MOON BANK</abc:Name>
        <abc:EntityEvents>
          <abc:EntityEvent>
            <abc:EntityEventType>TypeB</abc:EntityEventType>
            <abc:EntityEventName>EventB</abc:EntityEventName>
          </abc:EntityEvent>
          <abc:EntityEvent>
            <abc:EntityEventType>TypeC</abc:EntityEventType>
            <abc:EntityEventName>EventC</abc:EntityEventName>
          </abc:EntityEvent>
        </abc:EntityEvents>
      </abc:Entity>
    </abc:ABCRecord>
    <abc:ABCRecord>
      <abc:ABC>2792340TZ</abc:ABC>
      <abc:Entity>
        <abc:Name>SUN BANK</abc:Name>
        <abc:EntityEvents>
          <abc:EntityEvent>
            <abc:EntityEventType>TypeD</abc:EntityEventType>
            <abc:EntityEventName>EventD</abc:EntityEventName>
          </abc:EntityEvent>
          <abc:EntityEvent>
            <abc:EntityEventType>TypeF</abc:EntityEventType>
            <abc:EntityEventName>EventF</abc:EntityEventName>
          </abc:EntityEvent>
          <abc:EntityEvent>
            <abc:EntityEventType>TypeG</abc:EntityEventType>
            <abc:EntityEventName>EventG</abc:EntityEventName>
          </abc:EntityEvent>
        </abc:EntityEvents>
      </abc:Entity>
    </abc:ABCRecord>
  </abc:ABCRecords>
</abc:ABCData>
PHP file:
<?php
$reader = new XMLReader();
$reader->open('php://stdin');
$output = fopen('php://stdout', 'w');
fputcsv($output, ['id', 'name']);
$xmlns = [
    'abc' => 'http://www.abc-example.com'
];
$dom = new DOMDocument;
$xpath = new DOMXpath($dom);
foreach ($xmlns as $prefix => $namespaceURI) {
    $xpath->registerNamespace($prefix, $namespaceURI);
}
while (
    $reader->read() &&
    (
        $reader->localName !== 'ABCRecord' ||
        $reader->namespaceURI !== $xmlns['abc']
    )
) {
    continue;
}
while ($reader->localName === 'ABCRecord') {
    if ($reader->namespaceURI === 'http://www.abc-example.com') {
        $node = $reader->expand($dom);
        fputcsv(
            $output,
            [
                $xpath->evaluate('string(abc:ABC)', $node),
                $xpath->evaluate('string(abc:Entity/abc:Name)', $node)
            ]
        );
    }
    $reader->next('ABCRecord');
}
Output 1 (CSV):
5EXZX4LPK,Bornheim
5967007LI,"MOON BANK"
2792340TZ,"SUN BANK"
Desired Output 2 (CSV):
5EXZX4LPK,TypeA,EventA
5967007LI,TypeB,EventB
5967007LI,TypeC,EventC
2792340TZ,TypeD,EventD
2792340TZ,TypeF,EventF
2792340TZ,TypeG,EventG
How can I accomplish this? I thought of writing them into a separate file, but I'm open to other approaches. I'm also open to doing it in two steps, meaning in a separate PHP file.
Open a secondary file handle. Then after expanding the node into DOM, use an expression to fetch the events and write them to the second file.
//...
// $detailOutput is the secondary file handle, opened with fopen() before the loop
$node = $reader->expand($dom);
// store the identifier
$identifier = $xpath->evaluate('string(abc:ABC)', $node);
fputcsv(
    $output,
    [
        $identifier,
        $xpath->evaluate('string(abc:Entity/abc:Name)', $node)
    ]
);
// iterate the EntityEvent elements
foreach ($xpath->evaluate('abc:Entity/abc:EntityEvents/abc:EntityEvent', $node) as $event) {
    fputcsv(
        $detailOutput,
        [
            $identifier,
            $xpath->evaluate('string(abc:EntityEventType)', $event),
            $xpath->evaluate('string(abc:EntityEventName)', $event)
        ]
    );
}
//...
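Put together with the loop from the question, the whole conversion might look like the following sketch. The function name `exportRecords`, the constant `ABC_NS` and the header row of the second file are assumptions for illustration, not part of the question. The output streams are passed in as arguments so the same code can write to files or to `php://stdout`.

```php
<?php
declare(strict_types=1);

// namespace URI taken from the question's XML
const ABC_NS = 'http://www.abc-example.com';

/**
 * Hypothetical helper: streams abc:ABCRecord elements from $reader, writing
 * one row per record to $recordOutput and one row per abc:EntityEvent
 * to $eventOutput. Both outputs are already opened stream resources.
 */
function exportRecords(XMLReader $reader, $recordOutput, $eventOutput): void
{
    $dom = new DOMDocument();
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('abc', ABC_NS);

    fputcsv($recordOutput, ['id', 'name']);
    fputcsv($eventOutput, ['id', 'type', 'event']); // assumed header row

    // move the reader to the first abc:ABCRecord element
    while (
        $reader->read() &&
        ($reader->localName !== 'ABCRecord' || $reader->namespaceURI !== ABC_NS)
    ) {
        continue;
    }

    while ($reader->localName === 'ABCRecord') {
        if ($reader->namespaceURI === ABC_NS) {
            // expand only the current record into DOM
            $node = $reader->expand($dom);
            $identifier = $xpath->evaluate('string(abc:ABC)', $node);
            fputcsv(
                $recordOutput,
                [$identifier, $xpath->evaluate('string(abc:Entity/abc:Name)', $node)]
            );
            // one detail row per event, keyed by the record identifier
            $events = $xpath->evaluate('abc:Entity/abc:EntityEvents/abc:EntityEvent', $node);
            foreach ($events as $event) {
                fputcsv(
                    $eventOutput,
                    [
                        $identifier,
                        $xpath->evaluate('string(abc:EntityEventType)', $event),
                        $xpath->evaluate('string(abc:EntityEventName)', $event),
                    ]
                );
            }
        }
        // jump to the next sibling record, skipping the current subtree
        $reader->next('ABCRecord');
    }
}
```

With a reader opened on `php://stdin` as in the question, a call such as `exportRecords($reader, fopen('php://stdout', 'w'), fopen('events.csv', 'w'))` would write the records to standard output and the events to a file; `events.csv` is only a placeholder name.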
The code in your question implements the outer iteration over the record nodes with XMLReader to avoid loading the whole document into memory. After XMLReader::expand() you have a DOM node.
Reading DOM with XPath always works in one of two ways. A basic location path returns a node list (example: ancestor/parent/child). The result is always a list; if the expression does not match, it is an empty list. XPath expressions can get a lot more complex: they allow for conditions, nesting and alternatives.
If you need a single value, you can cast the location path using an XPath function (example: string(ancestor/parent/child)). Functions like string() or number() cast the first node from the node list or return a default value. string() returns an empty string if the expression did not match. Other functions or the use of an operator can result in a type cast as well (example: count(ancestor/parent/child) > 0).
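As a small illustration of the two result kinds, the sketch below runs a location path and several casting expressions against one `abc:Entity` element modeled on the question's data. The element `abc:DoesNotExist` is made up to show the non-matching case, and the variable names are only illustrative.

```php
<?php
declare(strict_types=1);

$document = new DOMDocument();
$document->loadXML(
    '<abc:Entity xmlns:abc="http://www.abc-example.com">'
    . '<abc:Name>MOON BANK</abc:Name>'
    . '<abc:EntityEvents>'
    . '<abc:EntityEvent><abc:EntityEventType>TypeB</abc:EntityEventType></abc:EntityEvent>'
    . '<abc:EntityEvent><abc:EntityEventType>TypeC</abc:EntityEventType></abc:EntityEvent>'
    . '</abc:EntityEvents>'
    . '</abc:Entity>'
);
$xpath = new DOMXPath($document);
$xpath->registerNamespace('abc', 'http://www.abc-example.com');
$entity = $document->documentElement;

// a plain location path always yields a DOMNodeList, possibly an empty one
$events = $xpath->evaluate('abc:EntityEvents/abc:EntityEvent', $entity);
$missing = $xpath->evaluate('abc:DoesNotExist', $entity);
var_dump($events->length);  // int(2)
var_dump($missing->length); // int(0)

// string() casts the first node of the list, or gives '' if nothing matched
$firstType = $xpath->evaluate(
    'string(abc:EntityEvents/abc:EntityEvent/abc:EntityEventType)',
    $entity
);
$noValue = $xpath->evaluate('string(abc:DoesNotExist)', $entity);
var_dump($firstType); // string(5) "TypeB"
var_dump($noValue);   // string(0) ""

// count() yields a number (a PHP float), a comparison yields a boolean
$eventCount = $xpath->evaluate('count(abc:EntityEvents/abc:EntityEvent)', $entity);
$hasEvents = $xpath->evaluate('count(abc:EntityEvents/abc:EntityEvent) > 0', $entity);
var_dump($eventCount); // float(2)
var_dump($hasEvents);  // bool(true)
```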
However, if you can read the values from the current node using DOM methods/properties, I would suggest doing so. XPath is unnecessary overhead in these cases.
// fetch and iterate nodes
foreach ($xpath->evaluate($expression, $contextNode) as $childNode) {
    var_dump(
        // reading an attribute
        $childNode->getAttribute('attribute-one'),
        // the node name (without the namespace prefix)
        $childNode->localName,
        // using XPath for nested data
        $xpath->evaluate('string(child)', $childNode)
    );
}
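Applied to the data from the question, reading the events with DOM methods only could look like the sketch below. The helper names `firstText` and `eventRows` and the constant `ABC_NAMESPACE` are hypothetical. Note one assumption: `getElementsByTagNameNS()` matches descendants at any depth, not just direct children, which is fine for this structure but may not hold for other documents.

```php
<?php
declare(strict_types=1);

// namespace URI taken from the question's XML
const ABC_NAMESPACE = 'http://www.abc-example.com';

/**
 * Hypothetical helper: text of the first descendant element with the given
 * local name in the abc namespace, or '' if there is none.
 */
function firstText(DOMElement $parent, string $localName): string
{
    $node = $parent->getElementsByTagNameNS(ABC_NAMESPACE, $localName)->item(0);
    return $node === null ? '' : $node->textContent;
}

/**
 * Hypothetical helper: builds one [id, type, name] row per abc:EntityEvent
 * inside a single abc:ABCRecord element, without using XPath.
 */
function eventRows(DOMElement $record): array
{
    $identifier = firstText($record, 'ABC');
    $rows = [];
    foreach ($record->getElementsByTagNameNS(ABC_NAMESPACE, 'EntityEvent') as $event) {
        $rows[] = [
            $identifier,
            firstText($event, 'EntityEventType'),
            firstText($event, 'EntityEventName'),
        ];
    }
    return $rows;
}
```

Inside the XMLReader loop, `foreach (eventRows($node) as $row) { fputcsv($detailOutput, $row); }` could then take the place of the XPath-based inner loop.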