xslt accumulator

How to match nodes in XSLT only once and filter them out


I have a file with two types of "journal source". Below is a small sample file:

<File>   
   <Record>
        <employee>935388</employee>
        <journal_source>Procurement_Card_Transaction_Verification</journal_source>
        <amount>26.31</amount>
        <wid>123</wid>
        <concat_values>935388|Procurement_Card_Transaction_Verification|26.31</concat_values>
        <Created_Moment>2020-12-31T20:45:45.415-08:00</Created_Moment>
        <Accounting_Date>2020-12-31-08:00</Accounting_Date>
   </Record>   
   <Record>
      <employee>935388</employee>
      <journal_source>Credit_Card_Transaction_Load</journal_source>
      <amount>-26.31</amount>
      <wid>abc</wid>
      <concat_values>935388|Credit_Card_Transaction_Load|26.31</concat_values>
      <Created_Moment>2020-12-20T20:45:45.415-08:00</Created_Moment>
      <Accounting_Date>2020-12-31-08:00</Accounting_Date>
   </Record>
   <Record>
      <employee>935388</employee>
      <journal_source>Credit_Card_Transaction_Load</journal_source>
      <amount>-26.31</amount>
      <wid>def</wid>
      <concat_values>935388|Credit_Card_Transaction_Load|26.31</concat_values>
      <Created_Moment>2020-12-20T20:45:45.415-08:00</Created_Moment>
      <Accounting_Date>2020-12-31-08:00</Accounting_Date>
   </Record>   
</File>

The goal is to output only the records whose journal_source is "Credit_Card_Transaction_Load" and that have no matching "Procurement_Card_Transaction_Verification" record with a "Created_Moment" later than that of the credit card transaction. By matching, I mean that the two records have the same value in the "concat_values" field, apart from the journal source part (one is Credit, the other Procurement).

The tricky part here is that I can only match a procurement transaction once. After it has been used, I can't take it into account for other credit card transactions even if they also match. Below is an example of what the output would need to be for the sample provided previously (only interested in getting the "wid" field in the output):

<File>   
    
        <wid>def</wid>

</File>

I first thought of keeping track of the used procurement transactions by updating a map or a variable in a for-each loop, and then checking whether a given procurement transaction had already been used to match an earlier credit card transaction. However, this doesn't work because variables are immutable.

I also thought about exploring XSLT 3 features and tried to look at accumulators but I didn't get very far.

Any help would be appreciated!


Solution

  • The following example groups the procurement transactions into a map. Each key is a string that combines the employee number and the absolute amount (obtained from concat_values by dropping the journal source), and each value is the sequence of Record elements with that key. The map is then used in an iteration over the credit card transactions to find a match. If there is none, the credit card transaction is output. If there is one, the match is removed from the map value for that key before the next iteration:

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="3.0"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:map="http://www.w3.org/2005/xpath-functions/map"
      xmlns:mf="http://example.com/mf"
      exclude-result-prefixes="#all"
      expand-text="yes">
      
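      <!-- Lookup key: concat_values with its middle (journal source) field removed,
           so a procurement record and a credit card record for the same employee
           and amount share one key, e.g. '935388|26.31' -->
      <xsl:function name="mf:get-key" as="xs:string">
        <xsl:param name="record" as="element(Record)"/>
        <xsl:sequence select="replace($record/concat_values, '\|[^|]+\|', '|')"/>
      </xsl:function>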
    
      <xsl:output method="xml" indent="yes"/>
    
      <xsl:template match="/File">
        <xsl:copy>
          <!-- Initial state: every procurement record, grouped by lookup key -->
          <xsl:variable name="procurement-map" as="map(xs:string, element(Record)*)">
            <xsl:map>
              <xsl:for-each-group 
                select="Record[journal_source = 'Procurement_Card_Transaction_Verification']" 
                group-by="mf:get-key(.)">
                <xsl:map-entry key="current-grouping-key()" select="current-group()"/>
              </xsl:for-each-group>
            </xsl:map>
          </xsl:variable>
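          <xsl:iterate select="Record[journal_source = 'Credit_Card_Transaction_Load']">
            <!-- State carried from one credit card record to the next -->
            <xsl:param name="procurement-map" as="map(xs:string, element(Record)*)" select="$procurement-map"/>
            <xsl:variable name="key" select="mf:get-key(.)"/>
            <!-- First unused procurement record with the same key and a later Created_Moment -->
            <xsl:variable name="match" select="$procurement-map($key)[xs:dateTime(Created_Moment) gt xs:dateTime(current()/Created_Moment)][1]"/>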
            <xsl:if test="not($match)">
              <xsl:copy>
                <xsl:copy-of select="wid"/>
              </xsl:copy>
            </xsl:if>
            <!-- A matched procurement record is removed so it cannot be used again -->
            <xsl:next-iteration>
              <xsl:with-param name="procurement-map"
                 select="if (not($match)) 
                         then $procurement-map 
                         else map:put($procurement-map, $key,  $procurement-map($key) except $match)"/>
            </xsl:next-iteration>
          </xsl:iterate>
        </xsl:copy>
      </xsl:template>
    
    </xsl:stylesheet>
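
    To see why a procurement record and a credit card record end up under the same map key, the replace() call from mf:get-key can be tried on its own. The following is only a minimal sketch: the two strings are the concat_values from the sample file, and running it without a source document through the named template xsl:initial-template (for example with Saxon's -it option) is an assumption about how the processor is invoked, not part of the solution above:

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="3.0"
      exclude-result-prefixes="#all">

      <xsl:output method="text"/>

      <!-- Entry point that needs no input document -->
      <xsl:template name="xsl:initial-template">
        <!-- The two concat_values strings copied from the sample file -->
        <xsl:for-each select="('935388|Procurement_Card_Transaction_Verification|26.31',
                               '935388|Credit_Card_Transaction_Load|26.31')">
          <!-- Same regular expression as in mf:get-key: drop the middle field -->
          <xsl:value-of select="replace(., '\|[^|]+\|', '|')"/>
          <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
      </xsl:template>

    </xsl:stylesheet>

    Both strings reduce to 935388|26.31, so the lookup of a credit card record's key in the map returns the procurement records for the same employee and amount.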