pentahokettleflattenpentaho-data-integrationdenormalized

Flatten Invoice Header with Invoice Lines in Kettle


If you have an invoice header with several values (invoice #, date, location) and an unknown amount of invoice lines with several values (product, price, tax), is there a way to flatten this data to one row that extends in cases where the quantity of invoice lines varies by invoice?


Input Example-

{"InvoiceRecords": [{
    "InvoiceDate": "8/9/2017 12:00:00 AM",
    "InvoiceLocation": "002",
    "InvoiceNumber": "2004085",
    "InvoiceRecordHeaderDetails": [{
        "InvNum": "2004085",
        "Location": "002",
        "InvDate": "8/9/2017 12:00:00 AM"
    }],
    "InvoiceRecordLineItemDetails": [{
        "UniqueID": "3939934",
        "InvNum": "2004085",
        "LINEITEM": "1",
        "CUSTID": "PREAA",
        "DEPTID": "320306",
        "PRODID": "088856",
        "ProdDesc": "STATE UST",
        "Unitprice": "0.003",
        "QuantShare": "237.5",
        "TaxRate": "7.25",
        "taxamount": "0.05"
    }],
    "InvoiceTaxCodeDetails": [{
        "InvNum": "2004085",
        "LineItem": "1",
        "UniqueID": "34",
        "taxCode": "SALES TAX",
        "taxrate": "7.25",
        "maxtax": "0"
    }]
}]}

I need all the items on the same row (allowing for there to be more than one line item and/or more than one Tax Code items on a given Invoice Record.


Output Example (note: "_n" below in reference to undetermined amount of invoice lines and tax rows possible):

{"InvoiceRecords": [{
    "InvoiceDate": "8/9/2017 12:00:00 AM",
    "InvoiceLocation": "002",
    "InvoiceNumber": "2004085",
    "InvoiceRecordHeaderDetailsInvNum": "2004085",
    "InvoiceRecordHeaderDetailsInvNumLocation": "002",
    "InvoiceRecordHeaderDetailsInvNumInvDate": "8/9/2017 12:00:00 AM",
    "InvoiceRecordLineItemDetailsUniqueID_1": "3939934",
    "InvoiceRecordLineItemDetailsInvNum_1": "2004085",
    "InvoiceRecordLineItemDetailsLINEITEM_1": "1",
    "InvoiceRecordLineItemDetailsCUSTID_1": "PREAA",
    "InvoiceRecordLineItemDetailsDEPTID_1": "320306",
    "InvoiceRecordLineItemDetailsPRODID_1": "088856",
    "InvoiceRecordLineItemDetailsProdDesc_1": "STATE UST",
    "InvoiceRecordLineItemDetailsUnitprice_1": "0.003",
    "InvoiceRecordLineItemDetailsQuantShare_1": "237.5",
    "InvoiceRecordLineItemDetailsTaxRate_1": "7.25",
    "InvoiceRecordLineItemDetailstaxamount_1": "0.05",
    "InvoiceTaxCodeDetailsInvNum_1": "2004085",
    "InvoiceTaxCodeDetailsLineItem_1": "1",
    "InvoiceTaxCodeDetailsUniqueID_1": "34",
    "InvoiceTaxCodeDetailstaxCode_1": "SALES TAX",
    "InvoiceTaxCodeDetailstaxrate_1": "7.25",
    "InvoiceTaxCodeDetailsmaxtax_1": "0",
    "InvoiceRecordLineItemDetailsUniqueID_n": "3939934",
    "InvoiceRecordLineItemDetailsInvNum_n": "2004085",
    "InvoiceRecordLineItemDetailsLINEITEM_n": "1",
    "InvoiceRecordLineItemDetailsCUSTID_n": "PREAA",
    "InvoiceRecordLineItemDetailsDEPTID_n": "320306",
    "InvoiceRecordLineItemDetailsPRODID_n": "088856",
    "InvoiceRecordLineItemDetailsProdDesc_n": "STATE UST",
    "InvoiceRecordLineItemDetailsUnitprice_n": "0.003",
    "InvoiceRecordLineItemDetailsQuantShare_n": "237.5",
    "InvoiceRecordLineItemDetailsTaxRate_n": "7.25",
    "InvoiceRecordLineItemDetailstaxamount_n": "0.05",
    "InvoiceTaxCodeDetailsInvNum_n": "2004085",
    "InvoiceTaxCodeDetailsLineItem_n": "1",
    "InvoiceTaxCodeDetailsUniqueID_n": "34",
    "InvoiceTaxCodeDetailstaxCode_n": "SALES TAX",
    "InvoiceTaxCodeDetailstaxrate_n": "7.25",
    "InvoiceTaxCodeDetailsmaxtax_n": "0"
}]}

Thanks!


Solution

  • You have an example of a similar question in the samples directory which sits nearby you spoon.bat. Have a look at the samples/transformation/XML Add and survive the first choc: they do something much more complex, just to show all what is possible.

    In your case, split with a Switch/Case, the input stream in header, items and manage to keep the InvoiceNumber on each (more on this later). Convert the three stream into JSON (with JSON Output or, maybe easier, with a Javascript). Then you Group by the items by InvoiceNumber. Join the three flows by InvoiceNumber, for which I suggest a lookup stream in the header stream then an other lookup stream in the footer stream. With an other javascript and treating the data as string, you can build the JSON row in the format { header, [item], footer}, which you can Group by with a concatenation to have only one row.

    Some work, but rather standard, except for the tricky part of the get the InvoiceNumber on the items and footer as they have disappeared from the flow. For that you can use fact that the javascript preserve the values unless redefined. Add a new start script [right click on the Script1 on the tab top, add a copy, right click on the Script1_0 just created, and define it as Start script].

    On this start script:

    var PrevInvoiceNumber = -1;
    

    On the main script:

    if(InvoiceNumber && PrevInvoiceNumber!=InvoiceNumber)
        PrevInvoiceNumber = InvoiceNumber
    

    Then you should see the data with on each line the PrevInvoiceNumber which is equal to the expected InvoiceNumber of the invoice.enter image description here