javatalendtalend-mdm

Generate 1 row from multiple rows Talend


I ve found on the forum exactly the inverse logic of what i need, so i have to ask :)

i have fields like this

|sector|EmployeeNr|Name|Reason|startTime|EndTime|

with this kind of sample data

|Marketing|1|Holydays|Henri|2019-10-03T07:00:00.000Z|2019-10-03T15:00:00.000Z|
|Marketing|1|Holydays|Henri|2019-10-04T07:00:00.000Z|2019-10-04T15:00:00.000Z|
|Marketing|1|Holydays|Henri|2019-10-05T07:00:00.000Z|2019-10-05T15:00:00.000Z|
|Marketing|1|Holydays|Henri|2019-10-06T07:00:00.000Z|2019-10-06T15:00:00.000Z|
|Marketing|1|sickness|Henri|2019-10-08T07:00:00.000Z|2019-10-08T15:00:00.000Z|
|Marketing|1|sickness|Henri|2019-10-09T07:00:00.000Z|2019-10-09T15:00:00.000Z|

|IT-Depart|2|Holydays|Paule|2019-11-08T07:00:00.000Z|2019-11-08T15:00:00.000Z|
|IT-Depart|2|Holydays|Paule|2019-11-09T07:00:00.000Z|2019-11-09T15:00:00.000Z|
|IT-Depart|2|Holydays|Paule|2019-11-10T07:00:00.000Z|2019-11-10T15:00:00.000Z|

|Marketing|1|Holydays|Henri|2019-10-17T07:00:00.000Z|2019-10-17T15:00:00.000Z|
|Marketing|1|Holydays|Henri|2019-10-18T07:00:00.000Z|2019-10-18T15:00:00.000Z|

i want has a Result, a single row for absences on consecutive days for the same reason

something like this

|sector|EmployeeNr|Name|Reason|startTime|EndTime|

|Marketing|1|Holydays|Henri|2019-10-03T07:00:00.000Z|2019-10-06T15:00:00.000Z|

|Marketing|1|sickness|Henri|2019-10-08T07:00:00.000Z|2019-10-09T15:00:00.000Z|

|IT-Depart|2|Holydays|Paule|2019-11-08T07:00:00.000Z|2019-11-10T15:00:00.000Z|

|Marketing|1|Holydays|Henri|2019-10-17T07:00:00.000Z|2019-10-18T15:00:00.000Z|

i think that should be something like an aggregate on Date, Reason and EmployeeNr but don t know really how to deal with this.

Any help please?


Solution

  • guys, i found a solution by creating a new column named Sequence, and give this column the same value when this condition is respected ( days are consecutive, and it is related to the same employee, and to the same reason) . I wrote the code in a tjavarow with the Numeric.sequence function and an if (condition)

    so for example

     |sector|EmployeeNr|Name|Reason|startTime|EndTime|Sequence
    
    |Marketing|1|Holydays|Henri|2019-10-03T07:00:00.000Z|2019-10-03T15:00:00.000Z|1
    |Marketing|1|Holydays|Henri|2019-10-04T07:00:00.000Z|2019-10-04T15:00:00.000Z|1
    |Marketing|1|Holydays|Henri|2019-10-05T07:00:00.000Z|2019-10-05T15:00:00.000Z|1
    |Marketing|1|Holydays|Henri|2019-10-06T07:00:00.000Z|2019-10-06T15:00:00.000Z|1
    |Marketing|1|sickness|Henri|2019-10-08T07:00:00.000Z|2019-10-08T15:00:00.000Z|2
    |Marketing|1|sickness|Henri|2019-10-09T07:00:00.000Z|2019-10-09T15:00:00.000Z|2
    
    |IT-Depart|2|Holydays|Paule|2019-11-08T07:00:00.000Z|2019-11-08T15:00:00.000Z|3
    |IT-Depart|2|Holydays|Paule|2019-11-09T07:00:00.000Z|2019-11-09T15:00:00.000Z|3
    |IT-Depart|2|Holydays|Paule|2019-11-10T07:00:00.000Z|2019-11-10T15:00:00.000Z|3
    
    |Marketing|1|Holydays|Henri|2019-10-17T07:00:00.000Z|2019-10-17T15:00:00.000Z|4
    |Marketing|1|Holydays|Henri|2019-10-18T07:00:00.000Z|2019-10-18T15:00:00.000Z|4
    

    and at the end, use the tAggregateRow as suggested by Jim Macaulay

    min(startTime)

    max(EndTime)

    group by EmployeeNr,Name,Reason, Sequence.

    And that works!

    Thanks all for your help.