talend, talend-mdm

How to filter rows based on a condition and if the condition isn't met, grab another row in Talend?


It was hard to think of a title for this question, so hopefully it makes sense.

I will explain further. I have a flow of data from an Excel file and each row has one of two words in the last column. It will either contain "Open" or "Current".

So let's say I have an input that looks like this:

NAME | SSN   | TYPE
John | 12345 | Current
Katy | 99999 | Current
Sam  | 33333 | Current
John | 12345 | Open
Cody | 55555 | Open

The goal is to keep each person only once. Each person is uniquely identified by their SSN. I want to keep the Open row if both Open and Current exist for that person. If only Current exists, then keep that one. So the final output should look like this:

NAME | SSN   | TYPE
Katy | 99999 | Current
Sam  | 33333 | Current
John | 12345 | Open
Cody | 55555 | Open

NOTE: As you can see, the first entry for John has been removed since he had an Open row.

I have attempted this already, but it is sloppy and I figure there must be a better way. Here is an image of what I have done: [image: Talend flow]


Solution

  • Here's how you can do it:

    First, sort the data by Name, and then by Type descending (this is important so that, for each person, the Open record comes first); then, in the tMap, filter it like this:

    Numeric.sequence(row2.name, 1, 1) == 1
    

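    This only lets a record through if it is the first time we're seeing this name. Numeric.sequence keeps a separate counter for each sequence name, so using the name as the sequence name returns 1 only for that person's first row.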
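For anyone who wants to check the logic outside Talend, here is a rough plain-Java sketch of the same idea: sort, then keep only the first row per key. It is not Talend-generated code. The `Person` class, the `keepPreferred` method, and the counter map (a stand-in for what `Numeric.sequence` does per sequence name) are all hypothetical names made up for illustration. It also keys on SSN rather than name, on the assumption that SSN is the real unique id, as the question says.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FirstPerPerson {
    // Hypothetical row type standing in for the Talend flow's schema.
    static final class Person {
        final String name;
        final String ssn;
        final String type;

        Person(String name, String ssn, String type) {
            this.name = name;
            this.ssn = ssn;
            this.type = type;
        }

        @Override
        public String toString() {
            return name + " | " + ssn + " | " + type;
        }
    }

    // Keeps one row per SSN, preferring "Open" over "Current".
    static List<Person> keepPreferred(List<Person> rows) {
        List<Person> sorted = new ArrayList<>(rows);
        // Sort by key, then by type descending, so that "Open" sorts
        // ahead of "Current" within each person.
        sorted.sort((a, b) -> {
            int byKey = a.ssn.compareTo(b.ssn);
            return byKey != 0 ? byKey : b.type.compareTo(a.type);
        });

        // Per-key counter starting at 1 with step 1 (assumed equivalent
        // of calling Numeric.sequence(key, 1, 1) in the tMap filter).
        Map<String, Integer> counters = new HashMap<>();
        List<Person> kept = new ArrayList<>();
        for (Person p : sorted) {
            int seq = counters.merge(p.ssn, 1, Integer::sum);
            if (seq == 1) {
                kept.add(p); // first row seen for this key
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Person> input = Arrays.asList(
            new Person("John", "12345", "Current"),
            new Person("Katy", "99999", "Current"),
            new Person("Sam", "33333", "Current"),
            new Person("John", "12345", "Open"),
            new Person("Cody", "55555", "Open"));
        for (Person p : keepPreferred(input)) {
            System.out.println(p);
        }
    }
}
```

Note that the result comes out in sorted-key order, not in the original input order, which is also what you'd get in the Talend job after the sort step.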