I have a Pentaho Data Integration job which has the following steps:
A Generate Rows step with an initial date (e.g. 2010-01-01) and a limit of 10*366 = 3660 rows for 10 years.
The next step has an incrementer to increment the number of days.
The next step uses this information (initial date, limit, and the incrementer) to generate a date for each day of the 10 years starting 2010-01-01, using JavaScript functions.
The final step loads the generated dates into a table.
All this works fine.
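For context, the kind of logic that JavaScript step implements could be sketched in plain JavaScript like this (the variable names and hard-coded values are illustrative only, not taken from the actual transformation):

    // Sketch of the date generation described above: one date per day for 10 years.
    var initialDate = new Date(2010, 0, 1);   // 2010-01-01, the initial date
    var limit = 10 * 366;                     // 3660 rows, as in the Generate Rows step
    var dates = [];
    for (var i = 0; i < limit; i++) {
      var d = new Date(initialDate.getTime());
      d.setDate(d.getDate() + i);             // the incrementer: add i days
      dates.push(d);
    }
    // In PDI, each generated date becomes a row that the final step loads into the table.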
Now, I have a requirement where I do not want this table to stay static with dates for 10 years only. If the max date in the date table is within 2 years of today, I want to load 10 more years of dates into the table.
For the above example, with the first load loading dates for 10 years from 2010, I should be able to load 10 more years in 2018, the next 10 years in 2028, and so on.
What will be the best way to achieve this?
How can I:
1) Read the max date from my date table? - I know how to do this.
2) Compare the read date against today, and if the max date is within 2 years of today, populate the table with the next 10 years.
I don't know how to do step 2 above in Pentaho Data Integration. I would really appreciate any pointers on how to resolve this.
You need to read the current date (today) into a field, for example with a Get System Info step.
Then you can compare the two fields, max date and today, with a Filter Rows step.
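If the "two years from today" cutoff is easier to compute in script than directly in the Filter Rows condition, a minimal sketch in a Modified Java Script Value step placed before the filter could look like this (the field names max_date and today are assumptions, not required names):

    // Assumes incoming fields: max_date (from the date table) and today (from Get System Info).
    var cutoff = new Date(today.getTime());
    cutoff.setFullYear(cutoff.getFullYear() + 2);                // today + 2 years
    var needs_extension = (max_date.getTime() < cutoff.getTime());
    // Expose needs_extension as a new field and let the Filter Rows step test it.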
As the previous step may give you more than one row, you need to use either a Unique Rows step (no field to provide) or a Group By step (no group-by field).
If any row gets through, you launch your generate-10-years process. As you cannot have a hop from a step into this second Generate Rows step, you must use a Transformation Executor step to launch your currently existing transformation.
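In the executed transformation, the only change needed is that the date generation starts from the table's current max date instead of the fixed 2010-01-01. A rough sketch of that adjustment (maxDate is hard-coded here only for illustration; in practice it would arrive as a field or parameter from the Transformation Executor):

    // maxDate stands in for the value read from the date table.
    var maxDate = new Date(2019, 11, 31);         // e.g. the table's current max date
    var start = new Date(maxDate.getTime());
    start.setDate(start.getDate() + 1);           // begin the day after the existing max
    // The existing 10*366-row generation logic then runs unchanged,
    // using `start` in place of the original initial date.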
Now, if your requirement gets even a little more complex than that, I strongly suggest you use jobs to orchestrate your transformations.