I want to use Apache NiFi to ETL data from an Oracle database to a SQL Server database.
The target database has two related tables with a 1-n cardinality (with a large value of n, around 20k); let's name the tables `Header` and `Detail`.
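For concreteness, the target schema is roughly shaped like this (a minimal sketch; the data columns are placeholders I made up):

```sql
-- Hypothetical SQL Server schema; only the key columns matter here.
CREATE TABLE Header (
    id INT PRIMARY KEY,
    header_data NVARCHAR(100)
);

CREATE TABLE Detail (
    id INT PRIMARY KEY,
    header_id INT NOT NULL REFERENCES Header(id), -- 1-n: up to ~20k rows per Header
    detail_data NVARCHAR(100)
);
```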
I created two views on my source (Oracle) database, one returning `Header` data and one returning `Detail` data.
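The views look roughly like this (again a sketch; view, table, and column names are placeholders):

```sql
-- Hypothetical Oracle views over the source tables.
CREATE OR REPLACE VIEW header_view AS
SELECT h.id, h.header_data
FROM header_source h;

CREATE OR REPLACE VIEW detail_view AS
SELECT d.id, d.header_id, d.detail_data
FROM detail_source d;
```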
So I need to fetch all `Header` rows from the Oracle view, insert each record into SQL Server, and then, for each record, fetch all of its `Detail` rows from Oracle and insert them into SQL Server.
What is the best way to iterate over records in NiFi?
Since I have lots of records in `Header` and far more in `Detail` (as I said, maybe 20k `Detail` records for each `Header` record), joining everything and fetching it all at once is not efficient at all.
It's a job that should run periodically.
So I started with a `QueryDatabaseTable` processor to read the `Header` view (and it works incrementally), and with a `PutDatabaseRecord` processor I populated the `Header` table in SQL Server. That works fine.
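As I understand it, with `Maximum-value Columns` set to `id`, `QueryDatabaseTable` generates something like the following on each run, keeping the last seen maximum in processor state (a sketch, with 42 standing in for that stored value):

```sql
-- Sketch of the incremental query QueryDatabaseTable issues;
-- the actual maximum is tracked in processor state between runs.
SELECT * FROM header_view WHERE id > 42
```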
Now I want to fetch the records of the `Detail` view with an `ExecuteSQL` processor. The problem is that I need the `ID` of each `Header` record to pass into the `ExecuteSQL` query. I wrote something like `select * from Detail d where d.header_id = ${ID}`, but it failed because `${ID}` was empty.
I added a `SplitAvro` processor after inserting the `Header` records into the target database. The good point is that I now have a bunch of FlowFiles (one per record) instead of one big one. Then I added a `FilterAttribute` processor, so each FlowFile keeps just the `ID` attribute.
Now I don't know how to pass that attribute to the next `ExecuteSQL` processor.
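My understanding is that `ExecuteSQL` evaluates Expression Language in its query property against each incoming FlowFile's attributes, so I expected something like this to work once the `ID` attribute is in place (a sketch; `detail_view` stands in for my Oracle view name):

```sql
-- Sketch: ${ID} should resolve from the incoming FlowFile's ID attribute;
-- Expression Language attribute names are case-sensitive.
SELECT *
FROM detail_view d
WHERE d.header_id = ${ID}
```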
Well, I have done the job; these were the steps: