sql-serveroracle-databaseetlapache-nifi

ETL Multiple Tables with relationships using Apache Nifi


I want to use Apache Nifi to ETL data from an Oracle Database, to a SQL Server database.

The target database, has 2 tables (with a relationship) and a 1-n cardinality (with a big value of n, such as 20k or something), let's name the tables Header and Detail.

I created 2 views on my source (Oracle) database, one returning Header data and one Detail data.

So I need to fetch all Headers from Oracle view, insert each record in SQL Server, and then for each record, fetch all Details from Oracle and insert them in SQL Server.

What is best solution to iterated over records in NiFi?

As I have lots of records in Header and lots of lots of more records in Detail (as I said maybe 20k records in Detail for each record of Header), joining all and fetching all is not efficient at all.

It's job which should be run periodically.

So I started with a QueryDatabaseTable processor to read Header view (and it work incrementally), and with a PutDatabaseRecord processor I populated Header table in SQL Server. That works fine.

Now I want to fetch records of Detail view by a ExecuteSQL processor, the problem is now I need to have ID of each record of Header to pass it to query of ExecuteSQL, I have written something like select * from Detail d where d.header_id = ${ID} but it failed as ${id} was empty.

I added a SplitAvro process after inserting Header into target database, the good point is now I have a bunch of FlowFiles instead of one big one (1 for each record), then I added a FilterAttribute process, so I can just have ID attribute.

Now I don't know how to pass that attribute to next ExecuteSQL process....


Solution

  • Well I have done the job, these were the steps:

    1. Execute SQL: just fetching non ETLed header IDs,
    2. Convert Records: from Avro to JSON
    3. Split JSON
    4. Extract Text: extracted header ID to an attribute, as attributes can be passed to queries
    5. Execute SQL: fetching header records (one by one)
    6. Put Database Record: inserting header record
    7. Execute SQL: fetch all detail record of the header
    8. Put Database Record: inserting detail records in batch mode
    9. Put SQL: mark header record, that it is ETLed, so won't be selected in future exections.