palantir-foundry

Optimal Data Sync Strategy for Palantir Foundry: Handling Incremental Updates Based on Timestamps


I have one table in the source and I want to sync it into Palantir Foundry by keeping in mind the below points

so which transaction type I can use, do I need to on Incremental, and which query I can write to achieve the above scenario?

I have tried using the Append Transaction type with Incremental on, but due to this configuration, I'm not able to update the old data which is in the foundry using the MODIFIED_DATE column.

Also, I want a better solution than using Snapshot which is not that much good.


Solution

  • Looking at the JDBC documentation (assuming you connect via JDBC) you will need to set the incremental on the MODIFIED_DATE column, as If your rows are mutable (...), you'll need a column that increases with every mutation of the data (e.g. an update_time column).

    The "good" solution depends on what you want to do with the data downstream, namely, if you need to have the history of changes or want to keep "the latest only".

    As the documentation suggests When you ingest an updated version of an existing row, the Foundry dataset will still include previous versions of the row (remember, we're using the APPEND transaction type). If you want only the latest version of each row, you will need to use another tool in Foundry, such as Transforms, to clean the data.

    You can very well ingest with "Append" and if a row has been "updated" then it will be re-added as a new row. Then you can deduplicate downstream, for instance:

    It is worth noting that Object syncing supports incremental, which means that new rows versions with "conflicting" primary key, will be reflecting in "only the latest version is available as an object".

    e.g. in Transaction 1

    PK 1 | Object A | Value 1  
    PK 2 | Object B | Value 2 
    

    and then you later receive in Transaction 2:

    PK 1 | Object A | Value 10 
    

    will result in your Ontology:

    PK 1 | Object A | Value 10  
    PK 2 | Object B | Value 2 
    

    Note: You need to deduplicate primary keys within each new transaction, as if you provide a new transaction with PK 1 | Object A | Value 30 AND PK 1 | Object A | Value 20 at the same time, then Foundry has no way to know which row version to keep.

    EDIT - Deletion of rows