I am planning to deploy Apache Atlas with Apache Cassandra as the storage backend and Elasticsearch as the index backend. How can I save lineage information in this setup? Atlas provides a GET API to retrieve lineage, but there seems to be no way to save it.
In Atlas, lineage is created when entities are linked through processes via the processes' inputs and outputs.
Example: lineage between two hive_table entities looks like this:
T1(hive_table)--->P1(hive_process)--->T2(hive_table)
So, basically, the entities need to be linked through a process type.
In Atlas, processes are themselves entities and can be created with the entity API (POST /api/atlas/v2/entity), with the inputs and outputs defined on them. For the hive_process above:
POST: /api/atlas/v2/entity
{
  "entity": {
    "typeName": "hive_process",
    "attributes": {
      "outputs": [
        {
          "guid": "2",
          "typeName": "hive_table",
          "uniqueAttributes": {
            "qualifiedName": "t2@primary"
          }
        }
      ],
      "qualifiedName": "p1@primary",
      "inputs": [
        {
          "guid": "1",
          "typeName": "hive_table",
          "uniqueAttributes": {
            "qualifiedName": "t1@primary"
          }
        }
      ],
      "name": "P1-Process"
    }
  }
}
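The same request can be sketched in Python using only the standard library. This is a minimal sketch, not a definitive client: the host and port in `ATLAS_URL`, the helper names (`table_ref`, `build_process`, `post_entity`), and the optional `auth_header` parameter are all assumptions for illustration. It references the tables by `qualifiedName` only, which assumes Atlas resolves references by unique attributes when no `guid` is given. Depending on the hive_process definition in your Atlas version, additional mandatory attributes may be required.

```python
import json
import urllib.request

# Assumed endpoint; adjust host and port for your deployment.
ATLAS_URL = "http://localhost:21000/api/atlas/v2/entity"


def table_ref(qualified_name):
    """Reference an existing hive_table by its qualifiedName."""
    return {
        "typeName": "hive_table",
        "uniqueAttributes": {"qualifiedName": qualified_name},
    }


def build_process(name, qualified_name, inputs, outputs):
    """Build the request body for a hive_process linking inputs to outputs."""
    return {
        "entity": {
            "typeName": "hive_process",
            "attributes": {
                "name": name,
                "qualifiedName": qualified_name,
                "inputs": [table_ref(q) for q in inputs],
                "outputs": [table_ref(q) for q in outputs],
            },
        }
    }


def post_entity(payload, url=ATLAS_URL, auth_header=None):
    """POST the payload as JSON and return the decoded response (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if auth_header:
        request.add_header("Authorization", auth_header)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    body = build_process("P1-Process", "p1@primary", ["t1@primary"], ["t2@primary"])
    # Print instead of sending, so the sketch runs without a live Atlas server.
    print(json.dumps(body, indent=2))
```

Calling `post_entity(body)` against a running Atlas instance would send the request; building the body separately makes it easy to inspect before anything is written.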
Important: the referenced entities (inputs and outputs) must already exist before you create the process; otherwise process creation will fail.
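One way to guard against that failure is to look up each referenced entity by its unique attribute before creating the process. The sketch below assumes the v2 lookup endpoint `GET /api/atlas/v2/entity/uniqueAttribute/type/{typeName}?attr:qualifiedName=...` and assumes that Atlas answers with HTTP 404 when nothing matches; the base URL and the helper names `lookup_url` and `entity_exists` are hypothetical.

```python
import urllib.error
import urllib.request
from urllib.parse import quote, urlencode

# Assumed base URL; adjust for your deployment.
ATLAS_BASE = "http://localhost:21000/api/atlas/v2"


def lookup_url(type_name, qualified_name, base=ATLAS_BASE):
    """Build the lookup URL for an entity identified by its qualifiedName."""
    query = urlencode({"attr:qualifiedName": qualified_name})
    return f"{base}/entity/uniqueAttribute/type/{quote(type_name, safe='')}?{query}"


def entity_exists(type_name, qualified_name, auth_header=None):
    """Return True if the lookup succeeds, False on 404 (assumed 'not found')."""
    request = urllib.request.Request(lookup_url(type_name, qualified_name))
    if auth_header:
        request.add_header("Authorization", auth_header)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # Other errors (auth, server) should not be read as "missing".
```

With this in place, a caller could check `entity_exists("hive_table", "t1@primary")` and `entity_exists("hive_table", "t2@primary")` before posting the process.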
If the predefined types don't fit your requirements, you can of course define your own types for Atlas entities and processes.
More about the Atlas type system can be found on the Apache Atlas site.
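For custom types, a request body for the type definition API (POST /api/atlas/v2/types/typedefs) might look like the sketch below. The type names `my_dataset` and `my_process` and the attributes `owner_team` and `job_id` are made up for illustration. The sketch assumes the built-in `DataSet` and `Process` supertypes, so that the custom process inherits `inputs` and `outputs` and can take part in lineage.

```python
import json


def entity_def(name, super_type, attribute_names=()):
    """Build one entity type definition with optional string attributes."""
    return {
        "category": "ENTITY",
        "name": name,
        "superTypes": [super_type],
        "attributeDefs": [
            {
                "name": attr,
                "typeName": "string",
                "isOptional": True,
                "cardinality": "SINGLE",
                "isUnique": False,
                "isIndexable": False,
            }
            for attr in attribute_names
        ],
    }


def build_typedefs():
    """Body for the typedefs endpoint: one dataset type and one process type."""
    return {
        "entityDefs": [
            entity_def("my_dataset", "DataSet", ["owner_team"]),  # hypothetical type
            entity_def("my_process", "Process", ["job_id"]),      # hypothetical type
        ]
    }


if __name__ == "__main__":
    print(json.dumps(build_typedefs(), indent=2))
```

Once such types are registered, entities of `my_dataset` and a `my_process` linking them could be created through the same entity API used for the hive_process example.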