As part of our Governance initiative and to meet a regulatory requirement, we need to produce a Lineage (traceability) report outlining the flow of data into our Warehouse and the Reports or Services that consume its data. We are aware that Information Governance Catalog can produce such a report automatically when DataStage writes data to the Warehouse. Can Information Governance Catalog do the same when we use SQL scripts or other tooling to read from or write to our Warehouse? Can I view a complete Lineage report that incorporates information from these different sources?
What are the steps within IGC to document or otherwise define the usage of information to support Data Lineage and Regulatory reporting?
Yes. Lineage (traceability) reports are produced automatically for DataStage, and IGC also offers a facility to document the flow of data for other data movement scripts, tools or processes. This produces the same Lineage reports, which can be used to satisfy compliance needs or to build confidence and trust in the use or consumption of data.
At its simplest, IGC allows one to draft a Mapping Document: essentially a spreadsheet that delineates the Data Source and Data Target, along with documentation to support the transformation, aggregation or other logic. The spreadsheet can be authored directly in IGC or loaded from Excel (as a text file), which further supports automation of the process. Documentation for Extension Mapping Documents can be found here: https://www.ibm.com/support/knowledgecenter/en/SSZJPZ_11.5.0/com.ibm.swg.im.iis.mdwb.doc/topics/c_extensionMappings.html (we suggest first creating such a document in IGC and exporting the results to Excel to use as a template).
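Because the Mapping Document can be loaded from a text file, the file itself can be generated by a script. The following is only a minimal sketch of that idea in Python. The column names, the `;` separator for multiple columns, and the asset identifiers are all hypothetical placeholders, not the layout IGC expects; export a mapping document from IGC and copy its actual headers before using anything like this.

```python
import csv
import io

# Hypothetical column names, for illustration only. Export a mapping
# document from IGC to see the real header row and replace these.
COLUMNS = ["Name", "Source Columns", "Target Columns", "Rule", "Description"]


def build_mapping_rows(mappings):
    """Turn source-to-target mapping dicts into flat spreadsheet rows."""
    rows = []
    for m in mappings:
        rows.append({
            "Name": m["name"],
            # Assumption: multiple columns are joined with ';'.
            "Source Columns": ";".join(m["sources"]),
            "Target Columns": ";".join(m["targets"]),
            "Rule": m.get("rule", ""),
            "Description": m.get("description", ""),
        })
    return rows


def write_mapping_csv(mappings, stream):
    """Write the mapping rows as CSV text to any file-like object."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(build_mapping_rows(mappings))


# Example mappings describing what a SQL script does (made-up asset names).
mappings = [
    {
        "name": "load_customer_totals",
        "sources": ["STAGING.ORDERS.CUSTOMER_ID", "STAGING.ORDERS.AMOUNT"],
        "targets": ["WAREHOUSE.CUSTOMER_TOTALS.TOTAL_AMOUNT"],
        "rule": "SUM(AMOUNT) grouped by CUSTOMER_ID",
        "description": "Nightly aggregation SQL script",
    },
    {
        "name": "copy_customer_name",
        "sources": ["STAGING.CUSTOMERS.NAME"],
        "targets": ["WAREHOUSE.CUSTOMER_TOTALS.CUSTOMER_NAME"],
    },
]

buffer = io.StringIO()
write_mapping_csv(mappings, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Each row pairs one or more source columns with a target column and records the transformation logic as free text, which mirrors what the Mapping Document is described as capturing. Writing to a file instead of an in-memory buffer only requires passing an opened file to `write_mapping_csv`.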
In addition, IGC supports a more formal process for extending the Catalog and introducing new types of Assets. This goes one step further: it properly documents and catalogs the Data Processes (SQL commands, other ETL tooling) and maps the data movement through those Processes. This allows users to identify with the Data Process and even allows one to include operational data (as is supported for IGC). More information on this process can be found here: https://www-01.ibm.com/support/docview.wss?uid=swg21699130
We suggest reviewing the absolute requirements and determining what information is required for the ensuing traceability report. Starting with the Extension Mapping Document should suffice; it is the simplest option to implement and delivers immediate benefit.