oracle-database, etl, informatica, informatica-powercenter

How to check the used space during run of an Informatica Powercenter Workflow?


I'm trying to find out more about the space that Informatica Powercenter takes up when a Workflow is executed.

At the moment, we have a table on an Oracle server that weighs about 300 MB and has almost 1.9 million rows. We have developed a very simple ETL that reads from an SQL view pointing to that table, applies no transformations, and loads the data into another Oracle table on a different server. When we check the session details in the Workflow Monitor, the Bytes column shows an inflated figure of about 23 GB.

Numbers in Monitor software

From what we understand, this is simply an estimate of the total bytes transmitted, both read and written. In fact, if we increased the declared size of the columns (for example, from VARCHAR2(100) to VARCHAR2(4000)) while leaving the rows unchanged, this number would grow accordingly.

So this value of Bytes in the monitor does not represent the space actually used, is that correct? If so, is there any way to see it?

Thank you!


Solution

  • What you see in the Workflow Monitor is the number of rows multiplied by the total declared length of the row. So if you want to astonish an audience with ETL performance, declare 100 VARCHAR2(4000) fields holding nothing but nulls and you will report amazing throughput that has nothing to do with the actual amount of data transferred.
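
    This rows-times-declared-width arithmetic can be sketched in a few lines. The column layout below is hypothetical, chosen only to illustrate how roughly 300 MB of real data can show up as a ~20 GB counter:

    ```python
    # Declared byte widths of the target columns (hypothetical layout:
    # three VARCHAR2(4000) fields plus a NUMBER key at 22 bytes).
    declared_widths = [4000, 4000, 4000, 22]
    rows = 1_900_000

    row_width = sum(declared_widths)   # bytes counted per row, regardless of actual content
    est_bytes = rows * row_width       # what a rows-times-width counter would report
    print(f"{est_bytes / 1024**3:.1f} GB")  # → 21.3 GB, even if the columns are mostly empty
    ```

    The point is that the counter depends only on the declared metadata, not on how full the fields actually are.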

    That's it for the Workflow Monitor. Your title question, however, asks about the space used. So how much space does PowerCenter actually use to process the data?

    Check the session logs. There you will find entries mentioning the cache files being created and the space allocated. Alternatively, knowing the file names from the logs, you can inspect the cache files themselves: the *.idx and *.dat files created in the temp and cache directories under infa_shared for caching transformations (Joiners, Sorters, Lookups, Aggregators). Note that these files are removed once the workflow completes successfully, so you can check them ONLY while the workflow is running.
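
    While the session runs, a quick way to watch those cache files grow is to sum their sizes. The directory below is an assumption — the real location is whatever the session's cache directory attribute ($PMCacheDir) points to, typically somewhere under infa_shared:

    ```shell
    #!/bin/sh
    # Hypothetical cache location; adjust to your server's cache directory setting.
    CACHE_DIR="${PMCACHEDIR:-.}"

    # Total size of all *.idx and *.dat cache files currently present.
    # Re-run (or wrap in `watch`) while the workflow executes.
    find "$CACHE_DIR" -type f \( -name '*.idx' -o -name '*.dat' \) \
      -exec du -ch {} + | tail -n 1
    ```

    Since the files are deleted on successful completion, this only shows anything mid-run.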

    Please also note that ideally (almost) no data is cached at all: every row read from the source is written straight to the target without being stored in between. This is, of course, only the case when no caching transformations are used.

    So, there are two topics here: