Tags: xquery, marklogic-corb, sjs

How do I write multiple output files using CoRB?


By default, when I run a CoRB job that returns data from the process function, that data is streamed into a single file on the CoRB client. I need to write the output to different files instead, one file per URI being processed. How do I write CoRB output into multiple files rather than one large file?

I have a CoRB job that returns the URI today, and those URIs are streamed together into one output file with each URI on a new line. I would prefer to have a directory filled with files, and have one file per URI.


Solution

  • CoRB has two built-in Tasks that can be used to write the output of the PROCESS-MODULE to the filesystem.

    It is common to write CoRB jobs that generate a CSV or other report by appending the output of each PROCESS-MODULE execution to a single file. If you specify the EXPORT-FILE-NAME option, CoRB will automatically use ExportBatchToFileTask by setting the PROCESS-TASK option for you (unless you have explicitly set PROCESS-TASK yourself):

    PROCESS-TASK=com.marklogic.developer.corb.ExportBatchToFileTask
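
    For comparison, a minimal single-file configuration might look like the sketch below. The directory and file name are placeholder values, not taken from the question; because EXPORT-FILE-NAME is set, PROCESS-TASK can be left out:

    ```properties
    # Hypothetical values: adjust the directory and file name for your job
    EXPORT-FILE-DIR=/tmp/export
    # Setting EXPORT-FILE-NAME makes CoRB default PROCESS-TASK to ExportBatchToFileTask
    EXPORT-FILE-NAME=uris.txt
    ```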
    

    However, if you would prefer to have the result of each process module execution saved as its own output file, for a multi-threaded download/export, then you would want to configure the ExportToFileTask. It uses the URI sent to the process module to construct a directory structure and filename, and saves the results of the transform to that file path.

    You can set the EXPORT-FILE-DIR to provide a base directory in which to write out those files.

    So, to configure CoRB to write the results of each PROCESS-MODULE execution to its own file, you would want to have the following options set for your CoRB job:

    PROCESS-TASK=com.marklogic.developer.corb.ExportToFileTask
    EXPORT-FILE-DIR=/tmp/export
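
    Put together, a complete options file for such a job might look like the following sketch. The connection string, module file names, and thread count are hypothetical placeholders, and the |ADHOC suffix assumes the modules are read from the client filesystem rather than from a modules database:

    ```properties
    # Hypothetical connection details for the target MarkLogic app server
    XCC-CONNECTION-URI=xcc://user:password@localhost:8000
    # Hypothetical module file names; |ADHOC loads them from the client filesystem
    URIS-MODULE=get-uris.xqy|ADHOC
    PROCESS-MODULE=export-doc.xqy|ADHOC
    # Assumed number of worker threads writing files concurrently
    THREAD-COUNT=8
    # Write each PROCESS-MODULE result to its own file under EXPORT-FILE-DIR
    PROCESS-TASK=com.marklogic.developer.corb.ExportToFileTask
    EXPORT-FILE-DIR=/tmp/export
    ```

    With options like these, each processed URI should produce one file beneath /tmp/export, with its path derived from the URI as described above.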