marklogic marklogic-corb

How to pass a new field value to a CoRB routine to update a document in ML


I am using CoRB |ADHOC mode to batch-update existing documents in ML based on a CSV input file.

Each row of the input file has a URI and a new field value. Basically, I need to update the document at each URI with its new field value. The input CSV file sits outside of the ML cluster, and CoRB is run from a Windows PC.

Here is the clumsy solution I came up with:

  1. Develop a custom uris.xqy (URIS-MODULE) that reads the input CSV from the local drive. The file path is passed in as a parameter from the CoRB option.properties file.
  2. Inside uris.xqy, build a composite URI object for each row that contains both the URI and the new field value.
  3. The process.xqy (PROCESS-MODULE) reads the composite URI object passed in from uris.xqy, splits it into the URI and its new field value, and then makes the update.

The solution works, but only when I run it locally from the Windows command prompt. It does not work when I run it from Gradle. The problem seems to be that, when run from Gradle, the XQuery (uris.xqy) is evaluated on the host machine instead, so it can no longer access the Windows-based input file path. The workaround is to upload the input CSV file to a web server and specify the file path as a URI.

Is there a better way to do that?

(I am trying not to load the input CSV file into the content database. It seems cleaner to keep it outside of ML, I guess.)


Solution

  • So, you have a CSV that you want to read for the set of URIs to process?

    For that, you don't need or want a URIS-MODULE. Instead, use the URIS-FILE option and give it the path to that CSV.

    If defined instead of URIS-MODULE, URIs will be loaded from the file located on the client. There should only be one URI per line. This path may be relative or absolute. For example, a file containing a list of document identifiers can be used as a URIS-FILE and the PROCESS-MODULE can query for the document based on this document identifier.
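
    As a rough illustration of the option described above, a CoRB options file might look something like the sketch below. The connection string, file path, module name, and thread count are placeholders chosen for this example, not values from the question. It also assumes the CSV has no header row, since every line would be treated as a work item.

    ```properties
    # Sketch only: host, port, credentials, and database are placeholders.
    XCC-CONNECTION-URI=xcc://user:password@localhost:8000/Documents

    # Read the work items from a file on the machine running CoRB (the client),
    # one item per line. Forward slashes avoid backslash escaping in .properties files.
    URIS-FILE=C:/data/updates.csv

    # |ADHOC evaluates the local module on the server without installing it first.
    PROCESS-MODULE=process.xqy|ADHOC

    # Placeholder value; tune for your cluster.
    THREAD-COUNT=4
    ```

    With a setup along these lines there is no uris.xqy at all, so nothing on the MarkLogic host needs access to the Windows file path. If each line carries both the URI and the new value (for example `/docs/a.xml,new value`), the process module would presumably receive the whole line in its external `$URI` variable and could split it itself, for instance with `fn:substring-before($URI, ",")` and `fn:substring-after($URI, ",")`. That assumes a batch size of 1 and document URIs that contain no commas, and is worth verifying against your CoRB version.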