marklogic mlcp

Can we replace the default document URI with a value from the document itself during mlcp ingestion in MarkLogic?


I want to replace the default document URI of the file with a value taken from the file's content.

For example, the default URI is /test/Invoice.xml

I want to change the document URI to

/Invoice_{date-time value of the DateCreated field in the file}.xml

The file looks like this

<?xml version="1.0" encoding="UTF-8"?>
<Test xsi:noNamespaceSchemaLocation="file:///D:/Mapforce/Projects/schema/Test.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <ID>f1258d4ae0df43d5a1e05ce9139f0ed2</ID>
    <SystemRef>22000041</SystemRef>
    <DateCreated>2022-09-06T19:07:46.3492849+01:00</DateCreated>
    <TimeSaved>240</TimeSaved>
    <ManyReasons/>
    <SubmissionUser>System</SubmissionUser>
    <InternalBusinessUnit>Finance</InternalBusinessUnit>
    <Direction>Inbound</Direction>
</Test>

How can I do this using mlcp?


Solution

  • Controlling Database URIs During Ingestion

    By default, the document URIs created by mlcp during ingestion are determined by the input source. The tool supports several command line options for modifying this default behavior.

    If you are applying a custom transformation, you can also control the URI of the document. Inside the transform function, set the "uri" key of the $content map to whatever value you want, for example map:put($content, "uri", "myCustomURI.xml"). See: Example: Changing the URI and Document Type

    So, in your custom transform you could use XPath to select the DateCreated element and bind it to a variable:

    let $created := map:get($content, "value")/Test/DateCreated
    

    and then use it to construct the desired URI (you may want to normalize or format the DateCreated value first, since it contains characters such as ":" and "+" that make for an awkward URI):

    map:put($content, "uri", "/Invoice_"||$created||".xml")
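
Putting the two steps together, a complete transform module might look roughly like the sketch below. It follows the standard mlcp transform function signature (a $content map and a $context map). The module namespace, the "inv" prefix, and the way the timestamp is cleaned up are assumptions made for illustration, not something mlcp requires.

```xquery
xquery version "1.0-ml";

(: Hypothetical namespace; use whatever fits your project. :)
module namespace inv = "http://example.com/mlcp/invoice-uri";

(: mlcp calls this once per input document. :)
declare function inv:transform(
  $content as map:map,
  $context as map:map
) as map:map*
{
  (: "value" holds the document node that mlcp is about to insert. :)
  let $doc := map:get($content, "value")
  let $created := fn:string($doc/Test/DateCreated)
  return
    if ($created eq "")
    then
      (: No DateCreated found: keep the default URI. :)
      $content
    else
      (: Assumed clean-up: replace every character that is not a
         letter, digit, or hyphen with "_", so that
         2022-09-06T19:07:46.3492849+01:00
         becomes 2022-09-06T19_07_46_3492849_01_00 :)
      let $safe := fn:replace($created, "[^0-9A-Za-z\-]", "_")
      let $_ := map:put($content, "uri", "/Invoice_" || $safe || ".xml")
      return $content
};
```

To use it, the module would need to be installed in the modules database of the App Server that mlcp connects to, and then referenced on the import command line with -transform_module (the module path) and -transform_namespace (here the assumed http://example.com/mlcp/invoice-uri). The function name defaults to "transform", so -transform_function is only needed if you name it differently.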