
Adding a Data Hub custom envelope document to a collection


I’m working on a MarkLogic Data Hub project and encountering issues when trying to add generated envelope documents to a collection (dita-envelope) during a custom step in an ingestion flow.

Context
Approaches Tried
1. Using xdmp.documentAddCollections
if (cts.docAvailable(envelopeUri)) {
    xdmp.documentAddCollections(envelopeUri, ["dita-envelope"]);
}

Issue: With declareUpdate(), this throws XDMP-OWNTXN errors, because declareUpdate() is not allowed inside the transaction that Data Hub manages for a custom step. Without declareUpdate(), it fails with an error saying that declareUpdate() is missing.

2. Adding Collection Metadata in flowUtils.writeContentArray
const outputContent = {
    uri: envelopeUri,
    value: envelope,
    context: {
        metadata: {
            collections: ["dita-envelope"],
        },
    },
};
flowUtils.writeContentArray([outputContent], options.database);

Issue: The document is written to the database, but it is not added to the dita-envelope collection; the collections value only ends up as metadata on the envelope document.

3. Using addMetadataToContent from flow-utils.mjs
flowUtils.addMetadataToContent(outputContent, flowName, stepName, jobId);
flowUtils.writeContentArray([outputContent], options.database);

Issue: This adds metadata about the collection to the document, but does not actually add the document to the collection.

4. Creating Placeholder Documents in the Collection
const placeholderUri = "/placeholder.json";
xdmp.documentInsert(placeholderUri, {}, {collections: ["dita-envelope"]});

Issue: Successfully creates the collection but doesn’t assign the envelope documents to it.

Current Environment

  • MarkLogic version: 11.0
  • Data Hub version: 6.1.1
  • Database context: using the staging and final databases as part of the ingestion flow

Question
  1. How can I programmatically assign documents to a collection in a MarkLogic Data Hub custom step without using declareUpdate()?
  2. Is there a reliable way to assign collections during the document creation process using flowUtils.writeContentArray or any other Data Hub utility?

Any guidance or suggestions for best practices to achieve this would be greatly appreciated!


Solution

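  • The collections array should be a direct child of context, as a sibling of metadata, rather than nested inside metadata. Anything placed under metadata is stored as document metadata (key/value pairs), not as collection membership: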

    const outputContent = {
        uri: envelopeUri,
        value: envelope,
        context: {
            metadata: {},
            collections: ["dita-envelope"]
        }
    };
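To make the difference concrete, here is a minimal sketch of a step that sets the collection on the content object instead of calling xdmp.documentAddCollections. It assumes (check this against the scaffolding your Data Hub version generates) that the custom step's main function receives one content object and returns it for Data Hub to write, and that the writer takes collection membership from context.collections. withCollections is a hypothetical helper, not a Data Hub API. It also lifts a misplaced metadata.collections array (as in attempt 2 above) into context.collections.

```javascript
// Hypothetical helper (not part of Data Hub): returns a copy of a content
// object whose context.collections contains the given names, deduplicated.
// A "collections" array found under context.metadata is moved up to
// context.collections, because entries under metadata are stored as
// document metadata rather than as collection membership.
function withCollections(content, ...names) {
  const context = { ...(content.context || {}) };
  const metadata = { ...(context.metadata || {}) };
  const misplaced = Array.isArray(metadata.collections) ? metadata.collections : [];
  delete metadata.collections; // only the copy is modified
  const existing = Array.isArray(context.collections) ? context.collections : [];
  context.metadata = metadata;
  context.collections = [...new Set([...existing, ...misplaced, ...names])];
  return { ...content, context };
}

// Sketch of a custom step main function. Assumption: Data Hub passes one
// content object, writes the returned object, and applies
// context.collections when inserting it. No declareUpdate() is needed
// here because this code performs no database updates itself.
function main(content, options) {
  // ...build the envelope and assign it to content.value here...
  return withCollections(content, "dita-envelope");
}
```

How main is exported (CommonJS vs. an .mjs export) depends on the module format your Data Hub version uses, so that wiring is left out. If you call flowUtils.writeContentArray yourself instead of returning the content, the same object shape (context.collections next to context.metadata) should apply.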