
MarkLogic - Java heap space error while importing with mlcp


MarkLogic version: 9.0-6.2, mlcp version: 9.0.6

I am trying to import an XML file into MarkLogic with mlcp, using the script below.

#!/bin/bash
mlcp.sh import -ssl \
-host localhost \
-port 8010 \
-username uname \
-password pword \
-mode local \
-input_file_path /data/testsource/*.XML \
-input_file_type documents \
-aggregate_record_namespace "http://new.webservice.namespace" \
-output_collections testcol \
-output_uri_prefix /testuri/ \
-transform_module /ext/ingesttransform.sjs

The script runs successfully with a small file, but fails with a 'Java heap space' error when run against a large file (450 MB).

ERROR contentpump.MultithreadedMapper: Error closing writer: Java heap space

How could we resolve this error?


Solution

  • The mlcp job as written sends the whole input file into the transform module as one single document (-input_file_type documents) of about 500 MB. The transform module then has logic to emit a URI and a value (content.uri and content.value) for each aggregate element. This results in the Java heap space error, even though the heap space available on the server is around 3.4 GB.

    I tried two alternative designs, and both work:

    1. Add aggregation in mlcp (-input_file_type aggregates, -aggregate_record_element CustId) so that mlcp splits the file into multiple documents. This creates multiple documents in the staging database.
    2. Keep -input_file_type as documents and remove -transform_module, so the file is loaded into staging as one single document.

    Both approaches work, but the second may create documents of around 500 MB (I believe the size limit is 512 MB). So I chose the first approach (I also need a better URI than the default one that mlcp generates).
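For illustration, the first approach might look roughly like the script below. This is a sketch, not the exact command I ran: it reuses the connection settings and paths from the question, assumes CustId lives in the namespace given there, and leaves out -transform_module (whether a transform is still needed depends on what it does per record). The RUN_MLCP switch is a hypothetical convenience added here so the command can be reviewed before it is executed.

```bash
#!/bin/bash
# Sketch of approach 1: let mlcp split the large XML file into one document
# per CustId element instead of passing the whole file to a transform.
# Host, port, credentials and paths are copied from the question.
MLCP_ARGS=(
  import -ssl
  -host localhost
  -port 8010
  -username uname
  -password pword
  -mode local
  -input_file_path /data/testsource/*.XML
  # Changed from "documents": treat the input as aggregate XML
  -input_file_type aggregates
  # One output document per element with this local name
  -aggregate_record_element CustId
  # Assumption: CustId is in the namespace used in the question
  -aggregate_record_namespace "http://new.webservice.namespace"
  -output_collections testcol
  -output_uri_prefix /testuri/
)

# RUN_MLCP is a hypothetical switch, not an mlcp feature:
# unset or 0 prints the command, 1 runs it.
if [ "${RUN_MLCP:-0}" = "1" ]; then
  mlcp.sh "${MLCP_ARGS[@]}"
else
  printf '%s ' mlcp.sh "${MLCP_ARGS[@]}"
  printf '\n'
fi
```

Run it once as-is to check the printed command, then with RUN_MLCP=1 to perform the import. For the URI, mlcp also has a -uri_id option that, as far as I know, takes the name of an element whose value becomes the document URI; which element would suit this data is not covered here, so it is left out of the sketch.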