Tags: marklogic, mlcp

Failing to import large files into a MarkLogic database using the MLCP utility


I have a large PDF file (about 1 GB) that fails to load into MarkLogic.

Is there a way for mlcp to split the large file into smaller files, and then merge them back into a single PDF after loading into the database?

The error reported is:

    skipp record () in file:/data2022/ABO2022-129.pdf, reason: the file size too large: 13040600 use streaming option.


Solution

  • MarkLogic does not really care about the size of the binary. At that size, by default it will simply be stored in the large-binary directory under the forest as a regular file and treated in the system like any other binary. No, there are no tools to break apart the content, nor is there any likely valid reason to do this for binaries, unless you hit some documented maximum size for MarkLogic, your filesystem, or another system resource.

    The error you see is not a MarkLogic error. It is an MLCP-imposed maximum size related to memory management. Assuming that you are reading this file from disk, rather than running an MLCP copy from server to server, the error message already suggests your next fix: the -streaming option of MLCP. It streams the file from disk to the server and does not need to build the whole document node in local memory; a sketch of such a command follows this answer.

    For details, see the MLCP User Guide, section 4.13.5, Reducing Memory Consumption With Streaming.
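
For reference, here is a minimal sketch of such an import; the host, port, credentials, and input path are placeholders, not values from the original post:

    mlcp.sh import -host localhost -port 8000 \
        -username admin -password admin \
        -input_file_path /data2022/ABO2022-129.pdf \
        -document_type binary \
        -streaming true

Note that, per the MLCP User Guide, streaming only applies when -input_file_type is documents (the default), and on Windows the command is mlcp.bat rather than mlcp.sh.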