xml, xpath, marklogic, mlcp

Unable to export single document in MarkLogic using MLCP


I am trying to use mlcp.bat to export the document with the following URI: /category/[2014] xxx.xml

This is the mlcp command I used, with its parameters:

mlcp.bat export -host localhost -port 8000 -username admin -password admin -mode local -database database-content -output_file_path C:/mlcp/bin/xmlexport -document_selector '/CaseReport/Metadata[id="16594-SSP-M"]' -indented true

After executing the above command, no documents are exported :( Below is the mlcp output:

INFO contentpump.ContentPump: Job name: local_320491878_1
INFO mapreduce.MarkLogicInputFormat: Fetched 1 forest splits.
INFO mapreduce.MarkLogicInputFormat: Made 2 split(s).
INFO contentpump.LocalJobRunner:  completed 0%
INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.MarkLogicCounter:
INFO contentpump.LocalJobRunner: ESTIMATED_INPUT_RECORDS: 35722
INFO contentpump.LocalJobRunner: INPUT_RECORDS: 0
INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 0
INFO contentpump.LocalJobRunner: Total execution time: 26 sec

== UPDATE == These are the first 3 lines of the XML document with URI /category/[2014] xxx.xml

<?xml version="1.0" encoding="UTF-8"?>
<CaseReport xlink:type="extended" category="unreported" neutralcitation="[2014] xxx" year="" volume="" series="" pageno="" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:exslt="http://exslt.org/common">
  <Metadata id="16594-SSP-M">

Solution

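  • The -document_selector option expects an XPath expression that selects documents from the database; it does not accept a document URI. If you do select by XPath, note that in the sample document id is an attribute of Metadata, so the predicate would need to be [@id="16594-SSP-M"] rather than [id="16594-SSP-M"], which looks for a child element named id.

    To select the document by its URI instead, use -query_filter with a cts:document-query() for that URI: cts:document-query("/category/[2014] xxx.xml")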

    This is an example of that query serialized as XML:

    -query_filter
    <cts:document-query xmlns:cts="http://marklogic.com/cts"><cts:uri>/category/[2014] xxx.xml</cts:uri></cts:document-query>
    

    This is an example of that query serialized as JSON:

    -query_filter 
    {"documentQuery":{"uris":["/category/[2014] xxx.xml"]}} 
    

    To avoid quoting and escaping issues with the query on the command line, put the option into an options file and pass the path to that file with the -options_file option.
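
    As a rough sketch of that approach, an options file for this export might look like the following. The file name export-options.txt and its location are hypothetical; the connection values are copied from the command in the question. This assumes the usual mlcp options file layout, where each option name and each value sits on its own line and comments start with # on a line by themselves. Check the mlcp documentation for your version if a value is not read the way you expect.

    ```text
    # export-options.txt (hypothetical file name)
    # Assumed invocation, with -options_file as the first option after the command:
    #   mlcp.bat export -options_file C:/mlcp/bin/export-options.txt
    -host
    localhost
    -port
    8000
    -username
    admin
    -password
    admin
    -mode
    local
    -database
    database-content
    -output_file_path
    C:/mlcp/bin/xmlexport
    -indented
    true
    -query_filter
    <cts:document-query xmlns:cts="http://marklogic.com/cts"><cts:uri>/category/[2014] xxx.xml</cts:uri></cts:document-query>
    ```

    Because the query sits on its own line in the file, the Windows shell never gets a chance to interpret the angle brackets, quotes, or the space in the URI.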