MarkLogic is installed on a Windows 10 machine.
We are using MarkLogic Content Pump (MLCP) to import data.
It works well with files declared as:
<?xml version="1.0" encoding="UTF-8"?>
It shows an error when importing files with a non-UTF-8 encoding, i.e.
<?xml version="1.0" encoding="US-ASCII"?>
I looked at the MLCP guide and found the content_encoding parameter, but it is not working: it throws an error for records that contain special characters such as ´, δ, “ and so on:
ERROR mapreduce.ContentWriter: XDMP-DOCENTITYREF: Invalid entity reference "gamma"
I am passing it as follows:
mlcp.bat -content_encoding "US-ASCII"
When I looked at this document, it said "Only UTF-8 is supported."
When I looked at this one, it said "The option value must be a character set name accepted by your JVM;"
So I am confused: I am not sure how to solve this issue or how to set the character set in the JVM.
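For anyone reading along: the error is consistent with how XML treats named entities. Without a DTD, only amp, lt, gt, quot and apos are defined, so a reference like "gamma" is undeclared whatever the encoding declaration says. One possible workaround, sketched here in Python as an assumption and not something MLCP provides, is to pre-process the files and rewrite HTML-style named entities as numeric character references before importing. The function name replace_html_entities is hypothetical, and the regex is naive: it would also rewrite matches inside CDATA sections or comments.

```python
import re
from html.entities import name2codepoint

# The only named entities XML defines when no DTD is present.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_html_entities(xml_text):
    """Rewrite HTML-only named entities as numeric character references."""
    def swap(match):
        name = match.group(1)
        # Leave XML's own entities and unknown names untouched.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#{};".format(name2codepoint[name])

    return ENTITY_RE.sub(swap, xml_text)


sample = "<r>&gamma; &amp; &delta;</r>"
print(replace_html_entities(sample))  # prints <r>&#947; &amp; &#948;</r>
```

Numeric references stay pure ASCII, so the US-ASCII declaration in the file would remain accurate after this rewrite.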
Thanks grtjn for your reply.
-xml_repair_level full worked: all records are now committed and there are no failed records.
Special characters (entity references ending in ;) are stored in MarkLogic as the real characters, as follows
I am hoping this will be acceptable content from a business point of view.
Now the only major challenge is to test with garbled characters in millions of XML records.
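One way to approach that testing, as a rough sketch and not part of MLCP, is to run each record through a strict XML parser before import and log the ones it rejects. Those are the records that depend on the repair step, so they are the ones worth checking by eye after loading. The helper name find_parse_error is hypothetical, and the strict Python parser may reject records that MLCP's repair would still accept.

```python
import xml.etree.ElementTree as ET


def find_parse_error(raw):
    """Return None if the record parses strictly, else the parser's message."""
    try:
        ET.fromstring(raw)
    except ET.ParseError as exc:
        return str(exc)
    return None


records = [
    b'<?xml version="1.0" encoding="US-ASCII"?><r>plain &amp; simple</r>',
    b'<?xml version="1.0" encoding="US-ASCII"?><r>&gamma;</r>',
    b'<?xml version="1.0" encoding="US-ASCII"?><r>\xb4</r>',
]

# Report only the records the strict parser rejects.
for index, record in enumerate(records):
    problem = find_parse_error(record)
    if problem is not None:
        print(index, problem)
```

For millions of files the same function could be applied while walking the input directory, writing the rejected file names to a list for review.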
Thanks grtjn for your help.