It is about searching XML content in MarkLogic.
One of the fields has some double quote characters inside, which causes a problem for CSV export. It is decided to change the double quote to a single quote.
I used TDE with Optics API to perform the search. There are two ways to implement the above fix.
Both ways seem could solve the problem. However which is better and why? My gut feeling is that option 1 will make the query slow. In contrast, option 2 will make the data ingestion slow.
Pay me now, or may me later - but documents only need to be indexed once, so I would expect that it would be better to do the work in the TDE when projecting the column, but it may depend upon the volume of calls and the size of the result set in the optic query results.
The best way to know is to test and measure both ways.
Also, if you are looking to micro-optimize, you may find that translate()
performs better than replace()
. You can translate "
to '
and
(and
if necessary) into nothing in a single call. translate()
is simple character substitution and doesn't use regex, so can perform faster.
<val>
fn:translate(synopsis, '" ',"'")
</val>