marklogictdects-search

Marklogic Perfomance impact on where to place the string replacement in Optics query or TDE


It is about searching XML content in MarkLogic.

One of the fields has some double quote characters inside, which causes a problem for CSV export. It is decided to change the double quote to a single quote.

I used TDE with Optics API to perform the search. There are two ways to implement the above fix.

  1. Place the fn:replace in optics API part for query output.
  2. Place the fn:replace in TDE part when doing the indexing.

enter image description here

Both ways seem could solve the problem. However which is better and why? My gut feeling is that option 1 will make the query slow. In contrast, option 2 will make the data ingestion slow.


Solution

  • Pay me now, or may me later - but documents only need to be indexed once, so I would expect that it would be better to do the work in the TDE when projecting the column, but it may depend upon the volume of calls and the size of the result set in the optic query results.

    The best way to know is to test and measure both ways.

    Also, if you are looking to micro-optimize, you may find that translate() performs better than replace(). You can translate " to ' and 
 (and 
 if necessary) into nothing in a single call. translate() is simple character substitution and doesn't use regex, so can perform faster.

    <val>
    fn:translate(synopsis, '"&#10;',"'")
    </val>