cts tde marklogic-10

How to find out which documents have compliance issues with TDE in MarkLogic


I assume that behind TDE, MarkLogic still creates various types of indexes. TDE greatly simplifies the task of maintaining those indexes.

However, traditional MarkLogic indexes are not enforced on every document in the database. For example, if some documents do not have the XML field, the field range index simply won't index those documents. If one needs to know which documents do not have that XML field, a CTS query could be used to identify those outlier documents.

How can I do that with TDE? That is, how do I find out which documents do not have that field? I guess I cannot use CTS anymore.


Solution

  • If one needs to know which documents do not have that XML field, a CTS query could be used to identify those outlier documents.

    You can use:

    cts:not-query(cts:element-query(xs:QName("theMissingElement"), cts:true-query()))
    

    Passed to a function such as cts:search or cts:uris, that query will give you the documents missing a particular element.
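
    As a minimal sketch of how the query above might be run, the following lists the URIs of the matching documents. It assumes the URI lexicon is enabled on the database (cts:uris requires it), and "theMissingElement" is the placeholder name from the snippet above.

    ```xquery
    xquery version "1.0-ml";

    (: Assumption: the URI lexicon is enabled on this database. :)
    (: Returns the URIs of documents that do not contain the element. :)
    cts:uris(
      (),   (: no starting URI :)
      (),   (: default options :)
      cts:not-query(
        cts:element-query(xs:QName("theMissingElement"), cts:true-query())
      )
    )
    ```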

    How to do that with TDE? The question is how to know which documents do not have that field?

    One way I can think of to do this is to include a unique ID (perhaps the URI?) in each generated row from your TDE. Then, generate the list of all IDs from TDE. Next, generate the same list using CTS. Finally, take the values in the CTS list that don't appear in the TDE list to get the result of documents not indexed by the TDE.
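
    The comparison described above could be sketched roughly as follows. The schema "mySchema", the view "myView", the column "uri" and the collection "myCollection" are hypothetical names, and the sketch assumes the template fills the "uri" column with xdmp:node-uri(.) and that the URI lexicon is enabled.

    ```xquery
    xquery version "1.0-ml";

    (: Hypothetical names: mySchema, myView, uri, myCollection. :)
    let $rows := xdmp:sql("SELECT DISTINCT uri FROM mySchema.myView")

    (: In the default array format the first array holds the column names, :)
    (: so the data rows start at position 2. :)
    let $tde-uris :=
      for $row in fn:subsequence($rows, 2)
      return fn:string(json:array-values($row)[1])

    (: URIs of the documents the template is expected to cover. :)
    let $all-uris := cts:uris((), (), cts:collection-query("myCollection"))

    (: URIs known to CTS that produced no row in the view. :)
    return $all-uris[fn:not(. = $tde-uris)]
    ```

    For a large database the general comparison in the last line may be slow; this is only meant to illustrate the idea.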

    Another, simpler way I can think of is to make the column for the missing element nullable (and, if needed, set its invalid-values to ignore) rather than required. A row is then still generated for documents that lack the element, and you just need to get the list of all rows where that column is null.
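
    A rough sketch of what such a template might look like, checked here with tde:validate before it is installed. The context path "/doc", the schema and view names, and the column names "uri" and "element_value" are all assumptions made for illustration.

    ```xquery
    xquery version "1.0-ml";
    import module namespace tde = "http://marklogic.com/xdmp/tde"
      at "/MarkLogic/tde.xqy";

    (: Hypothetical template: every name below is a placeholder. :)
    let $template :=
      <template xmlns="http://marklogic.com/xdmp/tde">
        <context>/doc</context>
        <rows>
          <row>
            <schema-name>mySchema</schema-name>
            <view-name>myView</view-name>
            <columns>
              <column>
                <name>uri</name>
                <scalar-type>string</scalar-type>
                <val>xdmp:node-uri(.)</val>
              </column>
              <column>
                <name>element_value</name>
                <scalar-type>string</scalar-type>
                <val>theMissingElement</val>
                <!-- Keep the row, with a null, when the element is absent. -->
                <nullable>true</nullable>
                <invalid-values>ignore</invalid-values>
              </column>
            </columns>
          </row>
        </rows>
      </template>
    (: Validate only; installing the template is a separate step. :)
    return tde:validate($template)
    ```

    Once a template along these lines is installed, something like xdmp:sql("SELECT uri FROM mySchema.myView WHERE element_value IS NULL") should return the documents that lack the element.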