marklogicdata-modelingtde

Which way is better to model TDE in MarkLogic? Two different data type for the same field or Cast the data type from OPTIC API?


I want to know which way is better to model TDE with MarkLogic.

We have XML documents with many different DateTime fields. Most of the time (99.99%), the timestamp part is of no biz use. I guess the remaining 0.01% use case is for data problem investigation like when this happened.

TDE is neat and easy to expose document data to external BI tools via ODBC. All the columnar-type of modem BI tools (Power BI) prefer separating the Date and Timestamp fields from one single Datetime field. That will improve the BI tool performance significantly.

There are two options to do that.

  1. Create two different fields in TDE from the same field. See the below screenshot. Most of the time, use Date type TDE field only.

  2. Create only one DateTime field in TDE and use type casting in OPTICS API or SQL (ML version of favour)

Which way is better?

TDE template xml


Solution

  • I would say model the data as you plan to use it. In your case, adding the extra TDE field. A few points:

    1. It should compress well.. Only one unique value per day per forest.

    2. MarkLogic is a clustered database. Queries are resolved per forest, then Per node and then on the evaluator node. You should always be careful about filter, sort, join on any dynamic value since sometimes to resolve the items, more data has to be pushed to the evaluator node. Storing the data as you plan to use it helps minimize the risk of suboptimal queries in general, but even more so on a clustered database.