marklogic marklogic-dhf

How can I avoid range index error for fields with the same name as the index field in attached documents?


For my project I have to ingest and curate data from various sources. I do that through the Data Hub Framework using flows. All my sources have a field called "date", but it comes in different formats, e.g. yyyy-mm-dd, yyyymmdd, dd.mm.yyyy.

I do the curation through a mapping step that converts these to one common format, yyyy-mm-dd. After the mapping the field is still called "date".

Since I want to be able to do range searches, I need a range index on "date". However, this leads to an error when ingesting the data, because the ingested data's "date" field is not yet mapped to the right format.

My solution was to not reject invalid values in the STAGING database. However, the original document is attached in the envelope of the curated document that is written to the FINAL database after the mapping, so I get a range index error for the "date" field in the attached document.

I want to reject invalid values in the FINAL database, but I also want to keep the original document as an attachment in the final document.

The only solution I can see so far is to name the "date" element in the FINAL database something like iDate in order to avoid conflicts.

This doesn't seem like a clean solution to me. Do you have better suggestions?

I am using:


Solution

  • If you use a path range index instead of an element range index, you can limit it to just those date elements that are in the top-level instance and exclude the ones in the attachment.

    See https://docs.marklogic.com/guide/admin/range_index#id_40666 for details on using path range indexes.
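
    As a rough sketch of what that could look like in a DHF project, the FINAL database configuration might declare a path range index along these lines. The path /envelope/instance/date is an assumption about the envelope layout; if your instance nests the properties under the entity type name, the path would need that extra step. The database-name token is the one DHF-generated configs typically use and may differ in your project.

    ```json
    {
      "database-name": "%%mlFinalDbName%%",
      "range-path-index": [
        {
          "scalar-type": "date",
          "path-expression": "/envelope/instance/date",
          "collation": "",
          "range-value-positions": false,
          "invalid-values": "reject"
        }
      ]
    }
    ```

    Because the index only covers the instance path, a "date" under the attachments part of the envelope is never indexed, so invalid values there should no longer be rejected. Range searches would then target the same path, for example with cts.pathRangeQuery rather than an element or JSON property range query.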