Tags: xquery, marklogic, xquery-3.0, adhoc-queries, cts-search

How to load an XHTML document as text and search for a keyword in MarkLogic


I have loaded XHTML files into MarkLogic, but I need to search across attributes, elements, and text. So I need to load (or retrieve) each document as text and run the search against it.

Below is the XHTML file.

    <?xml version="1.0" encoding="UTF-8"?>
    <html xmlns="http://www.w3.org/1999/xhtml">
        <meta>
            </meta>
        <body class="Default">

        </body>
    </html>

Using the code below I am able to save a file as text, but it only lets me save small files (>0.2KB); big files fail to save. I need to save files of 1 to 50MB in the MarkLogic DB.

    // Create options that store the document in text format
    ContentCreateOptions createOptions = ContentCreateOptions.newTextInstance();

    // uID is the document identifier, filetext holds the file contents
    Content content = ContentFactory.newContent("/" + uID, filetext, createOptions);

    // Insert the document through the open XCC session
    mlSession.insertContent(content);

Solution

  • Still not quite sure about the use case behind this, but here you go:

    If you really want to run a full-text search over element names, attributes, and text all together, without discrimination, you are best off inserting the document as text at ingest time. For instance, with something like:

    xdmp:document-insert(
        "/my.xhtml",
        text {
            xdmp:quote(
                <html xmlns="http://www.w3.org/1999/xhtml">
                    ...
                </html>
            )
        }
    )
    

    Or:

    xdmp:document-load(
        "/server/path/to/my.xhtml",
        <options xmlns="xdmp:document-load">
            <format>text</format>
        </options>
    )
    

    After that you can simply do:

    cts:search(collection(), "mytagorattrorterm")
    

    Alternatively, you could apply xdmp:quote before calling cts:contains or fn:contains, but that scales very badly, so it is best done on only one or a few docs at a time.
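
    A minimal sketch of that alternative, assuming the documents are still stored as XML and that only a short, known list of URIs needs checking. The URI list and the search term are placeholders for illustration, not values from the question:

    ```xquery
    xquery version "1.0-ml";

    (: Hypothetical URIs and search term, for illustration only :)
    let $uris := ("/my.xhtml")
    let $term := "mytagorattrorterm"
    for $uri in $uris
    let $doc := fn:doc($uri)
    (: xdmp:quote serializes the node to a string, markup included, :)
    (: so tag and attribute names become searchable as plain text :)
    where fn:contains(xdmp:quote($doc), $term)
    return $uri
    ```

    This returns the URIs whose serialized form contains the term. Every document is read and serialized at query time and no index is used, which is why it only suits a handful of docs.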

    HTH!