xml xpath libxml2 xpath-1.0 schematron

XPath 1.0 get unique values with limited scope


I am working with XML and XPath 1.0, specifically with libxml2's Schematron module, which I am fairly sure uses libxml2's XPath module for the assert tests. To simplify the problem, let's think only in terms of XPath expressions.

I have the following xml code:

<configuration
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <configuration_A>
        <slot_1>
            <type_a>
                <input_1>TEXT_1</input_1>
            </type_a>
        </slot_1>
    </configuration_A>

    <configuration_B>
        <slot_1>
            <type_a>
                <input_1>TEXT_1</input_1>
            </type_a>
        </slot_1>
        <slot_2>
            <type_a>
                <input_1>TEXT_2</input_1>
            </type_a>
        </slot_2>
        <slot_3>
            <type_a>
                <input_1>TEXT_2</input_1>
            </type_a>
        </slot_3>
        <slot_4>
            <type_a>
                <input_1>TEXT_3</input_1>
            </type_a>
        </slot_4>
        <slot_5>
            <type_b>
                <input_1>TEXT_3</input_1>
            </type_b>
        </slot_5>
    </configuration_B>
</configuration>

I am trying to count the number of unique type_a/input_1 values within configuration_B. In XPath 2.0 this would be count(distinct-values(//configuration_B//type_a/input_1/text()))
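For reference, the target value can be cross-checked outside XPath. The following is a minimal Python sketch that uses only the standard library's xml.etree.ElementTree rather than libxml2 (an assumption made purely to keep it self-contained). The function name distinct_type_a_inputs is hypothetical, and the sample document is a condensed copy of the XML above without the unused xsi namespace declaration.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the sample document from the question.
XML = """<configuration>
  <configuration_A>
    <slot_1><type_a><input_1>TEXT_1</input_1></type_a></slot_1>
  </configuration_A>
  <configuration_B>
    <slot_1><type_a><input_1>TEXT_1</input_1></type_a></slot_1>
    <slot_2><type_a><input_1>TEXT_2</input_1></type_a></slot_2>
    <slot_3><type_a><input_1>TEXT_2</input_1></type_a></slot_3>
    <slot_4><type_a><input_1>TEXT_3</input_1></type_a></slot_4>
    <slot_5><type_b><input_1>TEXT_3</input_1></type_b></slot_5>
  </configuration_B>
</configuration>"""


def distinct_type_a_inputs(xml_text):
    """Return the distinct type_a/input_1 texts inside configuration_B,
    in document order (hypothetical helper, not part of libxml2)."""
    root = ET.fromstring(xml_text)
    config_b = root.find("configuration_B")
    values = []
    for type_a in config_b.iter("type_a"):      # descendants of configuration_B only
        for input_1 in type_a.findall("input_1"):
            if input_1.text not in values:
                values.append(input_1.text)
    return values


print(len(distinct_type_a_inputs(XML)))  # prints 3
```

This is only a reference value to compare an XPath 1.0 expression against; it does not replace the Schematron test.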

I found this and similar answers using preceding, and I adapted the approach to //configuration_B//type_a/input_1[not(. = preceding::input_1)], which should give me the unique type_a/input_1 elements.

However, preceding also considers configuration_A's input_1, which leads to the following output (these should be the unique input_1 elements):

In case you don't see it, I am missing /configuration_B/slot_1/type_a/input_1 and /configuration_B/slot_2/type_a/input_1 should not be there.

I have been racking my brain over this for quite a while, and I am starting to think it isn't possible with XPath 1.0, but I want to give Stack Overflow a try.

Note: In case someone finds it useful, I am using the libxml2 example XPath program (slightly modified to produce verbose output) to test expressions.

Thanks in advance.

Update

I accept Heiko Theißen's answer as correct: it produces the same output as distinct-values(//configuration_B//type_a/input_1/text()). It is not exactly what I was looking for, but it showed me the key to my solution, so I want to share that with the community.

Since I am trying to detect duplication in order to trigger an assert, what I was missing was "type_a/input_1" after preceding::, followed by the test on the axis that Heiko Theißen suggested:

//configuration_B//type_a/input_1[(. = preceding::type_a/input_1[../../parent::configuration_B])]

What is left is to count the matches and trigger the assert if the count is not 0.

The previous XPath expression doesn't select every instance of a duplicated element. That can be done by adding the equivalent following:: test with an or operator:

//configuration_B//type_a/input_1[((. = following::type_a/input_1[../../parent::configuration_B]) or (. = preceding::type_a/input_1[../../parent::configuration_B])) ] 
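To make the difference between the preceding-only test and the combined preceding/following test concrete, here is a rough Python sketch that emulates both on the sample document. It uses the standard library's xml.etree.ElementTree, which has no preceding:: or following:: axis, so the axes are imitated with list slices over the in-scope elements in document order. The helper names scoped_inputs and duplicates are hypothetical, and this is an approximation of the XPath semantics for this particular document shape, not a general XPath evaluator.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the sample document from the question.
XML = """<configuration>
  <configuration_A>
    <slot_1><type_a><input_1>TEXT_1</input_1></type_a></slot_1>
  </configuration_A>
  <configuration_B>
    <slot_1><type_a><input_1>TEXT_1</input_1></type_a></slot_1>
    <slot_2><type_a><input_1>TEXT_2</input_1></type_a></slot_2>
    <slot_3><type_a><input_1>TEXT_2</input_1></type_a></slot_3>
    <slot_4><type_a><input_1>TEXT_3</input_1></type_a></slot_4>
    <slot_5><type_b><input_1>TEXT_3</input_1></type_b></slot_5>
  </configuration_B>
</configuration>"""


def scoped_inputs(xml_text):
    """(slot tag, text) for every configuration_B/*/type_a/input_1,
    in document order. type_b and configuration_A are left out."""
    root = ET.fromstring(xml_text)
    config_b = root.find("configuration_B")
    return [(slot.tag, input_1.text)
            for slot in config_b
            for type_a in slot.findall("type_a")
            for input_1 in type_a.findall("input_1")]


def duplicates(items, look_forward):
    """Slots whose value equals an earlier in-scope value (the preceding:: test)
    or, if look_forward is set, also a later one (the following:: test)."""
    hits = []
    for i, (slot, text) in enumerate(items):
        earlier = [t for _, t in items[:i]]
        later = [t for _, t in items[i + 1:]]
        if text in earlier or (look_forward and text in later):
            hits.append(slot)
    return hits


items = scoped_inputs(XML)
print(duplicates(items, look_forward=False))  # prints ['slot_3']
print(duplicates(items, look_forward=True))   # prints ['slot_2', 'slot_3']
```

Under this emulation the preceding-only test flags just the second TEXT_2, which is enough to make a count differ from 0, while the combined test flags both elements that share the value.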

Solution

  • Extend the preceding:: axis with a test whether the great-grandparent is configuration_B:

    count(/*/configuration_B/*
      /type_a[not(input_1 = preceding::input_1[../../parent::configuration_B])]
    )
    

    gives 3.

    I assumed that you want to disregard the slot_5/type_b completely.
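    As a rough cross-check of why the extra great-grandparent test matters, the following Python sketch emulates the count with and without it. It uses the standard library's xml.etree.ElementTree, which does not implement preceding::, so the axis is imitated by walking all input_1 elements in document order. The function name count_first_occurrences and its scoped flag are hypothetical, and the emulation only holds for documents shaped like the sample.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the sample document from the question.
XML = """<configuration>
  <configuration_A>
    <slot_1><type_a><input_1>TEXT_1</input_1></type_a></slot_1>
  </configuration_A>
  <configuration_B>
    <slot_1><type_a><input_1>TEXT_1</input_1></type_a></slot_1>
    <slot_2><type_a><input_1>TEXT_2</input_1></type_a></slot_2>
    <slot_3><type_a><input_1>TEXT_2</input_1></type_a></slot_3>
    <slot_4><type_a><input_1>TEXT_3</input_1></type_a></slot_4>
    <slot_5><type_b><input_1>TEXT_3</input_1></type_b></slot_5>
  </configuration_B>
</configuration>"""


def count_first_occurrences(xml_text, scoped):
    """Count configuration_B/*/type_a/input_1 elements whose text matches no
    earlier input_1. With scoped=True, only earlier input_1 elements whose
    great-grandparent is configuration_B are compared against."""
    root = ET.fromstring(xml_text)
    config_b = root.find("configuration_B")
    # input_1 elements counted by the expression: configuration_B/*/type_a/input_1
    candidates = {id(i) for slot in config_b
                  for t in slot.findall("type_a") for i in t.findall("input_1")}
    # input_1 elements with configuration_B as great-grandparent (any type_*)
    in_b = {id(i) for slot in config_b for t in slot for i in t.findall("input_1")}
    seen, count = [], 0
    for el in root.iter("input_1"):              # document order
        if id(el) in candidates and el.text not in seen:
            count += 1
        if not scoped or id(el) in in_b:         # what preceding:: may compare against
            seen.append(el.text)
    return count


print(count_first_occurrences(XML, scoped=False))  # prints 2
print(count_first_occurrences(XML, scoped=True))   # prints 3
```

    Without the scope test, configuration_A's TEXT_1 hides the first TEXT_1 in configuration_B, so the emulated count drops to 2. With the scope test it is 3, matching the result above.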