Accessing sequence of pairs in Schematron


I’m trying to build a Schematron schema that does very simple checks on paragraphs in HTML/XML documents for “forbidden terms” and suggests a corresponding “preferred term”.

Example document snippet:

<p>A sentence containing the word Autobahn.</p>
<p>A sentence containing the word client.</p>
<p>A sentence containing the word newbie.</p>

My Schematron pattern looks like this:

<sch:pattern id="termChecker">
   <sch:let name="termPairs" value="(
      ('Autobahn', 'highway'),
      ('client', 'customer'),
      ('newbie', 'Beginner')
      )" />
   <sch:rule context="//p">
      <sch:let name="foundForbidden" value="(for $pair in $termPairs return if (contains(current(), $pair[1])) then $pair[1] else ())[1]" />
      <sch:let name="foundRecommended" value="(for $pair in $termPairs return if ($pair[1] = $foundForbidden) then $pair[2] else ())[1]" />
      <sch:report role="info" test="$foundForbidden">
         Warning: This text contains the forbidden word “<sch:value-of select='$foundForbidden'/>”.
         Alternative: “<sch:value-of select='$foundRecommended'/>”.
      </sch:report>
   </sch:rule>
</sch:pattern>

It “almost” works, and I get three messages:

Info: This text contains the forbidden word “Autobahn”. Alternative: “”.
Info: This text contains the forbidden word “client”. Alternative: “”.
Info: This text contains the forbidden word “newbie”. Alternative: “”.

So it finds the “forbidden terms” correctly and returns them in the message (<sch:value-of select='$foundForbidden'/>).

However, the “preferred term” (<sch:value-of select='$foundRecommended'/>) always remains empty.

I have also tried several alternatives, like using indexed access:

<sch:value-of select="(for $i in 1 to count($termPairs) return if (contains(current(), $termPairs[$i][1])) then $termPairs[$i][2] else ())[1]"/>

or this:

<sch:value-of select="(for $pair in $termPairs return if (contains(current(), $pair[1])) then $pair[2] else ())[1]"/>

But I just can’t get it to work.

I tried it in oXygen (23.1) with ISO Schematron configured to support XSLT 2 (also tested XSLT 3) and with Schematron 1.5 configured to support XSLT 2 (also tested XSLT 3.1). I also tried ph-schematron with the ph-schematron-xslt module, with exactly the same result. For some reason, treating $termPairs as a sequence of pairs and accessing each pair as $pair[1] and $pair[2] works for $pair[1] but never for $pair[2].

Does anyone have an idea?


Solution
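
Some background on why $pair[2] stays empty: XPath sequences cannot be nested, so the inner parentheses in the question are flattened and $termPairs is one sequence of six strings. Each $pair in the for expression is then a single string, which makes $pair[1] the string itself and $pair[2] the empty sequence. A minimal sketch to confirm this, assuming an XSLT 2 query binding; the pattern id flatteningDemo and the /* context are only illustrative and not part of the original schema:

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <pattern id="flatteningDemo">
        <!-- Same value as in the question: the inner parentheses do not create nested items. -->
        <let name="termPairs"
             value="(('Autobahn', 'highway'), ('client', 'customer'), ('newbie', 'Beginner'))"/>
        <rule context="/*">
            <!-- Fires once on the document element and reports the actual shape of $termPairs. -->
            <report role="info" test="true()">
                count($termPairs) = <value-of select="count($termPairs)"/>;
                $termPairs[2] = "<value-of select="$termPairs[2]"/>";
                count(for $pair in $termPairs return $pair[2]) = <value-of select="count(for $pair in $termPairs return $pair[2])"/>.
            </report>
        </rule>
    </pattern>
</schema>
```

This should report a count of 6, "highway" as the second item, and 0 for the last expression. The same reasoning explains why $termPairs[$i][2] in the indexed attempt returns nothing.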

  • If you want to keep a "flat" sequence but treat it as consecutive (forbidden, preferred) pairs, then the following might do:

    <schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt3">
        <pattern>
            <let name="termPairs"
                 value="(
                        ('Autobahn', 'highway'),
                        ('client', 'customer'),
                        ('newbie', 'Beginner')
                        )" />
            <rule context="p">
                <let name="foundForbidden" value="$termPairs[position() mod 2 = 1][contains(current(), .)]"/>
                <report role="info"
                        test="exists($foundForbidden)">
                        Warning: This text "<value-of select="."/>" contains the forbidden words “<value-of select='$foundForbidden'/>”.
                        Alternatives: “<value-of select='for $pos in ($foundForbidden ! (index-of($termPairs, .) + 1)) return $termPairs[$pos]'/>”.
                </report>
            </rule>
        </pattern>
    </schema>
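
    To make the position arithmetic in that schema concrete, the lookup for a single term can be traced step by step. This is only a sketch: the variables term, termPos and preferred are hypothetical names introduced for illustration, and the let elements are assumed to sit inside the rule where $termPairs is in scope.

    ```xml
    <!-- Illustrative only: $term, $termPos and $preferred are not part of the schema above. -->
    <let name="term" value="'client'"/>
    <!-- index-of($termPairs, 'client') yields 3, its position in the flat six-item sequence. -->
    <let name="termPos" value="index-of($termPairs, $term)"/>
    <!-- The preferred term sits directly after the forbidden one, at position 4: 'customer'. -->
    <let name="preferred" value="$termPairs[$termPos + 1]"/>
    ```

    Note that this relies on each forbidden word occurring exactly once in the sequence. If the same string also appeared as a preferred term, index-of would return several positions and the addition would fail.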
    

    A map as an alternative:

    <schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt3">
        <ns prefix="map" uri="http://www.w3.org/2005/xpath-functions/map"/>
        <pattern>
            <let name="termPairs"
                 value="map {
                        'Autobahn' : 'highway',
                        'client' : 'customer',
                        'newbie' : 'Beginner'
                        }" />
            <rule context="p">
                <let name="foundForbidden" value="map:keys($termPairs)[contains(current(), .)]"/>
                <report role="info"
                        test="exists($foundForbidden)">
                        Warning: This text contains the forbidden words “<value-of select='$foundForbidden'/>”.
                        Alternatives: “<value-of select='$termPairs?($foundForbidden)'/>”.
                </report>
            </rule>
        </pattern>
    </schema>
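
    When a paragraph contains more than one forbidden word, the two value-of elements in the map version print the words and the alternatives as two separate lists. If you would rather see each word next to its alternative, something along these lines might work as a replacement for the message body. It is a sketch only: $term is a name introduced here, and the arrow and semicolon separators are arbitrary choices.

    ```xml
    <!-- Sketch: one "forbidden → preferred" entry per matched key, joined with semicolons. -->
    <!-- $termPairs($term) calls the map as a function, which returns the value stored under that key. -->
    <value-of select="string-join(
        for $term in $foundForbidden
        return concat('“', $term, '” → “', $termPairs($term), '”'),
        '; ')"/>
    ```

    The order of the entries follows map:keys, which is implementation-dependent, so do not rely on it matching the order in which the map was written.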
    

    An array of arrays (one two-item inner array per pair), again requiring queryBinding="xslt3", would be:

    <pattern>
        <let name="termPairs"
             value="[['Autobahn', 'highway'], ['client', 'customer'], ['newbie', 'Beginner']]" />
        <rule context="p">
            <let name="foundForbidden" value="$termPairs?*[contains(current(), ?1)]"/>
            <report role="info"
                    test="exists($foundForbidden)">
                    Warning: This text contains the forbidden words “<value-of select='$foundForbidden?1'/>”.
                    Alternatives: “<value-of select='$foundForbidden?2'/>”.
            </report>
        </rule>