I’m trying to build a Schematron schema that does very simple checks on paragraphs in HTML/XML documents for “forbidden terms” and suggests a corresponding “preferred term”.
Example document snippet:
<p>A sentence containing the word Autobahn.</p>
<p>A sentence containing the word client.</p>
<p>A sentence containing the word newbie.</p>
My Schematron pattern looks like this:
<sch:pattern id="termChecker">
<sch:let name="termPairs" value="(
('Autobahn', 'highway'),
('client', 'customer'),
('newbie', 'Beginner')
)" />
<sch:rule context="//p">
<sch:let name="foundForbidden" value="(for $pair in $termPairs return if (contains(current(), $pair[1])) then $pair[1] else ())[1]" />
<sch:let name="foundRecommended" value="(for $pair in $termPairs return if ($pair[1] = $foundForbidden) then $pair[2] else ())[1]" />
<sch:report role="info" test="$foundForbidden">
Warning: This text contains the forbidden word “<sch:value-of select='$foundForbidden'/>”.
Alternative: “<sch:value-of select='$foundRecommended'/>”.
</sch:report>
</sch:rule>
</sch:pattern>
It “almost” works. I get three messages:
Info: This text contains the forbidden word “Autobahn”. Alternative: “”.
Info: This text contains the forbidden word “client”. Alternative: “”.
Info: This text contains the forbidden word “newbie”. Alternative: “”.
So it finds the “forbidden terms” perfectly and returns them in the message (<sch:value-of select='$foundForbidden'/>).
However, the “preferred term” (<sch:value-of select='$foundRecommended'/>) always remains empty.
I have also tried several alternatives, like using indexed access:
<sch:value-of select="(for $i in 1 to count($termPairs) return if (contains(current(), $termPairs[$i][1])) then $termPairs[$i][2] else ())[1]"/>
or this:
<sch:value-of select="(for $pair in $termPairs return if (contains(current(), $pair[1])) then $pair[2] else ())[1]"/>
But I just can’t get it running.
I tried it in oXygen 23.1 with ISO Schematron configured for XSLT 2 (and also XSLT 3), and with Schematron 1.5 configured for XSLT 2 (and also XSLT 3.1).
I also tried it with ph-schematron and its ph-schematron-xslt module, with exactly the same result.
For some reason, treating $termPairs as a sequence of pairs and accessing those pairs via $pair[1] and $pair[2] only works for $pair[1], never for $pair[2].
Does anyone have an idea?
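A minimal diagnostic schema makes the cause visible. This is a sketch that assumes any XSLT 2.0 (or later) based Schematron implementation; the rule context /* is only an illustrative choice so that the rule fires once, on whatever root element the instance document has:

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <pattern>
    <let name="termPairs" value="(('Autobahn', 'highway'), ('client', 'customer'), ('newbie', 'Beginner'))"/>
    <rule context="/*">
      <!-- Nested parentheses do not create nested sequences: count() sees 6 strings, not 3 pairs -->
      <!-- $termPairs[1] is the single string 'Autobahn', so applying [2] to it selects nothing -->
      <report role="info" test="true()">
        Items: <value-of select="count($termPairs)"/>,
        first "pair": "<value-of select="$termPairs[1]"/>",
        its second item: "<value-of select="$termPairs[1][2]"/>"
      </report>
    </rule>
  </pattern>
</schema>
```

Validated against any document, this should report 6 items, "Autobahn" as the first "pair", and an empty second item, which is exactly the behaviour of $pair[2] in the question.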
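The reason is that XPath sequences never nest: the inner parentheses in $termPairs are flattened, so the variable holds one sequence of six strings ('Autobahn', 'highway', 'client', 'customer', 'newbie', 'Beginner'). In "for $pair in $termPairs", $pair is therefore bound to one string at a time; $pair[1] is that string itself and $pair[2] is always the empty sequence. If you want to keep the "flat" sequence but treat it as structured in pairs (a forbidden term at each odd position, its preferred term at the following even position), the following might do: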
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt3">
<pattern>
<let name="termPairs"
value="(
('Autobahn', 'highway'),
('client', 'customer'),
('newbie', 'Beginner')
)" />
<rule context="p">
<let name="foundForbidden" value="$termPairs[position() mod 2 = 1][contains(current(), .)]"/>
<report role="info"
test="exists($foundForbidden)">
Warning: This text "<value-of select="."/>" contains the forbidden words “<value-of select='$foundForbidden'/>”.
Alternatives: “<value-of select='for $pos in ($foundForbidden ! (index-of($termPairs, .)[1] + 1)) return $termPairs[$pos]'/>”.
</report>
</rule>
</pattern>
</schema>
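The indexed-access attempt from the question can also be adapted to the flat sequence by stepping through it two positions at a time. The following is a sketch along those lines; it uses only XPath 2.0 (no maps, arrays or the ! operator), so it should also work with queryBinding="xslt2". The variable name hits is just an illustrative choice:

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <pattern>
    <!-- Flat sequence: odd position = forbidden term, following even position = preferred term -->
    <let name="termPairs"
         value="('Autobahn', 'highway', 'client', 'customer', 'newbie', 'Beginner')"/>
    <rule context="p">
      <!-- Positions (1, 3, 5, ...) of the forbidden terms that occur in this paragraph -->
      <let name="hits"
           value="for $i in 1 to count($termPairs) idiv 2
                  return (if (contains(current(), $termPairs[2 * $i - 1])) then 2 * $i - 1 else ())"/>
      <report role="info" test="exists($hits)">
        Warning: This text contains the forbidden words “<value-of select="for $p in $hits return $termPairs[$p]"/>”.
        Alternatives: “<value-of select="for $p in $hits return $termPairs[$p + 1]"/>”.
      </report>
    </rule>
  </pattern>
</schema>
```

Because each forbidden term is addressed by its own position, no index-of() search is needed, so a forbidden term that happens to also appear as another pair's preferred term cannot be confused with it.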
A map as an alternative:
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt3">
<ns prefix="map" uri="http://www.w3.org/2005/xpath-functions/map"/>
<pattern>
<let name="termPairs"
value="map {
'Autobahn' : 'highway',
'client' : 'customer',
'newbie' : 'Beginner'
}" />
<rule context="p">
<let name="foundForbidden" value="map:keys($termPairs)[contains(current(), .)]"/>
<report role="info"
test="exists($foundForbidden)">
Warning: This text contains the forbidden words “<value-of select='$foundForbidden'/>”.
Alternatives: “<value-of select='$termPairs?($foundForbidden)'/>”.
</report>
</rule>
</pattern>
</schema>
An array of arrays (one two-member array per pair) would be:
<pattern>
<let name="termPairs"
value="[['Autobahn', 'highway'], ['client', 'customer'], ['newbie', 'Beginner']]" />
<rule context="p">
<let name="foundForbidden" value="$termPairs?*[contains(current(), ?1)]"/>
<report role="info"
test="exists($foundForbidden)">
Warning: This text contains the forbidden words “<value-of select='$foundForbidden?1'/>”.
Alternatives: “<value-of select='$foundForbidden?2'/>”.
</report>
</rule>