For this HTML / XML:
<div class="contentBlock">
<h2> </h2>
<h1></h1>
<h1>DBS055 - single module packages</h1>
</div>
I want to extract only DBS055 with XPath, not the entire text.
With XPath 2.0, this expression
//h1[normalize-space()]/replace(normalize-space(),'^([\w\-]+).*', '$1')
will return the first word of the string value of each h1 element
that has a non-whitespace character in its string value.
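To see what the regular expression keeps, here is a rough sketch of the same replace step in stdlib Python. It is not an XPath engine: Python's `re.sub` stands in for XPath 2.0's `replace()`, the helper names `normalize_space` and `first_word` are made up for illustration, and Python's `\w` is only approximately the same character class as the XPath one.

```python
import re


def normalize_space(s):
    """Approximate XPath normalize-space(): collapse whitespace runs, then trim."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def first_word(s):
    # Same pattern as in the XPath expression; XPath's '$1' is r'\1' in Python.
    # If the string does not start with a word character or hyphen, nothing
    # matches and the normalized string is returned unchanged.
    return re.sub(r"^([\w\-]+).*", r"\1", normalize_space(s))


print(first_word("DBS055 - single module packages"))  # DBS055
```

The group `([\w\-]+)` captures the leading run of word characters and hyphens, and `.*` swallows the rest, so only the captured run survives the replacement.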
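With XPath 1.0, this expression
substring-before(
concat(
normalize-space(
translate(//h1[normalize-space()][1], ',;/.', '    ')), ' '), ' ')
approximates the more robust XPath 2.0 solution. Expand ',;/.'
as necessary for the characters you consider to define word boundaries,
and keep one space in the third argument of translate() for each of them:
translate() deletes any character that has no counterpart at the same
position in the third argument instead of replacing it.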
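Explanation:
1. //h1[normalize-space()][1] selects the first h1
that has a non-whitespace-only string value.
2. translate() turns the word-boundary characters into spaces.
3. normalize-space() trims the result and collapses runs of whitespace.
4. concat(..., ' ') appends a space, so a single-word value still contains one.
5. substring-before(..., ' ') returns everything before the first space.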
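The XPath 1.0 expression can also be followed step by step in plain Python. This is only a sketch that imitates each XPath function with stdlib string operations; it does not evaluate XPath, and the names `SAMPLE`, `normalize_space`, and `first_word_xpath1` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = """<div class="contentBlock">
<h2> </h2>
<h1></h1>
<h1>DBS055 - single module packages</h1>
</div>"""


def normalize_space(s):
    """Approximate XPath normalize-space(): collapse whitespace runs, then trim."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def first_word_xpath1(xml_text, boundaries=",;/."):
    root = ET.fromstring(xml_text)
    # //h1[normalize-space()][1]: string value of the first non-blank h1
    value = next(
        (t for t in ("".join(h.itertext()) for h in root.iter("h1"))
         if normalize_space(t)),
        "",
    )
    # translate(): map every boundary character to a space
    value = value.translate({ord(c): " " for c in boundaries})
    # normalize-space(), then concat(..., ' ') so a space is always present
    value = normalize_space(value) + " "
    # substring-before(..., ' '): everything before the first space
    return value.split(" ", 1)[0]


print(first_word_xpath1(SAMPLE))  # DBS055
```

Appending the space before splitting is what makes a one-word heading work: without it, substring-before() would find no space and return an empty string.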