atata

XPath with text() to get specific text node


(Console screenshot)

<td style="width: 40.42%;" id="ember193" class="lt-cell align-left ember-view">
    <div class="inline-block ember-tooltip-target">
        <i class="fa fa-fw fa-lg valid"></i>
        <div id="ember195" class="ember-tooltip-base ember-view">
            <div><!----></div>
        </div>
    </div>
    <a href="#" data-ember-action="" data-ember-action-196="196">UNIVERSITY OF VERMONT AND STATE AGRICULTURAL COLLEGE</a>
    (03-0179440)<!----><br>
    <div class="fa fa-fw fa-lg"></div>
    (UNIVERSITY OF VERMONT)
</td>

In the browser console, I execute this:

$x("//tbody/tr[2]/td[2]/text()[7]")

and I get the exact text node I am looking for (among the other text nodes in that <td>). Is there a way to specify this with an attribute on a Text<_> property so that I get just this text node?


Solution

  • Currently, there is no attribute to get the value of a child text node by index, only the first or the last one. In any case, I would avoid relying on the child node index because it can vary. Instead, I would extract the number part from the text.

    You can do it with the help of two properties:

    [FindByXPath("td")] // TODO: Change it to your <td> locator.
    [GetsContentFromSource(ContentSource.ChildTextNodesTrimmed)]
    private Text<_> OrganizationNameInfo { get; set; }
    
    public DataProvider<string, _> OrganizationNumber => GetOrCreateDataProvider(
        "Organization Number",
        () =>
        {
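            // For the sample markup, Value is "(03-0179440)(UNIVERSITY OF VERMONT)".
            string organizationNameInfo = OrganizationNameInfo.Value;
            // Take the characters between the leading '(' and the first ')'.
            return organizationNameInfo.Substring(1, organizationNameInfo.IndexOf(')') - 1);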
        });
    

    Because of ContentSource.ChildTextNodesTrimmed, the OrganizationNameInfo property returns the value "(03-0179440)(UNIVERSITY OF VERMONT)". The OrganizationNumber property can then extract the part you need as a substring.

    Another option is to use Regex to extract the number part.
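
As a rough illustration of the Regex option, here is a minimal sketch that uses only the standard .NET System.Text.RegularExpressions API. The OrganizationNumberParser class, its Extract method, and the pattern itself are hypothetical names chosen for this example and are not part of Atata. The pattern assumes the number is made of digits and hyphens and sits inside the first matching pair of parentheses.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of Atata.
public static class OrganizationNumberParser
{
    // Assumption: the number contains only digits and hyphens, e.g. "(03-0179440)".
    // The capturing group holds the text between the parentheses.
    private static readonly Regex NumberPattern = new Regex(@"\((\d[\d-]*)\)");

    // Returns the first parenthesized number found in the text,
    // or null when the text contains no such group.
    public static string Extract(string childText)
    {
        Match match = NumberPattern.Match(childText ?? string.Empty);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Inside the OrganizationNumber provider, the lambda body could then be reduced to a call such as OrganizationNumberParser.Extract(OrganizationNameInfo.Value). Unlike the Substring approach, this does not depend on the number being the first thing in the combined text, and it returns null instead of throwing when no number is present.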