
Xidel: How to select only one of several elements with the same class, and strip unneeded parts from the result?


xidel -se '//strong[@class="n-heading"][1]/text()[1]' 'https://www.anekalogam.co.id/id'

prints the same output three times:

15 June 2020 
                     
15 June 2020 
                     
15 June 2020  

So, what should I do to select only one of them?

Edit:

Inside the strong element, the value looks like this:

 15 June 2020 
                     

How can I print only "15 June 2020"?


Solution

  • Let me illustrate why this happens with the following example.

    'test.htm':

    <html>
      <body>
        <div>
          <span>test1</span>
          <span>test2</span>
          <span>test3</span>
        </div>
        <div>
          <span>test4</span>
        </div>
        <div>
          <span>test5</span>
        </div>
        <div>
          <span>test6</span>
        </div>
      </body>
    </html>
    
    $ xidel -s "test.htm" -e '//div[1]/span[1]'
    test1
    
    $ xidel -s "test.htm" -e '//span[1]'
    test1
    test4
    test5
    test6
    
    $ xidel -s "test.htm" -e '(//span)[1]'
    test1
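
    To reproduce the same per-parent behavior outside Xidel, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. ElementTree supports only a small XPath subset (no parenthesized expressions), so "(//span)[1]" is emulated by taking the first item of the full result list. It illustrates the XPath semantics on the test.htm markup above; it is not how Xidel evaluates queries, and the variable names are made up for the example.

```python
# Sketch: reproduce the //span[1] vs (//span)[1] difference with Python's
# standard-library ElementTree, which supports only a small XPath subset.
import xml.etree.ElementTree as ET

# The same markup as 'test.htm'.
TEST_HTM = """
<html>
  <body>
    <div>
      <span>test1</span>
      <span>test2</span>
      <span>test3</span>
    </div>
    <div>
      <span>test4</span>
    </div>
    <div>
      <span>test5</span>
    </div>
    <div>
      <span>test6</span>
    </div>
  </body>
</html>
"""

root = ET.fromstring(TEST_HTM.strip())

# Like //span[1]: the positional predicate is evaluated per parent, so this
# yields the first <span> child of every <div>.
per_parent = [e.text for e in root.findall(".//span[1]")]

# Like (//span)[1]: collect all <span> elements in document order first,
# then take the first one of the whole sequence.
all_spans = root.findall(".//span")
first_overall = all_spans[0].text if all_spans else None

print(per_parent)     # ['test1', 'test4', 'test5', 'test6']
print(first_overall)  # test1
```

    The list comprehension mirrors the 4 results of '//span[1]', while indexing the complete list mirrors the single result of '(//span)[1]'.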
    

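    The reason is that '//span[1]' is short for '/descendant-or-self::node()/child::span[1]': the [1] predicate is evaluated per parent, so it selects every span that is the first span child of its parent. Wrapping the path in parentheses applies [1] to the complete node sequence instead, leaving only the first match in document order. So put the "strong"-node path between parentheses: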

    $ xidel -s "https://www.anekalogam.co.id/id" \
      -e '(//strong[@class="n-heading"])[1]/text()[1]'
    

    This isn't needed if you go through the parent node instead, because that path only matches the one "strong"-node you're after:

    $ xidel -s "https://www.anekalogam.co.id/id" \
      -e '//p[@class="n-smaller ngc-intro"]/strong/text()[1]'
    

    [Bonus]

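    You've probably noticed already that the desired text node spans two lines and ends with a &nbsp; (a "No-Break Space", codepoint 160). normalize-space() only strips regular whitespace (space, tab, carriage return, line feed), not the No-Break Space, so take the substring before it first. To have Xidel return just "15 June 2020":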

    $ xidel -s "https://www.anekalogam.co.id/id" -e '
      //p[@class="n-smaller ngc-intro"]/strong/normalize-space(
        substring-before(text(),x:cps(160))
      )
    '
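
    The same two-step cleanup can be mimicked in plain Python to show why the No-Break Space has to be cut off before normalizing. This is only a sketch: substring_before() and normalize_space() are hypothetical helpers modelled on the XPath functions of the same name, and the raw string is an assumed reconstruction of the text node based on the question's output, not data fetched from the site.

```python
# Sketch: mimic normalize-space(substring-before(text(), x:cps(160))) in Python.
import re

NBSP = "\u00a0"  # codepoint 160, the character x:cps(160) produces


def substring_before(s: str, sep: str) -> str:
    """Like XPath substring-before(): text before the first sep, else ''."""
    idx = s.find(sep)
    return s[:idx] if idx != -1 else ""


def normalize_space(s: str) -> str:
    """Like XPath normalize-space(): collapse and trim XML whitespace only.

    Only space, tab, CR and LF count as whitespace here, so an NBSP is kept,
    just as in XPath. (Python's str.split() would also drop it.)
    """
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


# Hypothetical raw text node: date, line break, indentation, then an NBSP.
raw = " 15 June 2020 \n                     " + NBSP

naive = normalize_space(raw)  # still ends with a space followed by the NBSP
cleaned = normalize_space(substring_before(raw, NBSP))

print(repr(naive))
print(cleaned)  # 15 June 2020
```

    Normalizing alone leaves the trailing No-Break Space in place; cutting the string at the NBSP first and normalizing afterwards yields the bare date.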