xidel

Xidel get json from HTML tag attribute


I am trying to extract an image URL from a div, where the link to the file is stored in a JSON-like object in the data-settings attribute:

<div class="c-offerBox_galleryItem">
    <div data-component="magnifier" data-component-on="@load" data-settings="{
                image: '/media/cache/gallery/rc/p2vgiqwd/images/42/42542/KRHE7Z29X19.jpg',
                ratio: 1.5,
                outside: 0
            }"></div>
</div>

Currently I can access data-settings with:

xidel "https://example.com" -e "//div[@class='c-offerBox_galleryItem']/div/@data-settings"

The output is the JSON object as a string. How can I access its image value?

I thought something like:

xidel "https://example.com" -e "//div[@class='c-offerBox_galleryItem']/div/@data-settings/$json/image"

would work, but it does not.


Solution

  • No. The global default variable $json is only available when "https://example.com" itself returns a JSON document ("Content-Type: application/json" in $headers). To parse a string as JSON, use the function parse-json(). In this case you also need the "liberal" option, because the attribute value is not strict JSON (unquoted keys, single-quoted strings):

    C:\>xidel -s "https://example.com" -e "parse-json(//div[@class='c-offerBox_galleryItem']/div/@data-settings,{'liberal':true()})"
    C:\>xidel -s "https://example.com" -e ^"^
      parse-json(^
        //div[@class='c-offerBox_galleryItem']/div/@data-settings,^
        {'liberal':true()}^
      )^
    "
    

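    On Linux, quote the other way round: single quotes around the query and double quotes inside it, so the shell does not interpret characters such as $ in the query: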

    $ xidel -s "https://example.com" -e 'parse-json(//div[@class="c-offerBox_galleryItem"]/div/@data-settings,{"liberal":true()})'
    $ xidel -s "https://example.com" -e '
      parse-json(
        //div[@class="c-offerBox_galleryItem"]/div/@data-settings,
        {"liberal":true()}
      )
    '
    

    If you're still on v0.9.8 (in which case you really should update to v0.9.9), use the function json() instead of parse-json().
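
The commands above return the whole parsed object. As a minimal sketch of the remaining step (picking out the image value), the following saves the HTML from the question to a temporary file and appends /image to the parse-json() call. The /image step is an assumption based on the $json/image navigation the question already uses; the temporary file, the variable names and the --input-format=html option are only there to make the example self-contained and are not part of the original answer.

```shell
#!/bin/sh
# Sketch only: reproduce the HTML from the question locally so no network is needed.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.html"   # hypothetical file name, for illustration

cat > "$sample" <<'EOF'
<div class="c-offerBox_galleryItem">
    <div data-component="magnifier" data-component-on="@load" data-settings="{
                image: '/media/cache/gallery/rc/p2vgiqwd/images/42/42542/KRHE7Z29X19.jpg',
                ratio: 1.5,
                outside: 0
            }"></div>
</div>
EOF

image=""
if command -v xidel >/dev/null 2>&1; then
  # Parse the attribute value leniently, then select the "image" member
  # (assumption: Xidel's object navigation with /key also applies here).
  image=$(xidel -s --input-format=html "$sample" \
    -e 'parse-json(//div[@class="c-offerBox_galleryItem"]/div/@data-settings,{"liberal":true()})/image')
  printf '%s\n' "$image"
else
  echo "xidel not found on PATH; skipping the query" >&2
fi

# Clean up the temporary directory.
rm -rf "$tmpdir"
```

If this behaves as assumed, the query prints only the path stored under image rather than the whole object. Against the real page, replace the file name with the URL and drop --input-format=html.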