html, xpath, cmd, xidel

Retrieve a value from a JavaScript object with XPath


I need to extract information from HTML files. For most of them, I only need to match a particular DOM element's content or attribute, so I use XPath expressions like //a[@class="targeturl"]/@href with the command-line tool xidel.

In another batch of files, the information I want is inside a script element, so it is not as readily accessible:

<html>
<head><!-- ... --></head>
<body>
    ...
    <script>
        ...
        var o = {
            "numeric": 1234,
            "target": "TARGET",
            "urls": "http://example.com",
            // Commented pair "strings": "...",
            "arrays": [
               {
                  "more": true
               }
               ,
               { 
                  "itgoeson": true
               }
            ]
        };
    </script>
    ...
</body>
</html>

Note that the object containing the value I want is not valid JSON (for example, it contains a // comment). However, it does seem to keep one key-value pair per line.
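Because each pair sits on its own line, a line-anchored regular expression is enough to pick out the value. The sketch below shows that idea in Python, outside xidel, so it only illustrates the line structure and does not meet the no-piping constraint; the names SAMPLE_SCRIPT and find_target are hypothetical.

```python
import re

# Hypothetical sample: the script body from the question, trimmed.
SAMPLE_SCRIPT = '''
        var o = {
            "numeric": 1234,
            "target": "TARGET",
            "urls": "http://example.com",
            // Commented pair "strings": "...",
            "arrays": []
        };
'''

# ^ with re.MULTILINE anchors at every line start, so a line that begins
# with "//" (a commented-out pair) can never match.
TARGET_LINE = re.compile(r'^\s*"target"\s*:\s*"([^"]*)"', re.MULTILINE)


def find_target(script: str) -> str:
    """Return the quoted value of the "target" key, or "" if absent."""
    match = TARGET_LINE.search(script)
    return match.group(1) if match else ""


print(find_target(SAMPLE_SCRIPT))  # prints TARGET
```

The pattern tolerates extra spaces around the colon, which a fixed substring such as '"target": ' does not.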

What expression can I pass to xidel --xpath "???" to get the value TARGET?

I've tried different things with XPath functions, but I can't reach a solution without piping to other commands (match only tells me yes/no, replace seems to work line by line, etc.).


Solution

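  • Try the following XPath expression. It takes the text after "target": " and keeps everything up to the next double quote:

    substring-before(substring-after((//script[contains(., '"target"')])[1], '"target": "'), '"')

    Cutting at the closing quote (rather than at the comma) returns TARGET without its surrounding quotes, and restricting //script to the first script that contains "target" avoids an error when the page has several script elements. The expression assumes exactly one space between the colon and the value, as in the sample.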
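To show what the nested substring functions do, here is a minimal Python sketch that mimics them on the sample page. It assumes XPath's behaviour of returning an empty string when the separator is not found; the helper names substring_before, substring_after, ScriptCollector and extract_target are hypothetical, and Python's html.parser stands in for the //script step.

```python
from html.parser import HTMLParser

SAMPLE_HTML = '''<html>
<head></head>
<body>
    <script>
        var o = {
            "numeric": 1234,
            "target": "TARGET",
            "urls": "http://example.com",
            // Commented pair "strings": "...",
            "arrays": [ { "more": true }, { "itgoeson": true } ]
        };
    </script>
</body>
</html>'''


def substring_after(s: str, sep: str) -> str:
    # Like XPath substring-after(): "" when sep does not occur.
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""


def substring_before(s: str, sep: str) -> str:
    # Like XPath substring-before(): "" when sep does not occur.
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


class ScriptCollector(HTMLParser):
    """Collects the text content of every <script> element (like //script)."""

    def __init__(self) -> None:
        super().__init__()
        self.scripts: list[str] = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append.
        if self._in_script:
            self.scripts[-1] += data


def extract_target(html: str) -> str:
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    # Mirrors (//script[contains(., '"target"')])[1]
    for text in parser.scripts:
        if '"target"' in text:
            return substring_before(substring_after(text, '"target": "'), '"')
    return ""


print(extract_target(SAMPLE_HTML))  # prints TARGET
```

Cutting at "," instead, as in substring-before(substring-after(//script, '"target": '), ","), would give "TARGET" with the quotes still attached.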