To parse reddit.com, I use:
xidel -e '//div[@data-click-id="background"]/div[@data-adclicklocation="title"]/div/a[@data-click-id="body"]/@href|//div[@data-click-id="background"]/div[@data-adclicklocation="title"]/div/a[@data-click-id="body"]/div/h3/text()' "https://www.reddit.com/r/bash"
The base XPath is repeated twice, so I decided to use a xidel variable:
xidel -se 'xp:=//div[@data-click-id="background"]/div[@data-adclicklocation="title"]/div/a[@data-click-id="body"]' \
-e '$xp/@href|$xp/div/h3/text()' 'https://www.reddit.com/r/bash'
But the output differs from that of the previous command.
Bonus if someone can show a way to join the results with a space instead of \n. I tried fn:string-join() and fn:concat() with no luck. I tried || " " || too, but did not get the expected url <description> for each match.
The output wouldn't differ if you had added --extract-exclude=xp. Please see my answer here and the quote from the readme in particular.
What you're probably seeing:
xp := set -x is your friend
Homework questions.
Need some help with bash to combine two lists
Sshto update
Cannot pipe the output to a file
Worked a lot on this script lately
These are the text nodes from your XPath expression. The variable does actually store the element nodes, but --output-node-format=text is the default after all.
However, you really don't need this kind of internal variable for situations like this. I personally only use them for exporting to system variables. If you want to use variables, use a FLWOR expression:
$ xidel -s "https://www.reddit.com/r/bash" -e '
for $x in //div[@data-adclicklocation="title"]/div/a[@data-click-id="body"] return
($x/@href,$x/div/h3)
'
$ xidel -s "https://www.reddit.com/r/bash" -e '
let $a:=//div[@data-adclicklocation="title"]/div/a[@data-click-id="body"] return
$a/(@href,div/h3)
'
But the simplest query, without the need for variables, would probably be:
$ xidel -s "https://www.reddit.com/r/bash" -e '
//div[@data-adclicklocation="title"]/div/a[@data-click-id="body"]/(@href,div/h3)
'
String-joining is as simple as:
-e '.../join((@href,div/h3))'
-e '.../concat(@href," ",div/h3)'
-e '.../(@href||" "||div/h3)'
-e '.../x"{@href} {div/h3}"'
With || don't forget the parentheses, or there's no context item for div/h3.
The last one is Xidel's own extended-string-syntax.
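To try the joining variants without hitting Reddit, here is a minimal offline sketch. The sample markup, the sample_html variable and the pair_links function are made up for illustration (this is not Reddit's real HTML), and it assumes xidel is on your PATH and reads the document from stdin via "-":

```shell
#!/bin/sh
# Hypothetical sample markup that mimics the structure targeted above
# (an assumption for illustration, not Reddit's real HTML).
sample_html='<html><body>
<div data-adclicklocation="title"><div>
<a data-click-id="body" href="/r/bash/comments/1/"><div><h3>First post</h3></div></a>
</div></div>
<div data-adclicklocation="title"><div>
<a data-click-id="body" href="/r/bash/comments/2/"><div><h3>Second post</h3></div></a>
</div></div>
</body></html>'

# pair_links (hypothetical helper): emits one "href title" line per matching <a>.
# join() is evaluated once per <a>, so @href and div/h3 share the same context item.
pair_links() {
  printf '%s\n' "$sample_html" |
    xidel -s - -e '//div[@data-adclicklocation="title"]/div/a[@data-click-id="body"]/join((@href,div/h3))'
}

# Only run the query when xidel is actually installed.
if command -v xidel >/dev/null 2>&1; then
  pair_links
else
  echo "xidel not found; skipping" >&2
fi
```

Swapping the join((@href,div/h3)) step for concat(@href," ",div/h3), (@href||" "||div/h3) or x"{@href} {div/h3}" should print the same lines for this sample.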
Alternatively, you could parse the huge JSON, which surprisingly lists a lot more Reddit questions:
$ xidel -s "https://www.reddit.com/r/bash" -e '
parse-json(
extract(//script[@id="data"],"window.___r = (.+);",1)
)//posts/models/*[not(isSponsored)]/join((permalink,title))
'