Using Xidel, I need to extract all the image sizes in an @srcset attribute that contains a common pattern:
"(\d+)w"
./xidel "url_with_images" -e '?'
For example, given this image:
<img ... srcset="https://www.jewishpress.com/wp-content/uploads/Billionaire-Arnon-Milchan-and-PM-Benjamin-Netanyahu-696x457.jpg 696w, https://www.jewishpress.com/wp-content/uploads/Billionaire-Arnon-Milchan-and-PM-Benjamin-Netanyahu-220x144.jpg 220w, https://www.jewishpress.com/wp-content/uploads/Billionaire-Arnon-Milchan-and-PM-Benjamin-Netanyahu-300x197.jpg 300w, https://www.jewishpress.com/wp-content/uploads/Billionaire-Arnon-Milchan-and-PM-Benjamin-Netanyahu-768x504.jpg 768w, https://www.jewishpress.com/wp-content/uploads/Billionaire-Arnon-Milchan-and-PM-Benjamin-Netanyahu-475x312.jpg 475w, https://www.jewishpress.com/wp-content/uploads/Billionaire-Arnon-Milchan-and-PM-Benjamin-Netanyahu-741x486.jpg 741w, https://www.jewishpress.com/wp-content/uploads/Billionaire-Arnon-Milchan-and-PM-Benjamin-Netanyahu-640x420.jpg 640w, https://www.jewishpress.com/wp-content/uploads/Billionaire-Arnon-Milchan-and-PM-Benjamin-Netanyahu.jpg 800w" />
Expected Xidel output:
696w
220w
300w
768w
475w
741w
640w
800w
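You could use Xidel's own extract() for that. The third argument selects what to return (0 for the whole match, 1 for the first capture group), and the fourth takes flags: add "*" to return all occurrences rather than only the first. Alternatively, tokenize(), optionally combined with substring-after(), works too. For the image above, each of these commands prints the same list: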
$ xidel -s "<input>" -e 'extract(//img/@srcset,"\d+w",0,"*")'
$ xidel -s "<input>" -e 'extract(//img/@srcset,"(\d+w)",1,"*")'
$ xidel -s "<input>" -e 'tokenize(//img/@srcset,", ") ! tokenize(.)[2]'
$ xidel -s "<input>" -e 'tokenize(//img/@srcset,", ") ! substring-after(.,"jpg ")'
696w
220w
300w
768w
475w
741w
640w
800w
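If you want to sanity-check the result without Xidel, the same split-then-match idea can be sketched with standard shell tools. This is only a rough cross-check, not a replacement for the XPath above: the srcset value is a shortened, hypothetical sample (example.com URLs), and it assumes that none of the URLs contain a comma.

```shell
# Hypothetical sample: a shortened srcset value shaped like the one in the question.
srcset='https://example.com/a-696x457.jpg 696w, https://example.com/a-220x144.jpg 220w, https://example.com/a.jpg 800w'

# Put each candidate on its own line, then keep only the width descriptor
# at the end of the line. Anchoring with $ avoids matching digits inside the URL.
sizes=$(printf '%s\n' "$srcset" | tr ',' '\n' | grep -oE '[0-9]+w$')

printf '%s\n' "$sizes"
```

Anchoring the pattern to the end of each candidate is slightly stricter than a bare "\d+w", which would also match a digit-plus-w sequence if one ever appeared inside a URL.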