html batch-file jscript

How to get a value out of an HTML page with JavaScript or a Windows batch file


I need to extract a value (the text of the "ipa dipa" span) from an HTML page:

    </span><span class="pron dpron">/<span class="ipa dipa lpr-2 lpl-1">buːm</span>/</span></span></div><div class="pos-body">

My code (below) fails with this error: "Microsoft JScript runtime error: Object doesn't support this property or method"

    @if (@CodeSection == @Batch) @then
    
    @echo off
    setlocal
    
    curl https://dictionary.cambridge.org/de/worterbuch/englisch/boom >phoneme.html
    
    set "htmlfile=phoneme.html"
    
    rem // invoke JScript hybrid code and capture its output
    for /f %%I in ('cscript /nologo /e:JScript "%~f0" "%htmlfile%"') do set "converted=%%I"
    
    echo %converted%
    
    rem // end main runtime
    PAUSE
    goto :EOF
    
    @end // end batch / begin JScript chimera
    
    var fso = WSH.CreateObject('scripting.filesystemobject'),
        DOM = WSH.CreateObject('htmlfile'),
        htmlfile = fso.OpenTextFile(WSH.Arguments(0), 1),
        html = htmlfile.ReadAll();
    
    DOM.write(html);
    htmlfile.Close();
    
    var scrape = DOM.getElementsByTagName('pron dpron').getElementsByClassName('ipa dipa lpr-2 lpl-1')[0].innerText;
    WSH.Echo(scrape.match(/^.*=\s+(\S+).*$/)[0]);

I copied and pasted this and edited it slightly.

I need to get "buːm" into a variable, or at least echo it.

Many thanks.


Solution

  • Thank you for all the tips. With @Reino's help and the tips on echoing Unicode characters (which is why the script switches the console to UTF-8 with chcp 65001), I was able to get what I needed:

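    @ECHO OFF
    REM Switch the console to code page 65001 (UTF-8) so the IPA characters print correctly
    chcp 65001 >nul
    
    REM Print the first "ipa dipa" span that sits directly inside a "pron dpron" span
    xidel -s "https://dictionary.cambridge.org/de/worterbuch/englisch/boom" -e "(//span[@class='pron dpron']/span[@class='ipa dipa lpr-2 lpl-1'])[1]"
    
    PAUSE
    GOTO :EOF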
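If installing xidel is not an option, the JScript half of the hybrid script in the question can probably be repaired instead. The error most likely comes from getElementsByTagName('pron dpron'): "pron dpron" is a class list, not a tag name, so the call returns an empty collection, and a collection has no getElementsByClassName method. (The htmlfile object also tends to run in an old IE document mode where getElementsByClassName may not exist at all.)

A minimal sketch that sidesteps the DOM and pulls the text out with a regular expression instead follows. The function name extractIpa is hypothetical, and the pattern assumes the markup looks like the snippet in the question: the IPA text sits directly inside the first span whose class list contains "ipa", with no nested tags. It sticks to ES3 syntax, so it should run under WSH JScript as well as Node.js. Regex scraping is brittle, so the pattern will need adjusting if the page markup changes.

```javascript
// Hypothetical helper (not from the original post): returns the text of the
// first <span> whose class list contains "ipa", or null if there is none.
// ES3 syntax only, so it should run under WSH JScript as well as Node.js.
// Assumes the IPA text has no nested tags inside the span.
function extractIpa(html) {
    // Matches <span ... class="... ipa ..." ...>TEXT< and captures TEXT.
    // \bipa\b matches the class "ipa" but not "dipa" (no word boundary inside "dipa").
    var re = /<span[^>]*\sclass="[^"]*\bipa\b[^"]*"[^>]*>([^<]*)</;
    var m = re.exec(html);
    return m ? m[1] : null;
}

// Example input: the fragment quoted in the question (\u02D0 is the IPA length mark "ː").
var sample = '</span><span class="pron dpron">/<span class="ipa dipa lpr-2 lpl-1">bu\u02D0m</span>/</span></span></div>';
var ipa = extractIpa(sample); // "bu\u02D0m", i.e. "buːm"
```

In the hybrid script, the final WSH.Echo line could then become WSH.Echo(extractIpa(html)); with the DOM query removed. Note that FileSystemObject's OpenTextFile reads files as ANSI or UTF-16, not UTF-8, so the "ː" in the page that curl saved may come through garbled; reading the file with ADODB.Stream and its Charset set to "utf-8" is one possible workaround, not covered by this sketch.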