html dom factor-lang

How do I get a div's text?


html.parser.analyzer seems to be the vocabulary for working with HTML:

( sc ) "google.com/search?q=vim" scrape-html

--- Data stack:
T{ response f "1.1" 200 "OK" H{ ~array~ ~array~ ~array~ ~array~...
V{ ~tag~ ~tag~ ~tag~ ~tag~ ~tag~ ~tag~ ~tag~ ~tag~ ~tag~ ~tag~...
( sc ) nip "resultStats" find-by-id

--- Data stack:
258
T{ tag f "div" H{ ~array~ ~array~ } f f }
( sc )  dup .
T{ tag
    { name "div" }
    { attributes H{ { "class" "sd" } { "id" "resultStats" } } }
}

--- Data stack:
258
T{ tag f "div" H{ ~array~ ~array~ } f f }

Now, how do I get at that object's text? It should be something like "About 53,000,000 results". html.parser.analyzer doesn't seem to expose the text...?

Edit: Oooh:

<div id="resultStats">About 310,000,000 results<nobr> (0.43 seconds)&nbsp;</nobr></div>

It's not a p, it's a div. So the question is really, how do I get at a div's text?

--- Data stack:
T{ tag f "div" H{ ~array~ ~array~ } f f }
( sc ) dup text>>

--- Data stack:
T{ tag f "div" H{ ~array~ ~array~ } f f }
f

Not so simple. :(


Solution

  • If you use find-by-id-between, you get the opening tag itself, everything inside it, and the closing tag (or so it looks like :).

    The text is a separate element inside the result, so:

    ( sc ) "google.com/search?q=vim" scrape-html
    
    --- Data stack:
    T{ response f "1.1" 200 "OK" H{ ~array~ ~array~ ~array~ ~array~...
    V{ ~tag~ ~tag~ ~tag~ ~tag~ ~tag~ ~tag~ ~tag~ ~tag~ ~tag~ ~tag~...
    ( sc ) nip "resultStats" find-by-id-between
    
    --- Data stack:
    V{ ~tag~ ~tag~ ~tag~ }
    ( sc )  dup .
    V{
        T{ tag
            { name "div" }
            { attributes
                H{ { "class" "sd" } { "id" "resultStats" } }
            }
        }
        T{ tag
            { name text }
            { text "Cerca de 41.500.000 resultados" }
        }
        T{ tag { name "div" } { attributes H{ } } { closing? t } }
    }
    
    --- Data stack:
    V{ ~tag~ ~tag~ ~tag~ }
    ( sc ) second text>>
    
    --- Data stack:
    "Cerca de 41.500.000 resultados"
    

    It's in Spanish because nosy Google found out who I am!
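
    Taking second only works when the text happens to be the second element. With the markup from the question's edit there is a nested nobr, so there would be more than one text element. Here is a rough sketch that collects every text element between the opening and closing tag instead. id>text is a name I made up, not something html.parser.analyzer provides, and I'm assuming the text symbol and parse-html come from html.parser. It parses a literal string so it doesn't depend on what Google serves today:

    ```factor
    USING: accessors html.parser html.parser.analyzer kernel
    sequences ;
    IN: scratchpad

    ! Hypothetical helper, not part of html.parser.analyzer.
    ! Grabs everything between the tag with the given id and its
    ! closing tag, keeps only the text elements (their name slot
    ! holds the text symbol) and concatenates their contents.
    : id>text ( vector id -- string )
        find-by-id-between
        [ name>> text = ] filter
        [ text>> ] map concat ;

    ! The snippet from the question, parsed offline.
    "<div id=\"resultStats\">About 310,000,000 results<nobr> (0.43 seconds)&nbsp;</nobr></div>"
    parse-html "resultStats" id>text
    ```

    This should leave one string on the stack with the text of the div and of the nested nobr. I haven't checked how the parser treats the &nbsp; entity, so don't rely on the exact trailing characters.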