I want to retrieve the IUPAC name of a chemical on https://echa.europa.eu/brief-profile/-/briefprofile/100.000.685
I'd like the printed result to return Benzene in this example.
The code below pulls all elements with className.
Public Sub GetContents()
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
XMLReq.Open "Get", "https://echa.europa.eu/brief-profile/-/briefprofile/100.000.685", False
XMLReq.send
HTMLDoc.body.innerHTML = XMLReq.responseText
Set IUPACName = HTMLDoc.getElementsByClassName("col-sm-8")(0)
Debug.Print IUPACName.innerText
End Sub
This returns:
EC / List name:
IUPAC name: benzene
Substance names and other identifiers
Inspecting the page there doesn't seem to be an identifier to just return Benzene.
I can't test on other Office versions but 2019, at least, you can use an attribute selector as follows:
Set IUPACName = HTMLDoc.querySelector("[title*=IUPAC]")
Debug.Print IUPACName.innerText
I was expecting to use:
Debug.Print IUPACName.NextSibling.NodeValue
So, that latter one maybe what you need on your Office version.
The world of mshtml.dll is quite topsy-turvy as moment.