On May 19th 2021, I wrote this Q&A regarding suspected recent (Apr-May 2021) changes to an interface in mshtml.dll and late-bound referencing. This is part 2, if you will.
Previously, in questions such as this and this, I have remarked upon the lack of support for various CSS selectors in mshtml.dll, in particular pseudo-classes. In those questions, I highlighted that nth-child() and nth-of-type() were not implemented in MSHTML.
Typically, as demonstrated here, unsupported selector syntax results in:
Run-time error '-2140143604 (8070000c)': Could not complete the operation due to error 8070000c.
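Because an unsupported selector raises a run-time error rather than simply returning Nothing, it can be trapped. The following is only a sketch of one way to probe a set-up: IsSelectorSupported and ProbeSelectors are hypothetical helper names, and the test markup is made up for illustration.

```vba
Option Explicit
'' Required references:
''   Microsoft HTML Object Library

'' Hypothetical helper (not part of MSHTML): True when querySelector accepts
'' the selector, False when the call raises a run-time error (e.g. 8070000c).
Public Function IsSelectorSupported(ByVal selector As String) As Boolean
    Dim html As MSHTML.HTMLDocument
    Dim node As Object

    Set html = New MSHTML.HTMLDocument
    html.body.innerHTML = "<div><p>one</p><p>two</p></div>"   '' made-up test markup

    On Error Resume Next                       '' also clears any earlier error
    Set node = html.querySelector(selector)    '' "no match" returns Nothing, not an error
    IsSelectorSupported = (Err.Number = 0)
    On Error GoTo 0
End Function

Public Sub ProbeSelectors()
    Dim selector As Variant

    For Each selector In Array("p:nth-of-type(2)", "p:first-child", "p:not(.x)")
        Debug.Print selector, IsSelectorSupported(CStr(selector))
    Next selector
End Sub
```

Note that a False here only says the call errored on that machine; it does not distinguish unsupported syntax from malformed syntax.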
I expect some things to break as various versions/platforms are no longer supported in relation to Internet Explorer (IE), to which MSHTML is related (see this). What I did not expect to find was a recent improvement in supported CSS selectors. Take the following example:
Option Explicit

'' Required references:
''   Microsoft HTML Object Library

Public Sub CssTest()
    Const URL As String = "https://books.toscrape.com/"
    Dim html As MSHTML.HTMLDocument

    Set html = New MSHTML.HTMLDocument

    '' Fetch the page synchronously and hand the response to the MSHTML parser
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .send
        html.body.innerHTML = .responseText
    End With

    '' nth-of-type() is the pseudo-class that used to raise error 8070000c
    Debug.Print html.querySelector("meta:nth-of-type(2)").outerHTML
End Sub
Prior to Apr-May 2021, this would have errored out due to the use of non-implemented syntax.
Now, on my set-up, where I saw an update to mshtml.dll in early May at the latest, I get the same result as if I had run this via an automated Internet Explorer instance, where it was already supported:
<meta name="created" content="24th Jun 2016 09:29">
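For comparison, the automated Internet Explorer route mentioned above might look something like the sketch below. It assumes Internet Explorer is still installed and automatable on the machine; the procedure name CssTestViaIE and the READY_COMPLETE constant are mine, not part of any library.

```vba
Option Explicit

Public Sub CssTestViaIE()
    Const URL As String = "https://books.toscrape.com/"
    Const READY_COMPLETE As Long = 4      '' value of READYSTATE_COMPLETE
    Dim ie As Object

    '' Late bound, so no reference to Microsoft Internet Controls is needed
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = False
    ie.Navigate2 URL

    '' Wait for the page to finish loading
    Do While ie.Busy Or ie.readyState <> READY_COMPLETE
        DoEvents
    Loop

    '' Same selector as in CssTest, evaluated by the browser's document
    Debug.Print ie.document.querySelector("meta:nth-of-type(2)").outerHTML

    ie.Quit
End Sub
```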
So, what are the currently supported CSS selectors available to VBA?
I covered the 'why do we care?' in the previous Q&A, so I won't repeat it here. I will, however, restate my set-up:
My set-up:
OS Name Microsoft Windows 10 Pro
Version 10.0.19042 Build 19042
System Type x64-based PC
Microsoft® Excel® 2019 MSO (16.0.13929.20206) 32-bit (Microsoft Office Professional Plus)
Version 2104 Build 13929.20373
mshtml.dll file 11.00.19041.985
ieframe.dll file 11.0.19041.964
Feedback:
As with the prior Q&A, I would appreciate any feedback on set-ups which do or do not see these changes. I will add that feedback here for others to reference.
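If you want to report your own set-up, the file versions can be read from VBA. This is a rough sketch that assumes the DLLs live under the Windows System32 folder; PrintLibraryVersions is a hypothetical name. From 32-bit Office on 64-bit Windows the path should be redirected to the 32-bit copies, which are the ones a 32-bit Excel would load.

```vba
Option Explicit

Public Sub PrintLibraryVersions()
    Dim fso As Object
    Dim folder As String
    Dim fileName As Variant

    Set fso = CreateObject("Scripting.FileSystemObject")
    folder = Environ$("SystemRoot") & "\System32\"   '' assumed location of the DLLs

    For Each fileName In Array("mshtml.dll", "ieframe.dll")
        If fso.FileExists(folder & fileName) Then
            Debug.Print fileName, fso.GetFileVersion(folder & fileName)
        Else
            Debug.Print fileName, "not found in " & folder
        End If
    Next fileName
End Sub
```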
tl;dr;
There is much greater support for CSS selectors and for Element.querySelector (allowing for greater flexibility in chaining querySelector(All) calls). This enormously enhances the expressivity of the MSHTML class, in terms of CSS selectors, and brings it on par with Selenium Basic.
Motivation:
I have been wanting to write a list of supported selectors for some time, due to the lack of documentation on this in relation to VBA and the trial-and-error nature of learning what does and doesn't work. This latest change has spurred me to do so, and to include the libraries which currently support the use of CSS selectors.
CAVEATS:
Before and After:
Traditionally, the expressivity of CSS selectors within VBA varied with the library supporting them, with Selenium implementing, by far, the most CSS selectors.
Current state:
The current state of implemented selectors I believe to be as follows (please see the JSFiddle for the clearest table view):
I include this as a simplified HTML insert as well, so you can click on the hyperlinks. Please click Run code snippet below the code insert, then the Full page link. Apologies, the table is large and I haven't even covered all conceivable selectors, only the main ones I consider likely to be of frequent use. Inserting a fancy table threw me over the body character limit, so here we are. For a fancy table please see this JSFiddle, where the newly supported selectors are shaded; in the insert below they are the entries marked ✔*.
<!DOCTYPE html>
<html>
<head>
<title>VBA: Valid CSS Selectors 2021-05-30</title>
</head>
<body>
<h1>VBA: Valid CSS Selectors 2021-05-30</h1>
<table>
<tr>
<td colspan="2">
<a href="https://drafts.csswg.org/selectors-3/">Selectors Level 3 Specification</a>
</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<th>Pattern</th>
<th>Represents</th>
<th>Description</th>
<th>Level</th>
<th>Microsoft HTML Object Library (MSHTML)</th>
<th>Microsoft Internet Explorer Controls (SHDocVw)</th>
<th>Selenium Type Library (Selenium)</th>
<th>Remarks</th>
</tr>
<tr>
<td>*</td>
<td>any element</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#universal-selector">Universal selector</a>
</td>
<td>2</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E</td>
<td>an element of type E</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#type-selectors">Type selector</a>
</td>
<td>1</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E[foo]</td>
<td>an E element with a "foo" attribute</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#attribute-selectors">Attribute selectors</a>
</td>
<td>2</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E[foo="bar"]</td>
<td>an E element whose "foo" attribute value is exactly equal to "bar"</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#attribute-selectors">Attribute selectors</a>
</td>
<td>2</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E[foo~="bar"]</td>
<td>an E element whose "foo" attribute value is a list of whitespace-separated values, one of which is exactly equal to "bar"</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#attribute-selectors">Attribute selectors</a>
</td>
<td>2</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E[foo^="bar"]</td>
<td>an E element whose "foo" attribute value begins exactly with the string "bar"</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#attribute-selectors">Attribute selectors</a>
</td>
<td>3</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E[foo$="bar"]</td>
<td>an E element whose "foo" attribute value ends exactly with the string "bar"</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#attribute-selectors">Attribute selectors</a>
</td>
<td>3</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E[foo*="bar"]</td>
<td>an E element whose "foo" attribute value contains the substring "bar"</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#attribute-selectors">Attribute selectors</a>
</td>
<td>3</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E[foo|="en"]</td>
<td>an E element whose "foo" attribute has a hyphen-separated list of values beginning (from the left) with "en"</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#attribute-selectors">Attribute selectors</a>
</td>
<td>2</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td> </td>
</tr>
<tr>
<td>E[attr operator value i]</td>
<td>value compared case-insensitively (ASCII range).</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#attribute-selectors">Attribute selectors</a>
</td>
<td>4</td>
<td>x</td>
<td>x</td>
<td>?</td>
<td>
<a href="https://www.w3.org/TR/selectors-4/#attribute-case">i identifier</a>
</td>
</tr>
<tr>
<td>E[attr operator value s]</td>
<td>value compared case-sensitively (ASCII range).</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#attribute-selectors">Attribute selectors</a>
</td>
<td>4</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>
<a href="https://www.w3.org/TR/selectors-4/#attribute-case">s identifier</a>
</td>
</tr>
<tr>
<td>E:root</td>
<td>an E element, root of the document</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#structural-pseudos">Structural pseudo-classes</a>
</td>
<td>3</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td>HTML node only</td>
</tr>
<tr>
<td>E:nth-child(n)</td>
<td>an E element, the n-th child of its parent</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#structural-pseudos">Structural pseudo-classes</a>
</td>
<td>3</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td>nth-child(odd) and (even) as well as nth-child(range) also supported</td>
</tr>
<tr>
<td>E:nth-last-child(n)</td>
<td>an E element, the n-th child of its parent, counting from the last one</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#structural-pseudos">Structural pseudo-classes</a>
</td>
<td>3</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:nth-of-type(n)</td>
<td>an E element, the n-th sibling of its type</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#structural-pseudos">Structural pseudo-classes</a>
</td>
<td>3</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:nth-last-of-type(n)</td>
<td>an E element, the n-th sibling of its type, counting from the last one</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#structural-pseudos">Structural pseudo-classes</a>
</td>
<td>3</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:first-child</td>
<td>an E element, first child of its parent</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#structural-pseudos">Structural pseudo-classes</a>
</td>
<td>2</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:last-child</td>
<td>an E element, last child of its parent</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#structural-pseudos">Structural pseudo-classes</a>
</td>
<td>3</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:first-of-type</td>
<td>an E element, first sibling of its type</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#structural-pseudos">Structural pseudo-classes</a>
</td>
<td>3</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:last-of-type</td>
<td>an E element, last sibling of its type</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#structural-pseudos">Structural pseudo-classes</a>
</td>
<td>3</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:only-child</td>
<td>an E element, only child of its parent</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#structural-pseudos">Structural pseudo-classes</a>
</td>
<td>3</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:only-of-type</td>
<td>an E element, only sibling of its type</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#structural-pseudos">Structural pseudo-classes</a>
</td>
<td>3</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:empty</td>
<td>an E element that has no children (including text nodes)</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#structural-pseudos">Structural pseudo-classes</a>
</td>
<td>3</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:link</td>
<td rowspan="2">an E element being the source anchor of a hyperlink of which the target is not yet visited (:link) or already visited (:visited)</td>
<td rowspan="2">
<a href="https://drafts.csswg.org/selectors-3/#link">The link pseudo-classes</a>
</td>
<td>1</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:visited</td>
<td>1</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:not(s)</td>
<td>an E element that does not match simple selector s</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#negation">Negation pseudo-class</a>
</td>
<td>3</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E F</td>
<td>an F element descendant of an E element</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#descendant-combinators">Descendant combinator</a>
</td>
<td>1</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E > F</td>
<td>an F element child of an E element</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#child-combinators">Child combinator</a>
</td>
<td>2</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E + F</td>
<td>an F element immediately preceded by an E element</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#adjacent-sibling-combinators">Next-sibling combinator</a>
</td>
<td>2</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E ~ F</td>
<td>an F element preceded by an E element</td>
<td>
<a href="https://drafts.csswg.org/selectors-3/#general-sibling-combinators">Subsequent-sibling combinator</a>
</td>
<td>3</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>foo, bar</td>
<td>foo, bar will match both &lt;foo&gt; and &lt;bar&gt; elements.</td>
<td>
<a href="https://developer.mozilla.org/en-US/docs/Web/CSS/Selector_list">Selector list</a>
</td>
<td>1</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>element.querySelector</td>
<td>Expanded element.querySelector</td>
<td>
<a href="https://developer.mozilla.org/en-US/docs/Web/API/Element/querySelector">Element.querySelector</a>
</td>
<td>API</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td>Can now chain querySelector(All) calls on wider base node range</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Lib info:</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>Microsoft HTML Object Library (MSHTML)</td>
<td>MS Internet Explorer Controls (SHDocVw)</td>
<td>Selenium Type Library (Chromedriver)</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Lib</td>
<td>mshtml.dll</td>
<td>ieframe.dll</td>
<td>selenium.dll</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>File Version</td>
<td>11.00.19041.985</td>
<td>11.0.19041.964</td>
<td>2.0.9.0</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Date</td>
<td>2021-05-12</td>
<td>2021-05-12</td>
<td>2016-03-02</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</table>
</body>
</html>
12 newly supported pseudo-classes and an expanded Element.querySelector:
If you run the above snippet and view it full page, you will see there are now at least 12 newly supported pseudo-classes, as well as mention of the expanded Element.querySelector. Bam, kapow, ker-sploosh, shut the proverbial front door ... welcome to VBA CSS Canaan, Scraper's Shangri-la, Nerd Nirvana!
I think there may also have been interesting updates to ieframe.dll; the focus here is on recent mshtml.dll changes. You may wish to review the IE support under the Lifecycle announcements here and here, or search for Lifecycle FAQ - Internet Explorer and Microsoft Edge.
As the benefit of the expanded Element.querySelector() was not covered in the last Q&A, I will briefly mention it here. By expanded, I mean an increased number of elements which you can call querySelector on, such that you can chain .querySelector() calls, i.e. .querySelector(..).querySelector(..) and .querySelector(..).querySelectorAll(..).
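A chained call might look like the sketch below. It assumes an mshtml.dll build with the expanded support described here (on older builds I would expect the chained call to fail), and the markup and the name ChainedSelectors are made up for illustration.

```vba
Option Explicit
'' Required references:
''   Microsoft HTML Object Library

Public Sub ChainedSelectors()
    Dim html As MSHTML.HTMLDocument
    Dim items As MSHTML.IHTMLDOMChildrenCollection
    Dim i As Long

    Set html = New MSHTML.HTMLDocument
    '' Made-up markup: two lists, only the first of which we want
    html.body.innerHTML = "<ul id='books'><li>A</li><li>B</li></ul>" & _
                          "<ul id='other'><li>C</li></ul>"

    '' querySelectorAll is called on the element returned by querySelector
    Set items = html.querySelector("#books").querySelectorAll("li")

    For i = 0 To items.Length - 1
        Debug.Print items.Item(i).innerText
    Next i
End Sub
```

The second call is scoped to the node returned by the first, so only the li elements inside #books are candidates.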
Previously, this was largely not possible, as exemplified by this question. Typically, the workaround was to chain traditional methods onto the returned node, e.g. html.querySelector("body").getElementsByTagName("li"); this led to unsightly, hard-to-follow chains and to limited paths to target elements. Better, IMHO, was the idea of a surrogate MSHTML.HTMLDocument variable, which would carry the innerHTML of the current node returned by querySelector and thus allow you to call querySelector(All) again, thereby gaining access to much faster matching, clearer syntax and greater versatility. There are numerous examples of that approach here.
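For anyone on an older build, the surrogate-document workaround can be sketched roughly as follows. The markup and the names SurrogateDocument and html2 are illustrative only.

```vba
Option Explicit
'' Required references:
''   Microsoft HTML Object Library

Public Sub SurrogateDocument()
    Dim html As MSHTML.HTMLDocument
    Dim html2 As MSHTML.HTMLDocument      '' surrogate document
    Dim items As MSHTML.IHTMLDOMChildrenCollection
    Dim i As Long

    Set html = New MSHTML.HTMLDocument
    Set html2 = New MSHTML.HTMLDocument

    '' Made-up markup: two lists, only the first of which we want
    html.body.innerHTML = "<ul id='books'><li>A</li><li>B</li></ul>" & _
                          "<ul id='other'><li>C</li></ul>"

    '' Copy the matched node's innerHTML into the surrogate document ...
    html2.body.innerHTML = html.querySelector("#books").innerHTML

    '' ... so that querySelectorAll can be called at document level again
    Set items = html2.querySelectorAll("li")

    For i = 0 To items.Length - 1
        Debug.Print items.Item(i).innerText
    Next i
End Sub
```

The trade-off is that the surrogate holds a re-parsed copy, so its nodes are detached from the original document.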
End Notes:
This is a document under revision. All feedback on improvements welcomed.
Thanks:
Finally, a big thanks to @SIM for running a test script of mine to examine this on a different set-up.