Can someone help me find a way to load a public web page that requires JavaScript and blocks access from developer tools? I had an automated process that worked as follows:
$TdyDate = $(get-date -f yyyyMMdd)
$wsjurl = "https://www.wsj.com/print-edition/$TdyDate/frontpage"
$wsjweb = Invoke-WebRequest -Uri $wsjurl -UseBasicParsing
This recently started generating "Please enable JS and disable any ad blocker" errors.
Based on this Stack Overflow post, I tried the following. It gets me past these errors, but it only pulls down an "Access Blocked" landing page instead of the full web page that renders in my browser.
Set-Alias msedge 'C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe'
msedge --headless --dump-dom --disable-gpu $wsjurl
If anyone could help me figure out a way around this, it would be greatly appreciated. The web page I'm targeting is publicly accessible.
The following code snippet could help:
$wsjDate = Get-Date
if ( 0 -eq $wsjDate.DayOfWeek.value__ ) {
    $TdyDate = "{0:yyyyMMdd}" -f $wsjDate.AddDays(-1) # Sunday -> Saturday
} else {
    $TdyDate = "{0:yyyyMMdd}" -f $wsjDate
}
$wsjurl = "https://www.wsj.com/print-edition/$TdyDate/frontpage"
$wsjweb = Invoke-WebRequest -Uri $wsjurl -Method Options -UseBasicParsing
Explanation:
$TdyDate respects that the pages are not defined on Sundays (a Sunday falls back to the Saturday edition).
-Method Options circumvents the "Please enable JS and disable any ad blocker" error, so that $wsjweb.Content contains the full web page code: <!DOCTYPE html><html lang="en-US"> … … … </script></body></html>
Moreover, $wsjweb.Headers may shed some light on the problem (see the X-XSS-Protection and X-Content-Type-Options properties):
$wsjweb.Headers
# truncated
Key                    Value
---                    -----
…
X-XSS-Protection       1; mode=block
X-Content-Type-Options nosniff
…
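If the Sunday fallback is needed in more than one place, it could be wrapped in a small function. This is only a sketch: the function name Get-WsjEditionDate and its -Date parameter are hypothetical and not part of the snippet above, and it assumes that Sunday is the only day without a print-edition page.

```powershell
# Hypothetical helper: the name and parameter are illustrative assumptions.
function Get-WsjEditionDate {
    param(
        [datetime]$Date = (Get-Date)
    )
    # Assumption: only Sundays lack a print-edition page, so use Saturday instead.
    if ($Date.DayOfWeek -eq [System.DayOfWeek]::Sunday) {
        $Date = $Date.AddDays(-1)
    }
    # Invariant culture keeps the yyyyMMdd output independent of regional settings.
    $Date.ToString('yyyyMMdd', [cultureinfo]::InvariantCulture)
}

# Build the URL the same way as in the snippet above.
$TdyDate = Get-WsjEditionDate
$wsjurl  = "https://www.wsj.com/print-edition/$TdyDate/frontpage"
```

Passing an explicit date, for example Get-WsjEditionDate -Date ([datetime]'2024-03-10') for a Sunday, makes the fallback easy to check without waiting for the weekend.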