I'm trying to use node-fetch to capture the contents of a page and am running into an unexpected error. I checked a similar question, but it doesn't seem relevant. I am trying to fetch an HTTPS site using an HTTPS agent and the user-agents package, but I'm getting an unexpected error about HTTP. I wonder whether this may be due to redirects, but I can't see anything that would cause it. This only fails for this particular URL (it works fine with, for example, https://www.robinhood.com), and I'm trying to figure out why. Here is a minimal example. Note that it uses some certificates I have saved locally, but I'm not sure whether they are necessary to reproduce the problem.
//start SO example
import path from 'path';
import {fileURLToPath} from 'url';
import http from 'node:http';
import https from 'node:https';
import fetch from 'node-fetch';
import sslrootcas from 'ssl-root-cas';
import UserAgent from 'user-agents';

var siteURL = "https://robinhood.com/l/privacy";

// Recreate __filename and __dirname, which ES modules do not provide
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

// Default root CAs plus a locally saved intermediate certificate
const rootCas = sslrootcas.create();
rootCas.addFile(path.resolve(__dirname,'intermediate.pem'));

// HTTPS agent that uses the custom CA list
const myhttpsAgent = new https.Agent({ca: rootCas});
// const requestcheck = fetch("https://www.google.com", {
const requestcheck = fetch(siteURL, {
  method: "GET",
  headers: {"User-Agent": new UserAgent().toString()},
  agent: myhttpsAgent
})
Here is the error I'm getting:
node:internal/errors:477
ErrorCaptureStackTrace(err);
^
TypeError: Protocol "http:" not supported. Expected "https:"
at new NodeError (node:internal/errors:387:5)
at new ClientRequest (node:_http_client:177:11)
at request (node:http:96:10)
at file:///home/app/node_modules/node-fetch/src/index.js:94:20
at new Promise (<anonymous>)
at fetch (file:///home/app/node_modules/node-fetch/src/index.js:49:9)
at ClientRequest.<anonymous> (file:///home/app/node_modules/node-fetch/src/index.js:236:15)
at ClientRequest.emit (node:events:525:35)
at HTTPParser.parserOnIncomingClient [as onIncoming] (node:_http_client:674:27)
at HTTPParser.parserOnHeadersComplete (node:_http_common:128:17)
at TLSSocket.socketOnData (node:_http_client:521:22)
at TLSSocket.emit (node:events:525:35)
at addChunk (node:internal/streams/readable:315:12)
at readableAddChunk (node:internal/streams/readable:289:9)
at TLSSocket.Readable.push (node:internal/streams/readable:228:10)
at TLSWrap.onStreamRead (node:internal/stream_base_commons:190:23) {
code: 'ERR_INVALID_PROTOCOL'
}
I wonder whether this may be due to redirects, but I can't see anything that would cause it.
https://robinhood.com/l/privacy
redirects to
https://robinhood.com/us/en/support/articles/privacy-policy
which then redirects to
http://robinhood.com/us/en/support/articles/privacy-policy/
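The last URL is plain HTTP, which is the wrong protocol for an HTTPS-only agent. node-fetch passes the same agent to every request in the redirect chain, so the https.Agent ends up being used for an http: request, and Node rejects that with ERR_INVALID_PROTOCOL.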
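One way to handle a redirect chain that mixes protocols is to give node-fetch a function for the agent option instead of a single agent; node-fetch calls it with the parsed URL of each request and uses whichever agent it returns. The following is a minimal sketch, not a drop-in fix: the names httpAgent, httpsAgent and selectAgent are made up for illustration, and the agents are created without the custom CA list from the question.

```javascript
import http from 'node:http';
import https from 'node:https';

// Assumption: in the real script the HTTPS agent would also carry the custom
// CA list from the question, i.e. new https.Agent({ca: rootCas}).
const httpAgent = new http.Agent();
const httpsAgent = new https.Agent();

// Hypothetical helper: return the agent that matches the protocol of the
// URL being requested, so an http: hop never receives the https.Agent.
function selectAgent(parsedURL) {
  return parsedURL.protocol === 'http:' ? httpAgent : httpsAgent;
}

// Usage with node-fetch (not executed here):
// const requestcheck = fetch(siteURL, {agent: selectAgent});
```

With this in place the plain HTTP hop in the chain would be requested through httpAgent. Whether following a redirect from HTTPS down to HTTP is acceptable at all is a separate decision for the caller.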