I am trying to figure out the best way to download files with a speed limit using Node.js's built-in https module. (There is a fully working Python implementation of what I am trying to do at the bottom.) I have written two different functions, and both of them seem to get the job done as expected.
The download1 function checks whether the speed limit has been exceeded in the current second. If it has, it pauses the download and creates a timeout that fires at the end of that second and resumes the download.
download2, however, creates an interval instead of a timeout. The interval fires once every 1000 milliseconds and resumes the download if it has been paused.
I was wondering which of these two approaches is better, or whether I should take a different approach altogether.
Here are the functions:
```typescript
import fs from "fs";
import https from "https";

// kbToBytes, bytesToKb and bytesToMb are the helper functions listed further down
export const download1 = (url: string, fileName: string, speedLimitInKb: number) => {
    return new Promise((resolve, _reject) => {
        https.get(url, res => {
            const stream = fs.createWriteStream(fileName);
            let totalSize = 0;
            let size = 0;
            let speedLimit = kbToBytes(speedLimitInKb);
            let startDate: number;
            let lastSecond = Date.now();
            res.pipe(stream);
            res.once("resume", () => {
                startDate = Date.now();
                console.log(`Started at ${new Date(startDate)}`)
            })
            res.on("data", (chunk) => {
                size += chunk.length;
                const now = Date.now();
                if (now - lastSecond > 1000) {
                    // A new second has started: move the counted bytes into the total
                    lastSecond = Date.now();
                    totalSize += size;
                    size = 0;
                } else if (size >= speedLimit) {
                    // Limit reached within this second: pause until the second is over
                    res.pause();
                    setTimeout(() => res.resume(), 1000 - (now - lastSecond));
                }
            });
            res.on("resume", () => {
                // Start a fresh one-second window whenever the response resumes
                lastSecond = Date.now();
                totalSize += size;
                size = 0;
            })
            res.on("end", () => {
                const elapsed = (Date.now() - startDate) / 1000;
                totalSize += size
                stream.end();
                console.log(`${bytesToMb(totalSize)} mb of data downloaded in ${elapsed} seconds with a speed of ${bytesToKb(totalSize) / elapsed}`)
                resolve(undefined);
            });
            res.on("error", console.log);
        })
    })
};
```
```typescript
export const download2 = (url: string, fileName: string, speedLimitInKb: number) => {
    return new Promise((resolve, _reject) => {
        https.get(url, res => {
            const stream = fs.createWriteStream(fileName);
            let totalSize = 0;
            let size = 0;
            let speedLimit = kbToBytes(speedLimitInKb);
            let startDate: number;
            res.pipe(stream);
            res.once("resume", () => {
                startDate = Date.now();
                console.log(`Started at ${new Date(startDate)}`)
            })
            // Once per second: resume if paused and start counting a new second
            const interval = setInterval(() => {
                if (res.isPaused()) {
                    res.resume();
                }
                totalSize += size;
                size = 0;
            }, 1000);
            res.on("data", (chunk) => {
                size += chunk.length;
                if (size >= speedLimit) {
                    res.pause();
                }
            });
            res.on("end", () => {
                clearInterval(interval);
                const elapsed = (Date.now() - startDate) / 1000;
                totalSize += size
                stream.end();
                console.log(`${bytesToMb(totalSize)} mb of data downloaded in ${elapsed} seconds with a speed of ${bytesToKb(totalSize) / elapsed}`)
                resolve(undefined);
            });
            res.on("error", console.log);
        });
    })
}
```
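Both functions mix the rate-limit bookkeeping with the stream handling. As a sketch (FixedWindowLimiter and record are hypothetical names, not from any library), the one-second window check that download1 performs inline could be pulled into a small class with an injectable clock, so the logic can be tested without a network. One small deviation from download1: the chunk that opens a new window is counted toward that window instead of being dropped from the per-second count.

```typescript
// Hypothetical helper: a fixed-window byte budget modelled on download1's
// per-second check. The clock is injectable so the logic is testable.
export class FixedWindowLimiter {
    private windowStart: number;
    private bytesInWindow = 0;
    private readonly bytesPerWindow: number;
    private readonly windowMs: number;
    private readonly now: () => number;

    constructor(bytesPerWindow: number, windowMs = 1000, now: () => number = Date.now) {
        this.bytesPerWindow = bytesPerWindow;
        this.windowMs = windowMs;
        this.now = now;
        this.windowStart = now();
    }

    /** Records `n` received bytes and returns how many ms to pause (0 = keep reading). */
    record(n: number): number {
        const t = this.now();
        if (t - this.windowStart >= this.windowMs) {
            // The previous window is over: start a new one now.
            this.windowStart = t;
            this.bytesInWindow = 0;
        }
        this.bytesInWindow += n;
        if (this.bytesInWindow < this.bytesPerWindow) return 0;
        // Budget used up: pause for the rest of this window. The next window
        // begins when the pause ends, like download1's "resume" handler.
        const pauseMs = this.windowMs - (t - this.windowStart);
        this.windowStart = t + pauseMs;
        this.bytesInWindow = 0;
        return pauseMs;
    }
}
```

Inside the "data" handler, this would replace the inline check with something like `const waitMs = limiter.record(chunk.length); if (waitMs > 0) { res.pause(); setTimeout(() => res.resume(), waitMs); }`, with the limiter created as `new FixedWindowLimiter(kbToBytes(speedLimitInKb))`.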
Additional functions:
```typescript
export const bytesToKb = (bytes: number) => bytes / 1024;
export const kbToMb = (kb: number) => kb / 1024;
export const kbToBytes = (kb: number) => kb * 1024;
export const mbToKb = (mb: number) => mb * 1024;
export const mbToBytes = (mb: number) => mb * 1024 * 1024;
export const bytesToMb = (bytes: number) => bytes / 1024 / 1024;
export const bytesToGb = (bytes: number) => bytes / 1024 / 1024 / 1024;
export const secondsToMs = (seconds: number) => seconds * 1000;
export const msToSeconds = (ms: number) => ms / 1000;
```
I have also written a Python version of what I am trying to achieve, and it works with any speed limit and file size. I would like to figure out how to implement the same thing in Node.js:
```python
import requests
import time

def download(url, file_name, speed_limit_in_kb):
    start = time.time()
    size = 0
    total_size = 0
    with open(file_name, "wb") as f:
        with requests.get(url, stream=True) as res:
            last_second = time.time()
            for part in res.iter_content(1024):
                f.write(part)
                total_size += len(part)
                size += len(part)
                offset = time.time() - last_second
                if offset > 1:
                    size = 0
                    last_second = time.time()
                elif size > (1024 * speed_limit_in_kb):
                    time.sleep(1 - offset)
                    size = 0
                    last_second = time.time()
    elapsed = time.time() - start
    print(f"{total_size / 1024 / 1024} mb of data downloaded in {elapsed} seconds with a speed of {total_size / 1024 / elapsed}")
```
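A fairly direct Node.js translation of this Python loop is possible with `for await`, which, like `iter_content`, only pulls the next chunk once the loop body has finished, so sleeping inside the loop holds the download back. This is a sketch, assuming Node 16+ (for `node:timers/promises`); throttledCopy and download are illustrative names, and, like the Python version, it neither follows redirects nor checks the status code.

```typescript
import * as fs from "node:fs";
import * as https from "node:https";
import type { IncomingMessage } from "node:http";
import { setTimeout as sleep } from "node:timers/promises";

// Mirrors the Python loop: count the bytes in the current one-second window
// and, once the budget is exceeded, sleep until that second is over.
export async function throttledCopy(
    source: AsyncIterable<Uint8Array>,
    write: (chunk: Uint8Array) => Promise<void>,
    speedLimitInKb: number,
): Promise<number> {
    const limit = 1024 * speedLimitInKb;
    let size = 0;
    let totalSize = 0;
    let lastSecond = Date.now();
    for await (const part of source) {
        await write(part);
        totalSize += part.length;
        size += part.length;
        const offset = (Date.now() - lastSecond) / 1000;
        if (offset > 1) {
            size = 0;
            lastSecond = Date.now();
        } else if (size > limit) {
            // While sleeping, the loop stops pulling from `source`; for a socket,
            // Node stops reading once its buffer fills and TCP slows the sender.
            await sleep((1 - offset) * 1000);
            size = 0;
            lastSecond = Date.now();
        }
    }
    return totalSize;
}

export async function download(url: string, fileName: string, speedLimitInKb: number): Promise<void> {
    const start = Date.now();
    const file = await fs.promises.open(fileName, "w");
    // FileHandle.write may write fewer bytes than requested, so loop until done.
    const write = async (chunk: Uint8Array) => {
        let written = 0;
        while (written < chunk.length) {
            const { bytesWritten } = await file.write(chunk, written);
            written += bytesWritten;
        }
    };
    try {
        const res = await new Promise<IncomingMessage>((resolve, reject) => {
            https.get(url, resolve).on("error", reject);
        });
        const totalSize = await throttledCopy(res, write, speedLimitInKb);
        const elapsed = (Date.now() - start) / 1000;
        console.log(`${totalSize / 1024 / 1024} mb of data downloaded in ${elapsed} seconds with a speed of ${totalSize / 1024 / elapsed}`);
    } finally {
        await file.close();
    }
}
```

Keeping the throttling loop separate from the HTTP and file handling means it can be exercised with any async iterable, for example an async generator yielding in-memory buffers.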
This type of question is bound to get opinionated answers. Personally, I would use Node.js's built-in stream capabilities to do the throttling. Observations using this approach:
```javascript
import fs from "fs";
import https from "https";
import stream from "stream";
import util from "util";

// byteRate is the target transfer speed in bytes per second
async function downloadWithBackpressure(url, filename, byteRate) {
    let totalBytesDownloaded = 0;
    const timeBeforeStart = Date.now();
    await util.promisify(stream.pipeline)(
        // Start the download stream (reject if the request itself fails)
        await new Promise((resolve, reject) => https.get(url, resolve).on("error", reject)),
        // Throttle data by combining setTimeout with a stream.Transform
        new stream.Transform({
            transform: async (chunk, encoding, next) => {
                // Accumulate the total number of bytes received
                totalBytesDownloaded += chunk.byteLength;
                // Sleep to throttle towards desired transfer speed
                const sleepMs = Math.max(0, (totalBytesDownloaded / byteRate * 1000) - Date.now() + timeBeforeStart);
                if (sleepMs > 0) await new Promise(resolve => setTimeout(resolve, sleepMs));
                // Propagate the chunk to the stream writable
                next(null, chunk);
            }
        }),
        // Save the file to disk
        fs.createWriteStream(filename)
    );
}
```
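The same idea can be packaged as a reusable, typed Transform factory in TypeScript. This is a sketch, assuming Node 15+ for `node:stream/promises`; createThrottle, saveThrottled and downloadThrottled are hypothetical names. One deliberate difference from the code above: the clock starts at the first chunk rather than before the request, so time spent connecting does not let the first chunks through unthrottled.

```typescript
import * as fs from "node:fs";
import * as https from "node:https";
import type { IncomingMessage } from "node:http";
import { Readable, Transform } from "node:stream";
import { pipeline } from "node:stream/promises";

// Returns a Transform that delays each chunk until the average rate since the
// first chunk is no higher than `byteRate` bytes per second.
export function createThrottle(byteRate: number): Transform {
    let start = 0;
    let bytesSoFar = 0;
    return new Transform({
        transform(chunk: Buffer, _encoding, callback) {
            if (bytesSoFar === 0) start = Date.now();
            bytesSoFar += chunk.length;
            // Earliest time (ms after start) by which bytesSoFar bytes may have passed.
            const dueMs = (bytesSoFar / byteRate) * 1000;
            const waitMs = Math.max(0, dueMs - (Date.now() - start));
            // Delaying the callback holds this chunk back; once the Transform's
            // buffers fill, pipeline pauses the source (backpressure).
            setTimeout(() => callback(null, chunk), waitMs);
        },
    });
}

// Throttles any readable source into a file.
export async function saveThrottled(source: Readable, fileName: string, byteRate: number): Promise<void> {
    await pipeline(source, createThrottle(byteRate), fs.createWriteStream(fileName));
}

// Download variant; like the code above, it ignores redirects and status codes.
export async function downloadThrottled(url: string, fileName: string, byteRate: number): Promise<void> {
    const res = await new Promise<IncomingMessage>((resolve, reject) => {
        https.get(url, resolve).on("error", reject);
    });
    await saveThrottled(res, fileName, byteRate);
}
```

For example, `downloadThrottled(url, "file.bin", 512 * 1024)` would aim for roughly 512 KiB/s on average. Because the target is a running average rather than a per-second budget, a slow patch lets later chunks through without delay until the average catches up.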