I have an app which has to extract color info from a video and it does this by analyzing each frame. First I extract the frames and then load an array of their locations in memory. As you might imagine, for even a small video it can be in the thousands.
The function I use to extract each frames color info is a promise so I opted to batch an array of promises with Promise.all
With each files absolute path, I read the file with fs
and then pass it along to be processed. I've done this with many stand alone images and know the process only takes about a second but suddenly it was taking almost 20min to process 1 image. I finally figured out that using promisify
on fs.readFile
was what caused the bottle neck. What I don't understand is why?
In the first one fs.readFile
is transformed inside of the promise that's returned while in the second one fs.readFile
is just used as it normally would be and I wait for resolve to be called. I don't mind using the non-promise one, I'm just curious why this would cause such a slow down?
The second I stopped using promisify
the app sped back up to 1 frame / second
The slow code:
async analyzeVideo(){
await this._saveVideo();
await this._extractFrames();
await this._removeVideo();
const colorPromises = this.frameExtractor.frames.map(file => {
return new Promise(resolve => {
//transform image into data
const readFile = promisify(fs.readFile);
readFile(file)
.then(data => {
const analyzer = new ColorAnalyzer(data);
analyzer.init()
.then(colors => {
resolve(colors)
})
})
.catch((e)=> console.log(e));
})
});
const colors = await runAllQueries(colorPromises);
await this._removeFrames();
this.colors = colors;
async function runAllQueries(promises) {
const batches = _.chunk(promises, 50);
const results = [];
while (batches.length) {
const batch = batches.shift();
const result = await Promise.all(batch)
.catch(e=>console.log(e));
results.push(result)
}
return _.flatten(results);
}
}
The fast code:
async analyzeVideo(){
await this._saveVideo();
await this._extractFrames();
await this._removeVideo();
const colorPromises = this.frameExtractor.frames.map(file => {
return new Promise(resolve => {
//transform image into data
fs.readFile(file, (err, data) => {
const analyzer = new ColorAnalyzer(data);
analyzer.init()
.then(colors => {
resolve(colors)
})
});
})
});
const colors = await runAllQueries(colorPromises);
await this._removeFrames();
this.colors = colors;
async function runAllQueries(promises) {
const batches = _.chunk(promises, 50);
const results = [];
while (batches.length) {
const batch = batches.shift();
const result = await Promise.all(batch)
.catch(e=>console.log(e));
results.push(result)
}
return _.flatten(results);
}
}
You don't need to promisify
in each loop iteration, just do it once at the top of the module.
Most likely the issue is caused by Promises that are never settled. You are not handling the error correctly, so Promise.all
may never finish if an error is thrown.
Instead of logging the error in .catch
, you'll have to reject
too, or resolve
at least if you don't care about the errors. Also analyzer.init()
errors are not being catched (if that function can reject)
const readFile = promisify(fs.readFile);
// ...
const colorPromises = this.frameExtractor.frames.map(file => {
return new Promise((resolve, reject) => {
//transform image into data
// const readFile = promisify(fs.readFile);
readFile(file)
.then(data => {
const analyzer = new ColorAnalyzer(data);
return analyzer.init()
})
.then(resolve) // colors
.catch((e)=> {
reject(e);
console.log(e)
});
})
})
Aside from that runAllQueries
is not doing what you think it's doing. You already executed all the promises.
I recommend you use p-limit instead
const pLimit = require('p-limit');
const limit = pLimit(50);
/* ... */
const colorPromises = this.frameExtractor.frames.map(file => {
return limit(() => {
return readFile(file)
.then(data => {
const analyzer = new ColorAnalyzer(data);
return analyzer.init()
})
.then(resolve) // colors
})
})
const colors = await Promise.all(colorPromises);
Furthermore, if you're executing 50 reads at a time, you should increase the value of UV_THREADPOOL_SIZE
which defaults to 4.
At your entry point, before any require:
process.env.UV_THREADPOOL_SIZE = 64 // up to 128
Or call the script as: UV_THREADPOOL_SIZE=64 node index.js