I'm encoding Varint with the following two pieces of code, NodeJS takes about 900ms while Rust takes about 2700ms. Why the performance gap is so big?
It looks like allocating memory is more time-consuming, does NodeJS have any additional optimizations for allocating memory?
NodeJS version: v20.10.0
Rust version: 1.75.0
JavaScript code:
// node index.js
const { performance } = require('node:perf_hooks')
function encode_varint(num) {
let buf_size = 0;
let cmp_number = num;
while (cmp_number) {
cmp_number >>>= 7
buf_size += 1;
}
const buf = new Array(buf_size)
let temp_num = num;
let index = 0;
while (temp_num > 0) {
if ((temp_num >>> 7) !== 0) {
buf[index++] = 0x80 | (temp_num & 0x7f)
temp_num >>>= 7;
} else {
buf[index++] = temp_num & 0x7f
break
}
}
return buf;
}
const N = 100_000_000;
const start = performance.now();
for (let i = 0; i < N; i++) {
const num = 999_999_999_999_999;
const buf = encode_varint(num);
// console.log(buf)
}
const end = performance.now();
const elapsed = end - start;
console.log(elapsed);
Rust code:
// cargo r --release
use std::time;
fn encode_varint(num: usize) -> Vec<u8> {
let buf_size = {
let mut size = 1;
let mut cmp_number = num;
while cmp_number > 0 {
cmp_number >>= 7;
size += 1;
}
size
};
let mut buf: Vec<u8> = Vec::with_capacity(buf_size);
let mut temp_num = num;
while temp_num > 0 {
if (temp_num >> 7) != 0 {
buf.push((0x80 | (temp_num & 0x7f)) as u8);
temp_num >>= 7;
} else {
buf.push((temp_num & 0x7f) as u8);
break;
}
}
buf
}
const N: u32 = 100_000_000;
fn main() {
let start = time::Instant::now();
let num = 999_999_999_999_999;
for _ in 0..N {
let _buf = encode_varint(num);
// dbg!(_buf);
}
let end = time::Instant::now();
let elapsed = end - start;
println!("{}", elapsed.as_millis());
}
The computation is not the same for two programs.
With 999_999_999_999_999
we obtain [255, 255, 153, 166, 234, 175, 227, 1]
in Rust while we obtain [ 255, 255, 153, 166, 10 ]
is JS.
This is due to the fact that in JS we cannot choose once for all the exact type of a numerical expression; it changes according to the operations.
For example, the initial value is quite big but stands in the mantissa of a double precision floating point (number
type) with no loss.
The operation >>> 7
, as all bitwise operations in JS, forces the conversion into a 32-bit integer (this is described in the documentation, and it was the original idea behind asm.js, which led to Wasm), then a truncation happens here in JS, while it does not in Rust.
For example, if we print the result of temp_num >>> 7
at the beginning of each iteration in JS we see
NEXT 21597439
NEXT 168729
NEXT 1318
NEXT 10
NEXT 0
while in Rust we see
NEXT 7812499999999
NEXT 61035156249
NEXT 476837158
NEXT 3725290
NEXT 29103
NEXT 227
NEXT 1
NEXT 0
Starting from here, we cannot compare the performances of the two versions, since they are not equivalent.
On another point, and as stated in the comments, the allocation strategy is probably better in JS due to the artificially repetitive shape of this micro-benchmark (but would it still be the case on a real problem?).
cargo flamegraph
confirms that, as most of the time is spent in allocating/freeing.
In order to improve the performance, we could make the function clear and mutate always the same vector (passed as &mut
, created once for all in main()
) instead of building a new one at each iteration (this would also remove the need for the computation of buf_size
).