When I run a loop in different Web Workers, the loop's counter variable appears to be shared across threads, even though it should be thread-local. It should not do this, and I don't know how to fix it.
The offending loop is in the `run` function of the Rust code being compiled to WASM:
```rust
#![no_main]
#![no_std]
use core::panic::PanicInfo;
use js::*;

mod js {
    #[link(wasm_import_module = "imports")]
    extern "C" {
        pub fn abort(msgPtr: usize, filePtr: usize, line: u32, column: u32) -> !;
        pub fn _log_num(number: usize);
    }
}

#[no_mangle]
pub unsafe extern "C" fn run(worker_id: i32) {
    let worker_index = worker_id as u32 - 1;
    let chunk_start = 100 * worker_index;
    let chunk_end = chunk_start + 100; //Total pixels may not divide evenly into number of worker cores.
    for n in chunk_start as usize..chunk_end as usize {
        _log_num(n);
    }
}

#[panic_handler]
unsafe fn panic(_: &PanicInfo) -> ! { abort(0, 0, 0, 0) }
```
`run` is passed the thread id, ranging from 1 to 3 inclusive, and prints out a hundred numbers - so between them, the three threads should log the numbers 0 to 299, albeit in mixed order. I expect to see 0, 1, 2... from thread 1, 100, 101, 102... from thread 2, and 200, 201, 202... from thread 3. If I run the functions sequentially, that is indeed what I see. But if I run them in parallel, the threads act as if they're all working through the same chunk: they'll log something like 1, 4, 7... on the first thread, 2, 6, 9... on the second, and 3, 5, 8... on the third, up to 99, where all three threads stop. Each thread is behaving as if it shares `chunk_start`, `chunk_end`, and `n` with the other threads.
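To show what I mean by "sharing", here is a WASM-free sketch that reproduces roughly the same pattern: three plain workers all pulling from one shared counter interleave their output and all stop around 99. (This is just an illustration, not my actual code; it needs a cross-origin-isolated page, since SharedArrayBuffer requires that anyway.)

```js
//Illustration only - not my real code. Three workers sharing ONE counter.
const sab = new SharedArrayBuffer(4)
const workerSource = `onmessage = ({data: {id, sab}}) => {
    const counter = new Int32Array(sab)
    while (Atomics.load(counter, 0) < 100) { //"chunk_end" of thread 1, applied to a shared n
        const n = Atomics.add(counter, 0, 1)
        console.log(\`thread \${id}: n is \${n}\`)
    }
}`
const url = URL.createObjectURL(new Blob([workerSource], {type: "text/javascript"}))
for (let id = 1; id <= 3; id++) {
    new Worker(url).postMessage({id, sab})
}
```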
It should not do this, because `.cargo/config.toml` specifies `--shared-memory`, so the compiler should use the appropriate locking mechanisms when allocating memory:
```toml
[target.wasm32-unknown-unknown]
rustflags = [
    "-C", "target-feature=+atomics,+mutable-globals,+bulk-memory",
    "-C", "link-args=--no-entry --shared-memory --import-memory --max-memory=2130706432",
]
```
I know this is being picked up, because if I change the `--shared-memory` flag to something else, `rust-lld` complains it does not know what it is.
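The memory side can be sanity-checked from JS too, for example like this (illustrative only, not part of my actual code):

```js
//Illustrative check: the module should declare an imported memory, and the
//memory we hand it should be backed by a SharedArrayBuffer.
const module = await WebAssembly.compileStreaming(fetch("sim.wasm"))
console.log(WebAssembly.Module.imports(module).filter(i => i.kind === "memory"))
//e.g. [{module: "env", name: "memory", kind: "memory"}]

const memory = new WebAssembly.Memory({initial: 23, maximum: 23, shared: true})
console.log(memory.buffer instanceof SharedArrayBuffer) //true when shared: true
```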
wasm-bindgen's parallel demo works fine, so I know it's possible to do this. I just can't spot what they've set to make theirs work.
Perhaps it is something in the way I load my module in the web worker?
```js
const wasmSource = fetch("sim.wasm") //kick off the request now, we're going to need it

//See message sending code for why we use multiple messages.
let messageArgQueue = [];
addEventListener("message", ({data}) => {
    messageArgQueue.push(data)
    if (messageArgQueue.length === 4) {
        self[messageArgQueue[0]].apply(0, messageArgQueue.slice(1))
    }
})

self.start = async (workerID, worldBackingBuffer, world) => {
    const wasm = await WebAssembly.instantiateStreaming(wasmSource, {
        env: { memory: worldBackingBuffer },
        imports: {
            abort: (messagePtr, locationPtr, row, column) => {
                throw new Error(`? (?:${row}:${column}, thread ${workerID})`)
            },
            _log_num: num => console.log(`thread ${workerID}: n is ${num}`),
        },
    })

    //Initialise thread-local storage, so we get separate stacks for our local variables.
    wasm.instance.exports.__wasm_init_tls(workerID-1)

    //Loop, running the Rust logging loop when the "tick" advances.
    let lastProcessedTick = 0
    while (1) {
        Atomics.wait(world.globalTick, 0, lastProcessedTick)
        lastProcessedTick = world.globalTick[0]
        wasm.instance.exports.run(workerID)
    }
}
```
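One thing I'm not sure about in the above is the argument to `__wasm_init_tls`. As far as I understand LLVM's TLS support, it expects a pointer to a reserved block of `__tls_size` bytes, not a small index, so a per-thread setup might look more like this sketch (inside `start`, in place of the current call) - assuming `__tls_size` and `__tls_align` really are exported from `sim.wasm`, and with a completely made-up base address:

```js
//Sketch only: give each worker its own TLS block inside the shared memory and
//pass its address to __wasm_init_tls, instead of passing the worker index.
const exports = wasm.instance.exports
const tlsSize = exports.__tls_size.value   //bytes of TLS data the module needs per thread
const tlsAlign = exports.__tls_align.value //required alignment of that block
const TLS_AREA_BASE = 1300000              //made-up address, clear of my other data
const stride = Math.ceil(tlsSize / tlsAlign) * tlsAlign || tlsAlign //round up to alignment
exports.__wasm_init_tls(TLS_AREA_BASE + (workerID - 1) * stride)
```

Whether that's actually related to the symptom above, I don't know.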
`worldBackingBuffer` here is the shared memory for the WASM module, and it's created in the main thread:
```js
//Let's count to 300. We'll have three web workers, each taking ⅓rd of the task. 0-100, 100-200, 200-300...

//First, allocate some shared memory. (The original task wants to share some values around.)
const memory = new WebAssembly.Memory({
    initial: 23,
    maximum: 23,
    shared: true,
})

//Then, allocate the data views into the memory.
//This is shared memory which will get updated by the worker threads, off the main thread.
const world = {
    globalTick: new Int32Array(memory.buffer, 1200000, 1), //Current global tick. Increment to tell the workers to count up in scratchA!
}

//Load a core and send the "start" event to it.
const startAWorkerCore = coreIndex => {
    const worker = new Worker('worker/sim.mjs', {type:'module'})
    ;['start', coreIndex+1, memory, world].forEach(arg => worker.postMessage(arg)) //Marshal the "start" message across multiple postMessages because of the following bugs: 1. Must transfer memory BEFORE world. https://bugs.chromium.org/p/chromium/issues/detail?id=1421524 2. Must transfer world BEFORE memory. https://bugzilla.mozilla.org/show_bug.cgi?id=1821582
}

//Now, let's start some worker threads! They will work on different memory locations, so they don't conflict.
startAWorkerCore(0) //works fine
startAWorkerCore(1) //breaks counting - COMMENT THIS OUT TO FIX COUNTING
startAWorkerCore(2) //breaks counting - COMMENT THIS OUT TO FIX COUNTING

//Run the simulation thrice. Each thread should print a hundred numbers in order, thrice.
//For thread 1, it should print 0, then 1, then 2, etc. up to 99.
//Thread 2 should run from 100 to 199, and thread 3 200 to 299.
//But when they're run simultaneously, all three threads seem to use the same counter.
setTimeout(tick, 500)
setTimeout(tick, 700)
setTimeout(tick, 900)

function tick() {
    Atomics.add(world.globalTick, 0, 1)
    Atomics.notify(world.globalTick, 0)
}
```
But this looks pretty normal. Why am I seeing memory corruption in my Rust for-loop?
There is some magic being done in wasm-bindgen: the module's start function is replaced/injected with code that fixes up the memory, although there seem to be issues with it.
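As far as I can tell, the point of that injected code is to give each new thread its own stack (and TLS block) inside the shared memory before any Rust runs. Conceptually it is something like the sketch below - this is my paraphrase of the idea, not wasm-bindgen's actual code, and it assumes a mutable stack-pointer global is exported from the module, which it isn't by default:

```js
//Paraphrase of the idea only, NOT wasm-bindgen's real code.
//Assumes sim.wasm exports a mutable stack-pointer global (hypothetical) and __wasm_init_tls.
const STACK_SIZE = 1048576 //made-up per-thread stack size
function initThreadMemory(exports, workerID) {
    //Carve out a stack region no other thread uses. The WASM shadow stack grows
    //downward, so the initial stack pointer is the HIGH end of the region.
    const stackBase = 2000000 + (workerID - 1) * STACK_SIZE //made-up layout
    exports.__stack_pointer.value = stackBase + STACK_SIZE  //hypothetical export
    //...and likewise give the thread a private TLS block (see the __wasm_init_tls sketch earlier).
}
```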