I often see it mentioned that Swift arrays are not thread-safe because of copy-on-write, but I have found that this works. It updates different, unique elements of an array from different threads simultaneously:
// pixels is [(UInt8, UInt8, UInt8)]
let q = DispatchQueue(label: "processImage", attributes: .concurrent)
q.sync {
    DispatchQueue.concurrentPerform(iterations: n) { i in
        // ... do work ...
        pixels[i] = ... // ... store result ...
    }
}
(simplified version of this function)
If threads never write to the same indexes, does copy-on-write still interfere with this? I'm wondering whether this is safe, since the array itself is not changing length or memory usage. But it does seem that copy-on-write would prevent the array from staying consistent in such a scenario.
If this is not safe, and since doing parallel computations on images (pixel arrays) or other data stores is a common requirement, what is the best idiom for this? Is it better for each thread to have its own array, and then combine them after all threads complete? That seems like additional overhead, and the memory juggling from creating and destroying all these arrays doesn't feel right.
No. If we run the following code, the Thread Sanitizer (see this guide) reports a race condition when writing to values[idx]. However, in my testing it does work in practice more or less every single time: I ran it in a loop thousands upon thousands of times and had only one crash. But this is clearly not what we're meant to do.
import Foundation
let NUM = 1_000_000
func a() {
    var values = [Int](repeating: 0, count: NUM)
    DispatchQueue.concurrentPerform(iterations: NUM) { idx in
        values[idx] = idx // <- not thread safe
    }
}
However, this does not seem to be related to copy-on-write. Since the threads access the array via closure capture, they are all in fact accessing the same array. We can also put the array in a reference type, Box, and we still get race conditions. To me, this indicates that the copy-on-write behaviour of arrays is not the root of the problem.
class Box {
    var values: [Int]
    init(values: [Int]) {
        self.values = values
    }
    func update(at index: Int, value: Int) {
        values[index] = value // <- not thread safe
    }
}
func b() {
    let box = Box(values: [Int](repeating: 0, count: NUM))
    DispatchQueue.concurrentPerform(iterations: NUM) { idx in
        box.update(at: idx, value: idx)
    }
}
b()
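For comparison, a conventional way to make the Box version race-free is to serialize every access to values, for example through a private serial DispatchQueue (an NSLock would work similarly). The sketch below assumes that approach; LockedBox and lockedBoxDemo are hypothetical names, not part of the original code.

```swift
import Dispatch

// Hypothetical variant of Box: every access to `values` goes through a
// private serial queue, so only one thread touches the array at a time.
final class LockedBox {
    private var values: [Int]
    private let queue = DispatchQueue(label: "LockedBox.values")  // serial by default

    init(values: [Int]) {
        self.values = values
    }

    func update(at index: Int, value: Int) {
        queue.sync { values[index] = value }
    }

    // Returns a copy of the current contents, read on the same queue.
    func snapshot() -> [Int] {
        queue.sync { values }
    }
}

// Hypothetical driver mirroring b() above.
func lockedBoxDemo(count: Int) -> [Int] {
    let box = LockedBox(values: [Int](repeating: 0, count: count))
    DispatchQueue.concurrentPerform(iterations: count) { idx in
        box.update(at: idx, value: idx)
    }
    return box.snapshot()
}
```

This is correct, but because every write funnels through one queue, for cheap per-element work like this it largely serializes the loop and loses most of the benefit of concurrentPerform.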
If we access the underlying buffer directly via withUnsafeMutableBufferPointer, however, it should work correctly. At least, the Thread Sanitizer doesn't complain.
func c() {
    var values = [Int](repeating: 0, count: NUM)
    values.withUnsafeMutableBufferPointer { buffer in
        DispatchQueue.concurrentPerform(iterations: NUM) { idx in
            buffer[idx] = idx // <- *is* thread safe
        }
    }
}
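Applied to the pixel array from the question, the buffer-pointer approach might look like the sketch below. The Pixel typealias, the invertInParallel function, and the colour inversion are hypothetical stand-ins for the real per-pixel work. The sketch also splits the image into a few contiguous chunks rather than using one iteration per pixel, which keeps dispatch overhead low; the default of 8 chunks is an arbitrary assumption.

```swift
import Dispatch

// Hypothetical pixel type matching the question's [(UInt8, UInt8, UInt8)].
typealias Pixel = (UInt8, UInt8, UInt8)

// Hypothetical example: invert every pixel in parallel. The work is split into
// `chunks` contiguous index ranges, and each range is written by one iteration only.
func invertInParallel(_ pixels: inout [Pixel], chunks: Int = 8) {
    precondition(chunks > 0, "need at least one chunk")
    let count = pixels.count                     // read before the buffer is borrowed
    let chunkSize = (count + chunks - 1) / chunks
    pixels.withUnsafeMutableBufferPointer { buffer in
        let out = buffer                         // copies the pointer, not the pixels
        DispatchQueue.concurrentPerform(iterations: chunks) { c in
            let start = c * chunkSize
            let end = min(start + chunkSize, count)
            guard start < end else { return }    // more chunks than pixels
            for i in start..<end {
                let (r, g, b) = out[i]
                out[i] = (255 - r, 255 - g, 255 - b)
            }
        }
    }
}
```

The points that matter are that pixels is only touched through the buffer pointer inside the closure, that no two iterations write the same index, and that the pointer is not used after withUnsafeMutableBufferPointer returns (concurrentPerform only returns once every iteration has finished).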
The copy-on-write behaviour of arrays in Swift should be thread-safe. The reason this code isn't safe is that COW isn't being triggered in the first place.
An array value references an underlying memory buffer. Before editing the buffer, it checks whether the reference count of the buffer is greater than 1. If it is, it copies the buffer to a new memory location and uses that new buffer instead. This is thread-safe, since the original buffer is left untouched.
In the example above, the reference count of the underlying buffer remains at 1, since we don't actually have multiple arrays referencing the same buffer; we have multiple "references" to the same array, which references the buffer only once. Since there is only the one array value, copy-on-write never happens.
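One way to see this is to compare the address of an array's element storage before and after a write. The sketch below assumes the standard library behaves as described above; bufferAddress and copyOnWriteDemo are hypothetical helpers, and the address is read only for illustration (the pointer is never dereferenced outside withUnsafeBufferPointer).

```swift
// Hypothetical helper: the address of an array's element storage, as an integer.
// The pointer itself is never used outside the closure.
func bufferAddress(_ array: [Int]) -> UInt {
    array.withUnsafeBufferPointer { UInt(bitPattern: $0.baseAddress) }
}

// Hypothetical demo of the uniqueness check: unique arrays are written in place,
// shared buffers are copied before the write.
func copyOnWriteDemo() -> (uniqueStayedPut: Bool, sharedCopied: Bool, originalIntact: Bool) {
    var a = [1, 2, 3]
    a[0] = 0                     // warm-up write: a now owns a uniquely referenced buffer
    let before = bufferAddress(a)
    a[1] = 10                    // one array value, one reference: written in place
    let uniqueStayedPut = bufferAddress(a) == before

    var b = a                    // two array values now share one buffer
    b[1] = 99                    // b's buffer is shared, so b gets a fresh copy first
    let sharedCopied = bufferAddress(b) != bufferAddress(a)
    return (uniqueStayedPut, sharedCopied, a[1] == 10)
}
```

Writing to a uniquely referenced array leaves its buffer where it is, while writing to one of two arrays that share a buffer gives that array its own copy and leaves the other untouched.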