I was playing around with some simple C++ code for an exam question about threading, where the threads were calling rand() (and yes, I am aware there are good reasons to use other generators, but I wanted to keep the exam question simple and to the point). The code was using srand() to seed the PRNG before creating the threads, but I was surprised to notice that every thread was independently getting the same sequence of numbers from the PRNG, as if it hadn't been seeded (or rather, as if it had been seeded with 1, which is the default). It was not until I started seeding the PRNG in each thread that the sequence changed.
I know that rand() is not thread-safe, so I'm not exactly surprised that the threads would all get the same sequence (as I imagine the generator would need some cache syncing to avoid that), but I am surprised that seeding the PRNG in the host process before creating the threads does not seem to have any effect on the sequence.
For example, in a comment to this question: Threads, how to seed random number generator independently? it is stated that you should "not seed your generator in the threads. Seed it before starting any thread. The generator you are using with rand() and srand() is unique for the whole program."
So my question is this: Does each thread get a separate copy of the generator, or why does seeding it in the host process not affect the sequence?
A minimal code example reproducing the issue follows. I am aware these are not good seeds or best practice, but it demonstrates what I'm talking about.
When I run this (no seeding at all):
#include <iostream>
#include <chrono>
#include <cstdlib>   // rand, srand
#include <thread>
#include <mutex>
using namespace std;

void test(int id, mutex &channel)
{
    // srand(id);
    channel.lock();
    cout << id << ": " << rand() % 1000 << endl;
    channel.unlock();
    this_thread::sleep_for(chrono::milliseconds(100));
    channel.lock();
    cout << id << ": " << rand() % 1000 << endl;
    channel.unlock();
}

int main()
{
    // srand(5);
    cout << rand() % 1000 << endl;
    cout << rand() % 1000 << endl;
    const int n = 4;   // const, so the array below is not a variable-length array
    thread threads[n];
    mutex output_channel;
    for (int i = 0; i < n; ++i)
    {
        threads[i] = thread(test, i, ref(output_channel));
    }
    for (int i = 0; i < n; ++i)
    {
        threads[i].join();
    }
    return 0;
}
I get this:
41
467
0: 41
1: 41
2: 41
3: 41
3: 467
2: 467
0: 467
1: 467
When I run this (only seeding once):
#include <iostream>
#include <chrono>
#include <cstdlib>   // rand, srand
#include <thread>
#include <mutex>
using namespace std;

void test(int id, mutex &channel)
{
    // srand(id);
    channel.lock();
    cout << id << ": " << rand() % 1000 << endl;
    channel.unlock();
    this_thread::sleep_for(chrono::milliseconds(100));
    channel.lock();
    cout << id << ": " << rand() % 1000 << endl;
    channel.unlock();
}

int main()
{
    srand(5);   // seed once, before any thread is created
    cout << rand() % 1000 << endl;
    cout << rand() % 1000 << endl;
    const int n = 4;   // const, so the array below is not a variable-length array
    thread threads[n];
    mutex output_channel;
    for (int i = 0; i < n; ++i)
    {
        threads[i] = thread(test, i, ref(output_channel));
    }
    for (int i = 0; i < n; ++i)
    {
        threads[i].join();
    }
    return 0;
}
I get this:
54
693
0: 41
1: 41
2: 41
3: 41
2: 467
3: 467
0: 467
1: 467
which is what I find surprising. As you see, the threads exhibit the same behavior as when there's no seeding.
And when I run this (seeding in each thread):
#include <iostream>
#include <chrono>
#include <cstdlib>   // rand, srand
#include <thread>
#include <mutex>
using namespace std;

void test(int id, mutex &channel)
{
    srand(id);   // seed inside each thread, using the thread's id
    channel.lock();
    cout << id << ": " << rand() % 1000 << endl;
    channel.unlock();
    this_thread::sleep_for(chrono::milliseconds(100));
    channel.lock();
    cout << id << ": " << rand() % 1000 << endl;
    channel.unlock();
}

int main()
{
    // srand(5);
    cout << rand() % 1000 << endl;
    cout << rand() % 1000 << endl;
    const int n = 4;   // const, so the array below is not a variable-length array
    thread threads[n];
    mutex output_channel;
    for (int i = 0; i < n; ++i)
    {
        threads[i] = thread(test, i, ref(output_channel));
    }
    for (int i = 0; i < n; ++i)
    {
        threads[i].join();
    }
    return 0;
}
I get this:
41
467
0: 38
1: 41
2: 45
3: 48
2: 216
3: 196
0: 719
1: 467
which achieves independent randomization in the threads (and one could easily produce better seeds etc.), but I would like to understand the reason for the behavior witnessed above.
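For comparison, here is a minimal sketch of what "better seeds" per thread could look like if one were willing to swap rand() for the C++11 <random> facilities, which my exam code deliberately does not do. Each thread owns its own std::mt19937, so nothing depends on how a particular runtime stores the state of rand(). The helper name draw_per_thread and its parameters are made up for this illustration.

```cpp
#include <cassert>
#include <random>
#include <thread>
#include <vector>

// Hypothetical helper (not part of the exam code): starts `n_threads` threads,
// each owning a private std::mt19937 seeded from (base_seed, thread id), and
// returns the values in [0, 999] that each thread drew.
std::vector<std::vector<int>> draw_per_thread(unsigned base_seed,
                                              int n_threads,
                                              int draws_per_thread)
{
    assert(n_threads >= 0 && draws_per_thread >= 0);
    std::vector<std::vector<int>> results(n_threads);
    std::vector<std::thread> threads;
    for (int id = 0; id < n_threads; ++id) {
        threads.emplace_back([&results, base_seed, id, draws_per_thread] {
            // Combine the base seed with the thread id so every thread
            // starts from a different generator state.
            std::seed_seq seq{base_seed, static_cast<unsigned>(id)};
            std::mt19937 engine(seq);
            std::uniform_int_distribution<int> dist(0, 999);
            for (int k = 0; k < draws_per_thread; ++k) {
                // Each thread writes only to its own slot, so no lock is needed.
                results[id].push_back(dist(engine));
            }
        });
    }
    for (auto &t : threads) {
        t.join();
    }
    return results;
}
```

Calling draw_per_thread(5, 4, 2) would mirror the setup above: four threads, two draws each, with the same results on every run for a given base seed on the same standard library.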
In case it's relevant, I'm running this on Windows 10 on a dual-core machine (Intel(R) Core(TM) i7-7600U CPU), compiling with g++.
There is an aspect you're overlooking: Hardware is magic.
You've called srand in one thread, and the CPU running that thread updated the globals, which were written to that CPU's private L1 cache.
Then your code spawned new threads, each of which tried to read from those same globals. From RAM. And thus read the value from before the write.
Reads and writes using locks or atomic will use special CPU instructions forcing writes to shared caches/RAM, and special CPU instructions forcing reads to read from shared caches/RAM, but since your rand implementation isn't thread-safe (and thus this is undefined behavior), the compiler did not emit those instructions, and thus the values read and written by different threads are inconsistent.
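If what you actually want is one sequence, seeded once and shared by all threads, one option is to put the generator behind a lock so that every access is synchronized. The following is only a sketch under the assumption that std::mt19937 from <random> is an acceptable stand-in for rand(); SharedRng and draw_shared are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

// Hypothetical wrapper: a single engine for the whole program, guarded by a mutex.
class SharedRng {
public:
    explicit SharedRng(unsigned seed) : engine_(seed) {}

    // Every call takes the lock, so the engine state is never accessed by
    // two threads at once and each caller continues where the last one stopped.
    std::mt19937::result_type next()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return engine_();
    }

private:
    std::mutex mutex_;
    std::mt19937 engine_;
};

// Hypothetical helper: seeds once, lets `n_threads` threads draw
// `draws_per_thread` values each from the shared engine, and returns
// all drawn values grouped by thread (thread 0 first).
std::vector<std::mt19937::result_type> draw_shared(unsigned seed,
                                                   int n_threads,
                                                   int draws_per_thread)
{
    assert(n_threads >= 0 && draws_per_thread >= 0);
    SharedRng rng(seed);
    std::vector<std::vector<std::mt19937::result_type>> per_thread(n_threads);
    std::vector<std::thread> threads;
    for (int id = 0; id < n_threads; ++id) {
        threads.emplace_back([&rng, &per_thread, id, draws_per_thread] {
            for (int k = 0; k < draws_per_thread; ++k) {
                per_thread[id].push_back(rng.next());
            }
        });
    }
    for (auto &t : threads) {
        t.join();
    }
    std::vector<std::mt19937::result_type> all;
    for (const auto &v : per_thread) {
        all.insert(all.end(), v.begin(), v.end());
    }
    return all;
}
```

Which thread receives which value still depends on scheduling, but taken together the threads consume exactly the values that a single engine with the same seed would produce.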