I am a bit confused about the C++11 random library.
What I understand: we need two separate concepts: a PRNG (e.g. std::mt19937) and a true random number generator (std::random_device).
I know how to use the true random number generator random_device to seed a PRNG like mt19937.
But what I don't understand is why not just use the true random number generator alone:
std::random_device rd;
std::uniform_int_distribution<int> dist(1, 5);
// get random numbers with:
dist(rd);
As far as I can tell this works well.
Instead, what I found in most examples/sites/articles is that a PRNG is used (with the true random number generator being used just to seed the PRNG):
std::random_device rd;
std::mt19937 e{rd()}; // or std::default_random_engine e{rd()};
std::uniform_int_distribution<int> dist{1, 5};
// get random numbers with:
dist(e);
I am not talking about special use, e.g. cryptography, just your basic getting started articles.
My suspicion is that because std::mt19937 (or std::default_random_engine) accepts a seed, it can be easier to debug by providing the same seed during a debug session.
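To make that suspicion concrete, here is a minimal sketch of seed-based reproducibility. The helper name `draw` and the seed value used in the usage note are made up for illustration; the only claim is that two engines built from the same seed produce the same sequence within one build of the program.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draw `count` values in [1, 5] from an engine
// constructed with the given seed.
std::vector<int> draw(std::mt19937::result_type seed, int count) {
    assert(count >= 0);
    std::mt19937 e(seed);
    std::uniform_int_distribution<int> dist{1, 5};
    std::vector<int> out;
    out.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        out.push_back(dist(e));
    }
    return out;
}
```

Calling `draw(42, 10)` twice should give two identical vectors, which is what makes a fixed seed handy in a debug session. The exact values may differ between standard library implementations, because the distribution algorithm is not pinned down the way the engine is.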
Also, why not just:
std::mt19937 e{std::random_device{}()};
Also, why not just:
std::mt19937 e{std::random_device{}()};
It might be fine if you only do this once, but if you do it many times, it's better to keep track of your std::random_device and not create / destroy it unnecessarily.
It may be helpful to look at the libc++ source code for the implementation of std::random_device, which is quite simple. It's just a thin wrapper over std::fopen("/dev/urandom"). So each time you create a std::random_device you are getting another filesystem handle, and paying all the associated costs. (And whenever you read from it, you are making a system call.)
On Windows, as I understand it, std::random_device represents some call to a Microsoft crypto API, so you are going to be initializing and destroying some crypto library interface every time you do this.
It depends on your application, but for general purposes I wouldn't think of this overhead as always negligible. Sometimes it is, and then this is great.
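As a rough sketch of what "keeping track of" the device can look like, the function below creates one std::random_device and reuses it to seed several engines. The name `make_engines` is made up for illustration, and whether the saved construction cost matters depends on the platform.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: build `n` independently seeded engines while
// constructing std::random_device only once.
std::vector<std::mt19937> make_engines(std::size_t n) {
    std::random_device rd;  // constructed once, reused for every seed
    std::vector<std::mt19937> engines;
    engines.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        engines.emplace_back(rd());  // one read from the device per engine
    }
    return engines;
}
```

Compare this with writing `std::mt19937 e{std::random_device{}()};` inside the loop, which would construct and destroy the device on every iteration.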
I guess this ties back into your first question:
Instead, this is what I found on most examples/sites/articles:
std::random_device rd;
std::mt19937 e{rd()}; // or std::default_random_engine e{rd()};
std::uniform_int_distribution<int> dist{1, 5};
At least the way I think about it is:
std::mt19937 is a very simple and reliable random generator. It is self-contained and will live entirely in your process, not calling out to the OS or anything else. The implementation is mandated by the standard, and at least in Boost, it uses the same code everywhere, derived from the original mt19937 paper. This code is very stable and it's cross-platform. You can be pretty confident that initializing it, querying from it, etc. is going to compile to similar code on any platform that you compile it on, and that you will get similar performance.
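One way to see "mandated by the standard" in practice: the C++ standard requires that the 10000th consecutive invocation of a default-constructed std::mt19937 produces 4123659995. The sketch below checks that; the helper name `nth_default_output` is made up for illustration.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper: return the n-th output (1-based) of a
// default-constructed std::mt19937.
std::mt19937::result_type nth_default_output(unsigned long long n) {
    assert(n >= 1);
    std::mt19937 e;     // default-constructed, uses the default seed 5489
    e.discard(n - 1);   // skip the first n - 1 outputs
    return e();
}
```

On a conforming implementation, `nth_default_output(10000)` should return 4123659995 on every platform. std::random_device offers no comparable guarantee.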
std::random_device by contrast is pretty opaque. You don't really know exactly what it is, what it's going to do, or how efficient it will be. You don't even know if it can actually be acquired: it might throw an exception when you attempt to create it. It might make a system call whenever you read from it, so it may have much worse performance in terms of cycles-per-byte than std::mt19937. You know that it doesn't require a seed. You're not usually supposed to pull tons and tons of data from it, just use it to generate seeds. Sometimes, it acts as a nice interface to cryptographic APIs, but it's not actually required to do that and sadly sometimes it doesn't. It might correspond to /dev/random on Unix, it might correspond to /dev/urandom. It might correspond to some MSVC crypto API (Visual Studio), or it might just produce the same values on every run (as it did on older MinGW builds). If you cross-compile for some phone, who knows what it will do. (And even when you do get /dev/random, you still have the problem that performance may not be consistent: it may appear to work great until the entropy pool runs out, and then it runs slow as a dog.)
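Since construction can throw, a defensive seeding routine might look something like the sketch below. The name `make_seed` and the clock-based fallback are assumptions chosen for illustration, not something the standard prescribes, and the fallback is only about as good as seeding from the time.

```cpp
#include <chrono>
#include <exception>
#include <random>

// Hypothetical helper: try std::random_device first and fall back to a
// clock-based seed if the device cannot be created or read.
std::mt19937::result_type make_seed() {
    try {
        std::random_device rd;  // may throw if no device is available
        return rd();
    } catch (const std::exception&) {
        // Fallback (an assumption for this sketch): use the clock's tick count.
        auto ticks = std::chrono::high_resolution_clock::now()
                         .time_since_epoch()
                         .count();
        return static_cast<std::mt19937::result_type>(ticks);
    }
}
```

The result is used the usual way, e.g. `std::mt19937 e(make_seed());`, so the rest of the program only ever talks to the PRNG.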
The way I think about it is, std::random_device is supposed to be like an improved version of seeding with time(NULL). That's a low bar, because time(NULL) is a pretty crappy seed all things considered. I usually use it where I would have used time(NULL) to generate a seed, back in the day. I don't really consider it all that useful outside of that.