Tags: random, arduino, microcontroller, avr, entropy

Is it possible to generate random numbers using physical sensors?


I've heard about people using light sensors, Geiger counters, and other physical sensors to generate random numbers, but I'm skeptical. Is there really a way to generate random numbers by taking measurements of the physical world (using an Arduino or any other microcontroller)? If so, would these numbers ever be really random?

To clarify: the question is about the feasibility of using microcontroller-gathered data to generate random numbers that could be applied soundly to cryptography, as an alternative to relying on a device's entropy.


Solution

  • Taking analog "real world" measurements is usually a good source of entropy (a.k.a. real random data). Analog sources always have some unpredictable noise superimposed which can be "harvested". However, as was stated before, the measured data is rarely unbiased.

    Analog electrical measurements may also be more or less susceptible to uncontrollable influence or even attacks from outside, e.g. by causing saturation of the sensor(s). EMI is also likely to interfere with the measurement; a regular cell phone placed reasonably close to the circuit during a call will most likely wreak havoc on any analog signals.

    Bias

    Unbiased, uniformly distributed numbers of high entropy are usually what one wants, because their statistical properties (not their values) are well defined and can therefore be relied upon.

    When measuring an analog input with, say, 10-bit resolution, ideally the numbers gathered from the measurement will cover all values from 0 to 1023, and each value will occur with the same frequency (or probability) as any other value in that range.

    In reality, those values will often be (more or less) normally distributed (Gaussian) around some average value with some characteristic standard deviation, for example around 500 for 10-bit samples.

    De-biasing

    So, in order to generate random values with the desired properties (see above), some de-biasing needs to be done: A randomness extractor of some kind is needed.

    Using cryptographic functions, like (one way) hash functions or cipher algorithms, usually easily produces the desired result as well; this comes at a cost though: The "mixing" of those functions is by design extremely strong, which makes it impossible to determine the quality (= randomness) of the source data after the transformation. Even the simplest deterministic sequences of values, like {1,2,3,4,5...}, when hashed produce data that will most likely pass any and all statistical randomness tests pretty well, although it is not "random" at all.
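
    As an illustration, one of the simplest extractors is the classic von Neumann de-biasing step: raw bits are taken in pairs, a 0-1 pair yields 0, a 1-0 pair yields 1, and 0-0 / 1-1 pairs are discarded. A minimal sketch in C, where read_raw_bit() is merely a placeholder for whatever biased bit source is used:

        #include <stdint.h>

        /* Placeholder: returns one raw, possibly biased bit (0 or 1) from the
           entropy source. */
        extern uint8_t read_raw_bit(void);

        /* Von Neumann extractor: consumes pairs of raw bits, emits one output bit
           per unequal pair (0,1 -> 0 and 1,0 -> 1) and discards equal pairs.
           This removes bias, but only if the raw bits are independent. */
        uint8_t read_unbiased_bit(void)
        {
            for (;;) {
                uint8_t a = read_raw_bit();
                uint8_t b = read_raw_bit();
                if (a != b)
                    return a;
            }
        }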

    Generating entropy on a µC

    Timing

    In microcontroller environments a good source of entropy that is seldom thought of is the timing of events. Using a high-speed timer, true entropy can be gathered by reading the timer value in response to some asynchronous event. Such an event, which is uncorrelated with the running timer, may be the push of a button by a user, the start of communication initiated by another (sub-)system or IC, or basically any other event not triggered by the µC (or any synchronous subsystem) itself.
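
    A minimal sketch of this idea, assuming an ATmega328P-class AVR with the asynchronous event source (here: a push button) wired to the external interrupt pin INT0 and Timer1 free-running at the CPU clock; the pin and timer are arbitrary choices:

        #include <avr/io.h>
        #include <avr/interrupt.h>
        #include <stdint.h>

        volatile uint8_t entropy_byte;
        volatile uint8_t entropy_ready;

        /* Button press (or any other event not synchronous to the CPU clock):
           sample the free-running 16-bit Timer1 and keep only the low byte,
           which carries most of the timing jitter. */
        ISR(INT0_vect)
        {
            entropy_byte  = (uint8_t)TCNT1;
            entropy_ready = 1;
        }

        void entropy_init(void)
        {
            DDRD  &= ~(1 << PD2);        /* INT0 pin (PD2) as input... */
            PORTD |=  (1 << PD2);        /* ...with internal pull-up, button pulls it low */

            TCCR1A = 0;
            TCCR1B = (1 << CS10);        /* Timer1 free-running, prescaler 1 */

            EICRA  = (1 << ISC01);       /* INT0 interrupt on falling edge */
            EIMSK  = (1 << INT0);        /* enable the INT0 interrupt */
            sei();
        }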

    In fact, entropy can even be harvested from just two independent clock sources, for instance by timing the cycles of one clock via the other clock. This opens a couple of very interesting possibilities on Atmel AVR µCs (which are used in the Arduino), depending on the µC's capabilities:

      • timing the period of the watchdog timer, which runs from its own independent on-chip oscillator, against the main system clock,

      • timing the duration of an EEPROM write operation, which is likewise controlled by an oscillator independent of the system clock,

      • on devices with an asynchronous timer, timing the cycles of an external 32 kHz watch crystal against the (much faster) system clock.

    The latter seems promising because of its potentially high bandwidth: when timing each 32 kHz clock cycle with a system timer running significantly faster (a factor of 600+ can be achieved on current AVRs at 20 MHz!) and conservatively assuming only 1 bit of entropy per measurement, this yields 32000+ bits of entropy per second, far more than a µC will ever consume by itself.
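
    A rough sketch of that configuration, assuming an ATmega328P/1284-class device with a 32.768 kHz watch crystal on the TOSC pins; for simplicity it samples Timer1 only on every Timer2 overflow (i.e. every 256 crystal cycles) instead of on every single cycle:

        #include <avr/io.h>
        #include <avr/interrupt.h>
        #include <stdint.h>

        volatile uint8_t jitter_sample;

        /* Timer2 runs asynchronously from the 32.768 kHz crystal, Timer1 from the
           (much faster) system clock. On each Timer2 overflow the current Timer1
           value is sampled; its low bits reflect the phase jitter between the two
           independent clocks. */
        ISR(TIMER2_OVF_vect)
        {
            jitter_sample = (uint8_t)TCNT1;     /* keep only the noisy low bits */
        }

        void clocks_init(void)
        {
            /* Timer1: free-running at F_CPU, prescaler 1 */
            TCCR1A = 0;
            TCCR1B = (1 << CS10);

            /* Timer2: asynchronous mode, clocked from the watch crystal */
            ASSR   = (1 << AS2);
            TCNT2  = 0;
            TCCR2A = 0;
            TCCR2B = (1 << CS20);               /* no prescaling */
            while (ASSR & ((1 << TCN2UB) | (1 << TCR2AUB) | (1 << TCR2BUB)))
                ;                               /* wait for the async registers to settle */

            TIMSK2 = (1 << TOIE2);              /* overflow interrupt; sampling every single
                                                   crystal cycle would need a compare match
                                                   interrupt at 32768 Hz instead */
            sei();
        }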

    EDIT: Meanwhile, I have conducted some simple tests of the 32 kHz timer approach, and the short-term results seem to be of pretty low quality. The upper boundary on the generated entropy per sample seems to be really low, while I have not even tested the samples for non-obvious patterns originating from more or less regular phase shifts. This result may be due to the fact that my DUT had its main clock driven by an external crystal, which may be expected to be (within the precision of the measurements) just as stable in frequency as the 32 kHz quartz when observed over a limited time range. Extending the time between taking two samples (minutes?) will probably return good entropy, yet at a very low bandwidth. (N.b.: The jitter measured may also be partly due to varying interrupt latency depending on the machine instruction executing right when the interrupt is triggered.)

    EDIT #2: It appears that the internal RC oscillator of my DUT (ATmega1284) produces significant frequency jitter (several kHz/s); running on this oscillator indeed seems to produce quite a lot of entropy (on the order of kbit/s) when timed by the external 32 kHz crystal.

    In a little experiment I recently investigated the former two methods. On my DUT, the EEPROM timing turned out to be generally advantageous over the WDT:

    Timing the EEPROM write produced about 4.82 bits of entropy per write operation, while the watchdog timer seems more stable frequency-wise, yielding about 3.92 bits per watchdog timeout. Additionally, the EEPROM's write times seem much more smoothly Gaussian-distributed, whereas the WDT's distribution seems somewhat asymmetric and with a lot of aberrations.
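
    A sketch of the EEPROM-timing measurement, assuming Timer1 free-runs at the CPU clock; the EEPROM address written to and the decision to keep only the low byte are arbitrary choices (and note that hammering a single EEPROM cell like this wears it out):

        #include <avr/io.h>
        #include <avr/eeprom.h>
        #include <stdint.h>

        /* Time one EEPROM write with the free-running Timer1. The write duration
           is controlled by an on-chip oscillator that is independent of the system
           clock, so the low bits of the measured duration jitter randomly. */
        uint8_t eeprom_timing_sample(void)
        {
            static uint8_t toggle;
            uint16_t start, stop;

            eeprom_busy_wait();                               /* no earlier write still pending */
            start = TCNT1;
            eeprom_write_byte((uint8_t *)0x0000, toggle++);   /* start a write... */
            eeprom_busy_wait();                               /* ...and wait until it completes */
            stop = TCNT1;

            return (uint8_t)(stop - start);                   /* keep only the jittery low bits */
        }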

    N.b.: Aggregating multiple "random" events for a single entropy measurement may actually degrade the entropy obtained: fast, random fluctuations in the source may partially compensate each other, yielding result values with lower deviation from the mean value. So, instead of timing, for instance, one second of real time (32k cycles of the RTC crystal), much more entropy can be expected from taking 32k individual timings (one for each cycle of the crystal) during the same time.

    Uninitialized RAM

    Applications compiled with avr-gcc usually have the on-chip RAM they use cleared to 0x00 (and initialized variables set up) before user code, i.e. main(), executes. Putting code into an early .init section provides access to the raw, uninitialized RAM content before it is overwritten by gcc's initialization routines.

    Due to minuscule variances in the RAM's physical storage cells (bits), and depending on true random thermal noise (and other effects), not every cell will initialize itself to the same known state when power is (re-)applied to the chip. Combining the contents of the chip's RAM right after power-up with some function (e.g. XOR-folding or hashing) can yield significant amounts of entropy to be used later. The downside of this is that it will only work reliably when power has been turned off for some time and is then turned on again. A normal chip reset, by hardware, software, or external signal, will preserve the RAM's previous content and is therefore not (always) a good source of entropy. However, since the state of the whole system (RAM) at the time of the reset can hardly be predicted in a reasonably complex application, some entropy may be gathered immediately after a reset anyway.
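
    A sketch of this technique for avr-gcc, assuming an ATmega328P-class part (RAMSTART comes from the device header in recent avr-libc versions; the fallback below is the SRAM start address of such parts):

        #include <avr/io.h>
        #include <stdint.h>

        #ifndef RAMSTART
        #define RAMSTART 0x0100                 /* SRAM start on ATmega328P-class parts */
        #endif

        /* Lives in .noinit, so it survives the .bss clearing done in .init4. */
        uint32_t powerup_seed __attribute__((section(".noinit")));

        /* Runs from .init3: the stack is already set up (.init2), but .data/.bss
           have not yet been initialized (.init4), so SRAM still holds its raw
           power-up pattern. Fold all of SRAM into one word with FNV-1a.
           "naked" suppresses the return instruction, so execution simply falls
           through to the next init section. */
        void collect_ram_entropy(void) __attribute__((naked, used, section(".init3")));
        void collect_ram_entropy(void)
        {
            uint32_t h = 2166136261UL;                   /* FNV-1a offset basis */
            const uint8_t *p;
            for (p = (const uint8_t *)RAMSTART; p <= (const uint8_t *)RAMEND; p++)
                h = (h ^ *p) * 16777619UL;               /* FNV-1a prime */
            powerup_seed = h;
        }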

    Bandwidth requirements

    The quality of an entropy source has to be seen in relation to its bandwidth and to the bandwidth at which the application consumes entropy. Some methods of entropy gathering may not yield more than one bit of entropy over several seconds, while others (not really on µCs...) may produce 100 kbit/s or more.

    It must be noted that one cannot algorithmically "create" new entropy from existing entropy! - One bit of entropy cannot be computationally transformed to two bits of entropy.

    Thus, one cannot (on average) consume more (real) entropy per time unit than what is gathered from the entropy source(s) in the same time.

    Proposal

    When in need of strong random numbers, it is not uncommon to combine one or more sources of real entropy with a strong PRNG, using the entropy gathered to (re-)seed the PRNG each time new entropy is available.

    The PRNG can then be used to generate far more practically unpredictable data than the entropy source alone would provide in the same time.

    As a special kind of PRNG, cryptographic cipher functions can be used, where entropy is used to initialize and update the cipher's key.

    Linux's /dev/urandom is commonly implemented this way.
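
    A structural sketch of this reseeding pattern: the xorshift32 generator below only stands in for a strong PRNG to show the mechanics (it is NOT cryptographically secure), and entropy_available()/get_entropy_byte() are placeholders for whatever entropy source is actually used:

        #include <stdint.h>

        /* Placeholders for the actual entropy source, e.g. one of the timing
           methods described above. */
        extern uint8_t entropy_available(void);
        extern uint8_t get_entropy_byte(void);

        static uint32_t prng_state = 0xC0FFEE42UL;   /* overwritten by real entropy at run time */

        /* Fold fresh entropy into the generator state whenever it is available. */
        static void prng_reseed(void)
        {
            while (entropy_available()) {
                prng_state ^= get_entropy_byte();
                /* rotate left by 8 so consecutive bytes land in different positions */
                prng_state = (prng_state << 8) | (prng_state >> 24);
            }
            if (prng_state == 0)
                prng_state = 0xC0FFEE42UL;           /* xorshift must never run with state 0 */
        }

        /* xorshift32: fast and statistically decent, but not cryptographically
           secure; it only illustrates the seed/generate structure. */
        static uint32_t prng_next(void)
        {
            prng_reseed();
            prng_state ^= prng_state << 13;
            prng_state ^= prng_state >> 17;
            prng_state ^= prng_state << 5;
            return prng_state;
        }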

    Conclusion

    As discussed above, it is quite feasible to generate truly random numbers on a common microcontroller. As for all other sources of entropy, one needs to analyze the raw numbers provided by the entropy source(s) for the amount of real entropy they contain and for the amount of entropy generated per time unit to determine if the source is suitable for the use case or not.

    The combination of a true entropy source and a strong PRNG is the approach that is commonly implemented and which should be used on a microcontroller as well.

    EDIT:

    The PRNG approach may not be the best choice for cryptographic key generation. For that, one should only use truly random bits to produce a secure key. Gathering this amount of entropy may take some time (seconds, maybe), but since key generation is usually not performed very frequently on a µC, this may well be acceptable. (On a heavily loaded server with hundreds or more SSL (HTTPS) connections per second, this is quite another issue...)

    To produce high-quality, high-entropy bitstreams suitable for key generation, a randomness extractor as mentioned above should be employed.

    (On the other hand, if the entropy per bit of the source's output can be measured or estimated, one may simply scale the key length up by a factor of 1/(entropy per bit sampled), i.e. use (bit length of key)/(entropy per bit sampled) raw bits from the low-entropy source directly to form a longer key, which will then contain the same overall entropy as a fully random key of the original length. For example, at 0.25 bits of entropy per sampled bit, a key with 128 bits of entropy requires 512 raw bits. Whether this really does the trick depends, however, on how the cipher handles keys of different lengths.)