python random random-seed python-sounddevice

Is this attempt to implement real randomness valid?


Pseudo-randomness approaches real randomness when the series of generated values shows no actual pattern; essentially, the sequence of random elements would have to run for a potentially infinite length before repeating itself.

I know that the way random.py's seed() works is designed to get as far as possible from the 'pseudo' character (i.e. using the current timestamp, machine parameters, etc.), which is fine for the vast majority of cases, but what if one needs to mathematically ensure zero predictability?
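To make the 'pseudo' character concrete, here is a minimal sketch showing that a generator seeded with a known value replays exactly the same sequence. The helper name first_values and the seed values are hypothetical, chosen only for illustration.

```python
import random


def first_values(seed, count=5):
    """Return the first `count` floats from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent instance, does not touch global state
    return [rng.random() for _ in range(count)]


run_a = first_values(2024)
run_b = first_values(2024)  # same seed as run_a
run_c = first_values(2025)  # different seed

print("same seed, same sequence:", run_a == run_b)
print("different seed, same sequence:", run_a == run_c)
```

Anyone who learns (or guesses) the seed can reproduce every later value, which is why the quality of the seed alone does not settle the predictability question.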

I've read that real randomness can be achieved when we seed() based on particular physical events such as radioactive decay, but what if, e.g., I used an array deriving from a recorded audio stream?

The following is an example of how I'm overriding the default random.seed() behaviour for this purpose. I'm using the sounddevice library, which implements bindings to the services responsible for managing I/O sound devices.

# original random imports here
# ...

from sounddevice import rec

__all__ = ["module level functions here"]

# original random constants here
# ...

# sounddevice related constants
# ----------------------------------------------------------------------
# FS: Sampling Frequency in Hz (samples per second);
# DURATION: Duration of the recorded audio stream (seconds);
# *Note: changing the duration will result in a slower generator, since
# the seed method must wait for the entire stream to be recorded
# before processing further.
# CHANNELS: N° of audio channels used by the recording function (_rec);
# DTYPE: Data type of the np.ndarray returned by _rec;
# *Note: dtype can also be a np.dtype object. E.g., np.dtype("float64").

FS = 48000 
DURATION = 0.1
CHANNELS = 2 
DTYPE = 'float64'


# ----------------------------------------------------------------------
# The class implements a custom random generator with a seed obtained
# through the default audio input device.
# It's a subclass of random.Random that overrides only the seed method;
# it records an audio stream with the default parameters and returns the
# content in a newly created np.ndarray.
# Then the array's elements are added together and some transformations
# are performed on the sum, in order to obtain a less uniform float.
# This operation causes the randomness to concern the decimal part in
# particular, which is subject to high fluctuation, even when the noise
# of the surrounding environment is homogeneous over time.
# *Note: the blocking parameter suspends the execution until the entire
# stream is recorded, otherwise the np array will be partially empty.
# *Note: when the seed argument is specified and different than None,
# SDRandom will behave exactly like its superclass

class SDRandom(Random):

    def seed(self, a=None, version=2):
        if a is None:
            stream = rec(frames=round(FS * DURATION),
                         samplerate=FS,
                         channels=CHANNELS,
                         dtype=DTYPE,
                         blocking=True
                         )

            # Sum and Standard Deviation of the flattened ndarray.
            sum_, std_ = stream.sum(), stream.std() 

            # round() determines the result's sign.
            b = sum_ - round(sum_)

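            # Collect one exponent term (+1 or -1) per digit of the std.
            # Non-digit characters (".", "e", "-") are skipped so that a
            # std printed in scientific notation does not break int().
            e = [1 if int(c) % 2 else -1
                 for c in str(std_).strip("0.") if c.isdigit()]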

            a = b * 10 ** sum(e)

        # Forward version too, so an explicit seed behaves like Random.seed().
        super().seed(a, version)


# ----------------------------------------------------------------------
# Create one instance, seeded from an audio stream, and export its
# methods as module-level functions.
# The functions share state across all uses.

_inst = SDRandom()
# binding class methods to module level functions here
# ...

## ------------------------------------------------------
## ------------------ fork support  ---------------------

if hasattr(_os, "fork"):
    _os.register_at_fork(after_in_child=_inst.seed)


if __name__ == '__main__':
    _test() # See random._test() definition.

And my implementation still doesn't achieve genuine randomness according to the theory. How is this possible? How could audio inputs be deterministic in any way, even when considering the following?

This operation causes the randomness to concern the decimal part in particular, which is subject to high fluctuation, even when the noise of the surrounding environment is homogeneous over time.


Solution

  • You're better off just using the secrets module for "real" randomness. This provides you with data from your kernel's CSPRNG, which should be constantly gathering and mixing in new entropy in a way designed to make life very hard for any attacker.

    Your use of "infinite" isn't appropriate either: you can't run something for "infinitely long", as the heat death of the universe will happen a long time before then.

    Using the standard Mersenne Twister (as Python's random module does) also seems inappropriate, as an attacker can recover its full internal state after observing just 624 consecutive 32-bit outputs. Using a CSPRNG would make this much harder, and constantly mixing in new entropy, as your kernel probably does, hardens it further.

    Finally, treating the samples as floats and then taking the sum and standard deviation doesn't seem appropriate. You'd be much better off leaving them as ints and just passing them through a cryptographic hash. For example:

    import hashlib
    import random
    
    import sounddevice as sd
    
    samples = sd.rec(
        frames=1024,
        samplerate=48000,
        channels=2,
        dtype='int32',
        blocking=True,
    )
    
    rv = int.from_bytes(hashlib.sha256(samples).digest(), 'little')
    print(rv)
    
    random.seed(rv)
    print(random.random())
    

    But then again, please just use secrets; it's a much better option.

    Note: recent versions of the Linux, Windows, macOS, FreeBSD and OpenBSD kernels all work as I've described above. They make good attempts at gathering entropy and mix it into a CSPRNG in a sensible way; for example, see Fortuna.
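To illustrate the two points above (the Mersenne Twister's state can be rebuilt from 624 outputs, while secrets draws from the OS CSPRNG), here is a minimal sketch. It assumes CPython's random.Random, where getrandbits(32) returns one tempered MT19937 word and the state tuple has version 3. The names untemper, clone_mersenne_twister, victim and clone are hypothetical helpers for this illustration, not part of any library.

```python
import random
import secrets


def _undo_right(y, shift):
    """Invert y = x ^ (x >> shift) for a 32-bit x."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x


def _undo_left(y, shift, mask):
    """Invert y = x ^ ((x << shift) & mask) for a 32-bit x."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x


def untemper(y):
    """Reverse the MT19937 output tempering, last step first."""
    y = _undo_right(y, 18)
    y = _undo_left(y, 15, 0xEFC60000)
    y = _undo_left(y, 7, 0x9D2C5680)
    return _undo_right(y, 11)


def clone_mersenne_twister(outputs):
    """Build a random.Random that continues the sequence of `outputs`."""
    if len(outputs) != 624:
        raise ValueError("need exactly 624 consecutive 32-bit outputs")
    state = tuple(untemper(y) for y in outputs)
    clone = random.Random()
    # Index 624 means "state exhausted": the next draw regenerates it.
    clone.setstate((3, state + (624,), None))
    return clone


# The "victim" is seeded from OS entropy; the seed itself is never seen.
victim = random.Random()
observed = [victim.getrandbits(32) for _ in range(624)]

clone = clone_mersenne_twister(observed)
predicted = [clone.random() for _ in range(5)]
actual = [victim.random() for _ in range(5)]
print("prediction matches:", predicted == actual)

# By contrast, secrets exposes the OS CSPRNG and keeps no user-visible state.
token = secrets.token_hex(16)      # 16 random bytes as 32 hex characters
pick = secrets.randbelow(100)      # integer in the range [0, 100)
sys_rng = secrets.SystemRandom()   # random.Random-style API over os.urandom
value = sys_rng.random()
print(token, pick, value)
```

Note that a good seed does not help here: the prediction works from the outputs alone, which is why changing only seed() (with audio or anything else) does not make the generator unpredictable.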