pythonalgorithmmathfloyd-cycle-finding

Detect period of unknown source


How to detect repeating digits in an infinite sequence? I tried Floyd & Brent detection algorithm but come to nothing... I have a generator that yields numbers ranging from 0 to 9 (inclusive) and I have to recognize a period in it.

Example test case:

import itertools

# of course this is a fake one just to offer an example
def source():
    return itertools.cycle((1, 0, 1, 4, 8, 2, 1, 3, 3, 1))

>>> gen = source()
>>> period(gen)
(1, 0, 1, 4, 8, 2, 1, 3, 3, 1)

Solution

  • Empirical methods

    Here's a fun take on the problem. The more general form of your question is this:

    Given a repeating sequence of unknown length, determine the period of the signal.

    The process to determine the repeating frequencies is known as the Fourier Transform. In your example case the signal is clean and discrete, but the following solution will work even with continuous noisy data! The FFT will try to duplicate the frequencies of the input signal by approximating them in the so-called "wave-space" or "Fourier-space". Basically a peak in this space corresponds to a repeating signal. The period of your signal is related to the longest wavelength that is peaked.

    import itertools
    
    # of course this is a fake one just to offer an example
    def source():
        return itertools.cycle((1, 0, 1, 4, 8, 2, 1, 3, 3, 2))
    
    import pylab as plt
    import numpy as np
    import scipy as sp
    
    # Generate some test data, i.e. our "observations" of the signal
    N = 300
    vals = source()
    X = np.array([vals.next() for _ in xrange(N)])
    
    # Compute the FFT
    W    = np.fft.fft(X)
    freq = np.fft.fftfreq(N,1)
     
    # Look for the longest signal that is "loud"
    threshold = 10**2
    idx = np.where(abs(W)>threshold)[0][-1]
    
    max_f = abs(freq[idx])
    print "Period estimate: ", 1/max_f
    

    This gives the correct answer for this case, 10 though if N didn't divide the cycles cleanly, you would get a close estimate. We can visualize this via:

    plt.subplot(211)
    plt.scatter([max_f,], [np.abs(W[idx]),], s=100,color='r')
    plt.plot(freq[:N/2], abs(W[:N/2]))
    plt.xlabel(r"$f$")
    
    plt.subplot(212)
    plt.plot(1.0/freq[:N/2], abs(W[:N/2]))
    plt.scatter([1/max_f,], [np.abs(W[idx]),], s=100,color='r')
    plt.xlabel(r"$1/f$")
    plt.xlim(0,20)
    
    plt.show()
    

    enter image description here