
Reasoning about consecutive data points without using iteration


I am doing SPC analysis using numpy/pandas.

Part of this is checking data series against the Nelson rules and the Western Electric rules.

For instance (rule 2 from the Nelson rules): Check if nine (or more) points in a row are on the same side of the mean.

I could simply implement a check like this by iterating over the array, but I would prefer a vectorized approach that avoids explicit iteration.
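For reference, the iterative version of rule 2 might look like the sketch below. The helper name nelson_rule2_loop and its return convention (the index at which a run of n same-side points is reached) are assumptions for illustration. Points lying exactly on the mean are treated as breaking a run.

```python
import numpy as np


def nelson_rule2_loop(x, n=9):
    """Hypothetical helper: return the indices at which a run of at least
    `n` consecutive points on the same side of the mean has been reached."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    hits = []
    run_sign = 0  # +1 above the mean, -1 below, 0 exactly on it
    run_len = 0
    for i, value in enumerate(x):
        s = int(np.sign(value - mean))
        if s != 0 and s == run_sign:
            run_len += 1          # same side as the previous point: extend the run
        else:
            run_sign = s          # side changed (or point on the mean): restart
            run_len = 1 if s != 0 else 0
        if run_len >= n:
            hits.append(i)        # a run of length >= n ends at index i
    return hits


x = [0.57016436, 0.79360943, 0.89535982, 0.83632245, 0.31046202,
     0.91398363, 0.62358298, 0.72148491, 0.99311681, 0.94852957]
print(nelson_rule2_loop(x, n=3))  # [3]
```

This reports where a qualifying run ends; a window-based vectorized check reports where each window starts, so the two differ by n - 1.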


Solution

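  • As I mentioned in a comment, you may want to try using stride tricks (numpy.lib.stride_tricks.as_strided). Take the sign of each point relative to the mean, view those signs as overlapping windows of N consecutive values, and look for windows whose sum is +N or -N: that happens only when every point in the window is on the same side of the mean.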

    Example with x = np.random.rand(10) and N = 3 (rule 2 itself uses N = 9):

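    >>> import numpy as np
    >>> from numpy.lib.stride_tricks import as_strided
    >>> N = 3
    >>> x = np.array([0.57016436, 0.79360943, 0.89535982, 0.83632245, 0.31046202,
    ...               0.91398363, 0.62358298, 0.72148491, 0.99311681, 0.94852957])
    >>> # +1 above the mean, -1 below it (0 exactly on the mean)
    >>> signs = np.sign(x - x.mean()).astype(np.int8)
    >>> signs
    array([-1,  1,  1,  1, -1,  1, -1, -1,  1,  1], dtype=int8)
    >>> # one row per window of N consecutive signs; only complete windows are
    >>> # built, so no memory past the end of the array is read
    >>> step = signs.strides[0]
    >>> strided = as_strided(signs, strides=(step, step), shape=(signs.size - N + 1, N))
    >>> strided
    array([[-1,  1,  1],
           [ 1,  1,  1],
           [ 1,  1, -1],
           [ 1, -1,  1],
           [-1,  1, -1],
           [ 1, -1, -1],
           [-1, -1,  1],
           [-1,  1,  1]], dtype=int8)
    >>> consecutive = strided.sum(axis=-1)
    >>> consecutive
    array([ 1,  3,  1,  1, -1, -1, -1,  1])
    >>> # start index of every window whose N points are all on one side
    >>> np.nonzero(np.abs(consecutive) == N)
    (array([1]),)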
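On NumPy 1.20 or later, numpy.lib.stride_tricks.sliding_window_view builds the same windows without hand-computed strides or shapes, which removes the risk of reading outside the array. The sketch below wraps the steps above into a reusable function; the name nelson_rule2 and the choice to return window start indices are assumptions for illustration, not an established API.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def nelson_rule2(x, n=9):
    """Hypothetical helper: start indices of every window of `n` consecutive
    points that all lie strictly on the same side of the mean."""
    x = np.asarray(x, dtype=float)
    signs = np.sign(x - x.mean()).astype(np.int8)
    if signs.size < n:
        # sliding_window_view raises ValueError if the window is longer than the data
        return np.array([], dtype=np.intp)
    # shape (len(x) - n + 1, n); a read-only view, no data is copied
    windows = sliding_window_view(signs, n)
    # sum() upcasts int8 to the default integer type, so large n cannot overflow;
    # a window sums to +n or -n only if all its signs are the same non-zero value
    window_sums = windows.sum(axis=-1)
    return np.flatnonzero(np.abs(window_sums) == n)


x = [0.57016436, 0.79360943, 0.89535982, 0.83632245, 0.31046202,
     0.91398363, 0.62358298, 0.72148491, 0.99311681, 0.94852957]
print(nelson_rule2(x, n=3))  # [1]
```

A run longer than n shows up as several consecutive start indices (a run of 10 points with n = 9 yields two hits), which covers the "nine or more" wording of the rule. With pandas, pd.Series(signs).rolling(n).sum() produces the same window sums, labelled at the last point of each window instead of the first.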