pythonnumpystatistics

Python implementation of statistical Sweep operator


I am learning some techniques for doing statistics with missing data from a book (Statistical Analysis with Missing Data by Little and Rubin). One particularly useful function for working with monotone non-response data is the Sweep Operator (details on page 148-151). I know that the R module gmm has the swp function which does this but I was wondering if anyone has implemented this function in Python, ideally for Numpy matrices to hold the input data. I searched StackOverflow and also did several web searches without success. Thanks for any help.

Here is the definition.

A PxP symmetric matrix G is said to be swept on row and column k if it is replaced by another symmetric PxP matrix H with elements defined as follows:

    h_kk = -1/g_kk
    h_jk = h_kj = g_jk/g_kk for j != k
    h_jl = g_jl - g_jk g_kl / g_kk j != k, l != k
        
        
    G = [g11, g12, g13
         g12, g22, g23
         g13, g23, g33]   
    H = SWP(1,G) = [-1/g11, g12/g11, g13/g11
                   g12/g11, g22-g12^2/g11, g23-g13*g12/g11
                   g13/g11, g23-g13*g12/g11, g33-g13^2/g11]
    kvec = [k1,k2,k3]
    SWP[kvec,G] = SWP(k1,SWP(k2,SWP(k3,G)))

    Inverse function
    H = RSW(k,G)
    h_kk = -1/g_kk
    h_jk = h_kj = -g_jk/g_kk for j != k
    h_jl = g_jk g_kl / g_kk j != k, l != k    

    G == SWP(k,RSW(k,G)) == RSW(k,SWP(k,G))

Solution

  • def sweep(g, k):
        g = np.asarray(g)
        n = g.shape[0]
        if g.shape != (n, n):
            raise ValueError('Not a square array')
        if not np.allclose(g - g.T, 0):
            raise ValueError('Not a symmetrical array')
        if k >= n:
            raise ValueError('Not a valid row number')
        #  Fill with the general formula
        h = g - np.outer(g[:, k], g[k, :]) / g[k, k]
        # h = g - g[:, k:k+1] * g[k, :] / g[k, k]
        # Modify the k-th row and column
        h[:, k] = g[:, k] / g[k, k]
        h[k, :] = h[:, k]
        # Modify the pivot
        h[k, k] = -1 / g[k, k]
        return h
    

    I have no way of testing the above code, but I found an alternativee description here, which is valid for non-symmetrical matrices, which can be calculated as follows:

    def sweep_non_sym(a, k):
        a = np.asarray(a)
        n = a.shape[0]
        if a.shape != (n, n):
            raise ValueError('Not a square array')
        if k >= n:
            raise ValueError('Not a valid row number')
        #  Fill with the general formula
        b = a - np.outer(a[:, k], a[k, :]) / a[k, k]
        # b = a - a[:, k:k+1] * a[k, :] / a[k, k]
        # Modify the k-th row and column
        b[k, :] = a[k, :] / a[k, k]
        b[:, k] = -a[:, k] / a[k, k]
        # Modify the pivot
        b[k, k] = 1 / a[k, k]
        return b
    

    This one does give the correct results for the examples in that link:

    >>> a = [[2,4],[3,1]]
    >>> sweep_non_sym(a, 0)
    array([[ 0.5,  2. ],
           [-1.5, -5. ]])
    >>> sweep_non_sym(sweep_non_sym(a, 0), 1)
    array([[-0.1,  0.4],
           [ 0.3, -0.2]])
    >>> np.dot(a, sweep_non_sym(sweep_non_sym(a, 0), 1))
    array([[  1.00000000e+00,   0.00000000e+00],
           [  5.55111512e-17,   1.00000000e+00]])