performancejuliamarkov-chains

Faster way to compute distributions from Markov chain?


Suppose that I have a probability transition matrix, say a matrix of dimensions 2000x2000, that represents a homogeneous Markov chain, and I want to get some statistics of each probability distribution of the first 200 steps of the chain (the distribution of the first row at each step), then I've written the following

using Distributions, LinearAlgebra

# This function defines our transition matrix:
function tm(N::Int, n0::Int)
    [pdf(Hypergeometric(N-l,l,n0),k-l) for l in 0:N, k in 0:N]
end

# This computes the 5-percentile of a probability vector
function percentile5(M::Vector)
    s=0
    i=0
    while s <= 0.05
        i += 1
        s += M[i]
    end
    return i-1
end

# This function compute a matrix with three rows: means, 5-percentiles 
# and standard deviations. Each column represent a session.
function stats(N::Int, n0::Int, m::Int)    
    A = tm(N,n0)
    B = I # Initilizing B with the identity matrix
    sup = 0:N # The support of each distribution
    sup2 = [k^2 for k in sup]
    stats = zeros(3,m)
    for i in 1:m
        C = B[1,:]
        stats[1,i] = sum(C .* sup) # Mean
        stats[2,i] = percentile5(C) # 5-percentile
        stats[3,i] = sqrt(sum(C .* sup2) - stats[1,i]^2) # Standard deviation
        B = A*B
    end
    return stats
end

data = stats(2000,50,200)

My question is, there is a more efficient (faster) way to do the same computation? I don't see a better way to do it but maybe there are some tricks that speed-up this computation.


Solution

  • This is what I have running so far:

    using Distributions, LinearAlgebra, SparseArrays
    
    # This function defines our transition matrix:
    function tm(N::Int, n0::Int)
        [pdf(Hypergeometric(N-l,l,n0),k-l) for l in 0:N, k in 0:N]
    end
    
    # This computes the 5-percentile of a probability vector
    function percentile5(M::AbstractVector)
        s = zero(eltype(M))
        res = length(M)
        @inbounds for i = 1:length(M)
            s += M[i]
            if s > 0.05
                res = i - 1
                break
            end
        end
        return res
    end
    
    # This function compute a matrix with three rows: means, 5-percentiles 
    # and standard deviations. Each column represent a session.
    function stats(N::Int, n0::Int, m::Int)
        A = sparse(transpose(tm(N, n0)))
        C = zeros(size(A, 1))
        C[1] = 1.0
        sup = 0:N # The support of each distribution
        sup2 = sup .^ 2
        stats = zeros(3, m)
        for i = 1:m
            stats[1, i] = sum(C .* sup) # Mean
            stats[2, i] = percentile5(C) # 5-percentile
            stats[3, i] = sqrt(sum(C .* sup2) - stats[1, i]^2) # Standard deviation
            C = A * C
        end
        return stats
    end
    

    It is around 4x faster (on smaller parameters - possibly much more speedup on large parameters). Basically uses the tips I've made in the comment:

    Further improvement are possible (like simulation/ensemble method I've mentioned).