Tags: julia, bayesian, probabilistic-programming

Passing a `missing` argument to a Bernoulli model in Turing.jl


Turing.jl has a guide that shows how to write a model that accepts either data or a `missing` argument. In the first case it gives you the posterior; in the second case it instead draws from all the variables.

Setup

using Turing
iterations = 1000
ϵ = 0.05
τ = 10

The working example is like this:

@model gdemo(x, ::Type{T} = Float64) where {T} = begin
    if x === missing
        # Initialize `x` if missing
        x = Vector{T}(undef, 2)
    end
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    for i in eachindex(x)
        x[i] ~ Normal(m, sqrt(s))
    end
end

# Construct a model with x = missing
model = gdemo(missing)
c = sample(model, HMC(0.01, 5), 500);

I tried to adapt this code to a simple coin flip. The Bernoulli distribution, I think, should have data type `Bool` (though I tried with `Int64` too). I tried this:

@model coinflip2(y) = begin
    if y === missing
        y = Vector{Bool}(undef, 4)
    end
    p ~ Beta(1, 1)
    N = length(y)
    for n in 1:N
        y[n] ~ Bernoulli(p)
    end
end;

chain = sample(coinflip2(missing), HMC(ϵ, τ), iterations);

But it gave me a long error message I don't understand, which quite likely points to a type problem:

MethodError: Bool(::ForwardDiff.Dual{ForwardDiff.Tag{Turing.Core.var"#f#7"{DynamicPPL.VarInfo{NamedTuple{(:p, :y),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:p,Tuple{}},Int64},Array{Beta{Float64},1},

Trying to mimic the syntax of the working `gdemo` function above (which I don't fully get) gives:

@model coinflip2(y, ::Type{T} = Bool) where {T} = begin
    if y === missing
        y = Vector{T}(undef, 4)
    end
    p ~ Beta(1, 1)
    N = length(y)
    for n in 1:N
        y[n] ~ Bernoulli(p)
    end
end;

chain = sample(coinflip2(missing), HMC(ϵ, τ), iterations);

which also fails with an error message that starts with:

MethodError: Bool(::ForwardDiff.Dual{ForwardDiff.Tag{Turing.Core.var"#f#7"{DynamicPPL.VarInfo{NamedTuple{(:p, :y),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:p,Tuple{}},Int64},Array{Beta{Float64},1}

How do I write this correctly? Bonus points for explaining what I'm doing wrong :D Thanks!


Solution

  • Hamiltonian Monte Carlo works only on continuous distributions, since it differentiates the (log) density. When you sample from the joint model, (p, y), this also applies to the random variable y, which, being Bernoulli, is certainly not continuous. Hence the AD system used in the background (ForwardDiff by default) will complain.

    You can solve this by specifying a non-Hamiltonian sampler for y using Gibbs:

    chain = sample(coinflip2(missing), Gibbs(HMC(ϵ, τ, :p), MH(:y)), iterations)
    

    MH might not be the best choice, you'd have to think about that yourself, but it works.
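    If MH mixes poorly on the discrete part, a particle sampler is a common alternative in Turing for discrete variables; a sketch, assuming the `coinflip2` model and the setup constants from above (the particle count 20 is an arbitrary choice you would tune yourself):

    ```julia
    # Gibbs composition: HMC handles the continuous p,
    # particle Gibbs (PG) handles the discrete y.
    chain = sample(coinflip2(missing), Gibbs(HMC(ϵ, τ, :p), PG(20, :y)), iterations)
    ```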

    Note that this is not necessary for parameter estimation: there, only `p` is sampled, and `p` given `y` is continuous, which makes HMC happy:

    chain = sample(coinflip2([0,1,1,0]), HMC(ϵ, τ), iterations)
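    To sanity-check the result you can summarize the chain; a sketch, assuming the MCMCChains interface of the object Turing returns. With a Beta(1, 1) prior and two heads in four flips, the posterior is Beta(3, 3), so the posterior mean of `p` should sit near 0.5:

    ```julia
    # Inspect the posterior for p from the parameter-estimation run.
    chain = sample(coinflip2([0, 1, 1, 0]), HMC(ϵ, τ), iterations)
    mean(chain[:p])   # should be roughly 0.5 for two heads in four flips
    ```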