Tags: machine-learning, julia, logistic-regression, flux-machine-learning

Logistic regression using Flux.jl


I have a dataset consisting of student marks in two subjects and a result indicating whether the student was admitted to college or not. I need to perform logistic regression on the data and find the optimum parameter θ that minimizes the loss, then predict the results for the test data. I am not trying to build any complex non-linear network here.

The data looks like this: [image of the dataset]

I have the loss function for logistic regression defined like this, which works fine:

predict(X) = sigmoid(X*θ)
loss(X, y) = (1 / length(y)) * sum(-y .* log.(predict(X)) .- (1 .- y) .* log.(1 .- predict(X)))
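(With the elementwise dots in place, the loss evaluates directly on whole arrays; here is a quick sanity check on made-up data, assuming a bias column has been prepended to X:)

```julia
sigmoid(z) = 1 ./ (1 .+ exp.(-z))
θ = zeros(3)
predict(X) = sigmoid(X * θ)
loss(X, y) = (1 / length(y)) * sum(-y .* log.(predict(X)) .- (1 .- y) .* log.(1 .- predict(X)))

# made-up rows: bias column plus two subject marks (scaled to [0, 1])
X = [1.0 0.35 0.78; 1.0 0.30 0.44]
y = [1.0, 0.0]
loss(X, y)  # with θ = 0 every prediction is 0.5, so the loss is log(2) ≈ 0.693
```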

I need to minimize this loss function and find the optimum θ. I want to do it with Flux.jl, or any other library that makes it even easier. I tried using Flux.jl after reading the examples, but I am not able to minimize the cost.

My code snippet:

function update!(ps, η = .1)
  for w in ps
    w.data .-= w.grad .* η
    print(w.data)
    w.grad .= 0
  end
end

for i = 1:400
  back!(L)
  update!((θ, b))
  @show L
end

Solution

  • You can use either GLM.jl (simpler) or Flux.jl (more involved, but more powerful in general). In the code below I generate the data so that you can check whether the result is correct. Additionally, I assume a binary response variable; if your target variable has a different encoding, you might need to change the code a bit.

    Here is the code to run (you can tweak the parameters to increase the convergence speed - I chose ones that are safe):

    using GLM, DataFrames, Flux.Tracker
    
    srand(1)
    n = 10000
    df = DataFrame(s1=rand(n), s2=rand(n))
    df[:y] = rand(n) .< 1 ./ (1 .+ exp.(-(1 .+ 2 .* df[1] .+ 0.5 .* df[2])))
    model = glm(@formula(y~s1+s2), df, Binomial(), LogitLink())
    
    x = Matrix(df[1:2])
    y = df[3]
    W = param(rand(2,1))
    b = param(rand(1))
    predict(x) = 1.0 ./ (1.0 .+ exp.(-x*W .- b))
    # negative log-likelihood, split into the positive and negative cases
    loss(x,y) = -sum(log.(predict(x[y,:]))) - sum(log.(1 .- predict(x[.!y,:])))
    
    function update!(ps, η = .0001)
      for w in ps
        w.data .-= w.grad .* η
        w.grad .= 0
      end
    end
    
    i = 1
    while true
      back!(loss(x,y))
      max(maximum(abs.(W.grad)), abs(b.grad[1])) > 0.001 || break
      update!((W, b))
      i += 1
    end
    

    And here are the results:

    julia> model # GLM result
    StatsModels.DataFrameRegressionModel{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Binomial{Float64},GLM.LogitLink},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
    
    Formula: y ~ 1 + s1 + s2
    
    Coefficients:
                 Estimate Std.Error z value Pr(>|z|)
    (Intercept)  0.910347 0.0789283 11.5338   <1e-30
    s1            2.18707  0.123487 17.7109   <1e-69
    s2           0.556293  0.115052 4.83513    <1e-5
    
    
    julia> (b, W, i) # Flux result with number of iterations needed to converge
    (param([0.910362]), param([2.18705; 0.556278]), 1946)
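Note that the code above uses the old Tracker-based Flux API (`param`, `back!`, `.data`, `.grad`), which was removed in later Flux versions. On Zygote-based Flux (roughly 0.10 and later), the same model could be sketched like this; treat the exact loss-function name and the observation-per-column data layout as assumptions about your Flux version, not as a drop-in replacement:

```julia
using Flux

n = 1000
x = rand(2, n)  # newer Flux convention: one observation per column
y = rand(n) .< 1 ./ (1 .+ exp.(-(1 .+ 2 .* x[1, :] .+ 0.5 .* x[2, :])))
y = reshape(Float32.(y), 1, n)

W = rand(Float32, 1, 2)
b = rand(Float32, 1)
# logitbinarycrossentropy applies the sigmoid internally, which is
# numerically more stable than log.(predict(x))
loss(x, y) = Flux.Losses.logitbinarycrossentropy(W * x .+ b, y)

opt = Descent(0.01)
ps = Flux.params(W, b)
for i in 1:1000
    gs = gradient(() -> loss(x, y), ps)
    Flux.update!(opt, ps, gs)
end
```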