machine-learning julia random-forest

Julia MLJ Forest Load:Error: MethodError: no method matching BetaML.Bmlj.RandomForestRegressor()


Hello,

I'm having trouble using any kind of decision tree model from MLJ. I have tried three MLJ-interfaced packages: DecisionTree, ScikitLearn, and now BetaML. The problem only happens when I try to train some kind of decision tree; everything works fine with MLJLinearModels and with XGBoost. I always get the same error, which comes from the following function:

    function machine_train_predict(df::DataFrame, df_train::DataFrame, model_name::String; args...)
        models = Dict(
            "xgb_reg"   => ["XGBoost" => "XGBoostRegressor"],
            "ridge_reg" => ["MLJLinearModels" => "RidgeRegressor"],
            "lasso_reg" => ["MLJLinearModels" => "LassoRegressor"],
            "rf_reg"    => ["BetaML" => "RandomForestRegressor"],
            "lin_reg"   => ["MLJLinearModels" => "LinearRegressor"],
            "log_class" => ["MLJLinearModels" => "LogisticClassifier"],
            "rf_class"  => ["DecisionTree" => "RandomForestClassifier"],
            "xgb_class" => ["XGBoost" => "XGBoostClassifier"]
        )

        y, X = machine_input(df_train; rng=123)
        y = coerce(y, Continuous)

        mod = models[model_name][1]
        p = mod[1]  # package name
        m = mod[2]  # model type name
        Model = @eval @load $(m) pkg=$(p) verbosity=0
        model = Model()

        # train machine and get parameters
        m1 = machine(model, X, y) |> fit!

        # prepare test set for machine predictions
        y, X = machine_input(df)
        y = coerce(y, Continuous)

        # predict
        yhat = MLJ.predict_mode(m1, X)
        return yhat
    end

And the error:

Training. Dataset: global. Iteration N: 1
ERROR: LoadError: MethodError: no method matching BetaML.Bmlj.RandomForestRegressor()
The applicable method may be too new: running in world age 33750, while current world is 33793.

Closest candidates are:
  BetaML.Bmlj.RandomForestRegressor(; n_trees, max_depth, min_gain, min_records, max_features, splitting_criterion, β, rng) (method too new to be called from this world context.)
   @ BetaML ~/.julia/packages/BetaML/8WVUG/src/Bmlj/Trees_mlj.jl:219
  BetaML.Bmlj.RandomForestRegressor(::Int64, ::Int64, ::Float64, ::Int64, ::Int64, ::Function, ::Float64, ::Random.AbstractRNG) (method too new to be called from this world context.)
   @ BetaML ~/.julia/packages/BetaML/8WVUG/src/Bmlj/Trees_mlj.jl:193
  BetaML.Bmlj.RandomForestRegressor(::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any) (method too new to be called from this world context.)
   @ BetaML ~/.julia/packages/BetaML/8WVUG/src/Bmlj/Trees_mlj.jl:193

Stacktrace:
  [1] (::var"#machine_train_predict#38"{var"#machine_train_predict#11#39"})(df::DataFrame, df_train::DataFrame, model_name::String; args::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Main ~/exports/10fold_ml_model-nocache-by-iter.jl:250
  [2] (::var"#machine_train_predict#38"{var"#machine_train_predict#11#39"})(df::DataFrame, df_train::DataFrame, model_name::String)
    @ Main ~/exports/10fold_ml_model-nocache-by-iter.jl:230
  [3] (::var"#train_rescore#36"{var"#train_rescore#10#37"})(df::DataFrame, df_train::DataFrame, model_name::String; args::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Main ~/exports/10fold_ml_model-nocache-by-iter.jl:223
  [4] (::var"#train_rescore#36"{var"#train_rescore#10#37"})(df::DataFrame, df_train::DataFrame, model_name::String)
    @ Main ~/exports/10fold_ml_model-nocache-by-iter.jl:219
  [5] (::var"#proto_train#32"{var"#proto_train#7#33"})(df::DataFrame, df_t::DataFrame, model_name::String; nflds::Int64, args::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:nfolds,), Tuple{Int64}}})
    @ Main ~/exports/10fold_ml_model-nocache-by-iter.jl:211
  [6] (::var"#evaluate_model#42"{var"#evaluate_model#13#43"})(paths::String, output::String, dss::String, niter::Int64, model_name::String; nfolds::Int64, args::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Main ~/exports/10fold_ml_model-nocache-by-iter.jl:284
  [7] (::var"#global_evaluate#40"{var"#global_evaluate#12#41"})(paths::String, output::String, ds::Vector{String}, itern::Int64, model_name::String; nfolds::Int64, args::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Main ~/exports/10fold_ml_model-nocache-by-iter.jl:270
  [8] global_evaluate
    @ ~/exports/10fold_ml_model-nocache-by-iter.jl:267 [inlined]
  [9] main(args::Vector{String})
    @ Main ~/exports/10fold_ml_model-nocache-by-iter.jl:475
 [10] top-level scope
    @ ~/exports/10fold_ml_model-nocache-by-iter.jl:479 

The error is always world-age related and always points at the MLJ interface of whichever model is in use.
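For reference, the same world-age failure can be reproduced without MLJ at all: defining a method via `@eval` inside a function and then calling it from that same function call hits the older world age. The function names below are made up for illustration:

```julia
# Calling a method that was defined by eval during the current function
# call triggers the "method too new to be called" MethodError.
function define_and_call()
    @eval newfun() = 42   # extends the method table at run time
    return newfun()       # still running in the older world age -> MethodError
end

# Base.invokelatest forces dispatch in the newest world age, so this works:
function define_and_call_ok()
    @eval newfun2() = 42
    return Base.invokelatest(newfun2)
end
```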

Please help. I have been trying to find a solution for days.

I'm trying to make predictions; the function above corresponds to the training-and-prediction step of my script. I wasn't expecting an error, because the previous regressors (linear, lasso, ridge, XGBoost) under the MLJ framework worked fine.


Solution

  • I have just tried with a recent version of MLJ (v0.20.5), and it works with the trick below:

    Put this in a script Foo.jl in an empty directory:

    using Pkg
    Pkg.activate(@__DIR__) # Activate the environment in this script's directory. The first time, the environment is created by writing two files there: Project.toml and Manifest.toml
    Pkg.add("MLJ")
    Pkg.add("BetaML")
    
    using MLJ
    import BetaML  # <--- trick here
    X = rand(100,5)
    y = [r[2]+r[3]^2-r[5] for r in eachrow(X)]
    model_name = "rf_reg"
    function predict_y(model_name,X,y)
        models = Dict(
            "xgb_reg"=> ["XGBoost" => "XGBoostRegressor"],
            "ridge_reg"=> ["MLJLinearModels" => "RidgeRegressor"],
            "lasso_reg"=> ["MLJLinearModels" => "LassoRegressor"],
            "rf_reg" => ["BetaML" => "RandomForestRegressor"],
            "lin_reg" => ["MLJLinearModels" => "LinearRegressor"],
            "log_class" => ["MLJLinearModels" => "LogisticClassifier"],
            "rf_class" => ["DecisionTree" => "RandomForestClassifier"],
            "xgb_class" => ["XGBoost" => "XGBoostClassifier"]
        )
        mod = models[model_name][1]
        p = mod[1]
        m = mod[2]
        Model = @eval @load $(m) pkg=$(p) verbosity=0
        model = Model()
    
        # train machine and get parameters
        m1 = machine(model, X, y) |> fit!
        ŷ  = predict_mode(m1, X)
        return ŷ
    end
    ŷ = predict_y(model_name,X,y)
    hcat(y,ŷ)
    

    As I wrote in the comment, it is always best to start a project in a dedicated environment. Also note that BetaML works on standard arrays, not DataFrames.

    [EDIT] Indeed, if the `@eval @load` is done within a function, you end up with the issue you discovered. I don't know exactly what `MLJ.@load` does or the reason behind it, but the trick is simply to import the package providing the model (here BetaML) before calling that function.
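    An alternative that sidesteps the world-age problem entirely is to run every `@load` once at top level and keep only a plain dictionary lookup inside the function, so no code generation happens at call time. This is a sketch with made-up names (`RFReg`, `LinReg`, `MODEL_TYPES`, `predict_y2`), assuming the relevant packages are installed in the active environment:

```julia
using MLJ
import BetaML, MLJLinearModels

# Load each model type once, at top level, so every later function call
# already runs in a world age that knows about these constructors.
const RFReg  = @load RandomForestRegressor pkg=BetaML verbosity=0
const LinReg = @load LinearRegressor pkg=MLJLinearModels verbosity=0

const MODEL_TYPES = Dict(
    "rf_reg"  => RFReg,
    "lin_reg" => LinReg,
)

function predict_y2(model_name, X, y)
    Model = MODEL_TYPES[model_name]       # plain Dict lookup, no eval
    mach  = machine(Model(), X, y) |> fit!
    return predict(mach, X)
end
```

    Yet another option is to keep the in-function `@eval @load` but wrap the subsequent calls that use the freshly loaded type in `Base.invokelatest`, which dispatches in the newest world age.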