gpuxgboost

XGBoost training on GPU using DataFrame structures


I'm getting the warning below, I think because my X_train and y_train values are pandas DataFrames that live in CPU memory.

"Falling back to prediction using DMatrix due to mismatched devices. This might lead to higher memory usage and slower performance. XGBoost is running on: cuda:0, while the input data is on: cpu. Potential solutions:"

I'm currently running a grid search via scikit-learn's API, which only accepts numpy arrays and DataFrames as inputs. How can I keep training on my GPU during a grid search?


Solution

  • I believe the problem you're facing is also referenced here. trivialfis provided some demo code in his reply, which I'll paste below.

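    # Note: this demo assumes a CUDA-capable GPU and the cupy package are available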
    import cupy as cp
    import xgboost as xgb
    from sklearn.datasets import make_regression
    
    X, y = make_regression()
    
    reg = xgb.XGBRegressor()
    reg.fit(X, y)
    
    # No warning, reg and X are on CPU
    reg.predict(X)
    
    # Put X into GPU
    X = cp.array(X)
    # Put reg to GPU
    reg.set_params(device="cuda")
    # No warning, both on GPU
    reg.predict(X)
    
    # Warning, reg is on CPU, but X on GPU
    reg.set_params(device="cpu")
    reg.predict(X)
    
    X = cp.asnumpy(X)
    reg.set_params(device="cuda")
    # Warning, reg is on GPU, but X on CPU
    reg.predict(X)
    
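    In short, the warning fires whenever the booster's device and the data's location disagree; keeping both on the same side (cupy arrays with device="cuda", or numpy/pandas with device="cpu") avoids it.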

    anonymousTechpreneur also shared code in the GitHub issue that does just what you need, though I think you should try it for yourself first.
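
    In the meantime, here is a minimal sketch (my own illustration, not the code from the issue) of what a GPU-backed grid search can look like, assuming XGBoost >= 2.0 and its device parameter. scikit-learn only ever sees plain numpy arrays, while XGBoost copies them to the GPU internally, so the mismatched-devices message only concerns prediction speed; training itself still runs on the GPU.

    import xgboost as xgb
    from sklearn.datasets import make_regression
    from sklearn.model_selection import GridSearchCV
    
    X, y = make_regression(n_samples=1000, n_features=20, random_state=0)
    
    # device="cuda" trains on the GPU even though X and y are numpy arrays;
    # XGBoost transfers the data to the device internally.
    reg = xgb.XGBRegressor(tree_method="hist", device="cuda")
    
    grid = GridSearchCV(
        reg,
        param_grid={"max_depth": [3, 6], "learning_rate": [0.1, 0.3]},
        cv=3,
    )
    grid.fit(X, y)
    print(grid.best_params_)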