python, multithreading, machine-learning, xgboost, non-deterministic

XGBRegressor: how to fix exploding train/val loss (and an ineffective random_state)?


I used XGBRegressor to fit a small dataset with (data_size, feature_size) = (156, 328). Although random_state is set, the train/val history cannot be reproduced across runs: sometimes training is fine, sometimes the train/val loss explodes. Why does random_state have no effect? How can I fix the exploding-loss issue?

Code:

from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
import pandas as pd

SEED = 123
dfs = pd.read_csv('./dir')
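# NB: with shuffle=False, train_test_split ignores random_state (the split is already deterministic)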
dfs_train, dfs_cv = train_test_split(dfs, train_size=0.8, shuffle=False, random_state=SEED)
df_train_x = dfs_train.drop(columns='Y')
df_train_y = dfs_train['Y']
df_cv_x = dfs_cv.drop(columns='Y')
df_cv_y = dfs_cv['Y']


params = {"booster":"gblinear",
          "eval_metric": "rmse",
          "predictor": "cpu_predictor",
          "max_depth": 16,
          "n_estimators":100,
          'random_state':SEED
         }
model = XGBRegressor(**params)
model.fit(df_train_x.values, df_train_y.values,
        eval_set=[(df_train_x.values, df_train_y.values), (df_cv_x.values, df_cv_y.values)],
        eval_metric='rmse',
        verbose=True)

output 1 (exploding):

[0] validation_0-rmse:1.75475   validation_1-rmse:1.88660
[1] validation_0-rmse:1.25838   validation_1-rmse:1.67099
[2] validation_0-rmse:1.09559   validation_1-rmse:1.52534
[3] validation_0-rmse:1.13592   validation_1-rmse:1.36564
[4] validation_0-rmse:1.17923   validation_1-rmse:1.18143
[5] validation_0-rmse:1.02157   validation_1-rmse:1.34878
[6] validation_0-rmse:0.83439   validation_1-rmse:1.26116
[7] validation_0-rmse:0.75650   validation_1-rmse:1.32562
[8] validation_0-rmse:0.69412   validation_1-rmse:1.26147
[9] validation_0-rmse:0.65568   validation_1-rmse:1.11168
[10]    validation_0-rmse:0.62501   validation_1-rmse:1.13932
[11]    validation_0-rmse:0.61957   validation_1-rmse:1.17217
[12]    validation_0-rmse:0.58313   validation_1-rmse:1.17873
[13]    validation_0-rmse:0.69826   validation_1-rmse:1.28131
[14]    validation_0-rmse:0.65318   validation_1-rmse:1.23954
[15]    validation_0-rmse:0.69506   validation_1-rmse:1.17325
[16]    validation_0-rmse:0.90857   validation_1-rmse:1.16924
[17]    validation_0-rmse:1.29021   validation_1-rmse:1.23918
[18]    validation_0-rmse:0.86403   validation_1-rmse:1.10940
[19]    validation_0-rmse:0.74296   validation_1-rmse:1.09483
[20]    validation_0-rmse:0.66514   validation_1-rmse:1.03155
[21]    validation_0-rmse:0.60940   validation_1-rmse:0.97993
[22]    validation_0-rmse:0.57345   validation_1-rmse:0.91434
[23]    validation_0-rmse:0.56455   validation_1-rmse:0.95662
[24]    validation_0-rmse:0.51317   validation_1-rmse:0.91908
[25]    validation_0-rmse:0.61795   validation_1-rmse:1.19921
[26]    validation_0-rmse:0.52034   validation_1-rmse:0.96785
[27]    validation_0-rmse:0.79248   validation_1-rmse:1.32662
[28]    validation_0-rmse:0.61955   validation_1-rmse:1.02642
[29]    validation_0-rmse:0.59526   validation_1-rmse:1.12646
[30]    validation_0-rmse:0.78931   validation_1-rmse:1.28633
[31]    validation_0-rmse:0.50458   validation_1-rmse:1.08621
[32]    validation_0-rmse:0.83105   validation_1-rmse:1.56490
[33]    validation_0-rmse:0.62568   validation_1-rmse:1.38425
[34]    validation_0-rmse:0.59277   validation_1-rmse:1.32925
[35]    validation_0-rmse:0.54544   validation_1-rmse:1.30204
[36]    validation_0-rmse:0.54612   validation_1-rmse:1.34128
[37]    validation_0-rmse:0.54343   validation_1-rmse:1.36388
[38]    validation_0-rmse:2.05047   validation_1-rmse:2.63729
[39]    validation_0-rmse:7.35043   validation_1-rmse:7.61231
[40]    validation_0-rmse:6.88989   validation_1-rmse:5.74990
[41]    validation_0-rmse:6.68002   validation_1-rmse:6.98875
[42]    validation_0-rmse:8.64272   validation_1-rmse:6.07278
[43]    validation_0-rmse:5.42061   validation_1-rmse:4.87993
[44]    validation_0-rmse:6.02975   validation_1-rmse:6.28529
[45]    validation_0-rmse:5.53219   validation_1-rmse:6.61440
[46]    validation_0-rmse:21.73743  validation_1-rmse:12.64479
[47]    validation_0-rmse:14.01517  validation_1-rmse:15.05459
[48]    validation_0-rmse:9.78612   validation_1-rmse:12.35174
[49]    validation_0-rmse:8.14741   validation_1-rmse:10.34468
[50]    validation_0-rmse:7.37258   validation_1-rmse:9.14025
[51]    validation_0-rmse:13.28054  validation_1-rmse:15.57369
[52]    validation_0-rmse:9.72434   validation_1-rmse:8.82560
[53]    validation_0-rmse:7.43478   validation_1-rmse:8.69813
[54]    validation_0-rmse:6.99072   validation_1-rmse:7.90911
[55]    validation_0-rmse:6.33418   validation_1-rmse:7.16309
[56]    validation_0-rmse:5.98817   validation_1-rmse:6.86138
[57]    validation_0-rmse:6.63810   validation_1-rmse:7.32003
[58]    validation_0-rmse:12.34689  validation_1-rmse:17.12449
[59]    validation_0-rmse:11.46232  validation_1-rmse:11.11735
[60]    validation_0-rmse:8.22308   validation_1-rmse:8.42130
[61]    validation_0-rmse:8.03585   validation_1-rmse:9.78268
[62]    validation_0-rmse:6.08736   validation_1-rmse:9.08017
[63]    validation_0-rmse:5.65990   validation_1-rmse:9.01591
[64]    validation_0-rmse:4.94540   validation_1-rmse:8.60943
[65]    validation_0-rmse:12.16186  validation_1-rmse:9.97841
[66]    validation_0-rmse:24.36063  validation_1-rmse:30.90603
[67]    validation_0-rmse:23.63998  validation_1-rmse:15.92554
[68]    validation_0-rmse:38.54043  validation_1-rmse:49.15125
[69]    validation_0-rmse:26.96050  validation_1-rmse:35.93348
[70]    validation_0-rmse:36.68499  validation_1-rmse:35.61835
[71]    validation_0-rmse:44.18962  validation_1-rmse:41.25709
[72]    validation_0-rmse:35.57274  validation_1-rmse:36.54894
[73]    validation_0-rmse:32.26445  validation_1-rmse:37.02519
[74]    validation_0-rmse:38.02793  validation_1-rmse:60.88339
[75]    validation_0-rmse:29.93598  validation_1-rmse:46.07689
[76]    validation_0-rmse:26.86872  validation_1-rmse:41.39200
[77]    validation_0-rmse:24.87459  validation_1-rmse:41.77614
[78]    validation_0-rmse:29.63828  validation_1-rmse:27.51796
[79]    validation_0-rmse:23.43373  validation_1-rmse:36.54044
[80]    validation_0-rmse:21.80307  validation_1-rmse:38.42451
[81]    validation_0-rmse:45.01890  validation_1-rmse:63.13959
[82]    validation_0-rmse:32.98600  validation_1-rmse:48.51588
[83]    validation_0-rmse:1154.83826    validation_1-rmse:1046.83862
[84]    validation_0-rmse:596.76422 validation_1-rmse:899.20294
[85]    validation_0-rmse:8772.32227    validation_1-rmse:14788.31152
[86]    validation_0-rmse:15234.09082   validation_1-rmse:14237.62500
[87]    validation_0-rmse:12527.86426   validation_1-rmse:13914.09277
[88]    validation_0-rmse:11000.84277   validation_1-rmse:13445.76074
[89]    validation_0-rmse:15696.28613   validation_1-rmse:13946.85840
[90]    validation_0-rmse:85210.62500   validation_1-rmse:127271.79688
[91]    validation_0-rmse:116500.62500  validation_1-rmse:215355.65625
[92]    validation_0-rmse:149855.62500  validation_1-rmse:147734.62500
[93]    validation_0-rmse:151028.76562  validation_1-rmse:97522.35938
[94]    validation_0-rmse:286164.06250  validation_1-rmse:359728.84375
[95]    validation_0-rmse:149474.23438  validation_1-rmse:182052.50000
[96]    validation_0-rmse:156148.78125  validation_1-rmse:217708.90625
[97]    validation_0-rmse:114551.62500  validation_1-rmse:151682.79688
[98]    validation_0-rmse:104612.85156  validation_1-rmse:170244.31250
[99]    validation_0-rmse:256178.57812  validation_1-rmse:246638.64062

output 2 (fine):

[0] validation_0-rmse:2.73642   validation_1-rmse:2.73807
[1] validation_0-rmse:0.49221   validation_1-rmse:0.80462
[2] validation_0-rmse:0.31022   validation_1-rmse:0.73898
[3] validation_0-rmse:0.26974   validation_1-rmse:0.76231
[4] validation_0-rmse:0.22617   validation_1-rmse:0.61529
[5] validation_0-rmse:0.20344   validation_1-rmse:0.66840
[6] validation_0-rmse:0.18369   validation_1-rmse:0.62763
[7] validation_0-rmse:0.17476   validation_1-rmse:0.64966
[8] validation_0-rmse:0.16620   validation_1-rmse:0.60988
[9] validation_0-rmse:0.16017   validation_1-rmse:0.62756
[10]    validation_0-rmse:0.15479   validation_1-rmse:0.61354
[11]    validation_0-rmse:0.15247   validation_1-rmse:0.63041
[12]    validation_0-rmse:0.14641   validation_1-rmse:0.58863
[13]    validation_0-rmse:0.14544   validation_1-rmse:0.55724
[14]    validation_0-rmse:0.16165   validation_1-rmse:0.54285
[15]    validation_0-rmse:0.14305   validation_1-rmse:0.59282
[16]    validation_0-rmse:0.13728   validation_1-rmse:0.57130
[17]    validation_0-rmse:0.13325   validation_1-rmse:0.56199
[18]    validation_0-rmse:0.12974   validation_1-rmse:0.53802
[19]    validation_0-rmse:0.12596   validation_1-rmse:0.54721
[20]    validation_0-rmse:0.12342   validation_1-rmse:0.54109
[21]    validation_0-rmse:0.12143   validation_1-rmse:0.53365
[22]    validation_0-rmse:0.11954   validation_1-rmse:0.53702
[23]    validation_0-rmse:0.11721   validation_1-rmse:0.52632
[24]    validation_0-rmse:0.11521   validation_1-rmse:0.52671
[25]    validation_0-rmse:0.11325   validation_1-rmse:0.51527
[26]    validation_0-rmse:0.11148   validation_1-rmse:0.51392
[27]    validation_0-rmse:0.10978   validation_1-rmse:0.49357
[28]    validation_0-rmse:0.10803   validation_1-rmse:0.50030
[29]    validation_0-rmse:0.10657   validation_1-rmse:0.49821
[30]    validation_0-rmse:0.10624   validation_1-rmse:0.47754
[31]    validation_0-rmse:0.10450   validation_1-rmse:0.48614
[32]    validation_0-rmse:0.10336   validation_1-rmse:0.47555
[33]    validation_0-rmse:0.10213   validation_1-rmse:0.47663
[34]    validation_0-rmse:0.10139   validation_1-rmse:0.47462
[35]    validation_0-rmse:0.09979   validation_1-rmse:0.46085
[36]    validation_0-rmse:0.09875   validation_1-rmse:0.46658
[37]    validation_0-rmse:0.09780   validation_1-rmse:0.46026
[38]    validation_0-rmse:0.09702   validation_1-rmse:0.45724
[39]    validation_0-rmse:0.09638   validation_1-rmse:0.46206
[40]    validation_0-rmse:0.09570   validation_1-rmse:0.46017
[41]    validation_0-rmse:0.09500   validation_1-rmse:0.45447
[42]    validation_0-rmse:0.09431   validation_1-rmse:0.45097
[43]    validation_0-rmse:0.09371   validation_1-rmse:0.45112
[44]    validation_0-rmse:0.09322   validation_1-rmse:0.44389
[45]    validation_0-rmse:0.09271   validation_1-rmse:0.45073
[46]    validation_0-rmse:0.09199   validation_1-rmse:0.44402
[47]    validation_0-rmse:0.09145   validation_1-rmse:0.44305
[48]    validation_0-rmse:0.09091   validation_1-rmse:0.43982
[49]    validation_0-rmse:0.09028   validation_1-rmse:0.43441
[50]    validation_0-rmse:0.09004   validation_1-rmse:0.44175
[51]    validation_0-rmse:0.08931   validation_1-rmse:0.43299
[52]    validation_0-rmse:0.09034   validation_1-rmse:0.41695
[53]    validation_0-rmse:0.08860   validation_1-rmse:0.41444
[54]    validation_0-rmse:0.08798   validation_1-rmse:0.40965
[55]    validation_0-rmse:0.08734   validation_1-rmse:0.41013
[56]    validation_0-rmse:0.08744   validation_1-rmse:0.39615
[57]    validation_0-rmse:0.08636   validation_1-rmse:0.40437
[58]    validation_0-rmse:0.08597   validation_1-rmse:0.40617
[59]    validation_0-rmse:0.08559   validation_1-rmse:0.40638
[60]    validation_0-rmse:0.08518   validation_1-rmse:0.41139
[61]    validation_0-rmse:0.08472   validation_1-rmse:0.40855
[62]    validation_0-rmse:0.08427   validation_1-rmse:0.40601
[63]    validation_0-rmse:0.08386   validation_1-rmse:0.40446
[64]    validation_0-rmse:0.08357   validation_1-rmse:0.40676
[65]    validation_0-rmse:0.08347   validation_1-rmse:0.39509
[66]    validation_0-rmse:0.08295   validation_1-rmse:0.40182
[67]    validation_0-rmse:0.08269   validation_1-rmse:0.40343
[68]    validation_0-rmse:0.08294   validation_1-rmse:0.39187
[69]    validation_0-rmse:0.08231   validation_1-rmse:0.39857
[70]    validation_0-rmse:0.08200   validation_1-rmse:0.39805
[71]    validation_0-rmse:0.08178   validation_1-rmse:0.39975
[72]    validation_0-rmse:0.08200   validation_1-rmse:0.40522
[73]    validation_0-rmse:0.08104   validation_1-rmse:0.40048
[74]    validation_0-rmse:0.08073   validation_1-rmse:0.39871
[75]    validation_0-rmse:0.08041   validation_1-rmse:0.39395
[76]    validation_0-rmse:0.08022   validation_1-rmse:0.39725
[77]    validation_0-rmse:0.07989   validation_1-rmse:0.39610
[78]    validation_0-rmse:0.07964   validation_1-rmse:0.39375
[79]    validation_0-rmse:0.07942   validation_1-rmse:0.38979
[80]    validation_0-rmse:0.07920   validation_1-rmse:0.39015
[81]    validation_0-rmse:0.07914   validation_1-rmse:0.38749
[82]    validation_0-rmse:0.07890   validation_1-rmse:0.38585
[83]    validation_0-rmse:0.07868   validation_1-rmse:0.38665
[84]    validation_0-rmse:0.07842   validation_1-rmse:0.38147
[85]    validation_0-rmse:0.07819   validation_1-rmse:0.38246
[86]    validation_0-rmse:0.07805   validation_1-rmse:0.38351
[87]    validation_0-rmse:0.07796   validation_1-rmse:0.37884
[88]    validation_0-rmse:0.07770   validation_1-rmse:0.38242
[89]    validation_0-rmse:0.07750   validation_1-rmse:0.37763
[90]    validation_0-rmse:0.07724   validation_1-rmse:0.37871
[91]    validation_0-rmse:0.07702   validation_1-rmse:0.37974
[92]    validation_0-rmse:0.07679   validation_1-rmse:0.38147
[93]    validation_0-rmse:0.07664   validation_1-rmse:0.37735
[94]    validation_0-rmse:0.07644   validation_1-rmse:0.37873
[95]    validation_0-rmse:0.07632   validation_1-rmse:0.37661
[96]    validation_0-rmse:0.07610   validation_1-rmse:0.37877
[97]    validation_0-rmse:0.07587   validation_1-rmse:0.37659
[98]    validation_0-rmse:0.07572   validation_1-rmse:0.37648
[99]    validation_0-rmse:0.07556   validation_1-rmse:0.37356

UPDATED:

Using the public Boston housing dataset and setting nthread=1, training became reproducible, with no exploding loss, so the problem seems to lie in my dataset. Code and output are as follows:

Code:

from sklearn.datasets import load_boston
import sklearn 
from xgboost import XGBRegressor
import pandas as pd
import numpy as np

SEED = 123

X, y = load_boston(return_X_y=True)
np.random.seed(SEED)
indices = np.random.permutation(X.shape[0])
training_idx, test_idx = indices[:80], indices[80:]
train_X, test_X = X[training_idx,:], X[test_idx,:]
train_y, test_y = y[training_idx], y[test_idx]


params = {"booster":"gblinear",
          "eval_metric": "rmse",
          "predictor": "cpu_predictor",
          "max_depth": 16,
          "n_estimators":100,
          'random_state':SEED,
          'nthread':1,
          'early_stopping_rounds':5
         }

model = XGBRegressor(**params)
model.get_xgb_params()

model.fit(train_X, train_y,
        eval_set=[(train_X, train_y), (test_X, test_y)],
        eval_metric='rmse',
        verbose=True)

output:

{'objective': 'reg:squarederror',
 'base_score': None,
 'booster': 'gblinear',
 'colsample_bylevel': None,
 'colsample_bynode': None,
 'colsample_bytree': None,
 'gamma': None,
 'gpu_id': None,
 'interaction_constraints': None,
 'learning_rate': None,
 'max_delta_step': None,
 'max_depth': 16,
 'min_child_weight': None,
 'monotone_constraints': None,
 'n_jobs': None,
 'num_parallel_tree': None,
 'random_state': 123,
 'reg_alpha': None,
 'reg_lambda': None,
 'scale_pos_weight': None,
 'subsample': None,
 'tree_method': None,
 'validate_parameters': None,
 'verbosity': None,
 'eval_metric': 'rmse',
 'predictor': 'cpu_predictor',
 'nthread': 1,
 'early_stopping_rounds': 5}
Parameters: { early_stopping_rounds, max_depth, predictor } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[0] validation_0-rmse:8.38695   validation_1-rmse:8.88360
[1] validation_0-rmse:7.56356   validation_1-rmse:8.06591
[2] validation_0-rmse:7.24844   validation_1-rmse:7.71700
[3] validation_0-rmse:7.03799   validation_1-rmse:7.46547
[4] validation_0-rmse:6.86494   validation_1-rmse:7.25173
[5] validation_0-rmse:6.71517   validation_1-rmse:7.06397
[6] validation_0-rmse:6.58385   validation_1-rmse:6.89819
[7] validation_0-rmse:6.46814   validation_1-rmse:6.75184
[8] validation_0-rmse:6.36585   validation_1-rmse:6.62274
[9] validation_0-rmse:6.27512   validation_1-rmse:6.50893
[10]    validation_0-rmse:6.19437   validation_1-rmse:6.40863
[11]    validation_0-rmse:6.12223   validation_1-rmse:6.32025
[12]    validation_0-rmse:6.05754   validation_1-rmse:6.24240
[13]    validation_0-rmse:5.99930   validation_1-rmse:6.17386
[14]    validation_0-rmse:5.94666   validation_1-rmse:6.11355
[15]    validation_0-rmse:5.89892   validation_1-rmse:6.06053
[16]    validation_0-rmse:5.85546   validation_1-rmse:6.01398
[17]    validation_0-rmse:5.81576   validation_1-rmse:5.97318
[18]    validation_0-rmse:5.77938   validation_1-rmse:5.93750
[19]    validation_0-rmse:5.74595   validation_1-rmse:5.90638
[20]    validation_0-rmse:5.71514   validation_1-rmse:5.87933
[21]    validation_0-rmse:5.68669   validation_1-rmse:5.85592
[22]    validation_0-rmse:5.66035   validation_1-rmse:5.83575
[23]    validation_0-rmse:5.63591   validation_1-rmse:5.81850
[24]    validation_0-rmse:5.61321   validation_1-rmse:5.80385
[25]    validation_0-rmse:5.59208   validation_1-rmse:5.79153
[26]    validation_0-rmse:5.57239   validation_1-rmse:5.78130
[27]    validation_0-rmse:5.55401   validation_1-rmse:5.77294
[28]    validation_0-rmse:5.53685   validation_1-rmse:5.76626
[29]    validation_0-rmse:5.52081   validation_1-rmse:5.76107
[30]    validation_0-rmse:5.50579   validation_1-rmse:5.75723
[31]    validation_0-rmse:5.49174   validation_1-rmse:5.75458
[32]    validation_0-rmse:5.47856   validation_1-rmse:5.75300
[33]    validation_0-rmse:5.46621   validation_1-rmse:5.75237
[34]    validation_0-rmse:5.45463   validation_1-rmse:5.75258
[35]    validation_0-rmse:5.44376   validation_1-rmse:5.75354
[36]    validation_0-rmse:5.43355   validation_1-rmse:5.75516
[37]    validation_0-rmse:5.42396   validation_1-rmse:5.75736
[38]    validation_0-rmse:5.41496   validation_1-rmse:5.76008
[39]    validation_0-rmse:5.40649   validation_1-rmse:5.76324
[40]    validation_0-rmse:5.39853   validation_1-rmse:5.76679
[41]    validation_0-rmse:5.39104   validation_1-rmse:5.77068
[42]    validation_0-rmse:5.38399   validation_1-rmse:5.77484
[43]    validation_0-rmse:5.37735   validation_1-rmse:5.77926
[44]    validation_0-rmse:5.37111   validation_1-rmse:5.78387
[45]    validation_0-rmse:5.36522   validation_1-rmse:5.78865
[46]    validation_0-rmse:5.35967   validation_1-rmse:5.79357
[47]    validation_0-rmse:5.35443   validation_1-rmse:5.79859
[48]    validation_0-rmse:5.34949   validation_1-rmse:5.80369
[49]    validation_0-rmse:5.34483   validation_1-rmse:5.80885
[50]    validation_0-rmse:5.34042   validation_1-rmse:5.81403
[51]    validation_0-rmse:5.33626   validation_1-rmse:5.81924
[52]    validation_0-rmse:5.33231   validation_1-rmse:5.82444
[53]    validation_0-rmse:5.32859   validation_1-rmse:5.82962
[54]    validation_0-rmse:5.32505   validation_1-rmse:5.83477
[55]    validation_0-rmse:5.32170   validation_1-rmse:5.83988
[56]    validation_0-rmse:5.31853   validation_1-rmse:5.84493
[57]    validation_0-rmse:5.31551   validation_1-rmse:5.84992
[58]    validation_0-rmse:5.31265   validation_1-rmse:5.85483
[59]    validation_0-rmse:5.30992   validation_1-rmse:5.85966
[60]    validation_0-rmse:5.30732   validation_1-rmse:5.86441
[61]    validation_0-rmse:5.30485   validation_1-rmse:5.86906
[62]    validation_0-rmse:5.30249   validation_1-rmse:5.87362
[63]    validation_0-rmse:5.30024   validation_1-rmse:5.87808
[64]    validation_0-rmse:5.29808   validation_1-rmse:5.88244
[65]    validation_0-rmse:5.29602   validation_1-rmse:5.88668
[66]    validation_0-rmse:5.29404   validation_1-rmse:5.89082
[67]    validation_0-rmse:5.29215   validation_1-rmse:5.89485
[68]    validation_0-rmse:5.29033   validation_1-rmse:5.89877
[69]    validation_0-rmse:5.28858   validation_1-rmse:5.90257
[70]    validation_0-rmse:5.28690   validation_1-rmse:5.90626
[71]    validation_0-rmse:5.28527   validation_1-rmse:5.90984
[72]    validation_0-rmse:5.28371   validation_1-rmse:5.91331
[73]    validation_0-rmse:5.28219   validation_1-rmse:5.91666
[74]    validation_0-rmse:5.28073   validation_1-rmse:5.91990
[75]    validation_0-rmse:5.27931   validation_1-rmse:5.92303
[76]    validation_0-rmse:5.27794   validation_1-rmse:5.92605
[77]    validation_0-rmse:5.27661   validation_1-rmse:5.92896
[78]    validation_0-rmse:5.27531   validation_1-rmse:5.93176
[79]    validation_0-rmse:5.27405   validation_1-rmse:5.93445
[80]    validation_0-rmse:5.27282   validation_1-rmse:5.93704
[81]    validation_0-rmse:5.27163   validation_1-rmse:5.93953
[82]    validation_0-rmse:5.27046   validation_1-rmse:5.94192
[83]    validation_0-rmse:5.26932   validation_1-rmse:5.94420
[84]    validation_0-rmse:5.26820   validation_1-rmse:5.94639
[85]    validation_0-rmse:5.26711   validation_1-rmse:5.94848
[86]    validation_0-rmse:5.26604   validation_1-rmse:5.95048
[87]    validation_0-rmse:5.26499   validation_1-rmse:5.95238
[88]    validation_0-rmse:5.26396   validation_1-rmse:5.95420
[89]    validation_0-rmse:5.26294   validation_1-rmse:5.95592
[90]    validation_0-rmse:5.26195   validation_1-rmse:5.95756
[91]    validation_0-rmse:5.26097   validation_1-rmse:5.95912
[92]    validation_0-rmse:5.26000   validation_1-rmse:5.96059
[93]    validation_0-rmse:5.25905   validation_1-rmse:5.96198
[94]    validation_0-rmse:5.25811   validation_1-rmse:5.96329
[95]    validation_0-rmse:5.25718   validation_1-rmse:5.96453
[96]    validation_0-rmse:5.25627   validation_1-rmse:5.96569
[97]    validation_0-rmse:5.25537   validation_1-rmse:5.96678
[98]    validation_0-rmse:5.25447   validation_1-rmse:5.96779
[99]    validation_0-rmse:5.25359   validation_1-rmse:5.96874

Solution

  • Weird. Here's what I'd try, in order:

    1. Before you start training, call get_params()/get_xgb_params() on your XGBRegressor to make sure it actually picked up the random_state you passed in; likewise, check the verbose log to confirm training used it (see the first sketch after this list).
    2. Look at the target variable y. Is its distribution very unusual, or non-continuous? Show us a plot or histogram, or at least some summary statistics (min, max, mean, median, sd, 1st and 3rd quartiles). Is the un-stratified split hurting your training? (Show the descriptive statistics before and after the split, and on the eval set; these three sets shouldn't differ wildly.) Is it easier to model log(y), sqrt(y), exp(y) or some such? Can you debug which rows are contributing most to the CV error? (See the second sketch after this list.)
    3. Also, for full determinism, set the nthread parameter to 1 (single-core); the default is nthread=-1 (use all cores). Then rerun runs 1 and 2 and update the results in your question.
    4. Failing all that, can you make a reproducible example (MCVE)? Either tell us where your dataset is publicly available (a source URL, not a cloud link, please), or build a reproducible example on any publicly-available dataset.
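
For step 1, a minimal sketch (reusing the params dict from your snippet; get_xgb_params() is the same call your update already makes):

from xgboost import XGBRegressor

model = XGBRegressor(**params)
# sklearn-level view of every constructor argument, including random_state
print(model.get_params())
# the parameters as they are handed down to the XGBoost core
print(model.get_xgb_params())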
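
For step 2, a diagnostic sketch, assuming the variable names (dfs, df_train_y, df_cv_y, df_cv_x, model) from your first snippet:

import numpy as np

# The target's summary statistics shouldn't differ wildly between the
# full data, the training split, and the eval split.
for name, s in [("all", dfs["Y"]), ("train", df_train_y), ("cv", df_cv_y)]:
    print(name)
    print(s.describe())

# If Y is non-negative and heavy-tailed, modelling log1p(Y) often tames
# exploding losses; map predictions back with np.expm1.
train_y_log = np.log1p(df_train_y)

# After fitting, rank eval rows by squared error to see which rows
# dominate the CV error.
pred = model.predict(df_cv_x.values)
sq_err = (pred - df_cv_y.values) ** 2
print(df_cv_x.assign(sq_err=sq_err).nlargest(10, "sq_err"))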