python jupyter-notebook xgbclassifier

OSError: exception: access violation reading 0x0000000000000008 with XGBOOST classifier


I initially trained my model using an XGBoost classifier and everything worked fine. Now I am trying to train the model on the same data set with the same classifier, but I am running into this error: OSError: exception: access violation reading 0x0000000000000008.

This time around, I am using sklearn's bootstrapping method to randomly sample from the dataset. I first split the data set into a train set and a test set. Then I randomly sampled from the train and test sets to create 50 samples each for training and testing respectively.

The error is raised at the .fit() line.

Kindly direct me on how I can fix this error, please.

I tried running the model outside the for loop and everything works fine, but when I use the bootstrap method I get the error again.

# Read each file and do analysis
for i in range(50):

    # read train and test data
    train_data = pd.read_csv(train_path + "\\" + "train" + str(i) + ".csv")
    test_data = pd.read_csv(test_path + "\\" + "test" + str(i) + ".csv")

    # Convert gender to binary
    train_data['gender'] = train_data['gender'].map({1:1, 2:0})
    test_data['gender'] = test_data['gender'].map({1:1, 2:0})

    # Apply StandardScaler to numerical columns: fit on the train set only,
    # then reuse the fitted scaler on the test set
    num_cols = ['age', 'RXDCOUNT', 'income', 'RXDDAYS', 'ALQ130', 'OCD270', 'BMXBMI', 'BMXHT', 'BMXWT']
    sc = StandardScaler()
    train_data[num_cols] = sc.fit_transform(train_data[num_cols])
    test_data[num_cols] = sc.transform(test_data[num_cols])

    # Create X_train, X_test, y_train, y_test
    y_train = train_data["depression"]
    y_test = test_data["depression"]
    X_train = train_data.drop("depression", axis=1, inplace=True)
    X_test = test_data.drop("depression", axis=1, inplace=True)
    #print(y_train)

    # Create model
    model = XGBClassifier(use_label_encoder=False)

    # Fit model with train data
    _= model.fit(X_train, y_train)

    # Predict on test set
    y_pred = model.predict(X_test)

    # Get accuracy of model
    acc = model.score(X_test, y_test)

    # get balanced accuracy
    balAcc = balanced_accuracy_score(y_test, y_pred)

    # roc_auc
    roc_auc = roc_auc_score(y_true=y_test,y_score=model.predict_proba(X_test)[:,1])

    # add y_pred to test set
    predict_dataframe = prediction_dataframe(test_data, y_pred)

    # define protected attributes.
    p_attr1 = "gender"
    p_attr2 = "ethnicity"

    # compute TP, FP, TN, FN based on single protected attributes
    tp, fp, tn, fn = compute_metrics_s(predict_dataframe, p_attr1)

    # compute TPR based on single protected attributes
    tpr_male = list(tp.values())[0] / np.add(list(tp.values())[0], list(fn.values())[0])
    tpr_female = list(tp.values())[1] / np.add(list(tp.values())[1], list(fn.values())[1])

    EOD = np.subtract(tpr_male, tpr_female)

    dic_data["roc_auc"].append(roc_auc)
    dic_data["bacc"].append(balAcc)
    dic_data["EOD"].append(EOD)
    dic_data["tpr_male"].append(tpr_male)
    dic_data["tpr_female"].append(tpr_female)

    # Save the results once the last sample (i == 49) has been processed
    if i == 49:
        df = pd.DataFrame.from_dict(dic_data)
        df.to_csv(r"results\dataframe\suppression\gender.csv", index=True)

Solution

  • The issue was that my X_train and X_test were both None. DataFrame.drop with inplace=True modifies the DataFrame in place and returns None, so the following lines assigned None to both variables:

    X_train = train_data.drop("depression", axis=1, inplace=True)
    X_test = test_data.drop("depression", axis=1, inplace=True)

    Changing them to:

    X_train = train_data.drop("depression", axis=1)
    X_test = test_data.drop("depression", axis=1)

    solved the problem.
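
The behaviour behind this fix can be reproduced without XGBoost. The sketch below is a minimal illustration, not the original pipeline: the toy DataFrame and the check_features helper are hypothetical and only show what drop returns with and without inplace=True, and one way to fail early with a readable message before the data reaches .fit().

```python
import pandas as pd

# Hypothetical toy data; only the "depression" column name comes from the question.
train_data = pd.DataFrame({
    "age": [25, 40, 31],
    "gender": [1, 0, 1],
    "depression": [0, 1, 0],
})

# Take the target while the column still exists.
y_train = train_data["depression"]

# With inplace=True, drop mutates the DataFrame and returns None.
# A copy is used here so train_data stays intact for the comparison below.
mutated = train_data.copy()
result = mutated.drop("depression", axis=1, inplace=True)
print(result)                 # None
print(list(mutated.columns))  # the column is gone from the mutated copy

# Without inplace, drop returns a new DataFrame and leaves the original untouched.
X_train = train_data.drop("depression", axis=1)
print(list(X_train.columns))


def check_features(X, y):
    """Hypothetical guard: raise a clear error before the data reaches .fit()."""
    if X is None:
        raise ValueError("X is None; was it assigned from a call using inplace=True?")
    if len(X) != len(y):
        raise ValueError(f"X has {len(X)} rows but y has {len(y)} labels")
    return X


check_features(X_train, y_train)  # passes silently for valid input
```

Calling a guard like check_features(X_train, y_train) just before model.fit(X_train, y_train) turns a None feature matrix into an ordinary Python exception that names the likely cause.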