
GLRM in H2O - Performance Metrics return NaN


I'm using a generalized low-rank estimator to impute missing values in a data set of sensor readings. I'm using H2O to create and train the model:

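from h2o.estimators import H2OGeneralizedLowRankEstimator

glrm = H2OGeneralizedLowRankEstimator(k=10,
                                      loss="quadratic",
                                      gamma_x=0.5,
                                      gamma_y=0.5,
                                      max_iterations=2000,
                                      recover_svd=True,
                                      init="SVD",
                                      transform="standardize")
glrm.train(training_frame=train)  # train: H2OFrame holding the sensor readings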

After the model is trained, both performance metrics (MSE and RMSE) are reported as NaN. Does anybody know why? At first I thought it was caused by the NaN entries in the data set, but I have tried a complete data set and the same problem occurs. I need these metrics to run a grid search over some of the model parameters and select the best model.

Thank you very much,

Luísa Nogueira


Solution

  • Below is the example from the H2O documentation. Getting NaN for MSE with GLRM is expected, so it would arguably be better to exclude it from the output. Instead, check whether you get the Sum of Squared Error (Numeric), or use the value of the loss function (the objective) you defined, here "quadratic".

    import h2o
    from h2o.estimators import H2OGeneralizedLowRankEstimator
    h2o.init()
    
    # Import the USArrests dataset into H2O:
    arrestsH2O = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/USArrests.csv")
    
    # Split the dataset into a train and valid set:
    train, valid = arrestsH2O.split_frame(ratios=[.8], seed=1234)
    
    # Build and train the model:
    glrm_model = H2OGeneralizedLowRankEstimator(k=4,
                                                loss="quadratic",
                                                gamma_x=0.5,
                                                gamma_y=0.5,
                                                max_iterations=700,
                                                recover_svd=True,
                                                init="SVD",
                                                transform="standardize")
    glrm_model.train(training_frame=train)
    

    This returns NaN for both MSE and RMSE:

    Model Details
    =============
    H2OGeneralizedLowRankEstimator :  Generalized Low Rank Modeling
    Model Key:  GLRM_model_python_1617769810268_1

    Model Summary:
        number_of_iterations    final_step_size    final_objective_value
    0   58.0                    0.00005            8.250804e-31

    ModelMetricsGLRM: glrm
    ** Reported on train data. **

    MSE: NaN
    RMSE: NaN
    Sum of Squared Error (Numeric): 1.9833472629189004e-13
    Misclassification Error (Categorical): 0.0
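Because the suggestion is to rank models by the Sum of Squared Error or by the final objective instead of MSE, here is a minimal NumPy sketch of what those two quantities measure and how a grid search could use them. In the H2O output above they correspond to "Sum of Squared Error (Numeric)" and `final_objective_value`. The sketch does not call H2O. It assumes a quadratic loss with quadratic (L2) regularizers, and the helpers `fit_quadratic_glrm`, `sse` and `objective`, the alternating least squares solver and the synthetic data are all hypothetical stand-ins, not H2O's implementation. Candidates are ranked by squared error on a random 10% of entries hidden during fitting. Training error tends to shrink as `k` grows, so ranking on it alone would tend to favor the largest rank.

```python
import numpy as np


def fit_quadratic_glrm(A, mask, k, gamma_x, gamma_y, iters=100, seed=0):
    """Hypothetical helper: fit A ~= X @ Y using only entries where mask is True.

    Alternating ridge least squares for the assumed objective
        sum over observed (i, j) of (A[i, j] - (X @ Y)[i, j])**2
        + gamma_x * ||X||_F**2 + gamma_y * ||Y||_F**2
    This is a stand-in solver, not H2O's GLRM algorithm.
    """
    rng = np.random.default_rng(seed)
    n, m = A.shape
    X = rng.normal(size=(n, k))
    Y = rng.normal(size=(k, m))
    eye = np.eye(k)
    jitter = 1e-9 * eye  # keeps each k x k system solvable if a gamma is 0
    for _ in range(iters):
        for i in range(n):  # row update: ridge regression on row i's observed columns
            obs = mask[i]
            Yo = Y[:, obs]
            X[i] = np.linalg.solve(Yo @ Yo.T + gamma_x * eye + jitter, Yo @ A[i, obs])
        for j in range(m):  # column update: ridge regression on column j's observed rows
            obs = mask[:, j]
            Xo = X[obs]
            Y[:, j] = np.linalg.solve(Xo.T @ Xo + gamma_y * eye + jitter, Xo.T @ A[obs, j])
    return X, Y


def sse(A, X, Y, mask):
    """Sum of squared error over the entries selected by mask."""
    return float(np.sum((A - X @ Y)[mask] ** 2))


def objective(A, X, Y, mask, gamma_x, gamma_y):
    """Quadratic loss on the fitted entries plus the two L2 penalties."""
    return sse(A, X, Y, mask) + gamma_x * float(np.sum(X ** 2)) + gamma_y * float(np.sum(Y ** 2))


# Synthetic stand-in for the sensor data: a rank-2 signal plus small noise.
rng = np.random.default_rng(42)
A = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 8)) + 0.01 * rng.normal(size=(60, 8))
holdout = rng.random(A.shape) < 0.1  # hide about 10% of the entries from fitting
train_mask = ~holdout

results = {}
for k in (1, 2, 3):
    for gamma in (0.01, 0.5):
        X, Y = fit_quadratic_glrm(A, train_mask, k, gamma, gamma)
        results[(k, gamma)] = {
            "objective": objective(A, X, Y, train_mask, gamma, gamma),
            "train_sse": sse(A, X, Y, train_mask),
            "holdout_sse": sse(A, X, Y, holdout),
        }

# Rank candidates by squared error on the hidden entries, not on the fitted ones.
best = min(results, key=lambda p: results[p]["holdout_sse"])
print("best (k, gamma):", best, results[best])
```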