Tags: python, yaml, mlflow

Hi. I am very new to MLflow and want to set up an MLflow project for my own ML model. However, I am getting the error "Could not find main among entry points".


The full error message is as below:

ERROR mlflow.cli: === Could not find main among entry points [] or interpret main as a runnable script. Supported script file extensions: ['.py', '.sh'] ===

I also tried the solutions suggested here https://github.com/mlflow/mlflow/issues/1094, but the result is the same.

Below are all the files needed to run the MLflow project.

The conda.yaml file

name: lightgbm-example
channels:
  - conda-forge
dependencies:
  - python=3.6
  - pip
  - pip:
      - mlflow>=1.6.0
      - lightgbm
      - pandas
      - numpy

The MLProject file

name: lightgbm-example
conda_env: ~/Desktop/MLflow/conda.yaml
entry-points:
    main:
      parameters:
        learning_rate: {type: float, default: 0.1}
        colsample_bytree: {type: float, default: 1.0}
        subsample: {type: float, default: 1.0} 
      command: |
          python3 ~/Desktop/MLflow/Test.py \
            --learning-rate={learning_rate} \
            --colsample-bytree={colsample_bytree} \
            --subsample={subsample}

My Test.py file

import pandas as pd
import lightgbm as lgb
import numpy as np
import mlflow
import mlflow.lightgbm
import argparse
from sklearn.metrics import accuracy_score, confusion_matrix


def parse_args():
    parser = argparse.ArgumentParser(description="LightGBM example")
    parser.add_argument(
        "--learning-rate",
        type=float,
        default=0.1,
        help="learning rate to update step size at each boosting step (default: 0.1)",
    )
    parser.add_argument(
        "--colsample-bytree",
        type=float,
        default=1.0,
        help="subsample ratio of columns when constructing each tree (default: 1.0)",
    )
    parser.add_argument(
        "--subsample",
        type=float,
        default=1.0,
        help="subsample ratio of the training instances (default: 1.0)",
    )
    return parser.parse_args()

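def find_specificity(c_matrix):
    # sklearn's confusion_matrix has true labels in rows and predictions in columns,
    # so specificity = TN / (TN + FP) = c_matrix[0][0] / (c_matrix[0][0] + c_matrix[0][1])
    specificity = c_matrix[0][0]/(c_matrix[0][0]+c_matrix[0][1])
    return specificity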
    
    
def main():

    args = parse_args()

    df = pd.read_csv('~/Desktop/MLflow/Churn_demo.csv')
    train_df = df.sample(frac=0.8, random_state=25)
    test_df = df.drop(train_df.index)


        
    train_df.drop(['subscriberid'], axis = 1, inplace = True)
    test_df.drop(['subscriberid'], axis = 1, inplace = True)

    TrainX = train_df.iloc[:,:-1]
    TrainY = train_df.iloc[:,-1]

    TestX = test_df.iloc[:,:-1]
    TestY = test_df.iloc[:,-1]
    
    mlflow.lightgbm.autolog()
    
    dtrain = lgb.Dataset(TrainX, label=TrainY)
    dtest = lgb.Dataset(TestX, label=TestY)
    
    with mlflow.start_run():

        parameters = {
            'objective': 'binary',
            'device':'cpu',
            'num_threads': 6,
            'num_leaves': 127,
            'metric' : 'binary',
            'lambda_l2':5,
            'max_bin': 63,
            'bin_construct_sample_cnt' :2*1000*1000,
            'learning_rate': args.learning_rate,
            'colsample_bytree': args.colsample_bytree,
            'subsample': args.subsample,
            'verbose': 1
        }



        model = lgb.train(parameters,
                       dtrain,
                       valid_sets=dtest,
                       num_boost_round=10000,
                       early_stopping_rounds=10)
                       
               
        y_proba=model.predict(TestX)
        pred=np.where(y_proba>0.25,1,0) 
        conf_matrix = confusion_matrix(TestY,pred)
        
        specificity = find_specificity(conf_matrix)
        acc = accuracy_score(TestY,pred)
        
        # log_metric takes a single key and value; log_metrics accepts a dict
        mlflow.log_metrics({"specificity" : specificity, "accuracy" : acc})


if __name__ == "__main__":
    main()
        

Solution

  • Fortunately, I have resolved my problem. Here are some fixes for this error that may help if you run into the same problem.

    1. File names. The file names must match those suggested in the MLflow docs. For example, use conda.yaml, not conda.yamp, as there was such a problem here.
    2. The conda.yaml file does not support tabs; use spaces instead.
    3. In the MLProject file name, 'P' should be upper case before MLflow 1.4. In later versions it does not matter, as explained here.
    4. (In my case) The MLProject file is whitespace-sensitive. Let these GitHub examples guide you.
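
The checks in this list can be sketched as a small script. This is only a heuristic, not how MLflow itself parses the file: it scans lines with the standard library instead of doing a real YAML parse, and the function name check_mlproject is made up for illustration. It assumes the top-level key is spelled entry_points with an underscore, as in the MLflow documentation's examples. The file in the question spells it entry-points, which this check would flag, so that may be worth ruling out as well.

```python
import os
import re


def check_mlproject(project_dir):
    """Return a list of likely problems with the MLproject file in project_dir.

    Hypothetical helper: a line scan, not a real YAML parse.
    """
    # Match the file name case-insensitively (see item 3 above).
    names = [n for n in os.listdir(project_dir) if n.lower() == "mlproject"]
    if not names:
        return ["no MLproject file found"]

    with open(os.path.join(project_dir, names[0]), encoding="utf-8") as fh:
        lines = fh.read().splitlines()

    problems = []

    # YAML indentation must use spaces, never tabs.
    if any("\t" in line for line in lines):
        problems.append("tab characters found; indent with spaces")

    # Top-level keys are the unindented "key:" lines.
    top_keys = []
    for line in lines:
        match = re.match(r"([A-Za-z_][\w-]*)\s*:", line)
        if match:
            top_keys.append(match.group(1))

    # Assumption: MLflow looks for a key named exactly "entry_points".
    if "entry_points" not in top_keys:
        problems.append(f"no top-level 'entry_points' key; found {top_keys}")

    return problems
```

Calling check_mlproject(os.path.expanduser("~/Desktop/MLflow")) returns an empty list when none of these problems are detected. An empty list does not guarantee the project will run, since the scan does not validate nested indentation.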