
Get all evaluation metrics after classification in pyspark


I have trained a model and want to calculate several important metrics: accuracy, precision, recall, and F1 score.

The process I followed is:

from pyspark.ml.classification import LogisticRegression

lr = LogisticRegression(featuresCol='features',labelCol='label')
lrModel = lr.fit(train)
lrPredictions = lrModel.transform(test)

from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.evaluation import BinaryClassificationEvaluator

eval_accuracy = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction", metricName="accuracy")
eval_precision = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction", metricName="precision")
eval_recall = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction", metricName="recall")
eval_f1 = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction", metricName="f1Measure")

eval_auc = BinaryClassificationEvaluator(labelCol="label", rawPredictionCol="prediction")

accuracy = eval_accuracy.evaluate(lrPredictions)
precision = eval_precision.evaluate(lrPredictions)
recall = eval_recall.evaluate(lrPredictions)
f1score = eval_f1.evaluate(lrPredictions)

auc = eval_auc.evaluate(lrPredictions)

However, only accuracy and AUC are computed; the precision, recall, and F1 evaluators fail. What should I change here?


Solution

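  • According to the docs, the metricName values of MulticlassClassificationEvaluator for the F1 measure, precision, and recall are, respectively:

    metricName="f1"
    metricName="precisionByLabel"
    metricName="recallByLabel"

    Note that precisionByLabel and recallByLabel (available since Spark 3.0) report the metric for a single class, selected with the metricLabel parameter (default 0.0). For averages weighted over all classes, use metricName="weightedPrecision" and metricName="weightedRecall" instead.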