In a previous question, I asked about optimizing the decision threshold of a prediction model. The solution led me to the z3py
library.
I am now trying a similar setup, but this time I want to optimize the decision threshold of a binary prediction model to maximize accuracy.
However, I found that optimizing the threshold results in worse performance than the default threshold of 0.5 (which the optimizer could also have chosen).
My MWE is below (it uses fixed-seed random targets and probabilities to reproduce my findings):
import numpy as np
from z3 import z3


def compute_eval_metrics(ground_truth, predictions):
    from sklearn.metrics import accuracy_score, f1_score

    accuracy = accuracy_score(ground_truth, predictions)
    macro_f1 = f1_score(ground_truth, predictions, average="macro")
    return accuracy, macro_f1


def optimization_acc_target(
    predictions: np.ndarray,
    ground_truth: np.ndarray,
    default_threshold=0.5,
):
    # Baseline metrics at the default threshold
    tp = np.sum((predictions > default_threshold) & (ground_truth == 1))
    tn = np.sum((predictions <= default_threshold) & (ground_truth == 0))
    initial_accuracy = (tp + tn) / len(ground_truth)
    print(f"Accuracy: {initial_accuracy:.3f}")
    _, initial_macro_f1_score = compute_eval_metrics(
        ground_truth, np.where(predictions > default_threshold, 1, 0)
    )

    n = len(ground_truth)
    iRange = range(n)
    threshold = z3.Real("threshold")
    opt = z3.Optimize()

    predictions = predictions.tolist()
    ground_truth = ground_truth.tolist()

    # Count correct positives/negatives as z3 sums over the samples
    true_positives = z3.Sum(
        [
            z3.If(predictions[i] > threshold, 1, 0)
            for i in iRange
            if ground_truth[i] == 1
        ]
    )
    true_negatives = z3.Sum(
        [
            z3.If(predictions[i] <= threshold, 1, 0)
            for i in iRange
            if ground_truth[i] == 0
        ]
    )
    acc = z3.Sum(true_positives, true_negatives) / n

    # Add constraints
    opt.add(threshold >= 0.0)
    opt.add(threshold <= 1.0)
    # Maximize accuracy
    opt.maximize(acc)

    if opt.check() == z3.sat:
        m = opt.model()
        t = m[threshold].as_decimal(10)
        # as_decimal() appends "?" when the decimal expansion is truncated
        if isinstance(t, str):
            t = t.rstrip("?")
        t = float(t)
        print(f"Optimal threshold: {t}")
        optimized_accuracy, optimized_macro_f1_score = compute_eval_metrics(
            ground_truth, np.where(np.array(predictions) > t, 1, 0)
        )
        print(f"Accuracy: {optimized_accuracy:.3f} (was: {initial_accuracy:.3f})")
        print(
            f"Macro F1 Score: {optimized_macro_f1_score:.3f} (was: {initial_macro_f1_score:.3f})"
        )
        print()
    else:
        print("Failed to optimize")


np.random.seed(42)
ground_truth = np.random.randint(0, 2, size=50)
predictions = np.random.rand(50)

optimization_acc_target(
    predictions=predictions,
    ground_truth=ground_truth,
)
In my code, I use the true positive and true negative counts to compute the accuracy.
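As a quick sanity check (a small sketch I added for illustration, using the same seeded data as the MWE), the manual (TP + TN) / N computation agrees with sklearn's accuracy_score at the default threshold:

import numpy as np
from sklearn.metrics import accuracy_score

np.random.seed(42)
ground_truth = np.random.randint(0, 2, size=50)
predictions = np.random.rand(50)

labels = np.where(predictions > 0.5, 1, 0)
tp = np.sum((labels == 1) & (ground_truth == 1))
tn = np.sum((labels == 0) & (ground_truth == 0))

print((tp + tn) / len(ground_truth))         # 0.6, matching the "Accuracy: 0.600" below
print(accuracy_score(ground_truth, labels))  # 0.6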
The output is:
Accuracy: 0.600
Optimal threshold: 0.9868869366
Accuracy: 0.480 (was: 0.600)
Macro F1 Score: 0.355 (was: 0.599)
It always returns a worse solution than the default threshold of 0.5. I am puzzled as to why this could be the case. Should it not perform at least as well as the default solution?
To solve this, I tried using constructs from z3py (e.g. z3.If inside the z3.Sum parts), thinking that maybe mixed data types lead to wrong results. But this turned out not to make a difference (which is good, as it aligns with an official example). I also found this GitHub issue, but that seems to relate to a case with non-linear constraints, which I am not using.
I am now wondering: what causes the results with the optimized threshold to be worse than those with the default threshold? I would appreciate pointers to further resources and background information.
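As a sanity check independent of z3 (a brute-force sketch added for illustration, not part of the original code), sweeping all candidate thresholds shows that an exact optimizer should reach at least the accuracy of the default threshold:

import numpy as np

np.random.seed(42)
ground_truth = np.random.randint(0, 2, size=50)
predictions = np.random.rand(50)

best_acc, best_t = -1.0, None
# every distinct labelling is produced by some predicted probability
# (plus 0.0 for the "all positive" labelling)
for t in np.concatenate(([0.0], np.sort(predictions))):
    acc = np.mean((predictions > t) == ground_truth)
    if acc > best_acc:
        best_acc, best_t = acc, t

print(best_t, best_acc)  # accuracy at least as high as the 0.600 of the default threshold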
I found the solution, and it was simple, I am afraid: in the posted code, I used integer division on this line:
acc = z3.Sum(true_positives, true_negatives) / n
Following more checking, I found another SO question here; this brought me to the offending line given above.
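The reason this is so harmful: true_positives + true_negatives is an integer-sorted z3 expression, so dividing it by the Python int n yields z3's integer division. Because the number of correct predictions is always smaller than n for this data, the objective evaluates to 0 for every threshold, and the optimizer is free to return an arbitrary value such as 0.9868869366. A minimal sketch of the difference (count is a stand-in I introduced for true_positives + true_negatives):

import z3

count = z3.Int("count")           # stands in for true_positives + true_negatives
n = 50

int_acc = count / n               # Int / int stays an integer-sorted expression
real_acc = z3.ToReal(count) / n   # real-valued division: the intended accuracy

print(int_acc.sort(), real_acc.sort())  # Int Real

s = z3.Solver()
s.add(count == 30)                # e.g. 30 correct predictions out of 50
s.check()
m = s.model()
print(m.eval(int_acc))            # 0   -> the objective is flat, any threshold "maximizes" it
print(m.eval(real_acc))           # 3/5 -> the intended accuracy of 0.6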
What worked in the end was:
# This works:
# acc = z3.ToReal(true_positives + true_negatives) / n
# Alternatively, maximize only the TP and TN count (gives the same result):
acc = true_positives + true_negatives
This gives the following, now expected, output (both accuracy and macro F1 improve):
Optimal threshold: 0.3886772896
Accuracy: 0.620 (was: 0.600)
Macro F1 Score: 0.616 (was: 0.599)
For the record and future searchers: the following attempts do not work (the first still applies the integer division before ToReal converts anything; the second fails because z3.Real expects a variable name rather than an expression, so it does not act as a cast):
# No: acc = z3.ToReal((true_positives + true_negatives) / n)
# No: acc = z3.Real(true_positives + true_negatives) / n