I'm new to this, but I'd like to plot a ROC curve for a small dataset of active compounds versus decoys. I based my code on this link: ROC curve for binary classification in python. The small dataset is the result of a virtual screening that ranked and scored compounds whose activity or inactivity is known from experimental data (IC50).
I'm not sure whether the plot and the AUC are correct. I noticed that even when there was only a one-value difference between the test (true) and predicted values, the AUC was only 0.5; for the true and predicted values in the code below, it was only around 0.49. Perhaps the model is not identifying the compounds properly. However, it did identify the first ten compounds in the rank correctly, along with some in other positions. Maybe it identifies active compounds better than inactive ones, or maybe that is just because there are more active compounds to consider. Also, would it be better to use another classification scheme for the tested and predicted values, rather than a binary one? For example, ranking the IC50 values from best to worst and comparing that with the virtual screening rank, creating a score for the true and predicted results based on how similar the two ranks are for each compound.
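Something like this is what I have in mind for the rank comparison (the IC50 and score values here are made up just to illustrate the idea):
from scipy.stats import spearmanr
ic50_nm = [12, 35, 8, 450, 90, 1500, 22, 300]  # lower IC50 = more active
screening_score = [9.1, 7.5, 9.8, 4.2, 5.5, 3.1, 8.7, 6.0]  # higher score = better hit
# Spearman's rho compares the two rankings directly; since a low IC50 should
# correspond to a high score, a good screening run gives rho close to -1.
rho, p_value = spearmanr(ic50_nm, screening_score)
print(rho, p_value)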
I also thought about plotting a precision-recall curve, given the imbalance between the number of active compounds and the number of decoys.
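Roughly, I imagine the precision-recall curve would be built like this (labels and scores made up just to illustrate; like the ROC curve, it needs a continuous score per compound rather than a hard 0/1 prediction):
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, average_precision_score
labels = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
scores = [0.95, 0.80, 0.75, 0.70, 0.60, 0.55, 0.50, 0.40, 0.30, 0.20]
precision, recall, _ = precision_recall_curve(labels, scores)
ap = average_precision_score(labels, scores)  # single-number summary of the PR curve
plt.figure()
plt.step(recall, precision, where='post')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-recall curve (AP = %.2f)' % ap)
plt.show()
My actual code is below: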
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc, roc_auc_score
test = [1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,0,1,1,1]
pred = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0]
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(2):
    fpr[i], tpr[i], _ = roc_curve(test, pred)
    roc_auc[i] = auc(fpr[i], tpr[i])
print(roc_auc_score(test, pred))
plt.figure()
plt.plot(fpr[1], tpr[1])
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic')
plt.show()
The code required to plot the ROC curve is very similar to yours, but simpler. There is no need to store fpr and tpr as dictionaries; they are arrays. I think the problem is that your predictions are hard True/False labels rather than probabilities that the roc_curve function can use to generate the threshold values. I changed the pred values to probabilities (> 0.5 is True, < 0.5 is False) and the curve now looks closer to what you probably expect. Also, only 66% of the predictions are correct, which keeps the curve relatively close to the 'no-discrimination' line (a random event with 50% probability).
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc, roc_auc_score
test = [1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,0,1,1,1]
pred = [0.91,0.87,0.9,0.75,0.85,0.97,0.99,0.98,0.66,0.97,0.98,0.57,0.89,0.62,0.93,0.97,0.55,0.99,0.11,0.84,0.45,0.35,0.3,0.39]
fpr, tpr, _ = roc_curve(test, pred)
roc_auc = auc(fpr, tpr)
print(roc_auc_score(test, pred))
plt.figure()
plt.plot(fpr, tpr)
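# Dashed diagonal = the 'no-discrimination' line (random classifier).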
plt.plot([0.0, 1.0], [0.0, 1.0], ls='--', lw=0.3, c='k')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic')
plt.show()
Now the AUC value is 0.5842105263157894.
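If you want to double-check that 66% figure and see whether it is the actives or the decoys that are being misclassified, something along these lines (using your original hard 0/1 predictions) would show it:
from sklearn.metrics import accuracy_score, confusion_matrix
test = [1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,0,1,1,1]
pred = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0]
print(accuracy_score(test, pred))  # fraction of correct predictions, about 0.67 here
# Rows are the true classes (0 = decoy, 1 = active), columns the predicted classes.
print(confusion_matrix(test, pred))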