Say I want to annotate documents, where every document can be annotated with multiple labels. In this example, I have two annotators (a and b), and they each label two documents.
from sklearn.metrics import cohen_kappa_score
annotator_a = [
    ["a", "b", "c"],
    ["d", "e"]
]
annotator_b = [
    ["b", "c"],
    ["f"]
]
Annotator_a labels document 1 with labels a, b and c. Annotator_b labels document 1 with labels b and c.
I tried to calculate annotator agreement using:
cohen_kappa_score(annotator_a, annotator_b)
But this results in an error:
ValueError: You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead.
Any ideas on how I can calculate annotator agreement on this set?
Cohen's kappa does not support multi-label input. Instead of Cohen's kappa, one could use Krippendorff's alpha. This measure supports multiple raters, missing values, and non-exclusive (multi-label) categories, and an implementation is available on PyPI as the krippendorff package.
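One way to apply it to your data, as a minimal sketch assuming the krippendorff package (pip install krippendorff) and scikit-learn are installed, is to binarize the label sets so that every (document, label) pair becomes one binary present/absent decision per annotator, and then compute alpha over those decisions:

import numpy as np
import krippendorff
from sklearn.preprocessing import MultiLabelBinarizer

annotator_a = [["a", "b", "c"], ["d", "e"]]
annotator_b = [["b", "c"], ["f"]]

# Fit the binarizer on the union of all label sets so both annotators
# share the same label columns.
mlb = MultiLabelBinarizer()
mlb.fit(annotator_a + annotator_b)

# One row per annotator; each (document, label) pair is one unit with a
# binary value (1 = label assigned, 0 = label not assigned).
reliability_data = np.array([
    mlb.transform(annotator_a).ravel(),
    mlb.transform(annotator_b).ravel(),
])

alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="nominal")
print(alpha)

If you would rather keep the labels as sets instead of binarizing them, NLTK's AnnotationTask combined with masi_distance is another option: it computes Krippendorff's alpha with a set-based distance that gives partial credit when two annotators' label sets overlap.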