python machine-learning nlp nltk

NLTK agreement with distance metric


I need to calculate inter-annotator agreement for a multi-label classification task, where each example can be assigned more than one label. I found that NLTK can measure agreement based on a distance metric.

I am looking for an example of calculating Krippendorff's alpha with MASI distance.

This is what I have.

import nltk
from nltk.metrics import masi_distance


toy_data = [['1', 5723, [1,2]],['2', 5723, [2,3]]]

task = nltk.metrics.agreement.AnnotationTask(data=toy_data, distance=masi_distance)
print(task.alpha())

This code fails with

TypeError: unhashable type: 'list'

The following doesn't work either:

toy_data = [['1', 5723, set([1,2])],['2', 5723, set([2,3])]]

Do you have a working example? Thank you!


Solution

  • To be more precise, only the third member of each triple, that is, the labels assigned to the item, needs to be a frozenset (as @alexis has pointed out). Lists and plain sets are mutable and therefore unhashable, which is what triggers the TypeError.

    toy_data = [['1', 5723, frozenset([1,2])],['2', 5723, frozenset([2,3])]]
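For completeness, here is a minimal end-to-end sketch of the frozenset fix. The coder names, item IDs, and label sets are made up for illustration, and it assumes an NLTK version where `AnnotationTask` accepts `(coder, item, labels)` triples and a `distance` callable, as in the question. The resulting alpha depends on the data and the NLTK version, so no expected value is shown.

```python
from nltk.metrics.agreement import AnnotationTask
from nltk.metrics.distance import masi_distance

# Hypothetical annotations as (coder, item, labels) triples.
# Each coder may assign several labels to the same item.
raw_annotations = [
    ("coder_1", "item_1", [1, 2]),
    ("coder_2", "item_1", [2, 3]),
    ("coder_1", "item_2", [1]),
    ("coder_2", "item_2", [1]),
    ("coder_1", "item_3", [2, 3]),
    ("coder_2", "item_3", [3]),
]

# Convert every label collection to a frozenset so that it is hashable;
# passing lists or plain sets here raises the TypeError from the question.
data = [
    (coder, item, frozenset(labels))
    for coder, item, labels in raw_annotations
]

# MASI compares two label sets, so it is passed as the distance function.
task = AnnotationTask(data=data, distance=masi_distance)
alpha = task.alpha()
print("Krippendorff's alpha (MASI):", alpha)
```

If your annotations already live in lists or sets, the list comprehension above is the only conversion step needed; the coder and item fields can stay as strings or integers.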