labeling snorkel

What if my Snorkel labeling function has very low coverage over a development set?


I am trying to label a span recognition dataset using Snorkel and am currently improving my labeling functions (LFs). One of the LFs has rather low coverage because it only labels a subclass of one of the entity spans. What impact would low-coverage labeling functions have on the final downstream span recognition model?


Solution

  • Even if the labeling function has low coverage, it might have high empirical accuracy over the class it is labeling. According to this video on "Best Practices for Improving Your Labeling Functions" from Snorkel co-founder Paroma Varma, Snorkel LFs that have low coverage but good empirical accuracy should not be discarded.
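To make the two quantities concrete, here is a minimal pure-Python sketch of how coverage and empirical accuracy could be computed for a single LF on a labeled development set. The helper `coverage_and_accuracy` and the toy data are hypothetical and only for illustration; the one thing borrowed from Snorkel is its convention of using -1 for an abstain. In practice, Snorkel's `LFAnalysis(L, lfs).lf_summary(Y=Y_dev)` should report the same quantities in its "Coverage" and "Emp. Acc." columns.

```python
ABSTAIN = -1  # Snorkel's convention for "this LF did not label the example"


def coverage_and_accuracy(lf_votes, gold):
    """Return (coverage, empirical accuracy) for one LF on a dev set.

    coverage: fraction of examples where the LF did not abstain.
    accuracy: fraction of the LF's non-abstain votes that match the gold
              label, or None if the LF abstained everywhere.
    """
    if len(lf_votes) != len(gold):
        raise ValueError("lf_votes and gold must have the same length")
    labeled = [(v, y) for v, y in zip(lf_votes, gold) if v != ABSTAIN]
    coverage = len(labeled) / len(lf_votes) if lf_votes else 0.0
    accuracy = sum(v == y for v, y in labeled) / len(labeled) if labeled else None
    return coverage, accuracy


# Hypothetical dev set of 10 spans (1 = entity subclass, 0 = other).
gold = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
# The LF fires on only two spans and abstains on the rest.
lf_votes = [1, -1, -1, -1, -1, -1, -1, -1, 1, -1]

coverage, accuracy = coverage_and_accuracy(lf_votes, gold)
print(f"coverage={coverage:.2f}, empirical accuracy={accuracy:.2f}")
```

In this toy case the LF labels only 20% of the spans, but every label it emits is correct, which is the low-coverage, high-accuracy situation the answer says is worth keeping.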