aho-corasick

How should duplicate keywords be handled when using Aho-Corasick?


When building the tree, how do you handle duplicates in the list of keywords? Should you ensure the list is free from duplicates before tree building?


Solution

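  • I think you should avoid it and remove duplicates before building. Depending on the implementation, duplicate keywords can end up with their own states or with repeated output entries, which makes the automaton bigger and slower to search.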
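
As a rough illustration of the advice above, here is a minimal sketch in Python that removes duplicates before building. It is not taken from any particular library: the names `dedupe`, `build_automaton`, and `search` are hypothetical, and the automaton is a plain textbook-style trie with failure links. In this particular sketch a duplicate keyword would not add states, but it would be stored twice in the output list and reported twice per match, so it is still worth filtering out.

```python
from collections import deque


def dedupe(keywords):
    """Remove duplicate keywords while preserving first-seen order."""
    return list(dict.fromkeys(keywords))


def build_automaton(keywords):
    """Build a small Aho-Corasick automaton (hypothetical helper, not a library API)."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link for each state
    out = [[]]    # keywords recognized on reaching each state

    # Insert each distinct keyword into the trie.
    for word in dedupe(keywords):
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)

    # Breadth-first pass: set failure links and inherit outputs from them.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Return (start_index, keyword) for every match in text."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists or we hit the root.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            matches.append((i - len(word) + 1, word))
    return matches


keywords = ["he", "she", "he", "hers"]   # "he" appears twice on purpose
automaton = build_automaton(keywords)
matches = search("ushers", automaton)
```

With the duplicate filtered out, each occurrence of "he" in the text is reported once. If the `dedupe` call were dropped, this sketch would report it twice per occurrence.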