stringalgorithmbioinformaticssuffix-tree

Shortest not repeatable Substring with Suffix-Tree


I need to design an efficient algorithm that finds the Shortest non-repeatable Substring in a text. In essence; the shortest string that appears only once in a text. This has to be made only with suffix-tree

Example 1:

Text: AATGCCTA then Result: G

Example 2:

Text: AAAAGGGG then Result: AG


Solution

  • The shortes non-repeatable substring is the shortest unique prefix of all the suffixes.

    See Minimum Unique Substrings and Maximum Repeats by Lucian Ilie and W. F. Smyth.