I am studying some Suffix tree implementation and here is one reference implementation, and question is how "indexes" (refer line 19) is used for class SuffixTreeNode? I am not sure if "indexes" is useful and I think probably we just need to keep all nodes and their children character value? Not find too much values of "indexes" is used for class SuffixTreeNode.
Please feel free to correct me. Any insights are appreciated.
public class SuffixTree {
SuffixTreeNode root = new SuffixTreeNode();
public SuffixTree(String s) {
for (int i = 0; i < s.length(); i++) {
String suffix = s.substring(i);
root.insertString(suffix, i);
}
}
public ArrayList<Integer> getIndexes(String s) {
return root.getIndexes(s);
}
}
public class SuffixTreeNode {
HashMap<Character, SuffixTreeNode> children = new
HashMap<Character, SuffixTreeNode>();
char value;
ArrayList<Integer> indexes = new ArrayList<Integer>();
public SuffixTreeNode() { }
public void insertString(String s, int index) {
indexes.add(index);
if (s != null && s.length() > 0) {
value = s.charAt(0);
SuffixTreeNode child = null;
if (children.containsKey(value)) {
child = children.get(value);
} else {
child = new SuffixTreeNode();
children.put(value, child);
}
String remainder = s.substring(1);
child.insertString(remainder, index);
}
}
public ArrayList<Integer> getIndexes(String s) {
if (s == null || s.length() == 0) {
return indexes;
} else {
char first = s.charAt(0);
if (children.containsKey(first)) {
String remainder = s.substring(1);
return children.get(first).getIndexes(remainder);
}
}
return null;
}
}
public class Question {
public static void main(String[] args) {
String testString = “mississippi”;
String[] stringList = {“is”, “sip”, “hi”, “sis”};
SuffixTree tree = new SuffixTree(testString);
for (String s : stringList) {
ArrayList<Integer> list = tree.getIndexes(s);
if (list != null) {
System.out.println(s + “: “ + list.toString());
}
}
}
}
indexes
is surely needed for the implementation you are looking at of Suffix Tree (there are multiple versions of suffix tree some more optimized than others). The indexes
variable plays an integral part in returning the indices where the sub-string (is, sip, hi, sis) exist in the original string (mississippi) back to the calling method. getIndexes
returns indexes
in its base case this is how you get the list of occurrences of each sub-string. see below output
is: [1, 4]
sip: [6]
sis: [3]