I have a string that consists of characters A,B,C and D and I am trying to calculate the length of the longest substring that has an equal amount of each one of these characters in any order. For example ABCDB would return 4, ABCC 0 and ADDBCCBA 8.
My code currently:
public int longestSubstring(String word) {
HashMap<Integer, String> map = new HashMap<Integer, String>();
for (int i = 0; i<word.length()-3; i++) {
map.put(i, word.substring(i, i+4));
}
StringBuilder sb;
int longest = 0;
for (int i = 0; i<map.size(); i++) {
sb = new StringBuilder();
sb.append(map.get(i));
int a = 4;
while (i<map.size()-a) {
sb.append(map.get(i+a));
a+= 4;
}
String substring = sb.toString();
if (equalAmountOfCharacters(substring)) {
int length = substring.length();
if (length > longest)
longest = length;
}
}
return longest;
}
This currently works pretty well if the string length is 10^4 but I'm trying to make it 10^5. Any tips or suggestions would be appreciated.
Let's assume that cnt(c, i)
is the number of occurrences of the character c
in the prefix of length i
.
A substring (low, high]
has an equal amount of two characters a
and b
iff cnt(a, high) - cnt(a, low) = cnt(b, high) - cnt(b, low)
, or, put it another way, cnt(b, high) - cnt(a, high) = cnt(b, low) - cnt(a, low)
. Thus, each position is described by a value of cnt(b, i) - cnt(a, i)
. Now we can generalize it for more that two characters: each position is described by a tuple (cnt(a_2, i) - cnt(a_1, i), ..., cnt(a_k, i) - cnt(a_1, i))
, where a_1 ... a_k
is the alphabet.
We can iterate over the given string and maintain the current tuple. At each step, we should update the answer by checking the value of i - first_occurrence(current_tuple)
, where first_occurrence
is a hash table that stores the first occurrence of each tuple seen so far. Do not forget to put a tuple of zeros to the hash map before iteration(it corresponds to an empty prefix).