javaalgorithmtime-complexity

Find the number of valid substrings


Given a string of length n, I want to calculate how many substrings are possible having below characteristics:

a) substring length is even b) there exists a character in this substring whose frequency is equal to half the length of substring.

Example s="idafddfii", output = 13

Explanation:

The valid substrings are: "id", "da", "af", "fd", "df", "fi", "dafd", "afdd", "fddf", "ddfi", "dfii", "idafdd", "dafddf"

constraints:

1 <= n <= 10 power 5

s consists of lowercase english alphabets only

public class Main {

    public static long solve(String s) {
        int n = s.length();
        long result = 0;
    
        for (int i = 0; i < n; i++) {
            int[] freq = new int[26];
            for (int j = i; j < n; j++) {
                freq[s.charAt(j) - 'a']++;
                int len = j - i + 1;
                // Only check even-length substrings
                if (len % 2 == 0) {
                    if (isValid(freq, len)) {
                        result++;
                    }
                }
            }
        }
        return result;
    }
    
    private static boolean isValid(int[] freq, int len) {
        int half = len / 2;
        for (int count : freq) {
            if (count == half) {
                return true;
            }
        }
        return false;
    }
    
    public static void main(String[] args) {
        String s1 = "aaaaid";
        String s2 = "aidfg";
        String s3 = "ababbab";
    
        System.out.println(solve(s1)); // Output: 3
        System.out.println(solve(s2)); // Output: 4
        System.out.println(solve(s3)); // Output: 8
    }

}

My code runs in time complexity of O(n^2), I want to reduce this time complexity, what is the correct approach to solve this in less time.

Based on answer provided by @Unmitigated, I tried to build cumulative frequency of each character like this, now got stuck on how to solve the problem using this cumulative frequency.

import java.util.*;
public class Main {

    public static int solve(String s) {
        int n = s.length();
        int result = 0;
        Map<Character, int[]> map = new HashMap<>();
        for(int i=0; i<n; i++) {
            char ch = s.charAt(i);
            int[] cnt = map.getOrDefault(ch, new int[n]);
            cnt[i] += i == 0 ? 1 : cnt[i-1]+1;
            map.put(ch, cnt);
        }
        for(char c : map.keySet()) {
            System.out.println(c + ":" + Arrays.toString(map.get(c)));
        } 
        // what to do next
        return result;
    }

    public static void main(String[] args) {
        String s = "idafddfii";
        int output = solve(s);
        System.out.println(output); // Output: 13
    }
}

Solution

  • While iterating over the string, keep track of the cumulative frequency of each character. At each index r, for each of the 26 possible characters, we are looking for a lower index l such that cnt[r] - cnt[l] = (r - l) / 2 (where cnt[i] denotes the cumulative frequency of the current character up to index i). Since we only want to work with integers, we rewrite this as 2 * (cnt[r] - cnt[l]) = r - l. This further rearranges to 2 * cnt[r] - r = 2 * cnt[l] - l. Therefore, we can keep track of the count of each 2 * cnt[i] - i at each index for each character in a map and look up the number of previous indexes that have the same 2 * cnt[i] - i on each iteration.

    We also have to consider strings comprising only two characters of equal frequency, which need to be subtracted out to avoid overcounting. One way to do this is to also maintain the counts of cnt[i] - i for pairs of characters at each index.

    The time complexity of this solution is O(26n) = O(n).

    Sample implementation:

    @SuppressWarnings("unchecked")
    public static long solve(String s) {
        Map<Integer, Integer>[] lookup = new Map[26];
        Arrays.setAll(lookup, i -> new HashMap<>());
        record Pair(int a, int b) {}
        Map<Pair, Integer>[][] pairLookup = new Map[26][26];
        for (var row : pairLookup) Arrays.setAll(row, i -> new HashMap<>());
        var freq = new int[26];
        long result = 0;
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i) - 'a';
            for (int c2 = 0; c2 < 26; c2++) {
                lookup[c2].merge(2 * freq[c2] - i, 1, Integer::sum);
                if (c != c2) {
                    int minc = Math.min(c, c2), maxc = Math.max(c, c2);
                    pairLookup[minc][maxc].merge(new Pair(2 * freq[minc] - i, 2 * freq[maxc] - i), 1, Integer::sum);
                }
            }
            ++freq[c];
            for (int c2 = 0; c2 < 26; c2++) {
                result += lookup[c2].getOrDefault(2 * freq[c2] - i - 1, 0);
                if (c != c2) {
                    int minc = Math.min(c, c2), maxc = Math.max(c, c2);
                    result -= pairLookup[minc][maxc]
                            .getOrDefault(new Pair(2 * freq[minc] - i - 1, 2 * freq[maxc] - i - 1), 0);
                }
            }
        }
        return result;
    }