Computing all infix products for a monoid / semigroup

Introduction: Infix products for a group

Suppose I have a group

G = (G, *)

and a list of elements

A = {0, 1, ..., n} ⊂ ℕ
x : A -> G

If our goal is to implement a function

f : A × A -> G

such that

f(i, j) = x(i) * x(i+1) * ... * x(j)

(and we don't care about what happens if i > j)

then we can do that by pre-computing a table of prefixes

m(-1) = 1
m(i) = m(i-1) * x(i)

(with 1 on the right-hand side denoting the unit of G) and then implementing f as

f(i, j) = m(i-1)⁻¹ * m(j)

This works because

m(i-1) = x(0) * x(1) * ... * x(i-1)
m(j) = x(0) * x(1) * ... * x(i-1) * x(i) * x(i+1) * ... * x(j)

and so

m(i-1)⁻¹ * m(j) = x(i) * x(i+1) * ... * x(j)

after sufficient reassociation.
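As a minimal sketch of the prefix-table idea, here is one way it could look in Python. The choice of group (permutations stored as tuples) and the names compose, inverse, build_prefixes and infix_product are my own assumptions for illustration. The table is shifted by one so that prefixes[0] plays the role of m(-1).

```python
from typing import List, Tuple

Perm = Tuple[int, ...]  # p[k] is the image of k under the permutation p


def compose(p: Perm, q: Perm) -> Perm:
    """Group operation p * q: apply p first, then q."""
    return tuple(q[k] for k in p)


def inverse(p: Perm) -> Perm:
    """Group inverse: inverse(p) * p is the identity permutation."""
    inv = [0] * len(p)
    for k, image in enumerate(p):
        inv[image] = k
    return tuple(inv)


def build_prefixes(xs: List[Perm], identity: Perm) -> List[Perm]:
    """prefixes[i + 1] holds m(i); prefixes[0] holds m(-1), the unit."""
    prefixes = [identity]
    for x in xs:
        prefixes.append(compose(prefixes[-1], x))
    return prefixes


def infix_product(prefixes: List[Perm], i: int, j: int) -> Perm:
    """f(i, j) = m(i-1)^-1 * m(j), assuming 0 <= i <= j."""
    return compose(inverse(prefixes[i]), prefixes[j + 1])
```

A non-commutative group is used on purpose: the inverse has to be multiplied on the left, exactly as in the formula for f(i, j).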

My question

Can we rescue this idea, or do something not much worse, if G is only a monoid, not a group?

For my particular problem, can we do something similar if G = ([0, 1] ⊂ ℝ, *), i.e. the elements are real numbers from the unit interval under multiplication, and we can't divide by 0?

Solution

Yes, if G is ([0, 1] ⊂ ℝ, *), then the idea can be rescued, making it possible to compute ranged products in O(log n) time (or more accurately, O(log z) where z is the number of a in A with x(a) = 0).

For each i, compute the product m(i) = x(0)*x(1)*...*x(i), ignoring any zeros (so these products will always be non-zero). Also, build a sorted array Z of indices for all the zero elements.

Then the product of elements from i to j is 0 if there's a zero in the range [i, j], and m(j) / m(i-1) otherwise (taking m(-1) = 1).

To find out whether there's a zero in the range [i, j], one can binary search Z for the smallest index >= i and check whether it is <= j. This is where the extra O(log n) time cost appears.
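The steps above can be sketched in Python roughly as follows. The function names preprocess and range_product are hypothetical, and the prefix table is shifted by one so that prefixes[0] stands for m(-1) = 1.

```python
from bisect import bisect_left
from typing import List, Tuple


def preprocess(xs: List[float]) -> Tuple[List[float], List[int]]:
    """Return (prefixes, zeros).

    prefixes[i + 1] is the product of the non-zero elements among
    x(0)..x(i), and prefixes[0] is 1.  zeros is the sorted list Z of
    indices whose element is 0.
    """
    prefixes = [1.0]
    zeros = []
    for idx, x in enumerate(xs):
        if x == 0:
            zeros.append(idx)
            prefixes.append(prefixes[-1])  # skip the zero
        else:
            prefixes.append(prefixes[-1] * x)
    return prefixes, zeros


def range_product(prefixes: List[float], zeros: List[int], i: int, j: int) -> float:
    """Product x(i) * ... * x(j), assuming 0 <= i <= j."""
    pos = bisect_left(zeros, i)  # first zero index >= i, if any
    if pos < len(zeros) and zeros[pos] <= j:
        return 0.0
    return prefixes[j + 1] / prefixes[i]
```

One caveat with floating point: for long arrays of values below 1 the prefix products can underflow to 0.0, at which point the division fails. Storing sums of logarithms instead of products is one possible workaround, at the cost of some rounding error.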

General monoid solution

In the case where G is any monoid, it's possible to precompute fewer than n products so that an arbitrary range product can be computed in O(log(j-i+1)) time, although it's a bit fiddlier than the more specific case above.

Rather than precomputing prefix products, compute m(i, j) (the product of x(i) through x(j), inclusive) for all i, j where j-i+1 = 2^k for some k>=0, and 2^k divides both i and j+1. In fact, for k=0 we don't need to compute anything, since the value of m(i, i) is simply x(i).

So we need to compute n/2 + n/4 + n/8 + ... products in total (rounding each term down), which is at most n-1 products.
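A possible way to build this table in Python is sketched below. The name build_blocks and the list-of-levels layout are assumptions of mine. Level k stores the aligned blocks of length 2^k, each obtained from two blocks of the level below, so every stored block costs exactly one application of the operation.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def build_blocks(xs: List[T], op: Callable[[T, T], T]) -> List[List[T]]:
    """levels[k][b] is the product of x(b * 2^k) .. x((b + 1) * 2^k - 1).

    levels[0] is a copy of the input (no products needed).  Only complete
    blocks are stored, so leftovers at the end of the array are ignored.
    """
    levels = [list(xs)]
    while len(levels[-1]) >= 2:
        prev = levels[-1]
        # Combine neighbouring pairs, keeping left-to-right order.
        levels.append([op(prev[2 * b], prev[2 * b + 1])
                       for b in range(len(prev) // 2)])
    return levels
```

The number of products computed is the total length of levels[1:]. For an array of 10 elements that is 5 + 2 + 1 = 8.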

One can construct an arbitrary interval [i, j] from at most O(log(j-i+1)) of these building blocks (and elements of the original array): pick the largest building block contained in the interval and append blocks of decreasing size on either side of it until you get to [i, j]. Then multiply the precomputed products m(a, b) of the building blocks together, in left-to-right order.

For example, suppose your array is of size 10. For example's sake, I'll assume the monoid is addition of natural numbers.

i: 0  1  2  3  4  5  6  7  8  9
x: 1  3  2  4  2  3  0  8  2  1

2: ----  ----  ----  ----  ----
   4     6     5     8     3

4: ----------- ----------
   10          13

8: ----------------------
   23

Here, the 2, 4, and 8 rows show sums of aligned intervals of length 2, 4, 8 (ignoring bits left over if the array isn't a power of 2 in length).

Now, suppose we want to calculate x(1) + x(2) + x(3) + ... + x(8).

That's x(1) + m(2, 3) + m(4, 7) + x(8) = 3 + 6 + 13 + 2 = 24.
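Putting the pieces together, a query could be sketched in Python as below. The names build_levels and monoid_range_product are hypothetical. Instead of starting from the largest block in the middle, this version walks from left to right and at each position takes the largest aligned block that still fits, which I believe yields the same set of blocks.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def build_levels(xs: List[T], op: Callable[[T, T], T]) -> List[List[T]]:
    """levels[k][b] is the product of the aligned block of length 2^k
    starting at index b * 2^k; levels[0] is the input itself."""
    levels = [list(xs)]
    while len(levels[-1]) >= 2:
        prev = levels[-1]
        levels.append([op(prev[2 * b], prev[2 * b + 1])
                       for b in range(len(prev) // 2)])
    return levels


def monoid_range_product(levels: List[List[T]], op: Callable[[T, T], T],
                         i: int, j: int) -> T:
    """Product x(i) * ... * x(j), inclusive, assuming 0 <= i <= j < n."""
    result = None
    pos = i
    while pos <= j:
        # Largest k such that a block of length 2^k fits into [pos, j] ...
        k = (j - pos + 1).bit_length() - 1
        # ... and such that 2^k divides pos (any k works when pos == 0).
        if pos:
            k = min(k, (pos & -pos).bit_length() - 1)
        block = levels[k][pos >> k]
        result = block if result is None else op(result, block)
        pos += 1 << k
    return result
```

With the example array and addition as the operation, the query for i = 1, j = 8 visits the blocks x(1), m(2, 3), m(4, 7), x(8), matching the sum above. Blocks are combined strictly left to right, so the operation does not need to be commutative.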