# Time Complexity of Backtracking solution - Leetcode 473. Matchsticks to Square

Tags: algorithm, time-complexity, big-o, complexity-theory, backtracking

The following is a Leetcode problem: 473. Matchsticks to Square (https://leetcode.com/problems/matchsticks-to-square/description/)

## Problem Statement

You have an array, *matchsticks*, of size *n*, where *matchsticks[i]* is the length of the *i*-th matchstick. You want to use all the matchsticks, without breaking any, to form a square. Return *true* if you can, and *false* otherwise.

Example 1:

Input: *matchsticks* = [1,1,2,2,2]
Output: *true*
Explanation: You can form a square with side length 2; one side of the square is made from the two sticks of length 1.

Example 2:

Input: *matchsticks* = [3,3,3,3,4]
Output: *false*
Explanation: You cannot find a way to form a square with all the matchsticks.

Constraints:

1 <= *n* <= 15
1 <= *matchsticks[i]* <= 10^8

I have seen O(4^n) and O(n*2^n) solutions to this problem. **I have solved the problem differently, through a solution that I haven't seen anywhere else, and I am not sure what its time complexity is.** I will explain the solution with the example *matchsticks* = [4, 4, 4, 3, 1] -> *sideLength* = 4. I solved it in C++; the code is at the end:

My solution: For each side, iterate through the *matchsticks* array, and for each matchstick explore both decisions: add it to the current side, or don't. Once a side is complete, move on to the next side, restarting the iteration over the *matchsticks* array from the beginning (i = 0). Since there are two decisions per element, I was initially tempted to say that the time complexity is O(2^n). But *n* bounds only the height of the individual subtrees, not the height of the entire recursion tree, since the index resets every time a side is completed. I am a bit confused here. How does one go about determining the time complexity of such a backtracking solution?

Code:

```
#include <vector>

class Solution {
public:
    bool makesquare(const std::vector<int>& matchsticks) {
        int sum = 0;
        for(int m : matchsticks)
            sum += m;
        if(sum % 4 != 0)
            return false;
        int sideLength = sum / 4;
        std::vector<bool> available(matchsticks.size(), true);
        return dfs(matchsticks, sideLength, 0, 0, available, 4);
    }

    bool dfs(const std::vector<int>& matchsticks, int sideLength, int currSideLength,
             std::size_t i, std::vector<bool>& available, int numRemainingSides) {
        if(currSideLength == sideLength) {
            currSideLength = 0;
            numRemainingSides--;
            i = 0; // restart choice of matches
            if(numRemainingSides == 0)
                return true;
        }
        if(i == matchsticks.size())
            return false;
        if(available[i]) {
            // take the current matchstick for the current side
            available[i] = false;
            if(dfs(matchsticks, sideLength, currSideLength + matchsticks[i], i + 1, available, numRemainingSides))
                return true;
            // do not take the current matchstick for the current side
            available[i] = true;
        }
        if(dfs(matchsticks, sideLength, currSideLength, i + 1, available, numRemainingSides))
            return true;
        return false;
    }
};
```
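
For reference, here is a minimal driver for the class above (my addition, not part of the original question) that checks it against the two examples:

```
#include <iostream>

int main() {
    Solution s;
    std::cout << std::boolalpha
              << s.makesquare({1, 1, 2, 2, 2}) << "\n"   // true:  square with side 2
              << s.makesquare({3, 3, 3, 3, 4}) << "\n";  // false: no square possible
    return 0;
}
```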

## Solution

This is a complicated algorithm to analyze, because each time you see an element you make a binary choice, but you can see the same element potentially up to 3 times, which gives a trivial upper bound of `O(8^n)`. However, if you think about it, you'll see that we've really divided up the question of making 4 possible choices over 3 passes. If we didn't evaluate after the first or second passes, but only at the end of the third, we'd wind up with `4^n` combinations, which would take `O(4^n)` work.

The question therefore becomes: how much do we gain by not continuing on when we fail to make our first or second sides work out?

We can gain some intuition from the central limit theorem. Suppose that `n` is a multiple of 4. How many ways are there of taking a set of `n` things and dividing it into two piles of equal size? The answer is `O(2^n / sqrt(n))`. And then how many ways are there of dividing the first pile again into two piles of equal size? The answer is `O(2^(n/2) / sqrt(n))`.
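
To make those counts concrete: the number of equal splits is a central binomial coefficient, whose standard asymptotic follows from Stirling's approximation (this step is implicit in the answer; the LaTeX below is my addition):

```
\binom{n}{n/2} \sim \frac{2^n}{\sqrt{\pi n/2}} = O\!\left(\frac{2^n}{\sqrt{n}}\right),
\qquad
\binom{n/2}{n/4} \sim \frac{2^{n/2}}{\sqrt{\pi n/4}} = O\!\left(\frac{2^{n/2}}{\sqrt{n}}\right)
```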

So if we went by counts of sticks rather than their sizes, we'd expect there to be `O(2^(3/2 n) / n)` ways to complete the first two passes, each of which then requires `O(2^(n/2))` work for the third pass. That results in `O(4^n / n)` work.

I believe, but cannot prove, that this is the worst case. However, I can produce a family of adversarial inputs for which solving your problem looks very much like that simpler counting problem, and demonstrate that you can't do better than this back-of-the-envelope estimate.

My example is the following family of adversarial inputs:

```
[3, 4, 4, 5, 14, 15, 17, 18]
[3, 4, 4, 4, 4, 4, 4, 5, 30, 31, 33, 34]
[3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 46, 47, 49, 50]
```

And in general the `k`th will contain a `3`, then `4` repeated `4k-2` times, then a `5`, and then `16k-2, 16k-1, 16k+1, 16k+2`.

This sums to `80k`. You will be attempting to make sides of length `20k`. This can be done with the `3`, the `16k+1`, and any `k-1` of the `4` elements. It can also be done with the `5`, the `16k-1`, and any `k-1` of the `4` elements. It cannot be done in any other way, because as soon as you add the `16k-2` or `16k+2` elements you wind up with something with a remainder of 2 when you divide by 4. There is no way to get rid of that remainder other than to put the other one in, but then you get a side of length `32k`, which is too long.

`n` is, of course, `4k+4`, because we have `4k-2` `4`'s, a `3`, a `5`, and then 4 large elements.
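
Before counting precisely, here is an empirical sketch (my addition, not part of the original answer): it generates the `k`th adversarial input with a helper I've named `makeAdversarial`, runs a call-counting copy of the question's dfs on it, and prints the call count next to `4^n / n` so the growth rates can be compared:

```
#include <cmath>
#include <cstdint>
#include <iostream>
#include <vector>

// Build the k-th adversarial input: a 3, then (4k-2) fours, then a 5,
// then the four large elements 16k-2, 16k-1, 16k+1, 16k+2. Sums to 80k.
std::vector<int> makeAdversarial(int k) {
    std::vector<int> v{3};
    v.insert(v.end(), 4 * k - 2, 4);
    v.push_back(5);
    for (int d : {-2, -1, 1, 2})
        v.push_back(16 * k + d);
    return v;
}

// The dfs from the question, instrumented with a call counter.
std::uint64_t calls = 0;
bool dfs(const std::vector<int>& m, int side, int curr, std::size_t i,
         std::vector<bool>& avail, int remaining) {
    ++calls;
    if (curr == side) {
        curr = 0;
        --remaining;
        i = 0;                       // restart choice of matches
        if (remaining == 0) return true;
    }
    if (i == m.size()) return false;
    if (avail[i]) {
        avail[i] = false;            // take matchstick i for the current side
        if (dfs(m, side, curr + m[i], i + 1, avail, remaining)) return true;
        avail[i] = true;             // or leave it for a later side
    }
    return dfs(m, side, curr, i + 1, avail, remaining);
}

int main() {
    for (int k = 1; k <= 3; ++k) {
        std::vector<int> m = makeAdversarial(k);
        int n = static_cast<int>(m.size());       // n = 4k + 4
        std::vector<bool> avail(m.size(), true);
        calls = 0;
        bool ok = dfs(m, 20 * k, 0, 0, avail, 4); // side length is 20k
        std::cout << "k=" << k << " n=" << n << " formsSquare=" << ok
                  << " calls=" << calls
                  << " 4^n/n=" << std::llround(std::pow(4.0, n) / n) << "\n";
    }
    return 0;
}
```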

What happens when we put these adversarial inputs into your algorithm?

The first pass takes `O(2^n)`, as expected. (I will use `k` where I have to, and `n` whenever it is convenient.) We really don't hit `20k` unless we've decided to include one of the final 4 elements.

Exactly `2 (4k-2 choose k-1)` times, we finish the first pass perfectly and try the second pass. Each time we do, we have 3/4 of the elements left, so each second pass takes `O(2^(3/4 n))`. For each completed first pass, the second pass succeeds `(3k-1 choose k-1)` times.

For each of those successful first-then-second passes, we do a third pass. Each third pass takes `O(2^(n/2))`.
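
These per-pass counts are easy to tabulate exactly. A small sketch (my addition) printing `2*(4k-2 choose k-1)` completed first passes and `(3k-1 choose k-1)` second-pass successes per completed first pass:

```
#include <cstdint>
#include <iostream>

// Exact binomial coefficient via the multiplicative formula;
// each intermediate division is exact, and the values stay small here.
std::uint64_t choose(int n, int r) {
    std::uint64_t result = 1;
    for (int i = 1; i <= r; ++i)
        result = result * (n - r + i) / i;
    return result;
}

int main() {
    for (int k = 1; k <= 6; ++k) {
        std::uint64_t firstPasses = 2 * choose(4 * k - 2, k - 1);
        std::uint64_t secondPerFirst = choose(3 * k - 1, k - 1);
        std::cout << "k=" << k
                  << " first-pass completions=" << firstPasses
                  << " second-pass successes each=" << secondPerFirst
                  << " total=" << firstPasses * secondPerFirst << "\n";
    }
    return 0;
}
```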

The giant question is how many of those `choose`'s we get. Because that is a large number, I'm going to switch to logarithms.

```
log((first pass #) * (second pass #))
= log((4k-2 choose k-1) * (3k-1 choose k-1))
= log((4k-2)! / (k-1)! / (3k-1)!)
+ log((3k-1)! / (k-1)! / (2k)!)
= log((4k-2)!) - log((k-1)!) - log((3k-1)!)
+ log((3k-1)!) - log((k-1)!) - log((2k)!)
= log((4k-2)!) - 2 log((k-1)!) - log((2k)!)
```

Now we can pull out Stirling's Approximation.

```
log(m!) = m log(m) - m + 1/2 log(2πm) + O(1/m)
```

And so

```
log((first pass #) * (second pass #))
= log((4k-2)!) - 2 log((k-1)!) - log((2k)!)
= (4k-2) log(4k-2) - (4k-2) + 1/2 log(2π (4k-2))
- 2( (k-1) log(k-1) - (k-1) + 1/2 log(2π (k-1)) )
- ( (2k) log(2k) - 2k + 1/2 log(2π (2k)) )
+ O(1/k)
```

Now let's let `O(1)` errors pass. So `m log(m + O(1))` can be replaced with `m log(m)`, `(m + O(1))` can be replaced with `m`, and `1/2 log(O(1) m)` can be replaced with `1/2 log(m)`...

```
= (4k-2) log(4k) - 4k + 1/2 log(k)
- 2 ( (k-1) log(k) - k + 1/2 log(k) )
- ( (2k) log(2k) - 2k + 1/2 log(k) )
+ O(1)
= (4k-2) log(4k) - 4k + 1/2 log(k)
- (2k-2) log(k) + 2k - log(k)
- (2k) log(2k) + 2k - 1/2 log(k)
+ O(1)
= (4k-2) (2 log(2) + log(k))
- (2k-2) log(k)
- (2k) (log(2) + log(k))
- log(k) + O(1)
= 8k log(2) + 4k log(k) - 2 log(k)
- 2k log(k) + 2 log(k)
- 2k log(2) - 2k log(k)
- log(k) + O(1)
```

And now gather like terms. The `k log(2)` terms have coefficient `8 - 2 = 6`. The `k log(k)` terms have `4 - 2 - 2 = 0`. And the `log(k)` terms have `-2 + 2 - 1 = -1`. So we get

```
= 6k log(2) - log(k) + O(1)
= log(2^(6k) / k) + O(1)
```

So, undoing the logs, the number of combinations of first and second passes is `O(2^(6k) / k)`. Remember that `n = 4k + 4`, and so `O(2^(6k) / k) = O(2^(3/2 n) / n)`. Then for each of those we do another `O(2^(n/2))` of work on the third pass, resulting in a total of `O(2^(2n) / n) = O(4^n / n)` work.
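
And as a quick numeric check of the Stirling calculation (my addition): the base-2 logarithm of the exact count `2*(4k-2 choose k-1)*(3k-1 choose k-1)` should differ from `6k - log2(k)` by only an `O(1)` amount as `k` grows:

```
#include <cmath>
#include <iostream>

// log2 of the binomial coefficient, via lgamma (log m! = lgamma(m + 1)).
double log2Choose(int n, int r) {
    return (std::lgamma(n + 1.0) - std::lgamma(r + 1.0) - std::lgamma(n - r + 1.0))
           / std::log(2.0);
}

int main() {
    for (int k = 1; k <= 64; k *= 2) {
        double exact = 1.0 + log2Choose(4 * k - 2, k - 1)   // the leading 2 contributes 1 bit
                           + log2Choose(3 * k - 1, k - 1);
        double approx = 6.0 * k - std::log2(k);             // the 6k log(2) - log(k) estimate
        std::cout << "k=" << k << " log2(exact)=" << exact
                  << " estimate=" << approx
                  << " diff=" << (exact - approx) << "\n";
    }
    return 0;
}
```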

I'll happily give rep to anyone who can turn my heuristic into a proof that the worst case can't be worse than that.
