I was doing this course on algorithms from MIT. In the very first lecture the professor presents the following problem:-
A peak in a 2D array is a value such that all it's 4 neighbours are less than or equal to it, ie. for
a[i][j]
to be a local maximum,
a[i+1][j] <= a[i][j]
&& a[i-1][j] <= a[i][j]
&& a[i][j+1] <= a[i][j]
&& a[i+1][j-1] <= a[i][j]
Now given an NxN 2D array, find a peak in the array.
This question can be easily solved in O(N^2)
time by iterating over all the elements and returning a peak.
However it can be optimized to be solved in O(NlogN)
time by using a divide and conquer solution as explained here.
But they have said that there exists an O(N)
time algorithm that solves this problem. Please suggest how can we solve this problem in O(N)
time.
PS(For those who know python) The course staff has explained an approach here (Problem 1-5. Peak-Finding Proof) and also provided some python code in their problem sets. But the approach explained is totally non-obvious and very hard to decipher. The python code is equally confusing. So I have copied the main part of the code below for those who know python and can tell what algorithm is being used from the code.
def algorithm4(problem, bestSeen = None, rowSplit = True, trace = None):
# if it's empty, we're done
if problem.numRow <= 0 or problem.numCol <= 0:
return None
subproblems = []
divider = []
if rowSplit:
# the recursive subproblem will involve half the number of rows
mid = problem.numRow // 2
# information about the two subproblems
(subStartR1, subNumR1) = (0, mid)
(subStartR2, subNumR2) = (mid + 1, problem.numRow - (mid + 1))
(subStartC, subNumC) = (0, problem.numCol)
subproblems.append((subStartR1, subStartC, subNumR1, subNumC))
subproblems.append((subStartR2, subStartC, subNumR2, subNumC))
# get a list of all locations in the dividing column
divider = crossProduct([mid], range(problem.numCol))
else:
# the recursive subproblem will involve half the number of columns
mid = problem.numCol // 2
# information about the two subproblems
(subStartR, subNumR) = (0, problem.numRow)
(subStartC1, subNumC1) = (0, mid)
(subStartC2, subNumC2) = (mid + 1, problem.numCol - (mid + 1))
subproblems.append((subStartR, subStartC1, subNumR, subNumC1))
subproblems.append((subStartR, subStartC2, subNumR, subNumC2))
# get a list of all locations in the dividing column
divider = crossProduct(range(problem.numRow), [mid])
# find the maximum in the dividing row or column
bestLoc = problem.getMaximum(divider, trace)
neighbor = problem.getBetterNeighbor(bestLoc, trace)
# update the best we've seen so far based on this new maximum
if bestSeen is None or problem.get(neighbor) > problem.get(bestSeen):
bestSeen = neighbor
if not trace is None: trace.setBestSeen(bestSeen)
# return when we know we've found a peak
if neighbor == bestLoc and problem.get(bestLoc) >= problem.get(bestSeen):
if not trace is None: trace.foundPeak(bestLoc)
return bestLoc
# figure out which subproblem contains the largest number we've seen so
# far, and recurse, alternating between splitting on rows and splitting
# on columns
sub = problem.getSubproblemContaining(subproblems, bestSeen)
newBest = sub.getLocationInSelf(problem, bestSeen)
if not trace is None: trace.setProblemDimensions(sub)
result = algorithm4(sub, newBest, not rowSplit, trace)
return problem.getLocationInSelf(sub, result)
#Helper Method
def crossProduct(list1, list2):
"""
Returns all pairs with one item from the first list and one item from
the second list. (Cartesian product of the two lists.)
The code is equivalent to the following list comprehension:
return [(a, b) for a in list1 for b in list2]
but for easier reading and analysis, we have included more explicit code.
"""
answer = []
for a in list1:
for b in list2:
answer.append ((a, b))
return answer
left_side + central_column
right_side + central_column
Why this works:
For cases where the maximum element is in the central column - obvious. If it's not, we can step from that maximum to increasing elements and will definitely not cross the central row, so a peak will definitely exist in the corresponding half.
Why this is O(n):
step #3 takes less than or equal to max_dimension
iterations and max_dimension
at least halves on every two algorithm steps. This gives n+n/2+n/4+...
which is O(n)
. Important detail: we split by the maximum direction. For square arrays this means that split directions will be alternating. This is a difference from the last attempt in the PDF you linked to.
A note: I'm not sure if it exactly matches the algorithm in the code you gave, it may or may not be a different approach.