Tags: algorithm, recursion, lcs, subsequence

Understanding the time complexity of the Longest Common Subsequence Algorithm


I do not understand the O(2^n) complexity that the recursive function for the Longest Common Subsequence algorithm has.

Usually I can tie this notation to the number of basic operations (in this case, comparisons) that the algorithm performs, but this time it doesn't add up in my mind.

For example, take two strings, both of length 5. In the worst case the recursive function performs 251 comparisons, and 2^5 = 32 is not even close to that value.

Can anyone explain the algorithmic complexity of this function?

nComp = 0   # global counter of character comparisons

def lcs(xstr, ystr):
    global nComp
    if not xstr or not ystr:
        return ""
    # Split each string into its first character and the remainder.
    x, xs, y, ys = xstr[0], xstr[1:], ystr[0], ystr[1:]
    nComp += 1
    #print("comparing", x, "with", y)
    if x == y:
        return x + lcs(xs, ys)
    else:
        return max(lcs(xstr, ys), lcs(xs, ystr), key=len)
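
For reference, this is how I measured the 251 comparisons (using two strings with no characters in common, which I assume forces the worst case):

nComp = 0
print(lcs("ABCDE", "FGHIJ"))   # no common characters, so it returns ""
print(nComp)                   # prints 251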

Solution

  • To understand it properly, look at the diagram carefully and follow the recursive calls top-down while reading the graph.

    Here, xstr = "ABCD"
          ystr = "BAEC"
    
                                        lcs("ABCD", "BAEC")       // Here x != y 
                                      /                     \  
                    lcs("BCD", "BAEC")   <--  x==y   -->    lcs("ABCD", "AEC")  x==y
                              |                                        |
                              |                                        |
                      lcs("CD", "AEC")   <--  x!=y   -->     lcs("BCD", "EC")
                     /                \                     /                \
                    /                  \                   /                  \
                   /                    \                 /                    \
          lcs("D","AEC")                  lcs("CD", "EC")              lcs("BCD", "C")
        /                \              /               \              /        \       
    lcs("", "AEC")        lcs("D","EC")                  lcs("CD", "C")        lcs("BCD","")
      |        \         /              \                       |             /       |
    Return     lcs("", "EC")    lcs("D" ,"C")            lcs("D", "")   lcs("CD","")  Return
               /         \       /         \             /        \       /        \ 
            Return      lcs("","C")    lcs("D","") lcs("","")  Return  lcs("D","")  Return
                         /     \         /     \      /                 /      \
                      Return   lcs("","")       Return            lcs("", "")  Return
                                     |                                  |
                                  Return                            Return
    

    NOTE: The proper way to represent recursive calls is usually a tree, but here I used a graph to compress that tree: each shared call is drawn only once, so you can follow the whole recursion in one go. (It was also easier for me to draw this way.)


    Every call on two non-empty strings performs one comparison and, when the first characters differ, spawns two further calls; each call shrinks the combined length n + m by at least 1. Letting N = n + m, the recursion tree has depth at most N, so the time complexity of the algorithm is O(2^N), which is not efficient for larger values of N. Note that for the question's example the relevant bound is 2^(5+5) = 1024, not 2^5 = 32, and the observed 251 comparisons fits below it.
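
    If you want the exact worst-case count rather than the bound, note that every mismatching call performs one comparison and recurses both ways, so the count C(n, m) satisfies C(n, m) = 1 + C(n, m-1) + C(n-1, m), with C(0, m) = C(n, 0) = 0. A quick sketch to evaluate it (my own helper, not part of the algorithm):

        from functools import lru_cache

        @lru_cache(maxsize=None)
        def worst_case_comparisons(n, m):
            # An empty string ends the recursion with no comparison.
            if n == 0 or m == 0:
                return 0
            # One comparison here, then both branches recurse on shorter inputs.
            return 1 + worst_case_comparisons(n, m - 1) + worst_case_comparisons(n - 1, m)

        print(worst_case_comparisons(5, 5))   # 251
        print(2 ** (5 + 5))                   # 1024, the O(2^N) bound for N = 10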

    Therefore, we prefer the dynamic-programming approach over the recursive approach. It reduces the time complexity to O(n·m), which is O(n^2) when n == m.
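
    For illustration, here is a standard bottom-up formulation (a sketch, not the original poster's code): it fills an (n+1) x (m+1) table of LCS lengths in O(n·m) time and then walks back through the table to recover one LCS.

        def lcs_dp(xstr, ystr):
            n, m = len(xstr), len(ystr)
            # L[i][j] = length of the LCS of xstr[:i] and ystr[:j]
            L = [[0] * (m + 1) for _ in range(n + 1)]
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    if xstr[i - 1] == ystr[j - 1]:
                        L[i][j] = L[i - 1][j - 1] + 1
                    else:
                        L[i][j] = max(L[i - 1][j], L[i][j - 1])
            # Walk back through the table to recover one LCS string.
            out = []
            i, j = n, m
            while i > 0 and j > 0:
                if xstr[i - 1] == ystr[j - 1]:
                    out.append(xstr[i - 1])
                    i, j = i - 1, j - 1
                elif L[i - 1][j] >= L[i][j - 1]:
                    i -= 1
                else:
                    j -= 1
            return "".join(reversed(out))

        print(lcs_dp("ABCD", "BAEC"))   # prints "AC", an LCS of length 2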

    Even now, if you are having a hard time following the logic, I would suggest drawing a tree-like representation (not the graph I have shown here) for xstr = "ABC" and ystr = "EF". I hope that will make it clear.
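
    If drawing it by hand gets tedious, a small instrumented variant of the function (my own sketch) can print the tree for you, indenting each call by its depth in the recursion:

        def lcs_trace(xstr, ystr, depth=0):
            # Print every call, indented by its depth in the recursion tree.
            print("  " * depth + 'lcs("%s", "%s")' % (xstr, ystr))
            if not xstr or not ystr:
                return ""
            if xstr[0] == ystr[0]:
                return xstr[0] + lcs_trace(xstr[1:], ystr[1:], depth + 1)
            return max(lcs_trace(xstr, ystr[1:], depth + 1),
                       lcs_trace(xstr[1:], ystr, depth + 1), key=len)

        lcs_trace("ABC", "EF")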

    Any doubts or comments are most welcome.