pythonperformancedata-structurescontainerslinear-algebra

Fast(est) way to process an expanding linear sequence in Python


I have the following conditions:

Ex: u = [1, 3, 4, 7, 9, 10, 13, 15, 19, 21, 22, 27, ...]

The program wants me to give the value of a member at a given index.

I've already found ways to solve this using insort and I hacked together a minimal binary search tree, as well. Unfortunately, this process needs to be faster than what I have and I'm at a loss as to what to do next. I thought the BST would do it, but it is not fast enough.

Here's my BST code:

class BSTNode:
    def __init__(self, val=None):
        self.left = None
        self.right = None
        self.val = val

    def insert(self, val):
        if not self.val:
            self.val = val
            return

        if self.val == val:
            return

        if val < self.val:
            if self.left:
                self.left.insert(val)
                return
            self.left = BSTNode(val)
            return

        if self.right:
            self.right.insert(val)
            return
        self.right = BSTNode(val)

    def inorder(self, vals):
        if self.left is not None:
            self.left.inorder(vals)
        if self.val is not None:
            vals.append(self.val)
        if self.right is not None:
            self.right.inorder(vals)
        return vals

and here's my function:

from sys import setrecursionlimit

def twotimeslinear(n):
    #setrecursionlimit(2000)
    i = 0
    u = [1]
    ended = False

    bst = BSTNode()
    bst.insert(1)
    while i < n and not ended:
        
        for j in range(2, 4):
            k = 1
            cur = j * bst.inorder([])[i] + 1
            bst.insert(cur)
            
            if len(u) == n:
                ended = True
                break
        i+= 1
    return bst.inorder([])[n]

I just need directions as to what I could do to make the process faster. I can solve the problem if I only knew what I was missing. I'm probably overlooking some data structure that would work better, but I don't even know what to look for. Thank you for any help.


Solution

  • Generating and merging the ys and zs:

    from heapq import merge
    from itertools import groupby
    
    def twotimeslinear(n):
        u = [1]
        ys = (2*x+1 for x in u)
        zs = (3*x+1 for x in u)
        for x, _ in groupby(merge(ys, zs)):
            u.append(x)
            if n < len(u):
                return u[n]
    
    print(*map(twotimeslinear, range(20)))
    

    Attempt This Online!

    Takes ~0.05 seconds for your limit index 60,000 and ~0.7 seconds for index 1,000,000.

    Alternative basic implementation:

    def twotimeslinear(n):
        u = [1]
        i = j = 0
        while n >= len(u):
            y = 2*u[i] + 1
            z = 3*u[j] + 1
            m = min(y, z)
            u.append(m)
            if y == m:
                i += 1
            if z == m:
                j += 1
        return u[n]
    
    print(*map(twotimeslinear, range(20)))
    

    Attempt This Online!

    And another version, throwing tons of batteries at it :-)

    from heapq import merge
    from itertools import groupby, tee, chain, islice, repeat
    from operator import itemgetter, mul, add
    
    def twotimeslinear(n):
        parts = [[1]]
        whole = chain.from_iterable(parts)
        output, feedback1, feedback2 = tee(whole, 3)
        ys = map(1 .__add__, map(2 .__mul__, feedback1))
        zs = map(add, map(mul, repeat(3), feedback2), repeat(1))  # different way just for fun
        merged = map(itemgetter(0), groupby(merge(ys, zs)))
        parts.append(merged)
        return next(islice(output, n, None))
    
    print(*map(twotimeslinear, range(20)))
    

    After setting it up with Python code, this runs entirely in C code. It's a silly hobby of mine.

    Attempt This Online!