python recursion crash limit grasshopper

Large data set crashes python script


I have a simple script using the GhPython component for Rhino/Grasshopper. The goal is to assign hourly weather data (only some hours were recorded) to hours. If there was no measurement for an hour, it should return 0. It should work like this (example with similar values):

hoursList = [hr1,hr2,hr3,hr4,hr5,hr6]
measuredList = [hr2,hr3,hr6]
recordList = [wData1,wData2,wData3]
finalList = []    

def assignData(i,y):        
    for i < len(leadList):            
        if hoursList[i] == measuredList[y]:                
            finalList.append(recordList[y])                
            i += 1
            y += 1                
        else:                
            finalList.append(0)                
            i += 1    
        assignData(i,y)    

i = 0
y = 0    
assignData(i,y)

which should return

[0,wData1,wData2,0,0,wData3]

The resulting finalList for this case (line breaks added to help readability)

[0, 'wData1', 'wData2', 0, 0, 'wData3', 'wData3',
 0, 'wData3', 'wData3', 0, 0, 'wData3', 'wData3',
 0, 'wData3', 'wData3', 'wData2', 0, 0, 'wData3', 'wData3',
 0, 'wData3', 'wData3', 0, 0, 'wData3', 'wData3',
 0, 'wData3', 'wData3', 'wData1', 'wData2', 0, 0, 'wData3', 'wData3',
 0, 'wData3', 'wData3', 0, 0, 'wData3', 'wData3',
 0, 'wData3', 'wData3', 'wData2', 0, 0, 'wData3', 'wData3',
 0, 'wData3', 'wData3', 0, 0, 'wData3', 'wData3',
 0, 'wData3', 'wData3']

When I try to run this code on a large data list (approx. 43000 values), it crashes after about 7000 iterations. I checked sys.getrecursionlimit() and it's 2147483647. Any ideas how to get this to work?


Solution

  • ANALYSIS

    I gather that len(leadList) is the 43000 figure you give. I'll call this limit.

    Note how your loop works: for every value of i in the range of "input i" to limit, this processes one item, increments i (and perhaps y), and recurs. Thus, your top-level call at i=0 will spawn a call to assignData(1, 0) (assuming failure) and wait for that to finish. Then it will go back to the top of the loop, work with i=1, and continue ... eventually spawning limit recursive calls in succession.

    That initial call will now work the range (1, limit), spawning limit-1 calls, the first of which will spawn limit-2 calls, and so on. Each level will spawn another level with a large fan-out.

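    In short, you're spawning far more calls than I think you realize; the total doubles with every element you add. For a list of n hours, the function is entered 2^n times and appends 2^n - 1 items. Your own six-hour example shows it: the finalList you posted has 63 entries instead of 6.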

    I suspect your problem is that finalList simply outgrows available memory, as each of these calls appends one element.
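
    To make the growth concrete, here is a small sketch that replays only the control flow of the posted function and counts how often it is entered and how often it appends. It assumes the loop is really a while loop (the posted "for i < ..." is not valid Python), and count_growth is a made-up name for illustration only.

```python
def count_growth(n):
    """Replay the question's loop-plus-recursion on n hours.

    Returns (number of calls, number of appends). No list is built,
    so this stays cheap for small n.
    """
    counts = {"calls": 0, "appends": 0}

    def assign(i):
        counts["calls"] += 1
        while i < n:                 # assumed 'while', see the note above
            counts["appends"] += 1   # the original appends once per pass
            i += 1
            assign(i)                # recursive call inside the loop

    assign(0)
    return counts["calls"], counts["appends"]


six_hours = count_growth(6)   # (64, 63)
ten_hours = count_growth(10)  # (1024, 1023)
```

    Six hours already give 64 calls and 63 appends, which matches the 63 entries in the posted output. Do not try this with n anywhere near 43000.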

    INVESTIGATION

    Insert a basic debugging statement into your code:

    def assignData(i, y):
        print "ENTER", i, y, finalList
        for i < len(leadList):
            ...
    

    So you can see the progression of calls.

    REPAIR

    I doubt that you need this doubly-nested recursion stack. In fact, I don't see that recursion buys you anything. It looks as if you need only to walk through the lists once, finding the times that actually correspond, filling in 0's otherwise. Get rid of the call, use the for properly to control the value of i, and pare down the code as appropriate.

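    def assignData():
        y = 0  # index of the next measurement not yet matched
        for i in range(len(hoursList)):
            # Check y first: once every measurement is used, the rest get 0.
            if y < len(measuredList) and hoursList[i] == measuredList[y]:
                finalList.append(recordList[y])
                y += 1
            else:
                finalList.append(0)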
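
    For trying this outside Grasshopper, the same single pass can be sketched as a function that takes the lists as arguments and returns a new list. The name assign_data and the missing parameter are illustrative, not from the question, and the sketch assumes measured is in the same order as hours.

```python
def assign_data(hours, measured, records, missing=0):
    """Single pass over hours; measured must follow the same order as hours."""
    result = []
    y = 0  # position of the next measurement still waiting for its hour
    for hour in hours:
        # Test y first so hours after the last measurement do not index past the end.
        if y < len(measured) and hour == measured[y]:
            result.append(records[y])
            y += 1
        else:
            result.append(missing)
    return result


# Sample data shaped like the example in the question.
hours = ["hr1", "hr2", "hr3", "hr4", "hr5", "hr6"]
measured = ["hr2", "hr3", "hr6"]
records = ["wData1", "wData2", "wData3"]
final = assign_data(hours, measured, records)
# final == [0, 'wData1', 'wData2', 0, 0, 'wData3']
```

    This does one append per hour, so 43000 hours means 43000 appends and no recursion at all.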
    

    BETTER SOLUTION

    If all you need is the records for the matching times, you can make this even more direct. Build a dictionary to index the measurements from the times.

    meas = dict(zip(measuredList, recordList))
    

    Now, write a list comprehension to insert 0 for any time not in the dictionary.

    finalList = [meas[time] if time in meas else 0
                    for time in hoursList]
    

    If I'm reading your problem correctly, that's your overall goal.
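
    Put together, the dictionary approach can be sketched as below. The function name assign_by_lookup and the missing parameter are only for illustration; dict.get is used in place of the conditional expression, which gives the same result.

```python
def assign_by_lookup(hours, measured, records, missing=0):
    """Look each hour up in a dictionary built from the measurements."""
    meas = dict(zip(measured, records))
    # dict.get returns the fallback value when the hour was never measured.
    return [meas.get(hour, missing) for hour in hours]


hours = ["hr1", "hr2", "hr3", "hr4", "hr5", "hr6"]
# The measurements are deliberately out of order here.
measured = ["hr6", "hr2", "hr3"]
records = ["wData3", "wData1", "wData2"]
final = assign_by_lookup(hours, measured, records)
# final == [0, 'wData1', 'wData2', 0, 0, 'wData3']
```

    Unlike the single-pass loop, this does not depend on the measurements being in the same order as the hours. If the same hour is measured twice, the later record wins, because later pairs overwrite earlier ones when the dictionary is built.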