python-3.x, list, dictionary, reference, python-multiprocessing

Python map makes a copy? I need to map by reference


I'm making a script that calls the same function several times with different parameters, then saves the results in a spreadsheet (Excel). To do this I'm using multiprocessing.Pool, and I need to store the parameters in a list. The problem is that when I pass the parameters to the Pool, each worker gets a copy of the object. I assume the issue is map(), but I'm not sure how to fix it.

Here is a minimal working example that reproduces the issue:

import sys
import multiprocessing
import xlwt
from xlwt import Workbook

def get_prices_in_range(varss):
    # Unpack the parameter list: [row, start_day, col, sheet]
    row = varss[0]
    col = varss[2]
    start_day = varss[1]
    sheet = varss[3]
    print(sheet)
    try:
        # Write start_day into the sheet at (row, col); ignore any write errors
        sheet.write(row, col, start_day)
    except:
        pass

if __name__ == '__main__':
       
    # multiprocessing pool object
    pool = multiprocessing.Pool()
    
    # pool object with number of element
    pool = multiprocessing.Pool(processes=2)

    #create sheet in excel
    wb = Workbook()
    sheet1 = wb.add_sheet("Sheet 1")
    print(sheet1)

    # build the input lists: does extend() copy sheet1, or add it by reference?
    inputs = [[x] for x in range(2)]
    for x in inputs:
        x.extend([1, 2, sheet1])

    # the print below shows they've been added by reference:
    print(inputs)

    # this doesn't work: sheet1 never gets written.
    pool.map(get_prices_in_range, inputs)
    # this works:
    #get_prices_in_range([0, 1, 2, sheet1])
    #get_prices_in_range([1, 1, 2, sheet1])


    wb.save("test.xls")

You can see that, as the script stands, every print(sheet) inside the worker shows a different object from the one shown by print(inputs); if you comment out pool.map(get_prices_in_range, inputs) and uncomment the two direct calls below it, they show the same object and the sheet gets written.

Any idea how to fix this issue? And happy new year!


Solution

  • The issue arises because multiprocessing.Pool runs the workers in separate processes, and the arguments you pass to them are serialized (pickled) and deserialized on the way. Each worker therefore ends up with its own copy of sheet1, and anything it writes to that copy never reaches the original object in the parent process.
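
    Here is a minimal sketch (no xlwt involved) of the same copy behaviour with a plain list; the names worker and shared are just for illustration. The worker mutates its own unpickled copy, and the list in the parent process is left untouched:

    import multiprocessing

    def worker(item):
        # This appends to the worker's unpickled copy, not to the list in the parent
        item.append("written in child")
        return item

    if __name__ == '__main__':
        shared = []
        with multiprocessing.Pool(processes=2) as pool:
            results = pool.map(worker, [shared, shared])
        print(shared)   # still [] -- the parent's list was never modified
        print(results)  # [['written in child'], ['written in child']] -- only the copies were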

    To solve this, don't try to write to sheet1 from the worker processes at all. Have each worker return the data it produces and let the main process handle the writing; pool.map already collects those return values for you (a multiprocessing.Queue would also work, see the sketch after the code below).

    Here is the updated code that should work. I've added comments where I have made the changes:

    import multiprocessing
    import xlwt
    from xlwt import Workbook
    
    def get_prices_in_range(varss):
        # Worker function: return the data instead of modifying the sheet.
        # Same parameter order as the original inputs: [row, start_day, col]
        row, start_day, col = varss[0], varss[1], varss[2]
        return (row, col, start_day)
    
    if __name__ == '__main__':
        pool = multiprocessing.Pool(processes=2)
        
        # Create workbook and sheet in the main process
        wb = Workbook()
        sheet1 = wb.add_sheet("Sheet 1")
    
        # Prepare inputs; removed `sheet1` since workers no longer modify it directly
        inputs = [[x, 1, 2] for x in range(2)]
    
        # Collect results from workers instead of sharing the sheet
        results = pool.map(get_prices_in_range, inputs)
    
        # Write results to the sheet in the main process
        for row, col, start_day in results:
            sheet1.write(row, col, start_day)
    
        wb.save("test.xls")
    

    Let me know if this works for you.