python, logging, multiprocessing, file-writing, file-processing

How to merge multiple files once multiprocessing has ended in Python?


In my code, multiprocessing.Process is used to spawn multiple impdp jobs (imports) simultaneously, and each job generates a log file with a dynamically built name:

'/DP_IMP_' + DP_PDB_FULL_NAME[i] + '_' + DP_WORKLOAD + '_' + str(vardate) + '.log'

vardate = datetime.now().strftime("%d-%b-%Y-%I_%M_%S_%p")
tempfiles = []
for i in range(len(DP_PDB_FULL_NAME)):
    for DP_WORKLOAD in DP_WORKLOAD_NAME:
        tempfiles.append(logdir + '/DP_IMP_' + DP_PDB_FULL_NAME[i] + '_' + DP_WORKLOAD + '_' + str(vardate) + '.log')
        p1 = multiprocessing.Process(target=imp_workload, args=(DP_WORKLOAD, DP_DURATION_SECONDS, vardate,))
        p1.start()

I want to merge all the log files into one large master log file once all the processes have ended. But when I try to use something like this inside the for i in range(len(DP_PDB_FULL_NAME)) loop:

with open('DATAPUMP_IMP_' + str(vardate) + '.log','wb') as wfd:
    for f in tempfiles:
        with open(f,'rb') as fd:
            shutil.copyfileobj(fd, wfd)

then it tries to copy the log files before the processes have ended.

Here, DP_PDB_FULL_NAME is a list of databases, so multiple processes are spawned simultaneously across multiple DBs. When I try to add p1.join() after the loop, the imports no longer run in parallel across the DBs.

So, how should I create a master log file once all the individual processes are completed?


Solution

  • You should create some kind of structure that stores the needed variables and the process handles. After the loop, block with join() until all subprocesses have finished, and only then work with the resulting files.

    from multiprocessing import Process

    handles = []
    for i in range(10):
        p = Process()            # placeholder; pass your real target= and args= here
        p.start()
        handles.append(p)        # keep every handle instead of reusing a single variable

    for handle in handles:
        handle.join()            # blocks until that subprocess has finished

    # only after all joins is it safe to merge the per-process log files
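
Applied to the code in the question, a minimal sketch could look like this (it assumes imp_workload, DP_PDB_FULL_NAME, DP_WORKLOAD_NAME, DP_DURATION_SECONDS, logdir and vardate are defined as in the question):

    import shutil
    import multiprocessing

    handles = []
    tempfiles = []
    for i in range(len(DP_PDB_FULL_NAME)):
        for DP_WORKLOAD in DP_WORKLOAD_NAME:
            logfile = logdir + '/DP_IMP_' + DP_PDB_FULL_NAME[i] + '_' + DP_WORKLOAD + '_' + str(vardate) + '.log'
            tempfiles.append(logfile)
            p = multiprocessing.Process(target=imp_workload, args=(DP_WORKLOAD, DP_DURATION_SECONDS, vardate))
            p.start()
            handles.append(p)

    # wait for every import job to finish before touching the logs
    for p in handles:
        p.join()

    # all log files are complete now, so they can be concatenated safely
    with open('DATAPUMP_IMP_' + str(vardate) + '.log', 'wb') as wfd:
        for f in tempfiles:
            with open(f, 'rb') as fd:
                shutil.copyfileobj(fd, wfd)

The key change is that every Process handle is appended to a list instead of overwriting p1, so all jobs start in parallel first and the join() loop only blocks once everything has been launched.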