python, python-3.x, multiprocessing, evolutionary-algorithm

Implementing Multiprocessing with the Same Function in a While Loop


I have implemented an Evolutionary Algorithm process in Python 3.8 and am attempting to optimise/reduce its runtime. Due to the heavy constraints on valid solutions, it can take a few minutes to generate a single valid chromosome. To avoid spending hours just generating the initial population, I want to use multiprocessing to generate several chromosomes at a time.

My code at this point in time is:

populationCount = 500

def readDistanceMatrix():
    # code removed

def generateAvailableValues():
    # code removed

def generateAvailableValuesPerColumn(availableValues):
    # code removed

def generateScheduleTemplate(distanceMatrix):
    # code removed

def generateChromosome(availableValuesPerColumn, scheduleTemplate, distanceMatrix):
    # code removed

if __name__ == '__main__':
    # Data type = DataFrame
    distanceMatrix = readDistanceMatrix()
    
    # Data type = List of Integers
    availableValues = generateAvailableValues()

    # Data type = List containing Lists of Integers
    availableValuesPerColumn = generateAvailableValuesPerColumn(availableValues)
        
    # Data type = DataFrame
    scheduleTemplate = generateScheduleTemplate(distanceMatrix)
    
    # Data type = List containing custom class (with Integer and DataFrame)
    population = []
    while len(population) < populationCount:
        chrmSolution = generateChromosome(availableValuesPerColumn, scheduleTemplate, distanceMatrix)
        population.append(chrmSolution)

The population list is filled by the while loop at the end. I would like to replace that while loop with a multiprocessing solution that can use up to a pre-set number of cores. For example:

population = []
availableCores = 6 
while len(population) < populationCount:
    while usedCores < availableCores:
        # start generating another chromosome as 'chrmSolution'
    population.append(chrmSolution)

However, after reading and watching hours' worth of tutorials, I'm unable to get such a loop up and running. How should I go about doing this?


Solution

  • It sounds like a simple multiprocessing.Pool should do the trick, or at least be a place to start. Here's a simple example of how that might look:

    from multiprocessing import Pool, cpu_count
    
    child_globals = {}  # mutable object at the module level acts as a container for globals (constants)
    
    # These helpers are defined at module level (not inside the __main__ guard) so that
    # child processes can find them by name when the "spawn" start method is used
    # (the default on Windows and on macOS from Python 3.8 onwards).
    def init_child(availableValuesPerColumn, scheduleTemplate, distanceMatrix):
        # Passing the same variables to the child process on every call is inefficient if
        # they're constant, so instead pass them once to the initializer and let each
        # child re-use them every time generateChromosome is called.
        child_globals['availableValuesPerColumn'] = availableValuesPerColumn
        child_globals['scheduleTemplate'] = scheduleTemplate
        child_globals['distanceMatrix'] = distanceMatrix
    
    def child_work(i):
        # child_work simply wraps generateChromosome with the stored inputs and throws
        # away the dummy `i` that comes from range().
        return generateChromosome(child_globals['availableValuesPerColumn'],
                                  child_globals['scheduleTemplate'],
                                  child_globals['distanceMatrix'])
    
    if __name__ == '__main__':
        # ...
        
        with Pool(cpu_count(),
                  initializer=init_child,  # init function to stuff some constants into each child's global context
                  initargs=(availableValuesPerColumn, scheduleTemplate, distanceMatrix)) as p:
            # imap_unordered doesn't make child processes wait to preserve result order,
            # so it keeps the CPUs busy more often. It returns a generator, so we wrap it
            # in list() to collect the results.
            population = list(p.imap_unordered(child_work, range(populationCount)))
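
To cap the pool at a pre-set number of cores (as in the `availableCores = 6` sketch from the question) rather than use every core, the only change needed is the first argument to `Pool`. Here's a minimal sketch of that variation, reusing the same `init_child`/`child_work` helpers from above; the `availableCores` value is only illustrative:

    availableCores = 6  # illustrative pre-set cap on worker processes
    
    with Pool(min(availableCores, cpu_count()),  # never ask for more workers than there are cores
              initializer=init_child,
              initargs=(availableValuesPerColumn, scheduleTemplate, distanceMatrix)) as p:
        population = list(p.imap_unordered(child_work, range(populationCount)))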