python, pbs, qsub, embarrassingly-parallel

"embarrassingly parallel" programming using python and PBS on a cluster


I have a function (a neural network model) that produces figures. I want to test several parameters, methods and different inputs (meaning hundreds of runs of the function) from python, using PBS on a standard cluster running Torque.

Note: I tried parallelpython, ipython and the like and was never completely satisfied, since I want something simpler. The cluster has a fixed configuration that I cannot change, and a solution integrating python + qsub would certainly benefit the community.

To simplify things, I have a simple function such as:

import pylab
from myModule import do_lots_number_crunching

def model(input, a=1., N=100):
    do_lots_number_crunching(input, a, N)
    pylab.savefig('figure_' + input.name + '_' + str(a) + '_' + str(N) + '.png')

where input is an object representing the input, input.name is a string, and do_lots_number_crunching may run for hours.

My question is: is there a correct way to transform a parameter scan such as

for a in pylab.linspace(0., 1., 100):
    model(input, a)

into "something" that would launch a PBS script for every call to the model function?

#PBS -l ncpus=1
#PBS -l mem=1000mb
#PBS -l cput=24:00:00
#PBS -V
cd /data/work/
python experiment_model.py

I was thinking of a function that would include the PBS template and call it from the python script, but have not yet figured out the cleanest way to do it (a decorator, perhaps?).
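
Roughly, I imagine something along these lines (just a sketch of the idea: the submit_model wrapper and the call to qsub through subprocess are my own guesses at an interface, and qsub has to be available on the submitting host):

import os
import subprocess
import tempfile

import pylab

PBS_TEMPLATE = """#PBS -l ncpus=1
#PBS -l mem=1000mb
#PBS -l cput=24:00:00
#PBS -V
cd /data/work/
python experiment_model.py %f
"""

def submit_model(a):
    # write a one-off job script with this value of a filled in ...
    with tempfile.NamedTemporaryFile('w', suffix='.pbs', delete=False) as f:
        f.write(PBS_TEMPLATE % a)
        script_name = f.name
    # ... hand it to qsub, which prints the job id on stdout ...
    job_id = subprocess.check_output(['qsub', script_name]).decode().strip()
    # ... and clean up; qsub copies the script at submission time
    os.remove(script_name)
    return job_id

job_ids = [submit_model(a) for a in pylab.linspace(0., 1., 100)]
print(job_ids)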


Solution

  • pbs_python[1] could work for this. If experiment_model.py takes 'a' as a command-line argument, you could do something like the following (a sketch of such an experiment_model.py is included at the end of this answer):

    import pbs, os, pylab   # pylab provides linspace for the parameter scan
    
    server_name = pbs.pbs_default()
    c = pbs.pbs_connect(server_name)
    
    attropl = pbs.new_attropl(4)   # four job attributes: three -l resource limits plus -V
    attropl[0].name  = pbs.ATTR_l
    attropl[0].resource = 'ncpus'
    attropl[0].value = '1'
    
    attropl[1].name  = pbs.ATTR_l
    attropl[1].resource = 'mem'
    attropl[1].value = '1000mb'
    
    attropl[2].name  = pbs.ATTR_l
    attropl[2].resource = 'cput'
    attropl[2].value = '24:00:00'
    
    attropl[3].name = pbs.ATTR_V
    
    script='''
    cd /data/work/
    python experiment_model.py %f
    '''
    
    jobs = []
    
    for a in pylab.linspace(0.,1.,100):
        script_name = 'experiment_model.job' + str(a)
        with open(script_name,'w') as scriptf:
            scriptf.write(script % a)
        job_id = pbs.pbs_submit(c, attropl, script_name, 'NULL', 'NULL')
        jobs.append(job_id)
        os.remove(script_name)   # safe: the script is copied at submission time
    
    print(jobs)
    

    [1]: https://oss.trac.surfsara.nl/pbs_python/wiki/TorqueUsage "pbs_python"
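
For completeness, the above assumes that experiment_model.py reads a from its command line. A minimal sketch of such a script (the module name mymodels and the build_input helper are only placeholders for wherever your model function and its input object actually come from):

    # experiment_model.py -- one run of the model for the value of a passed on
    # the command line (the "%f" substituted into the job script above)
    import sys

    from mymodels import model, build_input   # placeholders for your own code

    if __name__ == '__main__':
        a = float(sys.argv[1])    # e.g. "python experiment_model.py 0.42"
        input = build_input()     # however the input object is constructed
        model(input, a)           # writes figure_<input.name>_<a>_<N>.png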