
I am having problems with ProcessPoolExecutor from concurrent.futures


I have a large program that takes a while to run its calculations. Because only 20% of my processor was being used, I decided to learn about multithreading and multiprocessing. Multithreading gave no improvement, so I tried multiprocessing, but whenever I use it I get a lot of errors, even with very simple code.

This is the code I tested after I started having problems with my large, calculation-heavy program:

from concurrent.futures import ProcessPoolExecutor

def func():
    print("done")

def func_():
    print("done")

def main():
    executor = ProcessPoolExecutor(max_workers=3)

    p1 = executor.submit(func)
    p2 = executor.submit(func_)

main()

and the error message I am getting says:

An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

This is not the whole message because it is very long, but I think this part may be helpful. Pretty much everything else in the error message is of the form "error at line ... in ...".

In case it helps, the full code is at https://github.com/nobody48sheldor/fuseeinator2.0 (it might not be the latest version).


Solution

  • I updated your code to show main being called. This is an issue on operating systems that spawn child processes, such as Windows. To test on my Linux machine I had to add a bit of code. With that addition, this crashes on my machine:

    # Test code to make Linux spawn like Windows and reproduce the error.
    # This code is not needed on Windows.
    if __name__ == "__main__":
        import multiprocessing as mp
        mp.freeze_support()
        mp.set_start_method('spawn')
    
    # test script
    from concurrent.futures import ProcessPoolExecutor
    
    def func():
        print("done")
    
    def func_():
        print("done")
    
    def main():
        executor = ProcessPoolExecutor(max_workers=3)
        p1 = executor.submit(func)
        p2 = executor.submit(func_)
    
    main()
    

    In a spawning system, Python can't just fork into a new execution context. Instead, it runs a new instance of the Python interpreter, imports the module and pickles/unpickles enough state to make a child execution environment. This can be a very heavy operation.
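
To see whether a given machine is a spawning system, the start methods can be inspected directly. This is a minimal sketch using only the standard multiprocessing module; the variable names are illustrative, and the values printed depend on the platform and Python version.

```python
import multiprocessing as mp

# Start methods this platform supports, for example 'spawn', 'fork'
# and 'forkserver'. 'spawn' is available everywhere.
available = mp.get_all_start_methods()

# allow_none=True reports the current setting without fixing a default;
# it is None when no start method has been chosen yet.
current = mp.get_start_method(allow_none=True)

print("available:", available)
print("currently set:", current)
```

If 'fork' is missing from the list, every child process is started by launching a fresh interpreter, which is the situation described above.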

    But your script is not import safe. Since main() is called at module level, the import in the child would run main() again. That would create a grandchild process which runs main() again, and so on until your machine hangs. Python detects this infinite loop and raises the error shown above instead.

    The top-level script always runs under the module name "__main__". Put all of the code that should only be run once, at the script level, inside an if that checks for that name. If the module is imported, nothing harmful is run.

    if __name__ == "__main__":
        main()
    

    and the script will work.
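
Put together, the guarded script might look like the sketch below. Two things here are my additions rather than part of the original code: the functions return a value instead of printing (so the parent can collect it), and the pool is used in a with block so it is shut down cleanly.

```python
from concurrent.futures import ProcessPoolExecutor


def func():
    # Runs in a worker process. Returning a value instead of printing is an
    # illustrative change so the parent process can collect the result.
    return "done"


def func_():
    return "also done"


def main():
    # Leaving the with block waits for pending tasks and shuts the pool down.
    with ProcessPoolExecutor(max_workers=3) as executor:
        p1 = executor.submit(func)
        p2 = executor.submit(func_)
        # result() blocks until the task finishes and re-raises any
        # exception that happened in the worker.
        return [p1.result(), p2.result()]


if __name__ == "__main__":
    # Only the top-level script gets here. A spawned child that re-imports
    # this file sees a different __name__ and skips the call.
    print(main())
```

Run as a script, this prints the two results collected from the workers. Calling result() is also a convenient way to notice failures, because an exception raised inside a worker is re-raised in the parent at that point.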

    There are code analyzers out there that import modules to extract docstrings or other useful information. Your code shouldn't fire the missiles just because some tool did an import.

    A related way to structure the code is to move everything multiprocessing-related out of the script and into a module. Suppose I had a module with your code in it:
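
The effect of an import on guarded and unguarded code can be checked without starting any processes. The sketch below writes a small, hypothetical module (demo_module, made up for this illustration) to a temporary file and imports it the way a tool would; the helper import_from_source is also just an illustrative name.

```python
import importlib.util
import os
import tempfile
import textwrap

# Hypothetical module source: one unguarded statement, one guarded statement.
SOURCE = textwrap.dedent('''
    events = []
    events.append("top level ran as " + __name__)

    if __name__ == "__main__":
        events.append("guarded block ran")
''')


def import_from_source(source, module_name):
    """Write source to a temporary file and import it under module_name."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, module_name + ".py")
        with open(path, "w") as handle:
            handle.write(source)
        spec = importlib.util.spec_from_file_location(module_name, path)
        module = importlib.util.module_from_spec(spec)
        # Executes the module's top-level code, exactly like a normal import.
        spec.loader.exec_module(module)
        return module


demo = import_from_source(SOURCE, "demo_module")
print(demo.events)  # ['top level ran as demo_module']
```

The unguarded line runs during the import, while the guarded block does not, because the imported module's __name__ is "demo_module" rather than "__main__".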

    whatever.py

    from concurrent.futures import ProcessPoolExecutor
    
    def func():
        print("done")
    
    def func_():
        print("done")
    
    def main():
        executor = ProcessPoolExecutor(max_workers=3)
    
        p1 = executor.submit(func)
        p2 = executor.submit(func_)
    

    myscript.py

    #!/usr/bin/env python3
    import whatever

    if __name__ == "__main__":
        whatever.main()
    

    Now the pool lives in an imported module, which keeps the script itself small. On a spawning system the child still re-runs the top-level script (under the name "__mp_main__") while it bootstraps, so the call to whatever.main() stays behind if __name__ == "__main__": in myscript.py.