python-3.x multithreading multiprocess

How to choose between multiprocessing and multitasking in Python for file processing?


I have looked at other StackExchange threads related to this topic, but it seems I need further assistance in understanding it.

Please take a look at the following scenario and explain which method should be used, and why.

I have already written Python code that loads a folder and extracts each file.txt, then calls the function "File_Processing", which processes the individual file, plots x against y, and saves the plot. This takes 20 minutes per 100 files, and I have several folders containing 3000 files each.

Now my question is: which method should be used, multiprocessing or multitasking, and why?


Solution

  • Check out multiprocessing; it is part of the standard library: https://docs.python.org/3/library/multiprocessing.html

    What you need is almost exactly the most basic example from the docs:

    from glob import glob
    from multiprocessing import Pool
    
    list_of_filenames = glob("/path/to/files/*.txt")
    
    def f(filename):
        ...  # body of your existing for loop, e.g. File_Processing(filename)
    
    if __name__ == "__main__":
        with Pool(5) as p:
            p.map(f, list_of_filenames)
    

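    Do not forget the if __name__ == "__main__": guard. With the spawn start method (the default on Windows and macOS), every worker process imports the main module again, so without the guard the code that creates the pool would run again inside each worker, which typically fails with a RuntimeError rather than doing any work.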
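
As for the "why": if most of the 20 minutes goes into parsing and plotting rather than waiting on the disk, the work is CPU-bound, and separate processes can use several cores where threads in CPython would largely take turns under the global interpreter lock. A slightly fuller sketch of the same pattern follows. Everything in it is hypothetical: summarize_file stands in for File_Processing, the two-column "x y" file layout is an assumption, and the folder path is taken from the command line. It collects per-file results so that one bad file does not abort the whole batch.

```python
import sys
from glob import glob
from multiprocessing import Pool
from pathlib import Path


def summarize_file(filename):
    """Hypothetical stand-in for File_Processing: parse x/y pairs from one file.

    Assumes two whitespace-separated numeric columns per line (an assumption,
    not the asker's actual format). Returns (filename, point_count), or
    (filename, None) if the file cannot be read or parsed.
    """
    try:
        xs, ys = [], []
        with open(filename) as handle:
            for line in handle:
                parts = line.split()
                if len(parts) < 2:
                    continue  # skip blank or malformed lines
                xs.append(float(parts[0]))
                ys.append(float(parts[1]))
        # The real function would plot xs against ys and save the figure here.
        return filename, len(xs)
    except (OSError, ValueError):
        return filename, None


def process_folder(folder, workers=None):
    """Run summarize_file over every .txt file in folder using a process pool."""
    filenames = sorted(glob(str(Path(folder) / "*.txt")))
    if not filenames:
        return []
    # workers=None lets Pool choose a default based on the CPU count.
    with Pool(workers) as pool:
        # imap_unordered yields results as workers finish them; chunksize
        # hands files to workers in batches to reduce inter-process overhead.
        return list(pool.imap_unordered(summarize_file, filenames, chunksize=10))


if __name__ == "__main__":
    # The folder path comes from the command line; nothing runs without it.
    if len(sys.argv) > 1:
        results = process_folder(sys.argv[1])
        failed = [name for name, count in results if count is None]
        print(f"processed {len(results)} files, {len(failed)} failed")
```

Saved as, say, process_files.py, this would be run as python process_files.py /path/to/files. Timing one folder with workers=1 and again with the default gives a direct measurement of how much the pool actually helps on your machine.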