python pandas multiprocessing gpu xeon-phi

Hardware for python multiprocessing


I have a task where I need to run the same function on many different pandas dataframes. I load all the dataframes into a list, then pass it to Pool.map from the multiprocessing module. The function itself has been vectorized as much as possible; it contains a few if/else clauses and no matrix operations.
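For reference, the setup described above might look roughly like the sketch below. The names `process`, `make_frames` and `run_parallel`, the column names `a` and `b`, and the random data are all hypothetical stand-ins; the real function and dataframes are not shown in the question.

```python
from multiprocessing import Pool

import numpy as np
import pandas as pd


def process(df):
    # Hypothetical stand-in for the real per-dataframe function.
    return float((df["a"] * df["b"]).sum())


def make_frames(n_frames=4, n_rows=1000, seed=0):
    # Build a list of small random dataframes (illustrative data only).
    rng = np.random.default_rng(seed)
    return [
        pd.DataFrame({"a": rng.random(n_rows), "b": rng.random(n_rows)})
        for _ in range(n_frames)
    ]


def run_parallel(frames, workers=2):
    # Each dataframe is pickled, sent to a worker process, and processed there.
    with Pool(workers) as pool:
        return pool.map(process, frames)


if __name__ == "__main__":
    frames = make_frames()
    print(run_parallel(frames))
```

Note that every dataframe is serialized and copied to a worker, so for large frames the transfer cost can become a noticeable part of the total runtime.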

I'm currently using a 10-core Xeon and would like to speed things up, ideally going from Pool(10) to Pool(xxx). I see two possibilities:

Which path should I concentrate on? Any other alternatives?

Software: Ubuntu 18.04, Python 3.7. Hardware: X99 chipset, 10-core Xeon (no HT)


Solution

  • You can rely on the newer Intel LGA 2066 platform or a newer Xeon. With AVX-512, Intel accelerated NumPy processing considerably (NumPy is the base of pandas). See: https://software.intel.com/en-us/articles/the-inside-scoop-on-how-we-accelerated-numpy-umath-functions

    First of all, try switching to NumPy-based calculations (even something as simple as calling .values on a Series); this can speed up processing by up to 10x.

    You can also move to a dual-CPU motherboard to get more cores for parallel computation.

    In most situations, the bottleneck is not the processing of the data but the I/O: reading from the drive into memory. This will be a problem with a GPU too.
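As a rough illustration of the NumPy suggestion above, here is a minimal sketch that replaces a row-wise if/else with `np.where` on the underlying arrays. The function names, the columns `a` and `b`, and the branching rule are assumptions made up for the example, not the asker's actual logic; the real speedup depends on the workload and should be measured.

```python
import numpy as np
import pandas as pd


def score_pandas(df):
    # Row-wise apply: easy to write, but runs the if/else in Python per row.
    return df.apply(
        lambda row: row["a"] * 2.0 if row["a"] > row["b"] else row["b"] - row["a"],
        axis=1,
    )


def score_numpy(df):
    # Pull out the raw NumPy arrays and evaluate both branches vectorized.
    a = df["a"].values
    b = df["b"].values
    return np.where(a > b, a * 2.0, b - a)


if __name__ == "__main__":
    df = pd.DataFrame({"a": [1.0, 5.0, 3.0], "b": [2.0, 1.0, 3.0]})
    print(score_numpy(df))
```

Both functions compute the same values; timing them on a representative dataframe (for example with `timeit`) shows whether the per-frame function or the data loading dominates before any hardware is bought.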