I have a task where I need to run the same function on many different pandas dataframes. I load all the dataframes into a list and pass that list to Pool.map from the multiprocessing module. The function itself is vectorized as far as possible; it contains a few if/else clauses and no matrix operations.
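For context, here is a stripped-down sketch of what I'm doing (the real function body is more involved; the column name, the branch, and the file names below are just placeholders):

    import pandas as pd
    from multiprocessing import Pool

    def process_frame(df):
        # Placeholder body: the real function does column-wise, vectorized work
        # with a few if/else branches and no matrix operations.
        if df["value"].mean() > 0:          # "value" is a made-up column name
            return df["value"] * 2
        return df["value"] - df["value"].min()

    if __name__ == "__main__":
        # Made-up file names; in reality the list holds many dataframes.
        frames = [pd.read_csv(p) for p in ("part1.csv", "part2.csv", "part3.csv")]
        with Pool(10) as pool:              # one worker per physical core
            results = pool.map(process_frame, frames)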
I'm currently using a 10-core Xeon and would like to speed things up, ideally going from Pool(10) to Pool(xxx). I see two possibilities:
GPU processing. From what I have read, though, I'm not sure it can achieve what I want, and it would in any case require a lot of code modification.
Xeon Phi. I know it is being discontinued, but supposedly code adaptation is easier, and if that's really the case I'd happily get one.
Which path should I concentrate on? Any other alternatives?
Software: Ubuntu 18.04, Python 3.7. Hardware: X99 chipset, 10-core Xeon (no HT).
You could look at the newer Intel LGA 2066 platform or a newer Xeon. With the latest AVX-512 instructions, Intel has accelerated NumPy's umath functions considerably (NumPy is the foundation of pandas). See: https://software.intel.com/en-us/articles/the-inside-scoop-on-how-we-accelerated-numpy-umath-functions
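You can check which accelerated math libraries your current NumPy build is linked against; the output depends on whether it came from a plain pip wheel, conda, or Intel's distribution:

    import numpy as np

    # Shows the BLAS/LAPACK backend (e.g. MKL vs. OpenBLAS) that NumPy was
    # compiled against; the MKL/Intel-distribution builds are the ones covered
    # by the umath/AVX-512 work in the linked article.
    np.show_config()
    print(np.__version__)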
First of all, try switching to NumPy-based calculations (even simply calling .values on the Series); this can improve processing speed by up to 10x.
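For example, a row-wise if/else written with apply can usually be rewritten as np.where over the underlying arrays (the column names here are illustrative):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"a": np.random.rand(1_000_000),
                       "b": np.random.rand(1_000_000)})

    # Row-wise pandas version: slow, pays Python overhead for every row.
    slow = df.apply(lambda r: r["a"] * 2 if r["a"] > r["b"] else r["b"], axis=1)

    # NumPy version on the raw arrays: the if/else becomes np.where.
    a, b = df["a"].values, df["b"].values
    fast = np.where(a > b, a * 2, b)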
You could also get a dual-CPU motherboard for more parallelism in the calculation.
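If you go that route, size the pool from the machine rather than hard-coding 10, for example:

    import os
    from multiprocessing import Pool

    def work(x):
        # Stand-in for the per-dataframe function from the question.
        return x * x

    if __name__ == "__main__":
        # os.cpu_count() sees all cores across both sockets, so the pool
        # scales with the hardware instead of being fixed at Pool(10).
        with Pool(processes=os.cpu_count()) as pool:
            print(pool.map(work, range(20)))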
In most situations the bottleneck is not the processing of the data but IO: reading from the drive into memory. That would be a problem with a GPU as well.
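An easy way to confirm where the time goes is to time the load step and the compute step separately (the file name and processing function below are placeholders):

    import time
    import pandas as pd

    def process_frame(df):
        # Stand-in for the actual per-dataframe computation.
        return df.select_dtypes("number").sum()

    t0 = time.perf_counter()
    df = pd.read_csv("data.csv")       # hypothetical input file
    t1 = time.perf_counter()
    result = process_frame(df)
    t2 = time.perf_counter()

    print(f"load:    {t1 - t0:.2f} s")
    print(f"compute: {t2 - t1:.2f} s")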