performance tensorflow deep-learning hpc parallelism-amdahl

Should I change my original Python/TensorFlow code to make it run faster on HPC?


Please forgive me if this question is too basic. I am not familiar with the idea of parallelization, nor have I used an HPC system before.

I am training a deep learning model, which takes a really long time on my PC: approximately 2 days on my i5 with 12 GB of RAM.

So I decided to use HPC, but one of the tutorials I watched says that if I do not write my code properly, HPC will not be any faster than a regular PC. What does that really mean? Should I adjust my original code so that I can benefit from HPC?

Secondly, can we say that using 30 cores should be 5 times faster than using 6 cores? Are speed and number of cores proportional?


Solution

  • Q : "can we say that using 30 cores should be 5 times faster than using 6 cores?"

    No, we can not.

    Q : "Are speed and number of cores proportional?"

    No, they are not.

    There is an ultimate ceiling on any (potential) speedup, set by Amdahl's Law (even in its original, overhead-naive formulation, which also ignores the atomicity of work).


    It is better to use the revised, overhead-strict, resource-aware reformulation of Amdahl's Law, which shows how add-on overheads lower that ceiling further.


    Seeking to improve performance?

    Start with this, ideally by spending some time tuning the core parameters in the INTERACTIVE TOOL ( URL there ).

    Converting a classical library (like TF or others) into an HPC-efficient tool is not easy and does not come for free. Add-on overhead costs may easily (see the results in the INTERACTIVE TOOL) devastate any potential HPC power, just because of poor scaling: going from costs in the range of a few ns to costs above a few ms kills the game at whatever HPC budget you may spend, doesn't it?
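To put rough numbers on the 6-core versus 30-core question, here is a minimal Python sketch of the two models mentioned above. The parallel fraction `p = 0.90` and the overhead value `0.05` are purely illustrative assumptions, not measurements of any real training job. The function `overhead_aware_speedup` is a simplified stand-in that adds one fixed overhead term and is not necessarily the exact reformulation referred to above.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Classic Amdahl's Law.

    p: fraction of the original runtime that can run in parallel (0..1)
    n: number of cores (workers)
    """
    return 1.0 / ((1.0 - p) + p / n)


def overhead_aware_speedup(p: float, n: int, overhead: float) -> float:
    """Simplified overhead-aware variant (an assumption for illustration).

    overhead: add-on cost of going parallel (setup, data transfer,
    teardown), expressed as a fraction of the original single-core runtime.
    """
    return 1.0 / ((1.0 - p) + overhead + p / n)


if __name__ == "__main__":
    p = 0.90         # assumed parallel fraction, illustrative only
    overhead = 0.05  # assumed add-on cost, illustrative only

    for n in (1, 6, 30):
        classic = amdahl_speedup(p, n)
        strict = overhead_aware_speedup(p, n, overhead)
        print(f"{n:>2} cores: classic {classic:.2f}x, with overhead {strict:.2f}x")

    # How much faster are 30 cores than 6 cores under the classic model?
    ratio = amdahl_speedup(p, 30) / amdahl_speedup(p, 6)
    print(f"30 vs 6 cores: {ratio:.2f}x")
```

Under these assumed values, the step from 6 to 30 cores yields less than a 2x gain in the classic model, not 5x, and no core count can push the speedup past 1 / (1 - p) = 10x. With the overhead term included, a single worker is actually slower than the original serial run, which is the "does not come free" effect described above.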