python-3.x bigdata vaex

vaex: How to limit number of cores/threads/processes?


How can one limit the number of cores/threads/processes that are being used by vaex? Some operations have a boolean parallel switch, but I don't see a way to have more fine-grained control (which is important on larger shared servers).

Code snippet at hand:

import vaex

vaex.open("/very/large/file.parquet/")\
   .sample(frac=0.01)\
   .export_parquet("/slightly/smaller/file.parquet", parallel=True)

Solution

  • To limit the number of threads, set the environment variable VAEX_NUM_THREADS. If it is not set, vaex uses multiprocessing.cpu_count() threads, i.e. one per CPU core.

    See where the default is defined: https://github.com/vaexio/vaex/blob/2418d56a1925a82557a8e86493f5e5d117c06049/packages/vaex-core/vaex/multithreading.py#L21
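A minimal sketch of applying this to the snippet from the question. It assumes, based on the multithreading.py line linked above, that vaex reads VAEX_NUM_THREADS once when it is first imported, so the variable has to be set before the first `import vaex` in the process. The helpers `limit_vaex_threads` and `downsample` are hypothetical names used only for illustration; check that your installed vaex version still reads this variable.

```python
import os


def limit_vaex_threads(n_threads: int) -> None:
    """Cap vaex's thread count via the VAEX_NUM_THREADS environment variable.

    Assumption: vaex reads this variable when it is first imported, so this
    must be called before the first `import vaex` in the process.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be a positive integer")
    # Environment variables are strings; vaex parses the value as an int.
    os.environ["VAEX_NUM_THREADS"] = str(n_threads)


def downsample(src: str, dst: str, n_threads: int = 4, frac: float = 0.01) -> None:
    """Hypothetical wrapper around the pipeline from the question."""
    limit_vaex_threads(n_threads)
    # Deferred import: the env var is already set when vaex loads.
    import vaex

    vaex.open(src).sample(frac=frac).export_parquet(dst, parallel=True)
```

For example, `downsample("/very/large/file.parquet/", "/slightly/smaller/file.parquet", n_threads=8)`. If some other module has already imported vaex, setting the variable afterwards likely has no effect; in that case set it in the shell that launches the script instead.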