How can I limit the number of cores/threads/processes that vaex uses? Some operations have a boolean parallel switch, but I don't see a way to get more fine-grained control (which is important on larger shared servers).
Code snippet at hand:
import vaex

vaex.open("/very/large/file.parquet/")\
    .sample(frac=0.01)\
    .export_parquet("/slightly/smaller/file.parquet", parallel=True)
Regarding the number of threads, you can use an environment variable named VAEX_NUM_THREADS. By default, vaex uses multiprocessing.cpu_count() threads.
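A minimal sketch of one way to apply this, assuming vaex reads VAEX_NUM_THREADS when it sets up its thread pool, so the variable has to be in the environment before `import vaex` runs. In a single script you can do that with `os.environ["VAEX_NUM_THREADS"] = "4"` placed above `import vaex`. The sketch below instead launches the vaex job in a child process with the variable already set. The helper names `limited_env` and `run_vaex_job`, and the script name used afterwards, are hypothetical and not part of vaex.

```python
import os
import subprocess
import sys


def limited_env(num_threads, base=None):
    """Return a copy of an environment mapping with VAEX_NUM_THREADS set.

    The base mapping (os.environ by default) is not modified.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    env = dict(os.environ if base is None else base)
    # Assumption: vaex reads this variable once at startup, so it must be
    # present before the child process imports vaex.
    env["VAEX_NUM_THREADS"] = str(num_threads)
    return env


def run_vaex_job(script_path, num_threads):
    """Run a Python script containing the vaex job in a child process.

    The child inherits VAEX_NUM_THREADS from the start, so the limit is
    already in place when the script executes `import vaex`.
    """
    return subprocess.run(
        [sys.executable, script_path],
        env=limited_env(num_threads),
        check=True,
    )
```

For example, with the open/sample/export code from the question saved in a hypothetical `export_sample.py`, `run_vaex_job("export_sample.py", num_threads=4)` would run it with vaex limited to 4 threads. The same effect from a shell is `VAEX_NUM_THREADS=4 python export_sample.py`.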