condor

HTCondor change NUM_CPUS based on Idle?


I would like to change the CPU count depending on whether someone is working on the machine. I don't want to PREEMPT jobs as defined in the manual, just do something like:

# condor_config file
if (KeyboardIdle < 10)
    NUM_CPUS = 2
else
    NUM_CPUS = 8
endif

The above configuration fails with the error: (KeyboardIdle < 10) is not a valid if condition because complex conditionals are not supported.

Is there any way to implement this, or is NUM_CPUS a fixed variable?


As per Greg's answer, the very bottom of my condor_config is now as follows:

NUM_CPUS = 16
START = (SlotID < 8) || (KeyboardIdle > 10)

In theory this would permit only 8 jobs to start, but running condor_status myMachine I get:

C:\>condor_status myMachine
Name                       OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@myMachine.cluster  WINDOWS    X86_64 Claimed   Busy      1.210 8186  0+00:00:02
slot2@myMachine.cluster  WINDOWS    X86_64 Claimed   Busy      0.500 8186  0+00:00:03
slot3@myMachine.cluster  WINDOWS    X86_64 Claimed   Busy      2.220 8186  0+00:00:01
slot4@myMachine.cluster  WINDOWS    X86_64 Claimed   Busy      1.500 8186  0+00:00:02
slot5@myMachine.cluster  WINDOWS    X86_64 Claimed   Busy      0.600 8186  0+00:00:02
slot6@myMachine.cluster  WINDOWS    X86_64 Claimed   Busy      0.380 8186  0+00:00:02
slot7@myMachine.cluster  WINDOWS    X86_64 Claimed   Busy      1.940 8186  0+00:00:03
slot8@myMachine.cluster  WINDOWS    X86_64 Claimed   Busy      0.880 8186  0+00:00:02
slot9@myMachine.cluster  WINDOWS    X86_64 Claimed   Busy      1.560 8186  0+00:00:02
slot10@myMachine.cluster WINDOWS    X86_64 Claimed   Busy      0.310 8186  0+00:00:02
slot11@myMachine.cluster WINDOWS    X86_64 Claimed   Busy      2.180 8186  0+00:00:02
slot12@myMachine.cluster WINDOWS    X86_64 Claimed   Busy      1.580 8186  0+00:00:02
slot13@myMachine.cluster WINDOWS    X86_64 Claimed   Busy      0.950 8186  0+00:00:02
slot14@myMachine.cluster WINDOWS    X86_64 Claimed   Busy      1.890 8186  0+00:00:02
slot15@myMachine.cluster WINDOWS    X86_64 Claimed   Busy      0.490 8186  0+00:00:02
slot16@myMachine.cluster WINDOWS    X86_64 Claimed   Busy      1.600 8186  0+00:00:01

               Total Owner Claimed Unclaimed Matched Preempting Backfill  Drain

X86_64/WINDOWS    16     0      16         0       0          0        0      0

         Total    16     0      16         0       0          0        0      0

Any ideas?


Solution

  • NUM_CPUS is fixed in HTCondor. This sort of policy is typically implemented by changing the START expression, so that a varying number of slots have a START expression that evaluates to false and therefore cannot start jobs.

    Assuming this machine has static slots (the default), a START expression could be something like

    START = (SlotID < 3) || (KeyboardIdle > 10)
    

    That is, START is always true for slots 1 and 2, and true for the remaining slots only when the keyboard has been idle for more than 10 seconds.

    To be annoyingly pedantic, this only controls whether jobs START on that machine according to keyboard usage. With just the above configuration, a completely idle machine will allow itself to be filled with jobs, and those jobs will continue running indefinitely when a keyboard user returns. If you'd like to preempt those jobs, you can also use a PREEMPT expression like

    PREEMPT = (SlotID >= 3) && (KeyboardIdle < 10)
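
To see which slots a START expression of this shape leaves open, it can help to model it outside HTCondor. The following is a minimal Python sketch, not HTCondor code: the names `start`, `startable_slots`, `slot_limit` and `idle_seconds` are made up for illustration, and it assumes static slots numbered from 1 and KeyboardIdle measured in seconds.

```python
def start(slot_id, keyboard_idle, slot_limit=3, idle_seconds=10):
    """Model of START = (SlotID < slot_limit) || (KeyboardIdle > idle_seconds).

    slot_limit and idle_seconds are hypothetical parameter names; they stand
    in for the literals 3 and 10 in the config expression.
    """
    return slot_id < slot_limit or keyboard_idle > idle_seconds


def startable_slots(num_slots, keyboard_idle, **policy):
    """Return the slot IDs (1-based) whose modeled START evaluates to true."""
    return [
        slot_id
        for slot_id in range(1, num_slots + 1)
        if start(slot_id, keyboard_idle, **policy)
    ]


if __name__ == "__main__":
    # Keyboard touched 5 seconds ago: only the always-on slots accept jobs.
    print(startable_slots(8, keyboard_idle=5))    # [1, 2]
    # Keyboard idle for a minute: every slot accepts jobs.
    print(startable_slots(8, keyboard_idle=60))   # [1, 2, 3, 4, 5, 6, 7, 8]
    # The question's variant, SlotID < 8 on 16 slots, keeps 7 slots always on.
    print(len(startable_slots(16, keyboard_idle=5, slot_limit=8)))  # 7
```

Note that the model only answers "may a new job start here right now?". Jobs that started while the keyboard was idle keep their slots, which is one way all 16 slots can show as Claimed even with the START expression in place.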