I am writing code to periodically measure the power usage of an NVIDIA Tesla K20 GPU (Kepler architecture) using the NVML API.
Variables:
nvmlReturn_t result;
nvmlEnableState_t pmmode;
nvmlDevice_t nvmlDeviceID;
unsigned int powerInt;
Basic code:
result = nvmlDeviceGetPowerManagementMode(nvmlDeviceID, &pmmode);
if (pmmode == NVML_FEATURE_ENABLED) {
result = nvmlDeviceGetPowerUsage(nvmlDeviceID, &powerInt);
}
My issue is that nvmlDeviceGetPowerManagementMode always returns NVML_ERROR_INVALID_ARGUMENT. I have checked this.
The NVML API documentation says that NVML_ERROR_INVALID_ARGUMENT is returned when either nvmlDeviceID is invalid or pmmode is NULL.
nvmlDeviceID is definitely valid, because I am able to query its properties and they match my GPU. But I don't see why I should set the value of pmmode to anything, because the documentation describes it as a "Reference in which to return the current power management mode". For the record, I tried assigning an enabled value to it, but the result was still the same.
I am clearly doing something wrong because other users of the system have written their own libraries using this function, and they face no problem. I am unable to contact them. What should I fix to get this function to work correctly?
The problem here was not in the API call itself but in the rest of the code. Still, the answer might be useful to others. Before attempting this solution, confirm that power management mode is actually enabled (check with nvidia-smi -q -d POWER).
When you get the invalid argument error, the problem very likely lies with nvmlDeviceID. I said I was able to query the device properties, and at the time I was sure the handle was right, but watch out for any API calls that modify the nvmlDeviceID value later on.
For example, in this case the following API call was passed some_variable, which held an invalid index, so nvmlDeviceID became invalid:
nvmlDeviceGetHandleByIndex(some_variable, &nvmlDeviceID);
It had to be changed to:
nvmlDeviceGetHandleByIndex(0, &nvmlDeviceID);
So the solution is either to remove all API calls that change or invalidate the value of nvmlDeviceID, or at least to ensure that any existing API call in the code does not modify the value.