Is there a way to get the memory usage of a CUDA context, rather than having to use cudaMemGetInfo, which only reports device-wide information? Or at least a way to find out how much memory the current application occupies?
No, this seems to be impossible. However, retrieving per-process memory usage is still a good alternative. As Robert has pointed out, per-process memory usage can be retrieved using NVML, specifically with the nvmlDeviceGetComputeRunningProcesses function.
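
A minimal sketch of the NVML approach, assuming the Python bindings for NVML (the `pynvml` module from the `nvidia-ml-py` package) are installed. The helper names `memory_used_by_pid` and `current_process_gpu_memory` are made up for illustration; the `pynvml` calls mirror the C functions of the same name.

```python
import os


def memory_used_by_pid(processes, pid):
    """Return the GPU memory in bytes reported for `pid`.

    `processes` is the list returned by nvmlDeviceGetComputeRunningProcesses.
    Returns None if the pid is not listed or NVML reports no value for it.
    """
    for proc in processes:
        if proc.pid == pid:
            return proc.usedGpuMemory
    return None


def current_process_gpu_memory(device_index=0):
    """Query NVML for the memory used by this process on one device."""
    # Imported lazily so the helper above can be used without NVML installed.
    import pynvml

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        processes = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
        return memory_used_by_pid(processes, os.getpid())
    finally:
        pynvml.nvmlShutdown()
```

Calling `current_process_gpu_memory()` should only return a value once the process has created a CUDA context on that device; before that, the process is not in the list. The figure is per process, so it covers every context the process holds on the device rather than a single context.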