I'm trying to speed-up my code with openacc with PGI 15.7 compiler.
I want to profile my code in C source level. I'm using 'nvvp' profiler from CUDA 7.0 When I run nvvp, I can use 'analysis tap' and can get which latency is the reason my code slows. (data dependency, conditional branch and bandwidth... etc)
But, I couldn't get line-based analysis, but only 'kernel' level analysis. (e.g. main_300_gpu kernel used 10s) . So I have some trouble to know where do I have to fix the code.
Is there any way to profile my code in source-level?
I'm using
PGI 15.7 (using pgcc)
CUDA 7.0
NVIDIA GTX 960
Ubuntu 14.04 LTS x86_64
You can also try adding the flag "-ta=tesla:lineinfo" to have the compiler add source code association for the profiler (it's the same flag as nvcc --lineinfo). Though as Bob points out, the code may been heavily transformed so the line information many not directly correspond back to your original source.