
Calling OpenACC code from Rust does not run on GPU


Update 09/07/2020

I've added a small example for this at https://gitlab.com/lisanhu2016/rust-openacc-example.git

It's a public repository with a README, so you should be able to try the example there.

The libraries I have been linking against are nvc, acchost, and pgm; you can see them in build.rs.

===

I’m trying to call OpenACC code from Rust, and I’m having issues that are probably related to linking. These are the steps I’m following:

  1. Compile bindings.cc with the OpenACC flags -acc -gpu=managed -Minfo=accel and archive it into a static library, libfoo.a:
     [ 87%] Building CXX object CMakeFiles/foo.dir/bindings.cc.o
     /opt/nvidia/hpc/20.7/Linux_x86_64/20.7/compilers/bin/nvc++    -fast -O3 -DNDEBUG   -fPIC -acc -gpu=managed -Minfo=accel -o CMakeFiles/foo.dir/bindings.cc.o -c /usa/lisanhu/tmp/rust-c-ffi-example/bindings.cc
     process_batch:
           6, Generating copyout(lengths[:array.l]) [if not already present]
              Generating implicit copyin(array.data[:]) [if not already present]
              Generating Tesla code
              10, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
           6, Generating implicit copyin(array) [if not already present]
     [100%] Linking CXX static library libfoo.a
  2. Compile the Rust code against libfoo.a, dynamically linking to the libraries nvc, nvcpumath, nvhost, and nvdevice.
  3. Run the code with some data: it runs.
  4. Run the code with PGI_ACC_TIME=1: no timing information is printed.
  5. Run the code under ncu --set full: it shows no kernel information.

I suspect we are linking against the wrong libraries, but I'm not sure. Would you please help me with this problem? Thank you so much!


Solution

  • I worked with Sanhu via the OpenACC Slack channel on this one. There were two issues.

    First, since he's not linking with the PGI/NV compilers, he needs to add the "-gpu=nordc" flag. RDC (relocatable device code) requires a device link step, which is not performed when Rust does the final link.

    Second, since the "data" array is allocated by Rust, it won't be placed in CUDA Unified Memory. Hence he needed to add it to an OpenACC data region with the "process_array" routine.