In the DGX-1 system (8x V100), there are two types of NVLink: NVLink-V1 and NVLink-V2.
Is there any way for us to explicitly specify which type of NVLink is used for P2P and collective communication?
There aren't two types of NVLink in a single machine. The difference here is in the number of links that are bonded together.
The NV1 designation indicates that those GPUs (on that connection path) have single-link connectivity.
The NV2 designation indicates that those GPUs have double-link connectivity (i.e. twice the bandwidth): two links are "bonded" together.
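The NV1/NV2 labels are the ones `nvidia-smi topo -m` prints in its connectivity matrix, and the number after "NV" is the count of bonded links. As a rough sketch of reading that count programmatically: the matrix below is a made-up 4-GPU fragment for illustration (not real DGX-1 output), the function name is hypothetical, and the parser assumes the extra columns and legend of the real output have already been stripped.

```python
import re

# Illustrative fragment only: a made-up 4-GPU matrix in the style of
# `nvidia-smi topo -m`, NOT captured from a real DGX-1.
SAMPLE_TOPO = """\
        GPU0    GPU1    GPU2    GPU3
GPU0     X      NV1     NV1     NV2
GPU1    NV1      X      NV2     NV1
GPU2    NV1     NV2      X      NV2
GPU3    NV2     NV1     NV2      X
"""


def parse_nvlink_counts(topo_text):
    """Return {(row_gpu, col_gpu): bonded_link_count} for every NV# cell.

    Hypothetical helper: assumes the first non-empty line is a header of
    GPU names and each following line is '<gpu> <cell> <cell> ...'.
    """
    lines = [ln for ln in topo_text.splitlines() if ln.strip()]
    header = lines[0].split()
    counts = {}
    for line in lines[1:]:
        fields = line.split()
        row_gpu, cells = fields[0], fields[1:]
        for col_gpu, cell in zip(header, cells):
            match = re.fullmatch(r"NV(\d+)", cell)
            if match:  # skips "X" and any non-NVLink designation
                counts[(row_gpu, col_gpu)] = int(match.group(1))
    return counts


links = parse_nvlink_counts(SAMPLE_TOPO)
print("GPU0 <-> GPU1 bonded links:", links[("GPU0", "GPU1")])
print("GPU0 <-> GPU3 bonded links:", links[("GPU0", "GPU3")])
```

This only reports what the hardware provides; it does not give you a way to select a path.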
You cannot choose one or the other. This is not controllable; it's a function of the hardware design.
If NCCL chooses to transfer data between two GPUs that have NV2 connectivity, that transfer can run at up to twice the bandwidth of an NV1 path.
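To put rough numbers on the difference, here is a back-of-the-envelope sketch. It assumes 25 GB/s per link per direction, which is the peak figure NVIDIA quotes for NVLink on V100; the function names are hypothetical, and the result is a theoretical peak, not the throughput NCCL will actually achieve.

```python
# Assumption: peak per-link, per-direction NVLink bandwidth on V100 (GB/s).
PER_LINK_GBPS = 25.0


def peak_unidirectional_gbps(bonded_links, per_link_gbps=PER_LINK_GBPS):
    """Theoretical peak one-way bandwidth for an NV<bonded_links> path."""
    if bonded_links < 1:
        raise ValueError("bonded_links must be at least 1")
    return bonded_links * per_link_gbps


def ideal_transfer_ms(num_bytes, bonded_links):
    """Ideal one-way transfer time in milliseconds, ignoring latency and protocol overhead."""
    bytes_per_second = peak_unidirectional_gbps(bonded_links) * 1e9
    return num_bytes / bytes_per_second * 1e3


payload = 1_000_000_000  # 1 GB, chosen only for illustration
nv1_ms = ideal_transfer_ms(payload, bonded_links=1)
nv2_ms = ideal_transfer_ms(payload, bonded_links=2)
print(f"NV1: {nv1_ms:.1f} ms, NV2: {nv2_ms:.1f} ms")
```

Measured numbers will be lower than these peaks, but the 2:1 ratio between NV2 and NV1 paths is the point.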
There is nothing for you to set or control here.
A general principle when using NCCL is that you specify the collective you want to perform, and NCCL will use the existing fabric to get that collective done as quickly as possible.