This is a fairly simple question, but Googling doesn't seem to turn up the answer.
What I want to know is: if I have two identical GPU cards capable of running CUDA, can my kernel span both cards? Or is it bound to one card or the other? That is, is CUDA presented with the entire set of available GPU cores, or just the ones on the card the kernel runs on?
If so, is there anything special I need to know to make it happen, and are there any examples worth knowing about beyond the CUDA SDK?
Target language is of course C/C++.
A single CUDA kernel launch is bound to a single GPU. In order to use multiple GPUs, multiple kernel launches will be required.
The CUDA runtime API targets whichever device is currently selected: any given kernel launch goes to the device most recently selected with cudaSetDevice().
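For example, a minimal sketch of the per-device launch pattern might look like this (the scale kernel, the problem size, and the MAX_GPUS bound are assumptions for illustration only, not from the question):

    #include <cstdio>
    #include <cuda_runtime.h>

    #define MAX_GPUS 8  // assumed upper bound for this sketch

    // Trivial example kernel: doubles each element in place.
    __global__ void scale(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;
    }

    int main()
    {
        int deviceCount = 0;
        cudaGetDeviceCount(&deviceCount);
        if (deviceCount > MAX_GPUS) deviceCount = MAX_GPUS;

        const int n = 1 << 20;
        float *d_data[MAX_GPUS];

        // One launch per device: cudaSetDevice() directs subsequent
        // runtime calls (allocation, kernel launch) at that GPU.
        for (int dev = 0; dev < deviceCount; ++dev) {
            cudaSetDevice(dev);
            cudaMalloc(&d_data[dev], n * sizeof(float));
            scale<<<(n + 255) / 256, 256>>>(d_data[dev], n);
        }

        // Kernel launches are asynchronous; synchronize each device,
        // then release its memory.
        for (int dev = 0; dev < deviceCount; ++dev) {
            cudaSetDevice(dev);
            cudaDeviceSynchronize();
            cudaFree(d_data[dev]);
        }

        printf("Launched on %d device(s)\n", deviceCount);
        return 0;
    }

Because the launches are asynchronous with respect to the host, the first loop effectively starts the kernel on all GPUs concurrently; each device still only ever sees its own cores and its own memory.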
Examples of multi-GPU programming are given in the CUDA samples simpleMultiGPU and simpleP2P (multi-GPU with peer-to-peer transfers).