google-cloud-platform nvidia tesla

Google Cloud Tesla K80: only one device showing up?


I've set up an instance on Google Cloud with the following specs: 4 vCPUs, 15 GB memory, 1 Tesla K80 GPU.

The Tesla K80 consists of 2 GPUs, and each should show up as a separate device in NVIDIA's logs. However, when I run nvidia-smi in the shell, it shows only one.

Does anyone know how to solve this issue? Is this because my cloud GPU quota is one and hence only one device is being used?

Additional logs:

==============NVSMI LOG==============

Timestamp                           : Tue Mar 13 16:05:42 2018
Driver Version                      : 390.30

Attached GPUs                       : 1
GPU 00000000:00:04.0
    Product Name                    : Tesla K80
    Product Brand                   : Tesla
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Disabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 1920
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 0320717075175
    GPU UUID                        : GPU-a3a146ad-aed1-d5ef-1e76-2565c1e20a13
    Minor Number                    : 0
    VBIOS Version                   : 80.21.25.00.01
    MultiGPU Board                  : No
    Board ID                        : 0x4
    GPU Part Number                 : 900-22080-6300-001
    Inforom Version
        Image Version               : 2080.0200.00.04
        OEM Object                  : 1.1
        ECC Object                  : 3.0
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization mode         : Pass-Through
    PCI
        Bus                         : 0x00
        Device                      : 0x04
        Domain                      : 0x0000
        Device Id                   : 0x102D10DE
        Bus Id                      : 00000000:00:04.0
        Sub System Id               : 0x106C10DE
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays since reset         : 0
        Tx Throughput               : N/A
        Rx Throughput               : N/A
    Fan Speed                       : N/A
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : N/A
            HW Power Brake Slowdown : N/A
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 11441 MiB
        Used                        : 10930 MiB
        Free                        : 511 MiB
    BAR1 Memory Usage
        Total                       : 16384 MiB
        Used                        : 3 MiB
        Free                        : 16381 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 54 %
        Memory                      : 1 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : Enabled
        Pending                     : Enabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
        Aggregate
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit            
                Device Memory       : 7
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 7
    Retired Pages
        Single Bit ECC              : 0
        Double Bit ECC              : 1
        Pending                     : No
    Temperature
        GPU Current Temp            : 56 C
        GPU Shutdown Temp           : 93 C
        GPU Slowdown Temp           : 88 C
        GPU Max Operating Temp      : N/A
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A
    Power Readings
        Power Management            : Supported
        Power Draw                  : 66.18 W
        Power Limit                 : 149.00 W
        Default Power Limit         : 149.00 W
        Enforced Power Limit        : 149.00 W
        Min Power Limit             : 100.00 W
        Max Power Limit             : 175.00 W
    Clocks
        Graphics                    : 562 MHz
        SM                          : 562 MHz
        Memory                      : 2505 MHz
        Video                       : 540 MHz
    Applications Clocks
        Graphics                    : 562 MHz
        Memory                      : 2505 MHz
    Default Applications Clocks
        Graphics                    : 562 MHz
        Memory                      : 2505 MHz
    Max Clocks
        Graphics                    : 875 MHz
        SM                          : 875 MHz
        Memory                      : 2505 MHz
        Video                       : 540 MHz
    Max Customer Boost Clocks
        Graphics                    : N/A
    Clock Policy
        Auto Boost                  : On
        Auto Boost Default          : On

Solution

  • Digging through the cloud platform website, I found an answer to this. Instead of removing the question, I am leaving this here, in case someone else faces the same issue.

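    The Tesla K80 is '1 GPU board' consisting of 2 GPUs. If the number of GPUs on your VM instance on Google Cloud Platform is set to 1, you are allocated 1 GPU, that is, half of a board. So nvidia-smi reporting a single device ("Attached GPUs : 1") is expected for this configuration.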

    Source: https://cloud.google.com/compute/docs/gpus/
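
If you want to check from a script what the VM actually sees, here is a rough Python sketch that pulls the "Attached GPUs", "Product Name" and "MultiGPU Board" fields out of `nvidia-smi -q` output like the log in the question. The function names `summarize_nvsmi` and `query_local_gpus` are my own, not part of any NVIDIA or Google tooling, and the parsing assumes the field labels look the way they do in the log above (driver 390.30); other driver versions may format things differently.

```python
import re
import subprocess


def summarize_nvsmi(log_text):
    """Extract the GPU count and per-GPU board fields from `nvidia-smi -q` text.

    Assumes the field labels match the log shown in the question.
    """
    attached = re.search(r"^Attached GPUs\s*:\s*(\d+)", log_text, re.MULTILINE)
    names = re.findall(r"^\s*Product Name\s*:\s*(.+?)\s*$", log_text, re.MULTILINE)
    multi = re.findall(r"^\s*MultiGPU Board\s*:\s*(\w+)", log_text, re.MULTILINE)
    return {
        "attached_gpus": int(attached.group(1)) if attached else None,
        "product_names": names,    # one entry per GPU section in the log
        "multigpu_board": multi,   # "Yes"/"No" per GPU section
    }


def query_local_gpus():
    """Run `nvidia-smi -q` locally; return None if the tool is missing or fails."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "-q"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return summarize_nvsmi(result.stdout)


if __name__ == "__main__":
    print(query_local_gpus())
```

On the instance from the question, this should report one attached GPU named Tesla K80, matching the log above.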