gpu memory-bandwidth

How do I get memory bandwidth from memory clock/memory speed?


FYI, here are the specs I got from Nvidia:

http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-680/specifications

http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan/specifications

Note that memory speed and memory clock are the same thing on their website, and both are given in Gbps.

Thanks!


Solution

  • The Titan has a 384-bit bus while the GTX 680 only has a 256-bit bus, hence 50% more memory bandwidth (assuming clock and latencies are identical).

    Edit: I'll try to explain the whole concept a bit more. The following is a simplified model of the factors that determine the performance of RAM (not only on graphics cards).

    Factor A: Frequency

    RAM runs at a clock speed. RAM running at 1 GHz "ticks" 1,000,000,000 (a billion) times a second. With every tick, it can receive or send one bit on every lane. So a theoretical RAM module with only one memory lane running at 1 GHz would deliver 1 gigabit per second; since there are 8 bits to a byte, that means 125 megabytes per second.

    Factor B: "Pump Rate"

    DDR RAM (Double Data Rate) can deliver two bits per tick on every lane, and there are even "quad-pumped" buses that deliver four bits per tick, but I haven't heard of the latter being used on graphics cards.

    Factor C: Bus width.

    RAM doesn't just have one single lane to send data. Even the Intel 4004 had a 4-bit bus. The graphics cards you linked have 256 bus lanes and 384 bus lanes respectively.

    All of the above factors are multiplied to calculate the theoretical maximum at which data can be sent or received:

    **Maximum throughput in bytes per second = Frequency * Pumprate * BusWidth / 8**

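    Now let's do the math for the two graphics cards you linked. They both seem to use the same type of RAM (GDDR5 with a pump rate of 2), both running at 3 GHz. That gives 3 GHz * 2 = 6 Gbps per lane, which is the memory speed figure listed on the spec pages.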

    GTX-680: 3 GHz * 2 * 256 / 8 = 192 GB/s

    GTX-Titan: 3 GHz * 2 * 384 / 8 = 288 GB/s
    

    Factor D: Latency - or reality kicks in

    This factor is a LOT harder to calculate than all of the above combined. Basically, when you tell your RAM "hey, I want this data", it takes a while until it comes up with the answer. This latency depends on a number of things and is really hard to calculate, and it usually results in RAM systems delivering far less than their theoretical maximum. This is where all the timings, prefetching and tons of other stuff come into the picture. Since latency can't be boiled down to a single number where higher translates to "better", marketing mostly focuses on the other factors. And in case you wondered, this is mostly where GDDR5 differs from the DDR3 you've got on your mainboard.
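
If you want to check the arithmetic yourself, here is a minimal Python sketch of the throughput formula from this answer (Frequency * Pumprate * BusWidth / 8). The function name `peak_bandwidth_bytes_per_s` and its parameter names are made up for illustration; the inputs are the 3 GHz clock, pump rate of 2, and bus widths assumed above. It only gives the theoretical peak and ignores latency (Factor D) entirely.

```python
def peak_bandwidth_bytes_per_s(frequency_hz: float, pump_rate: int, bus_width_bits: int) -> float:
    """Theoretical peak throughput: Frequency * Pumprate * BusWidth / 8."""
    bits_per_second = frequency_hz * pump_rate * bus_width_bits
    return bits_per_second / 8  # 8 bits to a byte


# Factor A example: one lane at 1 GHz, one bit per tick.
single_lane = peak_bandwidth_bytes_per_s(1e9, 1, 1)

# Values assumed in the answer: 3 GHz, pump rate 2, 256 vs. 384 lanes.
gtx_680 = peak_bandwidth_bytes_per_s(3e9, 2, 256)
gtx_titan = peak_bandwidth_bytes_per_s(3e9, 2, 384)

print(f"single lane: {single_lane / 1e6:.0f} MB/s")
print(f"GTX 680:     {gtx_680 / 1e9:.0f} GB/s")
print(f"GTX Titan:   {gtx_titan / 1e9:.0f} GB/s")
print(f"Titan / 680: {gtx_titan / gtx_680:.2f}x")
```

Since clock and pump rate are the same for both cards, the ratio at the end depends only on the bus width, which is where the 50% figure comes from.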