fpga

Is it possible to add external SRAM to an FPGA card?


When deploying a deep learning model on an FPGA acceleration card (such as an AMD Alveo U50), the onboard SRAM may be too small, even though its bandwidth is significantly higher than that of HBM.

Given my limited familiarity with the FPGA tech stack, I wonder if attaching additional external SRAM to the FPGA is feasible. What are the potential trade-offs or limitations in terms of performance, scalability, cost, and workflow?


Solution

  • You're missing the fact that the SRAM is not on-board but on-chip. The SRAM blocks are woven into the FPGA fabric, and they only appear "significantly faster" than HBM when the combined width of all the blocks used in the pipeline is very large, like really, unreasonably large. The terabytes per second you may find in some flyers is the total width of all the RAM blocks times the design clock speed; it is not the throughput of a single SRAM block. On their own such blocks are not that fast; in fact they are a few times slower than what external DDR (2/3/whatever) memory can deliver. On-chip RAM blocks are great only for their fixed latency and the short routes to the surrounding logic blocks that use them. No big external memory can provide that, because it has to be built from dynamic RAM, with its refresh needs and complex protocols on top.

    The next best thing to on-chip SRAM is external SDRAM, preferably with a very wide interface, and that's what HBM is. It is external to the FPGA chip, but it is located on the same silicon substrate as the FPGA die, and it is better than anything on-board. If a memory is not on the same substrate, its bandwidth is limited by the electrical effects in the PCB traces and by the number of traces you have to lay down on the PCB.

    Now back to the original question: is it feasible to add external SRAM? No, it is not. That SRAM will be slower than individual on-chip SRAM blocks, and the interface width will be severely limited by the number of available pins on the FPGA. If you had any free pins at all, you'd want to push the speed of that width-limited interface as far as the physical limits allow, and that's what DDR3/4/xx memories do.
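To put rough numbers on the "width times clock" argument, here is a back-of-the-envelope sketch. The block width, block count, clock rates and the external SRAM part are all illustrative assumptions, not figures from any datasheet. Only the DDR4-2400 line follows the standard 64-bit, double-data-rate arithmetic. The helper name `bandwidth_gbytes_per_s` is made up for this example.

```python
def bandwidth_gbytes_per_s(width_bits, clock_mhz, transfers_per_clock=1):
    """Peak bandwidth in GB/s = interface width * clock * transfers per clock."""
    bits_per_second = width_bits * clock_mhz * 1e6 * transfers_per_clock
    return bits_per_second / 8 / 1e9


# Assumed: one on-chip RAM block with a 72-bit port clocked at 300 MHz.
single_block = bandwidth_gbytes_per_s(72, 300)

# Assumed: 1000 such blocks all read in parallel on every clock cycle.
# This is the kind of aggregate figure that ends up in marketing flyers.
all_blocks = 1000 * single_block

# One 64-bit DDR4-2400 channel: 1200 MHz clock, two transfers per clock.
ddr4_channel = bandwidth_gbytes_per_s(64, 1200, transfers_per_clock=2)

# Assumed: a pin-limited external SRAM, 36 bits wide at 250 MHz, single data rate.
external_sram = bandwidth_gbytes_per_s(36, 250)

print(f"single on-chip block : {single_block:8.3f} GB/s")
print(f"1000 blocks combined : {all_blocks:8.3f} GB/s")
print(f"one DDR4-2400 channel: {ddr4_channel:8.3f} GB/s")
print(f"external SRAM (36b)  : {external_sram:8.3f} GB/s")
```

With these assumed numbers a single block delivers a few GB/s, less than one DDR4 channel, and only the sum over a thousand blocks reaches the terabytes-per-second range. The hypothetical external SRAM ends up below both, because its width is capped by the pin count.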