xilinx crc vivado

Is CRC Calculation Faster on Xilinx Alveo U280 FPGA Using a Custom Algorithm or a Lookup Table?


I am working on a project where I need to implement CRC (Cyclic Redundancy Check) on a Xilinx Alveo U280 FPGA. I am considering two approaches for CRC calculation and would like to understand which one would be faster in terms of performance:

Custom Algorithm: Implementing the CRC calculation using custom logic that leverages the parallel processing capabilities of the FPGA.
Lookup Table: Precomputing the CRC values for all possible inputs and storing them in a lookup table for quick retrieval.

Here are the details and constraints of my project:

The FPGA model is Xilinx Alveo U280.
The FPGA has a sufficient amount of logic resources and memory.
The data sizes can vary, ranging from small (8-bit) to large (potentially multi-kilobyte streams).
Speed is a critical factor, and I need the CRC computation to be as fast as possible.
Memory usage should be efficient, but I am willing to allocate a reasonable amount of memory for performance gains.

I would appreciate insights on the following points:

Which approach is generally faster for CRC computation on an FPGA, specifically the Xilinx Alveo U280?

How do the two methods compare in terms of scalability and resource usage on this FPGA?

Are there any hybrid approaches or optimizations that could combine the benefits of both methods?

Any advice, examples, or references to relevant resources would be greatly appreciated. Thank you!


Solution

  • I'm not seeing how they are not both custom logic. Everything in an FPGA is custom logic.

    What you might mean by your first alternative would be a classic shift-register implementation, which would process one bit of input per cycle.

    For your second alternative, you could implement a table lookup that processes eight bits of input per cycle, with a table of 256 entries, each as wide as the number of bits in the CRC.

    You need to be quantitative about the speed you require. "As fast as possible" is not a valid requirement. Is one bit per cycle fast enough? If not, how about eight? If not that, then what does your application require?
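
To make the two alternatives concrete, here is a minimal software model of both, not HDL and not a definitive implementation. It assumes the common reflected CRC-32 (polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF), since the question does not name a polynomial; the function names are illustrative only. Each pass of the inner loop in `crc32_bitwise` stands in for one clock of the shift register, and each pass of the loop in `crc32_table` stands in for one clock of the 256-entry lookup.

```python
# Software model of the two approaches. The CRC-32 parameters are an
# assumption for illustration; substitute the ones your protocol requires.
POLY = 0xEDB88320  # reflected CRC-32 polynomial (assumed)


def crc32_bitwise(data: bytes) -> int:
    """Shift-register model: consumes one input bit per step (LSB first)."""
    crc = 0xFFFFFFFF
    for byte in data:
        for i in range(8):
            bit = (byte >> i) & 1
            feedback = (crc ^ bit) & 1  # XOR of register LSB and input bit
            crc >>= 1
            if feedback:
                crc ^= POLY
    return crc ^ 0xFFFFFFFF


def make_table() -> list:
    """Build the 256-entry table; each entry is as wide as the CRC (32 bits)."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = make_table()


def crc32_table(data: bytes) -> int:
    """Table-lookup model: consumes eight input bits per step."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    msg = b"123456789"
    print(hex(crc32_bitwise(msg)), hex(crc32_table(msg)))
```

Both functions return the same value for any input; the only difference is how many input bits are consumed per step, which is the quantity to match against your required throughput.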