**FPL 2012**, 22nd International Conference on Field Programmable Logic and Applications

## **Performance Analysis of Fully-adaptable CRC Accelerators on an FPGA**

**Amila Akagic and Hideharu Amano, Keio University**

Supported by KLL, Keio GCOE





Following this approach, 128 tables could not fit into a moderate-size **FPGA**.

Our main contribution is a methodology for building a new fully-adaptable, high-throughput architecture for CRC on FPGAs. The implementation is based on the table-based algorithm for generating CRCs. Adaptability is achieved with an additional circuit that generates the remainder values for generator polynomials of variable width, up to 64 bits.
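As a software illustration of the table-based algorithm with a variable-width generator polynomial, here is a minimal sketch. It assumes an MSB-first (non-reflected) CRC and widths of 8 to 64 bits. The names `make_table` and `crc` and the `init`/`xor_out` parameters are illustrative and do not come from the poster.

```python
def make_table(poly: int, width: int) -> list[int]:
    """Build the 256-entry remainder table for an MSB-first CRC.

    `poly` is the generator polynomial without its implicit top bit,
    `width` is its degree in bits (assumed 8..64 in this sketch).
    """
    if not 8 <= width <= 64:
        raise ValueError("this sketch supports widths from 8 to 64 bits")
    top = 1 << (width - 1)
    mask = (1 << width) - 1
    table = []
    for byte in range(256):
        rem = byte << (width - 8)          # align the byte with the top of the register
        for _ in range(8):                 # one shift / conditional XOR per input bit
            rem = ((rem << 1) ^ poly) if rem & top else (rem << 1)
            rem &= mask
        table.append(rem)
    return table


def crc(data: bytes, poly: int, width: int, init: int = 0, xor_out: int = 0) -> int:
    """Byte-at-a-time table-driven CRC (MSB-first, no reflection)."""
    table = make_table(poly, width)
    mask = (1 << width) - 1
    rem = init & mask
    for b in data:
        idx = ((rem >> (width - 8)) ^ b) & 0xFF
        rem = ((rem << 8) & mask) ^ table[idx]
    return rem ^ xor_out
```

Changing the CRC standard here only means calling `make_table` with a different `poly` and `width`. That is the software counterpart of regenerating the tables in hardware.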



The **R** module performs a single operation of the **Remainder Generator Unit**. Each operation depends on the result of the previous one, so the operations cannot be executed in parallel.
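The poster does not spell out the internals of the R module, so the sketch below is an assumption. It models one R operation as the standard shift-and-conditional-XOR remainder step, and shows why the steps form a dependency chain. The function names are hypothetical.

```python
def remainder_step(rem: int, poly: int, width: int) -> int:
    """One shift / conditional-XOR step (stands in for one R operation here)."""
    top = 1 << (width - 1)
    mask = (1 << width) - 1
    rem = (rem << 1) ^ poly if rem & top else rem << 1
    return rem & mask


def table_entry(index: int, poly: int, width: int) -> int:
    """Remainder for one byte value: eight steps, each consuming the previous result."""
    rem = index << (width - 8)
    for _ in range(8):
        rem = remainder_step(rem, poly, width)   # cannot start before the prior step ends
    return rem


def slice_tables(poly: int, width: int, n: int) -> list[list[int]]:
    """Tables T0..T(n-1); table k is derived from table k-1, another sequential chain."""
    mask = (1 << width) - 1
    t0 = [table_entry(i, poly, width) for i in range(256)]
    tables = [t0]
    for _ in range(1, n):
        prev = tables[-1]
        tables.append([((v << 8) & mask) ^ t0[v >> (width - 8)] for v in prev])
    return tables
```

In this model, the entry at index `i` of table `k` equals `table_entry(i, ...)` followed by `8 * k` further calls to `remainder_step`, so later tables cannot be produced before earlier ones.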




In our overlapped pipelined implementation, we reduced the number of modules to 2 by re-using previous modules. The space complexity is thereby reduced from *O(nm)* to *O(n)*.





**Figure: Throughput of fully adaptable CRC accelerators.** Throughput (Gbps) versus number of input bits processed at a time (bits), for Slicing-by-N\* [BRAM] and Slicing-by-N\* [logic].

To show the scalability of the accelerator, we implemented five versions capable of processing different input data widths, ranging from 64 to 1024 bits at a time. Our accelerators are 1.65x to 31.64x faster than related work, depending on the width of the data path.
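Processing N bytes per step can be sketched in software as Slicing-by-N, as below. The sketch makes two assumptions: the CRC is MSB-first, and the block is at least as wide as the CRC (`8 * n >= width`). The function names are hypothetical, and the sketch does not model the pipelined hardware.

```python
def build_tables(poly: int, width: int, n: int) -> list[list[int]]:
    """Build n lookup tables; table k covers a byte followed by k zero bytes."""
    mask = (1 << width) - 1
    top = 1 << (width - 1)
    t0 = []
    for byte in range(256):
        rem = byte << (width - 8)
        for _ in range(8):
            rem = ((rem << 1) ^ poly if rem & top else rem << 1) & mask
        t0.append(rem)
    tables = [t0]
    for _ in range(n - 1):
        tables.append([((v << 8) & mask) ^ t0[v >> (width - 8)] for v in tables[-1]])
    return tables


def crc_slicing(data: bytes, poly: int, width: int, n: int, init: int = 0) -> int:
    """Consume n bytes per iteration using n table lookups XORed together."""
    if 8 * n < width:
        raise ValueError("this sketch assumes the block is at least as wide as the CRC")
    tables = build_tables(poly, width, n)
    mask = (1 << width) - 1
    rem = init & mask
    full = len(data) - len(data) % n
    for off in range(0, full, n):
        # Fold the running remainder into the leading bits of the block.
        word = int.from_bytes(data[off:off + n], "big") ^ (rem << (8 * n - width))
        rem = 0
        for k in range(n):                       # the n lookups are mutually independent
            rem ^= tables[k][(word >> (8 * k)) & 0xFF]
    for b in data[full:]:                        # leftover bytes, one at a time
        rem = ((rem << 8) & mask) ^ tables[0][(rem >> (width - 8)) ^ b]
    return rem
```

With `n = 8` the loop consumes 64 bits per iteration, and with `n = 128` it consumes 1024 bits, matching the range of data-path widths reported above. The result does not depend on `n`.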

On the Xilinx Virtex 6 LX550T FPGA board, the accelerators occupy 1-2% of the area and reach a maximum of 289.8 Gbps at 283.1 MHz when the tables are stored in BRAM, or 1.6-14% of the area and 418.8 Gbps at 408.9 MHz when the tables are implemented in logic.
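The reported peak figures are consistent with one input word being accepted per clock cycle. That is an assumption here, not a statement from the poster. A short arithmetic check, with a hypothetical helper name:

```python
def throughput_gbps(width_bits: int, clock_mhz: float) -> float:
    """Peak throughput, assuming one width_bits-wide word is consumed every cycle."""
    return width_bits * clock_mhz / 1000.0   # bits * MHz = Mbps; divide by 1000 for Gbps


bram_peak = throughput_gbps(1024, 283.1)    # 1024-bit data path, tables in BRAM
logic_peak = throughput_gbps(1024, 408.9)   # 1024-bit data path, tables in logic
```

Both results land within about 0.1 Gbps of the reported 289.8 Gbps and 418.8 Gbps. The small difference is most likely due to rounding of the clock frequencies.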

| Algorithm       | Tables | Clock cycles | BRAM (μs) | Logic (μs) |
|-----------------|--------|--------------|-----------|------------|
| Slicing-by-8*   | 8      | 320          | 0.90      | 0.72       |
| Slicing-by-16*  | 16     | 384          | 1.11      | 0.92       |
| Slicing-by-32*  | 32     | 512          | 1.52      | 1.23       |
| Slicing-by-64*  | 64     | 768          | 2.40      | 1.85       |
| Slicing-by-128* | 128    | 1280         | 4.52      | 3.13       |


Time required to re-generate a given number of tables. Re-generation is needed only when the CRC standard changes.
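The re-generation times can be related to clock frequency by dividing cycles by time. The sketch below does this for the table's values. The relation `256 + 8 * N` for the cycle count is an observation about the listed numbers, not a formula given by the authors, and the names are illustrative.

```python
# N tables -> (clock cycles, BRAM time in us, logic time in us), copied from the table.
REGEN = {
    8: (320, 0.90, 0.72),
    16: (384, 1.11, 0.92),
    32: (512, 1.52, 1.23),
    64: (768, 2.40, 1.85),
    128: (1280, 4.52, 3.13),
}


def implied_clock_mhz(cycles: int, time_us: float) -> float:
    """Clock frequency implied by a cycle count and a duration (cycles per us = MHz)."""
    return cycles / time_us


for n, (cycles, bram_us, logic_us) in REGEN.items():
    print(n, cycles,
          round(implied_clock_mhz(cycles, bram_us), 1),
          round(implied_clock_mhz(cycles, logic_us), 1))
```

For Slicing-by-128\*, the implied frequencies are roughly 283 MHz (BRAM) and 409 MHz (logic), which agrees with the clock rates quoted for the 1024-bit accelerator.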

The logic implementation uses significantly more resources, but its critical path is also significantly shorter because BRAM is no longer part of the critical path.

The throughput of the logic implementation is up to 31% higher than that of the **BRAM** implementation, with the maximum throughput reaching 418.8 Gbps.

