## PPMC: Hardware Scheduling and Memory Management Support for Multi Accelerators



Tassadaq Hussain, Miquel Pericàs, Nacho Navarro, Eduard Ayguadé
University Polytechnic Catalunya / Barcelona Supercomputing Center / Tokyo Institute of Technology

{tassadaq.hussain, nacho.navarro, eduard.ayguade}@bsc.es {pericas.m.aa}@m.titech.ac.jp



Figure - General Multi Hardware Accelerator Based System

Figure - Stall Time of Generic Vector Processor



## Salient Features of the Programmable Pattern-based Memory Controller

- The PPMC-based system can operate as a stand-alone system, without support from the master core.
- PPMC supports multiple hardware accelerators using an event-driven handshaking methodology.
- The PPMC system improves performance by efficiently prefetching complex and irregular data patterns.
- Because PPMC is lightweight (in terms of logic elements), the system consumes less power.
- Standard C/C++ language calls are supported to identify tasks in software.
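To make the idea of a programmable access pattern concrete, here is a minimal C sketch of how a row or column access could be described as a base address, a stride, and an element count, and then expanded into memory addresses. The `pattern_desc` structure and the `pattern_expand`, `row_pattern`, and `col_pattern` functions are hypothetical names introduced for illustration only; they are not the PPMC register layout or its software interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for a strided access pattern.
   This is an illustrative assumption, not the PPMC descriptor format. */
typedef struct {
    uint32_t base;    /* byte address of the first element */
    uint32_t stride;  /* distance in bytes between consecutive elements */
    uint32_t count;   /* number of elements in the pattern */
} pattern_desc;

/* Expand a descriptor into explicit addresses.
   Writes at most cap addresses and returns how many were written. */
size_t pattern_expand(const pattern_desc *d, uint32_t *out, size_t cap)
{
    size_t n = d->count < cap ? d->count : cap;
    for (size_t i = 0; i < n; i++)
        out[i] = d->base + (uint32_t)i * d->stride;
    return n;
}

/* Row r of a row-major matrix with cols columns of elem-byte values:
   consecutive elements, so the stride is one element. */
pattern_desc row_pattern(uint32_t base, uint32_t cols, uint32_t elem, uint32_t r)
{
    pattern_desc d = { base + r * cols * elem, elem, cols };
    return d;
}

/* Column c of the same matrix: one element per row,
   so the stride is the size of a full row. */
pattern_desc col_pattern(uint32_t base, uint32_t rows, uint32_t cols,
                         uint32_t elem, uint32_t c)
{
    pattern_desc d = { base + c * elem, cols * elem, rows };
    return d;
}
```

For a 4x4 matrix of 4-byte values at address `0x1000`, `col_pattern(0x1000, 4, 4, 4, 1)` describes column 1 as four accesses starting at `0x1004` and spaced 16 bytes apart. A compact descriptor of this kind is what lets a controller fetch a non-contiguous pattern without the processor issuing each address.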





Figure - MicroBlaze based Multi-core Architecture



Figure - PPMC based Multi-core Architecture

| Kernel | Application Description | Data Pattern | Regs, LUTs | OPs |
|--------|-------------------------|--------------|------------|-----|
| Thresholding | An image segmentation application that takes streaming 8-bit pixel data and generates binary output. | Load/Store | 2289, 2339 | 1 |
| Finite Impulse Response | Calculates the weighted sum of the current and past inputs. | Streaming | 3953, 2960 | 31 |
| Fast Fourier Transform | Transforms a time-domain signal into the corresponding frequency-domain signal. | 1D Block | 4977, 2567 | 48 |
| Matrix Multiplication | Output = Row[Vector] × Column[Vector]. | Column and Row Vector | 2925, 1719 | 62 |
| Laplacian Solver | Applies a discrete convolution filter that approximates the second-order derivatives. | 2D Tiling | 3380, 2616 | 17 |
| 3D-Stencil Decomposition | An algorithm that averages nearest-neighbor points (size 8x9x8) in 3D. | 3D Tiling | 6977, 5567 | 37 |

**Table - Test Applications**
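As a software reference for the two simplest kernels in the table, the following C sketch implements thresholding and a finite impulse response filter as the descriptions read. The function names, the strict `>` comparison against the threshold, the integer sample type, and the zero-padding of samples before the first input are all assumptions made for illustration; the accelerator implementations may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Thresholding: streaming 8-bit pixels in, binary (0 or 1) output.
   Assumption: a pixel maps to 1 when it is strictly above the threshold t. */
void threshold_u8(const uint8_t *in, uint8_t *out, size_t n, uint8_t t)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] > t ? 1 : 0;
}

/* FIR: y[i] = sum over k of h[k] * x[i - k], the weighted sum of the
   current and past inputs. Assumption: inputs before x[0] are zero. */
void fir_i32(const int32_t *x, int32_t *y, size_t n,
             const int32_t *h, size_t taps)
{
    for (size_t i = 0; i < n; i++) {
        int32_t acc = 0;
        for (size_t k = 0; k < taps && k <= i; k++)
            acc += h[k] * x[i - k];
        y[i] = acc;
    }
}
```

Both loops read their input once in address order, which matches the Load/Store and Streaming data patterns listed for these kernels.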



Figure - Multi-Accelerator Systems: Application Kernels Execution Time



## Publications

- Reconfigurable Memory Controller with Programmable Pattern Support. Tassadaq Hussain, Miquel Pericàs, Nacho Navarro, Eduard Ayguadé. 5th HiPEAC Workshop on Reconfigurable Computing (WRC 2011).
- Implementation of a Reverse Time Migration Kernel using the HCE High Level Synthesis Tool. Tassadaq Hussain, Miquel Pericàs, Nacho Navarro, Eduard Ayguadé. The 2011 International Conference on Field-Programmable Technology (FPT 2011), IIT Delhi, New Delhi, India, 12-14 December 2011.
- PPMC: A Programmable Pattern based Memory Controller. Tassadaq Hussain, Muhammad Shafiq, Miquel Pericàs, Nacho Navarro, Eduard Ayguadé. ARC 2012, the 8th International Symposium on Applied Reconfigurable Computing, 21-23 March 2012, The Chinese University of Hong Kong (CUHK), Hong Kong.