

## **Centre for High Performance Embedded Systems**

Alok Prakash, Christopher T. Clarke and Thambipillai Srikanthan

## CUSTOM INSTRUCTIONS WITH LOCAL MEMORY ELEMENTS WITHOUT EXPENSIVE DMA TRANSFERS

## **Abstract**

Traditionally, Instruction set extension (ISE) algorithms have treated memory and control flow as invalid operations during custom instruction identification to ensure deterministic latency of these extended instructions. In order to overcome these constraints some work has been done to incorporate local memory for custom instructions with memory operations. Such architectures have invariably relied on the expensive DMA protocol for data transfer. Cachecoherence management poses another challenge in such systems and requires additional hardware and/or software intervention. We propose a novel custom instruction architecture capable of incorporating certain types of memory and control-flow operations. Unlike existing architectures, the proposed design eliminates the need for expensive Direct Memory Access (DMA) transfers and additional cache management sub-systems, thereby saving significant time and energy. Our method is focused mainly on accelerating code segments with static variables as well as the ones allocated on the stack, which are widely prevalent in embedded applications. Experimental results show that the proposed method achieves a substantial performance gain of up to 47% over base processor implementation.

## Proposed Hardware model using Altera NIOS II.



| Table 1. Most frequently executed basic blocks in the SHA application and corresponding arrays/elements accessed |                       |                        |
|------------------------------------------------------------------------------------------------------------------|-----------------------|------------------------|
| Name of Basic Blocks                                                                                             | %age of total runtime | Variables<br>Accessed  |
| BB3                                                                                                              | 38.7                  | Array "W"              |
| BB6                                                                                                              | 12.1                  | Array "W", Elem. "A-E" |
| BB10                                                                                                             | 12.1                  | Array "W", Elem. "A-E" |
| BB13                                                                                                             | 12.1                  | Array "W", Elem. "A-E" |
| BB16                                                                                                             | 12.1                  | Array "W", Elem. "A-E" |



Suitable code segments for hardware implementation





Contact: