
A Reconfigurable Multiprocessor System for DSP Behavioral Simulation

Wook Kuh, Ph.D. 1990 (advisor: Jan Rabaey)

A major part of the design effort for DSP systems is devoted to algorithmic verification and specification. Behavioral simulation of DSP algorithms on a programmable computer provides the flexibility to develop the algorithms and enables a short design cycle. However, the simulation often requires high computational throughput and must process large amounts of data, so it takes too long or is too costly on a general-purpose computer. Therefore, a dedicated simulation engine called SMART has been developed and is presented in this report. It is a multiprocessor architecture optimized for real-time behavioral simulation of Digital Signal Processing (DSP) systems. The first prototype, containing 10 processors, is currently operational with a peak performance of 120 MFLOPS. The SMART system features a Configurable Bus and a Bypass Unit to trade off overall communication bandwidth against latency by taking advantage of the local communication between processors. The system performance is further improved by a Distributed Shared Memory system, which lets the communication latency overlap with the computation time of the processors. Barriers, locks and events are supported in hardware to minimize the synchronization overhead. Benchmarks demonstrate that the SMART architecture achieves the targeted low communication and synchronization overhead. In a SMART simulation environment, the designer can describe the algorithms in a high-level language, either C or Silage. The C programming environment, which requires the partitioning information to be given in the program, is currently available. A high-level software system, based on Silage, is under development to automatically schedule the algorithmic description onto the SMART processor array with balanced loading and efficient use of the communication system.
Performance of the actual SMART system was measured for typical DSP programs using floating-point operations. The measurements show an average speedup of 76 times over a SUN 3/60 and 29 times over a SUN SPARC Station 1. With extensive use of library routines, the speedup can easily be doubled relative to these results. The performance is expected to increase even further when the system is upgraded from 120 MFLOPS to 200 MFLOPS.
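
The figures quoted above can be combined into a few back-of-the-envelope numbers. The sketch below is only illustrative arithmetic on the values stated in the abstract; the variable names are hypothetical, and two assumptions are not from the report: that the peak rate is split evenly across the 10 processors, and that the speedup would scale linearly with peak MFLOPS after the upgrade (an optimistic upper bound, since communication overhead may not scale the same way).

```python
# Figures quoted in the abstract.
PEAK_MFLOPS = 120.0              # peak of the 10-processor prototype
NUM_PROCESSORS = 10
SPEEDUP_VS_SUN_3_60 = 76.0       # average measured speedup
SPEEDUP_VS_SPARCSTATION_1 = 29.0 # average measured speedup
UPGRADED_PEAK_MFLOPS = 200.0     # planned upgrade

# Assumption: the peak rate is shared evenly by all processors.
per_processor_peak = PEAK_MFLOPS / NUM_PROCESSORS  # 12.0 MFLOPS

# Rough ratio of the two averages: how much faster the SPARC Station 1
# appears to be than the SUN 3/60 on these programs.
sparc_vs_sun3 = SPEEDUP_VS_SUN_3_60 / SPEEDUP_VS_SPARCSTATION_1

# Assumption (hypothetical): speedup scales linearly with peak MFLOPS.
scale = UPGRADED_PEAK_MFLOPS / PEAK_MFLOPS
projected_vs_sun_3_60 = SPEEDUP_VS_SUN_3_60 * scale
projected_vs_sparcstation_1 = SPEEDUP_VS_SPARCSTATION_1 * scale

print(f"Peak per processor: {per_processor_peak:.1f} MFLOPS")
print(f"SPARC Station 1 vs SUN 3/60: about {sparc_vs_sun3:.2f}x")
print(f"Projected speedup over SUN 3/60: about {projected_vs_sun_3_60:.0f}x")
print(f"Projected speedup over SPARC Station 1: about {projected_vs_sparcstation_1:.0f}x")
```

These projections are not results from the report; they only show what the quoted numbers would imply under the stated assumptions.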