Architectural Insights: Comparing Weight Stationary and Output Stationary Systolic Arrays for Efficient Computation
Abstract
This paper compares two prevalent systolic array dataflows: weight stationary and output stationary. Systolic arrays use interconnected processing elements (PEs) to compute in parallel, which makes them well suited to digital signal processing, image processing, and machine learning. We focus on their implementation of 2D matrix multiplication, a fundamental operation in neural networks. Both architectures were described in Verilog HDL and simulated in the Xilinx Vivado Design Suite 2019, using a 3x1 input matrix and a 3x3 weight matrix. The simulations confirmed the functionality of both architectures, with output matrices matching the expected results. The weight stationary design minimized data movement, while the output stationary design improved throughput through effective input data reuse. The critical path delay was approximately 8.8 ns, corresponding to a maximum frequency of about 113 MHz, and it remained constant as the number of PEs increased. Overall, these results validate the effectiveness of both architectures for high-performance matrix operations and offer insights for future systolic array designs. © 2024 IEEE.
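To make the two dataflows concrete, the following is a minimal behavioral sketch in Python, not the Verilog design evaluated in the paper. It assumes the 3x3 weight matrix multiplies the 3x1 input vector. The function names and the sample values of `W` and `x` are hypothetical and chosen only for illustration. The model is functional rather than cycle-accurate: it shows which operand stays inside a PE in each scheme and does not model pipeline skew or timing.

```python
def weight_stationary(W, x):
    """Weight stationary: PE(i, k) holds W[i][k] for the whole computation.
    Inputs stream past the PEs and partial sums are forwarded from PE to PE."""
    y = []
    for i in range(len(W)):
        psum = 0  # partial sum entering the first PE of row i
        for k in range(len(x)):
            # PE(i, k): multiply its fixed weight by the streamed input,
            # add the incoming partial sum, and pass the result onward.
            psum = psum + W[i][k] * x[k]
        y.append(psum)  # the partial sum leaving the last PE is y[i]
    return y


def output_stationary(W, x):
    """Output stationary: PE i keeps the accumulator for y[i] in place.
    On each step k, the input x[k] and the weights W[i][k] stream in."""
    acc = [0] * len(W)  # one local accumulator per PE
    for k in range(len(x)):      # successive time steps
        for i in range(len(W)):  # PEs that would run in parallel in hardware
            acc[i] += W[i][k] * x[k]  # x[k] is reused by every PE
    return acc


def max_frequency_mhz(critical_path_ns):
    """Maximum clock frequency in MHz implied by a critical path delay in ns."""
    return 1000.0 / critical_path_ns


# Hypothetical test data (not taken from the paper).
W = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
x = [1, 2, 3]

y_ws = weight_stationary(W, x)
y_os = output_stationary(W, x)
f_max = max_frequency_mhz(8.8)  # 8.8 ns is the delay reported in the abstract
```

Both functions return the same product for the same operands. They differ only in whether the weight or the accumulating output is the value that stays inside a PE. The last helper applies the relation f_max = 1 / t_critical, which is how a delay of about 8.8 ns corresponds to roughly 113 MHz.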