# Emerging Computing Technology Laboratory at SJTU

Approximate Computing
Three important goals of VLSI design are reducing circuit area, improving circuit frequency, and reducing power consumption, all of which are achieved under the basic assumption that the circuit correctly implements the specified function. However, many applications widely used today, such as signal processing, pattern recognition, and machine learning, do not require perfect computation. Instead, results with small errors are still acceptable. A new design paradigm, known as approximate computing, is recently proposed to design circuits for those error-tolerant applications. Exploiting the error tolerance of applications, it deliberately sacrifices a small amount of accuracy to achieve improvement in area, performance, and power consumption.

An example of approximate computing is shown in Fig. 1. Fig. 1(a) shows the Karnaugh map of an accurate 2-bit multiplier. If we change the output "1001" in the red circle in Fig. 1(a) to "111", we obtain the Karnaugh map of an approximate 2-bit multiplier, as shown in Fig. 1(b). The circuit for the accurate multiplier and that for the approximate multiplier are shown in Fig. 1(c) and 1(d), respectively. Notice that by introducing a small amount of inaccuracy, we reduce both the area and the delay of the original multiplier significantly.

 (a) Karnaugh map of accurate multiplier (b) Karnaugh map of approximate multiplier (c) Circuit of accurate multiplier (d) Circuit of approximate multiplier

 Fig. 1. Accurate and approximate designs of 2-bit multiplier from the paper "Trading accuracy for power with an underdesigned multiplier architecture" by Kulkarni et al., 2011.

We study the following fundamental problems related to approximate computing:
1. Approximate logic synthesis. We develop algorithms to automatically synthesize approximate circuits. Specifically, given a target function and an error specification, the algorithm will produce an optimal circuit which is an approximation to the target function satisfying the error specification.
2. The design, synthesis, and analysis of approximate arithmetic circuits. Arithmetic circuits, including adders and multipliers, are key building blocks of many applications. We study how to design and synthesize high-performance energy-efficient approximate adders and multipliers by deliberately introducing errors. We also study how to analyze the error of these approximate circuits.
3. The design of approximate computing architectures. We study how to design novel architectures exploiting approximate computing to improve energy efficiency.

 Selected Publications on Approximate Logic Synthesis Chang Meng, Zhuangzhuang Zhou, Yue Yao, Shuyang Huang, Yuhang Chen, and Weikang Qian, "HEDALS: Highly efficient delay-driven approximate logic synthesis," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2023. (Source code: ) Xuan Wang, Sijun Tao, Jingjing Zhu, Yiyu Shi, and Weikang Qian, "AccALS: Accelerating approximate logic synthesis by selection of multiple local approximate changes," in Proceedings of the 2023 ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 2023. Chang Meng, Jiajun Sun, Yuqi Mai, and Weikang Qian, "MECALS: A maximum error checking technique for approximate logic synthesis," in Proceedings of the 2023 Design, Automation, and Test in Europe Conference (DATE), Antwerp, Belgium, 2023. (Source code: ) Chang Meng, Xuan Wang, Jiajun Sun, Sijun Tao, Wei Wu, Zhihang Wu, Leibin Ni, Xiaolong Shen, Junfeng Zhao, and Weikang Qian, "SEALS: Sensitivity-driven efficient approximate logic synthesis," in Proceedings of the 2022 ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 2022. (The first two authors contributed equally.) Sanbao Su, Chang Meng, Fan Yang, Xiaolong Shen, Leibin Ni, Wei Wu, Zhihang Wu, Junfeng Zhao, and Weikang Qian, "VECBEE: A versatile efficiency-accuracy configurable batch error estimation method for greedy approximate logic synthesis," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022. (The first two authors contributed equally.) (Source code: ) Chang Meng, Weikang Qian, and Alan Mishchenko, "ALSRAC: approximate logic synthesis by resubstitution with approximate care set," in Proceedings of the 2020 Design Automation Conference (DAC), virtual event, 2020, pp. 187:1-187:6. (Source code: ) Yi Wu and Weikang Qian, "ALFANS: Multi-level approximate logic synthesis framework by approximate node simplification," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 7, pp. 1470-1483, 2020. Zhuangzhuang Zhou, Yue Yao, Shuyang Huang, Sanbao Su, Chang Meng, and Weikang Qian, "DALS: Delay-driven approximate logic synthesis," in Proceedings of the 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Diego, CA, USA, 2018, pp. 86:1-86:7. (Source code: ) Sanbao Su, Yi Wu, and Weikang Qian, "Efficient batch statistical error estimation for iterative multi-level approximate logic synthesis," in Proceedings of the 2018 Design Automation Conference (DAC), San Francisco, CA, USA, 2018, pp. 54:1-54:6. (Source code: ) Yue Yao, Shuyang Huang, Chen Wang, Yi Wu, and Weikang Qian, "Approximate disjoint bi-decomposition and its application to approximate logic synthesis," in Proceedings of the 35th IEEE International Conference on Computer Design (ICCD), Boston, MA, USA, 2017, pp. 517-524. Yi Wu, Chuyu Shen, Yi Jia, and Weikang Qian, "Approximate logic synthesis for FPGA by wire removal and local function change," in Proceedings of the 2017 Asia and South Pacific Design Automation Conference (ASPDAC), Chiba, Japan, 2017, pp. 163-169. Yi Wu and Weikang Qian, "An efficient method for multi-level approximate logic synthesis under error rate constraint," in Proceedings of the 2016 Design Automation Conference (DAC), Austin, TX, USA, 2016, pp. 128:1-128:6.

 Selected Publications on the Design, Synthesis, and Analysis of Approximate Arithmetic Circuits Xuan Wang and Weikang Qian, "MinAC: Minimal-area approximate compressor design based on exact synthesis for approximate multipliers," in Proceedings of the 2022 IEEE International Symposium on Circuits and Systems (ISCAS), Austin, TX, USA, 2022. (Source code: ) Weihua Xiao, Cheng Zhuo, and Weikang Qian, "OPACT: Optimization of approximate compressor tree for approximate multiplier," in Proceedings of the 2022 Design, Automation, and Test in Europe Conference (DATE), Antwerp, Belgium, 2022. Yi Wu, You Li, Xiangxuan Ge, Yuan Gao, and Weikang Qian, "An efficient method for calculating the error statistics of block-based approximate adders," in IEEE Transactions on Computers, vol. 68, no. 1, pp. 21-38, 2019. (The first three authors contributed equally.) Junjun Hu, Zhijing Li, Meng Yang, Zixin Huang, and Weikang Qian, "A high-accuracy approximate adder with correct sign calculation," in Integration, the VLSI Journal, vol. 65, pp. 370-388, 2019. Junjun Hu and Weikang Qian, "A new approximate adder with low relative error and correct sign calculation," in Proceedings of the 2015 Design, Automation, and Test in Europe Conference (DATE), Grenoble, France, 2015, pp. 1449-1454.

 Selected Publications on the Design of Approximate Computing Architectures Xingyue Qian, Chang Meng, Xiaolong Shen, Junfeng Zhao, Leibin Ni, and Weikang Qian, "High-accuracy low-power reconfigurable architectures for decomposition-based approximate lookup table," in Proceedings of the 2023 Design, Automation, and Test in Europe Conference (DATE), Antwerp, Belgium, 2023. Chang Meng, Zhiyuan Xiang, Niyiqiu Liu, Yixuan Hu, Jiahao Song, Runsheng Wang, Ru Huang, and Weikang Qian, "DALTA: A decomposition-based approximate lookup table architecture," in Proceedings of the 2021 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Munich, Germany, 2021.

Stochastic Computing
Traditional arithmetic circuits operate on numbers encoded by binary radix, which is a deterministic way to represent numerical values with zeros and ones. Fundamentally different from the binary radix, stochastic encoding is another way to represent numerical values by logical zeros and ones.  In such a encoding, a real value p in the unit interval is represented by a sequence of N random bits X1, X2, ..., XN, with each Xi having probability p of being one and probability (1-p) of being zero, i.e., P(Xi = 1) = p and P(Xi = 0) = 1-pFig. 2(a) shows an example of a stochastic bit stream encoding the value 5/8.

Since the random sequences are composed of binary digits, we can apply digital circuits to process them. Thus, instead of mapping Boolean values into Boolean values, a digital circuit now maps real probability values into real probability values. We refer to this type of computation as stochastic computing. Fig. 2(b) illustrates an AND gate performing multiplication on stochastic bit streams.

 Fig. 2. Stochastic encoding and computation on stochastic encoding: (a) A stochastic bit stream encoding the value x = 5/8; (b) An AND gate multiplying two values encoded by two input stochastic bit streams.

We study the following fundamental problems related to stochastic computing:
1. The synthesis of stochastic computing circuits. In the earlier works on stochastic computing, a number of stochastic circuits are designed manually, but they are restricted to basic computation units such as adder and multiplier. We explore algorithms to automatically synthesize stochastic circuits for arbitrary arithmetic functions.
2. The generation of stochastic bit streams. Stochastic computing relies on stochastic number generators (SNGs) to generate input stochastic bit streams of the desired probabilities. However, they need far more area and power consumption than the stochastic computing core, offsetting the latter's advantage. We explore solutions to reduce the area and power consumption of SNGs.
3. Architectures and application of stochastic computing. We explore architecture for stochastic computing. We apply such computational paradigm to different applications, such as image processing and numerical integration.

 Selected Publications on the Synthesis of Stochastic Computing Circuits Xuan Wang, Zhufei Chu, and Weikang Qian, "MinSC: An exact synthesis-based method for minimal-area stochastic circuits under relaxed error bound," to appear in Proceedings of the 2021 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Munich, Germany, 2021. (Source code: ) Chen Wang, Weihua Xiao, John Hayes, and Weikang Qian, "Exploring target function approximation for stochastic circuit minimization," in Proceedings of the 2020 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), virtual event, 2020, pp. 1-9. (Source code: ) Xuesong Peng and Weikang Qian, "Stochastic circuit synthesis by cube assignment," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 12, pp. 3109-3122, 2018. Zheng Zhao and Weikang Qian, "A general design of stochastic circuit and its synthesis," in Proceedings of the 2015 Design, Automation, and Test in Europe Conference (DATE), Grenoble, France, 2015, pp. 1467-1472. Peng Li, David J. Lilja, Weikang Qian, Kia Bazargan, and Marc D. Riedel, "The synthesis of complex arithmetic computation on stochastic bit streams using sequential logic," in Proceedings of the 2012 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, 2012, pp. 480-487. Weikang Qian and Marc D. Riedel, "The synthesis of robust polynomial arithmetic with stochastic logic," in Proceedings of the 45th ACM/IEEE Design Automation Conference (DAC), Anaheim, CA, USA, 2008, pp. 648-653.

 Selected Publications on the Generation of Stochastic Bit Streams Kuncai Zhong, Zexi Li, Haoran Jin, and Weikang Qian, "Exploiting uniform spatial distribution to design efficient random number source for stochastic computing," in Proceedings of the 2022 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Diego, CA, USA, 2022. Zhijing Li, Zhao Chen, Yili Zhang, Zixin Huang, and Weikang Qian, "Simultaneous area and latency optimization for stochastic circuits by D flip-flop insertion," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 38, no. 7, pp. 1251-1264, 2019. Meng Yang, Bingzhe Li, David J. Lilja, Bo Yuan, and Weikang Qian, "Towards theoretical cost limit of stochastic number generators for stochastic computing," in Proceedings of the 2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Hongkong, 2018, pp. 154-159. Meng Yang, John Hayes, Deliang Fan, and Weikang Qian, "Design of accurate stochastic number generators with noisy emerging devices for stochastic computing," in Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Irvine, CA, USA, 2017, pp. 638-644. Yili Ding, Yi Wu, and Weikang Qian, "Generating multiple correlated probabilities for MUX-based stochastic computing architecture," in Proceedings of the 2014 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, 2014, pp. 519-526.

 Selected Publications on Architectures and Applications of Stochastic Computing Kuncai Zhong, Zexi Li, and Weikang Qian, "Towards low-cost high-accuracy stochastic computing architecture for univariate functions: Design and design space exploration," in Proceedings of the 2022 Design, Automation, and Test in Europe Conference (DATE), Antwerp, Belgium, 2022. Peng Li, David J. Lilja, Weikang Qian, Kia Bazargan, and Marc. D. Riedelin IEEE Transactions on Very Large Scale Integrated (VLSI) Systems, vol. 22, no. 3, pp. 449-462, 2014. Weikang Qian, Xin Li, Marc D. Riedel, Kia Bazargan, and David J. Lilja, "An architecture for fault-tolerant computation with stochastic logic," in IEEE Transactions on Computers, vol. 60, no. 1, pp. 93-105, 2011.

 Deep Neural Network Accelerators Deep neural networks (DNNs) have achieved a great success in the past decade due to their high accuracy and have been successfully applied in many domains including computer vision and speech recognition. However, they involve many multiply-and-accumulate (MAC) operations, which makes the traditional Von Neumann architecture unsuitable. To address this challenge, novel DNN accelerators are being actively explored. We study how to apply stochastic computing and approximate computing to design energy-efficient DNN accelerators. In this case, both computing paradigms allow efficient design of multipliers and adders, while their computational accuracy loss is well tolerated by the DNN application. We also study how to design DNN accelerators using emerging devices. One particular device we are interested in is resistive random access memory (ReRAM). ReRAM-based crossbars can naturally implement matrix-vector multiplication. However, they have notorious reliability issues. We study how to design reliable DNN accelerators using ReRAM-based crossbars.

 Selected Publications Ziqi Meng, Yanan Sun, and Weikang Qian, "Write or not: Programming scheme optimization for RRAM-based neuromorphic computing," in Proceedings of the 2022 ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 2022. Ziqi Meng, Weikang Qian, Yanan Sun, Yilong Zhao, Rui Yang, and Li Jiang, "Digital offset for RRAM-based neuromorphic computing: a novel solution to conquer cycle-to-cycle variation," in Proceedings of the 2021 Design, Automation, and Test in Europe Conference (DATE), virtual event, 2021. Chang Ma, Yanan Sun, Weikang Qian, Ziqi Meng, Rui Yang, and Li Jiang, "Go unary: a novel synapse coding and mapping scheme for reliable ReRAM-based neuromorphic computing," in Proceedings of the 2020 Design, Automation, and Test in Europe Conference (DATE), virtual event, 2020, pp. 1432-1437. Yawen Zhang, Sheng Lin, Runsheng Wang, Yanzhi Wang, Yuan Wang, Weikang Qian, and Ru Huang, "When sorting network meets parallel bitstreams: A fault-tolerant parallel ternary neural network (TNN) accelerator based on stochastic computing," in Proceedings of the 2020 Design, Automation, and Test in Europe Conference (DATE), virtual event, 2020, pp. 1287-1290. Chuliang Guo, Li Zhang, Xian Zhou, Weikang Qian, and Cheng Zhuo, "A reconfigurable approximate multiplier for quantized CNN applications," in Proceedings of the 2020 Asia and South Pacific Design Automation Conference (ASPDAC), Beijing, China, 2020, pp. 235-240.

 Design Automation for Emerging Technologies As CMOS technology is scaled into the nanometer regime, power consumption has become one of the paramount concerns in designing very large scale integrated (VLSI) circuits. To address this challenge, alternatives to CMOS technology are being actively explored. We are exploring two kinds of promising substitutes for CMOS devices and addressing challenges in designing VLSI circuits with them: Single-electron transistors (SETs). SET is an ultra-low power device. A suitable structure for realizing logic function using SET is a binary decision diagram (BDD)-based SET array. The special connection of this structure puts some constraints in realizing digital logic and hence, requires new methods to map a Boolean function onto it. We explore methods to map a target Boolean function onto an SET array with minimal area cost. Carbon nanotube field effect transistors (CNFETs). CNFETs, which use carbon nanotubes (CNTs) as the transistor channel, are a promising substitution of conventional CMOS technology. However, due to the randomness in the CNFET fabrication process, the number of CNTs in each CNFET has a large variation, resulting in degraded circuit performance. We explore a timing-driven placement method for CNFET circuits that effectively improves the timing yield.

 Selected Publications Chen Wang, Yanan Sun, Shiyan Hu, Li Jiang, and Weikang Qian, "Variation-aware global placement for improving timing-yield of carbon-nanotube field effect transistor circuit, " in ACM Transactions on Design Automation of Electronic Systems, vol. 23, no. 4, pp. 44:1-44:27, 2018. Zheng Zhao, Chian-Wei Liu, Chun-Yao Wang, and Weikang Qian, "BDD-based synthesis of reconfigurable single-electron transistor arrays," in Proceedings of the 2014 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, 2014, pp. 47-54.