

A 4MHz parameterized LogarithmSquare Root IPCoreNelson Campos, Roberto Costa, Elton Costa, Gutemberg Junior and Elmar Melcher Abstract Logarithms and square root are nonelementary operations frequently used in digital signal processing. In this work, implementation and design of an IPCore to compute square root and multibase logarithm is presented. The design is parameterized in fixed point notation achieving a low arithmetic error even when irrational numbers are being calculated. The module was synthesized in ASIC using FSC0G_D_GENERIC_CORE from UMC and in FPGA occupying 518 logic elements and two DSP blocks for multiplication. Keywords I. INTRODUCTION The logarithm operation is widely used in digital signal processing applications such as speech recognition [4] [10] and 3D graphics [5]. The square root operation is also present in many modern applications such as Cholesky decomposition, LU factorization and in the solution of quadratic equations [11]. This operation requires expensive computational resources and high energy consumption, and its efficient implementation may be accomplished with the use of dedicated hardware. This paper demonstrates the hardware implementation of an arithmetic module called SQRTLOG comprising four modules: sqrt, log_{2} , log_{10} and log_{e}. First, the paper describes some common algorithms to compute the square root and the binary logarithm. Next, the paper explains the proposed architecture of the SQRTLOG and finally discusses the results of the synthesized hardware in ASIC and FPGA. II. SQUARE ROOT COMPUTATION METHODS There are many ways to compute the square root operations for VLSI and FPGA implementations. Three algorithms will be discussed here. A. The NewtonRaphson Algorithm The NewtonRaphson method is widely used in order to calculate , by approximation. The reciprocal square root using this method may be computed iteratively according to equation 1, where qi is the approximate value of . After n iterations the square root of X is [6]. q_{i+1} = q_{i }(3  xq^{2}_{i} ) / 2 (1) B. The Restoring Algorithm The restoring algorithm will calculate square root and its remainder value in iterative process. Consider that and D = S^{2 }+ R, where D is the radicand, S is the square root result, and R is the remainder. The restoring algorithm will guess the value of S and R iteratively. If there is a wrong guess of S, the previous value is restored[9]. C. The NonRestoring Algorithm The nonrestoring algorithm is similar to the restoring one and it will calculate square root and its remainder value in iterative process. However, unlike the restoring algorithm, in case of a wrong guess of S, it does not change the bits of S more than once [9]. III. LOGARITHM COMPUTATION METHODS Some techniques have been used in hardware implementations of Logarithms. A. LookUp Table According to the implementation described in [10] any number can be represented in the form of a = N x 2^{p}, where p is integer and . Taking the binary logarihtm of a, the equation 2 is obtained. Since p is the most significant bit of a, and log_{2}(N) can be found using a LUT, the computation of the log_{e}(a) can be done mutliplying log_{2}(a) by a constant as shown in equation (2). B. Cordic Method The CORDIC is a simple and efficient algorithm to calculate hyperbolic and trigonometric functions setting one bit per iteration [2][7][8]. The logarithm computation may be accomplished using the hyperbolic expression shown in equation 3. C. FloorShift Method The method presented in [7] computes the binary logarithm through the following steps:
IV. PROPOSED ARCHITECTURE This section discusses the proposed architecture of the SQRTLOG, which is divided in two parts: the sqrt module and the logarithm module. The sqrt module computes the square root of N = (a_{1}+a_{2}+a_{3}+...+a_{n})^{2} guessing one bit at a time according to equations 5 to 7, where a_{m} is the mth bit being guessed. The steps to compute the square root algorithm can be seen in the flowchart depicted in Figure 1. Fig. 1. Square Root Algotihm Flowchart Similarly, the algorithm to compute the binary logarithm proposed by [12] was adapted to a finite state machine for hardware implementation setting one bit per iteration according to equation 8. The algorithm referenced by [12] takes the logarithm of x (where ) and the finite state machine scales a number out of this range according to equation 9. To compute logarithms in base 10 and natural, SQRTLOG simply takes the binary logarithm output multiplying its result by 1=log2(10) and 1=loge(10), respectively. The algorihtm flowchart to compute the binary logarithm can be seen in Figure 2. Fig. 2. Binary Logarithm Algotihm Flowchart The square root and logarithm modules are interconnected using an interface Valid/Ready present in AMBA AXI Protocol [1] and its architecture is described in Figure 3. The signals of its interface are described in Table I and depicted in Figure 4. Fig. 3. The SQRTLOG architecture TABLE I SQRTLOG: DESCRIPTION OF THE INTERFACE SIGNALS
Fig. 4. SQRTLOG: Interface signals V. RESULTS This section discusses the synthesis results of the SQRTLOG in ASIC and in FPGA. A. ASIC Synthesis The module was simulated using dc shell from Synopsys [3] with the standard cell library FSC0G_D_GENERIC_CORE from UMC [13] with 0:13µm CMOS technology using a 4MHz clock. During the experiment, the number of ports and cells are plotted as a function of the number of bits of N (see Figure 5). Fig. 5. SQRTLOG: Number of ports and cells vs NBITS The area and the power consumption as a function of the number of bits of the SQRTLOG are presented in Figure 6. Fig. 6. SQRTLOG: Area and Power vs NBITS The arithmetical percentage error of the module was analyzed to compute and and the results can be seen in Figure 7. Fig. 7. Arithmetic error vs NBITS The module was parameterized in fixed point representation with QI:F, with I integer bits and F fractional bits and NBITS = I +F. The experiment setup measured the module with NBITS, ranging from 4 to 24 bits, with a 4bit step for each measurement. Table II shows that when NBITS = 4, it presents the lower area and power consumption. However, the Higher arithmetic error is obtained. For NBITS = 24, the arithmetic error decreases significantly, increasing power and area. For NBITS=16, there is a good tradeoff between power and area vs % error, as can be seen on Table II. TABLE II: ASIC SYNTHESIS RESULTS
B. FPGA Synthesis The design was also synthesized (with NBITS = 20) in the altera DE1SoC board using 2% of its logic utilization. The synthesis report was generated using the Quartus Prime from Altera and the results can be seen in Table III. Using the Netlist Viewer from Quartus the circuit of the Figure 8 was obtained. Due to the square part of the logarithm algorithm, two multipliers were inferred from a total of 87 of the FPGA device. Fig. 8. Quartus Netlist of the SQRTLOG TABLE III: FPGA SYNTHESIS RESULTS FPGA: Cyclone V
VI. CONCLUSIONS This paper introduces some methods to compute the square root and logarithms in hardware implementation. It is proposed the architecture of a module called SQRTLOG, which basically computes four operations: sqrt, log_{2 }, log_{10} and log_{e}. Results are available for both implementations, ASIC and FPGA. It is shown that the SQRTLOG presents a good tradeoff between arithmetic error and the number of bits of its input, providing a fast and accurate algorithm for digital signal processing and scientific applications. ACKNOWLEDGMENT The authors would like to thank the PEM (Projects for Excellence on Microelectronics) initiative for the financial support. REFERENCES [1] AXI AMBA. Protocol specification. ARM, June, 2003. [2] Liu Bangqiang, He Ling, and Yan Xiao. Basen logarithm implementation on fpga for the data with random decimal point positions. In Signal Processing and its Applications (CSPA), 2013 IEEE 9th International Colloquium on, pages 17–20. IEEE, 2013. [3] Design Compiler. Synopsys inc, 2016. [4] Aidong Deng, Li Zhao, and Yan Zhao. Recognition of acoustic emission signal based on mae and propagation theory. In Management and Service Science, 2009. MASS’09. International Conference on, pages 1–4. IEEE, 2009. [5] Hyejung Kim, BG Nam, JH Sohn, JH Woo, and HJ Yoo. A 231mhz, 2.18mw 32bit logarithmic arithmetic unit for fixedpoint 3d graphics system. IEEE journal of solidstate circuits, 41(11):2373–2381, 2006. [6] Yamin Li and Wanming Chu. Parallelarray implementations of a nonrestoring square root algorithm. In Computer Design: VLSI in Computers and Processors, 1997. ICCD’97. Proceedings., 1997 IEEE International Conference on, pages 690–695. IEEE, 1997. [7] AM Mansour, AM ElSawy, MS Aziz, and AT Sayed. A new hardware implementation of base 2 logarithm for fpga. International Journal of Signal Processing Systems, 3(2):171–181, 2015. [8] Pramod K Meher, Javier Valls, TsoBing Juang, K Sridharan, and Koushik Maharatna. 50 years of cordic: Algorithms, architectures, and applications. IEEE Transactions on Circuits and Systems I: Regular Papers, 56(9):1893–1907, 2009. [9] Rachmad Vidya Wicaksana Putra. A novel fixedpoint square root algorithm and its digital hardware design. In ICT for Smart Society (ICISS), 2013 International Conference on, pages 1–4. IEEE, 2013. [10] VB Saambhavi, SSSP Rao, and P Rajalakshmi. Design of feature extraction circuit for speech recognition applications. In TENCON 2012 2012 IEEE Region 10 Conference, pages 1–5. IEEE, 2012. [11] Shashank Suresh, Spiridon F Beldianu, and Sotirios G Ziavras. Fpga and asic square root designs for high performance and power efficiency. In 2013 IEEE 24th International Conference on ApplicationSpecific Systems, Architectures and Processors, pages 269–272. IEEE, 2013. [12] C Turner. A fast binary logarithm algorithm. IEEE Signal Processing Mag, 27(5):124–140, 2010. [13] UMC. United Microelectronics Corporation. www.umc.com. If you wish to download a copy of this white paper, click here 
Home  Feedback  Register  Site Map 
All material on this site Copyright © 2017 Design And Reuse S.A. All rights reserved. 