# A Low Power High Speed Viterbi Decoder Using Xilinx HLS Tool

Jyoti Zunzunwala, Ph.D. Research Scholar, Department of Electronics and Telecommunication Engg., Sipna College of Engineering and Technology, Amravati

Dr. Atul S. Joshi, Professor, Department of Electronics and Telecommunication Engg., Sipna College of Engineering and Technology, Amravati

ISSN NO: 1844-8135

Abstract— The Viterbi algorithm was proposed in 1967 as a method of decoding convolutional codes. Viterbi algorithm (VA) decoders are currently used in about one billion cellphones, which is probably the largest number in any application. However, the largest current consumer of VA processor cycles is probably digital video broadcasting. The Viterbi decoder extracts the original input message from the corrupted data using the Viterbi algorithm, which is based on the maximum likelihood principle. A Viterbi decoder mainly comprises four essential units: a branch metric unit, an add-compare-select unit, a path metric unit, and a survivor path memory unit. To address power reduction, this research work proposes a concurrent architecture for the Viterbi decoder. The architecture is described using a hardware description language and targeted to Kintex series Field Programmable Gate Arrays (FPGAs), which are fabricated in 28 nm technology. The Xilinx Vivado High Level Synthesis (HLS) tool is used to describe the architecture. The proposed architecture is evaluated using performance parameters such as time, frequency, power utilization and resource utilization.

Keywords—Branch Metric Unit, Path Metric Unit, Traceback Unit, Kintex FPGA, HLS.

## I. INTRODUCTION

The algorithm has found universal application in decoding the convolutional codes used in CDMA and GSM digital cellular systems, dial-up modems, satellite communications, and 802.11 wireless LANs. These applications require a high level of data compression before transmission, and hence different encoding and data compression techniques are employed. In this compression process, a small number of samples is used to represent a large data set. Since a small number of symbols represents a large data set, the limited symbols are used repeatedly; this process is called encoding. When this highly compressed data is transmitted through the available medium and channel, the data symbols that represent the large data set become vulnerable to various kinds of noise.

Despite significant progress in the last decade, power dissipation in Viterbi decoders remains a challenging problem that requires further technical solutions. Thus, a flexible, low power, high speed Viterbi decoder design is a key challenge for future portable and communication devices.

Fig. 1. Block diagram of channel encoder and decoder for digital communication system

The Viterbi algorithm performs maximum likelihood decoding efficiently by reducing its complexity. It is also known as an optimum algorithm because it finds the maximum likelihood path through the trellis and thereby minimizes the probability of error. At each stage it eliminates the less likely paths entering each state, and this early rejection of unlikely paths reduces the decoding complexity. The efficiency of the Viterbi algorithm therefore depends on the paths of the trellis diagram.

The architecture is targeted to Xilinx Kintex series FPGAs. These devices are fabricated in 28 nm technology and are popular for critical applications such as aerospace, defense, audio, video, automotive, consumer and high-performance computing. The next section reviews the methodologies and outcomes reported by other authors. Subsequently, the implementation and research outcomes are presented in terms of pre-synthesis, post-synthesis and statistical results.

## II. REVIEW OF LITERATURE

Burcu Ozbay et al. (2018) [3] propose a power- and area-efficient Viterbi decoder architecture that also reduces computational complexity. Initially, a hard-decision Viterbi decoder system architecture was designed for Very Large Scale Integration (VLSI) realization without any further improvement, so that the power consumption of the fundamental and improved designs could be compared. The architecture is described in the Verilog hardware description language, and its tests and performance are compared on an FPGA platform. A convolutional encoder with constraint length 3 and code rate 1/2, and a power-efficient decoder for this code, were designed and implemented. The key objective of this work is to decrease power dissipation, and the power consumption on hardware was reduced by half.

Wagar Ahmad et al. (2018) [4] propose enhancements to the DLX and picoJava II processor ISAs for efficient implementation of the Viterbi decoding algorithm. This technique creates a custom trellis expansion instruction (Texpand) in the CPUSIM simulator for a RISC-based architecture and in the MIC-1 simulator for a stack-based architecture. The execution time improves by a factor of approximately three when the Texpand instruction is designed for the RISC architecture, and by approximately three for the stack-based architecture as well. In addition, this method enhances the ISA of the NIOS II soft processor for efficient implementation of the Viterbi algorithm. A comparison with and without the custom instruction shows substantial improvement: with the custom instruction, the NIOS II processor is about twice as fast as the assembly language program without it.
However, an FPGA-based implementation of these processors may further improve execution performance for computationally complex algorithms, since the clock frequency can be changed and the custom instruction can execute in parallel with other independent instructions.

S. Nanthini Devi et al. (2017) [7] propose a method for wireless communication that takes into consideration the demand for high speed, low power and low cost Viterbi decoding. This work uses convolutional coding with Viterbi decoding, a very powerful forward error correction and detection method. The research concludes that starting traceback deeper into the trellis yields more accurate data, but at the cost of more complex hardware and higher latency in the received signal. Viterbi decoders of any rate can be designed using the same basic principles and techniques.

Dinesh Kumar et al. (2017) [9] designed a high speed feed-forward Viterbi decoder using a hybrid traceback and register exchange architecture and the embedded BRAM of the target FPGA. This Viterbi decoder was designed with Matlab, simulated with the Xilinx ISE 8.1i tool, synthesized with the Xilinx Synthesis Tool (XST), and implemented on a Xilinx Virtex-4 based xc4vlx15 FPGA device. The results show that the proposed design can operate at an estimated frequency of 107.7 MHz while consuming considerably fewer resources on the target device, providing a cost-effective solution for wireless applications. The results of the proposed design also report an estimated frequency of 86.6 MHz using considerably fewer resources of the target FPGA, providing a high performance, cost-effective solution for wireless communication applications.

Mohd Azlan Abu et al. (2016) [10] designed a low power Viterbi decoder for space time trellis codes using an adder-free architecture described as an RTL model.
This research describes the real-time design and implementation of a space time trellis code decoder using Altera Complex Programmable Logic Devices (CPLDs). The code uses a generator matrix designed for a four-state space time trellis code (STTC) with the quadrature phase shift keying (QPSK) modulation scheme. The research also gives a comparative analysis against previous CPLD devices for the STTC Viterbi decoder design. The results show that the proposed design achieves a 96 per cent improvement in power consumption on the targeted MAX V CPLD board compared with previously reported methods. The decoding is carried out by maximum likelihood sequence estimation through the Viterbi algorithm. This work showed that the STTC decoder can successfully decipher the encoded symbols from the STTC encoder and fully recover the original data. The data rate of the decoder is 50 Mbps.

T. Kalavathi Devi et al. (2015) [13] designed an asynchronous, low power, high performance VLSI architecture for a Viterbi decoder implemented with quasi delay insensitive templates. The designed decoder meets the demand for high speed and low power. At present, designing a competent system in Very Large Scale Integration (VLSI) technology requires these VLSI parameters to be finely defined. The proposed asynchronous method focuses on reducing the power consumption of the Viterbi decoder for various constraint lengths using asynchronous modules. The results show that the asynchronous design flow yields good performance with a 25.21% decrease in power consumption compared with the synchronous method.

Manthana et al. (2013) [18] present a hardware architecture for fast-fading and slow-fading channels. A complexity analysis was undertaken to provide the information required to apply the proposed methodology to different STTC configurations.

Vestias et al. (2012) suggested a low-power design for Viterbi decoding in a trellis-coded modulation (TCM) system using Verilog HDL. A pre-computation architecture and the T-algorithm were used to reduce power consumption without compromising Viterbi decoder performance. The study was conducted using ASIC technology with a TSMC 90 nm CMOS standard cell library.

Pujara and Prajapati (2013) [26] proposed a Viterbi decoder for a convolutional code with a constraint length of 7 and a code rate of 1/2, written in Verilog hardware description language (HDL) code. The Viterbi decoder was simulated and synthesized using Xilinx ModelSim PE 10.0 and 12.4 and implemented on a Virtex-6 FPGA; this design achieved data rates of 360 Mbps using radix-2 and radix-4 techniques.

Uma Devi et al. (2012) [29] proposed a configurable algorithm for an STTC Viterbi decoder. The algorithm was designed and implemented in 0.18 µm complementary metal-oxide-semiconductor (CMOS) technology, and the decoder uses 4-PSK (phase shift keying) modulation.

Shr et al. (2010) [31] discussed the design, implementation and test results of a decoder for a four-state space time trellis code. The system uses quadrature phase shift keying (QPSK) mapping, and its performance was tested through simulation via a loopback test.

In modern communication systems, error correcting codes have proven to be an effective technique for overcoming data corruption. For forward error correction, convolutional encoding with Viterbi decoding is a powerful approach. A convolutional code encodes the entire data stream into one single codeword. Convolutional coding with Viterbi decoding has been used in numerous applications such as data transfer, digital video, satellite communication and mobile communication.
The key objective of this work is to realize a rate-1/3 convolutional encoder with constraint length 5 and a rate-1/3 Viterbi decoder with constraint length 5 on a field programmable gate array. A noise model is proposed for testing. All units are coded in the very high-speed integrated circuit hardware description language (VHDL) and implemented on a field programmable gate array board. The FPGA implementation is used for power calculation and characteristic analysis of the Viterbi decoder. [2]

Fig. 2. Block diagram of Viterbi decoder

To use list Viterbi decoding as an inner decoder with an outer error-detecting code, the list size must be large enough to contain the correct sequence. On the other hand, if error correction is used, a short list of size two can be sufficient. Convolution vector representation decoding has been shown to pick the correct input sequence from several candidate sequences. From the implementation point of view of convolution vector representation decoding, previous methods show that a list size of two is the most effective in terms of complexity and performance. The authors of [3] proposed the design and implementation of list Viterbi decoding with a list size of two in the Verilog hardware description language. This Verilog code can be implemented on a field programmable gate array board and employed as the inner decoder for convolution vector representation decoding.

In digital wireless communication systems, the transmitted information is affected by several kinds of noise that introduce unwanted errors into the received information. Viterbi decoders are employed to restore the information by correcting the received data and converting it back into the original information.
The authors of [4] proposed a new memoryless Viterbi algorithm with 4-stage soft decision for Long Term Evolution (LTE) systems. In the proposed structure, the survivor memory can be eliminated entirely, which reduces total utilization by about 50%. The proposed design employs complementary metal-oxide-semiconductor technology.

The Viterbi algorithm is widely used for decoding convolutional codes. The authors of [5] proposed a novel class of hybrid very large scale integration architectures for survivor path processing in the Viterbi algorithm. This architecture combines the features of the trace-forward and register exchange algorithms, trading minimum register memory requirement and latency against implementation efficiency. An architectural assessment shows that the structure applies efficiently to codes with a large number of states, where traceback-based architectures, which increase latency, are generally dominant. [5]

Fig. 3. Proposed block diagram

Convolutional codes are normally used to encode digital information before transmission, and a Viterbi decoder is used to decode them. A large number of advanced communication systems exist, and an adaptable hardware platform that can be configured to support various systems is still required. The authors of [6] proposed a Viterbi decoder architecture that supports constraint lengths of three, five and seven, and code rates of 1/2 and 1/3, which makes it compatible with several common standards. The Xilinx simulator was used to simulate the proposed Viterbi decoder, which was realized in the Verilog hardware description language on a field programmable gate array board. An improved add-compare-select unit effectively decreases power utilization by 28% and area by 22%.
In digital communication systems, channel coding is one of the most widely employed tools, as it plays an important part in improving system performance through power reduction, better bit error rate performance and higher transmission rates. Moreover, such methods provide considerable improvement without requiring expenditure on specialized hardware, and they are flexible enough for all kinds of communication systems because of their variety and corrective potential, so that one or more alternatives are available for every channel. The authors of [7] proposed Viterbi decoders for field programmable gate array platforms. The proposed decoders are generated from the vectors that describe the adders of a rate-1/2 convolutional encoder.

Cellular mobile communication systems use the Viterbi algorithm, a powerful error detection and correction technique, to decode information reliably. The encoding procedure introduces correlation into the source information sequence to generate the encoded symbol sequence. The decoder examines the corrupted sequence and exploits the known correlation to reconstruct the original symbol and information bit sequences. The paper [8] presents the design, realization and performance characteristics of a serial Viterbi decoder using field programmable gate arrays. The decoding technique assumes that the transmitted symbols are coded with a rate-1/2 convolutional encoder with given generator polynomials. The serial Viterbi decoder is designed in the Verilog hardware description language and realized on field programmable gate arrays. Performance results with 2 dB Gaussian noise show that the serial Viterbi decoder functions well.

Fig. 4. Serial Viterbi decoder test environment

The research work reported in [9] concentrated on the FPGA implementation of a convolutional encoder and an adaptive Viterbi decoder with constraint length 3 and code rate 1/2. The proposed adaptive Viterbi decoder can decode adaptively with various traceback lengths: the traceback length can be reconfigured according to the changing channel error characteristics, using a signal-to-noise ratio threshold of 8 dB. The simulation results show that reconfiguring the traceback length to 5 and 14 gives a significant improvement in FPGA device utilization.

A Viterbi decoder is generally employed to decode the convolutional codes used in wireless and space communication. In wireless communication, the cellular standard for code division multiple access (CDMA) uses convolutional coding. Viterbi decoders used in digital wireless communication systems are complex and dissipate a large amount of power. The authors of [10] analyzed the power consumption of several realizations of the Viterbi algorithm for digital wireless communication systems and proposed a low-power Viterbi decoder that uses toggle filtering and clock gating. The Viterbi decoders were described in the Verilog hardware description language. Power estimates obtained through gate-level simulation show that the proposed low-power Viterbi decoder reduces the power dissipation of the conventional Viterbi decoder by 55%.

Fig. 5. Proposed Viterbi decoder

The Viterbi algorithm finds the most likely sequence of state transitions in a process modeled by a hidden Markov model.
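To make the hidden Markov model view concrete, here is a minimal Python sketch of the Viterbi algorithm on a generic discrete HMM. The two-state model and all of its probabilities are a hypothetical toy example, not taken from any of the reviewed works. For each state the sketch keeps only the most likely predecessor and then recovers the sequence by traceback, the same survivor-selection idea that hardware decoders implement.

```python
import math

# Hypothetical toy model (not from the reviewed works): two hidden states, three observations.
STATES = ("Healthy", "Fever")
OBS = ("normal", "cold", "dizzy")
START_P = {"Healthy": 0.6, "Fever": 0.4}
TRANS_P = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
EMIT_P = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}


def safe_log(p):
    """Log-probability; a zero probability maps to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_hmm(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations `obs`."""
    # score[s]: best log-probability of any state path ending in s at the current step
    score = {s: safe_log(start_p[s]) + safe_log(emit_p[s][obs[0]]) for s in states}
    backptrs = []  # one dict per step: state -> its most likely predecessor
    for o in obs[1:]:
        new_score, ptr = {}, {}
        for s in states:
            # Keep only the best predecessor of s; all other paths into s are discarded.
            prev = max(states, key=lambda r: score[r] + safe_log(trans_p[r][s]))
            new_score[s] = score[prev] + safe_log(trans_p[prev][s]) + safe_log(emit_p[s][o])
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # Traceback: start from the best final state and follow the stored predecessors.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]
```

With the toy model above, `viterbi_hmm(OBS, STATES, START_P, TRANS_P, EMIT_P)` returns `['Healthy', 'Healthy', 'Fever']`. Working in the log domain turns products of probabilities into sums, which avoids numerical underflow on long sequences and mirrors the add-compare-select structure of a decoder.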
The authors of [11] examined power dissipation in the Viterbi algorithm. They proposed an enhanced, power-aware Viterbi algorithm and applied several low-power techniques to reduce its power dissipation. The first modification is a reorganization of the arithmetic computations to reduce the number and complexity of computational elements. A further simplification is made in the survivor memory unit by storing only a single bit to identify the preceding state in the survivor path and by assigning one register to the decision vector of each clock cycle. This eliminates unnecessary shift operations and lets the authors apply clock gating to disable all registers but one. The final modification exploits the property that all traceback paths converge to the same state regardless of their initial state. Power estimates obtained through gate-level simulation show that the proposed technique decreases power dissipation by 86%. [11]

In [12], the authors describe the design and Verilog hardware description language realization of two key elements of a partial response maximum likelihood (PRML) channel, one of which is the Viterbi decoder. These are realized using parameterized Verilog units from a library of common digital signal processing and arithmetic operations. The results show that worst-case sampling rates of 50 megasamples per second are attainable. Operating four filter modules in parallel considerably increases the sampling rate, from 50 MHz to 179 MHz. The proposed realization has 50% lower power dissipation than a pipelined filter operating at the same rate.

A novel realization of the Viterbi decoder based on the register exchange technique is proposed.
The register exchange technique is faster and simpler than the traceback technique, although it has the drawback that every bit in the register must be read and rewritten for each bit decoded. Using a pointer-based scheme, a pointer is allocated to every memory; instead of copying the contents of one memory to another, only the pointers are modified. The performance, register size, power consumption and speed of the survivor memory unit are investigated for the proposed register exchange technique as well as for the traceback technique. The results show that the proposed realization decreases power dissipation by 46%. [13]

## III. METHODOLOGY

The proposed method mainly concentrates on designing a Viterbi decoder (VD) that consumes less power without degrading its performance, using a combination of architectural level power reduction techniques. The top level architecture of the VD is shown in Figure 4.1. It is composed of three functional units:

- (1) Branch Metric Unit (BMU).
- (2) Path Metric Unit (PMU) or Add Compare Select Unit (ACSU).
- (3) Survivor Memory Unit (SMU).

Fig. 1. The top level architecture of the Viterbi decoder

There are three main stages in designing a power efficient VD: the Branch Metric Unit, the Path Metric Unit and the Traceback Unit. Together, the three stages decode the received message, which is transmitted over a digital channel after convolutional encoding introduces appropriate redundancy into it. The low power design strategy of the proposed method is shown in the following flowchart.

Figure 4.4 Flowchart of proposed method

A convolutional encoder generates n output symbols in response to k input bits. To generate these n symbols, the encoder uses the current input together with the previous K − 1 inputs, which can take $2^{K-1}$ possible combinations (the encoder states), where K denotes the constraint length.
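A minimal Python sketch of such a convolutional encoder follows. It assumes k = 1 input bit and n = 2 output bits per step (rate 1/2), constraint length K = 3 (so m = 2 memory elements) and the generator polynomials 7 and 5 in octal; these values are illustrative assumptions, not the parameters of the proposed design. The memory starts at 0, and each output bit is the XOR (modulo-2 sum) of the shift-register taps selected by one generator.

```python
# Sketch of a rate-1/2 convolutional encoder (k = 1, n = 2).
# ASSUMPTION: K = 3 and generators 7, 5 (octal) are illustrative choices only.

K = 3                        # constraint length: current bit + m = K - 1 stored bits
GENERATORS = (0b111, 0b101)  # one generator per output bit, so n = len(GENERATORS)


def conv_encode(bits, generators=GENERATORS, k_len=K):
    """Encode a list of 0/1 bits; emits n output bits for every input bit."""
    state = 0                # memory elements start at 0
    out = []
    for b in bits:
        # Shift register: current input in the MSB, followed by the m previous inputs.
        reg = (b << (k_len - 1)) | state
        for g in generators:
            # Each modulo-2 adder XORs the register taps selected by its generator.
            out.append(bin(reg & g).count("1") % 2)
        state = reg >> 1     # keep the newest K - 1 bits as the next state
    return out
```

For example, `conv_encode([1, 0, 1, 1, 0, 0])` returns `[1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1]`: six input bits give twelve output bits, consistent with r = k/n = 1/2, and the two trailing zeros flush the memory back to the all-zero state so that a decoder can assume a known final state.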
The output convolutional code can be denoted as

$$F = (n, k, m)$$

Here k is the number of input bits, n is the number of output bits and m is the number of memory elements. Naturally, n and k are integers with k < n. In practice, more memory elements give a lower error probability. The constraint length K is the number of bits in the encoder shift register, that is, the current input plus the m stored bits:

$$K = m + 1$$

The code rate r is a measure of the code efficiency and is defined as

$$r = k/n$$

The memory elements are initialized to '0', and the outputs are formed by modulo-2 adders, that is, XOR gates. The generator polynomials define how the adders (XOR gates) are connected to the memory elements. The proposed encoder with the assumed constraints is depicted in the subsequent figure.

The Viterbi decoder can be described using the Branch Metric Unit, Path Metric Unit and Traceback Unit building blocks. The block arrangement of the Viterbi decoder is depicted in the subsequent figure.

The first building block of the Viterbi decoder is the Branch Metric Unit (BMU). At each time instant, it calculates the Hamming distance between the received symbol and the expected symbol of each branch connecting a previous state to a current state; this value is the branch metric.

The next block is the Path Metric Unit (PMU). The accumulated error metric is called the path metric. In path metric computation, the current branch metric is added to the previous path metric, and the resulting distances are compared to select the survivor path into each state. The internal components of the PMU are shown in the following figure.

At the decoder end, the incoming stream is decoded mainly using either the Register Exchange Method (REM) or the Traceback Method. In the proposed research work, the register exchange method is used to decode the incoming stream. In REM, a register is assigned to each state.
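The BMU, PMU (add-compare-select) and register exchange steps can be combined into a small software model. The following is a minimal hard-decision sketch in Python, not the authors' VHDL/HLS implementation; the rate-1/2, K = 3 code with generators 7 and 5 (octal), the all-zero starting state and the K − 1 zero tail bits are assumptions made for illustration. The list kept for each state plays the role of that state's REM register.

```python
# Hard-decision Viterbi decoder sketch: BMU (Hamming distance), ACS, register exchange.
# ASSUMPTIONS: rate 1/2, K = 3, generators 7 and 5 (octal), start in state 0,
# message terminated with K - 1 zeros. Illustrative only.

K = 3
GENERATORS = (0b111, 0b101)
N_STATES = 1 << (K - 1)          # 2^(K-1) trellis states


def branch_output(state, bit):
    """Expected encoder output bits and next state for one trellis branch."""
    reg = (bit << (K - 1)) | state
    outs = tuple(bin(reg & g).count("1") % 2 for g in GENERATORS)
    return outs, reg >> 1


def hamming(a, b):
    """Branch Metric Unit: Hamming distance between received and expected symbols."""
    return sum(x != y for x, y in zip(a, b))


def viterbi_decode(received):
    n = len(GENERATORS)
    symbols = [tuple(received[i:i + n]) for i in range(0, len(received), n)]
    inf = float("inf")
    path_metric = [0] + [inf] * (N_STATES - 1)   # decoder starts in state 0
    registers = [[] for _ in range(N_STATES)]     # REM: one register per state
    for sym in symbols:
        new_metric = [inf] * N_STATES
        new_regs = [None] * N_STATES
        for state in range(N_STATES):
            if path_metric[state] == inf:         # state not reachable yet
                continue
            for bit in (0, 1):
                outs, nxt = branch_output(state, bit)
                # Add: path metric + branch metric; Compare/Select: keep the smaller.
                cand = path_metric[state] + hamming(sym, outs)
                if cand < new_metric[nxt]:
                    new_metric[nxt] = cand
                    # Register exchange: copy the predecessor's register, append the bit.
                    new_regs[nxt] = registers[state] + [bit]
        path_metric, registers = new_metric, new_regs
    # Terminated code: the register of state 0 holds the decoded output, no traceback.
    return registers[0]
```

For example, `viterbi_decode([1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1])` returns `[1, 0, 1, 1, 0, 0]`, and flipping the third received bit still yields `[1, 0, 1, 1, 0, 0]`, because this code's free distance of 5 lets it correct a single error. The copy in the register exchange line is exactly the read-and-rewrite cost that pointer-based schemes try to avoid in hardware.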
Each state's register holds the information bits along the survivor path from the initial state to the current state; in effect, it holds the decoded bits along that path. At each step, the register of the selected predecessor state is copied into the current state's register and extended with the new bit. Hence traceback is not required, and the final state register holds the final decoded output.

## IV. RESULTS AND DISCUSSIONS

The architecture of the proposed design is described using the Very High-Speed Integrated Circuit Hardware Description Language (VHDL) and targeted to a Kintex series FPGA. An FPGA is a reconfigurable device comprising several configurable logic blocks and programmable interconnect switches. These configurable logic blocks are connected to each other logically and in the required number to perform the required operation. How the blocks are interconnected is described by the configuration bits, which are simply combinations of 1's and 0's that describe the interconnection between the configurable blocks. The internal architecture of each configurable block depends on the target technology, that is, the FPGA targeted for the design. The high level VHDL description is converted into the NGC file format, then into the native generic database (NGD) file format, and finally into the interconnection bitstream.

The statistical outcome of the proposed design in terms of estimated frequency, time, power utilization and resource utilization is given in the following table.

TABLE I. STATISTICAL OUTCOME OF THE PROPOSED DESIGN

| Sr. No. | Parameter | Experimental Result |
|---------|-----------|---------------------|
| 1 | Software used | Xilinx Vivado HLS Tool |
| 2 | Frequency (MHz) | 315.615 |
| 3 | Decoding time (ns) | 3.11 |
| 4 | Power utilized (W) | 0.00659 |

## V. CONCLUSION

The architecture of a high-speed Viterbi decoder is proposed in this research paper.
The proposed design is built around the Branch Metric Unit, Path Metric Unit and Register Exchange Unit, and it is described in VHDL, a concurrent language. The experimental results in Table I show that the proposed design is efficient in terms of power, area, resources and time.

## References

- [1] A. J. Viterbi, "Convolutional codes and their performance in communication systems," IEEE Transactions on Communication Technology, vol. 19, no. 5, pp. 751–772, 1971.
- [2] Mathana, J.M., Rangarajan, P. and Perinbam, J.R.P. (2013), "Low complexity reconfigurable turbo decoder for wireless communication systems", Arabian Journal for Science and Engineering, Vol. 38 No. 10, pp. 2649-2662.
- [3] Cholan, K. (2012), "Design and Implementation of Low Power High Speed Viterbi Decoder", Procedia Engineering, Vol. 30, pp. 61-68, doi: 10.1016/j.proeng.2012.01.834.
- [4] M. Mozaffari Kermani and A. Reyhani-Masoleh, "A Lightweight High-performance Fault Detection Scheme for the Advanced Encryption Standard Using Composite Fields," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 1, pp. 85-91, Jan. 2011.
- [5] T. Kalavathi Devi and C. Venkatesh, "Design of low power Viterbi decoder using asynchronous techniques," International Journal on Advances in Engineering and Technology, vol. 4, pp. 561–570, 2012.
- [6] H. Garner, "Error codes for arithmetic operations," IEEE Trans. on Electronic Computers, vol. 15, no. 5, pp. 763–770, Oct. 1966.
- [7] Mostafa, K., Hussein, A., Youness, H. and Moness, M. (2016), "High performance reconfigurable Viterbi Decoder design for multi-standard receiver", 33rd National Radio Science Conference (NRSC), Aswan, 22-25
- [8] hen, Y.H., Su, M.L. and Ni, Y.F. (2014), "FPGA implementation of trellis coded modulation decode on SDR communication system", International Conference on Information Science, Electronics and Electrical Engineering (ISEEE), Sapporo, 26-28 April, pp. 89-93.
- [9] Jinjin He, Huaping Liu, "High-speed Low-power Viterbi Decoder design for TCM Decoders", IEEE Trans. VLSI, Vol. 20, Apr 2012.
- [10] Pujara, H. and Prajapati, P. (2013), "RTL implementation of viterbi decoder using VHDL", IOSR Journal of VLSI and Signal Processing, Vol. 2 No. 1, pp. 65-71.
- [11] Y.C. Tang, D. C. Hu, W. Wei, W. C. Lin, H. Lin, "A Memory-Efficient Architecture for Low Latency Viterbi Decoders," 2009 International Symposium on VLSI Design, Automation and Test, 28-30 April, 2009.
- [12] V.G. Kumar and A. C. Sudhir, "Implementation of Viterbi Decoder using T-algorithm for TCM Decoders," International Journal of Innovative Research in Electrical, Electronics, Instrumentation and Control Engineering, vol. 3, no. 5, May 2015.
- [13] Devi, P.U. and Rao, P.S. (2012), "Viterbi decoder with low power and low complexity for space time trellis codes", International Journal of Engineering Research and Applications, Vol. 2 No. 3, pp. 1359-1365.
- [14] M. Mozaffari Kermani and A. Reyhani-Masoleh, "Parity Prediction of S-box for AES," in Proc. IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), pp. 2357-2360, Ottawa, Canada, May 2006.
- [15] Shr, K.T., Chen, H.D. and Huang, Y.H. (2010), "A low-complexity viterbi decoder for space-time trellis codes", IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 57 No. 4, pp. 873-885.
- [16] Marimuthu C.N, "An efficient Viterbi Decoder Architecture", International Organization of Scientific Research Journal of VLSI and Signal Processing, 2013, pp. 46-50.
- [17] John G. Proakis, "Digital Communication", McGraw Hill, Singapore, pp. 502-507, 471-475, 2010.
- [18] R. O. Ozdag and P. A. Beerel, "An asynchronous low-power high-performance sequential decoder implemented with QDI templates," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14, no. 9, pp. 975–985, 2006.