n Comment on “Fast Parallel Prefix Modulo 2+1 Adder” Ghassem Jaberipur and Hanieh Alavi, ECE dept., Shahid Beheshti University Abstract—Costas Efstathiou et al present (IEEE Trans. Computers, Vol. 53, No. 9 pp. 1211-1216) ann-bit totally parallel prefix n (TPP) implementation of modulo 2 +1 adders with6 2 log ݊latency in terms of unit gate delay. We locate a flaw in the logic equation for the most significant bit and present a simple counter example to prove this claim. We provide the relevant correct equation and its derivation details. We also show that it can be implemented within the TPP tree, without additional latency. Furthermore, despite the correctness of the equations for individual carry signals, we point out a missing parallel prefix operand in the corresponding general equation. In lack of any derivation or proof for the latter, we provide the relevant correct equation with derivation details. n Index Terms—Binary adders, Modulo 2 +1 arithmetic, Parallel prefix adders, RNS. 1 INTRODUCTION n nn HE moduli set {2–1, 2, 2+1} is popular in ther0:ש ܩْ ܿ݉ ൌݏݎ ൌ݉ ْ, whereܩ isthe ାଵ ,ଵ,ଵ T applications of residue number system (RNS).carry-out of positionnthe original computation of inM. n nn+1n Modular adders for the 2and 2–1 channels haveNote that sinceM≤no carry is generated in 2+2 –1 been reported vian-bit parallel prefix andn-bittotally thecomputation ofܿ ܩ. Therefore,ש ܩ݉ ൌܿ. ,ଵାଵ ,ଵ parallel prefix (TPP) trees, with (3 2 log ݊) latency in כ כ i≤n–1):ݎ ൌ݉ ْْ ܿܩ ൌݏ ri: (1≤ ିଵ,ଵ ିଵْ ܩ, ିଵ,ଵ כ terms of unit gate delay [2]. Timing coordination within whereܩis the carry into positioniwithin the addition ିଵ,ଵ n the three RNS channels calls for modulo 2 +1 adders with ݉ …݉ ܿ שܩ. ିଵ ,ଵ O (logn) latency. This has motivated the authors of [1] toכ ൌݏ ܿ ݉ ܩThis is simply sum of rn:ݎ ିଵାଵ ିଵ,ଵ. design such adders via a (logn)level TPP, where the the bits in positionnof Fig. 1. overall latency is6 2 log ݊. Unfortunately however, TheS+C stageof Fig. 1 is trusted to a TPP tree, with there is a flaw in the logical equation for the most ܿ ൌܿש ܩ. However, there is flaw in the actual ,ଵ significant bit (MSB) of the sum that leads to wrong equation implemented in Fig. 3 of [1] forrn. results in several instances of the input operands. In this comment we underline the aforementioned flaw 2.1 The flaw in the computation ofrnvia a counter example, provide the derivation details of n It is rightly stated in [1] thatrn =1 ifA+B =2 . Then the the pertinent correct equation and its implementation authors, without providing any proof, conclude that within the same (logn)level TPP tree of [1] such that the ݎ ൌܿ רܲ רݏ, wherePn isthe group propagate signal ሺ62log݊ሻis preserved. Moreover, we offer the latency from position 1 ton. It is not difficult to prove that derivation details for Equation (4) of [1], where our ሺܣ ܤ ൌ2ሻ impliesሺܿ רܲ רݏ ൌ1ሻ. However, the motive is twofold: First, this equation has a key role since converse (i.e.,ሻܲ רሺܿ רฺ ሺܣ ܤ ൌ2ݏ ൌ1ሻ) is not it computes all the TPP carries, while no derivation or always true. In fact it fails in 25% of the cases of input proof is given for it in [1]. Secondly, there is a missing data forn= 8. This claim is supported by exhaustive test parallel prefix operand, although all applications of this via VHDL simulation. However, Example 1 below clearly equation, forn= 8, are correct. demonstrates the flaw. Example 1 (Counter example for ൌ ר ר): ܖ 2 THE FLAWConsider an instance of Fig. 1, forn=4, as depicted in Fig. lead toൌ1ר ܲר ݏ. n 2, wherec4 =0,s0 =1, andP41 =ܿ The modulo 2+1 addition scheme in [1] primarily However,R=|12+12|17=7, which leads tor4= 0.◄computesܯ ൌܣ ܤ 2െ 1, whereA andB arethe n n+1-bit operands in [0, 2 ]. Equation (1), adapted from [1], A0 1 1 0 0= 12 defines ܤ|… ݎݎ ൌ|ܣܴ ൌݎ ݎ interms ofM, ିଵଵ ଶ ାଵ B0 1 1 0 0= 12 where|ܺ| standsforX modulom andݔfor the stands4 2 –1 = 150 1 1 1 1 complement ofx. S1 1 1 1 0 ܴ ൌ|݉݉ …݉ ݉ ሺ2 1ሻ݉|శభ (1) ିଵଵ ାଵ ଶ C0 1 1 0 0 Fig. 1 depicts a typical computation of M (= S+C), where M1 0 0 1 1 1 s ൌa ْb,bc ൌa ש,۩bs ൌaandc ൌa רb. ୧ ୧୧ ୧୧ ୧୬ ୬୬ ୬୬ ୬ Fig. 2: An instance of Fig. 1 forn= 4 The correct equation, as will be derived in Section 3, is: A an an1…a1 a0כ ݎ ൌሺܿ שݏ רܿ שሺݏ שܿ ሻר ܩሻ ْܩ (3) ିଵ ିଵିଵ,ଵ ିଵ,ଵ B bn bn1…b1 b0The correctness of the latter is exhaustively tested via n 2 –10 1… 11 VHDL simulation forn8. =Besides the latter flaw, a S sn sn1…s1 s0parallel prefix termሺܿ ש݃ , ሻmissing in the third is C cn cn–1…c1 c0part of Equation (4) in [1]. The corrected Equation (4), as M mn+1 mn mn1…m1 m0will be derived in the next section, is as follows, where Fig. 1: Computation ofMԭሺܩ, ܲሻൌܩ: The computation ofRcan be analyzed as follows:כ ܩ ൌԭቆ൫ܩ,ଵ, ܲ,ଵ൯ ל ݏ൯ቇש ݃, ሻ ל ൫ܩ, ܲר ሺܿ(4),ଵ ିଵ,ାଵ ିଵ,ାଵ

1

3 THE CORRECTED DESIGN Theγandπequations are justified as follows: ܿ שݏ רܿ שሺݏ שܿ ሻר ܩ ିଵ ିଵିଵ,ଵ The corrected Equations (3) and (4) are derived below. ൌܽ רܾ שሺܽ שܾ ሻר ܽר ܾר ܿש ሺܽש ܾሻ ר ܽר ܾ ିଵ 3.1 Derivation of the most significant bitrר ܩר ܩש ܿnିଵ,ଵ ିଵିଵ,ଵ ൌܽ רܾ שሺܽ שܾ ሻר ܿש ሺܽש ܾש ܿሻ ר ܩRecalling the arithmetic equation forrnfrom Section 2 (i.e, ିଵ ିଵିଵ,ଵ כ ݎ ൌݏ ܿ ݉ ܩ), and݉ ൌܩש ܿ, the ିଵାଵ ିଵ,ଵାଵ ,ଵ ൌܽ שܾ שܿ שܽ רܾ שሺܽ שܾ ሻר ܿר ܩ ିଵ ିଵ ିଵ,ଵ MSBrncan be computed by the logical Equation (5). כ ݎ ൌݏ ْܿ ْ݉ ْܩ ൌൌԭ ቀሺγ, πሻ ל ሺܩ, ܲሻቁ ିଵ ାଵିଵ,ଵ ିଵ,ଵ ିଵ,ଵ כ ሺሺݏ ْሺܩ שܿ ሻሻْ ܿሻ ۩ܩ(5) ,ଵ ିଵିଵ,ଵ כ ݎ ൌԭቀሺγ, πሻ ל ሺܩ, ܲሻቁ ْܩ (6) ିଵ,ଵିଵ,ଵ ିଵ,ଵ Given thatܿ ൌ0ݏ ר,ܿൌݏרܿ,ܩ ൌ݃ש ר ܩ, ,ଵ ିଵ,ଵכ The prefix node that could computeܩthe last in ିଵ,ଵ ݃ ൌܿݏ ר, andܿ ൌݏ ש, the inner ିଵ ିଵ row and columnn–1 of the TPP tree is missing in Fig. 3 of parenthesized XOR equation can be simplified as follows: [1]. This and the path leading tor8, defined by Equation ݏ ْ൫ܩ שܿ ൯ൌሺݏ שܿ שܩ ሻר ݏר ܩש ݏר ܿ ,ଵ ,ଵ ,ଵ (7),are now added in the new Fig. 3 below. כ ൌݏ רܩ שݏ רܩ שܿ רݏ שܿ רܩ ,ଵ ,ଵ ,ଵ, ቀሺγ, πሻ ל ሺ݃, ሻ ל … ሺ݃ݎ ൌԭሻቁ ْܩ (7) ଼ ଵଵ ,ଵ ൌܿ שݏ רܩ שݏ רܩ ൌר ݃ש ר ܩש ݏ ,ଵ ,ଵܿש ݏ ିଵ,ଵThe latency ofγ andπ is2 and 3 unit gate delay. Therefore, the prefix operand (γ,π) is ready as soon as all ר ൫݃ש ר ܩ൯ ൌܿש ݏר ݃ר ൫ש ܩ൯ ש ݏר ିଵ,ଵ ିଵ,ଵ other prefix operands are. This leads to availability ofrnat ݏ רܿ שݏ רሺݏ שܿ ሻר ܩ ିଵ ିଵିଵ,ଵ time5 2 log ݊in terms of unit gate delay. ൌܿ שݏ רሺݏ שܿ ሻר ൫ݏר ܿש ܩ൯ ש ݏר ܿ ିଵ ିଵିଵ,ଵ ିଵb8a8b7a7b6a6b5a54 43 32 2b1a1b a b ab ab a ר ܩൌܿ שݏ רܿ רܩ שݏ רܿ רܩ0 0 ିଵ,ଵ ିଵିଵ,ଵ ିଵିଵ,ଵ Given thatൌcc֜1ൌ0֜רcൌcc, the outer ୬ ୬ିଵ୬ ୬ିଵ୬ cs8ccs4c32s cccc 8776655311c 00 parenthesized XOR equation within Equation (5) can now be simplified as follows: ൫ܿ שݏ רܿ רܩ שݏ רܿ רܩ ൯ْ ܿ(γ,π)(g∨c,p)5 54 4(T1,g1) g,p(g,p) ( ) ିଵିଵ,ଵ ିଵିଵ,ଵ ିଵ 8 88 ,(,p)3 3 ( )(t,g) g pg 7 76 6t,g) 2 2 ൌ൫ܿ שݏ רܿ רܩ שݏ רܿ רܩ שܿ ൯ ିଵିଵ,ଵ ିଵିଵ,ଵ ିଵ ר ൫ܿש ݏר ܿר ܩש ݏר ܿר ܩ൯ ר ܿ ିଵିଵ,ଵ ିଵିଵ,ଵ ିଵ ൌ൫ܿ שݏ רܩ שܿ ൯ר ቀܿר ܿש ݏר ܿר ܩቁ ିଵ,ଵ ିଵ ିଵ ିଵିଵ,ଵ ൌܩ שݏ רܿܿ שܿ רܩ שݏ רܿ רݏ שܿ ר ିଵ ିଵ,ଵ ିଵ,ଵ ିଵ ∗∗∗∗∗ ∗ ש ܿר ܩൌܿ שݏ רܿ שሺݏ שܿ ሻר ܩc *ccc4c3c1 6 ିଵ ିଵ,ଵ ିଵ ିଵିଵ,ଵ c−1 ∗ c7∗ It only remains to apply the latter into Equation (5) that c0 hhs 54h0 h7h3h2h1 6 leads to the desired Equation (3). Direct implementation r rr7r5r4rrrr of this equation leads to the overall latency of7 2 log ݊. 863210 n Fig. 3: The corrected modulo 2 +1 TPP adder כ 3.2 Derivation of(1≤i≤n–2) , כ כ ܩ ൌܩ שܲ רܩ ൌܩש ܲר ሺݏר ܿש ݃ש ܩ ሻ,ଵ ,ଵ,ଵ ,ଵ ,ଵ ିଵ,ଵ REFERENCESൌܩ שܲ רݏ רܿ ש݃ ר൫ שܩ שܲ רܩ ൯,ଵ ,ଵ ିଵ,ାଵିଵ,ାଵ ,ଵ n [1]Efstathiou C., H. T. Vergos, and D. Nikolos, “Fast Parallel-Prefix 2 +1 ൌܩ שሺܲ רݏ רܿ ר݃ ר ሻש ሺܲר ݏר ܿש ݃,ଵ ,ଵ ,ଵ Adder”, IEEE Trans. on Computers, Vol. 53, No. 9, pp. 1211–1216, Sep-ר ܩר ܲሻ ש ൫ܲר ݏר ܿש ݃ר ܩר ܩ൯tember 2004. ିଵ,ାଵ ିଵ,ାଵ,ଵ ିଵ,ାଵ ,ଵ n [2]Kalamatianos, “High-Speed Parallel-Prefix Modulo 2 –1 Adders,” IEEE ൌܩ שሺܲ רݏ רܿ ר ሻש ܲר ݏר ܿש ݃ר ܩ,ଵ ,ଵ ,ଵ ିଵ,ାଵ Trans. Computers, Vol. 49, No. 7, special issue on computer arithmetic, ൌܩ שܲ רݏ ר൫ܿ ש שܿ ש݃ רܩ ሻ൯,ଵ ,ଵ ିଵ,ାଵ pp. 673-680, July 2000. n ൌܩ שܲ רݏ רሺܿ ש ሻሺܿש ݃ש ܩሻ[3]Vergos H.T. and C. Efstathiou, "Novel Modulo 2 +1 Multipliers", Proc. ,ଵ ,ଵ ିଵ,ାଵ of the 9th EUROMICRO Conference on Digital system Design, Du-ൌܩ שܲ רݏ רܿ ש݃ ש רܩ,ଵ ,ଵ ିଵ,ାଵ brovnik, pp.168-175, 2006. ൌԭሺሺܩ ,ܲ ሻל ݏר ሺܿש ݃, ሻ ל ሺܩ, ܲሻሻ,ଵ ,ଵ ିଵ,ାଵ ିଵ,ାଵ Given the above carry signals, which are computable via a TPP tree exactly as in [1], the final sum bitsri canbe computed as follows: כ כ ݎ ൌ݄ ْܩ ൌݏْ ْܿ ܩ (1≤i≤n– 1) ିଵ,ଵ ିଵିଵ,ଵ

3.3 Implementation ofwithin the TPP tree The left operand of the XOR in Equation (3) can be represented as a prefix equation and implemented within the TPP tree as in Equation (6) and Fig. 3, where γ ൌܽש ܾש ܿ andπ ൌܽר ܾש ሺܽሻ ר ܿש ܾ. ିଵ ିଵ