# Daniel E. Atkins

Department of Electrical and Computer Engineering and Program in Computer, Information, and Control Engineering, The University of Michigan, Ann Arbor, Michigan 48104

## Developments Reported Prior to 1972

This paper reviews work related to the theory and application of higher-radix, non-restoring division as originally defined by Robertson in 1958 [1]. The class of division methods proposed by Robertson is described by the recursive relationship

$$p_{j+1} = rp_j - q_{j+1}d, \quad j = 0,1,\dots,m-1$$

where $p_j$ is the $j$th partial remainder, $p_0$ is the dividend, $p_m$ is the remainder, $q_j$ is the $j$th digit of the quotient to the right of the radix point, $m$ is the number of radix-$r$ digits used to represent the quotient, and $d$ is the divisor. The quotient digit set is $\{-n, -(n-1), \dots, 0, 1, \dots, n\}$, where $\frac{1}{2}(r-1) \le n \le r-1$. Each $p_j$ satisfies $|p_j| \le \rho |d|$, where $\rho$, the redundancy ratio, is $n/(r-1)$. The effect of the redundant representation of the quotient is the ability, at each step of the recursion, to select $q_{j+1}$ based upon estimates of the full-precision divisor, $d$, and the shifted partial remainder, $rp_j$.

In reference [1] Robertson shows a division with r=4, n=2, in which quotient digits are selected based upon a comparison of $4p_j$ with 0.5d and 1.5d to a precision of 7 bits. Also included is an example of the case r=10, n=7, requiring 3-digit comparisons of $10p_j$ with the quantities 0.5d, 1.5d, 2.5d. The examples given by Robertson require low-precision computation of multiples of d preliminary to the recursive operations.

In 1968, Atkins published [2], which includes a tutorial review of [1], an introduction to the "P-D plot," and a discussion of a method for determining sufficient precision in $d$ and $rp_j$ for correct selection of quotient digits. The P-D plot, a plot of partial remainder vs.
divisor, is an aid to understanding the quotient selection process and its precision requirements. A point $(d, rp_j)$ on the P-D plot falls within one or more "q(i) regions," i.e., regions in which $q_{j+1} = i$ is a correct choice. Since $p_{j+1} = rp_j - id$ must again satisfy $|p_{j+1}| \le \rho d$ (for $d > 0$), the q(i) region is the band $(i-\rho)d \le rp_j \le (i+\rho)d$, and adjacent bands overlap whenever $\rho > 1/2$.

A variety of quotient selection mechanisms are possible, but they must all meet the following requirement. Given an estimate $\hat{d}$ of the full-precision divisor in the range $d-\alpha < \hat{d} < d+\beta$, and an estimate $r\hat{p}_j$ of the full-precision shifted remainder in the range $rp_j-\lambda < r\hat{p}_j < rp_j+\gamma$: if the quotient selection function applied to $\hat{d}$, $r\hat{p}_j$ produces $q_{j+1}=i$, then the rectangle defined by the points $(\hat{d}-\beta,\ r\hat{p}_j-\gamma)$, $(\hat{d}-\beta,\ r\hat{p}_j+\lambda)$, $(\hat{d}+\alpha,\ r\hat{p}_j+\lambda)$, $(\hat{d}+\alpha,\ r\hat{p}_j-\gamma)$, which contains every full-precision point $(d, rp_j)$ consistent with the estimates, must lie entirely within the q(i) region.

In [2] Atkins distinguished between an "arithmetic model" and a "table look-up model" for quotient selection. Given $\hat{d}$ and $r\hat{p}_j$, the arithmetic model multiplies $r\hat{p}_j$ by an approximation of $1/\hat{d}$, rounds to an integer, and returns this integer to be used as $q_{j+1}$. That is, the arithmetic model picks $q_{j+1}=i$ if $i-1/2 \le r\hat{p}_j \times (1/\hat{d}) < i+1/2$, which is consistent with the fact that, for $d=1$, $q_{j+1}=i$ is a correct choice iff $i-\rho \le rp_j \le i+\rho$. The minimum value of $\rho$ is 1/2. The paper gives the number of bits required in $\hat{d}$ and $r\hat{p}_j$ for arithmetic models with $\rho=2/3$, assuming two's complement representation. These results are still believed to be correct, but are presently being reviewed and generalized.

The table look-up model consists of a combinational logic realization of the P-D plot. The "costs for table look-up models" (precision requirements in $\hat{d}$ and $r\hat{p}_j$) as given in [2] are not entirely correct. A more general, and correct, method of computation will be cited shortly.
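To make the recursion concrete, here is a small sketch of Robertson's r = 4, n = 2 case. It is an illustration under stated assumptions, not code from [1]: it uses exact rational arithmetic (`fractions.Fraction`), it compares $4p_j$ with $\pm 0.5d$ and $\pm 1.5d$ at full precision rather than to 7 bits, and the names `select_digit` and `divide` are hypothetical. It checks the invariant $|p_j| \le \rho |d|$ at every step and the identity $p_0 = Qd + p_m r^{-m}$, where $Q = \sum_j q_j r^{-j}$.

```python
from fractions import Fraction

R, N = 4, 2                  # Robertson's radix-4 example, digit set {-2, ..., 2}
RHO = Fraction(N, R - 1)     # redundancy ratio rho = n/(r-1) = 2/3


def select_digit(rp, d):
    """Choose q_{j+1} by comparing r*p_j with +/-0.5d and +/-1.5d (full precision)."""
    if rp >= Fraction(3, 2) * d:
        return 2
    if rp >= Fraction(1, 2) * d:
        return 1
    if rp >= -Fraction(1, 2) * d:
        return 0
    if rp >= -Fraction(3, 2) * d:
        return -1
    return -2


def divide(p0, d, m):
    """Apply p_{j+1} = r*p_j - q_{j+1}*d for j = 0..m-1.
    Returns the digits q_1..q_m, the quotient Q = sum q_j r^-j, and p_m."""
    assert d > 0 and abs(p0) <= RHO * d, "assumes d > 0 and |p_0| <= rho*d"
    p, digits = p0, []
    for _ in range(m):
        q = select_digit(R * p, d)
        p = R * p - q * d
        assert abs(p) <= RHO * d        # invariant |p_j| <= rho*|d|
        digits.append(q)
    quotient = sum(Fraction(q, R ** (j + 1)) for j, q in enumerate(digits))
    return digits, quotient, p


d, p0, m = Fraction(3, 4), Fraction(1, 3), 8
digits, quotient, pm = divide(p0, d, m)
print(digits)                            # [2, -1, 0, 2, -1, 0, 2, -1]
print(float(quotient), float(p0 / d))    # Q approximates p_0/d = 4/9
```

The exact thresholds are safe because each interval $[(i-\tfrac12)d, (i+\tfrac12)d)$ used by `select_digit` lies inside the q(i) band $[(i-\tfrac23)d, (i+\tfrac23)d]$, and the outer digits $\pm 2$ are covered because $|4p_j| \le \tfrac83 d$. A hardware selector would compare low-precision estimates instead, which is where the precision requirements come in.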
In [3] the distinction between arithmetic and table look-up quotient selection mechanisms is subsumed by a generalized quotient selection model, shown in Figure 1. Table 1 produces an estimate, A, of $1/\hat{d}$, and the multiplier uses this value to transform the range of $r\hat{p}_j$ and $\hat{d}$ into $\hat{P}$ and $\hat{D}$. Table 2 is a combinational logic, table look-up implementation of the P-D plot for the transformed divisor and partial remainder. If Table 1 is sufficiently complex to ensure that $\hat{D} \approx 1$, then $q_{j+1}$ is simply $\hat{P}$ rounded to an integer and Table 2 degenerates (the arithmetic model in [2]). If A = 1 for all $\hat{d}$, then Table 1 degenerates and Table 2 must be sufficiently complex to select correct quotient digits (the table look-up model). This model points out that there are a large number of intermediate structures between the two extremes of a degenerate Table 1 and a degenerate Table 2.

The major results in [3] relate to the cost and performance analysis of "Table 2" shown in Figure 1. Results include:

- (1) An upper bound on $\epsilon$ and $\delta$, the number of bits to the right of the radix point required in $r\hat{p}_j$ and $\hat{d}$, respectively. The bound is a function of the maximum quotient digit n, the radix r, the end points of the divisor domain, and the form of representation of d and rp.
- (2) A procedure to synthesize a minimal-literal-count, sum-of-products realization of Table 2. The arguments of this procedure include all of the parameters mentioned in (1) above.
- (3) Expressions to predict the logic complexity of realizations of Table 2 produced by this procedure.
- (4) A complexity analysis, including case studies, which (unfortunately) shows the hardware complexity of Table 2 to be proportional to $r^2 \log r$, and the time complexity for quotient generation improving only at a rate proportional to $\log r$.
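As a rough illustration of these precision questions, the following sketch builds a Table 2 directly, cell by cell (the kind of table that could be stored in a ROM), for r = 4, n = 2, and then searches for the smallest total precision $\delta + \epsilon$. It is not the synthesis procedure of [3] and does not reproduce its bounds. Assumptions: a positive divisor normalized to $[1/2, 1)$; estimates obtained by truncation, so each cell spans $[\hat{d}, \hat{d} + 2^{-\delta}) \times [r\hat{p}_j, r\hat{p}_j + 2^{-\epsilon})$; exact `Fraction` arithmetic; and hypothetical function names. A cell receives digit $i$ only if the whole cell lies in the q(i) region, which is the containment requirement stated earlier.

```python
from fractions import Fraction
from math import ceil, floor
import random

R, N = 4, 2                  # radix r and maximum digit n (Robertson's r=4, n=2 case)
RHO = Fraction(N, R - 1)     # redundancy ratio rho = n/(r-1) = 2/3
TOP = R * RHO                # reachable P-D region: |r p_j| <= r*rho*d = (n + rho) d


def digit_fits(i, d0, d1, y0, y1):
    """True if the closed cell [d0,d1] x [y0,y1] lies inside the q(i) region
    (i - rho) d <= y <= (i + rho) d.  For i = +n (-n) the upper (lower) edge of
    the q(i) region coincides with the reachable boundary, so it is skipped."""
    lo, hi = i - RHO, i + RHO
    if i != -N and y0 < max(lo * d0, lo * d1):
        return False
    if i != N and y1 > min(hi * d0, hi * d1):
        return False
    return True


def build_table(delta, eps):
    """Hypothetical Table 2 for d in [1/2, 1) with estimates truncated to delta
    (divisor) and eps (remainder) fraction bits.  Keys are the truncated
    integers (floor(d*2^delta), floor(rp*2^eps)).  Returns None if some
    reachable cell admits no single correct digit at this precision."""
    table = {}
    for kd in range(2 ** (delta - 1), 2 ** delta):
        d0, d1 = Fraction(kd, 2 ** delta), Fraction(kd + 1, 2 ** delta)
        for ky in range(floor(-TOP * 2 ** eps), ceil(TOP * 2 ** eps) + 1):
            y0, y1 = Fraction(ky, 2 ** eps), Fraction(ky + 1, 2 ** eps)
            if y0 > TOP * d1 or y1 < -TOP * d1:
                continue                 # cell is never reached: "don't care"
            ok = [i for i in range(-N, N + 1) if digit_fits(i, d0, d1, y0, y1)]
            if not ok:
                return None
            table[(kd, ky)] = min(ok, key=abs)   # any correct digit would do
    return table


def smallest_precision(max_bits=12):
    """Try (delta, eps) pairs in order of increasing delta + eps."""
    for total in range(2, max_bits + 1):
        for delta in range(1, total):
            table = build_table(delta, total - delta)
            if table is not None:
                return delta, total - delta, table
    raise ValueError("no valid table within max_bits")


def validate(table, delta, eps, trials=200, steps=20, seed=1):
    """Run random divisions p_{j+1} = r p_j - q_{j+1} d using only the table."""
    rng = random.Random(seed)
    for _ in range(trials):
        d = Fraction(rng.randint(2 ** 15, 2 ** 16 - 1), 2 ** 16)
        p = RHO * d * Fraction(rng.randint(-1000, 1000), 1000)
        for _ in range(steps):
            rp = R * p
            q = table[(floor(d * 2 ** delta), floor(rp * 2 ** eps))]
            p = rp - q * d
            if abs(p) > RHO * d:
                return False
    return True


delta, eps, table = smallest_precision()
print(f"delta={delta}, eps={eps}, {len(table)} cells, valid={validate(table, delta, eps)}")
```

Treating each cell as a closed rectangle is conservative, so a search like this may report slightly more bits than a finer analysis would require; the point is only that a candidate $(\delta, \epsilon)$ can be checked mechanically.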
A floating-point divider with r=4, n=2, based upon the methods described in reference [3], has been implemented and is described in reference [4].

## Developments Since 1972

Developments since 1972 are described in a conference paper [5] and a Ph.D. thesis [6]. Copies of this thesis are available to the interested reader, and journal versions of the results are in preparation. In the remainder of this paper, we highlight some of the results reported in the thesis:

- (1) The algorithm for construction of the quotient selector table (Table 2) described in the appendix to [3] requires that the quotient selection regions (q(i) regions) be defined by way of logical minimization. With modern LSI array memory technology, direct ROM implementations of Table 2 become feasible in lieu of minimized sum-of-products forms. Kalaycıoğlu [6] has devised an algorithm which determines q(i) regions without the requirement for minimization (a "pre-minimization" selection algorithm). Should the q(i) regions generated by Kalaycıoğlu's algorithm be implemented in minimized form, more prime implicants might be required than with Atkins' method; Kalaycıoğlu shows, however, that the additional number would be less than 12%. For a given P-D plot, the time complexity of the Atkins "post-minimization" quotient selection table algorithm is n . For the algorithm it is
- (2) Kalaycıoğlu has obtained a lower upper bound on δ and ε than that described in [3]. The value of δ or ε derived is based upon worst-case assumptions and may be reduced for a specific set of design parameters. This possibility is easily tested by executing the pre-minimization quotient selection table generation algorithm.
- (3) Reference [5] and a major portion of [6] concern the definition and analysis of organizations which permit time concurrency between quotient digit selection and partial remainder calculation.
  One method is based upon the use of a radix-$r^2$ selection table which, at iteration j, produces not $q_{j+1}$ but rather $q_{j+2}$. The other scheme overlaps the formation of the estimate of $p_{j+2}$ with the formation of the full-precision version of $p_j$. A combination of the two schemes is also possible. Generation of the quotient selection table for these cases is examined in detail. In particular, for an $r^2$ table the definition of the redundancy ratio, $\rho$, must be changed from $n/(r-1)$ to $(n+1)/(r+1)$.
- (4) The thesis also contains a cost-performance case study comparing a higher-radix, non-restoring divider to the Goldschmidt algorithm as implemented in the IBM 360/91. The conclusion is that, for a 56-bit dividend, a 24-bit divisor, and a 24-bit quotient, a radix-4 structure providing concurrency between quotient selection and partial remainder formation is 1.6 times faster and 2.9 times less expensive.
- (5) Reference [6] also discusses the possibility of applying division-like techniques to other arithmetic functions.

## Work in Progress

A large collection of theory concerning higher-radix, non-restoring division has been produced. A key to industrial acceptance of any of the ideas is the demonstration of a prototype and the dissemination of information in an applications-oriented form. Reference [7], which should be available in early 1976, examines the practicality of the ideas described in [5] and [6] in the context of modern MSI and LSI integrated components. The work includes the design, construction, and evaluation of a 24-bit divider based upon some of the newly developed concurrency ideas.

Given the availability of very high-speed multiplier arrays, we plan to reconsider quotient selection schemes using a non-degenerate Table 1 (Figure 1) and a degenerate Table 2. The complexity of Table 1 is proportional to $r$ rather than $r^2$.
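To illustrate what "non-degenerate Table 1, degenerate Table 2" can mean, the sketch below implements an arithmetic-model selector for r = 4, n = 2. It is only an illustration, not the design of [7] nor the precision results of [2]: a hypothetical 16-entry Table 1 indexed by the leading bits of $d$ returns $A \approx 1/\hat{d}$, the product $A \cdot r\hat{p}_j$ is rounded with the rule $i - 1/2 \le \hat{P} < i + 1/2$, and the degenerate Table 2 merely clamps to $\pm n$. The precisions $\delta = \epsilon = 5$ are assumptions chosen from the error budget in the comments: the reciprocal error contributes at most $\tfrac83 \cdot 2^{-\delta}$, remainder truncation less than $2 \cdot 2^{-\epsilon}$, and their sum $7/48$ stays below $\rho - 1/2 = 1/6$. Table entries are kept exact here; a hardware table would store a rounded value whose error must also fit in the budget.

```python
from fractions import Fraction
from math import floor
import random

R, N = 4, 2
RHO = Fraction(N, R - 1)       # 2/3, so the error budget for rounding is rho - 1/2 = 1/6
DELTA, EPS = 5, 5              # assumed fraction bits of d_hat and rp_hat (truncated)

# Hypothetical Table 1: one reciprocal per divisor interval [k, k+1) / 2^DELTA,
# d in [1/2, 1).  A_k = 1/midpoint = 2^(DELTA+1) / (2k + 1), kept exact here.
TABLE1 = {k: Fraction(2 ** (DELTA + 1), 2 * k + 1)
          for k in range(2 ** (DELTA - 1), 2 ** DELTA)}

# Error budget: |rp/d - A*rp_hat| < (8/3)*2^-DELTA + 2*2^-EPS = 7/48 < 1/6.
assert Fraction(8, 3) / 2 ** DELTA + Fraction(2, 2 ** EPS) < RHO - Fraction(1, 2)


def select_digit(rp, d):
    """Arithmetic model: q = round(A * rp_hat); Table 2 only clamps to [-n, n]."""
    a = TABLE1[floor(d * 2 ** DELTA)]                  # Table 1 look-up on d_hat
    rp_hat = Fraction(floor(rp * 2 ** EPS), 2 ** EPS)  # truncated r*p_j estimate
    q = floor(a * rp_hat + Fraction(1, 2))             # round P_hat to nearest
    return max(-N, min(N, q))


def divide(p0, d, m):
    """p_{j+1} = r*p_j - q_{j+1}*d with the arithmetic-model selector."""
    p, digits = p0, []
    for _ in range(m):
        q = select_digit(R * p, d)
        p = R * p - q * d
        assert abs(p) <= RHO * d, "selector chose an incorrect digit"
        digits.append(q)
    return digits, p


rng = random.Random(7)
for _ in range(500):
    d = Fraction(rng.randint(2 ** 15, 2 ** 16 - 1), 2 ** 16)
    p0 = RHO * d * Fraction(rng.randint(-1000, 1000), 1000)
    digits, pm = divide(p0, d, 12)
    quotient = sum(Fraction(q, R ** (j + 1)) for j, q in enumerate(digits))
    assert p0 == quotient * d + pm / R ** 12
print("500 random divisions kept |p_j| <= rho*d")
```

Here Table 1 grows with the number of divisor intervals rather than with a two-dimensional P-D grid, which is the trade-off the plan above aims to exploit.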
We are also involved in the practical application of theoretical results in the implementation of a special-purpose processor for classifying remotely sensed data at the Environmental Research Institute of Michigan. Two papers describing the structure and application of this processor have been submitted to the 1976 Symposium on Computer Architecture.

## Acknowledgement

The work described under the headings "Developments Since 1972" and "Work in Progress" has been supported primarily by the National Science Foundation under Grant No. DCR74-18573.

## References

- [1] J. E. Robertson, "A new class of digital division methods," *IRE Trans. on Elec. Comp.*, Vol. EC-7, No. 3 (Sept. 1958), pp. 218-222.
- [2] D. E. Atkins, "Higher radix division using estimates of the divisor and partial remainder," *IEEE Trans. on Comp.*, Vol. C-17, No. 10 (Oct. 1968), pp. 925-934.
- [3] D. E. Atkins, "A study of methods for selection of quotient digits during digital division," Dept. of Computer Science Report No. 397, University of Illinois, Urbana, 1970.
- [4] D. E. Atkins, "Design of the arithmetic units of Illiac III: Use of redundancy and higher radix methods," *IEEE Trans. on Comp.*, Vol. C-19, No. 8 (Aug. 1970), pp. 720-723.
- [5] D. E. Atkins and Ü. Kalaycıoğlu, "Concurrency in generalized radix, non-restoring division," *Proceedings of Allerton Conference on Circuit and Switching Theory*, Oct. 1974, University of Illinois.
- [6] Ü. Kalaycıoğlu, "Analysis and synthesis of generalized radix additive normalization division techniques," Systems Engineering Laboratory Report No. 88, Department of Electrical and Computer Engineering, The University of Michigan, Ann Arbor, May 1975.
- [7] Janis Beitch Baron, "Implementation studies of higher radix, non-restoring division," SEL Report, in preparation.

Figure 1. Generalized Structure of Model Division (Quotient Selector)