# FAULT-TOLERANT SYSTOLIC ARRAYS: AN APPROACH BASED UPON RESIDUE ARITHMETIC #### Vincenzo PIURI Department of Electronics, Politecnico di Milano Piazza L. da Vinci 32, I20133 Milano, Italy, tel.(39)(2)23993528 #### ABSTRACT Much attention has been recently given to VLSI and WSI processing arrays: systolic arrays are often adopted to execute a wide class of algorithms, e.g for matrix arithmetic or signal and image processing. In this paper a fault-tolerant architecture is proposed to allow reliable computation of systolic arrays by using physical redundancy and residue number coding. Such architecture supplies also information for fast reconfiguration. #### 1. INTRODUCTION Array architectures consist of a number (usually very high) of identical processing elements connected by a regular interconnection grid [11,13]; these characteristics make arrays well suited for VLSI and WSI integration. They are extensively studied for a wide spectrum of high-computing applications ranging from signal and image processing to matrix operations to relational data base support. The continuing growth of interest in such computing architectures, their great complexity, their VLSI or WSI integration and their use in dangerous or critical applications give more importance to the problem of reliability of their computation. Since each processing element in such a system executes a part of the whole computation, the failure of one of them may lead to uncorrect results. The systolic array has to work correctly even if some processing elements are faulty [4,6,7]; besides, it has to supply enough information to allow an easy, fast reconfiguration of the whole array when a permanent fault is detected in a processing element [15]. High reliability and availability of arrays may be achieved by means of fault-tolerance techniques. Two main methodologies may be found in literature: the first approach is recondancy (either physical or time redundancy) [17], the second one is coding [3,14,22]. In physical redundancy additional circuits are added to the basic architecture to allow error detection and correction, while in time redundancy the computation is repeated more times and results are compared. These solutions present two drawbacks: silicon area for additional circuits may be great and delays required by redundant computation may became unacceptable. On the other hand, by properly coding input and output data it is possible to detect and correct errors due to faults; but, also in this case, the greater number of bits introduces execution delays and additional area occupation. Whenever a fault is detected, it may be masked by correcting the results or it may be ignored and the array may be reconfigured to exclude the faulty processing element [15]. Permanent faults may be usually treated by using off-line testing and reconfiguration; while, transient faults and critical applications require techniques which are able to assure the correct functioning of the whole system and to supply data for the identification of faulty elements allowing a fast reconfiguration. In this paper a mixed approach to fault tolerance for processing arrays is presented: physical redundancy due to replication of computational elements is associated to the adoption of a residue number code [19,20] for input and output data. Redundancy allows to detect and correct errors and to generate data for reconfiguration, while residue data representation bounds the increase of silicon area and the computation delay. In this way it is possible to exploiting the properties of modular redundancy without replicating the whole processing element. This approach to fault tolerance was firstly proposed in [16,18] for multipliers; here, it is shown how it may be adapted to processing arrays. Error detecting and correcting architectures are proposed for linear and bidimensional systolic arrays: additional circuits are also introduced to allow a fast identification of faulty elements and, then, an easy reconfiguration. The definition of the error models and the choice of the bases of the residue number system are discussed: constraints upon them and mathematical properties are presented to assure the correction of outputs. Fault-tolerance may be locally introduced in arrays: for example, it is possible to design fault-tolerant processing elements or it is possible to repeat computation on different neighboring elements and compare results. This approach requires a wide silicon area and high delays; moreover, unregularity may generate difficulties in circuit design. A more profitable approach is the definition of globally fault-tolerant arrays: in this case the whole array is replicated and results at the edges of the array are examined to detect errors due to faults. In particular, the basic architecture that is here considered consists of a usual processing array, in which operations among integers are performed in binary representation without truncation or rounding, and of a residue network able to correct errors and to outline the faulty elements. In the residue network a set of processing arrays perform the same operations of the usual binary array, but using residue representations of data: the results of the binary array are compared with the results of the residue arrays to detect errors and, then, a correction newtwork allows to correct the results of the binary array. The residue network may identify the faulty processing elements reconfiguration. Several architectures may be designed by using this approach to achieve different degrees of fault tolerance; for example, the following policies with different complexity and cost, may be adopted: - error detection followed by reconfiguration; - error correction without reconfiguration (one or two faults may be tolerated in the whole array); - correction of the first error, detection of the second error and, then, reconfiguration; - error correction and reconfiguration after the first or the second fault. The main features of proposed architectures are: the strong regularity of the whole structure (allowing an easy VLSI or WSI implementation), the bounded increase of computation time and of silicon area due to redundancy, the minimization of hard-core (i.e. the unprotected area). ## 2. ARCHITECTURE OF PROCESSING ELEMENTS An array is composed by a set of simple processing elements connected by a proper network [11,13]: thus, the characteristics of these architectures depend on the properties of processing elements and on the kind of interconnection network. Before presenting the architecture and its overall properties of self-checking and fault-tolerance, it is necessary to discuss the structure of each processing element to introduce its features, which allow to achieve the desired behavior of the whole array. The fault-tolerant processor array is composed by processing elements of two kinds: the elements of the first kind operate in binary representation and belong to the binary array, while the elements of the second kind belong to a residue arrays of residue network and use a representation of data in a suited Residue Number System [19,20]. Here, the structure of such processing elements is briefly described. From literature it is possible to extract a simple common structure of the processing elements belonging to the binary array (see fig. 1) [11,13]. Usually three inputs are supplied by the environment external at the element: one is the data coming from the host computer into the binary array, the second one is the result of the computation performed by the adjacent element, the third one is the weight for combining the other input data. The outputs of each element are the input from the host, the weight and the result of the computation executed in the element itself. Computation in a processing element is a linear weighted combination of the data from the host computer and the result of the adjacent element: data from the host is multiplied by proper weight and, then, added to the result of the adjacent element. The direction in which data flow may be changed to implement some classes of algorithms, without modifying the results of this paper. b) X <- X in <- Win Yout <- REV in +REW in \*X in b Fig. 2 - Residue processing element By using this structure t is possible to execute a wide set of algorithms for different applications characterized by reglarity (i.e. repetitive computations on large set of data) and high computing [11]. In digital signal processing element may be adopted to implement, via a suited interconnection no work, FIR or IIR filtering, 1-D or 2-D convolution, discret Fourier Transform, and other operations [9]. Some problems in matrix arithmetic may be solved with such processing elements, e.g. matrax-vector matrix-matrix multiplication, matrix triangular- and image processing this multiplication, ization, solution of triangular linear systems [10]. The processing elements 😁 residue arrays are similar to the processing elements of the binary array; they are structurally equivalent to binary elements, but they execute all the operations in the adopted Residue Number System [\*\*,20]. Therefore, the binary multiplier and the birry adder of fig. 1 are substituted by devices working in representation (fig. 2.a). #### Implementation o the residue processing elements Many techniques were proposed to implement residue arithmetic devices [19,20]; for example, the simplest circuit is a ROM or FLA, whose inputs are the operands and the output is the result of operation. Nevertheless, these structures are highly area consuming. Very compact solutions may be designed for particular bases, such as low-cost bases [1]. For other bases special-purpose devices may be considered. A general solution for a modulo-b adder consists in adding to the inputs the constant (2\*\*log<sub>2</sub>(b-1)-b): if a carry is generated, the result has to be corrected by adding b to it and discarding the new carry; this structure is compact and fast. alternative solution is shown in fig. 2.b; multiplication and sum are performed by usual binary devices: the fina. result is the input of a correcting circuit whose output is the residue representation of its input. Anyway, the area used by the residue processing element is smaller than the area of the binary element, because the number of bits of data is less. On the other hand, the additional delay in computation, due to modulo correction, is shorter than the reduction of computation time due to the minor number of bits: therefore the result of residue element will be approximately available at the same time of the result of binary processing element. # 3. ERROR MODELS OF PROCESSING ELEMENTS Before discussing the detection and correction mechanisms in the array architecture, it is necessary to define the error model, i.e. the influence of faults on computation, for processing elements. Let us assume that a fault (stuck-at-0 or stuck-at-1) in a gate of a processing element causes in the result an additiv∈ error of the type "power-of-2" [16,18]: i.e. for any possible fault, the correct result is derived from the output of the processing element algebraically added to the amount of error. The number of bits affected by a fault depends on the considered device; in this paper it is always assumed that a fault can only produce a single error. If P is the correct result and P' the wrong result, it is P'=P± K.2', where K.6{0,1}. The exponent "i" is related to the position of the fault device, while the sign of the error depends on the input data. The coefficient K, depends on the input data; a fault in a processing element may not appear in the output of that element: error detection depends also on the input data that can mask the error. In this case the fault remains latent, until it will be detected. An error may be identified by examining the difference between the correct result and the wrong one. When a set of pases $\{b_k \mid k=1..n\}$ is considered, the n-tuple of the residues of such difference is called the syndrome s<sub>±1</sub> of an error: $s_{\pm i} = (\mathbf{6}_{\pm i}^{k} | \mathbf{6}_{\pm i}^{k} = RTF + P] = RTE \pm K_{i} 2^{i} J_{b}, k=1..n\}.$ Since the error in some arithmetic units may depend on the input data, iso the syndrome may be related to the input data [18]. Usually, circuits with one, two or three diffe ant syndromes for each error are considered; these errors will be referred as single-syndrome errors, double-syndrome errors and triple syndrome errors respectively. If two faults occur, double errors may be introduced in the result of the processing element. The errors are still additive as for the single error case, i.e. they are always summ of powers of 2 [18]. The global error is error is $P^{i} = \pm K \cdot 2^{j} \pm K \cdot 2^{j}.$ In this case the specimes $\frac{1}{2}$ is given by: $S_{\pm i, \pm j} = \{c_{\pm i, \pm j} | c_{\pm |$ the syndrome due two faults may be equal to a syndrome due to a single fault or may not detect any error (power of 2 ith same magnitude and opposite sign). The bases of the Residue Number System have to be chosen so that $\varepsilon$ , previous events are detected. coefficients equal faults are possible If more than two sults occur, there are multiple errors in the outs to of the processing element. The error is a linear combination of powers of 2 with > ±1; also in this case masking or coincidence with syndromes due to minor number of When two or more a fors are considered, they may be partitioned in two .asses [18]: consecutive p manent errors (or, briefly, consecutive errors), due to permanent faults which takes ace in different times. The syndrome product i by the first failure may be stored in a register: correction of a second permanent failure may be performed more easily. - "any kind" errors, i.e. due to permanent, transient or intermittent faults occurring either in the same time or in subsequent times. Generally, it is possible to assume that errors are consecutive since the probability of contemporaneous occurrence of more faults is very low: this fact allows a very efficient error detection and correction. Many arithmetic units satisfy the above error models, e.g. adders (SN7483, Lai and Muroga [12]), multipliers (Dadda [5], Wallace [21], Jayshree and Basu [8]) and more complex networks of full-adders. The unique constraints are that data flow must pass only once through each full-adder and all bits has to be processed in a parallel fashion, so that a fault in one of them produce an error equal to a single power of 2. Every processing element described in the previous section satisfies these error models, because it is a full-adders network in which a data word passes once through each of full-adder. The models are correct also if buffers or latches are introduced in the processing elements: in fact a stuck-at fault of a bit produce an error of the type power of 2 at the output of such devices. Note that all the assumptions are still satisfied when the processing elements are connected in an array: therefore, a fault in a processing element produces an error equal to a power of 2 at the edges of the array (if input data do not mask it) and error models discussed above hold. #### 4. ARCHITECTURES FOR FAULT-TOLERANT ARRAYS The processing elements previously described may be connected by a interconnection network to obtain an array, i.e. a regular architecture for high-computation applications [11,13]. When a fault occurs in a processing element, the computation may produce erroneous outputs. In this section an approach is proposed to achieve fault-tolerant arrays by means of modular redundancy and the properties of Residue Number Systems: these architectures are able to detect and, possible, correct the errors due to a bounded number of faults when given constraints are satisfied. To allow the survival of the array when more faults occur, the host computer, which drives its computation, must stop the array and reconfigure it. Reconfiguration [7,15] consists in reconnecting the processing elements to exclude faulty ones by sending suited signals to the interconnection network. Then, the computation may be restarted. The capability to output the correct results, even if faults occurred (but less than the upper bound), reduces the frequency of the reconfigurations and, therefore, increases the availability of the system. To decrease the time required by reconfiguration and to assure a greater availability of the system, it is necessary that the architecture itself is self-checking, i.e. it has to identify the faulty element. In this way the host computer acquires the position of the faulty element, selects the new interconnection of processing elements and generates the corresponding signals for the interconnection network. Otherwise, it has also to test the processing elements to identify the fault: but this operation may require too much time. Linear array Horizontal computing bidimensional array Bidirectional computing bidimensional array Fig. 3 - Different topologies for arrays Different types of arrays are here considered: linear arrays and bidimensional array (fig. 3) [11,13]. Two structures are identified for bidimensional arrays, accordingly with the direction of transmission of the results. In horizontal-computing bidimensional arrays the results of computation are transmitted in horizontal direction, while in vertical direction input data are passed. In bidirectional-computing bidimensional arrays the results of computation is passed horizontally and vertically to the adjacent processing elements. Note that interconnection networks are not drawn in fig. 3 and in the following ones, since they depend on the reconfiguration capabilities introduced by the designer. Basic fault-tolerant architectures for linear arrays are presented: their behavior and their performance are discussed. Then, bidimensional arrays are considered; the solutions considered for linear are extended to horizontally-computing bidimensional structures and an effective solution is proposed to identify the faulty processing element in the array. Characteristics and properties of fault-tolerant bidirectional-computing bidimensional arrays are derived immediately from horizontal-computing architectures: in fact in bidimensional-computing arrays also data passed vertically are the result of a computation performed in the processing elements, which are a simple extension of the processing elements presented above. Details on all these architectures will be discussed in the next section. ## 4.1. Linear arrays Here, the architectures for fault-tolerant linear arrays are presented. The most simple solution to add fault-tolerance capabilities to the linear array of binary processing element is shown in fig. 4. The binary array executes a computation receiving input data (integers in binary representation) from the host computer, which controls all activities. The same data are converted in a Residue Number System: each set of binary-to-residue converters generates the inputs for the corresponding residue array. Each residue array executes the same computation of the binary array, but all operations are in modulo: if no fault occurs, its output is the residue of the output of the binary array in the corresponding base. The output of binary array is, then, converted in its residue representation by others binary-to-residue converters. The occurrence of errors may be detected by examining the output of the syndrome persentar, i.e. the differences between each residue of the Fig. 4 - Fault- olerant linear array output of the binary array and the output of corresponding residue array. If all these differences are zero, no error occirs or, better, no error is detected: in this case he multiplexer controller in the error correction network selects the output of the binary array as the correct output. When at least one difference is not zero, an error is detected: its syndrome in the input of the correction generator in the error correction network. The correction generator produces the correction that has to be added to the output of the binary array to obtain the correct result of the computation. Then, the multiplexer controller selects the output of the adder as the final output of the fault-tolerant array. Note that in fig. 4 a very simple error correcting network is slown: it may be used only to correct single errors. Error correcting networks for more complex error models will be presented in the following section. This architecture is able to tolerate a given number of faults according with the syndrome generator and the error correction network adopted. But no information is supplied about the position of the faulty processing elements in the array. As already said, such data are very important to allow a fast reconfiguration and to assure a high availability of the system. To overcome such problem, additional circuits are required: a possible solution is proposed in fig. 5. In this case the error correcting network is able to manage only single errors. In this architecture the intermediate results of the computation, i.e. the outputs of each working processing element, are tested. The outputs of the binary elements are converted in the residue representation and the differences between the residue of the output of every binary element and the output of the corresponding residue element are computed. If all these differences related to processing elements in a given position in the arrays are zero, no error is detected: otherwise there is a faulty processing elament in that position. Reconfiguration algorithm takes advantage of this information and excludes the faulty processing ëlement. Both these architectures minimize the hard-core, i.e. the unprotected area; in fact, it is reduced only to the last stage composed by the error correction network, while the syndrome generator itself is Fig. 5 - Linear array with fault localization protected from faults. The additional silicon area used to achieve a fault-colerant system may be smaller than the area required by the classical modular redundancy if careful hardware design is performed: moreover, the regularity of the architecture is well suited for a VESI or WSI implementation. Let us consider how recundancy influences the execution speed of the thole architectures. Note that conversion delay may be minimized by means of a careful hardware design [2]. When no fault arises in the arrays, the dela introduced by additional circuits is very low: in fact it is due to the residue conversion of the result of binary array, the multiplexer controller and the multiplexer itself. When faults occur, the calay is due also to the error correction network: sucl additional delay depends on the type of error model apopted for the processing elements. Anyway, the right result is immediately available after having ddes the correction; this is an advantage of the method here proposed on techniques using Redund at Residue Number Systems: in such approaches a time- onsuming conversion from the residue representation o the binary one is in fact required [2,20]. # 4.2. Bidimensional arras To achieve higher competation speed, the processing elements may be connect disabidimensional regular grid: here, some ar hit stures achieving fault tolerance are proposed for this configuration. A detailed description horizontal-computing bedist isional arrays; similar architectures and techniques may be easily derived for bidirectional-computing bidimensional arrays, in which also data flowing vertically in the structure are the result of the loom, station performed in the processing elements. chniques are given for The simplest architecture is shown in fig. 6; it is very similar to the slution presented for linear arrays in fig. 4. The binary array executes the algorithm assigned by the lost computer, while the residue arrays operate in | Residue Number System. The outputs at the ec e f the binary array are converted in the residu in resentation and compared with the outputs of he residue arrays: if the resulting syndrome is zero, no error is detected; otherwise, the output of the binary array is corrected by the error correction network. Fig. 6 - Fault-tolerant bidimensional array Such network may also signal the row of the processor matrix containing the faulty processing element, in fact the output of each row is separately verified. But, the faulty element may not be identified because the whole output of the row is tested. To overcome this problem, an architecture like that presented in fig. 5 might be adopted. Nevertheless, the number of wires and the number of binary-to-residue converters are too high: therefore this solution is not suitable for VLSI or WSI integration. To reduce the numbers of wires and converters the intermediate outputs might be latched and passed along the columns of the arrays: they should be compared only at the low edge of the arrays. In this case two clocks and register files (one for each column) are required: moreover, also this solution requires too much silicon area and it is not suitable for intergration. attractive architecture for fault-tolerant horizontal-computing bidimensional arrays based on residue arithmetic is shown in fig. 7: intermediate outputs of each column of processing elements are added together to generate the output Y' out To compute such sums the basic processing elements have to be modified, as it is shown in fig. 8.a; in each processing element an adder is inserted: its inputs are the output of the basic processing element considered until now and the partial sum produced by the adder of the element in the previous row and in the same column. Note that the sums Y' computed by the additional adders are obtained with an algorithm which is similar to the target algorithm executed by the array, since they are sum of products of the inputs X and W. Besides, all conditions on the data flow Fig. 7 - Bidimensional array with fault localization Fig. 8 - Processing element for fault localization through the circuits are still satisfied: therefore, the error models are valid and, thus, these sums give information about the state of the processing elements of the corresponding column. The additional sums produced by the binary array are converted in their residue representation; the set of the differences between the residue representation of one of those sums and the sums produced by the corresponding column in the residue arrays are similar to the syndrome generated by the syndrome generator of the architecture in fig. 6. The syndrome obtained at the right edge of the array allows to detect an error and to identify the row of the faulty processing element; in a similar way, the additional syndrome generated at the bottom of the array allows to identify the column of the faulty processing element. These two information are enough to identify uniquely the faulty element for a subsequent reconfiguration of the whole architecture. It may happen that the input data of the array mask a fault in the row test, but not in the column test (or vice versa): in this case the fault has to be considered latent, even if the host should store the position of the column (row) containing a faulty element to identify it as soon as possible. the column with the faulty element disappears. crencome thi smaller silicon are:.. tolerance may be easily introduced in Fault tolerance may be easily introduced in bidirectional-computing bidimensional architectures by adopting solutions similar to those proposed for horizontal-computing anchitectures by adding another error correction network at the bottom of the binary array: error detection and correction are performed in the same wey as in horizontal-computing bidimensional arrays. On the other hand, the additional adder of processing element in fig. 9 was otects the multiplier and the first adder, but it does not allow to detect errors due faults in the additional adder itself. If such circuit fails, only the column containing the faulty processing element may be identified, not its row since the check on the rows is passed by all elements. This may be acceptable since the output of the array is corre to but the capability to identify drawback two solutions are available. First, it is possible to add a third adder in the basic structure of the processing element as it is shown in fig. 8.5. The output of this adder is the value Y" .; while its inputs are Y' (computed the value Y"; while its inputs are Y' (computed by the cell itself and the value Y" (equal to the output Y" of the previous cell). If a fault occurs in the second adde, it may be detected by examining the output Y" at the edges of the arrays, since also in this case all constraints on the structure of processing element as it and data flow are estimated. processing element and on data flow are satisfied. Therefore, the column of the faulty processing element may be identified by examining the output Y' at the bottom of the arrays; while the row may be obtained from Y and, eventually, Y" out An alternative solution is shown in fig. 8.c. The two additional adders he here substituted by a unique adder with three in ot : its computes a check sum for each processing element. Its inputs are the target output function Y f the processing element, Ying equal to the che aum of the upper processing element, Y" equal to the check sum of the leftmost processing element. This structure allows to detect errors in a similar way of the previous solution, but a careful hardwar $\epsilon$ sign may allow to obtain a ## 5. RROR CORRECTION In this section data is of the architectures for error detection and correction are presented and analyzed: for each type of error model, the constraints upon bases of the adopted Residue Number System are given to a sure correction. First, error due to a single fault is considered; then, the occurrence of two faults is discussed. It has be pointed out that gresentation deals only with the architectural aspects of error detection and correction, since theometical foundations are already well-known [19,20]. # 5.1. Single error correction The anchitecture whice allows to detect a single error in a linear array is shown in fig. 9.a, where only one residue array is used in parallel with the binary array. The residue representation of data consists in the resides of the data in the selected base. The alternative colution presented in fig. 9.b (with a single residue array) is able to supply the position of the faulty processing element in the linear array for reconfiguration. Similarly, the architectures of figg. 10.a and 10.b (with only one residue array) may be adopted for single error detection in bidimensional architectures. Whenever the residue of a result of the computation of the whole binary array differs from the result of the computation of the whole residue array, the error due to a fault is detected and the host computer has to reconfigure the faulty array. Fig. 9 - Single error detection in linear array Fig. 10 - Single error detection in bidimensional arrays Nevertheless, these architectures is not allow to correct the result of the binary array, since no enough information is available. As in usual structures based upon Triple Modular Redundancy [17], triplication of computation modules is required to correct a single error by voting their results. Thus, to tolerate a single fault the architectures presented in figg. 4, 7, 6 and 7 may be used, where two residue arrays are added to the binary array. The first architecture and the second one implement linear array, while the others are bidimensional architectures; the second architecture and the fourth one signal also the faulty processing element, allowing easier and faster reconfiguration. When the data word of minary array has n bits, the syndromes of all powers of 2 with either positive or negative coefficient must be unique to correct a single error, i.e. the couple of bases (b<sub>1</sub>,b<sub>2</sub>) must satisfy [16,18]: 1.a) $$s_{\pm i} \neq s_{\pm j}, \forall i=1,2,...,n, \forall j=1,2,...,n, \forall i\neq j,$$ 1.b) $s_{\pm i} \neq s_{-i}, \forall i=1,2,...,n, \forall j=1,2,...,n.$ In this case the correction generator is able to produce, without ambiguity, the correction which has to be added to the actual output of the binary array to obtain the expected result. The correction generator may be implemented by means of a combinatoric circuit (ROM, PLA or a custom circuit): its input is the syndrome, while its output is the correction. The <u>coverage</u> assured by a couple of bases, i.e. the number of bits which the couple is able to correct, may be evaluated for odd bases with a simple algorithm presented in [16]. # 5.2. <u>Double error correction</u> The architectures for single error correction are also able to detect two errors, but not to correct them, as in modular redundancy. Besides, if two errors occur, two residues in the syndrome may be zero, not allowing detection of the error. To avoid this problem, it is possible to select a couple of bases generating a non-zero syndrome for any combination of two errors; but the required bases have high values, i.e. residue processing elements are too great. An alternative solution consists in adopting a triple of bases such that no more than one residue is null for any couple of errors. To correct two errors, three residue arrays are required in the structures presented in figg. 4, 5, 6 and 7. Also in this case the architectures of figg. 5 and 7 allow a faster reconfiguration in comparison with the other two structures. As already said in the section about error models, there are two types of double errors: consecutive and "any kind" errors. Accordingly with the type of the double-error model, it is possible to design the correction generator and to identify the constraints to which the bases must satisfy to allow the error correction. Let us consider "any kind" errors; the correction generator is similar to that designed for single error correction: it consists of four combinatoric circuits (such as ROMs, PLAs or a custom circuits), one for double errors, the others for single errors. To correct single and double errors, it is necessary that the triple of bases satisfies the following constraints which allow to identify uniquely the error in n-bit outputs [16,18]: - any couple of bases must be able to correct any single error; - 2. S' \( \) S" = \( \text{0}\), where S'=\( \text{s} \) is the set of the syndromes due to single errors and S"=\( \text{s} \) is the set of the syndromes due to two errors; \( \delta \) is - the syndromes in S" must be unique, i.e. all combinations of powers of two must have different syndromes, if a double error takes place, one residue can be zero at the most. The architectures for correction of consecutive errors are more interesting because the probability of contemporaneous faults is very low in comparison with the probability of consecutive faults, even if the latter architectures are not able to manage transient or intermittent faults. The structure of correction generator and the constraints on the triple of bases for consecutive errors depend on the number of different syndromes produced by a permanent fault. Let us consider single-syndrome errors. They may be managed by the correction generator shown in fig. 11. When no error occurs in the arrays, the signal & allows to select the output of the binary array directly. As soon as the first error arises, it is recognized by a suited combinatoric circuit (as in single error architectures): such circuit produces the corresponding correction, which is stored with the syndrome in a register. The signal $\alpha_{\rm s}$ controls selection of the correct output. Until no other fault occurs, the output of binary array is corrected by the above combinatoric circuit, but corrections and syndromes are not stored. When a second fault is detected, the circuit for single errors is not able to manage it. In this case, the difference between the stored syndrome and the syndrome generated after the second fault identifies the new correction in a second combinatoric circuit; such second correction is added to the sum of the stored correction and the output of the binary array to obtain the correct result. The signal & selects the correct output. This technique allows to implement compact correction generators: in fact the combinatoric circuit for "any kind" errors is surely greater than the circuits required by two consecutive single-syndrome errors. Fig. 11 - Correction generator for single-syndrome consecutive errors To correct two consecutive single-syndrome errors, the triple of bases must satisfy the following constraints [16,18]: - any couple of bases must be able to correct any single error; - 2. S' n S" = Ø; - 3. the syndromes in S' must be unique, i.e. all powers of two must have different syndromes; this condition allows to identify the second error when the previous one holds; - if a double error takes place, one residue can be zero at the most. Let us consider now [16,18]. The correction circuit presented for syndromes of the first smaller than the area for "any kind" errors. resuits in which a fault produces double syndromes the triple of bases must satisfy the same constructs of single-syndrome case merator is similar to the ingle-syndrome errors, but there are two regists to store the possible syndromes of the first error detected and their corrections (fig. 12). The signals $\alpha_1$ and $\alpha_2$ allow to select the suited correction for each output after the first falls, while the signal $\alpha_3$ select correct output after the second fault. Also in this case the silicon area required by correction generator is Fig. 12 - Correction generator for double-syndroma consecutive errors Finally, when triple-syndrome errors are considered, the triple of bases must satisfy the same constraints of single-syndrome errors and the following one [16,18]: 5) $$S_{ij}^{"} \cap S_{ij}^{"} \cap S_{i+1}^{"} = \emptyset$$ where $S_{x}^{"} = S_{xj}^{"} = \emptyset$ ,2,...,n) $\forall x \in \{i,-i,i+1\}$ . The structure of correction generator is an upgrade of the sincuit for double-syndrome Nevertheless this more somplex structure and the degradation in performance can justify the choice of full-adders with a low tan-out, i.e. with no more than two different syndromes for any fault. To compute the coverage assured by a triple of bases, i.e. the number of correctable bits, properties of residues of powers of 2 have been investigated and a computer program has been written considering odd bases and double-syndrome errors [16]. The results obtained are presented in tab. 1: when more triples supply the same coverage, the triple with the minimum sum of the squares of the bases has been selected. The architectures discussed above may be extended to consider the cases of three or more errors. But the silicon area required to implement such architectures is too great in comparison with benefits obtained by error correction. ## 6. COHCLUDING REMARKS The great interest in array architectures, their great complexity and their use in critical applications give more importance to the problem of reliability of their computation. High reliability and availability of arrays may be achieved by means of fault-tolerance techniques. Two main methodologies may be found in litersture: the first approach is redundancy (either physical or time redundancy), the second one is coding. | - | | | | | | | | | | |-----|----------|------------|------------|-------------|-------|-------|------------|--------|--| | ١ | | Base | s | Single | e e . | ٢ | | 2 err. | | | - 1 | | | | corre | | using | pairs | consec | | | ١ | | | | of bas | | | | 1 1 | | | ı | i | l j | l k | l ij | 1 5 | j k | MIN | i [ | | | ī | 3 | 1 13 | 1 23 | l 12 | · · · | l 132 | 1 12 | 1 10 I | | | i | | 1 11 | 13 | 30 | | 60 | 1 12 | 12 | | | i | 7 | 1 11 | 19 | i 30 | | 45 | 1.8 | i 13 i | | | i | 9 | 1 11 | ĺ 17 | 1 15 | i . | 40 | 1 15 | 15 1 | | | i | 7 | 111 | 23 | 30 | | 110 | 30 | 16 | | | i | 11 | 19 | 1 21 | 45 | | 1 18 | 1 18 | 18 | | | ĺ | 5 | 111 | 19 | 20 | | 45 | 20 | 20 1 | | | Ì | 7 | 23 | 25 | 33 | , | 150 | 33 | i 21 i | | | - 1 | 3 | 23 | 29 | 22 | 1 | 150 | i ::2 | i 22 i | | | - 1 | 7 | 11 | 17 | 30 | | 40 | 2.4 | 24 | | | 1 | 13 | 23 | 31 | 132 | | 55 | 1 :5 | 25 | | | ı | 15 | 19 | 29 | 36 | | 150 | 2.8 | 28 | | | ļ | 7 | 11 | 29 | 30 | | 140 | 30 | 30 | | | - ! | 13 | 19 | 23 | 36 | 1 - | 150 | 2.6 | 31 | | | -! | 5 | 23 | 27 | 44 | | 150 | 3.6 | 32 | | | -! | 11 | 13 | 23 | 60 | 1 | 132 | 60 | 34 | | | - ! | 11 | 13 | 19 | 60 | | 36 | 36 | 36 | | | ł | 11 | 19<br> 17 | 23<br> 19 | 45 | 1 | 150 | 4.5 | 38 | | | H | 13 | 23 | 29 | 40<br> 132 | | 72 | 40 | 40 | | | H | 11 | 23 | 27 | 1110 | ٠, | 150 | 4.2 | 42 | | | i | 9 | 23 | 25 | 1 66 | | 150 | 45 | 45 | | | i | ģ | 23 | 29 | 66 | | 150 | 66 | 51 | | | i | ģ | 25 | 29 | 60 | | 70 | 0.5 | 56 | | | i | 23 | 25 | 27 | 220 | 1 | 180 | 1.0 | 57 | | | i | 11 | 35 | 49 | 60 | | 34 | 6.0 | 58 | | | Ì | 11 | 23 | 35 | 110 | | 132 | i sā | 60 | | | - 1 | 17 | 19 | 47 | 72 | 1 | 250 | 1.2 | 61 | | | ļ | 23 | 29 | 35 | 250 | | 84 | 1 4 | 63 | | | ļ | 21 | 23 | 29 | 66 | | 250 | 5 | 66 | | | ļ | 9 | 29 | 47 | 84 | . 1 | 250 | | 67 1 | | | ļ | 23 | 25 | 29 | 220 | ٤. | 70 | J | 70 | | | -! | 17 | 19 | 23 | 72 | | 198 | | 72 | | | - ! | 23<br>11 | 37<br>23 | 59 | 250<br>110 | | 250 | 2.0 | 75 | | | ł | 23 | 29 | 39 | 250 | 1 | 231 | 110 | 82 | | | - | 19 | 23 | 1 29 | 198 | 1 2 | 84 | 84 | 84 | | | i | 23 | 25 | 1 37 | 1 220 | 2 | 250 | 198<br>190 | 89 | | | i | 23 | 33 | 47 | 1110 | ÷ : | 230 | 1 113 | 90 1 | | | i | 11 | 37 | 49 | 180 | 2 | 250 | 180 | 111 | | | i | 27 | 41 | 49 | 180 | 1. | 250 | 126 | 120 | | | i | 31 | 37 | 53 | 180 | 2 | 234 | 130 | 121 | | | ĺ | 23 | 25 | 59 | 180 | ž. | 250 | 1 220 | 173 | | | ĺ | 23 | 47 | 53 | 220 | Ē | 250 | 1 2 30 | 191 | | | ĺ | 29 | 47 | 59 | 250 | i i | 250 | 250 | 228 | | | - | 45 | 47 | 53 | 250 | . 3 | 250 | 2:0 | 238 | | | ļ | 19 | 53 | 59 | 250 | i | 250 | 2∋0 | 250 | | | ļ | 23 | 47 | 59 | 250 | ī. | 250 | 2.0 | 250 | | | f | 25 | 49 | 59 | 250 | ž | 250 | 2 20 | 250 | | Tab. 1 -verage In this paper a mixed approach to fault tolerance for processing arrays was prese due to replication of coputational elements is associated to the adoption if a residue number code for input and output da detect and correct errors reconfiguration, while re bounds the increase of computation delay. In this exploiting the properties without replicating the who Error detecting and corr proposed for linear and arrays: additional circuit allow a fast identification then, an easy reconfi availability of the syste defined and the problem of number system is presented. Finally, single and multiple discussed. An interesting increase of global execuant time when no fault occurs in the architecture the arrays. Otherwise, the the error correction netal to in particular by the adder before the multiplexers nevertheless, a careful hardware design of su degradation. ted: physical redundancy Redundancy allows to nd to generate data for due data representation silicon area and the way it is possible to of modelar redundancy process ag element. ting ar mitectures are bidimensional systolic are also introduced to of fault, elements and, ration and an high The er or models are he choice of the residue error commection is also esult is the negligible ter a re onfiguration of imputation is degraded by circui:s may #### REFERENCES - [1] D.P.Agarwal, "Modulo 2<sup>n</sup>+1 Arithmetic Logic", IEEE J.Electronic Circuits and Systems, vol.2, no.6, Nov.1978 - [2] Alia, Barsi e Martinelli, "A Fast VLSI Conversion Between Binary and Residue Systems", Information Processing Letters, 18, 30, Mar. - [3] A. Avizienis, "Arithmetic Error Codes: Cost and Effectiveness Studies for Application in Digital System Design" IEEE Trans. on Comp., Nov. 1971 - [4] Y.H. Choi et al., "Fault Diagnosis of - Reconfigurable Systolic Arrays", Proc. ICCD 1984 [53] L. Dadda, "Some Scheme for Parallel Parallel Multipliers", Alta Frequenza 34, May 1965 - [6] D. Fussel, P. Varman, "Fault Tolerant Wafer Scale Architectures for VLSI", Proc. 9th Symp. Comp. Architecture, Apr. 1982 - K.H. Huang, J.A. Abraham, "Fault Diagnosis of Reconfigurable Systolic Arrays", IPPC, Aug. 1984 - [8] T. Jayshree, D. Basu, "On Binary Multiplication Using the Quarter Square Algorithm", IEEE Trans. on Comp., Sept. 1976 - [9] W.K.Jenkins, "Recent Advances in Residue Number Techniques for Recursive Digital Filtering", IEEE Trans. ASSP, vol. ASSP-27,no.1, Feb. 1979 - [10] H.T. Kung, C.E. Leiserson, "Systolic Arrays for VLSI", Sparse Matrix Proc. 1978, Society for Industrial and Applied Mathematics, 1979 - [11] H.T. Kung, "Why Systolic Architectures?" - Computer, Vol. 15, No. 1, Jan. 1982 [12] H.C. Lai, S. Muroga, "Minimum Parallel Binary Adders with NOR (NAND) Gates", IEEE Trans. on Computer, Sept. 1979 - [13] F.T. Leighton, C.E. Leiserson, "Wafer Scale Integration of Systolic Arrays", Proc. 23th Ann. Symp. Found. Computer. Sci., 1982 - [14] J.L. Massey, G.N. Garcia, "Error-Correcting Codes in Computer Arithmetic", Advances in Information Systems Science, 1972 - [15] R. Negrini, M. Sami, R. Stefanelli, "Fault Tolerance Techniques for Array Structures Used - in Supercomputing", Computer, Vol. 19, Feb. 1986 [16] V. Piuri et al., "Residue Arithmetic for a Fault Tolerant Multiplier: the Choice of the Best Triple of Bases", Presented at EUROMICRO '86 Symposium, Venice, Italy, Sept. 1986 - [17] D.P. Siewiorek, R.S. Swarz, "The theory and practice of reliable system design", Digital Press, 1982. - [18] R. Stefanelli, M. Annaratone "A Multiplier with Multiple Error Correction Capability", Arith-6, Aarhus Denmark 1983 - [19] N.S.Szabo e R.I.Tanaka, "Residue Arithmetic and Its Application to Computer Technology", McGraw-Hill, New York, 1967 [20] F.J.Taylor, "Residue Arithmetic: a Tutorial with - Examples", IEEE Trans. on Computers, May 1984 - [21] C.S. Wallace, "A Suggestion for a Fast Multiplier", IEEE Trans. on Comp., Feb. 1964 [22] W. Watson, C.W. Hastings, "Self-Checked - [22] W. Watson, C.W. Hastings, "Self-Checked Computation Using Residue Arithmetic", IEEE Proc. 54(12), Dec. 1966