A hard error is an error that damages a circuit irrevocably; a soft error flips the logic states without causing any physical damage to the circuit, resulting in transient corruption of data. They result in transient, inconsistent corruption of data.

The soft-error tolerance of logic circuits is recently getting more attention, since the soft- error rate of advanced CMOS devices is higher than before. As a response to the concern on soft errors, we propose a new method for making asynchronous circuits tolerant to soft errors. Since it relies on a property unique to asynchronous circuits, the method is different from what is done in synchronous circuits with triple modular redundancy. Asynchronous circuits have been attractive to the designers of reliable systems, because of their clock-less design, which makes them more robust to variations on computation time of modules. The quasi delay-insensitive (QDI) design style is one of the most robust asynchronous design styles for general computation; it makes one minimal assumption on delays in gates and wires. QDI circuits are easy to verify, simple, and modular, because the correct operation of a QDI circuit is independent of delays in gates and wires.

Here, we shall overview how to design a QDI circuit, and what will happen if a soft error occurs on a QDI circuit. Then the crucial components of the method are shown: (1) a special kind of duplication for random logic (when each bit has to be corrected individually), (2) special protection circuitry for arbiter and synchronizer (as needed for example for external interrupts), (3) reconfigurable circuits using a special configuration unit, and (4) error correcting for memory arrays and other structures in which the data bits can be self- corrected. The solution of protecting random logic is compared with alternatives, which use other types of error correcting codes (e.g., parity code) in a QDI circuit. It turns out that the duplication generates efficient circuits more commonly than other possible constructions. Finally, the design of a soft-error tolerant asynchronous microprocessor is detailed and testing results of the soft-error tolerance of the microprocessor are shown.

}, address = {1200 East California Boulevard, Pasadena, California 91125}, advisor = {Martin, Alain J.}, } @phdthesis{10.7907/9HMY-RR92, author = {Prakash, Piyush}, title = {Throughput Optimization of Quasi Delay Insensitive Circuits via Slack Matching}, school = {California Institute of Technology}, year = {2008}, doi = {10.7907/9HMY-RR92}, url = {https://resolver.caltech.edu/CaltechETD:etd-05262008-234258}, abstract = {Though the logical correctness of an asynchronous circuit is independent of implementation delays, the cycle time of an asynchronous circuit is of great importance to the designer. Oftentimes, the insertion of buffers to such circuits reduces the cycle time of the circuit without affecting the logical correctness of the circuit. This optimization is called slack matching. In this thesis the slack matching problem is formulated. I show that this problem is NP-complete via a reduction from subset sum. I describe two methods for expressing slack matching as a mixed integer linear program(MILP). The first method is applicable to any QDI circuit, while the second method produces a smaller MILP for circuits comprised solely of half buffers. These two formulations of slack matching were applied to the design of a fetch loop in an asynchronous micro-controller. Slack matching reduced the cycle time of the circuit by a factor of 3. For a circuit composed of 14 byte wide processes and a 8k instruction memory, 30s were required to generate the first MILP. It was solved in 2s. When the memory is modeled as a pipeline of half buffers, the second MILP could be formulated in 0.1s and solved in 0.6s. This MILP had half the number of integer variables as the first formulation.}, address = {1200 East California Boulevard, Pasadena, California 91125}, advisor = {Martin, Alain J.}, } @phdthesis{10.7907/4R8F-WF03, author = {Papadantonakis, Karl Spyros}, title = {Rigorous Analog Verification of Asynchronous Circuits}, school = {California Institute of Technology}, year = {2006}, doi = {10.7907/4R8F-WF03}, url = {https://resolver.caltech.edu/CaltechETD:etd-01132006-152609}, abstract = {This thesis shows that rigorous verification of some analog implementation of any Quasi-Delay-Insensitive (QDI) asynchronous circuit is possible. That is, we show that in an accurate analog model, any behavior will adhere to the digital computation specifications under any possible noise and environment timing. Unlike a traditional simulation, we can analyze all of the infinitely many possible analog behaviors, in a time linear in the circuit size. A problem that arises in asynchronous circuit design is that the analog implementations of digital computations do not in general exhibit all properties demanded by the digital model assumed in circuit construction. For example, the digital model is atomic, in a sense we define. By contrast, analog models are non-atomic, and, as a result, we can give examples of real circuits with operational failures. There exist other attributes of analog models which can cause failures, and no complete classification exists. Ultimately there is only one way to solve this problem: we must show that all possible analog behaviors obey the atomic model. We focus on CMOS implementations, and the associated accepted bulk-scale model. Given any canonically-generated implementation of a general computation, we can rigorously verify it. The only exception to this rule is that restoring delay elements must be inserted into some implementations (fortunately, this change has no semantic effect on QDI circuits, by definition). Our theorem guarantees that when any possible analog behavior is properly observed, we obtain a valid, atomic digital execution. Several rigorous verifications have been produced, including one for an asynchronous pipeline circuit with dual-rail data.}, address = {1200 East California Boulevard, Pasadena, California 91125}, advisor = {Martin, Alain J.}, } @mastersthesis{10.7907/g43p-hv51, author = {Prakash, Piyush}, title = {Slack Matching}, school = {California Institute of Technology}, year = {2005}, doi = {10.7907/g43p-hv51}, url = {https://resolver.caltech.edu/CaltechETD:etd-05272005-134017}, abstract = {In this thesis we present a method for slack matching asynchronous circuits, described as a collection of handshaking expansions. We present an execution model for a restricted class of HSE. We define the number of messages that a process contains. The static slack, dynamic slack and dynamic threshold are defined. We state sufficient conditions under which the dynamic slack of a pipeline of half-buffers is the sum of that of the processes comprising the pipeline. The slack matching problem is formulated as that of ensuring that all pipelines and rings in a system can simultaneously contain a number of messages that is no less than the dynamic threshold but no greater than the dynamic slack. We describe an algorithm to formulate the slack matching problem as a mixed integer linear program.}, address = {1200 East California Boulevard, Pasadena, California 91125}, advisor = {Martin, Alain J.}, } @phdthesis{10.7907/5N2N-0W58, author = {Wong, Catherine Grace}, title = {High-Level Synthesis and Rapid Prototyping of Asynchronous VLSI Systems}, school = {California Institute of Technology}, year = {2004}, doi = {10.7907/5N2N-0W58}, url = {https://resolver.caltech.edu/CaltechTHESIS:11192009-161338958}, abstract = {

This thesis introduces data-driven decomposition (DDD), a new method for the high-level synthesis of asynchronous VLSI systems and the first method to target high-performance asynchronous circuits. Given a sequential description of circuit behavior, DDD produces an equivalent network of communicating processes that can each be directly implemented as fine-grained asynchronous pipeline stages. Control and datapath are integrated within each pipeline stage of the final system.

We present many aspects of the synthesis of asynchronous VLSI systems, including general circuit templates that DDD uses to estimate low-level performance and energy metrics while optimizing the concurrent system. We also introduce a new circuit model and new techniques for slack matching, a performance optimization that inserts pipelining into a system to modify asynchronous handshake dynamics and increase throughput. The entire method is then applied to a complex control unit from an asynchronous 8051 microcontroller, as an example.

This thesis also introduces a new architecture for an asynchronous field-programmable gate array (FPGA). The architecture is cluster-based and, unlike most FPGA designs, contains an entirely delay-insensitive interconnect. The basic reconfigurable cells of this FPGA fit the asynchronous pipeline-stage circuit-template used by DDD, and the reconfigurable clusters include circuitry that implements features assumed by an optimization phase of DDD, which reduces the energy consumption of the system.

}, address = {1200 East California Boulevard, Pasadena, California 91125}, advisor = {Martin, Alain J.}, } @phdthesis{10.7907/9GXT-BD03, author = {Ginis, Roman}, title = {Automating Resource Management for Distributed Business Processes}, school = {California Institute of Technology}, year = {2002}, doi = {10.7907/9GXT-BD03}, url = {https://resolver.caltech.edu/CaltechETD:etd-11012005-093745}, abstract = {A distributed business process is a set of related activities performed by independent resources offering services for lease. For instance, constructing an office building involves hundreds of activities such as excavating, plumbing and carpentry performed by machines and subcontractors, whose activities are related in time, space, cost and other dimensions. In the last decade Internet-based middleware has linked consumers with resources and services enabling the consumers to more efficiently locate, select and reserve the resources for use in business processes. This recent capability creates an opportunity for a new automation of resource management that can assign the optimal resources to the activities of a business process to maximize its utility to the consumer and yield substantial gains in operational efficiency. This thesis explores two basic problems towards automating the management of distributed business processes: 1. How to choose the best resources for the activities of a process (the Activity Resource Assignment - ARA - optimization problem); and 2. How to reserve the resources chosen for a process as an atomic operation when time has value, i.e., commit all resources or no resources (the Distributed Service Commit problem - DSC). I believe these will become the typical optimization and agreement problems between consumers and producers in a networked service economy. I propose a solution to the ARA optimization problem by modeling it as a special type of Integer Programming and I give a method for solving it efficiently for a large class of practical cases. Given a problem instance the method extracts the structure of the problem and using a new concept of variable independence recursively simplifies it while retaining at least one optimal solution. The reduction operation is guided by a novel procedure that makes use of the recent advances in tree-decomposition of graphs from the graph complexity theory. The solution to the DSC problem is an algorithm based on financial instruments and the two-phase commit protocol adapted for services. The method achieves an economically sensible atomic reservation agreement between multiple distributed resources and consumers in a free market environment. I expect the automation of resource management addressed in my thesis and elsewhere will pave the way for more efficient business operations in the networked economy.}, address = {1200 East California Boulevard, Pasadena, California 91125}, advisor = {Chandy, K. Mani}, } @phdthesis{10.7907/9jpj-5s67, author = {Pénzes, Paul Ivan}, title = {Energy-Delay Complexity of Asynchronous Circuits}, school = {California Institute of Technology}, year = {2002}, doi = {10.7907/9jpj-5s67}, url = {https://resolver.caltech.edu/CaltechTHESIS:03022011-131111881}, abstract = {In this thesis, a circuit-level theory of energy-delay complexity is developed for asynchronous circuits. The energy-delay efficiency of a circuit is characterized using the metric Et^n , where E is the energy consumed by the computation, t is the delay of the computation, and n is a positive number that reflects a chosen trade-off between energy and delay. Based on theoretical and experimental evidence, it is argued that for a circuit optimized for minimal Et^n, the consumed energy is independent, in first approximation, of the types of gates (NAND, NOR, etc.) used by the circuit and is solely dependent on n and the total amount of wiring capacitance switched during computation. Conversely, the circuit speed is independent, in first approximation, of the wiring capacitance and depends only on n and the types of gates used. The complexity model allows us to compare the energy-delay efficiency of two circuits implementing the same computation. On the other hand, the complexity model itself does not say much about the actual transistor sizes that achieve the optimum. For this reason, the problem of transistor sizing of circuits optimized for Et^n is investigated, as well. A set of analytical formulas that closely approximate the optimal transistor sizes are explored. An efficient iteration procedure that can further improve the original analytical solution is then studied. Based on these results, a novel transistor-sizing algorithm for energy-delay efficiency is introduced. It is shown that the Et^n metric for the energy-delay efficiency index n ≥ 0 characterizes any optimal trade-off between the energy and the delay of a computation. For example, any problem of minimizing the energy of a system for a given target delay can be restated as minimizing Et^n for a certain n. The notion of minimum-energy function is developed and applied to the parallel and sequential composition of circuits in general and, in particular, to circuits optimized through transistor sizing and voltage scaling. Bounds on the energy and delay of the optimized circuits are computed, and necessary and sufficient conditions are given under which these bounds are reached. Necessary and sufficient conditions are also given under which components of a design can be optimized independently so as to yield a global optimum when composed. Through these applications, the utility of the minimum-energy function is demonstrated. The use of this minimum-energy function yields practical insight into ways of improving the overall energy-delay efficiency of circuits.

}, address = {1200 East California Boulevard, Pasadena, California 91125}, advisor = {Martin, Alain J.}, } @mastersthesis{10.7907/PCCK-CS43, author = {Papadantonakis, Karl Spyros}, title = {What is “Deterministic CHP”, and is “Slack Elasticity” That Useful?}, school = {California Institute of Technology}, year = {2002}, doi = {10.7907/PCCK-CS43}, url = {https://resolver.caltech.edu/CaltechETD:etd-08222002-122806}, abstract = {This paper addresses the issue of slack elasticity in distributed computation, as defined by the Caltech Asynchronous VLSI group. We show with a counterexample that slack elasticity is not sufficient for process decomposition. We give criteria which imply slack elasticity and which are sufficient for several forms of process decomposition, and present a hierarchy of determinism.}, address = {1200 East California Boulevard, Pasadena, California 91125}, advisor = {Martin, Alain J.}, } @phdthesis{10.7907/B107-MW15, author = {Nyström, Mika}, title = {Asynchronous Pulse Logic}, school = {California Institute of Technology}, year = {2001}, doi = {10.7907/B107-MW15}, url = {https://resolver.caltech.edu/CaltechTHESIS:10152010-145548970}, abstract = {This thesis explores a new way of computing with CMOS digital circuits, single-track—handshake asynchronous pulse-logic (STAPL). These circuits are similar to quasi delay-insensitive (QDI) circuits, but the normal four-phase QDI handshake is replaced with a simpler two-phase pulsed handshake. While a delay-insensitive two-phase handshake requires complicated decoding circuits, the pulsed handshake maintains the simpler, electrically beneficial signaling senses of four-phase handshaking by using timing assumptions that are easy to meet.

We cover many aspects of designing moderately large digital systems out of STAPL circuits, from the communicating-process level to the production-rule and transistor level.

We study the theory of operation of pulsed asynchronous circuits, starting with simple pulse repeaters; hence we progress to a general theory of operation for pulsed asynchronous circuits. This theory is a generalization of the theory of operation of synchronous digital circuits.

We then develop the family of STAPL circuits. This is a complete family of dataflow processes: the presented circuits can compute unconditionally as well as conditionally; they can also store state and arbitrate.

Next, we present some aspects of automatic design-tools for compiling from a higher-level description to STAPL circuits. Many of these aspects apply equally well to tools for QDI circuits; in particular, we study boolean-simplification operations that may be used for improving the performance of slack-elastic asynchronous systems.

Finally, a simple 32-bit microprocessor is presented as a demonstration that the circuits and design methods work as described. Comparisons are made, mainly with QDI asynchronous design-styles: SPICE simulations in 0.6-µm CMOS suggest that a system built out of automatically compiled STAPL circuits performs at about three times higher throughput (650-700 MHz in 0.6-µm CMOS) compared with a similar system built out of carefully hand-compiled QDI circuits; the STAPL system uses about twice the energy per operation and twice the area; in other words, the STAPL system improves on the QDI system by four to five times as measured by the Et^2 and At^2 metrics.

}, address = {1200 East California Boulevard, Pasadena, California 91125}, advisor = {Martin, Alain J.}, } @phdthesis{10.7907/xzwa-p598, author = {Manohar, Rajit}, title = {The impact of asynchrony on computer architecture}, school = {California Institute of Technology}, year = {1999}, doi = {10.7907/xzwa-p598}, url = {https://resolver.caltech.edu/CaltechETD:etd-08112005-114144}, abstract = {The performance characteristics of asynchronous circuits are quite different from those of their synchronous counterparts. As a result, the best asynchronous design of a particular system does not necessarily correspond to the best synchronous design, even at the algorithmic level. The goal of this thesis is to examine certain aspects of computer architecture and design in the context of an asynchronous VLSI implementation.

We present necessary and sufficient conditions under which the degree of pipelining of a component can be modified without affecting the correctness of an asynchronous computation.

As an instance of the improvements possible using an asynchronous architecture, we present circuits to solve the prefix problem with average-case behavior better than that possible by any synchronous solution in the case when the prefix operator has a right zero. We show that our circuit implementations are area-optimal given their performance characteristics, and have the best possible average-case latency.

At the level of processor design, we present a mechanism for the implementation of precise exceptions in asynchronous processors. The novel feature of this mechanism is that it permits the presence of a data-dependent number of instructions in the execution pipeline of the processor.

Finally, at the level of processor architecture, we present the architecture of a processor with an independent instruction stream for branches. The instruction set permits loops and function calls to be executed with minimal control-flow overhead.}, address = {1200 East California Boulevard, Pasadena, California 91125}, advisor = {Martin, Alain J.}, } @phdthesis{10.7907/ehzs-y537, author = {Lee, Tak Kwan}, title = {A General Approach to Performance Analysis and Optimization of Asynchronous Circuits}, school = {California Institute of Technology}, year = {1995}, doi = {10.7907/ehzs-y537}, url = {https://resolver.caltech.edu/CaltechETD:etd-10172007-090528}, abstract = {A systematic approach for evaluating and optimizing the performance of asynchronous VLSI circuits is presented. Index-priority simulation is introduced to efficiently find minimal cycles in the state graph of a given circuit. These minimal cycles are used to determine the causality relationships between all signal transitions in the circuit. Once these relationships are known, the circuit is then modeled as an extended event-rule system, which can be used to describe many circuits, including ones that are inherently disjunctive. An accurate indication of the performance of the circuit is obtained by analytically computing the period of the corresponding extended event-rule system.

}, address = {1200 East California Boulevard, Pasadena, California 91125}, advisor = {Martin, Alain J.}, } @phdthesis{10.7907/PP38-4935, author = {Tierno, Jose Andres}, title = {An energy-complexity model for VLSI computations}, school = {California Institute of Technology}, year = {1995}, doi = {10.7907/PP38-4935}, url = {https://resolver.caltech.edu/CaltechETD:etd-10252007-094408}, abstract = {An energy complexity model for CSP programs to be implemented in CMOS VLSI is developed. This model predicts with some accuracy the energy dissipation of the “standard” asynchronous VLSI implementation of a CSP program, associated to a given trace of that program. This energy complexity is used in the analysis of CSP programs, in order to optimize this high level representation of asynchronous circuits for energy efficiency. A lower bound to the energy complexity of a CSP program is derived, based on the information theoretical entropy per symbol of the input/output behavior of the CSP program. This lower bound abstracts the specification of the circuit (that is, its input/output behavior), from the implementation of the specification (that is, the text of the program), and therefore applies to any program that meets the specification. A number of techniques are presented to write programs of low energy complexity, and are applied to several examples. To link the high level representation of circuits to the CMOS representation, several circuits are analyzed to provide standard translations for basic CSP operators into CMOS. In particular, a method for pipelining bus transfers using the sense-amplifier of the bus as a register is proposed.

}, address = {1200 East California Boulevard, Pasadena, California 91125}, advisor = {Martin, Alain J.}, } @phdthesis{10.7907/SR5V-KT18, author = {Van der Goot, Marcel Rene}, title = {Semantics of VLSI synthesis}, school = {California Institute of Technology}, year = {1995}, doi = {10.7907/SR5V-KT18}, url = {https://resolver.caltech.edu/CaltechETD:etd-10162007-093427}, abstract = {We develop a new form of formal operational semantics, suitable for concurrent programming languages. The semantics directly supports sequential and parallel composition, rendezvous synchronization, shared variables, and non-determinism. Based on an abstract notion of program execution, a refinement relation is defined. We show how the refinement relation can be used to prove that one program implements another. We use the operational semantics as a semantic framework for a synthesis method for asynchronous VLSI circuits. We define the semantics of the programming notations that are used, and use the refinement relation to prove the correctness of the program transformations that form the basis of the synthesis method. Among other transformations, we proof the correctness of the replacement of atomic synchronization actions by handshake protocols, and the transformation of a sequence of actions into a network of concurrently executing gates.

}, address = {1200 East California Boulevard, Pasadena, California 91125}, advisor = {Martin, Alain J.}, } @phdthesis{10.7907/0d7v-9d09, author = {Hazewindus, Pieter Johannes}, title = {Testing delay-insensitive circuits}, school = {California Institute of Technology}, year = {1992}, doi = {10.7907/0d7v-9d09}, url = {https://resolver.caltech.edu/CaltechETD:etd-07202007-132706}, abstract = {A method is developed to test delay-insensitive circuits, using the single stuck-at fault model. These circuits are synthesized from a high-level specification. Since the circuits are hazard-free by construction, there is no test for hazards in the circuit. Most faults cause the circuit to halt during test, since they cause an acknowledgement not to occur when it should. There are stuck-at faults that do not cause the circuit to halt under any condition. These are stimulating faults; they cause a premature firing of a production rule. For such a stimulating fault to be testable, the premature firing has to be propagated to a primary output. If this is not guaranteed to occur, then one or more test points have to be added to the circuit. Any stuck-at fault is testable, with the possible addition of test points. For combinational delay-insensitive circuits, finding test vectors is reduced to the same problem as for synchronous combinational logic. For sequential circuits, the synthesis method is used to find a test for each fault efficiently, to find the location of the test points, and to find a test that detects all faults in a circuit. The number of test points needed to fully test the circuit is very low, and the size of the additional testing circuitry is small. A test derived with a simple transformation of the handshaking expansion yields high fault coverage. Adding tests for the remaining faults results in a small complete test for the circuit.}, address = {1200 East California Boulevard, Pasadena, California 91125}, advisor = {Martin, Alain J.}, } @phdthesis{10.7907/kez1-7q52, author = {Burns, Steven Morgan}, title = {Performance analysis and optimization of asynchronous circuits}, school = {California Institute of Technology}, year = {1991}, doi = {10.7907/kez1-7q52}, url = {https://resolver.caltech.edu/CaltechETD:etd-07092007-072640}, abstract = {Analytical techniques are developed to determine the performance of asynchronous digital circuits. These techniques can be used to guide the designer during the synthesis of such a circuit, leading to a high-performance, efficient implementation. Optimization techniques are also developed that further improve this implementation by determining the optimal sizes of the low-level devices (CMOS transistors) that compose the circuit.

}, address = {1200 East California Boulevard, Pasadena, California 91125}, advisor = {Martin, Alain J.}, } @mastersthesis{10.7907/01bb-3j05, author = {Burch, Jerry R.}, title = {A Comparison of Strict and Non-Strict Semantics for Lists}, school = {California Institute of Technology}, year = {1988}, doi = {10.7907/01bb-3j05}, url = {https://resolver.caltech.edu/CaltechTHESIS:03262012-113851465}, abstract = {[Introduction] Implementations of functional programming languages can be classified according to whether they apply eager-evaluation or lazy-evaluation. Eager-evaluation gives rise to strict semantics while lazy-evaluation gives rise to non-strict semantics. In this paper we define the syntax of a simple functional programming language, and specify strict and non-strict denotational semantics for that language. These semantics are specified by giving axioms for the domains and semantic functions involved. The axioms for the two different semantics are very similar, differing only in the specification of cons. However, this small difference results in the domains for the two semantics being quite different. Giving axioms, rather than just postulating particular domains and semantic functions, makes more explicit the similarities of the strict and the non-strict semantics. We give a model of the axioms of the nonstrict semantics in order to show their consistency, and show that any two such models are isomorphic.}, address = {1200 East California Boulevard, Pasadena, California 91125}, advisor = {Martin, Alain J.}, } @phdthesis{10.7907/2ngs-bp80, author = {Li, Peyyun Peggy}, title = {A Parallel Execution Model for Logic Programming}, school = {California Institute of Technology}, year = {1986}, doi = {10.7907/2ngs-bp80}, url = {https://resolver.caltech.edu/CaltechETD:etd-03192008-143903}, abstract = {The Sync Model, a parallel execution method for logic programming, is proposed. The Sync Model is a multiple-solution data-driven model that realizes AND-parallelism and OR-parallelism in a logic program assuming a message-passing multiprocessor system. AND parallelism is implemented by constructing a dynamic data flow graph of the literals in the clause body with an ordering algorithm. OR parallelism is achieved by adding special Synchronization signals to the stream of partial solutions and synchronizing the multiple streams with a merge algorithm.

The Sync Model is proved to be sound and complete. Soundness means it only generates correct solutions and completeness means it generates all the correct solutions. The soundness and completeness of the Sync Model are implied by the correctness of the merge algorithm.

A new class of interconnection networks, the Sneptree, is also presented. The Sneptree is an augmented complete binary tree which can simulate an unbounded complete binary tree optimally. Amongst different connection patterns of the Sneptree, some are regular and extensible so as to be well suited for VLSI implementation. A recursive method is presented to generate the H-structure layout of one type of the Sneptree, called the Cyclic Sneptree. A message routing algorithm between any two leaf nodes of the Cyclic Sneptree is also presented. The routing algorithm, which is of O(n) complexity, gives a good approximation to the shortest path.

The Sneptree is an ideal architecture for the Sync model, in which a dynamic process tree is constructed. With a simple mapping algorithm, the Sync Model can be mapped onto the Sneptree with highly-balanced load and low overhead.

}, address = {1200 East California Boulevard, Pasadena, California 91125}, advisor = {Martin, Alain J.}, }