[
    {
        "id": "authors:thcpj-wfb23",
        "collection": "authors",
        "collection_id": "thcpj-wfb23",
        "cite_using_url": "https://resolver.caltech.edu/CaltechAUTHORS:20160901-124814686",
        "type": "book_section",
        "title": "DD1: A QDI, Radiation-Hard-by-Design, Near-Threshold 18uW/MIPS Microcontroller in 40nm Bulk CMOS",
        "book_title": "Asynchronous Circuits and Systems (ASYNC), 2015",
        "author": [
            {
                "family_name": "Keller",
                "given_name": "Sean",
                "clpid": "Keller-S"
            },
            {
                "family_name": "Martin",
                "given_name": "Alain J.",
                "clpid": "Martin-A-J"
            },
            {
                "family_name": "Moore",
                "given_name": "Chris",
                "clpid": "Moore-Chris"
            }
        ],
        "abstract": "This paper describes DD1, an asynchronous radiation-hard 8-bit AVR\u00ae microcontroller (MCU) implemented in TSMC 40LP, a low-power bulk 40nm CMOS process. Designed for extreme reliability, DD1 uses quasi-delay-insensitive (QDI) asynchronous logic and contains full-custom radiation-hard memories and logic cells. The chip was found fully functional on first silicon over a range of operating voltages from near-threshold (500mV) to above the nominal V_DD (1.1V). It qualifies as both ultra-low power (<100\u03bcW/MHz) and radiation-hard by design. At 550mV the MCU operates at 1MIPS with a power consumption of 18\u03bcW/MIPS. At 1.1V it runs at 20MIPS consuming 75\u03bcW/MIPS (1.5mW total). After extensive testing, it was found to be total-dose and latch-up immune and has an upset immunity of 2E-6 SEE/device-day (CREME96 geosynchronous near-earth orbit).",
        "doi": "10.1109/ASYNC.2015.15",
        "isbn": "978-1-4799-8716-0",
        "publisher": "IEEE",
        "place_of_publication": "Piscataway, NJ",
        "publication_date": "2015-05",
        "pages": "37-44"
    },
    {
        "id": "authors:ebmvh-dk943",
        "collection": "authors",
        "collection_id": "ebmvh-dk943",
        "cite_using_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170320-175344479",
        "type": "book_section",
        "title": "Asynchronous logic for high variability nano-CMOS",
        "book_title": "16th IEEE International Conference on Electronics, Circuits and Systems (ICECS 2009)",
        "author": [
            {
                "family_name": "Martin",
                "given_name": "Alain J.",
                "clpid": "Martin-A-J"
            }
        ],
        "abstract": "At the nanoscale level, parameter variations in fabricated devices cause extreme variability in delay. Delay variations are also the main issue in subthreshold operation. Consequently, asynchronous logic seems an ideal, and probably unavoidable choice, for the design of digital circuits in nano CMOS or other emerging technologies. This paper examines the robustness of one particular asynchronous logic: quasi-delay insensitive or QDI. We identify the three components of this logic that can be affected by extreme variability: staticizer, isochronic fork, and rings. We show that staticizers can be eliminated, and isochronic forks and rings can be made arbitrarily robust to timing variations.",
        "doi": "10.1109/ICECS.2009.5410925",
        "isbn": "978-1-4244-5090-9",
        "publisher": "IEEE",
        "place_of_publication": "Piscataway, NJ",
        "publication_date": "2009-12",
        "pages": "69-72"
    },
    {
        "id": "authors:4h10d-82w11",
        "collection": "authors",
        "collection_id": "4h10d-82w11",
        "cite_using_url": "https://resolver.caltech.edu/CaltechAUTHORS:20100506-101621122",
        "type": "book_section",
        "title": "A Necessary and Sufficient Timing Assumption for Speed-Independent Circuits",
        "book_title": "15th International Symposium on Advanced Research in Asynchronous Circuits and Systems : (ASYNC 2009) : proceedings : 17-19 May 2009 Chapel Hill, North Carolina, USA",
        "author": [
            {
                "family_name": "Keller",
                "given_name": "Sean",
                "clpid": "Keller-S"
            },
            {
                "family_name": "Katelman",
                "given_name": "Michael",
                "clpid": "Katelmany-M"
            },
            {
                "family_name": "Martin",
                "given_name": "Alain J.",
                "clpid": "Martin-A-J"
            }
        ],
        "abstract": "This paper presents a proof that the adversary path timing assumption is both necessary and sufficient for correct SI circuit operation. This assumption requires that the delay of a wire on one branch of a fork be less than the delay through a gate sequence beginning at another branch in the same fork. Both the definition of the timing assumption and the proof build on a general, formal notion of computation given with respect to production rule sets. This underlying framework can be used for a variety of proof efforts or as a basis for defining other useful notions involving asynchronous computation.",
        "doi": "10.1109/ASYNC.2009.27",
        "isbn": "9781424439331",
        "publisher": "IEEE Computer Society",
        "place_of_publication": "Los Alamitos, CA",
        "publication_date": "2009-05-29",
        "pages": "65-76"
    },
    {
        "id": "authors:086p7-shn78",
        "collection": "authors",
        "collection_id": "086p7-shn78",
        "cite_using_url": "https://resolver.caltech.edu/CaltechAUTHORS:20100722-151724013",
        "type": "book_section",
        "title": "Asynchronous Nano-Electronics: Preliminary Investigation",
        "book_title": "14th IEEE International Symposium on Asynchronous Circuits and Systems : ASYNC 2008",
        "author": [
            {
                "family_name": "Martin",
                "given_name": "Alain J.",
                "clpid": "Martin-A-J"
            },
            {
                "family_name": "Prakash",
                "given_name": "Piyush",
                "clpid": "Prakash-P"
            }
        ],
        "abstract": "This paper is a preliminary investigation into implementing asynchronous QDI logic in molecular nano-electronics, taking into account the restricted geometry, the lack of control on transistor strengths, and the high timing variations. We show that the main building blocks of QDI logic can be successfully implemented; we illustrate the approach with the layout of an adder stage. The proposed techniques to improve the reliability of QDI apply to nano-CMOS as well.",
        "doi": "10.1109/ASYNC.2008.22",
        "isbn": "978-0-7695-3107-6",
        "publisher": "IEEE Computer Society",
        "place_of_publication": "Los Alamitos, CA",
        "publication_date": "2008-07-09",
        "pages": "58-68"
    },
    {
        "id": "authors:1xvjv-x0y73",
        "collection": "authors",
        "collection_id": "1xvjv-x0y73",
        "cite_using_url": "https://resolver.caltech.edu/CaltechAUTHORS:20110225-095524705",
        "type": "book_section",
        "title": "Slack Matching Quasi Delay-Insensitive Circuits",
        "book_title": "12th IEEE International Symposium on Asynchronous Circuits and Systems",
        "author": [
            {
                "family_name": "Prakash",
                "given_name": "Piyush",
                "clpid": "Prakash-P"
            },
            {
                "family_name": "Martin",
                "given_name": "Alain J.",
                "clpid": "Martin-A-J"
            }
        ],
        "abstract": "Slack matching is an optimization that determines the amount of buffering that must be added to each channel of a slack elastic asynchronous system in order to reduce its cycle time to a specified target. We present two methods of expressing the slack matching problem as a mixed integer linear programming problem. The first method is applicable to systems composed of either full-buffers or half-buffers but not both. The second method is applicable to systems composed of any combination of full-buffers and half-buffers.",
        "doi": "10.1109/ASYNC.2006.27",
        "isbn": "0-7695-2498-2",
        "publisher": "IEEE",
        "place_of_publication": "Los Alamitos, CA",
        "publication_date": "2006",
        "pages": "195-204"
    },
    {
        "id": "authors:xyg8c-f3f93",
        "collection": "authors",
        "collection_id": "xyg8c-f3f93",
        "cite_using_url": "https://resolver.caltech.edu/CaltechAUTHORS:20110722-095429816",
        "type": "book_section",
        "title": "Can asynchronous techniques help the SoC designer?",
        "book_title": "IFIP VLSI-SoC 2006: IFIP TC 10/WG 10.5 International Conference on Very Large Scale Integration & System-on-Chip",
        "author": [
            {
                "family_name": "Martin",
                "given_name": "Alain J.",
                "clpid": "Martin-A-J"
            }
        ],
        "contributor": [
            {
                "family_name": "De",
                "given_name": "G.",
                "clpid": "De-G"
            },
            {
                "family_name": "Reis",
                "given_name": "R.",
                "clpid": "Reis-R"
            },
            {
                "family_name": "Simeu",
                "given_name": "E.",
                "clpid": "Simeu-E"
            }
        ],
        "abstract": "As technological advances make it possible to integrate an entire system on a single die, the designer of a system-on-chip (SoC) is confronted with increasing difficulties concerning complexity, reliability, energy and power consumption, and clock distribution. All those issues are aggravated by increasing parameter variability as a result of the same technological advances. This paper argues that because asynchronous (QDI) circuits are quasi-independent of timing, asynchronous logic alleviates the problems posed by parameter variability, and eliminates the clock distribution problem altogether. Furthermore, as some researchers attempt to turn the liability into an asset by exploiting parameter variability to design truly probabilistic computation, the flexibility and time-independence of asynchronous logic could be a natural match.",
        "doi": "10.1109/VLSISOC.2006.313284",
        "isbn": "978-3-901882-19-7",
        "publisher": "IEEE",
        "place_of_publication": "Piscataway, NJ",
        "publication_date": "2006",
        "pages": "7-11"
    },
    {
        "id": "authors:djnqn-n3y79",
        "collection": "authors",
        "collection_id": "djnqn-n3y79",
        "cite_using_url": "https://resolver.caltech.edu/CaltechAUTHORS:20170109-145144866",
        "type": "book_section",
        "title": "High-level synthesis of asynchronous systems by data-driven decomposition",
        "book_title": "DAC '03 Proceedings of the 40th annual Design Automation Conference",
        "author": [
            {
                "family_name": "Wong",
                "given_name": "Catherine G.",
                "clpid": "Wong-Catherine-G"
            },
            {
                "family_name": "Martin",
                "given_name": "Alain J.",
                "clpid": "Martin-A-J"
            }
        ],
        "contributor": [
            {
                "family_name": "Getreu",
                "given_name": "Ian",
                "clpid": "Getreu-I"
            }
        ],
        "abstract": "We present a method for decomposing a high-level program description of a circuit into a system of concurrent modules that can each be implemented as asynchronous pre-charge half-buffer pipeline stages (the circuits used in the asynchronous R3000 MIPS microprocessor). We apply it to designing the instruction fetch of an asynchronous 8051 microcontroller, with promising results. We discuss new clustering algorithms that will improve the performance figures further.",
        "doi": "10.1145/775832.775962",
        "isbn": "1-58113-688-9",
        "publisher": "ACM",
        "place_of_publication": "New York, NY",
        "publication_date": "2003-06",
        "pages": "508-513"
    },
    {
        "id": "authors:dg8qx-4wh54",
        "collection": "authors",
        "collection_id": "dg8qx-4wh54",
        "cite_using_url": "https://resolver.caltech.edu/CaltechAUTHORS:20161207-170651411",
        "type": "book_section",
        "title": "Transistor sizing of energy-delay-efficient circuits",
        "book_title": "TAU '02 Proceedings of the 8th ACM/IEEE international workshop on Timing issues in the specification and synthesis of digital systems",
        "author": [
            {
                "family_name": "P\u00e9nzes",
                "given_name": "Paul I.",
                "clpid": "P\u00e9nzes-P-I"
            },
            {
                "family_name": "Nystr\u00f6m",
                "given_name": "Mika",
                "clpid": "Nystr\u00f6m-M"
            },
            {
                "family_name": "Martin",
                "given_name": "Alain J.",
                "clpid": "Martin-A-J"
            }
        ],
        "contributor": [
            {
                "family_name": "LaPotin",
                "given_name": "David P.",
                "clpid": "LaPotin-D-P"
            }
        ],
        "abstract": "This paper studies the problem of transistor sizing of CMOS circuits optimized for energy-delay efficiency, i.e., for optimal Et^n where E is the energy consumption and t is the delay of the circuit, while n is a fixed positive optimization index that reflects the chosen trade-off between energy and delay.\n\nWe propose a set of analytical formulas that closely approximate the optimal transistor sizes. We then study an efficient iteration procedure that can further improve the original analytical solution. Based on these results, we introduce a novel transistor sizing algorithm for energy-delay efficiency.",
        "doi": "10.1145/589411.589439",
        "isbn": "1-58113-526-2",
        "publisher": "ACM",
        "place_of_publication": "New York, NY",
        "publication_date": "2002-12",
        "pages": "126-133"
    },
    {
        "id": "authors:462qm-phg17",
        "collection": "authors",
        "collection_id": "462qm-phg17",
        "cite_using_url": "https://resolver.caltech.edu/CaltechAUTHORS:20161207-165804629",
        "type": "book_section",
        "title": "Energy-Delay Efficiency of VLSI Computations",
        "book_title": "GLSVLSI '02 Proceedings of the 12th ACM Great Lakes symposium on VLSI",
        "author": [
            {
                "family_name": "P\u00e9nzes",
                "given_name": "Paul I.",
                "clpid": "P\u00e9nzes-P-I"
            },
            {
                "family_name": "Martin",
                "given_name": "Alain J.",
                "clpid": "Martin-A-J"
            }
        ],
        "contributor": [
            {
                "family_name": "Ghose",
                "given_name": "Kanad",
                "clpid": "Ghose-K"
            },
            {
                "family_name": "Madden",
                "given_name": "Patrick H.",
                "clpid": "Madden-P-H"
            }
        ],
        "abstract": "In this paper we introduce an energy-delay efficiency metric that captures any trade-off between the energy and the delay of the computation.\n\nWe apply this new concept to the parallel and sequential composition of circuits in general and in particular to circuits optimized through transistor sizing. We bound the delay and energy of the optimized circuit and we give necessary and sufficient conditions under which these bounds are reached. We also give necessary and sufficient conditions under which subcomponents of a design can be optimized independently so as to yield global optimum when recomposed.\n\nWe demonstrate the utility of a minimum-energy function to capture high level compositional properties of circuits. The use of this minimum-energy function yields practical insight into ways of improving the overall energy-delay efficiency of circuits.",
        "doi": "10.1145/505306.505330",
        "isbn": "1-58113-462-2",
        "publisher": "ACM",
        "place_of_publication": "New York, NY",
        "publication_date": "2002-04",
        "pages": "104-111"
    },
    {
        "id": "authors:t7b22-swj11",
        "collection": "authors",
        "collection_id": "t7b22-swj11",
        "cite_using_url": "https://resolver.caltech.edu/CaltechAUTHORS:20201210-161233167",
        "type": "book_section",
        "title": "Slack elasticity in concurrent computing",
        "book_title": "Mathematics of Program Construction",
        "author": [
            {
                "family_name": "Manohar",
                "given_name": "Rajit",
                "clpid": "Manohar-Rajit"
            },
            {
                "family_name": "Martin",
                "given_name": "Alain J.",
                "clpid": "Martin-A-J"
            }
        ],
        "contributor": [
            {
                "family_name": "Jeuring",
                "given_name": "Johan",
                "clpid": "Jeuring-Johan"
            }
        ],
        "abstract": "We present conditions under which we can modify the slack of a channel in a distributed computation without changing its behavior. These results can be used to modify the degree of pipelining in an asynchronous system. The generality of the result shows the wide variety of pipelining alternatives presented to the designer of a concurrent system. We give examples of program transformations which can be used in the design of concurrent systems whose correctness depends on the conditions presented.",
        "doi": "10.1007/bfb0054295",
        "isbn": "9783540645917",
        "publisher": "Springer",
        "place_of_publication": "Berlin, Heidelberg",
        "publication_date": "1998",
        "pages": "272-285"
    },
    {
        "id": "authors:h63hs-x5f75",
        "collection": "authors",
        "collection_id": "h63hs-x5f75",
        "cite_using_url": "https://resolver.caltech.edu/CaltechAUTHORS:20201124-174613902",
        "type": "book_section",
        "title": "An action system specification of the Caltech asynchronous microprocessor",
        "book_title": "Mathematics of Program Construction",
        "author": [
            {
                "family_name": "Back",
                "given_name": "R. J. R.",
                "clpid": "Back-R-J-R"
            },
            {
                "family_name": "Martin",
                "given_name": "A. J.",
                "clpid": "Martin-A-J"
            },
            {
                "family_name": "Sere",
                "given_name": "K.",
                "clpid": "Sere-K"
            }
        ],
        "contributor": [
            {
                "family_name": "M\u00f6ller",
                "given_name": "Bernhard",
                "clpid": "M\u00f6ller-B"
            }
        ],
        "abstract": "The action system framework for modelling parallel programs is used to formally specify a microprocessor. First the microprocessor is specified as a sequential program. The sequential specification is then decomposed and refined into a concurrent program using correctness-preserving program transformations. Previously this microprocessor has been specified in a semi-formal manner at Caltech, where an asynchronous circuit for the microprocessor was derived from the specification. We propose a specification strategy that is based on the idea of spatial decomposition of the program variable space. Applying this strategy we give a completely formal derivation of a high level specification for the Caltech microprocessor. We also demonstrate the suitability of action systems and the stepwise refinement paradigm for formal VLSI circuit design.",
        "doi": "10.1007/3-540-60117-1_9",
        "isbn": "9783540601173",
        "publisher": "Springer",
        "place_of_publication": "Berlin, Heidelberg",
        "publication_date": "1995",
        "pages": "159-179"
    },
    {
        "id": "authors:82sn8-s7y18",
        "collection": "authors",
        "collection_id": "82sn8-s7y18",
        "cite_using_url": "https://resolver.caltech.edu/CaltechAUTHORS:20201008-131242613",
        "type": "book_section",
        "title": "Design of Synchronization Algorithms",
        "book_title": "Constructive Methods in Computing Science",
        "author": [
            {
                "family_name": "Martin",
                "given_name": "Alain J.",
                "clpid": "Martin-A-J"
            },
            {
                "family_name": "Van de Snepscheut",
                "given_name": "Jan L. A.",
                "clpid": "Van-de-Snepscheut-J-L-A"
            }
        ],
        "contributor": [
            {
                "family_name": "Broy",
                "given_name": "Manfred",
                "clpid": "Broy-M"
            }
        ],
        "abstract": "In these notes we discuss the design of concurrent programs that consist of a set of communicating sequential processes. The processes communicate via shared variables and synchronize via semaphores. We present an axiomatic definition of semaphores, and prove properties about them. The split binary semaphore is introduced and it is shown how it can be used in constructing the synchronization part of concurrent processes in order to maintain a given synchronization condition.",
        "doi": "10.1007/978-3-642-74884-4_13",
        "isbn": "9783642748868",
        "publisher": "Springer Berlin Heidelberg",
        "place_of_publication": "Berlin, Heidelberg",
        "publication_date": "1989",
        "pages": "447-478"
    },
    {
        "id": "authors:4x77s-atd96",
        "collection": "authors",
        "collection_id": "4x77s-atd96",
        "cite_using_url": "https://resolver.caltech.edu/CaltechAUTHORS:20161130-143104095",
        "type": "book_section",
        "title": "A message-passing model for highly concurrent computation",
        "book_title": "C3P Proceedings of the third conference on Hypercube concurrent computers and applications: Architecture, software, computer systems, and general issues",
        "author": [
            {
                "family_name": "Martin",
                "given_name": "Alain J.",
                "clpid": "Martin-A-J"
            }
        ],
        "contributor": [
            {
                "family_name": "Fox",
                "given_name": "Geoffrey",
                "clpid": "Fox-Geoffrey"
            }
        ],
        "abstract": "[no abstract]",
        "doi": "10.1145/62297.62360",
        "isbn": "0-89791-278-0",
        "publisher": "ACM",
        "place_of_publication": "New York, NY",
        "publication_date": "1988-01",
        "pages": "520-527"
    },
    {
        "id": "authors:6e783-84v22",
        "collection": "authors",
        "collection_id": "6e783-84v22",
        "cite_using_url": "https://resolver.caltech.edu/CaltechAUTHORS:20161215-172443490",
        "type": "book_section",
        "title": "The architecture and programming of the Ametek series 2010 multicomputer",
        "book_title": "Proceedings of the third conference on Hypercube concurrent computers and applications: Architecture, software, computer systems, and general issues",
        "author": [
            {
                "family_name": "Seitz",
                "given_name": "Charles L.",
                "clpid": "Seitz-C-L"
            },
            {
                "family_name": "Athas",
                "given_name": "William C.",
                "clpid": "Athas-W-C"
            },
            {
                "family_name": "Flaig",
                "given_name": "Charles M.",
                "clpid": "Flaig-C-M"
            },
            {
                "family_name": "Martin",
                "given_name": "Alain J.",
                "clpid": "Martin-A-J"
            },
            {
                "family_name": "Seizovic",
                "given_name": "Jakov",
                "clpid": "Seizovic-J"
            },
            {
                "family_name": "Steele",
                "given_name": "Craig S.",
                "clpid": "Steele-C-S"
            },
            {
                "family_name": "Su",
                "given_name": "Wen-King",
                "clpid": "Su-Wen-King"
            }
        ],
        "contributor": [
            {
                "family_name": "Fox",
                "given_name": "Geoffrey",
                "clpid": "Fox-Geoffrey"
            }
        ],
        "abstract": "During the period following the completion of the Cosmic Cube experiment [1], and while commercial descendants of this first-generation multicomputer (message-passing concurrent computer) were spreading through a community that includes many of the attendees of this conference, members of our research group were developing a set of ideas about the physical design and programming for the second generation of medium-grain multicomputers.\n\nOur principal goal was to improve by as much as two orders of magnitude the relationship between message-passing and computing performance, and also to make the topology of the message-passing network practically invisible. Decreasing the communication latency relative to instruction execution times extends the application span of multicomputers from easily partitioned and distributed problems (e.g., matrix computations, PDE solvers, finite element analysis, finite difference methods, distant or local field many-body problems, FFTs, ray tracing, distributed simulation of systems composed of loosely coupled physical processes) to computing problems characterized by \"high flux\" [2] or relatively fine-grain concurrent formulations [3, 4] (e.g., searching, sorting, concurrent data structures, graph problems, signal processing, image processing, and distributed simulation of systems composed of many tightly coupled physical processes). Such applications place heavy demands on the message-passing network for high bandwidth, low latency, and non-local communication. Decreased message latency also improves the efficiency of the class of applications that have been developed on first-generation systems, and the insensitivity of message latency to process placement simplifies the concurrent formulation of application programs.\n\nOur other goals included a streamlined and easily layered set of message primitives, a node operating system based on a reactive programming model, open interfaces for accelerators and peripheral devices, and node performance improvements that could be achieved economically by using the same technology employed in contemporary workstation computers.\n\nBy the autumn of 1986, these ideas had become sufficiently developed, molded together, and tested through simulation to be regarded as a complete architectural design. We were fortunate that the Ametek Computer Research Division was ready and willing to work with us to develop this system as a commercial product. The Ametek Series 2010 multicomputer is the result of this joint effort.",
        "doi": "10.1145/62297.62302",
        "isbn": "0-89791-278-0",
        "publisher": "ACM",
        "place_of_publication": "New York, NY",
        "publication_date": "1988-01",
        "pages": "33-37"
    }
]