Combined Feed
https://feeds.library.caltech.edu/people/Johnsson-L/combined.rss
A Caltech Library Repository Feed
Sat, 13 Apr 2024 01:24:32 +0000

Towards a Formal Treatment of VLSI Arrays
https://resolver.caltech.edu/CaltechCSTR:1981.4191-tr-81
Authors: Lennart Johnsson, Uri Weiser, Danny Cohen, Alan L. Davis
Year: 1981
DOI: 10.7907/gn21a-t6x26
This paper presents a formalism for describing the behavior of computational networks at the
algorithmic level. It establishes a direct correspondence between the mathematical expressions
defining a function and the computational networks which compute that function. By formally
manipulating the symbolic expressions that define a function, it is possible to obtain different
networks that compute the function. From this mathematical description of a network, one can
directly determine certain important characteristics of computational networks, such as
computational rate, performance and communication requirements. The use of this formalism for
design and verification is demonstrated on computational networks for Finite Impulse Response (FIR)
filters, matrix operations, and the Discrete Fourier Transform (DFT).
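As a hedged illustration of a computational network for an FIR filter, $y_i = \sum_k w_k x_{i-k}$, the sketch below simulates one common linear-array arrangement (a transposed-form filter). It is not derived with the paper's formalism. Each of the $K$ cells holds one weight $w_k$. On every clock the current sample is broadcast to all cells, and partial sums move one cell toward the output. The function names are hypothetical, and the array's output is checked against the defining sum.

```python
def fir_direct(w, x):
    """Reference: y[i] = sum_k w[k] * x[i - k], treating x[n] = 0 for n < 0."""
    return [sum(w[k] * x[i - k] for k in range(len(w)) if i - k >= 0)
            for i in range(len(x))]


def fir_linear_array(w, x):
    """Simulate a linear array of len(w) cells, one weight per cell.

    On each clock the current sample is broadcast to every cell; cell j adds
    w[j] * sample to the partial sum it received from cell j + 1 on the
    previous clock, and cell 0 emits one finished output per clock.
    """
    k = len(w)
    regs = [0] * k  # regs[j]: partial sum held by cell j after the last clock
    out = []
    for sample in x:
        # The new register values are computed from the old ones in one step.
        regs = [w[j] * sample + (regs[j + 1] if j + 1 < k else 0)
                for j in range(k)]
        out.append(regs[0])
    return out


w = [1, 2, 3]
x = [1, 0, 0, 0, 4, 5]
print(fir_linear_array(w, x))  # [1, 2, 3, 0, 4, 13]
assert fir_linear_array(w, x) == fir_direct(w, x)
```

Broadcasting the sample is the simplest choice here. Arrays that avoid broadcasting instead pipeline the samples through the cells.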
The progression of computations can often be modeled by wave fronts in an illuminating way. The
formalism supports this model. A computational network can be viewed in an abstract form that can
be represented as a graph. The duality between the graph representation and the mathematical
expressions is briefly introduced.
https://authors.library.caltech.edu
https://authors.library.caltech.edu/records/gn21a-t6x26

Computational Arrays for the Discrete Fourier Transform
https://resolver.caltech.edu/CaltechCSTR:1981.4168-tr-81
Authors: Lennart Johnsson, Danny Cohen
Year: 1981
DOI: 10.7907/n91j8-85m21
A mathematical approach towards the development of computational arrays for
the Discrete Fourier Transform (DFT) is pursued in this paper. Mathematical expressions
for the DFT are given a direct hardware interpretation. Different implementations are
developed by formal manipulation of the equations defining the DFT. Properties of the
implementations can be read directly from the corresponding equations. Special
consideration is given to the performance of implementations and corresponding hardware
requirements. If the standard equations defining the DFT on $N$ values are given a direct
hardware interpretation, they correspond to an implementation requiring $N^2$ modules.
By formal manipulation of the equations defining the DFT, we develop implementations
requiring $N$ and $\log_2 N$ modules, respectively.
https://authors.library.caltech.edu/records/n91j8-85m21

VLSI algorithms for Doolittle's, Crout's, and Cholesky's methods
https://resolver.caltech.edu/CaltechAUTHORS:20120420-105611583
Authors: Lennart Johnsson
Year: 1982
DOI: 10.7907/4aq2m-bnw32
To take full advantage of the emerging
VLSI technology, it is necessary to recognize its
limited communication capability and to structure
algorithms accordingly. In this paper, concurrent
algorithms for the methods of Crout, Doolittle, and
Cholesky are described and compared with
concurrent algorithms for Gauss', Givens', and
Householder's methods. The effect of pipelining the
computations in two-dimensional arrays is given
special attention.
https://authors.library.caltech.edu/records/4aq2m-bnw32

Pipelined linear equation solvers and VLSI
https://resolver.caltech.edu/CaltechAUTHORS:20120419-115610622
Authors: Lennart Johnsson
Year: 1982
DOI: 10.7907/en871-srv73
Many of the commonly used methods for solving linear systems of equations on sequential machines can be given a concurrent formulation. The concurrent algorithms exploit the independence of operations to reduce the time complexity of the methods. During the computations specified by an algorithm, data has to be routed to the various places of computation. Pipelining
can be used to avoid broadcasting in VLSI arrays for computation. In general, pipelining allows a reduced cycle time but may force data to be spread out in
time, as is the case for Gaussian elimination. The required spacing depends on the pipelining and the data flow.
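The independence of operations that a concurrent formulation exploits can be seen in an ordinary Gaussian elimination. The following is a minimal sequential sketch, not the paper's array algorithm. It omits pivoting (so it assumes every pivot is nonzero) and uses exact `Fraction` arithmetic only so the example is easy to check. The comments mark which updates in an elimination step are independent of one another; the pivot row and the multipliers are the data that a concurrent version would have to broadcast, or pipeline, to the cells performing those updates.

```python
from fractions import Fraction


def gaussian_elimination(a, b):
    """Solve a x = b by forward elimination and back substitution (no pivoting)."""
    n = len(a)
    a = [[Fraction(v) for v in row] for row in a]
    b = [Fraction(v) for v in b]
    for k in range(n):
        pivot = a[k][k]  # assumed nonzero: no partial pivoting in this sketch
        # Within step k, every iteration of the i-loop (and of the j-loop inside it)
        # is independent of the others: each needs only pivot row k and its own row.
        for i in range(k + 1, n):
            m = a[i][k] / pivot  # multiplier for row i
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the resulting upper triangular system.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


print(gaussian_elimination([[2, 1, 1], [4, -6, 0], [-2, 7, 2]], [5, -2, 9]))
# [Fraction(1, 1), Fraction(1, 1), Fraction(2, 1)]
```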
In the paper, concurrent algorithms and their pipelining are discussed for Gaussian elimination, Householder transformations, and Givens rotations. Gaussian elimination and Givens rotations can use two-dimensional arrays, while Householder transformations use a one-dimensional array. If partial pivoting is necessary in Gaussian elimination, then one dimension of the array is essentially lost, and a
linear array is almost as efficient as a two-dimensional array. In that case, Householder transformations, which are numerically stable, may perform the triangulation in a shorter time. The amount of arithmetic that a node in the arrays performs differs somewhat between the methods. The difference is largest for the boundary cells. However, it
should be feasible to design a common node of very low complexity that very efficiently supports a range of methods for the solution of linear systems of
equations.
https://authors.library.caltech.edu/records/en871-srv73

A Computational Array for the QR-Method
https://resolver.caltech.edu/CaltechAUTHORS:20120423-165211870
Authors: Lennart Johnsson
Year: 1982
DOI: 10.7907/madaw-z5041
The QR-method is a method for the solution of linear systems of equations, in which the coefficient matrix is factored as QR. The matrix R is upper triangular and Q is a unitary matrix. In equation solving, Q is not always computed explicitly. The matrix R can be obtained by applying a sequence of unitary transformations to the matrix defining the system of equations. Householder's method or Givens' method can be used to determine
the unitary transformation matrices. This paper describes a concurrent algorithm and corresponding array for computing the triangular matrix R by Householder transformations. Particular attention is given to issues such as broadcasting
and pipelining.
https://authors.library.caltech.edu/records/madaw-z5041

Submicron Systems Architecture: Semiannual Technical Report
https://resolver.caltech.edu/CaltechCSTR:1982.5052-tr-82
Authors: Lennart Johnsson, Charles L. Seitz
Year: 1982
DOI: 10.7907/vwwfw-anp96
No Abstract.
https://authors.library.caltech.edu/records/vwwfw-anp96

Concurrent Algorithms for the Conjugate Gradient Method
https://resolver.caltech.edu/CaltechCSTR:1982.5040-tr-82
Authors: Lennart Johnsson
Year: 1982
DOI: 10.7907/6a5xt-r6216
A few concurrent algorithms for the basic conjugate gradient method
are devised and discussed. Most of the algorithms have a topology that
is naturally determined by characteristic dimensions of the system and
the operations of each step of the conjugate gradient method. The
topologies map well onto buildable structures of sparsely interconnected
processors while preserving unit communication distance. The topologies
of the algorithms are:
1) A binary tree
2) A composition of a binary tree and a ring whose nodes
form the leaves of the tree.
3) A linear array with some additional processing elements.
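The binary tree in item 1 plausibly follows from the inner products in each conjugate gradient step: summing $n$ products pairwise takes $\lceil \log_2 n \rceil$ levels of a binary tree. The sketch below illustrates that observation only and is not one of the report's algorithms. The hypothetical helper `tree_sum` performs the reduction level by level, as the nodes of a binary tree would, and `conjugate_gradient` is the textbook unpreconditioned method for a symmetric positive definite matrix, using that reduction for every inner product. In exact arithmetic it converges in at most $n$ steps, which is the default iteration limit here.

```python
def tree_sum(values):
    """Add values pairwise, one binary-tree level per pass; return (total, levels)."""
    level = list(values)
    depth = 0
    while len(level) > 1:
        # The additions at one level are independent and could run concurrently.
        level = [level[i] + level[i + 1] if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
        depth += 1
    return (level[0] if level else 0.0), depth


def dot(u, v):
    return tree_sum(a * b for a, b in zip(u, v))[0]


def conjugate_gradient(a, b, tol=1e-12, max_iter=None):
    """Unpreconditioned CG for a symmetric positive definite matrix a (list of rows)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)  # first search direction
    rr = dot(r, r)
    for _ in range(n if max_iter is None else max_iter):
        if rr ** 0.5 < tol:
            break
        ap = [dot(row, p) for row in a]  # matrix-vector product, one inner product per row
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


print(tree_sum([1, 2, 3, 4, 5]))  # (15, 3)
print(conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))  # close to [1/11, 7/11]
```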
It is also discussed how these algorithms map onto Boolean n-cubes.
The algorithms all have the property that a communication operation
is associated with each computation.
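One standard way to place a ring, such as the one in topology 2, on a Boolean $n$-cube while keeping unit communication distance is the binary-reflected Gray code: consecutive codes, including the last and the first, differ in exactly one bit, so ring neighbours land on cube neighbours. The abstract does not say which mapping the report uses; this sketch only demonstrates the Gray-code embedding, and the function names are hypothetical.

```python
def gray(i):
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)


def ring_to_cube(n):
    """Place ring position k (0 <= k < 2**n) on n-cube node gray(k)."""
    return [gray(k) for k in range(2 ** n)]


def is_cube_edge(u, v):
    """True if node labels u and v differ in exactly one bit."""
    d = u ^ v
    return d != 0 and d & (d - 1) == 0


nodes = ring_to_cube(3)
print(nodes)  # [0, 1, 3, 2, 6, 7, 5, 4]
# Every ring link, including the wrap-around link, is a single cube edge.
assert all(is_cube_edge(nodes[k], nodes[(k + 1) % len(nodes)]) for k in range(len(nodes)))
```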
No claim is made as to the optimality of the algorithms presented here
from a space-time complexity point of view. However, the processor
utilization for some algorithms and topologies is close to 100%, and the
space*time complexity of those algorithms is of the same order as the
arithmetic complexity of common sequential machine algorithms.
https://authors.library.caltech.edu/records/6a5xt-r6216

A Formal Derivation of Array Implementations of FFT Algorithms
https://resolver.caltech.edu/CaltechAUTHORS:20120420-155106097
Authors: Lennart Johnsson, Danny Cohen
Year: 1982
DOI: 10.7907/4e69j-smn59
Fast Fourier Transform (FFT) algorithms are interesting for direct hardware implementation in VLSI. FFT algorithms are typically described either in terms of graphs illustrating the dependency between different data elements or in terms of mathematical expressions without any notion of how the computations are implemented in space or
time. Expressions in the notation used in this paper can be given an interpretation in the implementation domain. In this paper, the notation is used to derive a description of array implementations of decimation-in-frequency and decimation-in-time FFT algorithms. Correctness of the implementations is guaranteed by the derivation.
https://authors.library.caltech.edu/records/4e69j-smn59

The Tree Machine: An Evaluation of Strategies For Reducing Program Loading Time
https://resolver.caltech.edu/CaltechCSTR:1983.5084-tr-83
Authors: Pey-yun Peggy Li, Lennart Johnsson
Year: 1983
DOI: 10.7907/gz4dm-3tg53
The Caltech Tree Machine has an ensemble architecture. Processors
are interconnected into a binary tree. Each node executes its own code.
No two nodes need to execute identical code. Nodes are synchronized by
messages between adjacent nodes. Since the number of nodes is intended
to be large, on the order of thousands, great care needs to be exercised
in devising loading strategies to make the loading time as short as
possible. A constraint is also imposed by the very limited storage
associated with a processor.
Nodes are assigned a type that identifies the code they execute.
Nodes of the same type execute identical code. Tree Machine programs
are frequently very regular. By exploiting this regularity, compact
descriptions of the types of all nodes in the tree can be created. The
limited storage of a node and the desire to use only local information
in expanding the compact description impose constraints on the
compression/decompression algorithms.
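To illustrate how regularity permits a compact description that nodes expand using only local information, here is a hypothetical scheme, not necessarily one of the report's algorithms. A small rule table maps each node type to the types of its two children. The root is given its type, and each node, knowing only its own type and the table, tells its children theirs. One level is expanded per step, so the expansion takes as many steps as the tree has levels below the root, while the size of the description depends on the number of types rather than the number of nodes. The table and the type names are invented for the example.

```python
# Hypothetical compact description: each type maps to (left child type, right child type).
RULES = {"A": ("B", "C"), "B": ("A", "A"), "C": ("C", "C")}


def expand(rules, root_type, height):
    """Return node types level by level for a complete binary tree of the given height.

    Each node needs only its own type and the rule table to tell its two
    children their types, so one level is produced per step: the expansion
    takes `height` steps regardless of how many nodes the tree has.
    """
    levels = [[root_type]]
    for _ in range(height):
        levels.append([child for t in levels[-1] for child in rules[t]])
    return levels


levels = expand(RULES, "A", 3)
for depth, types in enumerate(levels):
    print(depth, "".join(types))
# 0 A
# 1 BC
# 2 AACC
# 3 BCBCCCCC
```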
A loading time proportional to the height of the tree is attainable
in many cases with the algorithms presented. This time is also the
worst-case performance for one of the algorithms. The other algorithms
have worst-case performance of $O(\sqrt{N/f})$ and $O\big(\sqrt{N^{1/\log_2 f}}\big)$, where $N$ is the total number of nodes in a tree with fanout $f$. The algorithms with a
less favorable upper bound in some cases allow a more compact tree
description than the algorithm with the best upper bound.
https://authors.library.caltech.edu/records/gz4dm-3tg53

Highly Concurrent Algorithms for Solving Linear Systems of Equations
https://resolver.caltech.edu/CaltechCSTR:1983.5079-tr-83
Authors: Lennart Johnsson
Year: 1983
DOI: 10.7907/64hjx-fv005
No Abstract.
https://authors.library.caltech.edu/records/64hjx-fv005

QED on the connection machine
https://resolver.caltech.edu/CaltechAUTHORS:20160503-161223169
Authors: Clive F. Baillie, S. Lennart Johnsson, Luis Ortiz, G. Stuart Pawley
Year: 1988
DOI: 10.1145/63047.63082
Physicists believe that the world is described in terms of gauge theories. A popular technique for investigating these theories is to discretize them onto a lattice and simulate them numerically on a computer, yielding so-called lattice gauge theory. Such computations require at least $10^{14}$ floating-point operations, necessitating the use of advanced-architecture supercomputers such as the Connection Machine made by Thinking Machines Corporation. Currently the most important gauge theory to be solved is the one describing the sub-nuclear world of high energy physics: Quantum Chromodynamics (QCD). The simplest example of a gauge theory is Quantum Electrodynamics (QED), the theory that describes the interaction of electrons and photons. Simulation of QCD requires computer software very similar to that for the simpler QED problem. Our current QED code achieves a computational rate of 1.6 million lattice site updates per second for a Monte Carlo algorithm, and 7.4 million site updates per second for a microcanonical algorithm. The estimated performance for a Monte Carlo QCD code is 200,000 site updates per second (or 5.6 Gflops).
https://authors.library.caltech.edu/records/v14qg-yfd69

Residue Arithmetic and VLSI
https://resolver.caltech.edu/CaltechCSTR:1983.5092-tr-83
Authors: Chao-Lin Chiang, Lennart Johnsson
Year: 2002
DOI: 10.7907/77gav-sns10
In the residue number system, arithmetic is carried
out on each digit individually. There is no carry chain.
This locality is of particular interest in VLSI. An
evaluation of different implementations of residue arithmetic is carried out, and the effects of reduced feature sizes are estimated. At the current state of technology, the traditional table lookup method is preferable for a range that requires a maximum modulus represented by up to 4 bits, while an array of adders offers the best performance for 7 or more bits. A combination of adders and
tables is best for 5 and 6 bits. At 0.5 µm feature
size, table lookup is competitive only up to 3 bits. These
conclusions are based on sample designs in nMOS.
https://authors.library.caltech.edu/records/77gav-sns10

Gaussian Elimination on Sparse Matricies and Concurrency
https://resolver.caltech.edu/CaltechCSTR:1980.4087-tr-80
Authors: Lennart Johnsson
Year: 2002
DOI: 10.7907/f5pmx-pnx37
No Abstract.
https://authors.library.caltech.edu/records/f5pmx-pnx37

Computational Arrays for Band Matrix Equations
https://resolver.caltech.edu/CaltechCSTR:1981.4287-tr-81
Authors: Lennart Johnsson
Year: 2002
DOI: 10.7907/70tmd-29e82
No Abstract.
https://authors.library.caltech.edu/records/70tmd-29e82

VLSI Architecture and Design
https://resolver.caltech.edu/CaltechAUTHORS:20120418-110634950
Authors: Lennart Johnsson
Year: 2012
DOI: 10.7907/bmm7d-81x26
Integrated circuit technology is rapidly approaching a state where feature sizes of one micron or less are tractable. Chip sizes are increasing slowly. These two developments result in considerably increased complexity in chip design. The physical characteristics of integrated circuit technology are also changing. The cost of communication will dominate, making new architectures and algorithms both feasible and desirable. A large
number of processors on a single chip will be possible. The cost of communication will make
designs that enforce locality superior to other types of designs.
Scaling down feature sizes increases the delay that wires introduce. Even the delay of metal wires will become significant. Time tends to be a local property, which will make the design of globally synchronous systems more difficult. Self-timed systems will eventually become a necessity.
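The growth of wire delay under scaling can be seen from a standard first-order estimate. This estimate is not taken from the paper. It assumes ideal constant-field scaling of all dimensions by $1/\alpha$, a wire treated as a distributed RC line, and no fringing capacitance. For a wire of length $L$, width $W$, and thickness $T$, lying over an insulator of thickness $H$, with resistivity $\rho$ and permittivity $\varepsilon$:

$$
R = \frac{\rho L}{W T}, \qquad C \approx \frac{\varepsilon L W}{H}, \qquad \tau_{\text{wire}} \propto RC \approx \frac{\rho\,\varepsilon\,L^{2}}{T H}.
$$

Scaling $W$, $T$, and $H$ by $1/\alpha$, with $\rho$ and $\varepsilon$ unchanged, gives

$$
\tau_{\text{wire}} \to \alpha^{2}\,\tau_{\text{wire}} \ \ (L \text{ fixed}), \qquad
\tau_{\text{wire}} \to \tau_{\text{wire}} \ \ (L \to L/\alpha), \qquad
\tau_{\text{gate}} \to \tau_{\text{gate}}/\alpha .
$$

Relative to gate delay, a scaled local wire therefore becomes a factor $\alpha$ slower and a fixed-length wire a factor $\alpha^{3}$ slower. This is consistent with the emphasis on locality and self-timed design above.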
With chip complexity, measured in terms of logic devices, increasing by more than an order of magnitude over the next few years, efficient design methodologies and tools become crucial. Hierarchical and structured design are ways of dealing with the complexity of chip design. Structured design focuses on the information
flow and enforces a high degree of regularity. Both hierarchical and structured design encourage the use of cell libraries. The geometry of the cells in such libraries should be parameterized so that, for instance, cells can adjust their size to neighboring cells and make the proper interconnections. Cells with this quality can be used as a basis for "Silicon Compilers".
https://authors.library.caltech.edu/records/bmm7d-81x26

A mathematical approach to modelling the flow of data and control in computational networks
https://resolver.caltech.edu/CaltechAUTHORS:20120420-102640427
Authors: Lennart Johnsson, Danny Cohen
Year: 2012
DOI: 10.7907/ekw3n-6et55
This paper proposes a mathematical formalism for the synthesis and qualitative analysis of computational networks that treats data and control in the same manner. Expressions in this notation are given a direct interpretation in the implementation domain. Topology,
broadcasting, pipelining, and similar properties of implementations can be determined directly from the expressions.
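The following is a rough illustration of how broadcasting and pipelining can appear as different space-time assignments of the same expression, $y_i = \sum_j a_{ij} x_j$. It uses a linear array in which cell $i$ holds row $i$. This is plain Python with hypothetical function names, not the paper's notation. In the broadcast version every cell sees $x_j$ at time $j$. In the pipelined version $x_j$ enters cell 0 at time $j$ and moves one cell per clock, so cell $i$ uses it at time $i + j$. Both functions return the same $y$ together with the clock on which the last multiply-add happens.

```python
def matvec_broadcast(a, x):
    """Cell i holds row i; x[j] is broadcast to every cell at time j."""
    n, m = len(a), len(x)
    y = [0] * n
    finish = 0
    for t in range(m):  # one x element per clock, seen by all cells at once
        for i in range(n):
            y[i] += a[i][t] * x[t]
        finish = t
    return y, finish


def matvec_pipelined(a, x):
    """x[j] enters cell 0 at time j and moves one cell per clock: cell i sees it at time i + j."""
    n, m = len(a), len(x)
    y = [0] * n
    finish = 0
    for t in range(n + m - 1):  # enough clocks for the last x element to reach the last cell
        for i in range(n):
            j = t - i  # index of the x element present at cell i at time t
            if 0 <= j < m:
                y[i] += a[i][j] * x[j]
                finish = t
    return y, finish


a = [[1, 2], [3, 4], [5, 6]]
print(matvec_broadcast(a, [1, 1]))  # ([3, 7, 11], 1)
print(matvec_pipelined(a, [1, 1]))  # ([3, 7, 11], 3)
```

Pipelining removes the global broadcast wire at the cost of $n - 1$ extra clocks of latency for an $n$-row matrix.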
This treatment of computational networks emphasizes the space/time tradeoff of implementations. A full instantiation in space of most computational problems is unrealistic, even in VLSI (Finnegan [4]). Therefore, computations also have to be at least partially
instantiated in the time domain, requiring the use of explicit control mechanisms, which typically cause the data flow to be nonstationary and sometimes turbulent.
https://authors.library.caltech.edu/records/ekw3n-6et55