Article records
https://feeds.library.caltech.edu/people/Bruck-J/article.rss
A Caltech Library Repository Feed (retrieved Thu, 30 Nov 2023 17:51:59 +0000)

A generalized convergence theorem for neural networks
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetit88
Authors: Bruck, Jehoshua; Goodman, Joseph W.
Year: 1988
DOI: 10.1109/18.21239
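As an illustration of the serial mode of operation analyzed in this record, here is a minimal sketch (function names are hypothetical, not from the paper): each neuron performs a threshold logic function, neurons are updated one at a time, and for a symmetric weight matrix the dynamics settle into a stable state.

```python
def serial_converge(W, x, max_sweeps=100):
    """Serial-mode updates of a threshold network: one neuron at a time
    recomputes its state as sign(sum_j W[i][j] * x[j]).  For symmetric W
    this settles into a stable state (the serial-mode convergence case
    reviewed in the paper)."""
    x = list(x)
    n = len(x)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(W[i][j] * x[j] for j in range(n))
            new = 1 if field >= 0 else -1
            if new != x[i]:
                x[i], changed = new, True
        if not changed:
            return x  # stable state: no neuron wants to flip
    raise RuntimeError("no convergence within max_sweeps")

def hebb_weights(pattern):
    """Outer-product (Hebbian) weights storing one +/-1 pattern, zero diagonal."""
    n = len(pattern)
    return [[0 if i == j else pattern[i] * pattern[j] for j in range(n)]
            for i in range(n)]
```

For example, with the outer-product weights of a single stored pattern, serial updates from a nearby state converge to that pattern.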
A neural network model is presented in which each neuron performs a threshold logic function. The model always converges to a stable state when operating in a serial mode and to a cycle of length at most 2 when operating in a fully parallel mode. This property is the basis for the potential applications of the model, such as associative memory devices and combinatorial optimization. The two convergence theorems (for serial and fully parallel modes of operation) are reviewed, and a general convergence theorem is presented that unifies the two known cases. New relations between the neural network model and the problem of finding a minimum cut in a graph are obtained.
https://authors.library.caltech.edu/records/q014y-7yq74

Neural networks, error-correcting codes, and polynomials over the binary n-cube
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetit89
Authors: Bruck, Jehoshua; Blaum, Mario
Year: 1989
DOI: 10.1109/18.42215
Several ways of relating the concept of error-correcting codes to the concept of neural networks are presented. Performing maximum-likelihood decoding in a linear block error-correcting code is shown to be equivalent to finding a global maximum of the energy function of a certain neural network. Given a linear block code, a neural network can be constructed in such a way that every codeword corresponds to a local maximum. The connection between maximization of polynomials over the n-cube and error-correcting codes is also investigated; the results suggest that decoding techniques can be a useful tool for solving such maximization problems. The results are generalized to both nonbinary and nonlinear codes.
https://authors.library.caltech.edu/records/rdz4x-z2r71

The hardness of decoding linear codes with preprocessing
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetit90b
Authors: Bruck, Jehoshua; Naor, Moni
Year: 1990
DOI: 10.1109/18.52484
The problem of maximum-likelihood decoding of linear block codes is known to be hard. It is shown that the problem remains hard even if the code is known in advance and can be preprocessed for as long as desired in order to devise a decoding algorithm. The hardness is based on the fact that the existence of a polynomial-time algorithm would imply that the polynomial hierarchy collapses. Thus, some linear block codes probably do not have an efficient decoder. The proof is based on results in complexity theory that relate uniform and nonuniform complexity classes.
https://authors.library.caltech.edu/records/fgb06-gdy28

On the number of spurious memories in the Hopfield model
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetit90a
Authors: Bruck, Jehoshua; Roychowdhury, Vwani P.
Year: 1990
DOI: 10.1109/18.52486
The outer-product method for programming the Hopfield model is discussed. The method can result in many spurious stable states (exponential in the number of vectors that are to be stored), even when the stored vectors are orthogonal.
https://authors.library.caltech.edu/records/5k7nn-rdz62

Efficient algorithms for reconfiguration in VLSI/WSI arrays
https://resolver.caltech.edu/CaltechAUTHORS:ROYieeetc90
Authors: Roychowdhury, Vwani P.; Bruck, Jehoshua; Kailath, Thomas
Year: 1990
DOI: 10.1109/12.54841
The issue of developing efficient algorithms for reconfiguring processor arrays in the presence of faulty processors and fixed hardware resources is discussed. The models discussed consist of a set of identical processors embedded in a flexible interconnection structure that is configured in the form of a rectangular grid. An array grid model based on single-track switches is considered. An efficient polynomial time algorithm is proposed for determining feasible reconfigurations for an array with a given distribution of faulty processors. In the process, it is shown that the set of conditions in the reconfigurability theorem is not necessary. A polynomial time algorithm is developed for finding feasible reconfigurations in an augmented single-track model and in array grid models with multiple-track switches.
https://authors.library.caltech.edu/records/d3v4c-rrn48

Decoding the Golay code with Venn diagrams
https://resolver.caltech.edu/CaltechAUTHORS:BLAieeetit90
Authors: Blaum, Mario; Bruck, Jehoshua
Year: 1990
DOI: 10.1109/18.53756
A decoding algorithm, based on Venn diagrams, for decoding the [23, 12, 7] Golay code is presented. The decoding algorithm is based on the design properties of the parity sets of the code. As for other decoding algorithms for the Golay code, decoding can be easily done by hand.
https://authors.library.caltech.edu/records/88482-3cz58

On the Convergence Properties of the Hopfield Model
https://resolver.caltech.edu/CaltechAUTHORS:20120426-132042598
Authors: Bruck, Jehoshua
Year: 1990
DOI: 10.1109/5.58341
The main contribution of the present work is showing that the known convergence properties of the Hopfield model can be reduced to a very simple case, for which an elementary proof is provided. The convergence properties of the Hopfield model are dependent on the structure of the interconnections matrix W and the method by which the nodes are updated. Three cases are known: (1) convergence to a stable state when operating in a serial mode with symmetric W, (2) convergence to a cycle of length 2, at most, when operating in a fully parallel mode with symmetric W, and (3) convergence to a cycle of length 4 when operating in a fully parallel mode with antisymmetric W. The three known results are reviewed and it is proven that the fully parallel mode of operation is a special case of the serial mode of operation. There are three more cases that can be considered using this characterization: serial mode of operation, antisymmetric W; serial mode of operation, arbitrary W; and fully parallel mode of operation, arbitrary W. By exhibiting exponential lower bounds on the length of the cycles in these other cases, it is proven that the three known cases are the only interesting ones.
https://authors.library.caltech.edu/records/m7hdv-r5t49

Neural computation of arithmetic functions
https://resolver.caltech.edu/CaltechAUTHORS:20120503-090033553
Authors: Siu, Kai-Yeung; Bruck, Jehoshua
Year: 1990
DOI: 10.1109/5.58350
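A linear threshold gate, the unit assumed throughout this record, can be sketched as follows (a toy illustration with hypothetical names, not a construction from the paper). A single gate with n-bit weights already compares two n-bit numbers, which hints at why reducing the required weight accuracy to O(log n) bits is significant:

```python
def threshold_gate(weights, threshold, inputs):
    """Linear threshold gate: output 1 iff the weighted sum of the 0/1
    inputs reaches the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def greater_than_gate(n):
    """One gate deciding X > Y for two n-bit numbers (bits MSB-first),
    using the identity X > Y  iff  sum_i 2^i * (x_i - y_i) >= 1.
    Note the weights here need n bits of precision -- the kind of cost
    the paper shows can be reduced to O(log n)-bit weights."""
    w = [2 ** (n - 1 - i) for i in range(n)]
    weights = w + [-v for v in w]
    return lambda xbits, ybits: threshold_gate(weights, 1, xbits + ybits)
```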
A neuron is modeled as a linear threshold gate, and the network architecture considered is the layered feedforward network. It is shown how common arithmetic functions such as multiplication and sorting can be efficiently computed in a shallow neural network. Some known results are improved by showing that the product of two n-bit numbers and sorting of n n-bit numbers can be computed by a polynomial-size neural network using only four and five unit delays, respectively. Moreover, the weights of each threshold element in the neural networks require O(log n)-bit (instead of n-bit) accuracy. These results can be extended to more complicated functions such as multiple products, division, rational functions, and approximation of analytic functions.
https://authors.library.caltech.edu/records/e9eyf-cfj77

Construction of asymptotically good low-rate error-correcting codes through pseudo-random graphs
https://resolver.caltech.edu/CaltechAUTHORS:ALOieeetit92
Authors: Alon, Noga; Bruck, Jehoshua; Naor, Joseph; Naor, Moni; Roth, Ron M.
Year: 1992
DOI: 10.1109/18.119713
A novel technique, based on the pseudo-random properties of certain graphs known as expanders, is used to obtain simple explicit constructions of asymptotically good codes. In one of the constructions, the expanders are used to enhance Justesen codes by replicating, shuffling, and then regrouping the code coordinates. For any fixed (small) rate, and for a sufficiently large alphabet, the codes thus obtained lie above the Zyablov bound. Using these codes as outer codes in a concatenated scheme, a second asymptotically good construction is obtained which applies to small alphabets (say, GF(2)) as well. Although these concatenated codes lie below the Zyablov bound, they are still superior to previously known explicit constructions in the zero-rate neighborhood.
https://authors.library.caltech.edu/records/750gt-mvs87

Tolerating faults in hypercubes using subcube partitioning
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetc92a
Authors: Bruck, Jehoshua; Cypher, Robert; Soroker, Danny
Year: 1992
DOI: 10.1109/12.142686
We examine the issue of running algorithms on a hypercube which has both node and edge faults, and we assume a worst case distribution of the faults. We prove that for any constant c, an n-dimensional hypercube (n-cube) with n^c faulty components contains a fault-free subgraph that can implement a large class of hypercube algorithms with only a constant factor slowdown. In addition, our approach yields practical implementations for small numbers of faults. For example, we show that any regular algorithm can be implemented on an n-cube that has at most n-1 faults with slowdowns of at most 2 for computation and at most 4 for communication.
To the best of our knowledge, this is the first result showing that an n-cube can tolerate more than O(n) arbitrarily placed faults with a constant factor slowdown.
https://authors.library.caltech.edu/records/ygryd-nqj63

New techniques for constructing EC/AUED codes
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetc92b
Authors: Bruck, Jehoshua; Blaum, Mario
Year: 1992
DOI: 10.1109/12.166607
The most common method to construct a t-error correcting/all unidirectional error detecting (EC/AUED) code is to choose a t-error correcting (EC) code and then to append a tail in such a way that the new code can detect more than t errors when they are unidirectional. The tail is a function of the weight of the codeword.
We present two new techniques for constructing t-EC/AUED codes. The first technique modifies the t-EC code in such a way that the weight distribution of the original code is reduced. So, a smaller tail is needed. Frequently, this technique gives less overall redundancy than the best available t-EC/AUED codes.
https://authors.library.caltech.edu/records/4qk7p-k9813

Coding for skew-tolerant parallel asynchronous communications
https://resolver.caltech.edu/CaltechAUTHORS:BLAieeetit93a
Authors: Blaum, Mario; Bruck, Jehoshua
Year: 1993
DOI: 10.1109/18.212269
A communication channel consisting of several subchannels transmitting simultaneously and asynchronously is considered, an example being a board with several chips, where the subchannels are wires connecting the chips and differences in the lengths of the wires can result in asynchronous reception. A scheme that allows transmission without an acknowledgment of the message, therefore permitting pipelined communication and providing a higher bandwidth, is described. The scheme allows a certain number of transitions from a second message to arrive before reception of the current message has been completed, a condition called skew. Necessary and sufficient conditions for codes that can detect skew, as well as for codes that are skew-tolerant, i.e., can correct the skew and allow continuous operation, are derived. Codes that satisfy the necessary and sufficient conditions are constructed, their optimality is studied, and efficient decoding algorithms are devised. Potential applications of the scheme are in on-chip, on-board, and board-to-board communications, enabling much higher communication bandwidth.
https://authors.library.caltech.edu/records/csasz-fbr13

Depth Efficient Neural Networks for Division and Related Problems
https://resolver.caltech.edu/CaltechAUTHORS:20120309-113620511
Authors: Siu, Kai-Yeung; Bruck, Jehoshua; Kailath, Thomas; Hofmeister, Thomas
Year: 1993
DOI: 10.1109/18.256501
An artificial neural network (ANN) is commonly modeled by a threshold circuit, a network of interconnected processing units called linear threshold gates. The depth of a circuit represents the number of unit delays or the time for parallel computation. The size of a circuit is the number of gates and measures the amount of hardware. It was known that traditional logic circuits consisting of only unbounded fan-in AND, OR, and NOT gates require at least Ω(log n/log log n) depth to compute common arithmetic functions such as the product or the quotient of two n-bit numbers, if the circuit size is polynomially bounded (in n). It is shown that ANNs can be much more powerful than traditional logic circuits, assuming that each threshold gate can be built with a cost that is comparable to that of AND/OR logic gates. In particular, the main results show that powering and division can be computed by polynomial-size ANNs of depth 4, and multiple product can be computed by polynomial-size ANNs of depth 5. Moreover, using the techniques developed here, a previous result can be improved by showing that the sorting of n n-bit numbers can be carried out in a depth-3 polynomial-size ANN. Furthermore, it is shown that the sorting network is optimal in depth.
https://authors.library.caltech.edu/records/qprpj-d1a77

Fault-tolerant meshes and hypercubes with minimal numbers of spares
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetc93a
Authors: Bruck, Jehoshua; Cypher, Robert; Ho, Ching-Tien
Year: 1993
DOI: 10.1109/12.241598
Many parallel computers consist of processors connected in the form of a d-dimensional mesh or hypercube. Two- and three-dimensional meshes have been shown to be efficient in manipulating images and dense matrices, whereas hypercubes have been shown to be well suited to divide-and-conquer algorithms requiring global communication. However, even a single faulty processor or communication link can seriously affect the performance of these machines.
This paper presents several techniques for tolerating faults in d-dimensional mesh and hypercube architectures. Our approach consists of adding spare processors and communication links so that the resulting architecture will contain a fault-free mesh or hypercube in the presence of faults. We optimize the cost of the fault-tolerant architecture by adding exactly k spare processors (while tolerating up to k processor and/or link faults) and minimizing the maximum number of links per processor. For example, when the desired architecture is a d-dimensional mesh and k = 1, we present a fault-tolerant architecture that has the same maximum degree as the desired architecture (namely, 2d) and has only one spare processor. We also present efficient layouts for fault-tolerant two- and three-dimensional meshes, and show how multiplexers and buses can be used to reduce the degree of fault-tolerant architectures. Finally, we give constructions for fault-tolerant tori, eight-connected meshes, and hexagonal meshes.
https://authors.library.caltech.edu/records/tgzxt-6qy80

Constructions of skew-tolerant and skew-detecting codes
https://resolver.caltech.edu/CaltechAUTHORS:BLAieeetit93b
Authors: Blaum, Mario; Bruck, Jehoshua; Khachatrian, Levon H.
Year: 1993
DOI: 10.1109/18.259671
The paradigm of skew-tolerant parallel asynchronous communication was introduced by Blaum and Bruck (see ibid., vol. 39, 1993) along with constructions for codes that can tolerate or detect skew. Some of these constructions were improved by Khachatrian (1991). In this paper these constructions are improved upon further, and the authors prove that the new constructions are, in a certain sense, optimal.
https://authors.library.caltech.edu/records/2msng-t0y47

A Note on "A Systematic (12,8) Code for Correcting Single Errors and Detecting Adjacent Errors"
https://resolver.caltech.edu/CaltechAUTHORS:BLAieeetc94
Authors: Blaum, Mario; Bruck, Jehoshua; Tolhuizen, Ludo
Year: 1994
DOI: 10.1109/12.250619
J.W. Schwartz and J.K. Wolf (ibid., vol. 39, no. 11, pp. 1403-1404, Nov. 1990) gave a parity check matrix for a systematic (12,8) binary code that corrects all single errors and detects eight of the nine double adjacent errors within any of the three 4-bit nibbles. We present a parity check matrix for a systematic (12,8) binary code that corrects all single errors and detects any pair of errors within a nibble.
https://authors.library.caltech.edu/records/3dcp9-bvy40

Fault-tolerant de Bruijn and shuffle-exchange networks
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetpds94
Authors: Bruck, Jehoshua; Cypher, Robert; Ho, Ching-Tien
Year: 1994
DOI: 10.1109/71.282566
This paper addresses the problem of creating a fault-tolerant interconnection network for a parallel computer. Three topologies, namely, the base-2 de Bruijn graph, the base-m de Bruijn graph, and the shuffle-exchange, are studied. For each topology an N+k node fault-tolerant graph is defined. These fault-tolerant graphs have the property that given any set of k node faults, the remaining N nodes contain the desired topology as a subgraph. All of the constructions given are the best known in terms of the degree of the fault-tolerant graph. We also investigate the use of buses to reduce the degrees of the fault-tolerant graphs still further.
https://authors.library.caltech.edu/records/x2g2k-n9975

Embedding cube-connected cycles graphs into faulty hypercubes
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeeetc94
Authors: Bruck, Jehoshua; Cypher, Robert; Soroker, Danny
Year: 1994
DOI: 10.1109/12.324546
We consider the problem of embedding a cube-connected cycles graph (CCC) into a hypercube with edge faults. Our main result is an algorithm that, given a list of faulty edges, computes an embedding of the CCC that spans all of the nodes and avoids all of the faulty edges. The algorithm has optimal running time and tolerates the maximum number of faults (in a worst-case setting). Because ascend-descend algorithms can be implemented efficiently on a CCC, this embedding enables the implementation of ascend-descend algorithms, such as bitonic sort, on hypercubes with edge faults. We also present a number of related results, including an algorithm for embedding a CCC into a hypercube with edge and node faults and an algorithm for embedding a spanning torus into a hypercube with edge faults.
https://authors.library.caltech.edu/records/6qna5-eqy71

Wildcard dimensions, coding theory and fault-tolerant meshes and hypercubes
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetc95
Authors: Bruck, Jehoshua; Cypher, Robert; Ho, Ching-Tien
Year: 1995
DOI: 10.1109/12.367998
Hypercubes, meshes and tori are well-known interconnection networks for parallel computers. The sets of edges in those graphs can be partitioned into dimensions. It is well known that the hypercube can be extended by adding a wildcard dimension, resulting in a folded hypercube that has better fault-tolerance and communication capabilities. First we prove that the folded hypercube is optimal in the sense that only a single wildcard dimension can be added to the hypercube. We then investigate the idea of adding wildcard dimensions to d-dimensional meshes and tori. Using techniques from error-correcting codes we construct d-dimensional meshes and tori with wildcard dimensions. Finally, we show how these constructions can be used to tolerate edge and node faults in mesh and torus networks.
https://authors.library.caltech.edu/records/n6qsv-wbx29

EVENODD: An Efficient Scheme for Tolerating Double Disk Failures in RAID Architectures
https://resolver.caltech.edu/CaltechAUTHORS:20120216-065736330
Authors: Blaum, Mario; Brady, Jim; Bruck, Jehoshua; Menon, Jai
Year: 1995
DOI: 10.1109/12.364531
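To illustrate the XOR-only flavor of the scheme, here is a deliberately simplified sketch (hypothetical names; it keeps only a row-parity column and single-column recovery, whereas EVENODD proper lays the data in a (p-1) × p array for a prime p and adds a diagonal-parity column to tolerate two simultaneous failures):

```python
from functools import reduce
from operator import xor

def add_row_parity(data_cols):
    """Append a row-parity column: each parity bit is the XOR of the
    corresponding bit across all data columns.  Pure XOR, no finite-field
    arithmetic -- the property that lets EVENODD reuse RAID-5 parity
    hardware."""
    parity = [reduce(xor, bits) for bits in zip(*data_cols)]
    return data_cols + [parity]

def recover_column(cols, lost):
    """Rebuild one erased column as the XOR of all surviving columns."""
    survivors = [c for k, c in enumerate(cols) if k != lost]
    return [reduce(xor, bits) for bits in zip(*survivors)]
```

Losing any single column (data or parity) is repaired by XORing the survivors; the diagonal-parity column of the real scheme extends this to any two lost columns.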
We present a novel method, which we call EVENODD, for tolerating up to two disk failures in RAID architectures. EVENODD employs the addition of only two redundant disks and consists of simple exclusive-OR computations. This redundant storage is optimal, in the sense that two failed disks cannot be retrieved with less than two redundant disks. A major advantage of EVENODD is that it only requires parity hardware, which is typically present in standard RAID-5 controllers. Hence, EVENODD can be implemented on standard RAID-5 controllers without any hardware changes. The most commonly used scheme that employs optimal redundant storage (i.e., two extra disks) is based on Reed-Solomon (RS) error-correcting codes. This scheme requires computation over finite fields and results in a more complex implementation. For example, we show that the complexity of implementing EVENODD in a disk array with 15 disks is about 50% of the one required when using the RS scheme. The new scheme is not limited to RAID architectures: it can be used in any system requiring large symbols and relatively short codes, for instance, in multitrack magnetic recording. To this end, we also present a decoding algorithm for one column (track) in error.
https://authors.library.caltech.edu/records/4azcy-f4q81

CCL: a portable and tunable collective communication library for scalable parallel computers
https://resolver.caltech.edu/CaltechAUTHORS:BALieeetpds95
Authors: Bala, Vasanth; Bruck, Jehoshua; Cypher, Robert; Elustondo, Pablo; Ho, Alex; Ho, Ching-Tien; Kipnis, Shlomo; Snir, Marc
Year: 1995
DOI: 10.1109/71.342126
A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface, efficient communication operations, and the advantage of portability. A library of this nature, the Collective Communication Library (CCL), intended for the line of scalable parallel computer products by IBM, has been designed. CCL is part of the parallel application programming interface of the recently announced IBM 9076 Scalable POWERparallel System 1 (SP1). In this paper, we examine several issues related to the functionality, correctness, and performance of a portable collective communication library while focusing on three novel aspects in the design and implementation of CCL: 1) the introduction of process groups, 2) the definition of semantics that ensures correctness, and 3) the design of new and tunable algorithms based on a realistic point-to-point communication model.
https://authors.library.caltech.edu/records/t2dhr-6gm34

Delay-insensitive pipelined communication on parallel buses
https://resolver.caltech.edu/CaltechAUTHORS:20120215-131718595
Authors: Blaum, Mario; Bruck, Jehoshua
Year: 1995
DOI: 10.1109/12.381951
Consider a communication channel that consists of several subchannels transmitting simultaneously and asynchronously. As an example of this scheme, we can consider a board with several chips. The subchannels represent wires connecting the chips, where differences in the lengths of the wires might result in asynchronous reception. In current technology, the receiver acknowledges reception of the message before the transmitter sends the following message; namely, pipelined utilization of the channel is not possible. Our main contribution is a scheme that enables transmission without an acknowledgment of the message, therefore enabling pipelined communication and providing a higher bandwidth. However, our scheme allows for a certain number of transitions from a second message to arrive before reception of the current message has been completed, a condition that we call skew. We have derived necessary and sufficient conditions for codes that can tolerate a certain amount of skew among adjacent messages (therefore allowing for continuous operation) and detect a larger amount of skew when the original skew is exceeded. These results generalize previously known results. We have constructed codes that satisfy the necessary and sufficient conditions, studied their optimality, and devised efficient decoding algorithms. To the best of our knowledge, this is the first known scheme that permits efficient asynchronous communications without acknowledgment. Potential applications are in on-chip, on-board, and board-to-board communications, enabling much higher communication bandwidth.
https://authors.library.caltech.edu/records/7rdht-08e65

Computing global combine operations in the multiport postal model
https://resolver.caltech.edu/CaltechAUTHORS:BARieeetpds95
Authors: Bar-Noy, Amotz; Bruck, Jehoshua; Ho, Ching-Tien; Kipnis, Shlomo; Schieber, Baruch
Year: 1995
DOI: 10.1109/71.406965
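As a baseline illustration of the global combine operation itself, here is a 1-port, unit-latency recursive-doubling simulation (hypothetical names; a simplification, not the paper's multiport postal-model algorithm, which also accounts for k ports and latency λ):

```python
def global_combine(values, op):
    """Simulate recursive doubling: in round r each processor pairs with
    the one whose id differs in bit r and combines partial results, so
    after log2(n) rounds every processor holds op over all n inputs."""
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes n is a power of two"
    state = list(values)
    r = 1
    while r < n:
        state = [op(state[i], state[i ^ r]) for i in range(n)]
        r <<= 1
    return state
```

Any associative and commutative reduction function (sum, max, etc.) can be plugged in as `op`.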
Consider a message-passing system of n processors, in which each processor holds one piece of data initially. The goal is to compute an associative and commutative reduction function on the n pieces of data and to make the result known to all the n processors. This operation is frequently used in many message-passing systems and is typically referred to as global combine, census computation, or gossiping. This paper explores the problem of global combine in the multiport postal model. This model is characterized by three parameters: n, the number of processors; k, the number of ports per processor; and λ, the communication latency. In this model, in every round r, each processor can send k distinct messages to k other processors, and it can receive k messages that were sent from k other processors λ-1 rounds earlier. This paper provides an optimal algorithm for the global combine problem that requires the least number of communication rounds and minimizes the time spent by any processor in sending and receiving messages.
https://authors.library.caltech.edu/records/da8ms-0fp39

On the design and implementation of broadcast and global combine operations using the postal model
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetpds96
Authors: Bruck, Jehoshua; De Coster, Luc; Dewulf, Natalie; Ho, Ching-Tien; Lauwereins, Rudy
Year: 1996
DOI: 10.1109/71.491579
There are a number of models that were proposed in recent years for message-passing parallel systems. Examples are the postal model and its generalization, the LogP model. In the postal model a parameter λ is used to model the communication latency of the message-passing system. Each node during each round can send a fixed-size message and, simultaneously, receive a message of the same size. Furthermore, a message sent out during round r will incur a latency of λ and will arrive at the receiving node at round r + λ - 1.
Our goal in this paper is to bridge the gap between the theoretical modeling and the practical implementation. In particular, we investigate a number of practical issues related to the design and implementation of two collective communication operations, namely, the broadcast operation and the global combine operation. Those practical issues include, for example, 1) techniques for measurement of the value of λ on a given machine, 2) creating efficient broadcast algorithms that get the latency λ and the number of nodes n as parameters, and 3) creating efficient global combine algorithms for parallel machines with a λ that is not an integer. We propose solutions that address those practical issues and present results of an experimental study of the new algorithms on the Intel Delta machine. Our main conclusion is that the postal model can help in performance prediction and tuning; for example, a properly tuned broadcast improves the known implementation by more than 20%.
https://authors.library.caltech.edu/records/43aw7-78d60

MDS array codes with independent parity symbols
https://resolver.caltech.edu/CaltechAUTHORS:BLAieeetit96
Authors: Blaum, Mario; Bruck, Jehoshua; Vardy, Alexander
Year: 1996
DOI: 10.1109/18.485722
A new family of maximum distance separable (MDS) array codes is presented. The code arrays contain p information columns and r independent parity columns, each column consisting of p-1 bits, where p is a prime. We extend a previously known construction for the case r=2 to three and more parity columns. It is shown that when r=3 such an extension is possible for any prime p. For larger values of r, we give necessary and sufficient conditions for our codes to be MDS, and then prove that if p belongs to a certain class of primes these conditions are satisfied up to r ≤ 8. One of the advantages of the new codes is that encoding and decoding may be accomplished using simple cyclic shifts and XOR operations on the columns of the code array. We develop efficient decoding procedures for the case of two- and three-column errors. This again extends the previously known results for the case of a single-column error. Another primary advantage of our codes is related to the problem of efficient information updates. We present upper and lower bounds on the average number of parity bits which have to be updated in an MDS code over GF(2^m), following an update in a single information bit. This average number is of importance in many storage applications which require frequent updates of information. We show that the upper bound obtained from our codes is close to the lower bound and, most importantly, does not depend on the size of the code symbols.
https://authors.library.caltech.edu/records/8w2ps-yt124

Fault-tolerant cube graphs and coding theory
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetit96
Authors: Bruck, Jehoshua; Ho, Ching-Tien
Year: 1996
DOI: 10.1109/18.556609
Hypercubes, meshes, tori, and Omega networks are well-known interconnection networks for parallel computers. The structure of those graphs can be described in a more general framework called cube graphs. The idea is to assume that every node in a graph with q^l nodes is represented by a unique string of l symbols over GF(q). The edges are specified by a set of offsets, which are vectors of length l over GF(q), where the two endpoints of an edge are an offset apart. We study techniques for tolerating edge faults in cube graphs that are based on adding redundant edges. The redundant graph has the property that the structure of the original graph can be maintained in the presence of edge faults. Our main contribution is a technique for adding the redundant edges that utilizes constructions of error-correcting codes and generalizes existing ad hoc techniques.
https://authors.library.caltech.edu/records/jtga1-pr659

An on-line algorithm for checkpoint placement
https://resolver.caltech.edu/CaltechAUTHORS:ZIVieeetc97b
Authors: Ziv, Avi; Bruck, Jehoshua
Year: 1997
DOI: 10.1109/12.620479
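For context, the classic fixed-interval baseline that on-line placement is compared against can be sketched with Young's first-order approximation (an illustration under stated assumptions, with hypothetical names; it is not the paper's on-line algorithm, which adapts to a varying checkpoint cost):

```python
import math

def young_interval(checkpoint_cost, mtbf):
    """First-order optimal FIXED checkpoint interval (Young's
    approximation): per unit of useful work the expected overhead is
    roughly cost/T (checkpointing) + T/(2*mtbf) (expected reprocessing),
    which is minimized at T* = sqrt(2 * cost * mtbf)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)
```

The trade-off in the abstract is visible here: a small T inflates the cost/T term, a large T inflates the reprocessing term.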
Checkpointing enables us to reduce the time to recover from a fault by saving intermediate states of the program in reliable storage. The length of the intervals between checkpoints affects the execution time of programs. On one hand, long intervals lead to long reprocessing time, while, on the other hand, too frequent checkpointing leads to high checkpointing overhead. In this paper, we present an on-line algorithm for placement of checkpoints. The algorithm uses knowledge of the current cost of a checkpoint when it decides whether or not to place a checkpoint. The total overhead of the execution time when the proposed algorithm is used is smaller than the overhead when fixed intervals are used. Although the proposed algorithm uses only on-line knowledge about the cost of checkpointing, its behavior is close to the off-line optimal algorithm that uses a complete knowledge of checkpointing cost.
https://authors.library.caltech.edu/records/074wp-fx331

Efficient algorithms for all-to-all communications in multiport message-passing systems
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetpds97
Authors: Bruck, Jehoshua; Ho, Ching-Tien; Kipnis, Shlomo; Upfal, Eli; Weathersby, Derrick
Year: 1997
DOI: 10.1109/71.642949
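As a reference point for the index operation, here is a naive round-based simulation (hypothetical names; one block exchanged per round via XOR pairing, not the start-up- or bandwidth-optimal algorithms of the paper):

```python
def index_exchange(blocks):
    """Naive pairwise schedule for the index operation: in round r
    (1 <= r < n) processor i exchanges exactly one block with processor
    i ^ r.  After n - 1 rounds processor i holds block i of every
    processor, i.e. the global block matrix is transposed."""
    n = len(blocks)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes n is a power of two"
    out = [[None] * n for _ in range(n)]
    for i in range(n):
        out[i][i] = blocks[i][i]  # the local block never moves
    for r in range(1, n):
        for i in range(n):
            j = i ^ r                 # i's partner in round r
            out[i][j] = blocks[j][i]  # i receives its block from j
    return out
```

This schedule takes n - 1 rounds; the trade-off the paper studies is between such data-transfer-optimal schedules and schedules with fewer (start-up-dominated) rounds.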
We present efficient algorithms for two all-to-all communication operations in message-passing systems: index (or all-to-all personalized communication) and concatenation (or all-to-all broadcast). We assume a model of a fully connected message-passing system, in which the performance of any point-to-point communication is independent of the sender-receiver pair. We also assume that each processor has k ≥ 1 ports, through which it can send and receive k messages in every communication round. The complexity measures we use are independent of the particular system topology and are based on the communication start-up time, and on the communication bandwidth.
In the index operation among n processors, initially, each processor has n blocks of data, and the goal is to exchange the ith block of processor j with the jth block of processor i. We present a class of index algorithms that is designed for all values of n and that features a trade-off between the communication start-up time and the data transfer time. This class of algorithms includes two special cases: an algorithm that is optimal with respect to the measure of the start-up time, and an algorithm that is optimal with respect to the measure of the data transfer time. We also present experimental results featuring the performance tuneability of our index algorithms on the IBM SP-1 parallel system.
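For illustration, the start-up-optimal end of this trade-off (the log-round scheme widely known as Bruck's algorithm) can be simulated in a few lines. The sketch below is an illustration under simplifying assumptions, not the paper's pseudocode: blocks are represented as (source, destination) pairs rather than real message buffers, and communication rounds are simulated sequentially.

```python
def bruck_index(data):
    """All-to-all personalized exchange (index operation), simulated.

    data[i][j] is the block processor i holds for processor j.
    Returns result with result[i][j] == data[j][i], using only
    ceil(log2 n) communication rounds -- the start-up-optimal end of
    the trade-off discussed in the abstract.
    """
    n = len(data)
    # Phase 1: local rotation -- position j of processor i holds the
    # block destined to processor (i + j) mod n.
    tmp = [[data[i][(i + j) % n] for j in range(n)] for i in range(n)]
    # Phase 2: in round k (k = 1, 2, 4, ...), processor i sends the
    # blocks at positions whose bit k is set to processor (i + k) mod n.
    k = 1
    while k < n:
        # Snapshot all outgoing blocks first, so sends are simultaneous.
        sends = [[tmp[i][j] for j in range(n) if j & k] for i in range(n)]
        for i in range(n):
            incoming = iter(sends[(i - k) % n])
            for j in range(n):
                if j & k:
                    tmp[i][j] = next(incoming)
        k *= 2
    # Phase 3: local inverse rotation -- reorder blocks by source rank.
    return [[tmp[i][(i - j) % n] for j in range(n)] for i in range(n)]
```

With n = 5 processors this takes 3 communication rounds instead of the n − 1 rounds of a direct pairwise exchange, at the cost of each block being forwarded up to log n times, which is exactly the start-up-time versus data-transfer trade-off described above.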
In the concatenation operation, among n processors, initially, each processor has one block of data, and the goal is to concatenate the n blocks of data from the n processors, and to make the concatenation result known to all the processors. We present a concatenation algorithm that is optimal, for most values of n, in the number of communication rounds and in the amount of data transferred.https://authors.library.caltech.edu/records/ypzfe-0bb45Performance optimization of checkpointing schemes with task duplication
https://resolver.caltech.edu/CaltechAUTHORS:ZIVieeetc97a
Authors: Ziv, Avi; Bruck, Jehoshua
Year: 1997
DOI: 10.1109/12.641939
In checkpointing schemes with task duplication, checkpointing serves two purposes: detecting faults by comparing the processors' states at checkpoints, and reducing fault recovery time by supplying a safe point to rollback to. In this paper, we show that, by tuning the checkpointing schemes to a given architecture, a significant reduction in the execution time can be achieved. The main idea is to use two types of checkpoints: compare-checkpoints (comparing the states of the redundant processes to detect faults) and store-checkpoints (storing the states to reduce recovery time). With two types of checkpoints, we can use both the comparison and storage operations in an efficient way and improve the performance of checkpointing schemes. Results we obtained show that, in some cases, using compare and store checkpoints can reduce the overhead of DMR checkpointing schemes by as much as 30 percent.https://authors.library.caltech.edu/records/hckdg-rf028A coding approach for detection of tampering in write-once optical disks
https://resolver.caltech.edu/CaltechAUTHORS:BLAieeetc98
Authors: Blaum, Mario; Bruck, Jehoshua; Rubin, Kurt; Lenth, Wilfried
Year: 1998
DOI: 10.1109/12.656095
We present coding methods for protecting write-once optical disks against tampering, turning them into a secure digital medium for applications where critical information must be stored in a way that prevents, or at least allows detection of, an attempt at falsification. Our method involves adding a small amount of redundancy to a modulated sector of data. This extra redundancy is not used for normal operation, but can be used to determine, say, as testimony in court, that a disk has not been tampered with.https://authors.library.caltech.edu/records/g0495-at522Analysis of checkpointing schemes with task duplication
https://resolver.caltech.edu/CaltechAUTHORS:ZIVieeetc98
Authors: Ziv, Avi; Bruck, Jehoshua
Year: 1998
DOI: 10.1109/12.663769
This paper suggests a technique for analyzing the performance of checkpointing schemes with task duplication. We show how this technique can be used to derive the average execution time of a task and other important parameters related to the performance of checkpointing schemes. The analysis results are used to study and compare the performance of four existing checkpointing schemes. Our comparison shows that, in general, the number of processors used, rather than the complexity of the scheme, has the greatest effect on a scheme's performance.https://authors.library.caltech.edu/records/zd31m-rh050Interleaving schemes for multidimensional cluster errors
https://resolver.caltech.edu/CaltechAUTHORS:BLAieeetit98
Authors: Blaum, Mario; Bruck, Jehoshua; Vardy, Alexander
Year: 1998
DOI: 10.1109/18.661516
We present two-dimensional and three-dimensional interleaving techniques for correcting two- and three-dimensional bursts (or clusters) of errors, where a cluster of errors is characterized by its area or volume. Correction of multidimensional error clusters is required in holographic storage, an emerging application of considerable importance. Our main contribution is the construction of efficient two-dimensional and three-dimensional interleaving schemes. The proposed schemes are based on t-interleaved arrays of integers, defined by the property that every connected component of area or volume t consists of distinct integers. In the two-dimensional case, our constructions are optimal: they have the lowest possible interleaving degree. That is, the resulting t-interleaved arrays contain the smallest possible number of distinct integers, hence minimizing the number of codewords required in an interleaving scheme. In general, we observe that the interleaving problem can be interpreted as a graph-coloring problem, and introduce the useful special class of lattice interleavers. We employ a result of Minkowski, dating back to 1904, to establish both upper and lower bounds on the interleaving degree of lattice interleavers in three dimensions. For the case t ≡ 0 (mod 6), the upper and lower bounds coincide, and the Minkowski lattice directly yields an optimal lattice interleaver. For t ≢ 0 (mod 6), we construct efficient lattice interleavers using approximations of the Minkowski lattice.https://authors.library.caltech.edu/records/t4s49-2nn79Deterministic voting in distributed systems using error-correcting codes
https://resolver.caltech.edu/CaltechAUTHORS:XULieeetpds98
Authors: Xu, Lihao; Bruck, Jehoshua
Year: 1998
DOI: 10.1109/71.706052
Distributed voting is an important problem in reliable computing. In an N Modular Redundant (NMR) system, the N computational modules execute identical tasks and they need to periodically vote on their current states. In this paper, we propose a deterministic majority voting algorithm for NMR systems. Our voting algorithm uses error-correcting codes to drastically reduce the average case communication complexity. In particular, we show that the efficiency of our voting algorithm can be improved by choosing the parameters of the error-correcting code to match the probability of the computational faults. For example, consider an NMR system with 31 modules, each with a state of m bits, where each module has an independent computational error probability of 10^-3. In this NMR system, our algorithm can reduce the average case communication complexity to approximately 1.0825m, compared with the communication complexity of 31m for the naive algorithm in which every module broadcasts its local result to all other modules. We have also implemented the voting algorithm over a network of workstations. The experimental performance results match the theoretical predictions well.https://authors.library.caltech.edu/records/5kcm3-eqy92Programmable neural logic
https://resolver.caltech.edu/CaltechAUTHORS:BOHieeetcpmtb98
Authors: Bohossian, Vasken; Hasler, Paul; Bruck, Jehoshua
Year: 1998
DOI: 10.1109/96.730415
Circuits of threshold elements (Boolean input, Boolean output neurons) have been shown to be surprisingly powerful. Useful functions such as XOR, ADD and MULTIPLY can be implemented by such circuits more efficiently than by traditional AND/OR circuits. In view of that, we have designed and built a programmable threshold element. The weights are stored on polysilicon floating gates, providing long-term retention without refresh. The weight value is increased using tunneling and decreased via hot electron injection. A weight is stored on a single transistor allowing the development of dense arrays of threshold elements. A 16-input programmable neuron was fabricated in the standard 2 μm double-poly, analog process available from MOSIS.
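As a toy illustration of why functions such as XOR are cheap for threshold logic, the following is a standard depth-2 construction from two threshold elements (an assumption-free textbook example, not the fabricated floating-gate circuit itself):

```python
def thr(weights, bias, bits):
    """A linear threshold element: outputs 1 iff w . x >= bias
    (Boolean inputs and output, as in the abstract)."""
    return int(sum(w * b for w, b in zip(weights, bits)) >= bias)

def xor2(a, b):
    """Two-input XOR from a depth-2 threshold circuit.

    First gate computes AND = [a + b >= 2]; second gate computes
    XOR = [a + b - 2*AND >= 1].  A traditional AND/OR realization of
    XOR needs more gates, which is the efficiency claim above.
    """
    and_ab = thr([1, 1], 2, [a, b])
    return thr([1, 1, -2], 1, [a, b, and_ab])
```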
We also designed and fabricated the multiple threshold element introduced in [5]. It has the advantage of reducing the layout area from O(n^2) to O(n) (n being the number of variables) for a broad class of Boolean functions, in particular symmetric Boolean functions such as PARITY.
A long term goal of this research is to incorporate programmable single/multiple threshold elements, as building blocks in field programmable gate arrays.https://authors.library.caltech.edu/records/21z5w-1t664Partial-sum queries in OLAP data cubes using covering codes
https://resolver.caltech.edu/CaltechAUTHORS:HOCieeetc98
Authors: Ho, Ching-Tien; Bruck, Jehoshua; Agrawal, Rakesh
Year: 1998
DOI: 10.1109/12.737680
A partial-sum query obtains the summation over a set of specified cells of a data cube. We establish a connection between the covering problem in the theory of error-correcting codes and the partial-sum problem and use this connection to devise algorithms for the partial-sum problem with efficient space-time trade-offs. For example, using our algorithms, with 44 percent additional storage, the query response time can be improved by about 12 percent; by roughly doubling the storage requirement, the query response time can be improved by about 34 percent.https://authors.library.caltech.edu/records/z54j1-r2r11X-code: MDS array codes with optimal encoding
https://resolver.caltech.edu/CaltechAUTHORS:XULieeetit99b
Authors: Xu, Lihao; Bruck, Jehoshua
Year: 1999
DOI: 10.1109/18.746809
We present a new class of MDS (maximum distance separable) array codes of size n×n (n a prime number) called X-code. The X-codes are of minimum column distance 3, namely, they can correct either one column error or two column erasures. The key novelty in X-code is that it has a simple geometrical construction which achieves encoding/update optimal complexity, i.e., a change of any single information bit affects exactly two parity bits. The key idea in our constructions is that all parity symbols are placed in rows rather than columns.https://authors.library.caltech.edu/records/2s8tf-z1m79Efficient digital-to-analog encoding
https://resolver.caltech.edu/CaltechAUTHORS:GIBieeetit99
Authors: Gibson, Michael A.; Bruck, Jehoshua
Year: 1999
DOI: 10.1109/18.771156
An important issue in analog circuit design is the problem of digital-to-analog conversion, i.e., the encoding of Boolean variables into a single analog value which contains enough information to reconstruct the values of the Boolean variables. A natural question is: what is the complexity of implementing the digital-to-analog encoding function? That question was answered by Wegener (see Inform. Processing Lett., vol. 60, no. 1, pp. 49-52, 1995), who proved matching lower and upper bounds on the size of the circuit for the encoding function. In particular, it was proven that ⌈(3n-1)/2⌉ 2-input arithmetic gates are necessary and sufficient for implementing the encoding function of n Boolean variables. However, the proof of the upper bound is not constructive. In this paper, we present an explicit construction of a digital-to-analog encoder that is optimal in the number of 2-input arithmetic gates. In addition, we present an efficient analog-to-digital decoding algorithm. Namely, given the encoded analog value, our decoding algorithm reconstructs the original Boolean values. Our construction is suboptimal in that it uses constants of maximum size n log n bits; the nonconstructive proof uses constants of maximum size 2n+⌈log n⌉ bits.https://authors.library.caltech.edu/records/b6f6r-g8w29Low-density MDS codes and factors of complete graphs
https://resolver.caltech.edu/CaltechAUTHORS:XULieeetit99a
Authors: Xu, Lihao; Bohossian, Vasken; Bruck, Jehoshua; Wagner, David G.
Year: 1999
DOI: 10.1109/18.782102
We present a class of array codes of size n×l, where l=2n or 2n+1, called B-Code. The distances of the B-Code and its dual are 3 and l-1, respectively. The B-Code and its dual are optimal in the sense that i) they are maximum-distance separable (MDS), ii) they have an optimal encoding property, i.e., the number of parity bits affected by a change of a single information bit is minimal, and iii) they have optimal length. Using a new graph description of the codes, we prove an equivalence relation between the construction of the B-Code (or its dual) and a combinatorial problem known as perfect one-factorization of complete graphs, thus obtaining constructions of two families of the B-Code and its dual, one of which is new. Efficient decoding algorithms are also given, both for erasure correcting and for error correcting. The existence of perfect one-factorizations for every complete graph with an even number of nodes is a 35-year-old conjecture in graph theory. The construction of B-Codes of arbitrary odd length would provide an affirmative answer to the conjecture.https://authors.library.caltech.edu/records/6gk8r-c3h23Efficient Exact Stochastic Simulation of Chemical Systems with Many Species and Many Channels
https://resolver.caltech.edu/CaltechAUTHORS:20170719-082029624
Authors: Gibson, Michael A.; Bruck, Jehoshua
Year: 2000
DOI: 10.1021/jp993732q
There are two fundamental ways to view coupled systems of chemical equations: as continuous, represented by differential equations whose variables are concentrations, or as discrete, represented by stochastic processes whose variables are numbers of molecules. Although the former is by far more common, systems with very small numbers of molecules are important in some applications (e.g., in small biological cells or in surface processes). In both views, most complicated systems with multiple reaction channels and multiple chemical species cannot be solved analytically. There are exact numerical simulation methods to simulate trajectories of discrete, stochastic systems (methods that are rigorously equivalent to the Master Equation approach), but these do not scale well to systems with many reaction pathways. This paper presents the Next Reaction Method, an exact algorithm to simulate coupled chemical reactions that is also efficient: it (a) uses only a single random number per simulation event, and (b) takes time proportional to the logarithm of the number of reactions, not to the number of reactions itself. The Next Reaction Method is extended to include time-dependent rate constants and non-Markov processes and is applied to a sample application in biology (the lysis/lysogeny decision circuit of lambda phage). The performance of the Next Reaction Method on this application is compared with one standard method and an optimized version of that standard method.https://authors.library.caltech.edu/records/zkvwc-5wc86MDS array codes for correcting a single criss-cross error
https://resolver.caltech.edu/CaltechAUTHORS:BLAieeetit00b
Authors: Blaum, Mario; Bruck, Jehoshua
Year: 2000
DOI: 10.1109/18.841187
We present a family of maximum-distance separable (MDS) array codes of size (p-1)×(p-1), p a prime number, and minimum criss-cross distance 3, i.e., the code is capable of correcting any row or column in error, without a priori knowledge of what type of error occurred. The complexity of the encoding and decoding algorithms is lower than that of known codes with the same error-correcting power, since our algorithms are based on exclusive-OR operations over lines of different slopes, as opposed to algebraic operations over a finite field. We also provide efficient encoding and decoding algorithms for errors and erasures.https://authors.library.caltech.edu/records/rnkg3-kxn52Scaffold proteins may biphasically affect the levels of mitogen-activated protein kinase signaling and reduce its threshold properties
https://resolver.caltech.edu/CaltechAUTHORS:LEVpnas00
Authors: Levchenko, Andre; Bruck, Jehoshua; Sternberg, Paul W.
Year: 2000
PMCID: PMC18517
In addition to preventing crosstalk among related signaling pathways, scaffold proteins might facilitate signal transduction by preforming multimolecular complexes that can be rapidly activated by incoming signal. In many cases, such as mitogen-activated protein kinase (MAPK) cascades, scaffold proteins are necessary for full activation of a signaling pathway. To date, however, no detailed biochemical model of scaffold action has been suggested. Here we describe a quantitative computer model of MAPK cascade with a generic scaffold protein. Analysis of this model reveals that formation of scaffold-kinase complexes can be used effectively to regulate the specificity, efficiency, and amplitude of signal propagation. In particular, for any generic scaffold there exists a concentration value optimal for signal amplitude. The location of the optimum is determined by the concentrations of the kinases rather than their binding constants and in this way is scaffold independent. This effect and the alteration of threshold properties of the signal propagation at high scaffold concentrations might alter local signaling properties at different subcellular compartments. Different scaffold levels and types might then confer specialized properties to tune evolutionarily conserved signaling modules to specific cellular contexts.https://authors.library.caltech.edu/records/pv9g2-e7t51Tolerating multiple faults in multistage interconnection networks with minimal extra stages
https://resolver.caltech.edu/CaltechAUTHORS:FANieeetc00
Authors: Fan, Chenggong Charles; Bruck, Jehoshua
Year: 2000
DOI: 10.1109/12.869334
Adams and Siegel (1982) proposed an extra stage cube interconnection network that tolerates one switch failure with one extra stage. We extend their results and discover a class of extra stage interconnection networks that tolerate multiple switch failures with a minimal number of extra stages. We adopt the same fault model as Adams and Siegel: faulty switches can be bypassed by a pair of demultiplexer/multiplexer combinations. It is easy to show that, to maintain point-to-point and broadcast connectivities, there must be at least f extra stages to tolerate f switch failures. We present the first known construction of an extra stage interconnection network that meets this lower bound. This n-dimensional multistage interconnection network has n+f stages and tolerates f switch failures. An n-bit label called a mask is used for each stage, indicating the bit differences between the two inputs coming into a common switch. We designed the fault-tolerant construction such that it repeatedly uses the singleton basis of the n-dimensional vector space as the stage mask vectors. This construction is further generalized, and we prove that an n-dimensional multistage interconnection network is optimally fault-tolerant if and only if the mask vectors of every n consecutive stages span the n-dimensional vector space.https://authors.library.caltech.edu/records/c097g-9ph90Coding for tolerance and detection of skew in parallel asynchronous communications
https://resolver.caltech.edu/CaltechAUTHORS:BLAieeetit00
Authors: Blaum, Mario; Bruck, Jehoshua
Year: 2000
DOI: 10.1109/18.887847
We provide a new definition for the concept of skew in parallel asynchronous communications introduced by Blaum and Bruck (1993). The new definition extends and strengthens previously known results on skew. We give necessary and sufficient conditions for codes that can tolerate a certain amount of skew under the new definition. We also extend the results to codes that can tolerate a certain amount of skew and detect a larger amount of skew when the tolerating threshold is exceeded.https://authors.library.caltech.edu/records/ma02m-d0258Computing in the RAIN: a reliable array of independent nodes
https://resolver.caltech.edu/CaltechAUTHORS:BOHieeetpds01
Authors: Bohossian, Vasken; Fan, Chenggong C.; LeMahieu, Paul S.; Riedel, Marc D.; Xu, Lihao; Bruck, Jehoshua
Year: 2001
DOI: 10.1109/71.910866
The RAIN project is a research collaboration between Caltech and NASA-JPL on distributed computing and data-storage systems for future spaceborne missions. The goal of the project is to identify and develop key building blocks for reliable distributed systems built with inexpensive off-the-shelf components. The RAIN platform consists of a heterogeneous cluster of computing and/or storage nodes connected via multiple interfaces to networks configured in fault-tolerant topologies. The RAIN software components run in conjunction with operating system services and standard network protocols. Through software-implemented fault tolerance, the system tolerates multiple node, link, and switch failures, with no single point of failure. The RAIN technology has been transferred to Rainfinity, a start-up company focusing on creating clustered solutions for improving the performance and availability of Internet data centers. In this paper, we describe the following contributions: 1) fault-tolerant interconnect topologies and communication protocols providing consistent error reporting of link failures, 2) fault management techniques based on group membership, and 3) data storage schemes based on computationally efficient error-control codes. We present several proof-of-concept applications: a highly available video server, a highly available Web server, and a distributed checkpointing system. Also, we describe a commercial product, Rainwall, built with the RAIN technology.https://authors.library.caltech.edu/records/y4cys-77k02Introduction to the special section on dependable network computing
https://resolver.caltech.edu/CaltechAUTHORS:AVRieeetpds01
Authors: Avresky, D. R.; Bruck, Jehoshua; Culler, David E.
Year: 2001
DOI: 10.1109/TPDS.2001.910865
Dependable network computing is becoming a key part of our daily economic and social life. Every day, millions of users and businesses are utilizing the Internet infrastructure for real-time electronic commerce transactions, scheduling important events, and building relationships. While network traffic and the number of users are rapidly growing, the mean-time between failures (MTTF) is surprisingly short; according to recent studies, in the majority of Internet backbone paths, the MTTF is 28 days. This leads to a strong requirement for highly dependable networks, servers, and software systems. The challenge is to build interconnected systems, based on available technology, that are inexpensive, accessible, scalable, and dependable. This special section provides insights into a number of these exciting challenges.https://authors.library.caltech.edu/records/cdqjv-fnx20The Raincore API for clusters of networking elements
https://resolver.caltech.edu/CaltechAUTHORS:FANieeeic01
Authors: Fan, Chenggong Charles; Bruck, Jehoshua
Year: 2001
DOI: 10.1109/4236.957897
Clustering technology offers a way to increase overall reliability and performance of Internet information flow by strengthening one link in the chain without adding others. We have implemented this technology in a distributed computing architecture for network elements. The architecture, called Raincore, originated in the Reliable Array of Independent Nodes, or RAIN, research collaboration between the California Institute of Technology and the US National Aeronautics and Space Administration's Jet Propulsion Laboratory. The RAIN project focused on developing high-performance, fault-tolerant, portable clustering technology for spaceborne computing. The technology that emerged from this project became the basis for a spinoff company, Rainfinity, which has the exclusive intellectual property rights to the RAIN technology. The authors describe the Raincore conceptual architecture and distributed services, which are designed to make it easy for developers to port their applications to run on top of a cluster of networking elements. We include two applications: a Web server prototype that was part of the original RAIN research project and a commercial firewall cluster product from Rainfinity.https://authors.library.caltech.edu/records/w0jh4-vgp11A group membership algorithm with a practical specification
https://resolver.caltech.edu/CaltechAUTHORS:FRAieeetpds01
Authors: Franceschetti, Martin; Bruck, Jehoshua
Year: 2001
DOI: 10.1109/71.969128
This paper presents a solvable specification and gives an algorithm for the group membership problem in asynchronous systems with crash failures. Our specification requires processes to maintain a consistent history in their sequences of views. This allows processes to order failures and recoveries in time and simplifies the programming of high-level applications. Previous work has proven that the group membership problem cannot be solved in asynchronous systems with crash failures. We circumvent this impossibility result by building a weaker, yet nontrivial, specification. We show that our solution is an improvement upon previous attempts to solve this problem using a weaker specification. We also relate our solution to other methods and give a classification of progress properties that can be achieved under different models.https://authors.library.caltech.edu/records/3sj39-fx517Splitting schedules for Internet broadcast communication
https://resolver.caltech.edu/CaltechAUTHORS:FOLieeetit02.854
Authors: Foltz, Kevin; Bruck, Jehoshua
Year: 2002
DOI: 10.1109/18.978728
The broadcast disk provides an effective way to transmit information from a server to many clients. Work has been done to schedule the broadcast of information in a way that minimizes the expected waiting time of the clients. Much of this work has treated the information as indivisible blocks. We look at splitting items into smaller pieces that need not be broadcast consecutively. This allows us to have better schedules with lower expected waiting times. We look at the case of two items of the same length, each split into two halves, and show how to achieve optimal performance. We prove the surprising result that there are only two possible types of optimal cyclic schedules for items 1 and 2. These start with 1122 and 122122. For example, with demand probabilities p1 = 0.08 and p2 = 0.92, the best order to use in broadcasting the halves of items 1 and 2 is a cyclic schedule with cycle 122122222. We also look at items of different lengths and show that much of the analysis remains the same, resulting in a similar set of optimal schedules.https://authors.library.caltech.edu/records/gvhjj-4rk59Algebraic techniques for constructing minimal weight threshold functions
https://resolver.caltech.edu/CaltechAUTHORS:BOHsiamjdm03
Authors: Bohossian, Vasken; Bruck, Jehoshua
Year: 2003
DOI: 10.1137/S0895480197326048
A linear threshold element computes a function that is the sign of a weighted sum of the input variables. The best known lower bounds on the size of threshold circuits are for depth-2 circuits with small (polynomial-size) weights. However, in general, the weights are arbitrary integers and can be of exponential size in the number of input variables. Thus, progress on lower bounds for threshold circuits seems to be tied to understanding the role of large weights. In the present literature, a distinction is made between the two extreme cases of linear threshold functions with polynomial-size weights, as opposed to those with exponential-size weights. Our main contributions are two novel methods for constructing threshold functions with minimal weights, and a refinement of the separation between polynomial and exponential weight growth. Namely, we prove that the class of linear threshold functions with polynomial-size weights can be divided into subclasses according to the degree of the polynomial. In fact, we prove a more general result: there exists a minimal weight linear threshold function for any arbitrary number of inputs and any weight size.https://authors.library.caltech.edu/records/namf7-9qd10Covering algorithms, continuum percolation and the geometry of wireless networks
https://resolver.caltech.edu/CaltechAUTHORS:BOOaoap03
Authors: Booth, Lorna; Bruck, Jehoshua; Franceschetti, Massimo; Meester, Ronald
Year: 2003
DOI: 10.1214/aoap/1050689601
Continuum percolation models in which each point of a two-dimensional Poisson point process is the centre of a disc of given (or random) radius r have been extensively studied. In this paper, we consider the generalization in which a deterministic algorithm (given the points of the point process) places the discs on the plane, in such a way that each disc covers at least one point of the point process and that each point is covered by at least one disc. This gives a model for wireless communication networks, which was the original motivation to study this class of problems.
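The classical Boolean disc model that this paper generalizes can be probed numerically. The sketch below is an illustration under simplifying assumptions (a fixed number of centres stands in for a true Poisson sample, and the covering-algorithm layer of the paper is omitted): it merges overlapping discs with union-find and reports the largest connected cluster, which grows sharply with the point density.

```python
import random

def largest_disc_cluster(n_points, box=10.0, radius=0.5, seed=1):
    """Largest connected component in the classical Boolean disc model.

    n_points disc centres are dropped uniformly in a box x box square;
    two discs of the given radius are connected when their centres lie
    within 2 * radius of each other.  Union-find merges overlapping
    discs and the size of the largest cluster is returned.
    """
    rng = random.Random(seed)
    pts = [(rng.uniform(0, box), rng.uniform(0, box)) for _ in range(n_points)]
    parent = list(range(n_points))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    d2 = (2 * radius) ** 2
    for i in range(n_points):
        for j in range(i + 1, n_points):
            dx = pts[i][0] - pts[j][0]
            dy = pts[i][1] - pts[j][1]
            if dx * dx + dy * dy <= d2:
                parent[find(i)] = find(j)  # merge overlapping discs

    sizes = {}
    for i in range(n_points):
        r = find(i)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values())
```

At low density the largest cluster stays small; well above the percolation threshold a single cluster absorbs most of the discs, mirroring the unbounded-component transition discussed below.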
We look at the percolation properties of this generalized model, showing that an unbounded connected component of discs does not exist, almost surely, for small values of the density λ of the Poisson point process, for any covering algorithm. In general, it turns out not to be true that unbounded connected components arise when λ is taken sufficiently high. However, we identify some large families of covering algorithms for which such an unbounded component does arise for large values of λ.
We show how a simple scaling operation can change the percolation properties of the model, leading to the almost sure existence of an unbounded connected component for large values of λ, for any covering algorithm.
Finally, we show that a large class of covering algorithms, which arise in many practical applications, can get arbitrarily close to achieving a minimal density of covering discs. We also construct an algorithm that achieves this minimal density.https://authors.library.caltech.edu/records/nw85n-cq682A Geometric Theorem for Network Design
https://resolver.caltech.edu/CaltechAUTHORS:FRAieeetc04
Authors: Franceschetti, Massimo; Cook, Matthew; Bruck, Jehoshua
Year: 2004
DOI: 10.1109/TC.2004.1268406
Consider an infinite square grid G. How many discs of given radius r, centered at the vertices of G, are required, in the worst case, to completely cover an arbitrary disc of radius r placed on the plane? We show that this number is an integer in the set {3,4,5,6} whose value depends on the ratio of r to the grid spacing. One application of this result is to design facility location algorithms with constant approximation factors. Another application is to determine if a grid network design, where facilities are placed on a regular grid in a way that each potential customer is within a reasonably small radius around the facility, is cost effective in comparison to a nongrid design. This can be relevant to determine a cost effective design for base station placement in a wireless network.https://authors.library.caltech.edu/records/6ys3y-24856A random walk model of wave propagation
https://resolver.caltech.edu/CaltechAUTHORS:FRAieeetap04
Authors: Franceschetti, Massimo; Bruck, Jehoshua; Schulman, Leonard J.
Year: 2004
DOI: 10.1109/TAP.2004.827540
This paper shows that a reasonably accurate description of propagation loss in small urban cells can be obtained with a simple stochastic model based on the theory of random walks, that accounts for only two parameters: the amount of clutter and the amount of absorption in the environment. Despite the simplifications of the model, the derived analytical solution correctly describes the smooth transition of power attenuation from an inverse square law with the distance to the transmitter, to an exponential attenuation as this distance is increased, as observed in practice. Our analysis suggests using a simple exponential path loss formula as an alternative to the empirical formulas that are often used for prediction. Results are validated by comparison with experimental data collected in a small urban cell.https://authors.library.caltech.edu/records/ekacz-8vk35Regulatory modules that generate biphasic signal response in biological systems
https://resolver.caltech.edu/CaltechAUTHORS:20111014-095736081
Authors: Levchenko, A.; Bruck, J.; Sternberg, P. W.
Year: 2004
DOI: 10.1049/sb:20045014
Biochemical networks might be composed of modules. It is still not clear how biochemical modules can be defined and characterised. Here we propose a functional approach to module definition, considering different classes of biphasic regulation modules, which effect optimal cell response to intermediate signal strength. Each regulation class might possess unique properties that make it especially suitable for particular biological functions.https://authors.library.caltech.edu/records/v4mec-z2551Multicluster interleaving on paths and cycles
https://resolver.caltech.edu/CaltechAUTHORS:JIAieeetit05
Authors: Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2005
DOI: 10.1109/TIT.2004.840893
Interleaving codewords is an important method not only for combatting burst errors, but also for distributed data retrieval. This paper introduces the concept of multicluster interleaving (MCI), a generalization of traditional interleaving problems. MCI problems for paths and cycles are studied. The following problem is solved: how to interleave integers on a path or cycle such that any m (m ≥ 2) nonoverlapping clusters of order 2 in the path or cycle have at least three distinct integers. We then present a scheme using a "hierarchical-chain structure" to solve the following more general problem for paths: how to interleave integers on a path such that any m (m ≥ 2) nonoverlapping clusters of order L (L ≥ 2) in the path have at least L+1 distinct integers. It is shown that the scheme solves the second interleaving problem for paths that are asymptotically as long as the longest path on which an MCI exists, and clearly, for shorter paths as well.https://authors.library.caltech.edu/records/7b4k9-c3k29Continuum Percolation with Unreliable and Spread-Out Connections
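The MCI property described above can be verified mechanically. Below is a minimal brute-force checker (an illustrative sketch, not the paper's "hierarchical-chain" construction; the function name is ours): it tests whether every choice of m nonoverlapping clusters of L consecutive vertices on a path carries at least L+1 distinct labels.

```python
from itertools import combinations

def is_mci(labels, m, L):
    """Check multicluster interleaving on a path: every choice of m
    nonoverlapping clusters of L consecutive vertices must together
    carry at least L + 1 distinct labels."""
    n = len(labels)
    clusters = [range(i, i + L) for i in range(n - L + 1)]
    for chosen in combinations(clusters, m):
        # combinations() yields clusters sorted by start index, so checking
        # consecutive pairs is enough to rule out overlap
        if any(b.start - a.start < L for a, b in zip(chosen, chosen[1:])):
            continue
        distinct = {labels[v] for c in chosen for v in c}
        if len(distinct) < L + 1:
            return False
    return True
```

For example, with m = 2 and L = 2, the labeling [1, 2, 3, 1] is a valid MCI (its only nonoverlapping cluster pair carries {1, 2, 3}), while [1, 2, 1, 2] is not.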
https://resolver.caltech.edu/CaltechAUTHORS:20191009-093219813
Authors: Franceschetti, Massimo; Booth, Lorna; Cook, Matthew; Meester, Ronald; Bruck, Jehoshua
Year: 2005
DOI: 10.1007/s10955-004-8826-0
We derive percolation results in the continuum plane that lead to what appears to be a general tendency of many stochastic network models. Namely, when the selection mechanism according to which nodes are connected to each other is sufficiently spread out, a lower density of nodes, or on average fewer connections per node, are sufficient to obtain an unbounded connected component. We look at two different transformations that spread out connections and decrease the critical percolation density while preserving the average node degree. Our results indicate that real networks can exploit the presence of spread-out and unreliable connections to achieve connectivity more easily, provided they can maintain the average number of functioning connections per node.https://authors.library.caltech.edu/records/nwdmc-pv183An automated system for measuring parameters of nematode sinusoidal movement
https://resolver.caltech.edu/CaltechAUTHORS:CRObmcg05
Authors: Cronin, Christopher J.; Mendel, Jane E.; Mukhtar, Saleem; Kim, Young-Mee; Stirbl, Robert C.; Bruck, Jehoshua; Sternberg, Paul W.
Year: 2005
DOI: 10.1186/1471-2156-6-5
PMCID: PMC549551
Background: Nematode sinusoidal movement has been used as a phenotype in many studies of C. elegans development, behavior and physiology. A thorough understanding of the ways in which genes control these aspects of biology depends, in part, on the accuracy of phenotypic analysis. While worms that move poorly are relatively easy to describe, description of hyperactive movement and movement modulation presents more of a challenge. An enhanced capability to analyze all the complexities of nematode movement will thus help our understanding of how genes control behavior.
Results: We have developed a user-friendly system to analyze nematode movement in an automated and quantitative manner. In this system nematodes are automatically recognized and a computer-controlled microscope stage ensures that the nematode is kept within the camera field of view while video images from the camera are stored on videotape. In a second step, the images from the videotapes are processed to recognize the worm and to extract its changing position and posture over time. From this information, a variety of movement parameters are calculated. These parameters include the velocity of the worm's centroid, the velocity of the worm along its track, the extent and frequency of body bending, the amplitude and wavelength of the sinusoidal movement, and the propagation of the contraction wave along the body. The length of the worm is also determined and used to normalize the amplitude and wavelength measurements.
To demonstrate the utility of this system, we report here a comparison of movement parameters for a small set of mutants affecting the Go/Gq mediated signaling network that controls acetylcholine release at the neuromuscular junction. The system allows comparison of distinct genotypes that affect movement similarly (activation of Gq-alpha versus loss of Go-alpha function), as well as of different mutant alleles at a single locus (null and dominant negative alleles of the goa-1 gene, which encodes Go-alpha). We also demonstrate the use of this system for analyzing the effects of toxic agents. Concentration-response curves for the toxicants arsenite and aldicarb, both of which affect motility, were determined for wild-type and several mutant strains, identifying P-glycoprotein mutants as not significantly more sensitive to either compound, while cat-4 mutants are more sensitive to arsenite but not aldicarb.
Conclusions: Automated analysis of nematode movement facilitates a broad spectrum of experiments. Detailed genetic analysis of multiple alleles and of distinct genes in a regulatory network is now possible. These studies will facilitate quantitative modeling of C. elegans movement, as well as a comparison of gene function. Concentration-response curves will allow rigorous analysis of toxic agents as well as of pharmacological agents. This type of system thus represents a powerful analytical tool that can be readily coupled with the molecular genetics of nematodes.https://authors.library.caltech.edu/records/mbs8e-rsm97Network file storage with graceful performance degradation
https://resolver.caltech.edu/CaltechAUTHORS:20161107-163620734
Authors: Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2005
DOI: 10.1145/1063786.1063788
A file storage scheme is proposed for networks containing heterogeneous clients. In the scheme, the performance measured by file-retrieval delays degrades gracefully under increasingly serious faulty circumstances. The scheme combines coding with storage for better performance. The problem is NP-hard for general networks; and this article focuses on tree networks with asymmetric edges between adjacent nodes. A polynomial-time memory-allocation algorithm is presented, which determines how much data to store on each node, with the objective of minimizing the total amount of data stored in the network. Then a polynomial-time data-interleaving algorithm is used to determine which data to store on each node for satisfying the quality-of-service requirements in the scheme. By combining the memory-allocation algorithm with the data-interleaving algorithm, an optimal solution to realize the file storage scheme in tree networks is established.https://authors.library.caltech.edu/records/yq3za-eej922020 Computing: Can computers help to explain biology?
https://resolver.caltech.edu/CaltechAUTHORS:20150319-090453634
Authors: Brent, Roger; Bruck, Jehoshua
Year: 2006
DOI: 10.1038/440416a
The road leading from computer formalisms to explaining biological function will be difficult, but Roger Brent and Jehoshua Bruck suggest three hopeful paths that could take us closer to this goal.https://authors.library.caltech.edu/records/fv21j-z6y85The encoding complexity of network coding
https://resolver.caltech.edu/CaltechAUTHORS:LANieeetit06
Authors: Langberg, Michael; Sprintson, Alexander; Bruck, Jehoshua
Year: 2006
DOI: 10.1109/TIT.2006.874434
In the multicast network coding problem, a source s needs to deliver h packets to a set of k terminals over an underlying communication network G. The nodes of the multicast network can be broadly categorized into two groups. The first group includes encoding nodes, i.e., nodes that generate new packets by combining data received from two or more incoming links. The second group includes forwarding nodes that can only duplicate and forward the incoming packets. Encoding nodes are, in general, more expensive due to the need to equip them with encoding capabilities. In addition, encoding nodes incur delay and increase the overall complexity of the network. Accordingly, in this paper, we study the design of multicast coding networks with a limited number of encoding nodes. We prove that in a directed acyclic coding network, the number of encoding nodes required to achieve the capacity of the network is bounded by h³k². Namely, we present (efficiently constructible) network codes that achieve capacity in which the total number of encoding nodes is independent of the size of the network and is bounded by h³k². We show that the number of encoding nodes may depend both on h and k by presenting acyclic coding networks that require Ω(h²k) encoding nodes. In the general case of coding networks with cycles, we show that the number of encoding nodes is limited by the size of the minimum feedback link set, i.e., the minimum number of links that must be removed from the network in order to eliminate cycles. We prove that the number of encoding nodes is bounded by (2B+1)h³k², where B is the minimum size of a feedback link set. Finally, we observe that determining or even crudely approximating the minimum number of required encoding nodes is an NP-hard problem.https://authors.library.caltech.edu/records/kan0p-0j970Optimal Interleaving on Tori
https://resolver.caltech.edu/CaltechAUTHORS:JIAsiamjdm06
Authors: Jiang, Anxiao (Andrew); Cook, Matthew; Bruck, Jehoshua
Year: 2006
DOI: 10.1137/040618655
This paper studies t-interleaving on two-dimensional tori. Interleaving has applications in distributed data storage and burst error correction, and is closely related to Lee metric codes. A t-interleaving of a graph is defined as a vertex coloring in which any connected subgraph of t or fewer vertices has a distinct color at every vertex. We say that a torus can be perfectly t-interleaved if its t-interleaving number (the minimum number of colors needed for a t-interleaving) meets the sphere-packing lower bound, ⌈t²/2⌉. We show that a torus is perfectly t-interleavable if and only if its dimensions are both multiples of (t²+1)/2 (if t is odd) or t (if t is even). The next natural question is how much bigger the t-interleaving number is for those tori that are not perfectly t-interleavable, and the most important contribution of this paper is to find an optimal interleaving for all sufficiently large tori, proving that when a torus is large enough in both dimensions, its t-interleaving number is at most one more than the sphere-packing lower bound. We also obtain bounds on t-interleaving numbers for the cases where one or both dimensions are not large, thus completing a general characterization of t-interleaving numbers for two-dimensional tori. Each of our upper bounds is accompanied by an efficient t-interleaving scheme that constructively achieves the bound.https://authors.library.caltech.edu/records/m54rs-art16MAP: Medial axis based geometric routing in sensor networks
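The t-interleaving condition can be restated pairwise: two vertices lie in a common connected subgraph of at most t vertices exactly when their toroidal L1 (Lee) distance is at most t−1, so a coloring is a t-interleaving precisely when equal-colored vertices are at Lee distance at least t. A small illustrative checker based on that restatement (our sketch, not a construction from the paper):

```python
def is_t_interleaving(grid, t):
    """grid: 2-D list of colors on an m x n torus. The coloring is a valid
    t-interleaving when every pair of equal-colored vertices lies at
    toroidal L1 (Lee) distance at least t."""
    m, n = len(grid), len(grid[0])
    cells = [(i, j) for i in range(m) for j in range(n)]
    for a in range(len(cells)):
        for b in range(a + 1, len(cells)):
            (i1, j1), (i2, j2) = cells[a], cells[b]
            if grid[i1][j1] != grid[i2][j2]:
                continue
            # wrap-around (toroidal) distance in each coordinate
            di = min(abs(i1 - i2), m - abs(i1 - i2))
            dj = min(abs(j1 - j2), n - abs(j1 - j2))
            if di + dj < t:
                return False
    return True
```

For t = 2 the sphere-packing bound is ⌈t²/2⌉ = 2 colors, and a checkerboard on a torus with both dimensions a multiple of t = 2 attains it, consistent with the even-t case of the theorem above.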
https://resolver.caltech.edu/CaltechAUTHORS:20100505-134021747
Authors: Bruck, Jehoshua; Gao, Jie; Jiang, Anxiao (Andrew)
Year: 2007
DOI: 10.1007/s11276-006-9857-z
One of the challenging tasks in the deployment of dense wireless networks (like sensor networks) is devising a routing scheme for node-to-node communication. Important considerations include scalability, routing complexity, quality of communication paths and load sharing across routes. In this paper, we show that a compact and expressive abstraction of network connectivity by the medial axis enables efficient and localized routing. We propose MAP, a Medial Axis based naming and routing Protocol that does not require geographical locations, makes routing decisions locally, and achieves good load balancing. In its preprocessing phase, MAP constructs the medial axis of the sensor field, defined as the set of nodes with at least two closest boundary nodes. The medial axis of the network captures both the complex geometry and non-trivial topology of the sensor field. It can be represented succinctly by a graph whose size is on the order of the complexity of the geometric features (e.g., the number of holes). Each node is then given a name related to its position with respect to the medial axis. The routing scheme is derived through local decisions based on the names of the source and destination nodes and guarantees delivery with reasonable and natural routes. We show by both theoretical analysis and simulations that our medial axis based geometric routing scheme is scalable, produces short routes, achieves excellent load balancing, and is very robust to variations in the network model.https://authors.library.caltech.edu/records/762qx-x3s09Constrained Codes as Networks of Relations
https://resolver.caltech.edu/CaltechAUTHORS:SCHWieeetit08
Authors: Schwartz, Moshe; Bruck, Jehoshua
Year: 2008
DOI: 10.1109/TIT.2008.920245
We address the well-known problem of determining the capacity of constrained coding systems. While the one-dimensional case is well understood to the extent that there are techniques for rigorously deriving the exact capacity, in contrast, computing the exact capacity of a two-dimensional constrained coding system is still an elusive research challenge. The only known exception in the two-dimensional case is an exact (however, not rigorous) solution to the (1,∞)-run-length limited (RLL) system on the hexagonal lattice. Furthermore, only exponential-time algorithms are known for the related problem of counting the exact number of constrained two-dimensional information arrays.
We present the first known rigorous technique that yields an exact capacity of a two-dimensional constrained coding system. In addition, we devise an efficient (polynomial time) algorithm for counting the exact number of constrained arrays of any given size. Our approach is a composition of a number of ideas and techniques: describing the capacity problem as a solution to a counting problem in networks of relations, graph-theoretic tools originally developed in the field of statistical mechanics, techniques for efficiently simulating quantum circuits, as well as ideas from the theory related to the spectral distribution of Toeplitz matrices. Using our technique, we derive a closed-form solution to the capacity related to the Path-Cover constraint in a two-dimensional triangular array (the resulting calculated capacity is 0.72399217...). Path-Cover is a generalization of the well-known one-dimensional (0,1)-RLL constraint for which the capacity is known to be 0.69424....https://authors.library.caltech.edu/records/g35sa-7k789The Alpha Project: a model system for systems biology research
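The one-dimensional (0,1)-RLL capacity quoted at the end is easy to reproduce: binary strings with no two adjacent 0's are counted by a Fibonacci recurrence, and log2(count)/n converges to log2 of the golden ratio ≈ 0.69424. A quick sketch:

```python
from math import log2

def count_01_rll(n):
    """Number of binary strings of length n with no two adjacent 0's,
    i.e., satisfying the one-dimensional (0,1)-RLL constraint."""
    a, b = 1, 2  # counts for lengths 0 and 1
    for _ in range(n):
        a, b = b, a + b
    return a

n = 200
empirical = log2(count_01_rll(n)) / n   # approaches the capacity as n grows
exact = log2((1 + 5 ** 0.5) / 2)        # log2 of the golden ratio, ~0.69424
```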
https://resolver.caltech.edu/CaltechAUTHORS:20090728-082033135
Authors: Yu, R. C.; Resnekov, O.; Abola, A. P.; Andrews, S. S.; Benjamin, K. R.; Bruck, J.; Burbulis, I. E.; Colman-Lerner, A.; Endy, D.; Gordon, A.; Holl, M.; Lok, L.; Pesce, C. G.; Serra, E.; Smith, R. D.; Thomson, T. M.; Tsong, A. E.; Brent, R.
Year: 2008
DOI: 10.1049/iet-syb:20080127
One goal of systems biology is to understand how genome-encoded parts interact to produce quantitative phenotypes. The Alpha Project is a medium-scale, interdisciplinary systems biology effort that aims to achieve this goal by understanding fundamental quantitative behaviours of a prototypic signal transduction pathway, the yeast pheromone response system from Saccharomyces cerevisiae. The Alpha Project distinguishes itself from many other systems biology projects by studying a tightly bounded and well-characterised system that is easily modified by genetic means, and by focusing on deep understanding of a discrete number of important and accessible quantitative behaviours. During the project, the authors have developed tools to measure the appropriate data and develop models at appropriate levels of detail to study a number of these quantitative behaviours. The authors have also developed transportable experimental tools and conceptual frameworks for understanding other signalling systems. In particular, the authors have begun to interpret system behaviours and their underlying molecular mechanisms through the lens of information transmission, a principal function of signalling systems. The Alpha Project demonstrates that interdisciplinary studies that identify key quantitative behaviours and measure important quantities, in the context of well-articulated abstractions of system function and appropriate analytical frameworks, can lead to deeper biological understanding. The authors' experience may provide a productive template for systems biology investigations of other cellular systems.https://authors.library.caltech.edu/records/scn1j-3n835Optimal Universal Schedules for Discrete Broadcast
https://resolver.caltech.edu/CaltechAUTHORS:LANieeetit08
Authors: Langberg, Michael; Sprintson, Alexander; Bruck, Jehoshua
Year: 2008
DOI: 10.1109/TIT.2008.928296
We study broadcast systems that distribute a series of data updates to a large number of passive clients. The updates are sent over a broadcast channel in the form of discrete packets. We assume that clients periodically access the channel to obtain the most recent update. Such scenarios arise in many practical applications, such as distribution of traffic information and market updates to mobile wireless devices.https://authors.library.caltech.edu/records/pznyh-27r04Graphene-based atomic-scale switches
https://resolver.caltech.edu/CaltechAUTHORS:STAnl08
Authors: Standley, Brian; Bao, Wenzhong; Zhang, Hang; Bruck, Jehoshua; Lau, Chun Ning; Bockrath, Marc
Year: 2008
DOI: 10.1021/nl801774a
Graphene's remarkable mechanical and electrical properties, combined with its compatibility with existing planar silicon-based technology, make it an attractive material for novel computing devices. We report the development of a nonvolatile memory element based on graphene break junctions. Our devices have demonstrated thousands of writing cycles and long retention times. We propose a model for device operation based on the formation and breaking of carbon atomic chains that bridge the junctions. We demonstrate information storage based on the concept of rank coding, in which information is stored in the relative conductance of graphene switches in a memory cell.https://authors.library.caltech.edu/records/7wy4q-67270Computation with finite stochastic chemical reaction networks
https://resolver.caltech.edu/CaltechAUTHORS:20111020-132840264
Authors: Soloveichik, David; Cook, Matthew; Winfree, Erik; Bruck, Jehoshua
Year: 2008
DOI: 10.1007/s11047-008-9067-y
A highly desired part of the synthetic biology toolbox is an embedded chemical microcontroller, capable of autonomously following a logic program specified by a set of instructions, and interacting with its cellular environment. Strategies for incorporating logic in aqueous chemistry have focused primarily on implementing components, such as logic gates, that are composed into larger circuits, with each logic gate in the circuit corresponding to one or more molecular species. With this paradigm, designing and producing new molecular species is necessary to perform larger computations. An alternative approach begins by noticing that chemical systems on the small scale are fundamentally discrete and stochastic. In particular, the exact molecular count of each molecular species present is an intrinsically available form of information. This might appear to be a very weak form of information, perhaps quite difficult for computations to utilize. Indeed, it has been shown that error-free Turing-universal computation is impossible in this setting. Nevertheless, we show a design of a chemical computer that achieves fast and reliable Turing-universal computation using molecular counts. Our scheme uses only a small number of different molecular species to do computation of arbitrary complexity. The total probability of error of the computation can be made arbitrarily small (but not zero) by adjusting the initial molecular counts of certain species. While physical implementations would be difficult, these results demonstrate that molecular counts can be a useful form of information for small molecular systems such as those operating within cellular environments.https://authors.library.caltech.edu/records/1abkk-rra71Network Coding: A Computational Perspective
https://resolver.caltech.edu/CaltechAUTHORS:LANieeetit09
Authors: Langberg, Michael; Sprintson, Alexander; Bruck, Jehoshua
Year: 2009
DOI: 10.1109/TIT.2008.2008135
In this work, we study the computational perspective of network coding, focusing on two issues. First, we address the computational complexity of finding a network code for acyclic multicast networks. Second, we address the issue of reducing the amount of computation performed by network nodes. In particular, we consider the problem of finding a network code with the minimum possible number of encoding nodes, i.e., nodes that generate new packets by performing algebraic operations on packets received over incoming links.https://authors.library.caltech.edu/records/emahk-qb398Localization and routing in sensor networks by local angle information
https://resolver.caltech.edu/CaltechAUTHORS:20090504-113921187
Authors: Bruck, Jehoshua; Gao, Jie; Jiang, Anxiao (Andrew)
Year: 2009
DOI: 10.1145/1464420.1464427
Location information is useful both for network organization and for sensor data integrity. In this article, we study the anchor-free 2D localization problem by using local angle measurements. We prove that given a unit disk graph and the angles between adjacent edges, it is NP-hard to find a valid embedding in the plane such that neighboring nodes are within distance 1 from each other and non-neighboring nodes are at least distance √2/2 away. Despite the negative results, however, we can find a planar spanner of a unit disk graph by using only local angles. The planar spanner can be used to generate a set of virtual coordinates that enable efficient and local routing schemes such as geographical routing or approximate shortest path routing. We also propose a practical anchor-free embedding scheme by solving a linear program. We show by simulation that it gives both a good local embedding, with neighboring nodes embedded close and non-neighboring nodes far away, and a satisfactory global view such that geographical routing and approximate shortest path routing on the embedded graph are almost identical to those on the original (true) embedding.https://authors.library.caltech.edu/records/x2rfx-3jc47Shortening array codes and the perfect 1-factorization conjecture
https://resolver.caltech.edu/CaltechAUTHORS:20090717-115258499
Authors: Bohossian, Vasken; Bruck, Jehoshua
Year: 2009
DOI: 10.1109/TIT.2008.2009850
The existence of a perfect 1-factorization of the complete graph with n nodes, namely, K_n, for an arbitrary even number n, is a 40-year-old open problem in graph theory. So far, two infinite families of perfect 1-factorizations have been shown to exist, namely, the factorizations of K_(p+1) and K_2p, where p is an arbitrary prime number (p > 2). It was shown in previous work that finding a perfect 1-factorization of K_n is related to a problem in coding theory; specifically, it can be reduced to constructing an MDS (maximum distance separable), lowest density array code. In this paper, a new method for shortening arbitrary array codes is introduced. It is then used to derive the K_(p+1) family of perfect 1-factorizations from the K_2p family. Namely, techniques from coding theory are used to prove a new result in graph theory: that the two factorization families are related.https://authors.library.caltech.edu/records/d12hj-hqr49Cyclic lowest density MDS array codes
https://resolver.caltech.edu/CaltechAUTHORS:20090514-113712174
Authors: Cassuto, Yuval; Bruck, Jehoshua
Year: 2009
DOI: 10.1109/TIT.2009.2013024
Three new families of lowest density maximum-distance separable (MDS) array codes are constructed, which are cyclic or quasi-cyclic. In addition to their optimal redundancy (MDS) and optimal update complexity (lowest density), the symmetry offered by the new codes can be utilized for simplified implementation in storage applications. The proof of the code properties has an indirect structure: first MDS codes that are not cyclic are constructed, and then transformed to cyclic codes by a minimum-distance preserving transformation.https://authors.library.caltech.edu/records/dnkfk-thb37Rank Modulation for Flash Memories
https://resolver.caltech.edu/CaltechAUTHORS:20090820-152000947
Authors: Jiang, Anxiao (Andrew); Mateescu, Robert; Schwartz, Moshe; Bruck, Jehoshua
Year: 2009
DOI: 10.1109/TIT.2009.2018336
We explore a novel data representation scheme for multilevel flash memory cells, in which a set of n cells stores information in the permutation induced by the different charge levels of the individual cells. The only allowed charge-placement mechanism is a "push-to-the-top" operation, which takes a single cell of the set and makes it the top-charged cell. The resulting scheme eliminates the need for discrete cell levels, as well as overshoot errors, when programming cells. We present unrestricted Gray codes spanning all possible n-cell states and using only "push-to-the-top" operations, and also construct balanced Gray codes. One important application of the Gray codes is the realization of logic multilevel cells, which is useful in conventional storage solutions. We also investigate rewriting schemes for random data modification. We present both an optimal scheme for the worst-case rewrite performance and an approximation scheme for the average-case rewrite performance.https://authors.library.caltech.edu/records/9zr6t-5rz21Interleaving schemes on circulant graphs with two offsets
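The "push-to-the-top" operation described above has a one-line model: recharging a cell moves it to the front of the permutation that lists cells from highest to lowest charge. A minimal sketch (illustrative; the function name is ours):

```python
def push_to_top(ranking, cell):
    """Inject charge into one cell so it becomes the highest-charged cell.
    The state is the permutation of cells ordered from highest to lowest
    charge, so the pushed cell moves to the front."""
    return [cell] + [c for c in ranking if c != cell]

state = [2, 1, 0]              # cell 2 currently holds the top charge
state = push_to_top(state, 0)  # -> [0, 2, 1]
state = push_to_top(state, 1)  # -> [1, 0, 2]
```

Because the operation only raises the pushed cell above the others, it never requires setting an absolute charge level, which is why the scheme avoids overshoot errors.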
https://resolver.caltech.edu/CaltechAUTHORS:20090925-102048176
Authors: Slivkins, Aleksandrs; Bruck, Jehoshua
Year: 2009
DOI: 10.1016/j.disc.2009.01.020
Interleaving is used for error correction on a bursty noisy channel. Given a graph G describing the topology of the channel, we label the vertices of G so that each label-set is sufficiently sparse. The interleaving scheme corrects for any error burst of size at most t; it is a labeling where the distance between any two vertices in the same label-set is at least t.
We consider interleaving schemes on infinite circulant graphs with two offsets 1 and d. In such a graph the vertices are integers; edge ij exists if and only if |i−j|∈{1,d}. Our goal is to minimize the number of labels used.
Our constructions are covers of the graph by the minimal number of translates of some label-set S. We focus on minimizing the index of S, which is the inverse of its density rounded up. We establish lower bounds and prove that our constructions are optimal or almost optimal, both for the index of S and for the number of labels.https://authors.library.caltech.edu/records/xejj9-qf205On the Capacity of the Precision-Resolution System
https://resolver.caltech.edu/CaltechAUTHORS:20100407-095238680
Authors: Schwartz, Moshe; Bruck, Jehoshua
Year: 2010
DOI: 10.1109/TIT.2009.2039089
Arguably, the most prominent constrained system in storage applications is the (d,k)-run-length limited (RLL) system, where every binary sequence obeys the constraint that every two adjacent 1's are separated by at least d consecutive 0's and at most k consecutive 0's, namely, runs of 0's are length limited. The motivation for the RLL constraint arises mainly from the physical limitations of the read and write technologies in magnetic and optical storage systems. We revisit the rationale for the RLL system, reevaluate its relationship to the constraints of the physical media and propose a new framework that we call the Precision-Resolution (PR) system. Specifically, in the PR system there is a separation between the encoder constraints (which relate to the precision of writing information into the physical media) and the decoder constraints (which relate to its resolution, namely, the ability to distinguish between two different signals received by reading the physical media). We compute the capacity of a general PR system and compare it to the traditional RLL system.https://authors.library.caltech.edu/records/92wrr-4jm54Codes for Asymmetric Limited-Magnitude Errors With Application to Multilevel Flash Memories
https://resolver.caltech.edu/CaltechAUTHORS:20100415-101346040
Authors: Cassuto, Yuval; Schwartz, Moshe; Bohossian, Vasken; Bruck, Jehoshua
Year: 2010
DOI: 10.1109/TIT.2010.2040971
Several physical effects that limit the reliability and performance of multilevel flash memories induce errors that have low magnitudes and are dominantly asymmetric. This paper studies block codes for asymmetric limited-magnitude errors over q-ary channels. We propose code constructions and bounds for such channels when the number of errors is bounded by t and the error magnitudes are bounded by ℓ. The constructions utilize known codes for symmetric errors, over small alphabets, to protect large-alphabet symbols from asymmetric limited-magnitude errors. The encoding and decoding of these codes are performed over the small alphabet whose size depends only on the maximum error magnitude and is independent of the alphabet size of the outer code. Moreover, the size of the codes is shown to exceed the sizes of known codes (for related error models), and asymptotic rate-optimality results are proved. Extensions of the construction are proposed to accommodate variations on the error model and to include systematic codes as a benefit to practical implementation.https://authors.library.caltech.edu/records/rj4a5-1yq88Correcting Charge-Constrained Errors in the Rank-Modulation Scheme
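The small-alphabet idea in the abstract can be illustrated concretely: since every error magnitude lies in 0..ℓ, the error vector is fully visible modulo ℓ+1, so a symmetric-error code over Z_{ℓ+1} applied to the residues recovers the error exactly. A toy sketch, with a majority-decoded repetition code standing in for the small-alphabet code (function names and the toy code are ours, not the paper's):

```python
from collections import Counter

def decode_alm(y, ell, decode_small):
    """Correct asymmetric errors with 0 <= e_i <= ell: errors are fully
    visible modulo ell+1, so decode the residues with a symmetric-error
    code over Z_{ell+1}, recover the error vector e, and subtract it."""
    m = ell + 1
    psi = [v % m for v in y]                       # received residues
    chi = decode_small(psi)                        # decoded residue codeword
    e = [(p - c) % m for p, c in zip(psi, chi)]    # the exact error vector
    return [v - err for v, err in zip(y, e)]

def majority(psi):
    """Toy small-alphabet code: all-equal residue vectors, majority decoded."""
    s = Counter(psi).most_common(1)[0][0]
    return [s] * len(psi)

c = [4, 7, 10, 1, 13]   # residues mod 3 are all 1: a toy codeword for ell = 2
y = [6, 7, 11, 1, 13]   # two asymmetric errors of magnitude at most 2
assert decode_alm(y, 2, majority) == c
```

Note how the decoding work happens entirely over Z_3 regardless of how large the cell alphabet is, which is the point of the construction.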
https://resolver.caltech.edu/CaltechAUTHORS:20100617-092653362
Authors: Jiang, Anxiao (Andrew); Bruck, Jehoshua; Schwartz, Moshe
Year: 2010
DOI: 10.1109/TIT.2010.2043764
We investigate error-correcting codes for the rank-modulation scheme with an application to flash memory devices. In this scheme, a set of n cells stores information in the permutation induced by the different charge levels of the individual cells. The resulting scheme eliminates the need for discrete cell levels, overcomes overshoot errors when programming cells (a serious problem that reduces the writing speed), and mitigates the problem of asymmetric errors. In this paper, we study the properties of error-correcting codes for charge-constrained errors in the rank-modulation scheme. In this error model the number of errors corresponds to the minimal number of adjacent transpositions required to change a given stored permutation to another erroneous one, a distance measure known as Kendall's τ-distance. We show bounds on the size of such codes, and use metric-embedding techniques to give constructions that translate a wealth of knowledge of codes in the Lee metric to codes over permutations in Kendall's τ-metric. Specifically, the one-error-correcting codes we construct are at least half the size of the ball-packing upper bound.https://authors.library.caltech.edu/records/dqy7b-w7b59Storage Coding for Wear Leveling in Flash Memories
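Kendall's τ-distance, the minimum number of adjacent transpositions between two permutations, equals the number of pairwise inversions after expressing one permutation in the other's coordinates. A short illustrative sketch (the function name is ours):

```python
def kendall_tau_distance(p, q):
    """Minimum number of adjacent transpositions turning permutation p
    into q: count the pairs ordered differently in the two permutations."""
    pos = {v: i for i, v in enumerate(q)}
    r = [pos[v] for v in p]              # p expressed in q's coordinates
    return sum(1 for i in range(len(r))
                 for j in range(i + 1, len(r))
                 if r[i] > r[j])         # inversion count of r

assert kendall_tau_distance([0, 1, 2], [1, 0, 2]) == 1  # one adjacent swap
assert kendall_tau_distance([0, 1, 2], [2, 1, 0]) == 3  # full reversal
```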
https://resolver.caltech.edu/CaltechAUTHORS:20170309-140500073
Authors: Jiang, Anxiao (Andrew); Mateescu, Robert; Yaakobi, Eitan; Bruck, Jehoshua; Siegel, Paul H.; Vardy, Alexander; Wolf, Jack K.
Year: 2010
DOI: 10.1109/TIT.2010.2059833
Flash memory is a nonvolatile computer memory comprised of blocks of cells, wherein each cell is implemented as either a NAND or a NOR floating gate. NAND flash is currently the most widely used type of flash memory. In a NAND flash memory, every block of cells consists of numerous pages; rewriting even a single page requires the whole block to be erased and reprogrammed. Block erasures determine both the longevity and the efficiency of a flash memory. Therefore, when data in a NAND flash memory are reorganized, minimizing the total number of block erasures required to achieve the desired data movement is an important goal. This leads to the flash data movement problem studied in this paper. We show that coding can significantly reduce the number of block erasures required for data movement, and present several optimal or nearly optimal data-movement algorithms based upon ideas from coding theory and combinatorics. In particular, we show that sorting-based (noncoding) schemes require O(n log n) erasures to move data among n blocks, whereas coding-based schemes require only O(n) erasures. Furthermore, coding-based schemes use only one auxiliary block, which is the best possible, and achieve a good balance between the number of erasures in each of the n+1 blocks.https://authors.library.caltech.edu/records/cfs8k-bx907Rewriting Codes for Joint Information Storage in Flash Memories
https://resolver.caltech.edu/CaltechAUTHORS:20101108-153955478
Authors: Jiang, Anxiao; Bruck, Jehoshua
Year: 2010
DOI: 10.1109/TIT.2010.2059530
Memories whose storage cells transit irreversibly between states have been common since the start of data storage technology. In recent years, flash memories have become a very important family of such memories. A flash memory cell has q states—0, 1, ..., q-1—and can only transit from a lower state to a higher state before the expensive erasure operation takes place. We study rewriting codes that enable the data stored in a group of cells to be rewritten by only shifting the cells to higher states. Since the considered state transitions are irreversible, the number of rewrites is bounded. Our objective is to maximize the number of times the data can be rewritten. We focus on the joint storage of data in flash memories, and study two rewriting codes for two different scenarios. The first code, called a floating code, is for the joint storage of multiple variables, where every rewrite changes one variable. The second code, called a buffer code, is for remembering the most recent data in a data stream. Many of the codes presented here are either optimal or asymptotically optimal. We also present bounds on the performance of general codes. The results show that rewriting codes can integrate a flash memory's rewriting capabilities for different variables to a high degree.https://authors.library.caltech.edu/records/zy6p3-5qb43Neural network computation with DNA strand displacement cascades
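A classic example of a rewriting code in this spirit is the Rivest–Shamir write-once-memory code (related to, but not one of, the paper's constructions), which writes 2 bits twice into 3 binary write-once cells:

```python
# Rivest-Shamir (1982) WOM code: 2 data bits, written twice, in 3 cells that
# can only go 0 -> 1. First-generation codewords have weight <= 1; the second
# generation uses their complements.
FIRST = {(0, 0): (0, 0, 0), (0, 1): (1, 0, 0),
         (1, 0): (0, 1, 0), (1, 1): (0, 0, 1)}

def decode(cells):
    # Weight <= 1: read as first-generation; otherwise complement and read.
    if sum(cells) <= 1:
        return {v: k for k, v in FIRST.items()}[cells]
    return {v: k for k, v in FIRST.items()}[tuple(1 - c for c in cells)]

def rewrite(cells, data):
    """Second write: raise cells (never lower) so the state decodes to data."""
    if decode(cells) == data:
        return cells
    target = tuple(1 - c for c in FIRST[data])  # second-generation codeword
    assert all(t >= c for t, c in zip(target, cells))  # only 0 -> 1 allowed
    return target

state = FIRST[(0, 1)]           # first write stores 01 as 100
state = rewrite(state, (1, 0))  # second write stores 10 as 101
assert decode(state) == (1, 0)
```

Two writes of 2 bits each into 3 cells beat the naive one write of 3 bits, the same kind of gain the floating and buffer codes above pursue for joint storage.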
https://resolver.caltech.edu/CaltechAUTHORS:20110801-112437228
Authors: Qian, Lulu; Winfree, Erik; Bruck, Jehoshua
Year: 2011
DOI: 10.1038/nature10262
The impressive capabilities of the mammalian brain—ranging from perception, pattern recognition and memory formation to decision making and motor activity control—have inspired their re-creation in a wide range of artificial intelligence systems for applications such as face recognition, anomaly detection, medical diagnosis and robotic vehicle control. Yet before neuron-based brains evolved, complex biomolecular circuits provided individual cells with the 'intelligent' behaviour required for survival. However, the study of how molecules can 'think' has not produced an equal variety of computational models and applications of artificial chemical systems. Although biomolecular systems have been hypothesized to carry out neural-network-like computations in vivo and the synthesis of artificial chemical analogues has been proposed theoretically, experimental work has so far fallen short of fully implementing even a single neuron. Here, building on the richness of DNA computing and strand displacement circuitry, we show how molecular systems can exhibit autonomous brain-like behaviours. Using a simple DNA gate architecture that allows experimental scale-up of multilayer digital circuits, we systematically transform arbitrary linear threshold circuits (an artificial neural network model) into DNA strand displacement cascades that function as small neural networks. Our approach even allows us to implement a Hopfield associative memory with four fully connected artificial neurons that, after training in silico, remembers four single-stranded DNA patterns and recalls the most similar one when presented with an incomplete pattern. Our results suggest that DNA strand displacement cascades could be used to endow autonomous chemical systems with the capability of recognizing patterns of molecular events, making decisions and responding to the environment.https://authors.library.caltech.edu/records/fryqk-9d474Transforming Probabilities With Combinational Logic
https://resolver.caltech.edu/CaltechAUTHORS:20110922-104607230
Authors: Qian, Weikang; Riedel, Marc D.; Zhou, Hongchao; Bruck, Jehoshua
Year: 2011
DOI: 10.1109/TCAD.2011.2144630
Schemes for probabilistic computation can exploit physical sources to generate random values in the form of bit streams. Generally, each source has a fixed bias and so provides bits with a specific probability of being one. If many different probability values are required, it can be expensive to generate all of these directly from physical sources. This paper demonstrates novel techniques for synthesizing combinational logic that transforms source probabilities into different target probabilities. We consider three scenarios in terms of whether the source probabilities are specified and whether they can be duplicated. In the case that the source probabilities are not specified and can be duplicated, we provide a specific choice, the set {0.4, 0.5}; we show how to synthesize logic that transforms probabilities from this set into arbitrary decimal probabilities. Further, we show that for any integer n ≥ 2, there exists a single probability that can be transformed into arbitrary base-n fractional probabilities. In the case that the source probabilities are specified and cannot be duplicated, we provide two methods for synthesizing logic to transform them into target probabilities. In the case that the source probabilities are not specified, but once chosen cannot be duplicated, we provide an optimal choice.https://authors.library.caltech.edu/records/x1f81-z2p41Constant-Weight Gray Codes for Local Rank Modulation
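The elementary probability transformations behind such syntheses are easy to check with exact arithmetic; a minimal sketch (assuming independent source bits; the 0.2 example echoes the {0.4, 0.5} choice, but the circuit here is illustrative, not a construction from the paper):

```python
from fractions import Fraction

# Output probability of basic gates fed by independent random bits.
def p_not(p):     return 1 - p
def p_and(p, q):  return p * q
def p_or(p, q):   return p + q - p * q

p4, p5 = Fraction(2, 5), Fraction(1, 2)   # source probabilities 0.4 and 0.5

# AND of independent 0.4 and 0.5 sources yields probability 0.2 ...
assert p_and(p4, p5) == Fraction(1, 5)
# ... and inverting a 0.4 source yields 0.6.
assert p_not(p4) == Fraction(3, 5)
# Composition: OR(AND(0.4, 0.5), 0.5) = 0.2 + 0.5 - 0.1 = 0.6
assert p_or(p_and(p4, p5), p5) == Fraction(3, 5)
```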
https://resolver.caltech.edu/CaltechAUTHORS:20120420-092834437
Authors: En Gad, Eyal; Langberg, Michael; Schwartz, Moshe; Bruck, Jehoshua
Year: 2011
DOI: 10.1109/TIT.2011.2162570
We consider the local rank-modulation (LRM) scheme in which a sliding window going over a sequence of real-valued variables induces a sequence of permutations. LRM is a generalization of the rank-modulation scheme, which has been recently suggested as a way of storing information in flash memory. We study constant-weight Gray codes for the LRM scheme in order to simulate conventional multilevel flash cells while retaining the benefits of rank modulation. We present a practical construction of codes with asymptotically-optimal rate and weight asymptotically half the length, thus having an asymptotically-optimal charge difference between adjacent cells. Next, we turn to examine the existence of optimal codes by specifically studying codes of weight 2 and 3. In the former case, we upper bound the code efficiency, proving that there are no such asymptotically-optimal cyclic codes. In contrast, for the latter case we construct codes which are asymptotically-optimal. We conclude by providing necessary conditions for the existence of cyclic and cyclic optimal Gray codes.https://authors.library.caltech.edu/records/xa8fe-w9a19Low-Complexity Array Codes for Random and Clustered 4-Erasures
https://resolver.caltech.edu/CaltechAUTHORS:20120203-154015264
Authors: Cassuto, Yuval; Bruck, Jehoshua
Year: 2012
DOI: 10.1109/TIT.2011.2171518
A new family of low-complexity array codes is proposed for correcting 4 column erasures. The new codes are tailored for the new error model of clustered column erasures that captures the properties of high-order failure combinations in storage arrays. The model of clustered column erasures considers the number of erased columns, together with the number of clusters into which they fall, without pre-defining the sizes of the clusters. This model addresses the problem of correlated device failures in storage arrays, whereby each failure event may affect multiple devices in a single cluster. The new codes correct essentially all combinations of clustered 4 erasures, i.e., those combinations that fall into three or fewer clusters. The new codes are significantly more efficient, in all relevant complexity measures, than the best known 4-erasure correcting codes. These measures include encoding complexity, decoding complexity and update complexity.https://authors.library.caltech.edu/records/3xrrc-q9d41On the Capacity and Programming of Flash Memories
https://resolver.caltech.edu/CaltechAUTHORS:20120326-082904019
Authors: Jiang, Anxiao (Andrew); Li, Hao; Bruck, Jehoshua
Year: 2012
DOI: 10.1109/TIT.2011.2177755
Flash memories are currently the most widely used type of nonvolatile memories. A flash memory consists of floating-gate cells as its storage elements, where the charge level stored in a cell is used to represent data. Compared to magnetic recording and optical recording, flash memories have the unique property that the cells are programmed using an iterative procedure that monotonically shifts each cell's charge level upward toward its target value. In this paper, we model the cell as a monotonic storage channel, and explore its capacity and optimal programming. We present two optimal programming algorithms based on a few different noise models and optimization objectives.https://authors.library.caltech.edu/records/m53em-ky777Efficient Generation of Random Bits From Finite State Markov Chains
https://resolver.caltech.edu/CaltechAUTHORS:20120503-095551724
Authors: Zhou, Hongchao; Bruck, Jehoshua
Year: 2012
DOI: 10.1109/TIT.2011.2175698
The problem of random number generation from an uncorrelated random source (of unknown probability distribution) dates back to von Neumann's 1951 work. Elias (1972) generalized von Neumann's scheme and showed how to achieve optimal efficiency in unbiased random bit generation. A natural question is then: what if the sources are correlated? Both Elias and Samuelson proposed methods for generating unbiased random bits in the case of correlated sources (of unknown probability distribution); specifically, they considered finite Markov chains. However, their proposed methods are not efficient or have implementation difficulties. Blum (1986) devised an algorithm for efficiently generating random bits from degree-2 finite Markov chains in expected linear time; however, his elegant method is still far from optimal in information efficiency. In this paper, we generalize Blum's algorithm to arbitrary-degree finite Markov chains and combine it with Elias's method for efficient generation of unbiased bits. As a result, we provide the first known algorithm that generates unbiased random bits from an arbitrary finite Markov chain, operates in expected linear time, and achieves the information-theoretic upper bound on efficiency.https://authors.library.caltech.edu/records/hnf87-01v85Cyclic Boolean circuits
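Von Neumann's 1951 scheme, the starting point of this line of work, fits in a few lines; a sketch:

```python
def von_neumann(bits):
    """Map pairs of i.i.d. biased bits to unbiased bits: emit b0 when a pair
    (b0, b1) has b0 != b1 (so 01 -> 0, 10 -> 1) and discard 00 and 11."""
    return [b0 for b0, b1 in zip(bits[::2], bits[1::2]) if b0 != b1]

# Pairs (0,0), (1,0), (0,1), (1,1): the equal pairs are dropped.
assert von_neumann([0, 0, 1, 0, 0, 1, 1, 1]) == [1, 0]
```

Since P(01) = P(10) = p(1-p) for a source with bias p, the surviving bits are exactly unbiased, though most input bits are wasted; Elias's generalization recovers the discarded entropy.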
https://resolver.caltech.edu/CaltechAUTHORS:20121109-103154708
Authors: Riedel, Marc D.; Bruck, Jehoshua
Year: 2012
DOI: 10.1016/j.dam.2012.03.039
A Boolean circuit is a collection of gates and wires that performs a mapping from Boolean inputs to Boolean outputs. The accepted wisdom is that such circuits must have acyclic (i.e., loop-free or feed-forward) topologies. In fact, the model is often defined this way–as a directed acyclic graph (DAG). And yet simple examples suggest that this is incorrect. We advocate that Boolean circuits should have cyclic topologies (i.e., loops or feedback paths). In other work, we demonstrated the practical implications of this view: digital circuits can be designed with fewer gates if they contain cycles. In this paper, we explore the theoretical underpinnings of the idea. We show that the complexity of implementing Boolean functions can be lower with cyclic topologies than with acyclic topologies. With examples, we show that certain Boolean functions can be implemented by cyclic circuits with as little as one-half the number of gates that are required by equivalent acyclic circuits. We also show a quadratic upper bound: given a cyclic Boolean circuit with m gates, there exists an equivalent acyclic Boolean circuit with m^2 gates.https://authors.library.caltech.edu/records/hkdp1-4se80Zigzag Codes: MDS Array Codes With Optimal Rebuilding
https://resolver.caltech.edu/CaltechAUTHORS:20130321-102330661
Authors: Tamo, Itzhak; Wang, Zhiying; Bruck, Jehoshua
Year: 2013
DOI: 10.1109/TIT.2012.2227110
Maximum distance separable (MDS) array codes are widely used in storage systems to protect data against erasures. We address the rebuilding ratio problem, namely, in the case of erasures, what is the fraction of the remaining information that needs to be accessed in order to rebuild exactly the lost information? It is clear that when the number of erasures equals the maximum number of erasures that an MDS code can correct, then the rebuilding ratio is 1 (access all the remaining information). However, the interesting and more practical case is when the number of erasures is smaller than the erasure correcting capability of the code. For example, consider an MDS code that can correct two erasures: What is the smallest amount of information that one needs to access in order to correct a single erasure? Previous work showed that the rebuilding ratio is bounded between 1/2 and 3/4; however, the exact value was left as an open problem. In this paper, we solve this open problem and prove that for the case of a single erasure with a two-erasure correcting code, the rebuilding ratio is 1/2. In general, we construct a new family of r-erasure correcting MDS array codes that has optimal rebuilding ratio of 1/r in the case of a single erasure. Our array codes have efficient encoding and decoding algorithms (for the cases r=2 and r=3, they use a finite field of size 3 and 4, respectively) and an optimal update property.https://authors.library.caltech.edu/records/c80wj-h6a11On the Average Complexity of Reed–Solomon List Decoders
https://resolver.caltech.edu/CaltechAUTHORS:20130429-085506668
Authors: Cassuto, Yuval; Bruck, Jehoshua; McEliece, Robert J.
Year: 2013
DOI: 10.1109/TIT.2012.2235522
The number of monomials required to interpolate a received word in an algebraic list decoder for Reed-Solomon codes depends on the instantaneous channel error, and not only on the decoder design parameters. The implications of this fact are that the decoder should be able to exhibit lower decoding complexity for low-weight errors and, consequently, enjoy a better average-case decoding complexity and a higher decoding throughput. On the analytical side, this paper studies the dependence of interpolation costs on instantaneous errors, in both hard- and soft-decision decoders. On the algorithmic side, it provides an efficient interpolation algorithm, based on the state-of-the-art interpolation algorithm, that enjoys reduced running times for reduced interpolation costs.https://authors.library.caltech.edu/records/z781c-c3a08Nonuniform Codes for Correcting Asymmetric Errors in Data Storage
https://resolver.caltech.edu/CaltechAUTHORS:20130617-115747407
Authors: Zhou, Hongchao; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2013
DOI: 10.1109/TIT.2013.2241175
The construction of asymmetric error-correcting codes is a topic that has been studied extensively; however, the existing approach for code construction assumes that every codeword should tolerate t asymmetric errors. Our main observation is that in contrast to symmetric errors, asymmetric errors are content dependent. For example, in Z-channels, the all-1 codeword is prone to have more errors than the all-0 codeword. This motivates us to develop nonuniform codes whose codewords can tolerate different numbers of asymmetric errors depending on their Hamming weights. The idea in nonuniform code construction is to augment the redundancy in a content-dependent way and guarantee the worst-case reliability while maximizing the code size. In this paper, we first study nonuniform codes for Z-channels, which suffer only one type of error, namely 1 → 0. Specifically, we derive their upper bounds, analyze their asymptotic performance, and introduce two general constructions. Then, we extend the concept and results of nonuniform codes to general binary asymmetric channels, where the error probability for each bit from 0 to 1 is smaller than that from 1 to 0.https://authors.library.caltech.edu/records/y34mf-tx768Trajectory Codes for Flash Memory
https://resolver.caltech.edu/CaltechAUTHORS:20130826-103812140
Authors: Jiang, Anxiao (Andrew); Langberg, Michael; Schwartz, Moshe; Bruck, Jehoshua
Year: 2013
DOI: 10.1109/TIT.2013.2251755
A generalized rewriting model is defined for flash memory that represents stored data and permitted rewrite operations by a directed graph. This model is a generalization of previously introduced rewriting models of codes, including floating codes, write-once memory codes, and buffer codes. This model is used to design a new rewriting code for flash memories. The new code, referred to as trajectory code, allows stored data to be rewritten as many times as possible without block erasures. It is proved that the trajectory codes are asymptotically optimal for a wide range of scenarios. In addition, rewriting codes that use a randomized rewriting scheme are presented that obtain good performance with high probability for all possible rewrite sequences.https://authors.library.caltech.edu/records/gt5x1-9xd26Generalized Gray Codes for Local Rank Modulation
https://resolver.caltech.edu/CaltechAUTHORS:20131017-105937816
Authors: En Gad, Eyal; Langberg, Michael; Schwartz, Moshe; Bruck, Jehoshua
Year: 2013
DOI: 10.1109/TIT.2013.2268534
We consider the local rank-modulation scheme, in which a sliding window going over a sequence of real-valued variables induces a sequence of permutations. Local rank-modulation is a generalization of the rank-modulation scheme, which has been recently suggested as a way of storing information in flash memory. We study gray codes for the local rank-modulation scheme in order to simulate conventional multilevel flash cells while retaining the benefits of rank modulation. Unlike the limited scope of previous works, we consider code constructions for the entire range of parameters including the code length, sliding-window size, and overlap between adjacent windows. We show that the presented codes have asymptotically optimal rate. We also provide efficient encoding, decoding, and next-state algorithms.https://authors.library.caltech.edu/records/gnz6e-9vx94Access Versus Bandwidth in Codes for Storage
https://resolver.caltech.edu/CaltechAUTHORS:20140425-151109319
Authors: Tamo, Itzhak; Wang, Zhiying; Bruck, Jehoshua
Year: 2014
DOI: 10.1109/TIT.2014.2305698
Maximum distance separable (MDS) codes are widely used in storage systems to protect against disk (node) failures. A node is said to have capacity l over some field F if it can store l symbols of the field. An (n, k, l) MDS code uses n nodes of capacity l to store k information nodes. The MDS property guarantees resiliency to any n-k node failures. An optimal bandwidth (respectively, optimal access) MDS code communicates (respectively, accesses) the minimum amount of data during the repair process of a single failed node. It was shown that this amount equals a fraction of 1/(n - k) of the data stored in each node. In previous optimal-bandwidth constructions, l scaled polynomially with k when the asymptotic rate is less than 1. Moreover, in constructions with a constant number of parities, i.e., when the rate approaches 1, l scales exponentially with k. In this paper, we focus on the case of linear codes with linear repair operations and a constant number of parities n - k = r, and ask the following question: given the capacity l of a node, what is the largest number of information disks k in an optimal bandwidth (respectively, access) (k + r, k, l) MDS code? We give an upper bound for the general case, and two tight bounds in the special cases of two important families of codes. The first is a family of codes with the optimal update property, and the second is a family with the optimal access property. Moreover, the bounds show that in some cases optimal-bandwidth codes have larger k than optimal-access codes, and therefore these two measures are not equivalent.https://authors.library.caltech.edu/records/677nb-sgx08Synthesis of Stochastic Flow Networks
https://resolver.caltech.edu/CaltechAUTHORS:20140627-103709955
Authors: Zhou, Hongchao; Chen, Ho-Lin; Bruck, Jehoshua
Year: 2014
DOI: 10.1109/TC.2012.270
A stochastic flow network is a directed graph with incoming edges (inputs) and outgoing edges (outputs); tokens enter through the input edges, travel stochastically in the network, and can exit the network through the output edges. Each node in the network is a splitter, namely, a token can enter a node through an incoming edge and exit on one of its outgoing edges according to a predefined probability distribution. Stochastic flow networks can be easily implemented by beam splitters, or by DNA-based chemical reactions, with promising applications in optical computing, molecular computing and stochastic computing. In this paper, we address a fundamental synthesis question: Given a finite set of possible splitters and an arbitrary rational probability distribution, design a stochastic flow network, such that every token that enters the input edge will exit the outputs with the prescribed probability distribution. The problem of probability transformation dates back to von Neumann's 1951 work and was followed, among others, by Knuth and Yao in 1976. Most existing works have focused on the "simulation" of target distributions. In this paper, we design optimal-sized stochastic flow networks for "synthesizing" target distributions. We show that when each splitter has two outgoing edges and is unbiased, an arbitrary rational probability a/b with a ≤ b ≤ 2^n can be realized by a stochastic flow network of size n, and that this size is optimal. Compared to other stochastic systems, feedback (cycles in networks) strongly improves the expressibility of stochastic flow networks.https://authors.library.caltech.edu/records/edqx4-2j933Guest Editorial: Communication Methodologies for the Next-Generation Storage Systems
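The output distribution of a small loop-free splitter network can be verified by enumerating paths; a minimal sketch (the two-splitter network realizing {3/4, 1/4} is illustrative, not a construction from the paper):

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# A loop-free network as a dict: node -> [(next_node, probability), ...].
# Unbiased splitter A routes tokens to splitter B or to output "out1";
# splitter B routes to "out0" or "out1". Names not in the dict are outputs.
network = {
    "A": [("B", HALF), ("out1", HALF)],
    "B": [("out0", HALF), ("out1", HALF)],
}

def output_distribution(network, node):
    """Probability of exiting at each output, for a token entering at node."""
    if node not in network:            # reached an output edge
        return {node: Fraction(1)}
    dist = {}
    for nxt, p in network[node]:
        for out, q in output_distribution(network, nxt).items():
            dist[out] = dist.get(out, Fraction(0)) + p * q
    return dist

assert output_distribution(network, "A") == \
    {"out1": Fraction(3, 4), "out0": Fraction(1, 4)}
```

Exact rational arithmetic makes the check trustworthy; the feedback (cyclic) networks studied in the paper would instead require solving linear equations for the exit probabilities.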
https://resolver.caltech.edu/CaltechAUTHORS:20140529-153422648
Authors: Dolecek, Lara; Blaum, Mario; Bruck, Jehoshua; Jiang, Anxiao (Andrew); Ramchandran, Kannan; Vasic, Bane
Year: 2014
DOI: 10.1109/JSAC.2014.140501
This issue consists of 22 high-caliber papers with contributions from both academia and industry. The papers are organized into the following six sections: (i) Channel Modeling and Signal Processing Algorithms for Emerging Memory Technologies, (ii) Error Control Coding Techniques for Flash Memories, (iii) Algebraic Methods with Applications to Non-Volatile Memories, (iv) Polar Codes with Application to Storage, (v) Performance Limits of Storage Systems, and (vi) Codes for Distributed Network Storage.https://authors.library.caltech.edu/records/n9ytp-az569Systematic Error-Correcting Codes for Rank Modulation
https://resolver.caltech.edu/CaltechAUTHORS:20150202-150301749
Authors: Zhou, Hongchao; Schwartz, Moshe; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2014
DOI: 10.1109/TIT.2014.2365499
The rank-modulation scheme has been recently proposed for efficiently storing data in nonvolatile memories. In this paper, we explore [n, k, d] systematic error-correcting codes for rank modulation. Such codes have length n, k information symbols, and minimum distance d. Systematic codes have the benefit of enabling efficient information retrieval in conjunction with memory-scrubbing schemes. We study systematic codes for rank modulation under Kendall's τ-metric as well as under the ℓ∞-metric. In Kendall's τ-metric, we present [k + 2, k, 3] systematic codes for correcting a single error, which have optimal rates, unless systematic perfect codes exist. We also study the design of multierror-correcting codes, and provide a construction of [k + t + 1, k, 2t + 1] systematic codes for large-enough k. We use nonconstructive arguments to show that for rank modulation, systematic codes achieve the same capacity as general error-correcting codes. Finally, in the ℓ∞-metric, we construct two [n, k, d] systematic multierror-correcting codes, the first for the case of d = O(1) and the second for d = Θ(n). In the latter case, the codes have the same asymptotic rate as the best codes currently known in this metric.https://authors.library.caltech.edu/records/zcv5z-eg591Logic operations in memory using a memristive Akers array
https://resolver.caltech.edu/CaltechAUTHORS:20150105-103814566
Authors: Levy, Yifat; Bruck, Jehoshua; Cassuto, Yuval; Friedman, Eby G.; Kolodny, Avinoam; Yaakobi, Eitan; Kvatinsky, Shahar
Year: 2014
DOI: 10.1016/j.mejo.2014.06.006
In-memory computation is one of the most promising features of memristive memory arrays. In this paper, we propose an array architecture that supports in-memory computation based on a logic array first proposed in 1972 by Sheldon Akers. The Akers logic array satisfies this objective since this array can realize any Boolean function, including bit sorting. We present a hardware version of a modified Akers logic array, where the values stored within the array serve as primary inputs. The proposed logic array uses memristors, which are nonvolatile memory devices with noteworthy properties. An Akers logic array with memristors combines memory and logic operations, where the same array stores data and performs computation. This combination opens opportunities for novel non-von Neumann computer architectures, while reducing power and enhancing memory bandwidth.https://authors.library.caltech.edu/records/ezrw6-4p418Rank-Modulation Rewrite Coding for Flash Memories
https://resolver.caltech.edu/CaltechAUTHORS:20150814-153713652
Authors: En Gad, Eyal; Yaakobi, Eitan; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2015
DOI: 10.1109/TIT.2015.2442579
The current flash memory technology focuses on minimizing the cost of its static storage capacity. However, the resulting approach supports a relatively small number of program-erase cycles. This technology is effective for consumer devices (e.g., smartphones and cameras), where the number of program-erase cycles is small. However, it is not economical for enterprise storage systems that require a large number of lifetime writes. The approach proposed in this paper for alleviating this problem consists of the efficient integration of two key ideas: 1) improving reliability and endurance by representing the information using relative values via the rank-modulation scheme and 2) increasing the overall (lifetime) capacity of the flash device via rewriting codes, namely, performing multiple writes per cell before erasure. This paper presents a new coding scheme that combines rank modulation with rewriting. The key benefits of the new scheme include: 1) the ability to store close to 2 bits per cell on each write with minimal impact on the lifetime of the memory and 2) efficient encoding and decoding algorithms that make use of capacity-achieving write-once-memory codes that were proposed recently.https://authors.library.caltech.edu/records/5e76c-xe789Algorithms for Generating Probabilities with Multivalued Stochastic Relay Circuits
https://resolver.caltech.edu/CaltechAUTHORS:20151221-152740679
Authors: Lee, David T.; Bruck, Jehoshua
Year: 2015
DOI: 10.1109/TC.2015.2401027
The problem of random number generation dates back to Von Neumann's work in 1951. Since then, many algorithms have been developed for generating unbiased bits from complex correlated sources as well as for generating arbitrary distributions from unbiased bits. An equally interesting, but less studied aspect is the structural component of random number generation. That is, given a set of primitive sources of randomness, and given composition rules induced by a device or nature, how can we build networks that generate arbitrary probability distributions? In this paper, we study the generation of arbitrary probability distributions in multivalued relay circuits, a generalization in which relays can take on any of N states and the logical 'and' and 'or' are replaced with 'min' and 'max' respectively. These circuits can be thought of as modeling the timing of events which depend on other event occurrences. We describe a duality property and give algorithms that synthesize arbitrary rational probability distributions. We prove that these networks are robust to errors and design a universal probability generator which takes input bits and outputs any desired binary probability distribution.https://authors.library.caltech.edu/records/9f9ps-p5918The Capacity of String-Duplication Systems
https://resolver.caltech.edu/CaltechAUTHORS:20160119-142638953
Authors: Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2016
DOI: 10.1109/TIT.2015.2505735
It is known that the majority of the human genome consists of duplicated sequences. Furthermore, it is believed that a significant part of the rest of the genome also originated from duplicated sequences and has mutated to its current form. In this paper, we investigate the possibility of constructing an exponentially large number of sequences from a short initial sequence using simple duplication rules, including those resembling genomic-duplication processes. In other words, our goal is to find the capacity, or the expressive power, of these string-duplication systems. Our results include exact capacities, and bounds on the capacities, of four fundamental string-duplication systems. The study of these fundamental biologically inspired systems is an important step toward modeling and analyzing more complex biological processes.https://authors.library.caltech.edu/records/e2hgt-sj167Codes Correcting Erasures and Deletions for Rank Modulation
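The capacity question above can be explored by brute force for tiny instances; a sketch for a tandem-duplication system (seed and length cap are illustrative):

```python
def tandem_children(s):
    """All strings obtained from s by duplicating one substring in place."""
    return {s[:j] + s[i:j] + s[j:]
            for i in range(len(s))
            for j in range(i + 1, len(s) + 1)}

def reachable(seed, max_len):
    """All strings of length <= max_len generated from seed by tandem duplication."""
    seen, frontier = {seed}, {seed}
    while frontier:
        frontier = {c for s in frontier for c in tandem_children(s)
                    if len(c) <= max_len} - seen
        seen |= frontier
    return seen

# Duplicating a substring of "01" yields exactly these strings:
assert tandem_children("01") == {"001", "0101", "011"}
# Counting reachable strings per length is the finite analogue of the
# capacity (the exponential growth rate of that count) studied above.
assert reachable("01", 3) == {"01", "001", "011"}
```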
https://resolver.caltech.edu/CaltechAUTHORS:20160225-140310853
Authors: Gabrys, Ryan; Yaakobi, Eitan; Farnoud (Hassanzadeh), Farzad; Sala, Frederic; Bruck, Jehoshua; Dolecek, Lara
Year: 2016
DOI: 10.1109/TIT.2015.2493147
Error-correcting codes for permutations have received considerable attention in the past few years, especially in applications of the rank modulation scheme for flash memories. While codes over several metrics have been studied, such as the Kendall τ, Ulam, and Hamming distances, no recent research has been carried out for erasures and deletions over permutations. In rank modulation, flash memory cells represent a permutation, which is induced by their relative charge levels. We explore problems that arise when some of the cells are either erased or deleted. In each case, we study how these erasures and deletions affect the information carried by the remaining cells. In particular, we study models that are symbol-invariant, where unaffected elements do not change their corresponding values from those in the original permutation, or permutation-invariant, where the remaining symbols are modified to form a new permutation with fewer elements. Our main approach in tackling these problems is to build upon the existing works of error-correcting codes and leverage them in order to construct codes in each model of deletions and erasures. The codes we develop are in certain cases asymptotically optimal, while in other cases, such as for codes in the Ulam distance, improve upon the state of the art results.https://authors.library.caltech.edu/records/km1t9-d3q89Bounds for Permutation Rate-Distortion
https://resolver.caltech.edu/CaltechAUTHORS:20160119-151223798
Authors: Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2016
DOI: 10.1109/TIT.2015.2504521
We study the rate-distortion relationship in the set of permutations endowed with the Kendall τ-metric and the Chebyshev metric. Our study is motivated by the application of permutation rate-distortion to the average-case and worst-case distortion analysis of algorithms for ranking with incomplete information and approximate sorting algorithms. For the Kendall τ-metric we provide bounds for small, medium, and large distortion regimes, while for the Chebyshev metric we present bounds that are valid for all distortions and are especially accurate for small distortions. In addition, for the Chebyshev metric, we provide a construction for covering codes.https://authors.library.caltech.edu/records/3hq71-2m313
Constructions and Decoding of Cyclic Codes Over b-Symbol Read Channels
https://resolver.caltech.edu/CaltechAUTHORS:20160426-074534474
Authors: Yaakobi, Eitan; Bruck, Jehoshua; Siegel, Paul H.
Year: 2016
DOI: 10.1109/TIT.2016.2522434
Symbol-pair read channels, in which the outputs of the read process are pairs of consecutive symbols, were recently studied by Cassuto and Blaum. This new paradigm is motivated by the limitations of the reading process in high-density data storage systems. They studied error correction in this new paradigm, specifically, the relationship between the minimum Hamming distance of an error-correcting code and the minimum pair distance, which is the minimum Hamming distance between symbol-pair vectors derived from codewords of the code. It was proved that for a linear cyclic code with minimum Hamming distance d_H, the corresponding minimum pair distance is at least d_H + 3. In this paper, we show that, for a given linear cyclic code with a minimum Hamming distance d_H, the minimum pair distance is at least d_H + ⌈d_H/2⌉. We then describe a decoding algorithm, based upon a bounded distance decoder for the cyclic code, whose symbol-pair error correcting capabilities reflect the larger minimum pair distance. Finally, we consider the case where the read channel output is a larger number, b ≥ 3, of consecutive symbols, and we provide extensions of several concepts, results, and code constructions to this setting.https://authors.library.caltech.edu/records/g86g2-92h26
Systematic Error-Correcting Codes for Permutations and Multi-Permutations
https://resolver.caltech.edu/CaltechAUTHORS:20160825-142242242
Authors: Buzaglo, Sarit; Yaakobi, Eitan; Etzion, Tuvi; Bruck, Jehoshua
Year: 2016
DOI: 10.1109/TIT.2016.2543739
Multi-permutations, and in particular permutations, appear in various applications in information theory. New applications, such as rank modulation for flash memories, have suggested the need to consider error-correcting codes for multi-permutations. In this paper, we study systematic error-correcting codes for multi-permutations in general and for permutations in particular. For a given number of information symbols k, and for any integer t, we present a construction of (k+r, k) systematic t-error-correcting codes, for permutations of length k+r, where the number of redundancy symbols r is relatively small. In particular, for a given t and for sufficiently large k, we obtain r = t+1, while a lower bound on the number of redundancy symbols is shown to be t. The same construction is also applied to obtain related systematic error-correcting codes for any type of multi-permutations.https://authors.library.caltech.edu/records/s703z-6bk56
Asymmetric Error Correction and Flash-Memory Rewriting using Polar Codes
https://resolver.caltech.edu/CaltechAUTHORS:20160622-104849133
Authors: En Gad, Eyal; Li, Yue; Kliewer, Jörg; Langberg, Michael; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2016
DOI: 10.1109/TIT.2016.2539967
We propose efficient coding schemes for two communication settings: 1) asymmetric channels and 2) channels with an informed encoder. These settings are important in non-volatile memories, as well as optical and broadcast communication. The schemes are based on non-linear polar codes, and they build on and improve recent work on these settings. In asymmetric channels, we tackle the exponential storage requirement of previously known schemes that resulted from the use of large Boolean functions. We propose an improved scheme that achieves the capacity of asymmetric channels with polynomial computational complexity and storage requirement. The proposed non-linear scheme is then generalized to the setting of channel coding with an informed encoder using a multicoding technique. We consider specific instances of the scheme for flash memories that incorporate error-correction capabilities together with rewriting. Since the considered codes are non-linear, they eliminate the requirement of previously known schemes (called polar write-once-memory codes) for shared randomness between the encoder and the decoder. Finally, we mention that the multicoding scheme is also useful for broadcast communication in Marton's region, improving upon previous schemes for this setting.https://authors.library.caltech.edu/records/hynhb-7fv07
Explicit Minimum Storage Regenerating Codes
https://resolver.caltech.edu/CaltechAUTHORS:20160930-131654865
Authors: Wang, Zhiying; Tamo, Itzhak; Bruck, Jehoshua
Year: 2016
DOI: 10.1109/TIT.2016.2553675
In distributed storage, a file is stored in a set of nodes and protected by erasure-correcting codes. A regenerating code is a code with two properties: first, it can reconstruct the entire file in the presence of any r node erasures for some specified integer r; second, it can efficiently repair an erased node from any subset of remaining nodes with a given size. In the repair process, the amount of information transmitted from each node normalized by the storage size per node is termed the repair bandwidth (fraction). When the storage size per node is minimized, the repair bandwidth is lower bounded by 1/r, where r is the number of parity nodes. A code attaining this lower bound is said to have optimal repair. We consider codes with minimum storage size per node and optimal repair, called minimum storage regenerating (MSR) codes. In particular, if an MSR code has r parities and any r erasures occur, then by transmitting all the information from the remaining nodes, the original file can be reconstructed. On the other hand, if only one erasure occurs, only a fraction of 1/r of the information in each remaining node needs to be transmitted. If we view each node as a vector or a column over some field, then the code forms a 2-D array. Given the length of the column l and the number of parities r, we explicitly construct high-rate MSR codes. The number of systematic nodes of our construction is (r + 1) log_r l, which is longer than previously known results. In addition, we construct MSR codes with other desirable properties: first, codes with low complexity when the information is updated, and second, codes with low access or storage node I/O cost during repair.https://authors.library.caltech.edu/records/17jmt-mh471
Approximate sorting of data streams with limited storage
https://resolver.caltech.edu/CaltechAUTHORS:20161202-085415227
Authors: Farnoud (Hassanzadeh), Farzad; Yaakobi, Eitan; Bruck, Jehoshua
Year: 2016
DOI: 10.1007/s10878-015-9930-6
We consider the problem of approximate sorting of a data stream (in one pass) with limited internal storage, where the goal is not to rearrange data but to output a permutation that reflects the ordering of the elements of the data stream as closely as possible. Our main objective is to study the relationship between the quality of the sorting and the amount of available storage. To measure quality, we use permutation distortion metrics, namely the Kendall tau, Chebyshev, and weighted Kendall metrics, as well as mutual information, between the output permutation and the true ordering of data elements. We provide bounds on the performance of algorithms with limited storage and present a simple algorithm that asymptotically requires only a constant factor more storage than an optimal algorithm, in terms of mutual information and average Kendall tau distortion. We also study the case in which only information about the most recent elements of the stream is available. This setting has applications to learning user preference rankings in services such as Netflix, where items are presented to the user one at a time.https://authors.library.caltech.edu/records/qqdg7-0na97
Communication Efficient Secret Sharing
https://resolver.caltech.edu/CaltechAUTHORS:20161011-150403696
Authors: Huang, Wentao; Langberg, Michael; Kliewer, Joerg; Bruck, Jehoshua
Year: 2016
DOI: 10.1109/TIT.2016.2616144
A secret sharing scheme is a method to store information securely and reliably. Particularly, in a threshold secret sharing scheme, a secret is encoded into n shares, such that any set of at least t_1 shares suffices to decode the secret, and any set of at most t_2 < t_1 shares reveals no information about the secret. Assuming that each party holds a share and that a user wishes to decode the secret by receiving information from a set of parties, the question we study is how to minimize the amount of communication between the user and the parties. We show that the necessary amount of communication, termed "decoding bandwidth", decreases as the number of parties that participate in decoding increases. We prove a tight lower bound on the decoding bandwidth, and construct secret sharing schemes achieving the bound. Particularly, we design a scheme that achieves the optimal decoding bandwidth when d parties participate in decoding, universally for all t_1 ≤ d ≤ n. The scheme is based on a generalization of Shamir's secret sharing scheme and preserves its simplicity and efficiency. In addition, we consider the setting of secure distributed storage, where the proposed communication efficient secret sharing schemes not only improve decoding bandwidth but further improve disk access complexity during decoding.https://authors.library.caltech.edu/records/qtvry-hmz07
Optimal Rebuilding of Multiple Erasures in MDS Codes
https://resolver.caltech.edu/CaltechAUTHORS:20170119-080421044
Authors: Wang, Zhiying; Tamo, Itzhak; Bruck, Jehoshua
Year: 2017
DOI: 10.1109/TIT.2016.2633411
Maximum distance separable (MDS) array codes are widely used in storage systems due to their computationally efficient encoding and decoding procedures. An MDS code with r redundancy nodes can correct any r node erasures by accessing (reading) all the remaining information in the surviving nodes. However, in practice, e erasures are a more likely failure event, for some 1 ≤ e < r.https://authors.library.caltech.edu/records/bk2bc-4f252
Switch Codes: Codes for Fully Parallel Reconstruction
https://resolver.caltech.edu/CaltechAUTHORS:20170315-151626957
Authors: Wang, Zhiying; Kiah, Han Mao; Cassuto, Yuval; Bruck, Jehoshua
Year: 2017
DOI: 10.1109/TIT.2017.2664867
Network switches and routers scale in rate by distributing the packet read/write operations across multiple memory banks. Rate scaling is achieved so long as sufficiently many packets can be written and read in parallel. However, due to the non-determinism of the read process, parallel pending read requests may contend on memory banks, and thus significantly lower the switching rate. In this paper, we provide a constructive study of codes that guarantee fully parallel data reconstruction without contention. We call these codes "switch codes," and construct three optimal switch-code families with different parameters. All the constructions use only simple XOR-based encoding and decoding operations, an important advantage when operating at ultra-high speeds. Switch codes achieve their good performance by spanning simultaneous disjoint local-decoding sets for all their information symbols. Switch codes may be regarded as an extreme version of the previously studied batch codes, where the switch version requires parallel reconstruction of all the information symbols.https://authors.library.caltech.edu/records/323t5-36794
Duplication-Correcting Codes for Data Storage in the DNA of Living Organisms
https://resolver.caltech.edu/CaltechAUTHORS:20170330-092251690
Authors: Jain, Siddharth; Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2017
DOI: 10.1109/TIT.2017.2688361
The ability to store data in the DNA of a living organism has applications in a variety of areas including synthetic biology and watermarking of patented genetically-modified organisms. Data stored in this medium is subject to errors arising from various mutations, such as point mutations, indels, and tandem duplication, which need to be corrected to maintain data integrity. In this paper, we provide error-correcting codes for errors caused by tandem duplications, which create a copy of a block of the sequence and insert it in a tandem manner, i.e., next to the original. In particular, we present two families of codes for correcting errors due to tandem duplications of a fixed length; the first family can correct any number of errors while the second corrects a bounded number of errors. We also study codes for correcting tandem duplications of length up to a given constant k, where we are primarily focused on the cases of k = 2, 3. Finally, we provide a full classification of the sets of lengths allowed in tandem duplication that result in a unique root for all sequences.https://authors.library.caltech.edu/records/fwh9p-xx682
Capacity and Expressiveness of Genomic Tandem Duplication
https://resolver.caltech.edu/CaltechAUTHORS:20170719-165632439
Authors: Jain, Siddharth; Farnoud (Hassanzadeh), Farzad; Bruck, Jehoshua
Year: 2017
DOI: 10.1109/TIT.2017.2728079
The majority of the human genome consists of repeated sequences. An important type of repeated sequences common in the human genome are tandem repeats, where identical copies appear next to each other. For example, in the sequence AGTCTGTGC, TGTG is a tandem repeat that may be generated from AGTCTGC by a tandem duplication of length 2. In this work, we investigate the possibility of generating a large number of sequences from a seed, i.e., a small initial string, by tandem duplications of bounded length. We study the capacity of such a system, a notion that quantifies the system's generating power. Our results include exact capacity values for certain tandem duplication string systems. In addition, motivated by the role of DNA sequences in expressing proteins via RNA and the genetic code, we define the notion of the expressiveness of a tandem duplication system as the capability of expressing arbitrary substrings. We then completely characterize the expressiveness of tandem duplication systems for general alphabet sizes and duplication lengths. In particular, based on a celebrated result by Axel Thue from 1906, presenting a construction for ternary square-free sequences, we show that for alphabets of size 4 or larger, bounded tandem duplication systems, regardless of the seed and the bound on duplication length, are not fully expressive, i.e., they cannot generate all strings even as substrings of other strings. Note that the alphabet of size 4 is of particular interest as it pertains to the genomic alphabet. Building on this result, we also show that these systems do not have full capacity. In general, our results illustrate that duplication lengths play a more significant role than the seed in generating a large number of sequences for these systems.https://authors.library.caltech.edu/records/f06ka-6gp96
Duplication Distance to the Root for Binary Sequences
https://resolver.caltech.edu/CaltechAUTHORS:20170726-162754925
Authors: Alon, Noga; Bruck, Jehoshua; Farnoud (Hassanzadeh), Farzad; Jain, Siddharth
Year: 2017
DOI: 10.1109/TIT.2017.2730864
We study the tandem duplication distance between binary sequences and their roots. In other words, the quantity of interest is the number of tandem duplication operations of the form x = abc → y = abbc, where x and y are sequences and a, b, and c are their substrings, needed to generate a binary sequence of length n starting from a square-free sequence from the set {0, 1, 01, 10, 010, 101}. This problem is a restricted case of finding the duplication/deduplication distance between two sequences, defined as the minimum number of duplication and deduplication operations required to transform one sequence to the other. We consider both exact and approximate tandem duplications. For exact duplication, denoting the maximum distance to the root of a sequence of length n by f(n), we prove that f(n) = Θ(n). For the case of approximate duplication, where a β-fraction of symbols may be duplicated incorrectly, we show that the maximum distance has a sharp transition from linear in n to logarithmic at β = 1/2. We also study the duplication distance to the root for the set of sequences arising from a given root and for special classes of sequences, namely, the De Bruijn sequences, the Thue-Morse sequence, and the Fibonacci words. The problem is motivated by genomic tandem duplication mutations and the smallest number of tandem duplication events required to generate a given biological sequence.https://authors.library.caltech.edu/records/vnqxp-zs828
Probabilistic switching circuits in DNA
https://resolver.caltech.edu/CaltechAUTHORS:20180117-072812871
Authors: Wilhelm, Daniel; Bruck, Jehoshua; Qian, Lulu
Year: 2018
DOI: 10.1073/pnas.1715926115
PMCID: PMC5798357
A natural feature of molecular systems is their inherent stochastic behavior. A fundamental challenge related to the programming of molecular information processing systems is to develop a circuit architecture that controls the stochastic states of individual molecular events. Here we present a systematic implementation of probabilistic switching circuits, using DNA strand displacement reactions. Exploiting the intrinsic stochasticity of molecular interactions, we developed a simple, unbiased DNA switch: An input signal strand binds to the switch and releases an output signal strand with probability one-half. Using this unbiased switch as a molecular building block, we designed DNA circuits that convert an input signal to an output signal with any desired probability. Further, this probability can be switched between 2^n different values by simply varying the presence or absence of n distinct DNA molecules. We demonstrated several DNA circuits that have multiple layers and feedback, including a circuit that converts an input strand to an output strand with eight different probabilities, controlled by the combination of three DNA molecules. These circuits combine the advantages of digital and analog computation: They allow a small number of distinct input molecules to control a diverse signal range of output molecules, while keeping the inputs robust to noise and the outputs at precise values. Moreover, arbitrarily complex circuit behaviors can be implemented with just a single type of molecular building block.https://authors.library.caltech.edu/records/9wjw1-zhb03
Estimation of duplication history under a stochastic model for tandem repeats
https://resolver.caltech.edu/CaltechAUTHORS:20190211-084750757
Authors: Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2019
DOI: 10.1186/s12859-019-2603-1
Background: Tandem repeat sequences are common in the genomes of many organisms and are known to cause important phenomena such as gene silencing and rapid morphological changes. Due to the presence of multiple copies of the same pattern in tandem repeats and their high variability, they contain a wealth of information about the mutations that have led to their formation. The ability to extract this information can enhance our understanding of evolutionary mechanisms.
Results: We present a stochastic model for the formation of tandem repeats via tandem duplication and substitution mutations. Based on the analysis of this model, we develop a method for estimating the relative mutation rates of duplications and substitutions, as well as the total number of mutations, in the history of a tandem repeat sequence. We validate our estimation method via Monte Carlo simulation and show that it outperforms the state-of-the-art algorithm for discovering the duplication history. We also apply our method to tandem repeat sequences in the human genome, where it demonstrates the different behaviors of micro- and mini-satellites and can be used to compare mutation rates across chromosomes. It is observed that chromosomes that exhibit the highest mutation activity in tandem repeat regions are the same as those thought to have the highest overall mutation rates. However, unlike previous works that rely on comparing human and chimpanzee genomes to measure mutation rates, the proposed method allows us to find chromosomes with the highest mutation activity based on a single genome, in essence by comparing (approximate) copies of the pattern in tandem repeats.
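The duplication-substitution process described above can be illustrated with a toy forward simulation. This is a hedged sketch, not the estimation method of the paper: the parameters p_dup and max_dup_len, and the function name evolve_tandem_repeat, are illustrative assumptions.

```python
import random

def evolve_tandem_repeat(seed_seq, n_mutations, p_dup=0.7, max_dup_len=3, rng=None):
    """Illustrative Monte Carlo: grow a tandem-repeat-like sequence.

    At each step, with probability p_dup a tandem duplication copies a
    random substring (length <= max_dup_len) and inserts it next to
    itself; otherwise a substitution rewrites one random position.
    """
    rng = rng or random.Random(0)
    alphabet = "ACGT"
    s = list(seed_seq)
    for _ in range(n_mutations):
        if rng.random() < p_dup:
            k = rng.randint(1, min(max_dup_len, len(s)))
            i = rng.randrange(len(s) - k + 1)
            s[i + k:i + k] = s[i:i + k]      # tandem duplication of s[i:i+k]
        else:
            j = rng.randrange(len(s))
            s[j] = rng.choice(alphabet)      # point substitution
    return "".join(s)
```

Running the sketch with a higher duplication rate than substitution rate produces the kind of approximate tandem copies whose differences carry the history information exploited by the estimation method.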
Conclusion: The prevalence of tandem repeats in most organisms and the efficiency of the proposed method enable studying various aspects of the formation of tandem repeats and the surrounding sequences in a wide range of settings.https://authors.library.caltech.edu/records/wxrqy-7c610
On the Uncertainty of Information Retrieval in Associative Memories
https://resolver.caltech.edu/CaltechAUTHORS:20181101-121348346
Authors: Yaakobi, Eitan; Bruck, Jehoshua
Year: 2019
DOI: 10.1109/tit.2018.2878750
We (people) are memory machines. Our decision processes, emotions, and interactions with the world around us are based on and driven by associations to our memories. This natural association paradigm will become critical in future memory systems, namely, the key question will not be "How do I store more information?" but rather, "Do I have the relevant information? How do I retrieve it?"
The focus of this paper is to make a first step in this direction. We define and solve a very basic problem in associative retrieval. Given a word W, the words in the memory that are t-associated with W are the words in the ball of radius t around W. In general, given a set of words, say W, X, and Y, the words that are t-associated with {W, X, Y} are those in the memory that are within distance t from all three words. Our main goal is to study the maximum size of the t-associated set as a function of the number of input words and the minimum distance of the words in memory - we call this value the uncertainty of an associative memory. In this work we consider the Hamming distance and derive the uncertainty of the associative memory that consists of all the binary vectors with an arbitrary number of input words. In addition, we study the retrieval problem, namely, how do we get the t-associated set given the inputs? We note that this paradigm is a generalization of the sequences reconstruction problem that was proposed by Levenshtein (2001). In this model, a word is transmitted over multiple channels. A decoder receives all the channel outputs and decodes the transmitted word. Levenshtein computed the minimum number of channels that guarantee a successful decoder - this value happens to be the uncertainty of an associative memory with two input words.https://authors.library.caltech.edu/records/w6ppr-j3j45
The Entropy Rate of Some Pólya String Models
https://resolver.caltech.edu/CaltechAUTHORS:20190829-100210948
Authors: Elishco, Ohad; Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2019
DOI: 10.1109/tit.2019.2936556
We study random string-duplication systems, which we call Pólya string models. These are motivated by a class of mutations that are common in most organisms and lead to an abundance of repeated sequences in their genomes. Unlike previous works, which study either the combinatorial capacity of string-duplication systems or, in a probabilistic setting, various string statistics, this work provides the exact entropy rate, or bounds on it, for several probabilistic models. The entropy rate determines the compressibility of the resulting sequences, as well as quantifying the amount of sequence diversity that these mutations can create. In particular, we study the entropy rate of noisy string-duplication systems, including the tandem-duplication, end-duplication, and interspersed-duplication systems, where in all cases we study duplication of length 1 only. Interesting connections are drawn between some systems and the signature of random permutations, as well as to the beta distribution common in population genetics.https://authors.library.caltech.edu/records/134ne-9h832
Two Deletion Correcting Codes from Indicator Vectors
https://resolver.caltech.edu/CaltechAUTHORS:20191031-124926783
Authors: Sima, Jin; Raviv, Netanel; Bruck, Jehoshua
Year: 2020
DOI: 10.1109/tit.2019.2950290
Construction of capacity achieving deletion correcting codes has been a baffling challenge for decades. A recent breakthrough by Brakensiek et al., alongside novel applications in DNA storage, has reignited the interest in this longstanding open problem. In spite of recent advances, the amount of redundancy in existing codes is still orders of magnitude away from being optimal. In this paper, a novel approach for constructing binary two-deletion correcting codes is proposed. By this approach, parity symbols are computed from indicator vectors (i.e., vectors that indicate the positions of certain patterns) of the encoded message, rather than from the message itself. Most interestingly, the parity symbols and the proof of correctness are a direct generalization of their counterparts in the Varshamov-Tenengolts construction. Our techniques require 7 log(n) + o(log(n)) redundant bits to encode an n-bit message, which is closer to optimal than previous constructions. Moreover, the encoding and decoding algorithms have O(n) time complexity.https://authors.library.caltech.edu/records/e5f1z-nwq17
Evolution of k-mer Frequencies and Entropy in Duplication and Substitution Mutation Systems
https://resolver.caltech.edu/CaltechAUTHORS:20191004-142813980
Authors: Lou, Hao; Schwartz, Moshe; Bruck, Jehoshua; Farnoud (Hassanzadeh), Farzad
Year: 2020
DOI: 10.1109/TIT.2019.2946846
Genomic evolution can be viewed as string-editing processes driven by mutations. An understanding of the statistical properties resulting from these mutation processes is of value in a variety of tasks related to biological sequence data, e.g., estimation of model parameters and compression. At the same time, due to the complexity of these processes, designing tractable stochastic models and analyzing them are challenging. In this paper, we study two kinds of systems, each representing a set of mutations. In the first system, tandem duplications and substitution mutations are allowed and in the other, interspersed duplications. We provide stochastic models and, via stochastic approximation, study the evolution of substring frequencies for these two systems separately. Specifically, we show that k-mer frequencies converge almost surely and determine the limit set. Furthermore, we present a method for finding upper bounds on entropy for such systems.https://authors.library.caltech.edu/records/5ywab-5d764
On Coding over Sliced Information
https://resolver.caltech.edu/CaltechAUTHORS:20210315-103437288
Authors: Sima, Jin; Raviv, Netanel; Bruck, Jehoshua
Year: 2021
DOI: 10.1109/tit.2021.3063709
The interest in channel models in which the data is sent as an unordered set of binary strings has increased lately, due to emerging applications in DNA storage, among others. In this paper we analyze the minimal redundancy of binary codes for this channel under substitution errors, and provide several constructions, some of which are shown to be asymptotically optimal up to constants. The surprising result in this paper is that while the information vector is sliced into a set of unordered strings, the amount of redundant bits that are required to correct errors is order-wise equivalent to the amount required in the classical error correcting paradigm.https://authors.library.caltech.edu/records/w65dp-23509
On Optimal k-Deletion Correcting Codes
https://resolver.caltech.edu/CaltechAUTHORS:20201008-083807800
Authors: Sima, Jin; Bruck, Jehoshua
Year: 2021
DOI: 10.1109/TIT.2020.3028702
Levenshtein introduced the problem of constructing k-deletion correcting codes in 1966, proved that the optimal redundancy of those codes is O(k log n) for constant k, and proposed an optimal redundancy single-deletion correcting code (using the so-called VT construction). However, the problem of constructing optimal redundancy k-deletion correcting codes remained open. Our key contribution is a major step towards a complete solution to this longstanding open problem for constant k. We present a k-deletion correcting code that has redundancy 8k log n + o(log n) when k = o(√(log log n)) and encoding/decoding algorithms of complexity O(n^(2k+1)).https://authors.library.caltech.edu/records/2b9rp-5tt90
Glioblastoma signature in the DNA of blood-derived cells
https://resolver.caltech.edu/CaltechAUTHORS:20211007-150341511
Authors: Jain, Siddharth; Mazaheri, Bijan; Raviv, Netanel; Bruck, Jehoshua
Year: 2021
DOI: 10.1371/journal.pone.0256831
PMCID: PMC8425531
The current approach to cancer detection is based on identifying genetic mutations typical of tumor cells. This approach is effective only when cancer has already emerged; however, it might then be at a stage too advanced for effective treatment. Cancer is caused by the continuous accumulation of mutations; is it possible to measure the time-dependent information of mutation accumulation and predict the emergence of cancer? We hypothesize that the mutation history derived from the tandem repeat regions in blood-derived DNA carries information about the accumulation of the cancer driver mutations in other tissues. To validate our hypothesis, we computed the mutation histories from the tandem repeat regions in blood-derived exomic DNA of 3874 TCGA patients with different cancer types and found a statistically significant signal with specificity ranging from 66% to 93% differentiating Glioblastoma patients from other cancer patients. Our approach and findings offer a new direction for future cancer prediction and early cancer detection based on information derived from blood-derived DNA.https://authors.library.caltech.edu/records/a4n5r-1f829
Generator based approach to analyze mutations in genomic datasets
https://resolver.caltech.edu/CaltechAUTHORS:20200728-093329251
Authors: Jain, Siddharth; Xiao, Xiongye; Bogdan, Paul; Bruck, Jehoshua
Year: 2021
DOI: 10.1038/s41598-021-00609-8
PMCID: PMC8548350
In contrast to the conventional approach of directly comparing genomic sequences using sequence alignment tools, we propose a computational approach that performs comparisons between sequence generators. These sequence generators are learned via a data-driven approach that empirically computes the state machine generating the genomic sequence of interest. As the state machine based generator of the sequence is independent of the sequence length, it provides us with an efficient method to compute the statistical distance between large sets of genomic sequences. Moreover, our technique provides a fast and efficient method to cluster large datasets of genomic sequences, characterize their temporal and spatial evolution in a continuous manner, and gain insight into locality-sensitive information about the sequences without any need for alignment. Furthermore, we show that the technique can be used to detect local regions with mutation activity, which can then be applied to aid alignment techniques for the fast discovery of mutations. To demonstrate the efficacy of our technique on real genomic data, we cluster different strains of SARS-CoV-2 viral sequences, characterize their evolution, and identify regions of the viral sequence with mutations.https://authors.library.caltech.edu/records/vemqv-kzm85
Iterative Programming of Noisy Memory Cells
https://resolver.caltech.edu/CaltechAUTHORS:20220104-235424700
Authors: Horovitz, Michal; Yaakobi, Eitan; Gad, Eyal En; Bruck, Jehoshua
Year: 2022
DOI: 10.1109/tcomm.2021.3130660
In this paper, we study a model that mimics the programming operation of memory cells. This model was first introduced by Lastras-Montano et al. for continuous-alphabet channels, and later by Bunte and Lapidoth for discrete memoryless channels (DMC). Under this paradigm we assume that cells are programmed sequentially and individually. The programming process is modeled as transmission over a channel, such that it is possible to read the cell state in order to determine its programming success, and in case of programming failure, to reprogram the cell again. Reprogramming a cell can reduce the bit error rate; however, this comes at the price of increasing the overall programming time and thereby affecting the writing speed of the memory. An iterative programming scheme is an algorithm which specifies the number of attempts to program each cell. Given the programming channel and constraints on the average and maximum number of attempts to program a cell, we study programming schemes which maximize the number of bits that can be reliably stored in the memory. We extend the results by Bunte and Lapidoth and study this problem when the programming channel is either a discrete-input memoryless symmetric channel (including the BSC, BEC, and BI-AWGN) or the Z channel. For the BSC and the BEC our analysis is also extended to the case where the error probabilities on consecutive writes are not necessarily the same. Lastly, we also study a related model which is motivated by the synthesis process of DNA molecules.https://authors.library.caltech.edu/records/7s6dj-4z436
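The read-verify-reprogram trade-off described in the last record can be illustrated with a minimal simulation over a BSC. This is a hedged sketch, not the schemes analyzed in the paper: the parameters p_err and max_attempts, and the function names, are illustrative assumptions.

```python
import random

def program_cell(bit, p_err, max_attempts, rng):
    """Program one cell over a BSC(p_err): after each attempt the cell
    is read back, and on failure it is reprogrammed, up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        stored = bit ^ (rng.random() < p_err)  # BSC may flip the written bit
        if stored == bit:                      # read-verify succeeded
            return stored, attempt
    return stored, max_attempts

def simulate(n_cells=100_000, p_err=0.1, max_attempts=3, seed=1):
    """Estimate residual bit error rate and average attempts per cell."""
    rng = random.Random(seed)
    errors = attempts = 0
    for _ in range(n_cells):
        bit = rng.randint(0, 1)
        stored, used = program_cell(bit, p_err, max_attempts, rng)
        errors += stored != bit
        attempts += used
    return errors / n_cells, attempts / n_cells
```

With p_err = 0.1 and up to 3 attempts, the residual error rate drops to roughly p_err^3 while the average number of attempts stays close to 1, which is the kind of rate-versus-writing-speed trade-off the iterative programming schemes optimize.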