Combined Feed
https://feeds.library.caltech.edu/people/Bruck-J/combined.rss
A Caltech Library Repository Feed
http://www.rssboard.org/rss-specification
python-feedgen (en)
Thu, 30 Nov 2023 19:05:20 +0000
A generalized convergence theorem for neural networks
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetit88
Authors: Bruck, Jehoshua; Goodman, Joseph W.
Year: 1988
DOI: 10.1109/18.21239
A neural network model is presented in which each neuron performs a threshold logic function. The model always converges to a stable state when operating in a serial mode and to a cycle of length at most 2 when operating in a fully parallel mode. This property is the basis for the potential applications of the model, such as associative memory devices and combinatorial optimization. The two convergence theorems (for serial and fully parallel modes of operation) are reviewed, and a general convergence theorem is presented that unifies the two known cases. New relations between the neural network model and the problem of finding a minimum cut in a graph are obtained.
https://authors.library.caltech.edu/records/q014y-7yq74
Some new EC/AUED codes
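The serial-mode convergence described in this abstract can be sketched in a few lines. This is a minimal toy illustration, not the paper's construction: the weight matrix below is a hypothetical symmetric example, and neurons are updated one at a time until no neuron wants to flip.

```python
# Sketch of serial-mode convergence in a threshold network.
# W is a hypothetical symmetric weight matrix with zero diagonal
# (the standard assumptions under which serial updates converge).
import numpy as np

def serial_converge(W, x, max_sweeps=100):
    """Update one neuron at a time: x_i <- sign(W_i . x).
    Returns the state once no single update changes any neuron."""
    x = np.array(x, dtype=int)
    n = len(x)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            s = 1 if W[i] @ x >= 0 else -1
            if s != x[i]:
                x[i] = s
                changed = True
        if not changed:
            return x  # stable state: every neuron agrees with its threshold
    return x

W = np.array([[0, 1, -1],
              [1, 0, 1],
              [-1, 1, 0]])  # symmetric, zero diagonal (example only)
print(serial_converge(W, [1, -1, -1]))  # reaches the stable state [1, 1, 1]
```

With this W, the initial state [1, -1, -1] settles after one sweep; no cycle can occur in serial mode, matching the theorem the abstract reviews.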
https://resolver.caltech.edu/CaltechAUTHORS:20120524-150825080
Authors: Bruck, Jehoshua; Blaum, Mario
Year: 1989
DOI: 10.1109/FTCS.1989.105568
A novel construction that differs from the traditional way of constructing systematic EC/AUED (error-correcting/all-unidirectional-error-detecting) codes is presented. The usual method is to take a systematic t-error-correcting code and then append a tail so that the code can detect more than t errors when they are unidirectional. In the authors' construction, the t-error-correcting code is modified in such a way that the weight distribution of the original code is reduced, so a smaller tail suffices. The resulting codes frequently have less redundancy than the best available systematic t-EC/AUED codes.
https://authors.library.caltech.edu/records/9e9r1-rdr73
Polynomial Threshold Elements?
https://resolver.caltech.edu/CaltechAUTHORS:20120524-151412678
Authors: Bruck, Jehoshua
Year: 1989
DOI: 10.1109/ITW.1989.761437
https://authors.library.caltech.edu/records/6aary-38x86
Neural networks, error-correcting codes, and polynomials over the binary n-cube
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetit89
Authors: Bruck, Jehoshua; Blaum, Mario
Year: 1989
DOI: 10.1109/18.42215
Several ways of relating the concept of error-correcting codes to the concept of neural networks are presented. Performing maximum-likelihood decoding in a linear block error-correcting code is shown to be equivalent to finding a global maximum of the energy function of a certain neural network. Given a linear block code, a neural network can be constructed in such a way that every codeword corresponds to a local maximum. The connection between maximization of polynomials over the n-cube and error-correcting codes is also investigated; the results suggest that decoding techniques can be a useful tool for solving such maximization problems. The results are generalized to both nonbinary and nonlinear codes.
https://authors.library.caltech.edu/records/rdz4x-z2r71
Harmonic analysis of neural networks
https://resolver.caltech.edu/CaltechAUTHORS:20120524-090912809
Authors: Bruck, Jehoshua
Year: 1989
DOI: 10.1109/ACSSC.1989.1200767
Neural network models have attracted a lot of interest in recent years, mainly because they were perceived as a new idea for computing. These models can be described as networks in which every node computes a linear threshold function. One of the main difficulties in analyzing the properties of these networks is the fact that they consist of nonlinear elements. I will present a novel approach, based on harmonic analysis of Boolean functions, to analyze neural networks. In particular, I will show how this technique can be applied to answer the following two fundamental questions: (i) What is the computational power of a polynomial threshold element with respect to linear threshold elements? (ii) Is it possible to get exponentially many spurious memories when we use the outer-product method for programming the Hopfield model?
https://authors.library.caltech.edu/records/aw7d3-cej07
The hardness of decoding linear codes with preprocessing
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetit90b
Authors: Bruck, Jehoshua; Naor, Moni
Year: 1990
DOI: 10.1109/18.52484
The problem of maximum-likelihood decoding of linear block codes is known to be hard. It is shown that the problem remains hard even if the code is known in advance and can be preprocessed for as long as desired in order to devise a decoding algorithm. The hardness is based on the fact that the existence of a polynomial-time algorithm implies that the polynomial hierarchy collapses. Thus, some linear block codes probably do not have an efficient decoder. The proof is based on results in complexity theory that relate uniform and nonuniform complexity classes.
https://authors.library.caltech.edu/records/fgb06-gdy28
On the number of spurious memories in the Hopfield model
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetit90a
Authors: Bruck, Jehoshua; Roychowdhury, Vwani P.
Year: 1990
DOI: 10.1109/18.52486
The outer-product method for programming the Hopfield model is discussed. The method can result in many spurious stable states (exponential in the number of vectors that are to be stored), even when the vectors are orthogonal.
https://authors.library.caltech.edu/records/5k7nn-rdz62
Efficient algorithms for reconfiguration in VLSI/WSI arrays
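For context, the outer-product rule referred to here sets the weight matrix to the sum of outer products of the stored vectors, with the diagonal zeroed. The sketch below uses two hypothetical orthogonal patterns (not from the paper) and checks that each stored pattern is a stable state under serial updates; the paper's point is that many *additional* (spurious) stable states can also arise.

```python
# Outer-product (Hebbian) programming of a Hopfield network.
# The two stored patterns are an illustrative example, not the paper's data.
import numpy as np

patterns = np.array([[1, 1, -1, -1],
                     [1, -1, 1, -1]])  # orthogonal example patterns
W = sum(np.outer(v, v) for v in patterns)
np.fill_diagonal(W, 0)  # zero self-connections, as in the standard model

def is_stable(W, x):
    """x is stable if no single serial update flips any neuron."""
    return all((1 if W[i] @ x >= 0 else -1) == x[i] for i in range(len(x)))

print(all(is_stable(W, v) for v in patterns))  # stored patterns are stable
```

For orthogonal patterns one can verify directly that (Wv)_i = (n - 1 - 1) v_i for each stored v, so the sign agrees with v; the abstract's result is that stability of the stored vectors does not rule out exponentially many spurious stable states.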
https://resolver.caltech.edu/CaltechAUTHORS:ROYieeetc90
Authors: Roychowdhury, Vwani P.; Bruck, Jehoshua; Kailath, Thomas
Year: 1990
DOI: 10.1109/12.54841
The issue of developing efficient algorithms for reconfiguring processor arrays in the presence of faulty processors and fixed hardware resources is discussed. The models discussed consist of a set of identical processors embedded in a flexible interconnection structure that is configured in the form of a rectangular grid. An array grid model based on single-track switches is considered. An efficient polynomial-time algorithm is proposed for determining feasible reconfigurations for an array with a given distribution of faulty processors. In the process, it is shown that the set of conditions in the reconfigurability theorem is not necessary. A polynomial-time algorithm is developed for finding feasible reconfigurations in an augmented single-track model and in array grid models with multiple-track switches.
https://authors.library.caltech.edu/records/d3v4c-rrn48
Decoding the Golay code with Venn diagrams
https://resolver.caltech.edu/CaltechAUTHORS:BLAieeetit90
Authors: Blaum, Mario; Bruck, Jehoshua
Year: 1990
DOI: 10.1109/18.53756
A decoding algorithm, based on Venn diagrams, for decoding the [23, 12, 7] Golay code is presented. The decoding algorithm is based on the design properties of the parity sets of the code. As for other decoding algorithms for the Golay code, decoding can be easily done by hand.
https://authors.library.caltech.edu/records/88482-3cz58
Fast arithmetic computing with neural networks
https://resolver.caltech.edu/CaltechAUTHORS:20120509-132855976
Authors: Siu, Kai-Yeung; Bruck, Jehoshua
Year: 1990
DOI: 10.1109/TENCON.1990.152559
The authors introduce a restricted model of a neuron which is more practical as a model of computation than the classical model of a neuron. The authors define a model of neural networks as a feedforward network of such neurons. Whereas any logic circuit of polynomial size (in n) that computes the product of two n-bit numbers requires unbounded delay, such computations can be done in a neural network with constant delay. The authors improve some known results by showing that the product of two n-bit numbers and sorting of n n-bit numbers can both be computed by a polynomial-size neural network using only four unit delays, independent of n. Moreover, the weights of each threshold element in the neural networks require only O(log n)-bit (instead of n-bit) accuracy.
https://authors.library.caltech.edu/records/f1tk4-a1h18
Polynomial Threshold Functions, AC^0 Functions and Spectral Norms
https://resolver.caltech.edu/CaltechAUTHORS:20120425-065829076
Authors: Bruck, Jehoshua; Smolensky, Roman
Year: 1990
DOI: 10.1109/FSCS.1990.89585
The class of polynomial-threshold functions is studied using harmonic analysis, and the results are used to derive lower bounds related to AC^0 functions. A Boolean function is polynomial threshold if it can be represented as a sign function of a sparse polynomial (one that consists of a polynomial number of terms). The main result is that polynomial-threshold functions can be characterized by means of their spectral representation. In particular, it is proved that a Boolean function whose L_1 spectral norm is bounded by a polynomial in n is a polynomial-threshold function, and that a Boolean function whose L_∞^(-1) spectral norm is not bounded by a polynomial in n is not a polynomial-threshold function. Some results for AC^0 functions are derived.
https://authors.library.caltech.edu/records/ebydz-kpv88
On the Convergence Properties of the Hopfield Model
https://resolver.caltech.edu/CaltechAUTHORS:20120426-132042598
Authors: Bruck, Jehoshua
Year: 1990
DOI: 10.1109/5.58341
The main contribution of the present work is showing that the known convergence properties of the Hopfield model can be reduced to a very simple case, for which an elementary proof is provided. The convergence properties of the Hopfield model depend on the structure of the interconnection matrix W and the method by which the nodes are updated. Three cases are known: (1) convergence to a stable state when operating in a serial mode with symmetric W, (2) convergence to a cycle of length at most 2 when operating in a fully parallel mode with symmetric W, and (3) convergence to a cycle of length 4 when operating in a fully parallel mode with antisymmetric W. The three known results are reviewed, and it is proven that the fully parallel mode of operation is a special case of the serial mode of operation. There are three more cases that can be considered using this characterization: serial mode of operation with antisymmetric W; serial mode of operation with arbitrary W; and fully parallel mode of operation with arbitrary W. By exhibiting exponential lower bounds on the length of the cycles in these cases, it is proven that the three known cases are the only interesting ones.
https://authors.library.caltech.edu/records/m7hdv-r5t49
Neural computation of arithmetic functions
https://resolver.caltech.edu/CaltechAUTHORS:20120503-090033553
Authors: Siu, Kai-Yeung; Bruck, Jehoshua
Year: 1990
DOI: 10.1109/5.58350
A neuron is modeled as a linear threshold gate, and the network architecture considered is the layered feedforward network. It is shown how common arithmetic functions such as multiplication and sorting can be efficiently computed in a shallow neural network. Some known results are improved by showing that the product of two n-bit numbers and sorting of n n-bit numbers can be computed by a polynomial-size neural network using only four and five unit delays, respectively. Moreover, the weights of each threshold element in the neural networks require O(log n)-bit (instead of n-bit) accuracy. These results can be extended to more complicated functions such as multiple products, division, rational functions, and approximation of analytic functions.
https://authors.library.caltech.edu/records/e9eyf-cfj77
On the Power of Threshold Circuits with Small Weights
https://resolver.caltech.edu/CaltechAUTHORS:20120424-103938302
Authors: Siu, Kai-Yeung; Bruck, Jehoshua
Year: 1991
DOI: 10.1109/ISIT.1991.695138
Linear threshold elements (LTEs) are the basic processing elements in artificial neural networks. An LTE computes a function that is the sign of a weighted sum of the input variables. The weights are arbitrary integers; in fact, they can be very big, exponential in the number of input variables. In practice, however, it is very difficult to implement big weights. So a natural question is whether there is an efficient way to simulate a network of LTEs with big weights by a network of LTEs with small weights. We prove the following results: 1) every LTE with big weights can be simulated by a depth-3, polynomial-size network of LTEs with small weights; 2) every depth-d, polynomial-size network of LTEs with big weights can be simulated by a depth-(2d+1), polynomial-size network of LTEs with small weights. To prove these results, we use tools from harmonic analysis of Boolean functions. Our technique is quite general and provides insights into other problems. For example, we were able to improve the best known results on the depth of a network of threshold elements that computes the COMPARISON, ADDITION, and PRODUCT of two n-bit numbers, and the MAXIMUM and SORTING of n n-bit numbers.
https://authors.library.caltech.edu/records/cfm27-k7g38
New Techniques For Constructing EC/AUED Codes
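A quick toy illustration of the "big weights" phenomenon this abstract addresses (my own example, not the paper's construction): a single LTE can compute COMPARISON of two n-bit numbers, but the natural choice of weights is 2^i, which grows exponentially in n. The paper's results show such a gate can be simulated in small depth with only polynomially bounded weights.

```python
# A single linear threshold element computing COMPARISON(x, y) = [x >= y]
# on n-bit numbers. The weights 2^i are exponential in n, illustrating
# why "small weight" simulations are of interest.
def comparison_lte(x_bits, y_bits):
    """Sign of sum_i 2^i * (x_i - y_i); bits are least-significant first."""
    s = sum((2 ** i) * (xb - yb) for i, (xb, yb) in enumerate(zip(x_bits, y_bits)))
    return 1 if s >= 0 else 0

print(comparison_lte([1, 0, 1], [0, 1, 1]))  # 5 >= 6 ? -> 0
print(comparison_lte([1, 1, 1], [0, 1, 1]))  # 7 >= 6 ? -> 1
```

Any weight assignment for a single gate computing COMPARISON must grow exponentially, which is what motivates the depth-3 small-weight simulation stated in the abstract.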
https://resolver.caltech.edu/CaltechAUTHORS:20120418-090627310
Authors: Bruck, Jehoshua; Blaum, Mario
Year: 1991
DOI: 10.1109/ISIT.1991.695194
We present two new techniques for constructing t-EC/AUED codes. The combination of the two techniques reduces the total redundancy of the best constructions by one bit or more in many cases.
https://authors.library.caltech.edu/records/10mas-ed427
Harmonic Analysis And The Complexity Of Computing With Threshold (Neural) Elements
https://resolver.caltech.edu/CaltechAUTHORS:20120417-094637010
Authors: Bruck, Jehoshua; Smolensky, Roman
Year: 1991
DOI: 10.1109/ISIT.1991.695142
The main purpose of this talk is to introduce a useful tool for the analysis of discrete neural networks in which every node is a Boolean threshold gate. The difficulty in the analysis of neural networks arises from the fact that the basic processing elements (linear threshold gates) are nonlinear. The key idea in harmonic analysis of threshold functions is to represent the functions as polynomials over the field of real numbers. Answering different questions regarding neural networks then becomes equivalent to answering questions related to the coefficients of these polynomials. We have applied these techniques and obtained many interesting and surprising results [1, 2, 3, 4]. The focus of this talk will be on presenting a theorem that characterizes, using spectral norms, the complexity of computing a Boolean function with threshold circuits [2, 3]. This result establishes the first known link between harmonic analysis and the complexity of computing with neural networks.
https://authors.library.caltech.edu/records/34v3g-9e396
Construction of asymptotically good low-rate error-correcting codes through pseudo-random graphs
https://resolver.caltech.edu/CaltechAUTHORS:ALOisit91
Authors: Alon, Noga; Bruck, Jehoshua; Naor, Joseph; Naor, Moni; Roth, Ron M.
Year: 1991
A new technique, based on the pseudo-random properties of certain graphs, known as expanders, is used to obtain new simple explicit constructions of asymptotically good codes.
https://authors.library.caltech.edu/records/erbp5-vqp12
Fault-tolerant meshes with minimal numbers of spares
https://resolver.caltech.edu/CaltechAUTHORS:BRUispdp91
Authors: Bruck, Jehoshua; Cypher, Robert; Ho, Ching-Tien
Year: 1991
DOI: 10.1109/SPDP.1991.218267
This paper presents several techniques for adding fault-tolerance to distributed memory parallel computers. More formally, given a target graph with n nodes, we create a fault-tolerant graph with n + k nodes such that given any set of k or fewer faulty nodes, the remaining graph is guaranteed to contain the target graph as a fault-free subgraph. As a result, any algorithm designed for the target graph will run with no slowdown in the presence of k or fewer node faults, regardless of their distribution. We present fault-tolerant graphs for target graphs which are 2-dimensional meshes, tori, eight-connected meshes and hexagonal meshes. In all cases our fault-tolerant graphs have smaller degree than any previously known graphs with the same properties.
https://authors.library.caltech.edu/records/6475m-mqy50
Construction of asymptotically good low-rate error-correcting codes through pseudo-random graphs
https://resolver.caltech.edu/CaltechAUTHORS:ALOieeetit92
Authors: Alon, Noga; Bruck, Jehoshua; Naor, Joseph; Naor, Moni; Roth, Ron M.
Year: 1992
DOI: 10.1109/18.119713
A novel technique, based on the pseudo-random properties of certain graphs known as expanders, is used to obtain simple explicit constructions of asymptotically good codes. In one of the constructions, the expanders are used to enhance Justesen codes by replicating, shuffling, and then regrouping the code coordinates. For any fixed (small) rate, and for a sufficiently large alphabet, the codes thus obtained lie above the Zyablov bound. Using these codes as outer codes in a concatenated scheme, a second asymptotically good construction is obtained which applies to small alphabets (say, GF(2)) as well. Although these concatenated codes lie below the Zyablov bound, they are still superior to previously known explicit constructions in the zero-rate neighborhood.
https://authors.library.caltech.edu/records/750gt-mvs87
Tolerating faults in hypercubes using subcube partitioning
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetc92a
Authors: Bruck, Jehoshua; Cypher, Robert; Soroker, Danny
Year: 1992
DOI: 10.1109/12.142686
We examine the issue of running algorithms on a hypercube which has both node and edge faults, and we assume a worst case distribution of the faults. We prove that for any constant c, an n-dimensional hypercube (n-cube) with n^c faulty components contains a fault-free subgraph that can implement a large class of hypercube algorithms with only a constant factor slowdown. In addition, our approach yields practical implementations for small numbers of faults. For example, we show that any regular algorithm can be implemented on an n-cube that has at most n-1 faults with slowdowns of at most 2 for computation and at most 4 for communication.
To the best of our knowledge, this is the first result showing that an n-cube can tolerate more than O(n) arbitrarily placed faults with a constant factor slowdown.
https://authors.library.caltech.edu/records/ygryd-nqj63
New techniques for constructing EC/AUED codes
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetc92b
Authors: Bruck, Jehoshua; Blaum, Mario
Year: 1992
DOI: 10.1109/12.166607
The most common method to construct a t-error correcting/all unidirectional error detecting (EC/AUED) code is to choose a t-error correcting (EC) code and then to append a tail in such a way that the new code can detect more than t errors when they are unidirectional. The tail is a function of the weight of the codeword.
We present two new techniques for constructing t-EC/AUED codes. The first technique modifies the t-EC code in such a way that the weight distribution of the original code is reduced. So, a smaller tail is needed. Frequently, this technique gives less overall redundancy than the best available t-EC/AUED codes.
https://authors.library.caltech.edu/records/4qk7p-k9813
Fault tolerant graphs, perfect hash functions and disjoint paths
https://resolver.caltech.edu/CaltechAUTHORS:ATJfocs92
Authors: Ajtai, M.; Alon, N.; Bruck, J.; Cypher, R.; Ho, C.T.; Naor, M.; Szemerédi, E.
Year: 1992
DOI: 10.1109/SFCS.1992.267781
Given a graph G on n nodes, the authors say that a graph T on n + k nodes is a k-fault-tolerant version of G if one can embed G in any n-node induced subgraph of T. Thus T can sustain k faults and still emulate G without any performance degradation. They show that for a wide range of values of n, k, and d, for any graph on n nodes with maximum degree d there is a k-fault-tolerant graph with maximum degree O(kd). They provide lower bounds as well: there are graphs G with maximum degree d such that any k-fault-tolerant version of them has maximum degree at least Ω(d√k).
https://authors.library.caltech.edu/records/nbkty-71x39
Tolerating faults in a mesh with a row of spare nodes
https://resolver.caltech.edu/CaltechAUTHORS:BRUispdp92a
Authors: Bruck, Jehoshua; Cypher, Robert; Ho, Ching-Tien
Year: 1992
DOI: 10.1109/SPDP.1992.242768
We present an efficient method for tolerating faults in a two-dimensional mesh architecture. Our approach is based on adding spare components (nodes) and extra links (edges) such that the resulting architecture can be reconfigured as a mesh in the presence of faults. We optimize the cost of the fault-tolerant mesh architecture by adding about one row of redundant nodes in addition to a set of k spare nodes (while tolerating up to k node faults) and minimizing the number of links per node. Our results are surprisingly efficient and seem to be practical for small values of k. The degree of the fault-tolerant architecture is k + 5 for odd k, and k + 6 for even k. Our results can be generalized to d-dimensional meshes such that the number of spare nodes is less than the length of the shortest axis plus k, and the degree of the fault-tolerant mesh is (d-1)k+d+3 when k is odd and (d-1)k+2d+2 when k is even.
https://authors.library.caltech.edu/records/5ngvx-9te20
Multiple message broadcasting with generalized Fibonacci trees
https://resolver.caltech.edu/CaltechAUTHORS:BRUispdp92b
Authors: Bruck, Jehoshua; Cypher, Robert; Ho, Ching-Tien
Year: 1992
DOI: 10.1109/SPDP.1992.242714
We present efficient algorithms for broadcasting multiple messages. We assume n processors, one of which contains m packets that it must broadcast to each of the remaining n - 1 processors. The processors communicate in rounds. In one round each processor is able to send one packet to any other processor and receive one packet from any other processor. We give a broadcasting algorithm which requires m + log n + 3 log log n + 15 rounds. In addition, we show a simple lower bound of m + ⌈log n⌉ - 1 rounds for broadcasting in this model.
https://authors.library.caltech.edu/records/qhpp8-43j98
Coding for skew-tolerant parallel asynchronous communications
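Plugging concrete numbers into the two bounds quoted in this abstract shows how close the algorithm's round count is to the lower bound. The arithmetic below is my own; the abstract does not specify rounding, so taking logarithms base 2 with ceilings is an assumption.

```python
# Round counts for broadcasting m packets to n processors in the
# one-port round model, using the bounds quoted in the abstract.
import math

def upper_bound_rounds(m, n):
    # Algorithm's bound: m + log n + 3 log log n + 15 (ceilings assumed).
    return m + math.ceil(math.log2(n)) + 3 * math.ceil(math.log2(math.log2(n))) + 15

def lower_bound_rounds(m, n):
    # Simple lower bound: m + ceil(log2 n) - 1.
    return m + math.ceil(math.log2(n)) - 1

print(lower_bound_rounds(100, 1024), upper_bound_rounds(100, 1024))  # 109 137
```

For m = 100 packets and n = 1024 processors the gap between the bounds is only the additive 3 log log n + 16 term, i.e., the algorithm is within a small additive slack of optimal.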
https://resolver.caltech.edu/CaltechAUTHORS:BLAieeetit93a
Authors: Blaum, Mario; Bruck, Jehoshua
Year: 1993
DOI: 10.1109/18.212269
A communication channel consisting of several subchannels transmitting simultaneously and asynchronously is considered, an example being a board with several chips, where the subchannels are wires connecting the chips and differences in the lengths of the wires can result in asynchronous reception. A scheme that allows transmission without an acknowledgment of the message, therefore permitting pipelined communication and providing a higher bandwidth, is described. The scheme allows a certain number of transitions from a second message to arrive before reception of the current message has been completed, a condition called skew. Necessary and sufficient conditions for codes that can detect skew as well as for codes that are skew-tolerant, i.e. can correct the skew and allow continuous operation, are derived. Codes that satisfy the necessary and sufficient conditions are constructed, their optimality is studied, and efficient decoding algorithms are devised. Potential applications of the scheme are in on-chip, on-board, and board to board communications, enabling much higher communication bandwidth.
https://authors.library.caltech.edu/records/csasz-fbr13
Depth Efficient Neural Networks for Division and Related Problems
https://resolver.caltech.edu/CaltechAUTHORS:20120309-113620511
Authors: Siu, Kai-Yeung; Bruck, Jehoshua; Kailath, Thomas; Hofmeister, Thomas
Year: 1993
DOI: 10.1109/18.256501
An artificial neural network (ANN) is commonly modeled by a threshold circuit, a network of interconnected processing units called linear threshold gates. The depth of a circuit represents the number of unit delays or the time for parallel computation. The size of a circuit is the number of gates and measures the amount of hardware. It was known that traditional logic circuits consisting of only unbounded fan-in AND, OR, and NOT gates require at least Ω(log n/log log n) depth to compute common arithmetic functions such as the product or the quotient of two n-bit numbers, if the circuit size is polynomially bounded (in n). It is shown that ANNs can be much more powerful than traditional logic circuits, assuming that each threshold gate can be built with a cost that is comparable to that of AND/OR logic gates. In particular, the main results show that powering and division can be computed by polynomial-size ANNs of depth 4, and multiple product can be computed by polynomial-size ANNs of depth 5. Moreover, using the techniques developed here, a previous result can be improved by showing that the sorting of n n-bit numbers can be carried out in a depth-3, polynomial-size ANN. Furthermore, it is shown that the sorting network is optimal in depth.
https://authors.library.caltech.edu/records/qprpj-d1a77
Unordered Error-Correcting Codes and their Applications
https://resolver.caltech.edu/CaltechAUTHORS:20120309-145816573
Authors: Blaum, Mario; Bruck, Jehoshua
Year: 1993
DOI: 10.1109/FTCS.1992.243585
We give efficient constructions for error-correcting unordered (ECU) codes, i.e., codes such that any pair of codewords are at a certain minimal distance apart and at the same time are unordered. These codes are used for detecting a predetermined number of (symmetric) errors and for detecting all unidirectional errors. We also give an application in parallel asynchronous communications.
https://authors.library.caltech.edu/records/pka3c-chr23
Fault-tolerant meshes and hypercubes with minimal numbers of spares
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetc93a
Authors: Bruck, Jehoshua; Cypher, Robert; Ho, Ching-Tien
Year: 1993
DOI: 10.1109/12.241598
Many parallel computers consist of processors connected in the form of a d-dimensional mesh or hypercube. Two- and three-dimensional meshes have been shown to be efficient in manipulating images and dense matrices, whereas hypercubes have been shown to be well suited to divide-and-conquer algorithms requiring global communication. However, even a single faulty processor or communication link can seriously affect the performance of these machines.
This paper presents several techniques for tolerating faults in d-dimensional mesh and hypercube architectures. Our approach consists of adding spare processors and communication links so that the resulting architecture will contain a fault-free mesh or hypercube in the presence of faults. We optimize the cost of the fault-tolerant architecture by adding exactly k spare processors (while tolerating up to k processor and/or link faults) and minimizing the maximum number of links per processor. For example, when the desired architecture is a d-dimensional mesh and k = 1, we present a fault-tolerant architecture that has the same maximum degree as the desired architecture (namely, 2d) and has only one spare processor. We also present efficient layouts for fault-tolerant two- and three-dimensional meshes, and show how multiplexers and buses can be used to reduce the degree of fault-tolerant architectures. Finally, we give constructions for fault-tolerant tori, eight-connected meshes, and hexagonal meshes.
https://authors.library.caltech.edu/records/tgzxt-6qy80
Constructions of skew-tolerant and skew-detecting codes
https://resolver.caltech.edu/CaltechAUTHORS:BLAieeetit93b
Authors: Blaum, Mario; Bruck, Jehoshua; Khachatrian, Levon H.
Year: 1993
DOI: 10.1109/18.259671
The paradigm of skew-tolerant parallel asynchronous communication was introduced by Blaum and Bruck (see ibid., vol. 39, 1993) along with constructions for codes that can tolerate or detect skew. Some of these constructions were improved by Khachatrian (1991). In this paper these constructions are improved upon further, and the authors prove that the new constructions are, in a certain sense, optimal.
https://authors.library.caltech.edu/records/2msng-t0y47
Performance Optimization of Checkpointing Schemes with Task Duplication
https://resolver.caltech.edu/CaltechPARADISE:1994.ETR004
Authors: Ziv, Avi; Bruck, Jehoshua
Year: 1994
Checkpointing schemes enable fault-tolerant parallel and distributed computing by leveraging the redundancy in hardware and software resources. In these systems, checkpointing serves two purposes: it helps in detecting faults by comparing the processors' states at checkpoints, and it reduces fault recovery time by supplying a safe point to roll back to. The efficiency of checkpointing schemes is influenced by the time it takes to perform the comparisons and to store the states. The fact that checkpoints consist of both storing of states and comparison between states, with conflicting objectives regarding the frequency of those operations, limits the performance of current checkpointing schemes.
In this paper we show that by tuning the checkpointing schemes to a given architecture, a significant reduction in the execution time can be achieved. We present both analytical results and experimental results that were obtained on a cluster of workstations and on a parallel computer.
The main idea is to use two types of checkpoints: compare-checkpoints (comparing the states of the redundant processes to detect faults) and store-checkpoints (storing the states to reduce recovery time). With two types of checkpoints, we can use both the comparison and storage operations in an efficient way and improve the performance of checkpointing schemes. As a particular example of this approach, we analyzed the DMR checkpointing scheme with store and compare checkpoints on two types of architectures: one where the comparison time is much higher than the store time (like a cluster of workstations connected by a LAN) and one where the store time is much higher than the comparison time (like the Intel Paragon supercomputer). We have implemented a prototype of the new DMR schemes and run it on workstations connected by a LAN and on the Intel Paragon supercomputer. The experimental results we obtained match the analytical results and show that in some cases the overhead of the DMR checkpointing schemes on both architectures can be improved by as much as 40%.
https://authors.library.caltech.edu/records/j8hc8-cfd93
Fault-Tolerant Meshes with Small Degree
https://resolver.caltech.edu/CaltechPARADISE:1994.ETR001
Authors: Bruck, Jehoshua; Cypher, Robert; Ho, Ching-Tien
Year: 1994
This paper presents constructions for fault-tolerant two-dimensional mesh architectures. The constructions are designed to tolerate k faults while maintaining a healthy n-by-n mesh as a subgraph. They utilize several novel techniques for obtaining trade-offs between the number of spare nodes and the degree of the fault-tolerant network.
We consider both worst-case and random fault distributions. In terms of worst-case faults, we give a construction that has constant degree and O(k^3) spare nodes. This is the first construction known in which the degree is constant and the number of spare nodes is independent of n. In terms of random faults, we present several new degree-6 and degree-8 constructions and show (both analytically and through simulations) that they can tolerate large numbers of randomly placed faults.
https://authors.library.caltech.edu/records/nc7m6-vay05
Analysis of Checkpointing Schemes for Multiprocessor Systems
https://resolver.caltech.edu/CaltechPARADISE:1994.ETR003
Authors: Ziv, Avi; Bruck, Jehoshua
Year: 1994
Parallel computing systems provide hardware redundancy that helps to achieve low cost fault-
tolerance. Fault-tolerance is achieved, in those systems, by duplicating the task into more than one
processor, and comparing the states of the processors at checkpoints. Many schemes that achieve
fault tolerance exist, and most of them use checkpointing to reduce the time spent retrying a task.
Performance evaluation for most of the schemes either relies on simulation results, or uses a simplified
fault model.
This paper suggests a novel technique, based on a Markov Reward Model (MRM), for analyzing the
performance of checkpointing schemes for fault-tolerance. We show how this technique can be used to
derive the average execution time of a task and other important parameters related to the performance
of checkpointing schemes. Our analytical results agree well with the values we obtained using a simulation
program.
We compare the average task completion time and total work of four checkpointing schemes, TMR,
DMR-B-2, DMR-F-1 and RFCS. We show that generally increasing the number of processors reduces
the average completion time, but increases the total work done by the processors. For example, the TMR
scheme, which uses three processors, is the quickest but does the most work, while the DMR-B-2 scheme,
which uses only two processors, is the slowest of the four schemes but does the least work. However,
in cases where there is a big difference between the time it takes to perform different operations, those
results can change. For example, when we assume that the schemes are implemented on workstations
connected by a LAN and the time to move data between workstations is relatively long, the DMR-B-2
scheme can become quicker than the TMR scheme.https://authors.library.caltech.edu/records/aqapz-4np57A Note on "A Systematic (12,8) Code for Correcting Single Errors and Detecting Adjacent Errors"
https://resolver.caltech.edu/CaltechAUTHORS:BLAieeetc94
Authors: Blaum, Mario; Bruck, Jehoshua; Tolhuizen, Ludo
Year: 1994
DOI: 10.1109/12.250619
J.W. Schwartz and J.K. Wolf (ibid., vol. 39, no. 11, pp. 1403-1404, Nov. 1990) gave a parity check matrix for a systematic (12,8) binary code that corrects all single errors and detects eight of the nine double adjacent errors within any of the three 4-bit nibbles. We present a parity check matrix for a systematic (12,8) binary code that corrects all single errors and detects any pair of errors within a nibble.https://authors.library.caltech.edu/records/3dcp9-bvy40Fault-tolerant de Bruijn and shuffle-exchange networks
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetpds94
Authors: Bruck, Jehoshua; Cypher, Robert; Ho, Ching-Tien
Year: 1994
DOI: 10.1109/71.282566
This paper addresses the problem of creating a fault-tolerant interconnection network for a parallel computer. Three topologies, namely, the base-2 de Bruijn graph, the base-m de Bruijn graph, and the shuffle-exchange, are studied. For each topology an N+k node fault-tolerant graph is defined. These fault-tolerant graphs have the property that given any set of k node faults, the remaining N nodes contain the desired topology as a subgraph. All of the constructions given are the best known in terms of the degree of the fault-tolerant graph. We also investigate the use of buses to reduce the degrees of the fault-tolerant graphs still further.https://authors.library.caltech.edu/records/x2g2k-n9975Efficient checkpointing over local area networks
https://resolver.caltech.edu/CaltechAUTHORS:ZIVftpds94
Authors: Ziv, Avi; Bruck, Jehoshua
Year: 1994
DOI: 10.1109/FTPDS.1994.494471
Parallel and distributed computing on clusters of workstations is becoming very popular as it provides a cost-effective way for high performance computing. In these systems, the bandwidth of the communication subsystem (using Ethernet technology) is about an order of magnitude smaller than the bandwidth of the storage subsystem. Hence, storing a state in a checkpoint is much more efficient than comparing states over the network.
In this paper we present a novel checkpointing approach that enables efficient performance over local area networks. The main idea is that we use two types of checkpoints: compare-checkpoints (comparing the states of the redundant processes to detect faults) and store-checkpoints (where the state is only stored). The store-checkpoints reduce the rollback needed after a fault is detected, without performing many unnecessary comparisons.
As a particular example of this approach we analyzed the DMR checkpointing scheme with store-checkpoints. Our main result is that the overhead of the execution time can be significantly reduced when store-checkpoints are introduced. We have implemented a prototype of the new DMR scheme and run it on workstations connected by a LAN. The experimental results we obtained match the analytical results and show that in some cases the overhead of the DMR checkpointing schemes over LANs can be reduced by as much as 20%.https://authors.library.caltech.edu/records/1sc9z-98g70Embedding cube-connected cycles graphs into faulty hypercubes
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeeetc94
Authors: Bruck, Jehoshua; Cypher, Robert; Soroker, Danny
Year: 1994
DOI: 10.1109/12.324546
We consider the problem of embedding a cube-connected cycles graph (CCC) into a hypercube with edge faults. Our main result is an algorithm that, given a list of faulty edges, computes an embedding of the CCC that spans all of the nodes and avoids all of the faulty edges. The algorithm has optimal running time and tolerates the maximum number of faults (in a worst-case setting). Because ascend-descend algorithms can be implemented efficiently on a CCC, this embedding enables the implementation of ascend-descend algorithms, such as bitonic sort, on hypercubes with edge faults. We also present a number of related results, including an algorithm for embedding a CCC into a hypercube with edge and node faults and an algorithm for embedding a spanning torus into a hypercube with edge faults.https://authors.library.caltech.edu/records/6qna5-eqy71Analysis of checkpointing schemes for multiprocessor systems
https://resolver.caltech.edu/CaltechAUTHORS:ZIVreldis94
Authors: Ziv, Avi; Bruck, Jehoshua
Year: 1994
DOI: 10.1109/RELDIS.1994.336909
Parallel computing systems provide hardware redundancy that helps to achieve low cost fault-tolerance, by duplicating the task into more than a single processor, and comparing the states of the processors at checkpoints. This paper suggests a novel technique, based on a Markov Reward Model (MRM), for analyzing the performance of checkpointing schemes with task duplication. We show how this technique can be used to derive the average execution time of a task and other important parameters related to the performance of checkpointing schemes. Our analytical results match well the values we obtained using a simulation program. We compare the average task execution time and total work of four checkpointing schemes, and show that generally increasing the number of processors reduces the average execution time, but increases the total work done by the processors. However, in cases where there is a big difference between the time it takes to perform different operations, those results can change.https://authors.library.caltech.edu/records/f7ez8-2s723Wildcard dimensions, coding theory and fault-tolerant meshes and hypercubes
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetc95
Authors: Bruck, Jehoshua; Cypher, Robert; Ho, Ching-Tien
Year: 1995
DOI: 10.1109/12.367998
Hypercubes, meshes and tori are well known interconnection networks for parallel computers. The sets of edges in those graphs can be partitioned into dimensions. It is well known that the hypercube can be extended by adding a wildcard dimension, resulting in a folded hypercube that has better fault-tolerant and communication capabilities. First we prove that the folded hypercube is optimal in the sense that only a single wildcard dimension can be added to the hypercube. We then investigate the idea of adding wildcard dimensions to d-dimensional meshes and tori. Using techniques from error correcting codes we construct d-dimensional meshes and tori with wildcard dimensions. Finally, we show how these constructions can be used to tolerate edge and node faults in mesh and torus networks.https://authors.library.caltech.edu/records/n6qsv-wbx29On Neural Networks with Minimal Weights
https://resolver.caltech.edu/CaltechPARADISE:1995.ETR005
Authors: Bohossian, Vasken; Bruck, Jehoshua
Year: 1995
Linear threshold elements are the basic building blocks of artificial
neural networks. A linear threshold element computes a function
that is a sign of a weighted sum of the input variables. The weights
are arbitrary integers; in fact, they can be very large
integers, exponential in the number of the input variables. However, in
practice, it is difficult to implement big weights. In the present
literature a distinction is made between the two extreme cases:
linear threshold functions with polynomial-size weights as opposed
to those with exponential-size weights. The main contribution of
this paper is to fill the gap by further refining that separation.
Namely, we prove that the class of linear threshold functions with
polynomial-size weights can be divided into subclasses according
to the degree of the polynomial. In fact we prove a more general
result: that there exists a minimal-weight linear threshold function
for any arbitrary number of inputs and any weight size. To prove
those results we have developed a novel technique for constructing
linear threshold functions with minimal weights.https://authors.library.caltech.edu/records/xx8aa-yck89Interleaving Schemes for Multidimensional Cluster Errors
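The basic object in the abstract above, and the weight-size distinction it refines, can be made concrete with a small sketch (ours, not the paper's construction): MAJORITY is a linear threshold function with constant weights, while COMPARISON of two n-bit numbers is naturally realized with weights exponential in n.

```python
# Illustrative sketch (not from the paper): a linear threshold element
# outputs the sign of a weighted sum of its inputs.
def lt_element(weights, inputs, threshold=0):
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= threshold else 0

# MAJORITY of n bits: all weights equal 1 (small, polynomial-size weights).
def majority(bits):
    return lt_element([1] * len(bits), bits, threshold=(len(bits) + 1) // 2)

# COMPARISON X >= Y of two n-bit numbers (bits little-endian): weights
# 2^i and -2^i, i.e. weights exponential in n, the other extreme.
def geq(x_bits, y_bits):
    n = len(x_bits)
    weights = [2 ** i for i in range(n)] + [-(2 ** i) for i in range(n)]
    return lt_element(weights, x_bits + y_bits, threshold=0)
```

The paper's question is how small such weights can be made for a given function; this sketch only exhibits the two extremes.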
https://resolver.caltech.edu/CaltechPARADISE:1995.ETR008
Authors: Blaum, Mario; Bruck, Jehoshua
Year: 1995
We present 2 and 3-dimensional interleaving techniques for correcting 2 and 3-
dimensional bursts (or clusters) of errors, where a cluster of errors is characterized by its
area or volume. A recent application of correction of 2-dimensional clusters appeared
in the context of holographic storage. Our main contribution is the construction of
efficient 2 and 3-dimensional interleaving schemes. The schemes are based on arrays of
integers with the property that every connected component of area or volume t consists
of distinct integers (we call these t-interleaved arrays). In the 2-dimensional case, our
constructions are optimal in the sense that they contain the smallest possible number
of distinct integers, hence minimizing the number of codes required in an interleaving
scheme.https://authors.library.caltech.edu/records/h144c-x3927Fault-Tolerant Cube Graphs and Coding Theory
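As a toy instance of the t-interleaved arrays described above (our illustration; the paper constructs optimal schemes for general t), for t = 2 a checkerboard labeling with two integers suffices, since every connected component of area 2 is a pair of edge-adjacent cells. A brute-force checker makes the property explicit:

```python
# Toy example (ours): a 2-interleaved array via checkerboard labeling.
def checkerboard(rows, cols):
    return [[(r + c) % 2 for c in range(cols)] for r in range(rows)]

def is_t2_interleaved(a):
    """Check that every pair of edge-adjacent cells holds distinct labels,
    i.e. every connected component of area 2 consists of distinct integers."""
    rows, cols = len(a), len(a[0])
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols and a[r][c] == a[rr][cc]:
                    return False
    return True
```

For larger t the number of distinct integers must grow, and the paper's contribution is schemes that achieve the smallest possible number.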
https://resolver.caltech.edu/CaltechPARADISE:1995.ETR007
Authors: Bruck, Jehoshua; Ho, Ching-Tien
Year: 1995
Hypercubes, meshes, tori and Omega networks are well known interconnection
networks for parallel computers. The structure of those graphs can be described in a
more general framework called cube graphs. The idea is to assume that every node in
a graph with q^l nodes is represented by a unique string of l symbols over GF(q). The edges are specified by a set of offsets, which are vectors of length l over GF(q), where the two endpoints of an edge are an offset apart. We study techniques for tolerating edge faults in cube graphs that are based on adding redundant edges. The redundant
graph has the property that the structure of the original graph can be maintained
in the presence of edge faults. Our main contribution is a technique for adding the
redundant edges that utilizes constructions of error-correcting codes and generalizes
existing ad-hoc techniques.https://authors.library.caltech.edu/records/1fv46-acs93An Online Algorithm for Checkpointing Placement
https://resolver.caltech.edu/CaltechPARADISE:1995.ETR006
Authors: Ziv, Avi; Bruck, Jehoshua
Year: 1995
Checkpointing is a common technique for reducing the
time to recover from faults in computer systems. By saving
intermediate states of programs in a reliable storage,
checkpointing makes it possible to reduce the lost processing time caused
by faults. The length of the intervals between checkpoints
affects the execution time of programs. Long intervals lead
to long re-processing time, while too frequent checkpointing
leads to high checkpointing overhead. In this paper we
present an on-line algorithm for placement of checkpoints.
The algorithm uses on-line knowledge of the current cost
of a checkpoint when it decides whether or not to place a
checkpoint. We show how the execution time of a program
using this algorithm can be analyzed. The total overhead of
the execution time when the proposed algorithm is used is
smaller than the overhead when fixed intervals are used.
Although the proposed algorithm uses only on-line knowledge
about the cost of checkpointing, its behavior is close to the off-line optimal algorithm that uses a complete knowledge
of checkpointing cost.https://authors.library.caltech.edu/records/7w58g-man58EVENODD: An Efficient Scheme for Tolerating Double Disk Failures in RAID Architectures
https://resolver.caltech.edu/CaltechAUTHORS:20120216-065736330
Authors: Blaum, Mario; Brady, Jim; Bruck, Jehoshua; Menon, Jai
Year: 1995
DOI: 10.1109/12.364531
We present a novel method, which we call EVENODD, for tolerating up to two disk failures in RAID architectures. EVENODD employs the addition of only two redundant disks and consists of simple exclusive-OR computations. This redundant storage is optimal, in the sense that two failed disks cannot be retrieved with fewer than two redundant disks. A major advantage of EVENODD is that it only requires parity hardware, which is typically present in standard RAID-5 controllers. Hence, EVENODD can be implemented on standard RAID-5 controllers without any hardware changes. The most commonly used scheme that employs optimal redundant storage (i.e., two extra disks) is based on Reed-Solomon (RS) error-correcting codes. This scheme requires computation over finite fields and results in a more complex implementation. For example, we show that the complexity of implementing EVENODD in a disk array with 15 disks is about 50% of the one required when using the RS scheme. The new scheme is not limited to RAID architectures: it can be used in any system requiring large symbols and relatively short codes, for instance, in multitrack magnetic recording. To this end, we also present a decoding algorithm for one column (track) in error.https://authors.library.caltech.edu/records/4azcy-f4q81CCL: a portable and tunable collective communication library for scalable parallel computers
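The exclusive-OR structure mentioned in the abstract above can be sketched as follows (our reading of the published EVENODD construction; variable names are ours). For a prime p, the (p-1) x p information array receives a row-parity column and a diagonal-parity column, where S is the parity of one distinguished diagonal:

```python
def evenodd_encode(data, p):
    """Sketch of EVENODD encoding: data is a (p-1) x p bit array,
    p prime; returns a (p-1) x (p+2) array with two parity columns."""
    rows = p - 1
    arr = [row[:] + [0, 0] for row in data]
    # Column p: horizontal (row) parity.
    for i in range(rows):
        x = 0
        for j in range(p):
            x ^= data[i][j]
        arr[i][p] = x
    # S: parity of the distinguished diagonal (an imaginary row p-1
    # of all zeros is assumed below it).
    S = 0
    for j in range(1, p):
        S ^= data[p - 1 - j][j]
    # Column p+1: diagonal parity, adjusted by S.
    for i in range(rows):
        x = S
        for j in range(p):
            k = (i - j) % p
            if k < rows:          # the imaginary row contributes 0
                x ^= data[k][j]
        arr[i][p + 1] = x
    return arr
```

Any single erased information column is recoverable from the row-parity column alone; recovering two erased columns uses both parity columns and is the heart of the scheme.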
https://resolver.caltech.edu/CaltechAUTHORS:BALieeetpds95
Authors: Bala, Vasanth; Bruck, Jehoshua; Cypher, Robert; Elustondo, Pablo; Ho, Alex; Ho, Ching-Tien; Kipnis, Shlomo; Snir, Marc
Year: 1995
DOI: 10.1109/71.342126
A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface, efficient communication operations, and the advantage of portability. A library of this nature, the Collective Communication Library (CCL), intended for the line of scalable parallel computer products by IBM, has been designed. CCL is part of the parallel application programming interface of the recently announced IBM 9076 Scalable POWERparallel System 1 (SP1). In this paper, we examine several issues related to the functionality, correctness, and performance of a portable collective communication library while focusing on three novel aspects in the design and implementation of CCL: 1) the introduction of process groups, 2) the definition of semantics that ensures correctness, and 3) the design of new and tunable algorithms based on a realistic point-to-point communication model.https://authors.library.caltech.edu/records/t2dhr-6gm34PCODE: an efficient and reliable collective communication protocol for unreliable broadcast domain
https://resolver.caltech.edu/CaltechAUTHORS:BRUipps95
Authors: Bruck, Jehoshua; Dolev, Danny; Ho, Ching-Tien; Orni, Rimon; Strong, Ray
Year: 1995
DOI: 10.1109/IPPS.1995.395924
Existing programming environments for clusters are typically built on top of a point-to-point communication layer (send and receive) over local area networks (LANs) and, as a result, suffer from poor performance in the collective communication part. For example, a broadcast that is implemented using a TCP/IP protocol (which is a point-to-point protocol) over a LAN is obviously inefficient as it is not utilizing the fact that the LAN is a broadcast medium. We have observed that the main difference between a distributed computing paradigm and a message passing parallel computing paradigm is that, in a distributed environment the activity of every processor is independent while in a parallel environment the collection of the user-communication layers in the processors can be modeled as a single global program. We have formalized the requirements by defining the notion of a correct global program. This notion provides a precise specification of the interface between the transport layer and the user-communication layer. We have developed PCODE, a new communication protocol that is driven by a global program and proved its correctness.
We have implemented the PCODE protocol on a collection of IBM RS/6000 workstations and on a collection of Silicon Graphics Indigo workstations, both communicating via UDP broadcast. The experimental results we obtained indicate that the performance advantage of PCODE over the current point-to-point approach (TCP) can be as high as an order of magnitude on a cluster of 16 workstations.https://authors.library.caltech.edu/records/2rat0-mbt87Delay-insensitive pipelined communication on parallel buses
https://resolver.caltech.edu/CaltechAUTHORS:20120215-131718595
Authors: Blaum, Mario; Bruck, Jehoshua
Year: 1995
DOI: 10.1109/12.381951
Consider a communication channel that consists of several subchannels transmitting simultaneously and asynchronously. As an example of this scheme, we can consider a board with several chips. The subchannels represent wires connecting between the chips where differences in the lengths of the wires might result in asynchronous reception. In current technology, the receiver acknowledges reception of the message before the transmitter sends the following message. Namely, pipelined utilization of the channel is not possible. Our main contribution is a scheme that enables transmission without an acknowledgment of the message, therefore enabling pipelined communication and providing a higher bandwidth. However, our scheme allows for a certain number of transitions from a second message to arrive before reception of the current message has been completed, a condition that we call skew. We have derived necessary and sufficient conditions for codes that can tolerate a certain amount of skew among adjacent messages (therefore, allowing for continuous operation) and detect a larger amount of skew when the original skew is exceeded. These results generalize previously known results. We have constructed codes that satisfy the necessary and sufficient conditions, studied their optimality, and devised efficient decoding algorithms. To the best of our knowledge, this is the first known scheme that permits efficient asynchronous communications without acknowledgment. Potential applications are in on-chip, on-board, and board to board communications, enabling much higher communication bandwidth.https://authors.library.caltech.edu/records/7rdht-08e65Efficient Message Passing Interface (MPI) for Parallel Computing on Clusters of Workstations
https://resolver.caltech.edu/CaltechAUTHORS:20160811-162638038
Authors: Bruck, Jehoshua; Dolev, Danny; Ho, Ching-Tien; Roşu, Marcel-Cătălin
Year: 1995
DOI: 10.1145/215399.215421
Parallel computing on clusters of workstations and personal
computers has very high potential, since it leverages existing hardware and software. Parallel programming environments offer the user a convenient way to express parallel computation and communication. In fact, recently, a Message Passing Interface (MPI) has been proposed as an industrial standard for writing "portable" message-passing parallel programs. The communication part of MPI consists of
the usual point-to-point communication as well as collective
communication. However, existing implementations of programming environments for clusters are built on top of a
point-to-point communication layer (send and receive) over
local area networks (LANs) and, as a result, suffer from poor performance in the collective communication part.
In this paper, we present an efficient design and implementation of the collective communication part in MPI that is optimized for clusters of workstations. Our system consists of two main components: the MPI-CCL layer that includes the collective communication functionality of MPI
and a User-level Reliable Transport Protocol (URTP) that
interfaces with the LAN Data-link layer and leverages the
fact that the LAN is a broadcast medium. Our system is
integrated with the operating system via an efficient kernel
extension mechanism that we developed. The kernel
extension significantly improves the performance of our implementation as it can handle part of the communication
overhead without involving user space.
We have implemented our system on a collection of IBM
RS/6000 workstations connected via a 10 Mbit Ethernet LAN.
Our performance measurements are taken from typical scientific programs that run in a parallel mode by means of
the MPI. The hypothesis behind our design is that the system's
performance will be bounded by interactions between the
kernel and user space rather than by the bandwidth delivered
by the LAN Data-Link Layer. Our results indicate that
the performance of our MPI Broadcast (on top of Ethernet)
is about twice as fast as a recently published software implementation of broadcast on top of ATM.https://authors.library.caltech.edu/records/y0fve-83x68Computing global combine operations in the multiport postal model
https://resolver.caltech.edu/CaltechAUTHORS:BARieeetpds95
Authors: Bar-Noy, Amotz; Bruck, Jehoshua; Ho, Ching-Tien; Kipnis, Shlomo; Schieber, Baruch
Year: 1995
DOI: 10.1109/71.406965
Consider a message-passing system of n processors, in which each processor holds one piece of data initially. The goal is to compute an associative and commutative reduction function on the n pieces of data and to make the result known to all the n processors. This operation is frequently used in many message-passing systems and is typically referred to as global combine, census computation, or gossiping. This paper explores the problem of global combine in the multiport postal model. This model is characterized by three parameters: n, the number of processors; k, the number of ports per processor; and λ, the communication latency. In this model, in every round r, each processor can send k distinct messages to k other processors, and it can receive k messages that were sent from k other processors λ-1 rounds earlier. This paper provides an optimal algorithm for the global combine problem that requires the least number of communication rounds and minimizes the time spent by any processor in sending and receiving messages.https://authors.library.caltech.edu/records/da8ms-0fp39MDS Array Codes with Independent Parity Symbols
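For background on the model (our illustration; the recurrence below concerns plain broadcast, not the combine operation analyzed in the paper), the number of processors a broadcast can inform in the k-port postal model grows by a simple recurrence: messages sent in round r - lam arrive in round r, so P(r) = P(r-1) + k * P(r-lam).

```python
# Illustrative recurrence (ours) for broadcast capacity in the k-port
# postal model with latency lam.
def informed(t, k, lam):
    P = [1] * (t + 1)          # before any arrival, only the source knows
    for r in range(lam, t + 1):
        # everyone informed by r-1, plus k new recipients per processor
        # that was informed by round r - lam
        P[r] = P[r - 1] + k * P[r - lam]
    return P[t]
```

With lam = 1 this reduces to (k+1)-fold growth per round, and with k = 1, lam = 2 it yields the Fibonacci numbers, the two sanity checks usually quoted for this model.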
https://resolver.caltech.edu/CaltechAUTHORS:20120216-070547466
Authors: Blaum, Mario; Bruck, Jehoshua; Vardy, Alexander
Year: 1995
DOI: 10.1109/ISIT.1995.535761
A new family of maximum distance separable (MDS) array codes is presented. The code arrays contain p information columns and r independent parity columns, where p is a prime. We give necessary and sufficient conditions for our codes to be MDS, and then prove that if p belongs to a certain class of primes these conditions are satisfied up to r⩽8. We also develop efficient decoding procedures for the case of two and three column errors, and any number of column erasures. Finally, we present upper and lower bounds on the average number of parity bits which have to be updated in an MDS code over GF(2^m), following an update in a single information bit. We show that the upper bound obtained from our codes is close to the lower bound and does not depend on the size of the code symbols.https://authors.library.caltech.edu/records/rpfxf-ddw17On Neural Networks with Minimal Weights
https://resolver.caltech.edu/CaltechAUTHORS:20160223-114401229
Authors: Bohossian, Vasken; Bruck, Jehoshua
Year: 1996
Linear threshold elements are the basic building blocks of artificial neural networks. A linear threshold element computes a function that is a sign of a weighted sum of the input variables. The weights are arbitrary integers; in fact, they can be very large integers, exponential
in the number of the input variables. However, in
practice, it is difficult to implement big weights. In the present literature a distinction is made between the two extreme cases: linear threshold functions with polynomial-size weights as opposed to those with exponential-size weights. The main contribution of
this paper is to fill the gap by further refining that separation. Namely, we prove that the class of linear threshold functions with polynomial-size weights can be divided into subclasses according to the degree of the polynomial. In fact, we prove a more general result: that there exists a minimal-weight linear threshold function
for any arbitrary number of inputs and any weight size. To prove those results we have developed a novel technique for constructing linear threshold functions with minimal weights.https://authors.library.caltech.edu/records/fn4q8-j9y21An on-line algorithm for checkpoint placement
https://resolver.caltech.edu/CaltechAUTHORS:ZIVissre96
Authors: Ziv, Avi; Bruck, Jehoshua
Year: 1996
DOI: 10.1109/ISSRE.1996.558869
Checkpointing is a common technique for reducing the time to recover from faults in computer systems. By saving intermediate states of programs in a reliable storage, checkpointing makes it possible to reduce the lost processing time caused by faults. The length of the intervals between checkpoints affects the execution time of programs. Long intervals lead to long re-processing time, while too frequent checkpointing leads to high checkpointing overhead. In this paper we present an on-line algorithm for placement of checkpoints. The algorithm uses on-line knowledge of the current cost of a checkpoint when it decides whether or not to place a checkpoint. We show how the execution time of a program using this algorithm can be analyzed. The total overhead of the execution time when the proposed algorithm is used is smaller than the overhead when fixed intervals are used. Although the proposed algorithm uses only on-line knowledge about the cost of checkpointing, its behavior is close to the off-line optimal algorithm that uses a complete knowledge of checkpointing cost.https://authors.library.caltech.edu/records/mn9hj-yks12Optimal Constructions of Fault-Tolerant Multistage Interconnection Networks
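For context, the fixed-interval baseline that the abstract above compares against is often analyzed with the classic first-order model (this sketch is that textbook model, not the paper's on-line analysis; C, lam, and T are illustrative parameters): with checkpoint cost C, fault rate lam, and task length T, checkpoints every I time units cost about C*T/I in checkpoint overhead plus lam*T*I/2 in expected lost work, minimized at I = sqrt(2C/lam).

```python
import math

# Classic first-order fixed-interval model (background, not the paper's
# Markov Reward Model analysis).
def overhead(I, C, lam, T):
    # checkpointing cost + expected lost work (about half an interval
    # is redone per fault, with roughly lam * T faults in total)
    return C * T / I + lam * T * I / 2.0

def best_interval(C, lam):
    # minimizer of overhead(I, ...) with respect to I
    return math.sqrt(2.0 * C / lam)
```

An on-line algorithm as in the abstract adapts the interval to the currently observed checkpoint cost instead of fixing I in advance.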
https://resolver.caltech.edu/CaltechPARADISE:1996.ETR014
Authors: Fan, Charles C.; Bruck, Jehoshua
Year: 1996
In this paper we discover the family of Fault-Tolerant Multistage Interconnection Networks
(MINs) that tolerates switch faults with a minimal number of redundant switching stages.
While previously known constructions handled switch faults by eliminating complete stages,
our approach is to bypass faulty switches by utilizing redundant paths. As a result, we
are able to construct the first known fault-tolerant MINs that are optimal in the number
of redundant stages. Our fault model assumes that a faulty switch can be bypassed and
our goal is to guarantee arbitrary point-to-point and broadcast connectivity. Under this
model, we show that to tolerate f switch faults the MIN must have at least f redundant
stages. We then present an explicit construction of a MIN that meets this lower bound.
This construction repeatedly uses the singleton basis of the n-dimensional vector space as the
mask vectors of the MIN. We generalize this construction and prove that an n-dimensional
MIN is optimally fault-tolerant if and only if the mask vectors of every n consecutive stages
span the n-dimensional vector space.https://authors.library.caltech.edu/records/098qk-57131On Optimal Placements of Processors in Tori Networks
https://resolver.caltech.edu/CaltechPARADISE:1996.ETR012
Authors: Blaum, Mario; Bruck, Jehoshua; Pifarre, Gustavo ED.; Sanz, Jorge L. C.
Year: 1996
Two and three dimensional k-tori are among the most used topologies in the design of new
parallel computers. Traditionally (with the exception of the Tera parallel computer), these
networks have been used as fully-populated networks, in the sense that every routing node
in the topology is subjected to message injection. However, fully-populated tori and meshes
exhibit a theoretical throughput which degrades as the network size increases. In addition,
the performance of those networks is sensitive to link faults. In contrast, multistage networks
(that are partially populated) scale well with the network size. We propose to add slackness in
fully-populated tori by reducing the number of processors and we study optimal fault-tolerant
routing strategies for the resulting interconnections.
The key concept that we study is the average link load in an interconnection network with
a given placement and a routing algorithm, where a placement is the subset of the nodes in the
interconnection network that are attached to processors. Reducing the load on the links by the
choice of a placement and a routing algorithm leads to improvements in both the performance
and the fault tolerance of the communication system.
Our main contribution is the construction of optimal placements for 2 and 3-dimensional
k-tori networks and their corresponding routing algorithms. Those placements yield a linear (in
the number of processors) link load and are of optimal size.https://authors.library.caltech.edu/records/1g70n-67892Multiple Threshold Neural Logic
https://resolver.caltech.edu/CaltechPARADISE:1996.ETR010
Authors: Bohossian, Vasken; Bruck, Jehoshua
Year: 1996
We introduce a new Boolean computing element, related to the Boolean version of a
neural element. Instead of the sign function in the Boolean neural element (also known
as an LT element), it computes an arbitrary (with polynomially many transitions) Boolean
function of the weighted sum of its inputs. We call the new computing element an LTM
element, which stands for Linear Threshold with Multiple transitions.
The paper consists of the following main contributions related to our study of LTM
circuits: (i) the characterization of the computing power of LTM relative to LT circuits,
(ii) a proof that the area of the VLSI layout is reduced from O(n^2) in LT circuits to O(n) in LTM circuits for n-input symmetric Boolean functions, and (iii) the creation of efficient
designs of LTM circuits for the addition of multiple integers and the product
of two integers. In particular, we show how to compute the addition of m integers with a
single layer of LTM elements.https://authors.library.caltech.edu/records/npa6y-nyj57Efficient Digital to Analog Encoding
https://resolver.caltech.edu/CaltechPARADISE:1996.ETR009
Authors: Gibson, Michael A.; Bruck, Jehoshua
Year: 1996
NOTE: Text or symbols not renderable in plain ASCII are indicated by [...]. Abstract included in .pdf document.
An important issue in analog circuit design is the problem of digital to analog conversion,
namely, the encoding of Boolean variables into a single analog value which contains enough
information to reconstruct the values of the Boolean variables. A natural question is: What
is the complexity of implementing the digital to analog encoding function? That question was
recently answered in (5), where matching lower and upper bounds on the size of the circuit for
the encoding function were proven. In particular, it was proven that [...] 2-input arithmetic
gates are necessary and sufficient for implementing the encoding function of n Boolean variables.
However, the proof of the upper bound is not constructive.
In this paper, we present an explicit construction of a digital to analog encoder that is
optimal in the number of 2-input arithmetic gates. In addition, we present an efficient analog
to digital decoding algorithm. Namely, given the encoded analog value, our decoding algorithm
reconstructs the original Boolean values. Our construction is suboptimal in that it uses constants
[...] bits.https://authors.library.caltech.edu/records/t5axy-4xq58Deterministic Voting in Distributed Systems Using Error-Correcting Codes
https://resolver.caltech.edu/CaltechPARADISE:1996.ETR011
Authors: Xu, Lihao; Bruck, Jehoshua
Year: 1996
Distributed voting is an important problem in reliable computing. In an N
Modular Redundant (NMR) system, the N computational modules execute identical tasks
and they need to periodically vote on their current states. In this paper, we propose a
deterministic majority voting algorithm for NMR systems. Our voting algorithm uses
error-correcting codes to drastically reduce the average case communication
complexity. In particular, we show that the efficiency of our voting algorithm can be improved
by choosing the parameters of the error-correcting code to match the probability of
the computational faults. For example, consider an NMR system with 31 modules,
each with a state of m bits, where each module has an independent computational
error probability of 10^-3. In this NMR system, our algorithm can reduce the average-case communication complexity to approximately 1.0825m, compared with the
communication complexity of 31m of the naive algorithm in which every module broadcasts
its local result to all other modules. We have also implemented the voting algorithm
over a network of workstations. The experimental performance results match well the
theoretical predictions.https://authors.library.caltech.edu/records/34cns-m4519Algebraic Techniques for Constructing Minimal Weight Threshold Functions
https://resolver.caltech.edu/CaltechPARADISE:1996.ETR015
Authors: Bohossian, Vasken; Bruck, Jehoshua
Year: 1996
A linear threshold element computes a function that is a sign of a weighted sum of the
input variables. The weights are arbitrary integers; in fact, they can be very large integers,
exponential in the number of input variables. While in the present literature a distinction is
made between the two extreme cases of linear threshold functions with polynomial-size weights
as opposed to those with exponential-size weights, the best known lower bounds on the size
of threshold circuits are for depth-2 circuits with small weights. Our main contributions are
devising two distinct methods for constructing threshold functions with minimal weights and
filling up the gap between polynomial and exponential weight growth by further refining the
separation. Namely, we prove that the class of linear threshold functions with polynomial-size
weights can be divided into subclasses according to the degree of the polynomial. In fact, we
prove a more general result: there exists a minimal-weight linear threshold function for
any arbitrary number of inputs and any weight size.https://authors.library.caltech.edu/records/zwzxc-fts62A Coding Approach for Detection of Tampering in Write-Once Optical Disks
https://resolver.caltech.edu/CaltechPARADISE:1996.ETR013
Authors: Blaum, Mario; Bruck, Jehoshua; Rubin, Kurt; Lenth, Wilfried
Year: 1996
We present coding methods for protecting against tampering of write-once optical
disks which turns them into a secure digital medium for applications where critical
information must be stored in a way that prevents or allows detection of an attempt at
falsification. Our method involves adding a small amount of redundancy to a modulated
sector of data. This extra redundancy is not used for normal operation, but can be
used for determining, say as a testimony in court, that a disk has not been tampered
with.https://authors.library.caltech.edu/records/3xy7r-6pq82On the design and implementation of broadcast and global combine operations using the postal model
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetpds96
Authors: Bruck, Jehoshua; De Coster, Luc; Dewulf, Natalie; Ho, Ching-Tien; Lauwereins, Rudy
Year: 1996
DOI: 10.1109/71.491579
There are a number of models that were proposed in recent years for message-passing parallel systems. Examples are the postal model and its generalization, the LogP model. In the postal model a parameter λ is used to model the communication latency of the message-passing system. Each node during each round can send a fixed-size message and, simultaneously, receive a message of the same size. Furthermore, a message sent out during round r will incur a latency of λ and will arrive at the receiving node at round r + λ - 1.
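The timing rule above yields a known recurrence for optimal single-source broadcast in the postal model: with integer latency lam, the number of informed nodes after t rounds satisfies P(t) = P(t-1) + P(t-lam). The sketch below is our illustration of that recurrence, not code from the paper.

```python
# Sketch (ours) of the postal-model broadcast recurrence, assuming an integer
# latency lam: a node can start a new send every round, and a message sent at
# round r arrives at round r + lam - 1, so P(t) = P(t-1) + P(t-lam).

def broadcast_rounds(n, lam):
    """Rounds needed for an optimal postal-model broadcast to reach n nodes."""
    if n <= 1:
        return 0
    P = [1] * lam              # for t < lam, only the source holds the message
    t = lam - 1
    while P[t] < n:
        t += 1
        P.append(P[t - 1] + P[t - lam])
    return t

# With lam = 1 the recurrence doubles each round, recovering the familiar
# ceil(log2(n)) broadcast time.
assert broadcast_rounds(8, 1) == 3
```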
Our goal in this paper is to bridge the gap between the theoretical modeling and the practical implementation. In particular, we investigate a number of practical issues related to the design and implementation of two collective communication operations, namely, the broadcast operation and the global combine operation. Those practical issues include, for example, 1) techniques for measurement of the value of λ on a given machine, 2) creating efficient broadcast algorithms that get the latency λ and the number of nodes n as parameters, and 3) creating efficient global combine algorithms for parallel machines with λ which is not an integer. We propose solutions that address those practical issues and present results of an experimental study of the new algorithms on the Intel Delta machine. Our main conclusion is that the postal model can help in performance prediction and tuning; for example, a properly tuned broadcast improves the known implementation by more than 20%.https://authors.library.caltech.edu/records/43aw7-78d60MDS array codes with independent parity symbols
https://resolver.caltech.edu/CaltechAUTHORS:BLAieeetit96
Authors: Blaum, Mario; Bruck, Jehoshua; Vardy, Alexander
Year: 1996
DOI: 10.1109/18.485722
A new family of maximum distance separable (MDS) array codes is presented. The code arrays contain p information columns and r independent parity columns, each column consisting of p-1 bits, where p is a prime. We extend a previously known construction for the case r=2 to three and more parity columns. It is shown that when r=3 such extension is possible for any prime p. For larger values of r, we give necessary and sufficient conditions for our codes to be MDS, and then prove that if p belongs to a certain class of primes these conditions are satisfied up to r ≤ 8. One of the advantages of the new codes is that encoding and decoding may be accomplished using simple cyclic shifts and XOR operations on the columns of the code array. We develop efficient decoding procedures for the case of two- and three-column errors. This again extends the previously known results for the case of a single-column error. Another primary advantage of our codes is related to the problem of efficient information updates. We present upper and lower bounds on the average number of parity bits which have to be updated in an MDS code over GF (2^m), following an update in a single information bit. This average number is of importance in many storage applications which require frequent updates of information. We show that the upper bound obtained from our codes is close to the lower bound and, most importantly, does not depend on the size of the code symbols.https://authors.library.caltech.edu/records/8w2ps-yt124On Optimal Placements of Processors in Tori Networks
https://resolver.caltech.edu/CaltechAUTHORS:20120207-113452642
Authors: Blaum, Mario; Bruck, Jehoshua; Pifarré, Gustavo D.; Sanz, Jorge L. C.
Year: 1996
DOI: 10.1109/SPDP.1996.570382
Two and three dimensional k-tori are among the most used topologies in the designs of new parallel computers. Traditionally (with the exception of the Tera parallel computer), these networks have been used as fully-populated networks, in the sense that every routing node in the topology is subjected to message injection. However, fully populated tori and meshes exhibit a theoretical throughput which degrades as the network size increases. In contrast, multistage networks (that are partially populated) scale well with the network size. Introducing slackness in fully populated tori, i.e., reducing the number of processors, and studying optimal routing strategies for the resulting interconnections are the central subjects of the paper. The key concept is the placement of the processors in a network together with a routing algorithm between them, where a placement is the subset of the nodes in the interconnection network that are attached to processors. The main contribution is the construction of optimal placements for d-dimensional k-tori networks, of sizes k and k^2 and the corresponding routing algorithms for the cases d=2 and d=3, respectively.https://authors.library.caltech.edu/records/xvv2d-73504Fault-tolerant cube graphs and coding theory
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetit96
Authors: Bruck, Jehoshua; Ho, Ching-Tien
Year: 1996
DOI: 10.1109/18.556609
Hypercubes, meshes, tori, and Omega networks are well-known interconnection networks for parallel computers. The structure of those graphs can be described in a more general framework called cube graphs. The idea is to assume that every node in a graph with q^l nodes is represented by a unique string of l symbols over GF(q). The edges are specified by a set of offsets, which are vectors of length l over GF(q), where the two endpoints of an edge are an offset apart. We study techniques for tolerating edge faults in cube graphs that are based on adding redundant edges. The redundant graph has the property that the structure of the original graph can be maintained in the presence of edge faults. Our main contribution is a technique for adding the redundant edges that utilizes constructions of error-correcting codes and generalizes existing ad hoc techniques.https://authors.library.caltech.edu/records/jtga1-pr659Array Codes for Correction of Criss-Cross Errors
https://resolver.caltech.edu/CaltechAUTHORS:20120119-113410577
Authors: Blaum, Mario; Bruck, Jehoshua
Year: 1997
DOI: 10.1109/ISIT.1997.613349
We present MDS array codes of size (p-1)×(p-1), where p is a prime number, that can correct any row or column in error without a priori knowledge of what type of error has occurred. The complexity of the encoding and decoding algorithms is lower than that of known codes with the same error-correcting power, since our algorithms are based on exclusive-OR operations over lines of different slopes, as opposed to algebraic operations over a finite field.https://authors.library.caltech.edu/records/npsne-szw82X-Code: MDS Array Codes with Optimal Encoding
https://resolver.caltech.edu/CaltechPARADISE:1997.ETR020
Authors: Xu, Lihao; Bruck, Jehoshua
Year: 1997
We present a new class of MDS array codes of size n x n (n a prime number)
called X-Code. The X-Codes are of minimum column distance 3, namely, they can
correct either one column error or two column erasures. The key novelty of X-Code is
its simple geometric construction, which achieves optimal encoding/update
complexity: a change of any single information bit affects exactly two parity
bits. The key idea in our constructions is that all parity symbols are placed in rows
rather than columns.https://authors.library.caltech.edu/records/tq5cm-6qa49Two-Dimensional Interleaving Schemes with Repetitions
https://resolver.caltech.edu/CaltechPARADISE:1997.ETR016
Authors: Blaum, Mario; Bruck, Jehoshua; Farrell, Paddy
Year: 1997
We present 2-dimensional interleaving schemes, with repetition, for correcting 2-
dimensional bursts (or clusters) of errors, where a cluster of errors is characterized by
its area. A recent application of correction of 2-dimensional clusters appeared in the
context of holographic storage. Known interleaving schemes are based on arrays of
integers with the property that every connected component of area t consists of distinct
integers. Namely, they are based on the use of 1-error-correcting codes. We extend this
concept by allowing repetitions within the arrays, hence, providing a trade-off between
the error-correcting capability of the codes and the degree of the interleaving schemes.https://authors.library.caltech.edu/records/cz8ad-gxp22Programmable Neural Logic
https://resolver.caltech.edu/CaltechPARADISE:1997.ETR017
Authors: Bohossian, Vasken; Hasler, Paul; Bruck, Jehoshua
Year: 1997
NOTE: Text or symbols not renderable in plain ASCII are indicated by [...]. Abstract is included in .pdf document.
Circuits of threshold elements (Boolean input, Boolean output neurons) have been
shown to be surprisingly powerful. Useful functions such as XOR, ADD and MULTIPLY
can be implemented by such circuits more efficiently than by traditional AND/OR
circuits. In view of that, we have designed and built a programmable threshold element.
The weights are stored on polysilicon floating gates, providing long-term retention
without refresh. The weight value is increased using tunneling and decreased via hot electron
injection. A weight is stored on a single transistor, allowing the development of dense
arrays of threshold elements. A 16-input programmable neuron was fabricated in the
standard 2 μm double-poly analog process available from MOSIS. A long-term goal
of this research is to incorporate programmable threshold elements as building blocks in
Field Programmable Gate Arrays.https://authors.library.caltech.edu/records/w3av0-39113Partial-Sum Queries in OLAP Data Cubes Using Covering Codes
https://resolver.caltech.edu/CaltechPARADISE:1997.ETR018
Authors: Ho, Ching-Tien; Bruck, Jehoshua; Agrawal, Rakesh
Year: 1997
A partial-sum query obtains the summation over a set of
specified cells of a data cube. We establish a connection
between the covering problem in the theory of covering codes
and the partial-sum problem and use this connection to
devise algorithms for the partial-sum problem with efficient
space-time trade-offs. For example, using our algorithms,
with 44% additional storage, the query response time can
be improved by about 12%; by roughly doubling the storage
requirement, the query response time can be improved by
about 34%.https://authors.library.caltech.edu/records/afjb1-vmg34MDS Array Codes for Correcting Criss-Cross Errors
https://resolver.caltech.edu/CaltechPARADISE:1997.ETR019
Authors: Blaum, Mario; Bruck, Jehoshua
Year: 1997
We present a family of MDS array codes of size (p - 1) × (p - 1), p a prime number,
and minimum criss-cross distance 3, i.e., the code is capable of correcting any row
or column in error, without a priori knowledge of what type of error occurred. The
complexity of the encoding and decoding algorithms is lower than that of known codes
with the same error-correcting power, since our algorithms are based on exclusive-
OR operations over lines of different slopes, as opposed to algebraic operations over a
finite field. We also provide efficient encoding and decoding algorithms for errors and
erasures.https://authors.library.caltech.edu/records/wa4cr-bce03Partial-sum queries in OLAP data cubes using covering codes
https://resolver.caltech.edu/CaltechAUTHORS:20161103-134218465
Authors: Ho, Ching-Tien; Bruck, Jehoshua; Agrawal, Rakesh
Year: 1997
DOI: 10.1145/263661.263686
A partial-sum query obtains the summation over a set of specified cells of a data cube. We establish a connection between the covering problem in the theory of covering codes and the partial-sum problem and use this connection to devise algorithms for the partial-sum problem with efficient space-time trade-offs. For example, using our algorithms, with 44% additional storage, the query response time can be improved by about 12%; by roughly doubling the storage requirement, the query response time can be improved by about 34%.https://authors.library.caltech.edu/records/5asp5-sky26Two-dimensional interleaving schemes with repetitions
https://resolver.caltech.edu/CaltechAUTHORS:20120119-135511706
Authors: Blaum, Mario; Bruck, Jehoshua; Farrell, Patrick G.
Year: 1997
DOI: 10.1109/ISIT.1997.613272
We present 2-dimensional interleaving schemes, with repetition, for correcting 2-dimensional bursts (or clusters) of errors, where a cluster of errors is characterized by its area. Known interleaving schemes are based on arrays of integers with the property that every connected component of area t consists of distinct integers. Namely, they are based on the use of 1-error-correcting codes. We extend this concept by allowing repetitions within the arrays, hence, providing a trade-off between the error-correcting capability of the codes and the degree of the interleaving schemes.https://authors.library.caltech.edu/records/cn0x9-6er65An on-line algorithm for checkpoint placement
https://resolver.caltech.edu/CaltechAUTHORS:ZIVieeetc97b
Authors: Ziv, Avi; Bruck, Jehoshua
Year: 1997
DOI: 10.1109/12.620479
Checkpointing enables us to reduce the time to recover from a fault by saving intermediate states of the program in a
reliable storage. The length of the intervals between checkpoints affects the execution time of programs. On one hand, long intervals lead to long reprocessing time, while, on the other hand, too frequent checkpointing leads to high checkpointing overhead. In this paper, we present an on-line algorithm for placement of checkpoints. The algorithm uses knowledge of the current cost of a checkpoint when it decides whether or not to place a checkpoint. The total overhead of the execution time when the proposed algorithm is used is smaller than the overhead when fixed intervals are used. Although the proposed algorithm uses only on-line knowledge about the cost of checkpointing, its behavior is close to the off-line optimal algorithm that uses a complete knowledge of checkpointing cost.https://authors.library.caltech.edu/records/074wp-fx331Programmable neural logic
https://resolver.caltech.edu/CaltechAUTHORS:BOHiciss97
Authors: Bohossian, Vasken; Hasler, Paul; Bruck, Jehoshua
Year: 1997
DOI: 10.1109/ICISS.1997.630242
Circuits of threshold elements (Boolean input, Boolean output neurons) have been shown to be surprisingly powerful. Useful functions such as XOR, ADD and MULTIPLY can be implemented by such circuits more efficiently than by traditional AND/OR circuits. In view of that, we have designed and built a programmable threshold element. The weights are stored on polysilicon floating gates, providing long-term retention without refresh. The weight value is increased using tunneling and decreased via hot electron injection. A weight is stored on a single transistor, allowing the development of dense arrays of threshold elements. A 16-input programmable neuron was fabricated in the standard 2 μm double-poly analog process available from MOSIS. A long-term goal of this research is to incorporate programmable threshold elements as building blocks in Field Programmable Gate Arrays.https://authors.library.caltech.edu/records/6fmjr-vpz17Efficient algorithms for all-to-all communications in multiport message-passing systems
https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetpds97
Authors: Bruck, Jehoshua; Ho, Ching-Tien; Kipnis, Shlomo; Upfal, Eli; Weathersby, Derrick
Year: 1997
DOI: 10.1109/71.642949
We present efficient algorithms for two all-to-all communication operations in message-passing systems: index (or all-to-all personalized communication) and concatenation (or all-to-all broadcast). We assume a model of a fully connected message-passing system, in which the performance of any point-to-point communication is independent of the sender-receiver pair. We also assume that each processor has k ≥ 1 ports, through which it can send and receive k messages in every communication round. The complexity measures we use are independent of the particular system topology and are based on the communication start-up time, and on the communication bandwidth.
In the index operation among n processors, initially, each processor has n blocks of data, and the goal is to exchange the ith block of processor j with the jth block of processor i. We present a class of index algorithms that is designed for all values of n and that features a trade-off between the communication start-up time and the data transfer time. This class of algorithms includes two special cases: an algorithm that is optimal with respect to the measure of the start-up time, and an algorithm that is optimal with respect to the measure of the data transfer time. We also present experimental results featuring the performance tuneability of our index algorithms on the IBM SP-1 parallel system.
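As a reference point for the operation's semantics (ours, not the paper's communication-efficient algorithms): if blocks[j] is the list of n blocks initially held by processor j, the index operation is exactly a transpose of that n-by-n matrix of blocks.

```python
# Reference model (ours) of the index operation's semantics only; the paper's
# algorithms achieve this exchange with tunable start-up vs. transfer cost.

def index_operation(blocks):
    """Exchange the i-th block of processor j with the j-th block of processor i."""
    n = len(blocks)
    return [[blocks[j][i] for j in range(n)] for i in range(n)]

# After the operation, processor 0 holds block 0 of every processor.
blocks = [["a0", "a1", "a2"], ["b0", "b1", "b2"], ["c0", "c1", "c2"]]
assert index_operation(blocks)[0] == ["a0", "b0", "c0"]
```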
In the concatenation operation, among n processors, initially, each processor has one block of data, and the goal is to concatenate the n blocks of data from the n processors, and to make the concatenation result known to all the processors. We present a concatenation algorithm that is optimal, for most values of n, in the number of communication rounds and in the amount of data transferred.https://authors.library.caltech.edu/records/ypzfe-0bb45Performance optimization of checkpointing schemes with task duplication
https://resolver.caltech.edu/CaltechAUTHORS:ZIVieeetc97a
Authors: Ziv, Avi; Bruck, Jehoshua
Year: 1997
DOI: 10.1109/12.641939
In checkpointing schemes with task duplication, checkpointing serves two purposes: detecting faults by comparing the processors' states at checkpoints, and reducing fault recovery time by supplying a safe point to rollback to. In this paper, we show that, by tuning the checkpointing schemes to a given architecture, a significant reduction in the execution time can be achieved. The main idea is to use two types of checkpoints: compare-checkpoints (comparing the states of the redundant processes to detect faults) and store-checkpoints (storing the states to reduce recovery time). With two types of checkpoints, we can use both the comparison and storage operations in an efficient way and improve the performance of checkpointing schemes. Results we obtained show that, in some cases, using compare and store checkpoints can reduce the overhead of DMR checkpointing schemes by as much as 30 percent.https://authors.library.caltech.edu/records/hckdg-rf028Multiple Threshold Neural Logic
https://resolver.caltech.edu/CaltechAUTHORS:20160224-141437128
Authors: Bohossian, Vasken; Bruck, Jehoshua
Year: 1998
We introduce a new Boolean computing element related to the Linear Threshold element, which is the Boolean version of the neuron. Instead of the sign function, it computes an arbitrary (with polynomialy many transitions) Boolean function of the weighted sum of its inputs. We call the new computing element an LT M element, which stands for Linear Threshold with Multiple transitions.
The paper consists of the following main contributions related to our study of LTM circuits: (i) the creation of efficient designs of LTM circuits for the addition of a multiple number of integers and the product of two integers. In particular, we show how to compute
the addition of m integers with a single layer of LT M elements. (ii) a proof that the area of the VLSI layout is reduced from O(n^2) in LT circuits to O(n) in LTM circuits, for n inputs symmetric Boolean functions, and (iii) the characterization of the computing power of LT M relative to LT circuits.https://authors.library.caltech.edu/records/tb591-xtq92A coding approach for detection of tampering in write-once optical disks
https://resolver.caltech.edu/CaltechAUTHORS:BLAieeetc98
Authors: Blaum, Mario; Bruck, Jehoshua; Rubin, Kurt; Lenth, Wilfried
Year: 1998
DOI: 10.1109/12.656095
We present coding methods for protecting against tampering of write-once optical disks, which turns them into a secure digital medium for applications where critical information must be stored in a way that prevents or allows detection of an attempt at falsification. Our method involves adding a small amount of redundancy to a modulated sector of data. This extra redundancy is not used for normal operation, but can be used for determining, say, as a testimony in court, that a disk has not been tampered with.https://authors.library.caltech.edu/records/g0495-at522Trading Weight Size for Circuit Depth: A Circuit for Comparison
https://resolver.caltech.edu/CaltechPARADISE:1998.ETR028
Authors: Bohossian, Vasken; Riedel, Marc D.; Bruck, Jehoshua
Year: 1998
NOTE: Text or symbols not renderable in plain ASCII are indicated by [...]. Abstract included in .pdf
document.
We present an explicit construction of a circuit for the COMPARISON function in [...],
the class of polynomial-size linear threshold circuits of depth two with polynomially growing
weights. Goldmann and Karpinski proved that [...] in [4]. Hofmeister presented a
simplified version of the same result in [6]. We have further simplified the results of these two
papers by limiting ourselves to the simulation of COMPARISON. Our construction has size
[...], a significant improvement on the general bound of [...] in [6].https://authors.library.caltech.edu/records/c72rs-18e50Tolerating Faults in Counting Networks
https://resolver.caltech.edu/CaltechPARADISE:1998.ETR022
Authors: Riedel, Marc D.; Bruck, Jehoshua
Year: 1998
Counting networks were proposed by Aspnes, Herlihy and Shavit [4] as a technique
for solving multiprocessor coordination problems. We describe a method for tolerating an
arbitrary number of faults in counting networks. In our fault model, the following errors can occur
dynamically in the counting network data structure: 1) a balancer's state is spuriously altered, 2)
a balancer's state can no longer be accessed.
We propose two approaches for tolerating faults. The first is based on a construction for a
fault-tolerant balancer. We substitute a fault-tolerant balancer for every balancer in a counting
network. Thus, we transform a counting network with depth O(log^2 n), where n is the
width, into a k-fault-tolerant counting network with depth O(k log^2 n).
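A minimal sketch (ours) of the balancer primitive that counting networks are built from: a one-bit toggle that routes incoming tokens alternately to its two output wires. The fault model described above corresponds to this bit being spuriously flipped or becoming inaccessible.

```python
# Sketch (ours) of a balancer, the building block of a counting network.

class Balancer:
    def __init__(self):
        self.state = 0          # one-bit toggle state

    def route(self):
        """Return 0 (top wire) or 1 (bottom wire) for the next token."""
        out = self.state
        self.state ^= 1         # a fault could flip or freeze this bit
        return out

# A fault-free balancer splits any token stream evenly between its wires.
b = Balancer()
assert [b.route() for _ in range(6)] == [0, 1, 0, 1, 0, 1]
```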
The second approach is to append a correction network, built with fault-tolerant balancers, to a
counting network that may experience faults. We present a bound on the error in the output token
distribution of counting networks with faulty balancers (a generalization of the error bound for
sorting networks with faulty comparators presented by Yao and Yao [21]). Given a token distribution
with a bounded error, the correction network produces a token distribution that is smooth, i.e.,
the number of tokens on each output wire differs by at most one (a weaker condition than the
step property). In order to tolerate k faults, the correction network has depth O(k^2 log n)
for a network of width n.https://authors.library.caltech.edu/records/47r2m-0k938Low Density MDS Codes and Factors of Complete Graphs
https://resolver.caltech.edu/CaltechPARADISE:1998.ETR025
Authors: Xu, Lihao; Bohossian, Vasken; Bruck, Jehoshua; Wagner, David G.
Year: 1998
We reveal an equivalence relation between the construction of a new class of low density
MDS array codes, that we call B-Code, and a combinatorial problem known as perfect one-
factorization of complete graphs. We use known perfect one-factors of complete graphs to
create constructions and decoding algorithms for both B-Code and its dual code. B-Code and
its dual are optimal in the sense that (i) they are MDS, (ii) they have an optimal encoding
property, i.e., the number of the parity bits that are affected by change of a single information
bit is minimal, and (iii) they have optimal length. The existence of perfect one-factorizations
for every complete graph with an even number of nodes is a 35-year-old conjecture in graph
theory. The construction of B-Codes of arbitrary odd length will provide an affirmative answer
to the conjecture.https://authors.library.caltech.edu/records/jcwj7-97e76Improving the Performance of Data Servers Using Array Codes
https://resolver.caltech.edu/CaltechPARADISE:1998.ETR027
Authors: Xu, Lihao; Bruck, Jehoshua
Year: 1998
This paper discusses improving performance (throughput) of data server systems by
introducing proper data redundancy into the system. General performance properties of a
server system with redundant data are described. We show that proper data redundancy
in a server system can significantly improve the performance, in addition to the reliability
of the system. Two problems related to this performance, together with their solutions,
are proposed: an efficient data-distribution scheme for the servers
and a data-acquisition scheme for the client. Both schemes utilize array codes, a class of
error-correcting codes whose encoding and decoding procedures use only simple binary
exclusive-OR operations, which can be implemented efficiently in software and/or hardware.
The construction of general MDS array codes suitable for both schemes is discussed. A new
property of MDS array codes, called the strong MDS property, is also defined to improve
the data acquisition performance. A method for modeling data server performance and the
related experimental results are presented as well.https://authors.library.caltech.edu/records/syxb8-kqh08Fault-Tolerant Switched Local Area Networks
https://resolver.caltech.edu/CaltechPARADISE:1998.ETR021
Authors: LeMahieu, Paul S.; Bohossian, Vasken; Bruck, Jehoshua
Year: 1998
The RAIN (Reliable Array of Independent Nodes) project at Caltech is focusing on creating highly
reliable distributed systems by leveraging commercially available personal computers, workstations and
interconnect technologies. In particular, the issue of reliable communication is addressed by
introducing redundancy in the form of multiple network interfaces per computer node. When using compute
nodes with multiple network connections, the question arises of how best to connect these nodes to a
given network of switches. We examine networks of switches (e.g., based on Myrinet technology) and
focus on degree-2 compute nodes (two network adapter cards per node). Our primary goal is to create
networks that are as resistant as possible to partitioning.
Our main contributions are: (i) a construction for degree-2 compute nodes connected by a ring network of
switches of degree 4 that can tolerate any 3 switch failures without partitioning the nodes into
disjoint sets, (ii) a proof that this construction is optimal in the sense that no construction can
tolerate more switch failures while avoiding partitioning, and (iii) generalizations of this
construction to arbitrary switch and node degrees and to other switch networks, in particular to a
fully-connected network of switches.https://authors.library.caltech.edu/records/re8wp-xm408An Efficient Algorithm for Generating Trajectories of Stochastic Gene Regulation Reactions
https://resolver.caltech.edu/CaltechPARADISE:1998.ETR026
Authors: Gibson, Michael A.; Bruck, Jehoshua
Year: 1998
Systems of weakly coupled chemical equations occur in gene regulation and other biological
systems. For small numbers of molecules (as in a small cell), the usual differential equations
approach to chemical kinetics must be replaced with a stochastic approach. To deal with this
kind of system, one generates trajectories through stochastic phase space. By generating a large
enough number of trajectories, one can understand the statistics of the behavior of the complex,
non-linear system.
The algorithms for dealing with sparsely connected stochastic processes are not as advanced
as those for sparse deterministic processes. In particular, the existing algorithm of choice for
generating trajectories, which is not optimized in any way for sparseness, is O(rE), where r is
the number of reactions and E is the number of reaction events in the trajectory. We present
two algorithms of O(r + E log r), one of which is a simple extension of the existing algorithm,
and the other of which is more subtle. The latter is more easily extended to include stochastic
processes of different types.
We apply our faster algorithm to a model of bacteriophage lambda and are able to run the
same calculations on a cluster of desktop workstations that previously required a supercomputer.
This allows us to run more complicated calculations than could be done previously. As an
example of this, we analyse the sensitivity of the lambda model to the values of several of
its parameters. We find that the model is relatively insensitive to changes in the translation
rate, protein dimerization rates and protein degradation rates; is somewhat sensitive to the
transcription rate, and is extremely sensitive to the average number of proteins per mRNA
transcript.https://authors.library.caltech.edu/records/39v3v-60q25A Leader Election Protocol for Fault Recovery in Asynchronous Fully-Connected Networks
https://resolver.caltech.edu/CaltechPARADISE:1998.ETR024
Authors: Franceschetti, Massimo; Bruck, Jehoshua
Year: 1998
We introduce a new algorithm for consistent failure detection in asynchronous
systems. Informally, consistent failure detection requires processes in a distributed system to distinguish between two different populations: a fault-free
population and a faulty one.
The major contribution of this paper is in combining ideas from group
membership and leader election, in order to have an election protocol for a
fault manager whose convergence is delayed until a new consistent view of the connectivity
of the network is established by all processes. In our algorithm a group of
processes agrees upon the failed population of the system, and then grants a unique
leader, called the fault manager, the ability to execute distributed tasks in
a centralized way.
This research and the new perspective that we propose are driven by the
study of an actual system, the Caltech RAIN (Reliable Array of Independent
Nodes), on which our protocol has been implemented in order to perform fault
recovery in distributed checkpointing. Other potential applications include fault
tolerant distributed database services and fault tolerant distributed web servers.https://authors.library.caltech.edu/records/6z835-5g480A Consistent History Link Connectivity Protocol
https://resolver.caltech.edu/CaltechPARADISE:1998.ETR023
Authors: LeMahieu, Paul S.; Bruck, Jehoshua
Year: 1998
The RAIN (Reliable Array of Independent Nodes) project at Caltech is focusing on creating
highly reliable distributed systems by leveraging commercially available personal computers,
workstations and interconnect technologies. In particular, the issue of reliable communication
is addressed by introducing redundancy in the form of multiple network interfaces per compute
node.
When using compute nodes with multiple network connections, the question of how to
determine connectivity between nodes arises. We examine a connectivity protocol that guarantees
that each side of a point-to-point connection sees the same history of activity over the
communication channel. In other words, we maintain a consistent history of the state of the
communication channel. At any given moment in time the histories as seen by each side are guaranteed
to be identical to within some number of transitions. This bound on how much one side may
lead or lag the other is the slack.
Our main contributions are: (i) a simple, stable protocol for monitoring connectivity that
maintains a consistent history with bounded slack, and (ii) proofs that this protocol exhibits
correctness, bounded slack, and stability.https://authors.library.caltech.edu/records/gvtmd-4r446Analysis of checkpointing schemes with task duplication
https://resolver.caltech.edu/CaltechAUTHORS:ZIVieeetc98
Authors: Ziv, Avi; Bruck, Jehoshua
Year: 1998
DOI: 10.1109/12.663769
This paper suggests a technique for analyzing the performance of checkpointing schemes with task duplication. We show how this technique can be used to derive the average execution time of a task and other important parameters related to the performance of checkpointing schemes. The analysis results are used to study and compare the performance of four existing checkpointing schemes. Our comparison results show that, in general, the number of processors used, not the complexity of the scheme, has the most effect on the scheme performance.https://authors.library.caltech.edu/records/zd31m-rh050Fault-tolerant switched local area networks
https://resolver.caltech.edu/CaltechAUTHORS:20111215-115455804
Authors: LeMahieu, Paul; Bohossian, Vasken; Bruck, Jehoshua
Year: 1998
DOI: 10.1109/IPPS.1998.670011
The RAIN (Reliable Array of Independent Nodes) project at Caltech is focusing on creating highly reliable distributed systems by leveraging commercially available personal computers, workstations and interconnect technologies. In particular, the issue of reliable communication is addressed by introducing redundancy in the form of multiple network interfaces per compute node. When using compute nodes with multiple network connections, the question of how to best connect these nodes to a given network of switches arises. We examine networks of switches (e.g. based on Myrinet technology) and focus on degree-two compute nodes (two network adaptor cards per node). Our primary goal is to create networks that are as resistant as possible to partitioning. Our main contributions are: (i) a construction for degree-2 compute nodes connected by a ring network of switches of degree 4 that can tolerate any 3 switch failures without partitioning the nodes into disjoint sets; (ii) a proof that this construction is optimal in the sense that no construction can tolerate more switch failures while avoiding partitioning; and (iii) generalizations of this construction to arbitrary switch and node degrees and to other switch networks, in particular to a fully-connected network of switches.https://authors.library.caltech.edu/records/8ev4h-nkh42Interleaving schemes for multidimensional cluster errors
https://resolver.caltech.edu/CaltechAUTHORS:BLAieeetit98
Authors: Blaum, Mario; Bruck, Jehoshua; Vardy, Alexander
Year: 1998
DOI: 10.1109/18.661516
We present two-dimensional and three-dimensional interleaving techniques for correcting two- and three-dimensional bursts (or clusters) of errors, where a cluster of errors is characterized by its area or volume. Correction of multidimensional error clusters is required in holographic storage, an emerging application of considerable importance. Our main contribution is the construction of efficient two-dimensional and three-dimensional interleaving schemes. The proposed schemes are based on t-interleaved arrays of integers, defined by the property that every connected component of area or volume t consists of distinct integers. In the two-dimensional case, our constructions are optimal: they have the lowest possible interleaving degree. That is, the resulting t-interleaved arrays contain the smallest possible number of distinct integers, hence minimizing the number of codewords required in an interleaving scheme. In general, we observe that the interleaving problem can be interpreted as a graph-coloring problem, and introduce the useful special class of lattice interleavers. We employ a result of Minkowski, dating back to 1904, to establish both upper and lower bounds on the interleaving degree of lattice interleavers in three dimensions. For the case t≡0 mod 6, the upper and lower bounds coincide, and the Minkowski lattice directly yields an optimal lattice interleaver. For t≠0 mod 6, we construct efficient lattice interleavers using approximations of the Minkowski lattice.https://authors.library.caltech.edu/records/t4s49-2nn79A consistent history link connectivity protocol
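The t-interleaving property in the abstract above is easy to check by brute force on small grids. The sketch below (my own illustration, not a construction from the paper) uses a diagonal lattice coloring with (t^2+1)/2 = 5 colors for t = 3 and verifies that every 4-connected set of 3 cells receives distinct colors:

```python
from itertools import combinations

def is_t_interleaved(color, t, w, h):
    """Brute-force check: every 4-connected set of t cells in a w x h grid
    must receive t distinct colors."""
    cells = [(x, y) for x in range(w) for y in range(h)]

    def connected(cs):
        # Flood-fill within the subset to confirm 4-connectivity.
        cs = set(cs)
        seen, stack = set(), [next(iter(cs))]
        while stack:
            x, y = stack.pop()
            if (x, y) in seen:
                continue
            seen.add((x, y))
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in cs and nb not in seen:
                    stack.append(nb)
        return len(seen) == len(cs)

    for subset in combinations(cells, t):
        if connected(subset) and len({color(x, y) for x, y in subset}) < t:
            return False
    return True

# Diagonal lattice coloring with (t^2 + 1)/2 = 5 colors for t = 3.
color = lambda x, y: (x + 2 * y) % 5
print(is_t_interleaved(color, 3, 6, 6))  # → True
```

Any two cells of a connected 3-set are within L1 distance 2, and no nonzero offset (dx, dy) with |dx|+|dy| <= 2 satisfies dx + 2dy ≡ 0 (mod 5), which is why this particular coloring passes.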
https://resolver.caltech.edu/CaltechAUTHORS:20161122-142619200
Authors: LeMahieu, Paul; Bruck, Jehoshua
Year: 1998
DOI: 10.1145/277697.277757
Given the prevalence of powerful personal workstations
connected over local area networks, it is only natural that
people are exploring distributed computing over such systems. Whenever systems become distributed the issue of
fault tolerance becomes an important consideration. In the
context of the RAIN project (Reliable Arrays of Independent
Nodes) at Caltech, we have been looking into fault tolerance
in several elements of the distributed system. One
important aspect of this is the introduction of fault tolerance into the communication system by adding redundant network elements and redundant network interfaces.https://authors.library.caltech.edu/records/a7akm-zzc65Efficient digital to analog encoding
https://resolver.caltech.edu/CaltechAUTHORS:20111215-111208543
Authors: Gibson, Michael; Bruck, Jehoshua
Year: 1998
DOI: 10.1109/ISIT.1998.708930
An important issue in analog circuit design is the problem of digital to analog conversion, namely, the encoding of Boolean variables into a single analog value which contains enough information to reconstruct the values of the Boolean variables. Wegener (1996) proved that [(3n-1)/2] 2-input arithmetic gates are necessary and sufficient for implementing the encoding function of n Boolean variables. However, the proof of the upper bound is not constructive. We present an explicit construction of a digital to analog encoder that is optimal in the number of 2-input arithmetic gates.https://authors.library.caltech.edu/records/ffgb2-xdf10Coding for skew correcting and detecting in parallel asynchronous communications
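For intuition only: the obvious positional encoder below packs n Boolean variables into one value and decodes them back. It is a naive sketch of the encoding/decoding problem, not the gate-optimal construction of the paper:

```python
def encode(bits):
    """Map Boolean variables to a single value. Naive positional
    encoding -- illustrative only, not the paper's gate-optimal circuit."""
    return sum(b * 2**i for i, b in enumerate(bits))

def decode(value, n):
    """Recover the n Boolean variables from the encoded value."""
    return [(value >> i) & 1 for i in range(n)]

bits = [1, 0, 1, 1]
assert decode(encode(bits), len(bits)) == bits
```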
https://resolver.caltech.edu/CaltechAUTHORS:20120112-112036174
Authors: Blaum, Mario; Bruck, Jehoshua
Year: 1998
DOI: 10.1109/ISIT.1998.708659
We study the problem of pipelined transmission in parallel asynchronous communications allowing a certain amount of skew. We redefine the concept of skew in a way that extends previously known results in this area. Using the new definition of skew, we derive the necessary and sufficient conditions for codes that can tolerate a certain amount of skew and detect a larger amount of skew when the tolerating threshold is exceeded.https://authors.library.caltech.edu/records/n6axc-apn80Low density MDS codes and factors of complete graphs
https://resolver.caltech.edu/CaltechAUTHORS:XULisit98
Authors: Xu, Lihao; Bohossian, Vasken; Bruck, Jehoshua; Wagner, David G.
Year: 1998
DOI: 10.1109/ISIT.1998.708599
We reveal an equivalence relation between the construction of a new class of low-density MDS array codes, which we call B-Code, and a combinatorial problem known as perfect one-factorization of complete graphs. We use known perfect one-factors of complete graphs to create constructions and decoding algorithms for both B-Code and its dual code. B-Code and its dual are optimal in the sense that (i) they are MDS, (ii) they have an optimal encoding property, i.e., the number of the parity bits that are affected by change of a single information bit is minimal and (iii) they have optimal length. The existence of perfect one-factorizations for every complete graph with an even number of nodes is a 35-year-old conjecture in graph theory. The construction of B-codes of arbitrary odd length will provide an affirmative answer to the conjecture.https://authors.library.caltech.edu/records/21bjn-9hj28Deterministic voting in distributed systems using error-correcting codes
https://resolver.caltech.edu/CaltechAUTHORS:XULieeetpds98
Authors: Xu, Lihao; Bruck, Jehoshua
Year: 1998
DOI: 10.1109/71.706052
Distributed voting is an important problem in reliable computing. In an N Modular Redundant (NMR) system, the N computational modules execute identical tasks and they need to periodically vote on their current states. In this paper, we propose a deterministic majority voting algorithm for NMR systems. Our voting algorithm uses error-correcting codes to drastically reduce the average case communication complexity. In particular, we show that the efficiency of our voting algorithm can be improved by choosing the parameters of the error-correcting code to match the probability of the computational faults. For example, consider an NMR system with 31 modules, each with a state of m bits, where each module has an independent computational error probability of 10^-3. In this NMR system, our algorithm can reduce the average case communication complexity to approximately 1.0825 m compared with the communication complexity of 31 m of the naive algorithm in which every module broadcasts its local result to all other modules. We have also implemented the voting algorithm over a network of workstations. The experimental performance results match well the theoretical predictions.https://authors.library.caltech.edu/records/5kcm3-eqy92Programmable neural logic
https://resolver.caltech.edu/CaltechAUTHORS:BOHieeetcpmtb98
Authors: Bohossian, Vasken; Hasler, Paul; Bruck, Jehoshua
Year: 1998
DOI: 10.1109/96.730415
Circuits of threshold elements (Boolean input, Boolean output neurons) have been shown to be surprisingly powerful. Useful functions such as XOR, ADD and MULTIPLY can be implemented by such circuits more efficiently than by traditional AND/OR circuits. In view of that, we have designed and built a programmable threshold element. The weights are stored on polysilicon floating gates, providing long-term retention without refresh. The weight value is increased using tunneling and decreased via hot electron injection. A weight is stored on a single transistor allowing the development of dense arrays of threshold elements. A 16-input programmable neuron was fabricated in the standard 2 μm double-poly, analog process available from MOSIS.
We also designed and fabricated the multiple threshold element introduced in [5]. It presents the advantage of reducing the area of the layout from O(n^2) to O(n) (n being the number of variables) for a broad class of Boolean functions, in particular symmetric Boolean functions such as PARITY.
A long term goal of this research is to incorporate programmable single/multiple threshold elements, as building blocks in field programmable gate arrays.https://authors.library.caltech.edu/records/21z5w-1t664Partial-sum queries in OLAP data cubes using covering codes
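As a software illustration of why threshold elements are powerful, the sketch below builds 2-input XOR from three Boolean threshold elements in depth 2. This is a standard textbook construction, not the netlist of the fabricated chip:

```python
def threshold(weights, theta, inputs):
    """Boolean threshold element: fires iff the weighted sum reaches theta."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= theta)

def xor2(x1, x2):
    """2-input XOR from three threshold elements (depth 2)."""
    g1 = threshold([1, 1], 1, [x1, x2])     # OR
    g2 = threshold([1, 1], 2, [x1, x2])     # AND
    return threshold([1, -1], 1, [g1, g2])  # g1 AND NOT g2

assert [xor2(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]
```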
https://resolver.caltech.edu/CaltechAUTHORS:HOCieeetc98
Authors: Ho, Ching-Tien; Bruck, Jehoshua; Agrawal, Rakesh
Year: 1998
DOI: 10.1109/12.737680
A partial-sum query obtains the summation over a set of specified cells of a data cube. We establish a connection between the covering problem in the theory of error-correcting codes and the partial-sum problem and use this connection to devise algorithms for the partial-sum problem with efficient space-time trade-offs. For example, using our algorithms, with 44 percent additional storage, the query response time can be improved by about 12 percent; by roughly doubling the storage requirement, the query response time can be improved by about 34 percent.https://authors.library.caltech.edu/records/z54j1-r2r11Highly available distributed storage systems
https://resolver.caltech.edu/CaltechAUTHORS:20200709-082541313
Authors: Xu, Lihao; Bruck, Jehoshua
Year: 1999
DOI: 10.1007/bfb0110096
Information is generated, processed, transmitted and stored in various forms: text, voice, image, video and multimedia types. Here all these forms will be treated as general data. As the need for data increases exponentially with the passage of time and the increase of computing power, data storage becomes more and more important. From scientific computing to business transactions, data is the most precious part. How to store the data reliably and efficiently is the essential issue; that is the focus of this chapter.https://authors.library.caltech.edu/records/ssd41-n0y60X-code: MDS array codes with optimal encoding
https://resolver.caltech.edu/CaltechAUTHORS:XULieeetit99b
Authors: Xu, Lihao; Bruck, Jehoshua
Year: 1999
DOI: 10.1109/18.746809
We present a new class of MDS (maximum distance separable) array codes of size n×n (n a prime number) called X-code. The X-codes are of minimum column distance 3, namely, they can correct either one column error or two column erasures. The key novelty in X-code is that it has a simple geometrical construction which achieves encoding/update optimal complexity, i.e., a change of any single information bit affects exactly two parity bits. The key idea in our constructions is that all parity symbols are placed in rows rather than columns.https://authors.library.caltech.edu/records/2s8tf-z1m79Splitting the Scheduling Headache
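The update-optimality claim above (a change of any single information bit affects exactly two parity bits) can be exercised directly. The encoder below follows the X-code layout of data rows plus two diagonal parity rows; the exact index shifts in the diagonals are my assumption for illustration:

```python
import random

def xcode_encode(info, n):
    """X-code-style layout over an n x n bit array (n prime): rows 0..n-3
    hold data, rows n-2 and n-1 hold parities computed along diagonals of
    slopes +1 and -1. The index shifts below are illustrative assumptions."""
    a = [row[:] for row in info] + [[0] * n, [0] * n]
    for i in range(n):
        for k in range(n - 2):
            a[n - 2][i] ^= a[k][(i + k + 2) % n]  # slope +1 diagonal
            a[n - 1][i] ^= a[k][(i - k - 2) % n]  # slope -1 diagonal
    return a

n = 5
info = [[random.randint(0, 1) for _ in range(n)] for _ in range(n - 2)]
coded = xcode_encode(info, n)

# Optimal-update property: flipping one information bit changes
# exactly two parity bits (one per diagonal direction).
info[1][3] ^= 1
recoded = xcode_encode(info, n)
diff = sum(coded[r][c] != recoded[r][c]
           for r in (n - 2, n - 1) for c in range(n))
print(diff)  # → 2
```

Each information bit lies on exactly one slope +1 diagonal and one slope -1 diagonal, so the count is 2 regardless of the data.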
https://resolver.caltech.edu/CaltechPARADISE:1999.ETR030
Authors: Foltz, Kevin; Bruck, Jehoshua
Year: 1999
The broadcast disk provides an effective way to transmit information from a server to many
clients. Information is broadcast cyclically and clients pick the information they need out of the
broadcast. An example of such a system is a wireless web service where web servers broadcast to
browsing clients. Work has been done to schedule the information broadcast so as to minimize
the expected waiting time of the clients. This work has treated the information as indivisible
blocks that are transmitted in their entirety. We propose a new way to schedule the broadcast of
information, which involves splitting items into smaller sub-items that need not be broadcast
immediately after each other. This relaxes the previous restrictions, and hence allows us to
have better schedules with lower expected waiting times. We look at the case of two items of
the same length, each split into two halves, and show that we can achieve optimal performance
by choosing the appropriate schedule from a small set of schedules. We derive a set of optimal
schedules and show which one to use, as a function of the demand probabilities. In fact we prove
the surprising result that there are only two possible types of optimal cyclic schedules for items
1 and 2. The first starts with 1122 and the second with 122122. For example, with demand
probabilities p1 = .19 and p2 = .81, the best order to use in broadcasting the halves of items 1
and 2 is a cyclic schedule with cycle 122122.https://authors.library.caltech.edu/records/1nev3-5px41Efficient Exact Stochastic Simulation of Chemical Systems with Many Species and Many Channels
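A toy Monte Carlo model can estimate expected waiting times for candidate cyclic schedules. The sketch below assumes unit-time slots and treats the two halves of an item as interchangeable (a simplification; the paper's exact model and optimality thresholds are not reproduced here):

```python
import random

def expected_wait(cycle, probs, trials=20000):
    """Toy Monte Carlo estimate of expected waiting time on a split-item
    broadcast disk. Each slot carries one half of an item and takes unit
    time; a client needs any two halves of its item (a simplifying
    assumption about the halves being interchangeable)."""
    L = len(cycle)
    total = 0.0
    for _ in range(trials):
        item = "1" if random.random() < probs[0] else "2"
        t = random.uniform(0, L)   # uniform arrival time within the cycle
        got, slot = 0, int(t) + 1  # first full slot after arrival
        while got < 2:
            if cycle[slot % L] == item:
                got += 1
            slot += 1
        total += slot - t          # wait until the second half finishes
    return total / trials

for cyc in ("1122", "122122"):
    print(cyc, round(expected_wait(cyc, (0.19, 0.81)), 2))
```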
https://resolver.caltech.edu/CaltechPARADISE:1999.ETR031
Authors: Gibson, Michael A.; Bruck, Jehoshua
Year: 1999
There are two fundamental ways to view coupled systems of chemical equations: as continuous, represented
by differential equations whose variables are concentrations, or as discrete, represented by stochastic
processes whose variables are numbers of molecules. Although the former is by far more common, systems
with very small numbers of molecules are important in some applications, e.g., in small biological cells
or in surface processes. In both views, most complicated systems with multiple reaction channels and
multiple chemical species cannot be solved analytically. There are exact numerical simulation methods to
simulate trajectories of discrete, stochastic systems, methods that are rigorously equivalent to the
Master Equation approach, but they do not scale well to systems with many reaction pathways.
This paper presents the Next Reaction Method, an exact algorithm to simulate coupled chemical reactions
that is also efficient: it (a) uses only a single random number per simulation event, and (b) takes time
proportional to the logarithm of the number of reactions, not to the number of reactions itself. The
Next Reaction Method is extended to include time-dependent rate constants and non-Markov processes and
it is applied to a sample application in biology: the lysis/lysogeny decision circuit of lambda phage.
When run on lambda the Next Reaction Method requires approximately 1/15th as many operations as a
standard implementation of the existing methods.https://authors.library.caltech.edu/records/68ta2-vg371Computing in the RAIN: A Reliable Array of Independent Nodes
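A minimal sketch of the Next Reaction Method's core ideas (one putative firing time per reaction, one fresh random number per event, rescaling of dependent reaction times) for the reversible system A -> B, B -> A. A real implementation uses the indexed priority queue and dependency graph described in the paper; with two reactions a plain min() keeps the sketch short:

```python
import math, random

def next_reaction(x, rates, steps, seed=1):
    """Next Reaction Method sketch for A -> B (rate c0*A) and
    B -> A (rate c1*B)."""
    rng = random.Random(seed)
    t = 0.0
    A, B = x
    props = [rates[0] * A, rates[1] * B]
    # One putative firing time per reaction.
    times = [t + rng.expovariate(a) if a > 0 else math.inf for a in props]
    for _ in range(steps):
        mu = min(range(2), key=lambda i: times[i])  # earliest reaction fires
        t = times[mu]
        if mu == 0:
            A, B = A - 1, B + 1
        else:
            A, B = A + 1, B - 1
        old = props[:]
        props = [rates[0] * A, rates[1] * B]
        # Fired reaction: draw a fresh time (the one new random number).
        times[mu] = t + rng.expovariate(props[mu]) if props[mu] > 0 else math.inf
        # Dependent reaction: reuse its old putative time, rescaled -- the
        # trick that keeps the method at one random number per event.
        nu = 1 - mu
        if props[nu] > 0:
            if math.isfinite(times[nu]):
                times[nu] = t + (old[nu] / props[nu]) * (times[nu] - t)
            else:
                times[nu] = t + rng.expovariate(props[nu])
        else:
            times[nu] = math.inf
        yield t, (A, B)

traj = list(next_reaction((50, 0), (1.0, 0.5), 200))
assert all(a + b == 50 and a >= 0 and b >= 0 for _, (a, b) in traj)
```

The rescaling step exploits the memorylessness of the exponential distribution, so the dependent reaction's old putative time remains statistically valid after its propensity changes.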
https://resolver.caltech.edu/CaltechPARADISE:1999.ETR029
Authors: Bohossian, Vasken; Fan, Charles C.; LeMahieu, Paul S.; Riedel, Marc D.; Xu, Lihao; Bruck, Jehoshua
Year: 1999
The RAIN project is a research collaboration between Caltech and NASA-JPL on
distributed computing and data storage systems for future spaceborne missions. The goal of the
project is to identify and develop key building blocks for reliable distributed systems built with
inexpensive off-the-shelf components. The RAIN platform consists of a heterogeneous cluster
of computing and/or storage nodes connected via multiple interfaces to networks configured
in fault-tolerant topologies. The RAIN software components run in conjunction with
operating system services and standard network protocols. Through software-implemented fault
tolerance, the system tolerates multiple node, link, and switch failures, with no single point of
failure. The RAIN technology has been transferred to RAINfinity, a start-up company focusing
on creating clustered solutions for improving the performance and availability of Internet data
centers.
In this paper we describe the following contributions: 1) fault-tolerant interconnect topologies and communication protocols providing consistent error reporting of link failures; 2) fault
management techniques based on group membership; and 3) data storage schemes based on
computationally efficient error-control codes. We present several proof-of-concept applications:
highly available video and web servers, and a distributed checkpointing system.https://authors.library.caltech.edu/records/sykbz-p5r83A Possible Solution to the Impossible Membership Problem
https://resolver.caltech.edu/CaltechPARADISE:1999.ETR032
Authors: Franceschetti, Massimo; Bruck, Jehoshua
Year: 1999
This paper presents a solvable specification and gives an algorithm for the Group
Membership Problem in asynchronous systems with crash failures. Our specification
requires processes to maintain a consistent history in their sequence of views. This
allows processes to order failures and recoveries in time and simplifies the programming
of high level applications. Previous work proved that the Group Membership Problem
cannot be solved in asynchronous systems with crash failures. We circumvent this
impossibility result by building a weaker, yet non-trivial specification. We show that our
solution is an improvement upon previous attempts to solve this problem using a weaker
specification. We also relate our solution to other methods, and give a classification of
progress properties that can be achieved under different models.https://authors.library.caltech.edu/records/mys1x-vty92A Consistent History Link Connectivity Protocol
https://resolver.caltech.edu/CaltechAUTHORS:20111207-091413566
Authors: LeMahieu, Paul; Bruck, Jehoshua
Year: 1999
DOI: 10.1109/IPPS.1999.760448
The RAIN (Reliable Array of Independent Nodes) project at Caltech is focusing on creating reliable distributed systems by leveraging commercially available personal computers and interconnect technologies. Fault-tolerance is introduced into the communication infrastructure by using multiple network interfaces per compute node. When using multiple network connections per compute node, the question of how to monitor connectivity between nodes arises. We examine a connectivity protocol that guarantees that each side of a point-to-point connection sees the same history of activity over the communication channel. In other words, we maintain a consistent history of the state of the channel. The history of channel-state is guaranteed to be identical at each endpoint within some bounded slack. Our main contributions are: (i) a simple, stable protocol for monitoring connectivity that maintains a consistent history with bounded slack, and (ii) proofs that this protocol exhibits correctness, bounded slack, and stability.https://authors.library.caltech.edu/records/8k1s5-da382Efficient digital-to-analog encoding
https://resolver.caltech.edu/CaltechAUTHORS:GIBieeetit99
Authors: Gibson, Michael A.; Bruck, Jehoshua
Year: 1999
DOI: 10.1109/18.771156
An important issue in analog circuit design is the problem of digital-to-analog conversion, i.e., the encoding of Boolean variables into a single analog value which contains enough information to reconstruct the values of the Boolean variables. A natural question is: what is the complexity of implementing the digital-to-analog encoding function? That question was answered by Wegener (see Inform. Processing Lett., vol.60, no.1, p.49-52, 1995), who proved matching lower and upper bounds on the size of the circuit for the encoding function. In particular, it was proven that [(3n-1)/2] 2-input arithmetic gates are necessary and sufficient for implementing the encoding function of n Boolean variables. However, the proof of the upper bound is not constructive. In this paper, we present an explicit construction of a digital-to-analog encoder that is optimal in the number of 2-input arithmetic gates. In addition, we present an efficient analog-to-digital decoding algorithm. Namely, given the encoded analog value, our decoding algorithm reconstructs the original Boolean values. Our construction is suboptimal in that it uses constants of maximum size n log n bits; the nonconstructive proof uses constants of maximum size 2n+[log n] bits.https://authors.library.caltech.edu/records/b6f6r-g8w29Low-density MDS codes and factors of complete graphs
https://resolver.caltech.edu/CaltechAUTHORS:XULieeetit99a
Authors: Xu, Lihao; Bohossian, Vasken; Bruck, Jehoshua; Wagner, David G.
Year: 1999
DOI: 10.1109/18.782102
We present a class of array codes of size n×l, where l=2n or 2n+1, called B-Code. The distances of the B-Code and its dual are 3 and l-1, respectively. The B-Code and its dual are optimal in the sense that i) they are maximum-distance separable (MDS), ii) they have an optimal encoding property, i.e., the number of the parity bits that are affected by change of a single information bit is minimal, and iii) they have optimal length. Using a new graph description of the codes, we prove an equivalence relation between the construction of the B-Code (or its dual) and a combinatorial problem known as perfect one-factorization of complete graphs, thus obtaining constructions of two families of the B-Code and its dual, one of which is new. Efficient decoding algorithms are also given, both for erasure correcting and for error correcting. The existence of perfect one-factorizations for every complete graph with an even number of nodes is a 35-year-old conjecture in graph theory. The construction of B-Codes of arbitrary odd length will provide an affirmative answer to the conjecture.https://authors.library.caltech.edu/records/6gk8r-c3h23Tolerating Faults in Counting Networks
https://resolver.caltech.edu/CaltechAUTHORS:20190830-101628653
Authors: Riedel, Marc D.; Bruck, Jehoshua
Year: 2000
DOI: 10.1007/978-1-4615-4549-1_12
Counting networks were proposed by Aspnes, Herlihy and Shavit [3] as a low-contention concurrent data structure for multiprocessor coordination. We address the issue of tolerating faults in counting networks. In our fault model, balancer objects experience responsive crash failures: they behave correctly until they fail, and thereafter they are inaccessible. We propose two methods for tolerating such faults. The first is based on a construction of a k-fault-tolerant balancer with 2(k + 1) bits of memory. All balancers in a counting network are replaced by fault-tolerant ones. Thus, a counting network with depth O(log^2 n), where n is the width, is transformed into a k-fault-tolerant counting network with depth O(k log^2 n).
We also consider the case where inaccessible balancers can be remapped to spare balancers. We present a bound on the error in the output token distribution of counting networks with remapped faulty balancers (a generalization of the error bound for sorting networks with faulty comparators presented by Yao & Yao [10]).
Our second method for tolerating faults is based on the construction of a correction network. Given a token distribution with a bounded error, the correction network produces a token distribution that is smooth (i.e., the number of tokens on each output wire differs by at most one — a weaker condition than the step property of counting networks). The correction network is constructed with fault-tolerant balancers. It is appended to a counting network in which faulty balancers are remapped to spare balancers. In order to tolerate k faults, the correction network has depth 2k(k + 1)(log n + 1), for a network of width n. Therefore, this method results in a network with a smaller depth provided that O(k) < O(log n). However, it is only applicable if it is possible to remap faulty balancers.https://authors.library.caltech.edu/records/gd1gn-tsn15On the Possibility of Group Membership Protocols
https://resolver.caltech.edu/CaltechAUTHORS:20200127-124216616
Authors: Franceschetti, Massimo; Bruck, Jehoshua
Year: 2000
DOI: 10.1007/978-1-4615-4549-1_4
Chandra et al. [5] showed that the group membership problem cannot be solved in asynchronous systems with crash failures. We identify the main assumptions required for their proof and show how to circumvent this impossibility result by building a weaker, yet non-trivial specification. We provide an algorithm that solves this specification and show that our solution is an improvement upon previous attempts to solve this problem using a weaker specification.https://authors.library.caltech.edu/records/cmjqd-6vz06Splitting Schedules for Internet Broadcast Communication
https://resolver.caltech.edu/CaltechPARADISE:2000.ETR034
Authors: Foltz, Kevin; Bruck, Jehoshua
Year: 2000
The broadcast disk provides an effective way to transmit information from a server to many
clients. Information is broadcast cyclically and clients pick the information they need out of the
broadcast. An example of such a system is a wireless web service where web servers broadcast
to browsing clients. Work has been done to schedule the broadcast of information in a way
that minimizes the expected waiting time of the clients. This work has treated the information
as indivisible blocks. We propose a new way to schedule the broadcast of information, which
involves splitting items into smaller pieces that need not be broadcast consecutively. This relaxes
the previous restrictions, and allows us to have better schedules with lower expected waiting
times. We look at the case of two items of the same length, each split into two halves, and show
that we can achieve optimal performance by choosing the appropriate schedule from a small set
of schedules. We derive a set of optimal schedules and show which one to use, as a function of
the demand probabilities. In fact we prove the surprising result that there are only two possible
types of optimal cyclic schedules for items 1 and 2. These start with 1122 and 122122. For
example, with demand probabilities p1 = .08 and p2 = .92, the best order to use in broadcasting
the halves of items 1 and 2 is a cyclic schedule with cycle 122122222. We also show that much
of the analysis remains the same if we consider items of different lengths. We present numerical
data that suggests that the set of optimal schedules for different length items also consists of
two types, starting with 1122 and 122122. For example, with demand probabilities p1 = .08 and
p2 = .92 as above but l2 = 2l1, the best schedule is 11222222.https://authors.library.caltech.edu/records/03dfq-5x652Coding for Tolerance and Detection of Skew in Parallel Asynchronous Communications
https://resolver.caltech.edu/CaltechPARADISE:2000.ETR033
Authors: Blaum, Mario; Bruck, Jehoshua
Year: 2000
Abstract to be added.https://authors.library.caltech.edu/records/5bnpk-3aj79Efficient Exact Stochastic Simulation of Chemical Systems with Many Species and Many Channels
https://resolver.caltech.edu/CaltechAUTHORS:20170719-082029624
Authors: Gibson, Michael A.; Bruck, Jehoshua
Year: 2000
DOI: 10.1021/jp993732q
There are two fundamental ways to view coupled systems of chemical equations: as continuous, represented by differential equations whose variables are concentrations, or as discrete, represented by stochastic processes whose variables are numbers of molecules. Although the former is by far more common, systems with very small numbers of molecules are important in some applications (e.g., in small biological cells or in surface processes). In both views, most complicated systems with multiple reaction channels and multiple chemical species cannot be solved analytically. There are exact numerical simulation methods to simulate trajectories of discrete, stochastic systems (methods that are rigorously equivalent to the Master Equation approach), but these do not scale well to systems with many reaction pathways. This paper presents the Next Reaction Method, an exact algorithm to simulate coupled chemical reactions that is also efficient: it (a) uses only a single random number per simulation event, and (b) takes time proportional to the logarithm of the number of reactions, not to the number of reactions itself. The Next Reaction Method is extended to include time-dependent rate constants and non-Markov processes and is applied to a sample application in biology (the lysis/lysogeny decision circuit of lambda phage). The performance of the Next Reaction Method on this application is compared with one standard method and an optimized version of that standard method.https://authors.library.caltech.edu/records/zkvwc-5wc86MDS array codes for correcting a single criss-cross error
https://resolver.caltech.edu/CaltechAUTHORS:BLAieeetit00b
Authors: Blaum, Mario; Bruck, Jehoshua
Year: 2000
DOI: 10.1109/18.841187
We present a family of maximum-distance separable (MDS) array codes of size (p-1)×(p-1), p a prime number, and minimum criss-cross distance 3, i.e., the code is capable of correcting any row or column in error, without a priori knowledge of what type of error occurred. The complexity of the encoding and decoding algorithms is lower than that of known codes with the same error-correcting power, since our algorithms are based on exclusive-OR operations over lines of different slopes, as opposed to algebraic operations over a finite field. We also provide efficient encoding and decoding algorithms for errors and erasures.https://authors.library.caltech.edu/records/rnkg3-kxn52Scaffold proteins may biphasically affect the levels of mitogen-activated protein kinase signaling and reduce its threshold properties
https://resolver.caltech.edu/CaltechAUTHORS:LEVpnas00
Authors: Levchenko, Andre; Bruck, Jehoshua; Sternberg, Paul W.
Year: 2000
PMCID: PMC18517
In addition to preventing crosstalk among related signaling pathways, scaffold proteins might facilitate signal transduction by preforming multimolecular complexes that can be rapidly activated by incoming signal. In many cases, such as mitogen-activated protein kinase (MAPK) cascades, scaffold proteins are necessary for full activation of a signaling pathway. To date, however, no detailed biochemical model of scaffold action has been suggested. Here we describe a quantitative computer model of MAPK cascade with a generic scaffold protein. Analysis of this model reveals that formation of scaffold-kinase complexes can be used effectively to regulate the specificity, efficiency, and amplitude of signal propagation. In particular, for any generic scaffold there exists a concentration value optimal for signal amplitude. The location of the optimum is determined by the concentrations of the kinases rather than their binding constants and in this way is scaffold independent. This effect and the alteration of threshold properties of the signal propagation at high scaffold concentrations might alter local signaling properties at different subcellular compartments. Different scaffold levels and types might then confer specialized properties to tune evolutionarily conserved signaling modules to specific cellular contexts.https://authors.library.caltech.edu/records/pv9g2-e7t51Computing in the RAIN: A Reliable Array of Independent Nodes
https://resolver.caltech.edu/CaltechAUTHORS:20190828-102317828
Authors: Bohossian, Vasken; Fan, Charles C.; LeMahieu, Paul S.; Riedel, Marc D.; Xu, Lihao; Bruck, Jehoshua
Year: 2000
DOI: 10.1007/3-540-45591-4_167
The RAIN project is a research collaboration between Caltech and NASA-JPL on distributed computing and data storage systems for future spaceborne missions. The goal of the project is to identify and develop key building blocks for reliable distributed systems built with inexpensive off-the-shelf components. The RAIN platform consists of a heterogeneous cluster of computing and/or storage nodes connected via multiple interfaces to networks configured in fault-tolerant topologies. The RAIN software components run in conjunction with operating system services and standard network protocols. Through software-implemented fault tolerance, the system tolerates multiple node, link, and switch failures, with no single point of failure. The RAIN technology has been transferred to Rainfinity, a start-up company focusing on creating clustered solutions for improving the performance and availability of Internet data centers. In this paper we describe the following contributions: 1) fault-tolerant interconnect topologies and communication protocols providing consistent error reporting of link failures; 2) fault management techniques based on group membership; and 3) data storage schemes based on computationally efficient error-control codes. We present several proof-of-concept applications: highly available video and web servers, and a distributed checkpointing system.https://authors.library.caltech.edu/records/20fm6-qhq29Splitting the Scheduling Headache
https://resolver.caltech.edu/CaltechAUTHORS:20111117-110955967
Authors: Foltz, Kevin; Bruck, Jehoshua
Year: 2000
DOI: 10.1109/ISIT.2000.866787
The broadcast disk provides an effective way to transmit information from a server to many clients. Information is broadcast cyclically and clients pick the information they need out of the broadcast. An example of such a system is a wireless Web service where Web servers broadcast to browsing clients. Work has been done to schedule the information broadcast so as to minimize the expected waiting time of the clients. This work has treated the information as indivisible blocks that are transmitted in their entirety. We propose a new way to schedule the broadcast of information, which involves splitting items into smaller sub-items, which need not be broadcast consecutively. This relaxes the restrictions on scheduling and allows for better schedules. We look at the case of two items of the same length, each split into two halves, and show that we can achieve optimal performance by choosing the appropriate schedule from a small set of schedules.https://authors.library.caltech.edu/records/2bykj-fba55Tolerating multiple faults in multistage interconnection networks with minimal extra stages
https://resolver.caltech.edu/CaltechAUTHORS:FANieeetc00
Authors: Fan, Chenggong Charles; Bruck, Jehoshua
Year: 2000
DOI: 10.1109/12.869334
Adams and Siegel (1982) proposed an extra stage cube interconnection network that tolerates one switch failure with one extra stage. We extend their results and discover a class of extra stage interconnection networks that tolerate multiple switch failures with a minimal number of extra stages. Adopting the same fault model as Adams and Siegel, the faulty switches can be bypassed by a pair of demultiplexer/multiplexer combinations. It is easy to show that, to maintain point-to-point and broadcast connectivities, there must be at least f extra stages to tolerate f switch failures. We present the first known construction of an extra stage interconnection network that meets this lower bound. This n-dimensional multistage interconnection network has n+f stages and tolerates f switch failures. An n-bit label called mask is used for each stage that indicates the bit differences between the two inputs coming into a common switch. We designed the fault-tolerant construction such that it repeatedly uses the singleton basis of the n-dimensional vector space as the stage mask vectors. This construction is further generalized and we prove that an n-dimensional multistage interconnection network is optimally fault-tolerant if and only if the mask vectors of every n consecutive stages span the n-dimensional vector space.https://authors.library.caltech.edu/records/c097g-9ph90Coding for tolerance and detection of skew in parallel asynchronous communications
https://resolver.caltech.edu/CaltechAUTHORS:BLAieeetit00
Authors: Blaum, Mario; Bruck, Jehoshua
Year: 2000
DOI: 10.1109/18.887847
We provide a new definition for the concept of skew in parallel asynchronous communications introduced by Blaum and Bruck (1993). The new definition extends and strengthens previously known results on skew. We give necessary and sufficient conditions for codes that can tolerate a certain amount of skew under the new definition. We also extend the results to codes that can tolerate a certain amount of skew and detect a larger amount of skew when the tolerating threshold is exceeded.https://authors.library.caltech.edu/records/ma02m-d0258Interval Modulation Coding
https://resolver.caltech.edu/CaltechPARADISE:2001.ETR040
Authors: Mukhtar, Saleem; Bruck, Jehoshua
Year: 2001
In this paper we introduce a new paradigm for
storage and communication. We call this paradigm Interval
Modulation Coding. In both communication and
storage, one needs to measure the elapsed time
between voltage transitions or voltage pulses.
Conventionally, this measurement is made by a clock, by counting clock
pulses. Analog circuits (or clocks of higher frequency) can
also be used to measure elapsed time, in which case the
set of permissible time intervals no longer has to consist of
consecutive integer multiples of the clock period but can be
chosen in accordance with a probabilistic model of
measurement error. We will show that this can potentially provide
substantial improvements in terms of bandwidth and
storage density over coding techniques deployed in real storage
and communication systems. We provide a mechanism for
encoding and decoding data based on variable-length to
variable-length prefix-free codes. We show that such codes can
be constructed using integer linear programming. From a
theoretical standpoint, we study the linear programming
relaxation of the integer linear program associated with code
construction. We provide an efficient algorithm for
determining if the linear programming relaxation is feasible and
an efficient algorithm for solving the linear programming
relaxation, assuming it is feasible.https://authors.library.caltech.edu/records/6yhbd-vjn12Frequency Modulation for Asynchronous Data Transfer
https://resolver.caltech.edu/CaltechPARADISE:2001.ETR036
Authors: Mukhtar, Saleem; Bruck, Jehoshua
Year: 2001
Consider a communication channel that consists of several subchannels transmitting simultaneously and
asynchronously. As an example of this scheme, consider a board with two chips (transmitter and receiver). The subchannels represent wires connecting the chips, where differences in the lengths of the wires might result in asynchronous reception. The contribution of this paper is a scheme which allows pipelined asynchronous communication at very high rates even when the amount of skew is arbitrarily large and unknown a priori. Insensitivity to delay is accomplished by encoding data in the frequency of the signal, as opposed to amplitude. The first theoretical question we answer is what rates can be achieved. In doing so we have extended the work of Capocelli and Spickerman on generalized Fibonacci numbers. The second question that we answer is
how to encode data efficiently in the frequency of the signal. For the purposes of encoding and decoding
we use variable length to variable length prefix-free codes. We have provided an algorithm based on
integer linear programming for constructing such codes. In essence, we have formulated a scheme which is easy to implement and allows for asynchronous data transfer at very high rates. Potential applications are in on-chip, on-board and board to board communication, enabling much higher bandwidths.https://authors.library.caltech.edu/records/e7s3t-e9466Diversity Coloring for Distributed Storage in Mobile Networks
https://resolver.caltech.edu/CaltechPARADISE:2001.ETR038
Authors: Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2001
Storing multiple copies of files is crucial for ensuring quality of service for data storage in
mobile networks. This paper proposes a new scheme, called the K-out-of-N file distribution scheme, for
the placement of files. In this scheme files are split, and Reed-Solomon codes or other maximum
distance separable (MDS) codes are used to produce file segments containing parity information. Multiple
copies of the file segments are stored on gateways in the network in such a way that every gateway can
retrieve enough file segments from itself and its neighbors within a certain number of hops for
reconstructing the original files. The goal is to minimize the maximum number of hops it takes for any
gateway to get enough file segments for the file reconstruction.
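The role of the MDS code in this scheme can be sketched in the simplest K-out-of-N case, K = N-1, where a single XOR parity segment suffices; this is a toy stand-in for the Reed-Solomon codes the report uses, and all names below are illustrative:

```python
from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode_parity(segments):
    """Toy K-out-of-N encoding for K = N-1: append one XOR parity segment,
    so any N-1 of the N stored segments reconstruct the file.  (General K
    requires a true MDS code such as Reed-Solomon.)"""
    return segments + [reduce(xor_bytes, segments)]

def recover(stored, lost_index):
    """XOR of the survivors equals the missing segment, since all N segments
    XOR to zero."""
    return reduce(xor_bytes, (s for i, s in enumerate(stored) if i != lost_index))

stored = encode_parity([b"abcd", b"efgh", b"ijkl"])  # N = 4 segments, one per gateway
print(recover(stored, lost_index=1))                 # reconstructs b"efgh"
```

Any gateway that can reach N-1 of the N segments within its hop bound can thus rebuild the file, which is exactly the retrieval guarantee the scheme optimizes.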
We formulate the K-out-of-N file distribution scheme as a coloring problem we call diversity coloring.
A diversity coloring is defined to be optimal if it uses the smallest number of colors. Upper and lower
bounds on the performance of diversity coloring for general graphs are studied. Diversity coloring
algorithms for several special classes of graphs - trees, rings and tori - are presented, all of which
have linear time complexity. Both the algorithm for trees and the algorithm for rings output optimal
diversity colorings. The algorithm for tori is guaranteed to output an optimal diversity coloring when the
sizes of tori are sufficiently large.https://authors.library.caltech.edu/records/c0g4e-34404Covering Algorithms, Continuum Percolation, and the Geometry of Wireless Networks.
https://resolver.caltech.edu/CaltechPARADISE:2001.ETR037
Authors: Booth, Lorna; Bruck, Jehoshua; Franceschetti, Massimo; Meester, Ronald
Year: 2001
Continuum percolation models, where each point of a two-dimensional Poisson
point process is the center of a disc of given (or random) radius r, have been
extensively studied. In this paper, we consider the generalization in which a
deterministic algorithm (given the points of the point process) places the discs
on the plane, in such a way that each disc covers at least one point of the point
process and that each point is covered by at least one disc. This gives a model
for wireless communication networks, which was the original motivation to study
this class of problems.
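As a concrete illustration of such a deterministic covering algorithm, one can snap each point of the process to the nearest vertex of a grid whose spacing guarantees coverage; this grid-snapping rule is a hypothetical example of the class of algorithms studied, not one taken from the paper:

```python
import math
import random

def grid_covering(points, r):
    """A simple deterministic covering algorithm: place a disc of radius r at
    the grid vertex nearest to each point.  With grid spacing g = r*sqrt(2),
    the nearest vertex is within distance r, so every point is covered and
    every disc covers at least one point."""
    g = r * math.sqrt(2)
    return {(round(x / g) * g, round(y / g) * g) for (x, y) in points}

random.seed(1)
# a uniform sample standing in for the Poisson point process of the model
points = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(200)]
centers = grid_covering(points, r=0.5)
print(len(centers), "discs cover all", len(points), "points")
```

Because many points can share a grid vertex, the algorithm typically uses far fewer discs than the classical point-centered model, while still satisfying both covering constraints.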
We look at the percolation properties of this generalized model, showing the
almost sure non-existence of an unbounded connected component of discs for
small values of the density lambda of the Poisson point process, for any covering
algorithm. In general, it turns out not to be true that unbounded connected
components arise when lambda is taken sufficiently high. However, we identify some
large families of covering algorithms, for which such an unbounded component
does arise for large values of lambda.
We show how a simple scaling operation can change the percolation properties
of the model, leading to the almost sure existence of an unbounded connected
component for large values of lambda, for any covering algorithm.
Finally, we show that a large class of covering algorithms that arise in many
practical applications can get arbitrarily close to achieving a minimal density of
covering discs. We also show (constructively) the existence of algorithms that
achieve this minimal density.https://authors.library.caltech.edu/records/2kpct-h9a29A Geometric Theorem for Approximate Disk Covering Algorithms
https://resolver.caltech.edu/CaltechPARADISE:2001.ETR035
Authors: Franceschetti, Massimo; Cook, Matthew; Bruck, Jehoshua
Year: 2001
We present a basic theorem in combinatorial geometry that leads to a family of approximation algorithms for the geometric disk covering problem. These algorithms exhibit constant approximation factors over a wide range of parameter choices. This flexibility allows us to achieve running times that compare favourably with those of existing procedures.https://authors.library.caltech.edu/records/j0t0g-0px95Computing in the RAIN: a reliable array of independent nodes
https://resolver.caltech.edu/CaltechAUTHORS:BOHieeetpds01
Authors: Bohossian, Vasken; Fan, Chenggong C.; LeMahieu, Paul S.; Riedel, Marc D.; Xu, Lihao; Bruck, Jehoshua
Year: 2001
DOI: 10.1109/71.910866
The RAIN project is a research collaboration between Caltech and NASA-JPL on distributed computing and data-storage systems for future spaceborne missions. The goal of the project is to identify and develop key building blocks for reliable distributed systems built with inexpensive off-the-shelf components. The RAIN platform consists of a heterogeneous cluster of computing and/or storage nodes connected via multiple interfaces to networks configured in fault-tolerant topologies. The RAIN software components run in conjunction with operating system services and standard network protocols. Through software-implemented fault tolerance, the system tolerates multiple node, link, and switch failures, with no single point of failure. The RAIN technology has been transferred to Rainfinity, a start-up company focusing on creating clustered solutions for improving the performance and availability of Internet data centers. In this paper, we describe the following contributions: 1) fault-tolerant interconnect topologies and communication protocols providing consistent error reporting of link failures, 2) fault management techniques based on group membership, and 3) data storage schemes based on computationally efficient error-control codes. We present several proof-of-concept applications: a highly available video server, a highly available Web server, and a distributed checkpointing system. Also, we describe a commercial product, Rainwall, built with the RAIN technology.https://authors.library.caltech.edu/records/y4cys-77k02Introduction to the special section on dependable network computing
https://resolver.caltech.edu/CaltechAUTHORS:AVRieeetpds01
Authors: Avresky, D. R.; Bruck, Jehoshua; Culler, David E.
Year: 2001
DOI: 10.1109/TPDS.2001.910865
Dependable network computing is becoming a key part of our daily economic and social life. Every day, millions of users and businesses are utilizing the Internet infrastructure for real-time electronic commerce transactions, scheduling important events, and building relationships. While network traffic and the number of users are rapidly growing, the mean-time between failures (MTTF) is surprisingly short; according to recent studies, in the majority of Internet backbone paths, the MTTF is 28 days. This leads to a strong requirement for highly dependable networks, servers, and software systems. The challenge is to build interconnected systems, based on available technology, that are inexpensive, accessible, scalable, and dependable. This special section provides insights into a number of these exciting challenges.https://authors.library.caltech.edu/records/cdqjv-fnx20The Raincore Distributed Session Service for Networking Elements
https://resolver.caltech.edu/CaltechAUTHORS:20111110-152519883
Authors: Fan, Chenggong Charles; Bruck, Jehoshua
Year: 2001
DOI: 10.1109/IPDPS.2001.925154
Motivated by the explosive growth of the Internet, we study efficient and fault-tolerant distributed session layer
protocols for networking elements. These protocols are
designed to enable a network cluster to share the state
information necessary for balancing network traffic and
computation load among a group of networking elements.
In addition, in the presence of failures, they allow
network traffic to fail-over from failed networking
elements to healthy ones. To maximize the overall
network throughput of the networking cluster, we assume a unicast communication medium for these protocols. The Raincore Distributed Session Service is based on a fault-tolerant token protocol, and provides group membership, reliable multicast and mutual exclusion services in a networking environment. We show that this service provides atomic reliable multicast with consistent ordering. We also show that the Raincore token protocol consumes less overhead than a broadcast-based protocol in this environment in terms of CPU task-switching. The Raincore technology was transferred to Rainfinity, a startup company that is focusing on software for Internet reliability and performance. Rainwall, Rainfinity's first product, was developed using the Raincore Distributed Session Service. We present initial performance results of the Rainwall product that validate our design assumptions and goals.https://authors.library.caltech.edu/records/2hcmq-02r44Time Division is Better Than Frequency Division for Periodic Internet Broadcast of Dynamic Data
https://resolver.caltech.edu/CaltechAUTHORS:20111117-083940997
Authors: Foltz, Kevin; Bruck, Jehoshua
Year: 2001
DOI: 10.1109/ISIT.2001.936021
We consider two ways to send items over a broadcast channel and compare them using the metric of expected waiting time. The first is frequency division, where each item is broadcast on its own subchannel of lower bandwidth. We find the optimal allocation of bandwidth to the subchannels for this method. Then we look at time division, where items are sent sequentially on a single full-bandwidth channel. We show that for any frequency division broadcast schedule, we can find a better time division schedule. Thus time division is better than frequency division.https://authors.library.caltech.edu/records/gx3g7-23g16The Raincore API for clusters of networking elements
https://resolver.caltech.edu/CaltechAUTHORS:FANieeeic01
Authors: Fan, Chenggong Charles; Bruck, Jehoshua
Year: 2001
DOI: 10.1109/4236.957897
Clustering technology offers a way to increase overall reliability and performance of Internet information flow by strengthening one link in the chain without adding others. We have implemented this technology in a distributed computing architecture for network elements. The architecture, called Raincore, originated in the Reliable Array of Independent Nodes, or RAIN, research collaboration between the California Institute of Technology and the US National Aeronautics and Space Administration's Jet Propulsion Laboratory. The RAIN project focused on developing high-performance, fault-tolerant, portable clustering technology for spaceborne computing. The technology that emerged from this project became the basis for a spinoff company, Rainfinity, which has the exclusive intellectual property rights to the RAIN technology. The authors describe the Raincore conceptual architecture and distributed services, which are designed to make it easy for developers to port their applications to run on top of a cluster of networking elements. We include two applications: a Web server prototype that was part of the original RAIN research project and a commercial firewall cluster product from Rainfinity.https://authors.library.caltech.edu/records/w0jh4-vgp11A group membership algorithm with a practical specification
https://resolver.caltech.edu/CaltechAUTHORS:FRAieeetpds01
Authors: Franceschetti, Martin; Bruck, Jehoshua
Year: 2001
DOI: 10.1109/71.969128
We present a solvable specification and give an algorithm for the group membership problem in asynchronous systems with crash failures. Our specification requires processes to maintain a consistent history in their sequences of views. This allows processes to order failures and recoveries in time and simplifies the programming of high-level applications. Previous work has proven that the group membership problem cannot be solved in asynchronous systems with crash failures. We circumvent this impossibility result by building a weaker, yet nontrivial specification. We show that our solution is an improvement upon previous attempts to solve this problem using a weaker specification. We also relate our solution to other methods and give a classification of progress properties that can be achieved under different models.https://authors.library.caltech.edu/records/3sj39-fx517Time-Division is Better Than Frequency-Division for Periodic Internet Broadcasting
https://resolver.caltech.edu/CaltechPARADISE:2002.ETR042
Authors: Foltz, Kevin; Bruck, Jehoshua
Year: 2002
The broadcast disk provides an effective way to transmit information from a server to many
clients. Information is broadcast cyclically and clients pick the information they need out of the
broadcast. An example of such a system is a wireless web service where web servers broadcast
to browsing clients. We consider two ways to send items over a broadcast channel and compare
them using the metric of expected waiting time. The first is frequency-division, where each
item is broadcast on its own subchannel of lower bandwidth. We find the optimal allocation of
bandwidth to the subchannels using this method. Then we look at time-division, where items
are sent sequentially on a single full-bandwidth channel. For items of equal length, we show
that for any frequency-division broadcast schedule, we can find a better time-division schedule.
Thus time-division is better than frequency-division.https://authors.library.caltech.edu/records/s09d6-97m41Microcellular Systems, Random Walks, and Wave Propagation
https://resolver.caltech.edu/CaltechPARADISE:2002.ETR045
Authors: Franceschetti, Massimo; Bruck, Jehoshua; Schulman, Leonard J.
Year: 2002
As the number of users of wireless services increases, the concept of using smaller
cell sizes becomes especially attractive because of its potential for capacity increase.
Current technology allows building base stations for small cells in a cost-effective
way, and telecommunication companies have started exploiting the new microcellular
concept in providing coverage to densely populated areas. Prediction of propagation
characteristics in this new scenario is essential for accurate link budget calculations in
network planning.
In this paper a new, simple model of wave propagation for microcellular systems
is applied to predict the path loss of a wireless channel. The model does not rely on
the classical theory of electromagnetic wave propagation, but it is entirely based on
probability theory. We consider the canonical scenario of a random environment of
partially absorbing scatterers and model the trajectory of each photon in the system
as a random walk. This model leads to a path loss formula that rather accurately (in comparison to other models and experimental data) describes the smooth transition
of power attenuation from an inverse square law with the distance to the transmitter
to an exponential attenuation as this distance is increased. This result can justify
empirical formulas that are often used for path loss prediction, characterized by a
breakpoint distance at which the exponent of a power law is increased from a value of
approximately 2 to a value in the range of 4 to 10.
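The empirical breakpoint formulas referred to above can be sketched as a generic dual-slope model; the function name, default exponents, and distances below are illustrative, not the paper's random-walk derivation:

```python
import math

def dual_slope_path_loss_db(d, d_break, n1=2.0, n2=6.0):
    """Generic dual-slope path loss in dB (up to a constant offset):
    power-law exponent n1 up to the breakpoint distance d_break,
    and a steeper exponent n2 beyond it."""
    if d <= d_break:
        return 10 * n1 * math.log10(d)
    return 10 * n1 * math.log10(d_break) + 10 * n2 * math.log10(d / d_break)

print(round(dual_slope_path_loss_db(20, d_break=100), 1))   # slow growth before the breakpoint
print(round(dual_slope_path_loss_db(400, d_break=100), 1))  # much faster growth beyond it
```

The model in the abstract predicts this kind of transition smoothly, rather than imposing a hard breakpoint as the empirical formula does.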
Theoretical predictions of the model are validated by showing agreement with experimental data collected in the city of Rome, Italy.https://authors.library.caltech.edu/records/1ryzf-n1606Memory Allocation in Information Storage Networks
https://resolver.caltech.edu/CaltechPARADISE:2002.ETR048
Authors: Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2002
We propose a file storage scheme which bounds the file-retrieving delays in a heterogeneous information network, under both fault-free and faulty circumstances. The scheme combines coding with storage for better performance. We study the memory allocation problem for the scheme, which is to decide how much data to store on each node, with the objective of minimizing the total amount of data stored in the network. This problem is NP-hard for general networks. We present three polynomial-time algorithms which solve the memory allocation problem for tree networks. The first two algorithms are for tree networks with and without upper bounds on nodes' memory sizes, respectively. The third algorithm finds, among all the optimal solutions for the tree network, the solution that minimizes the greatest memory size of single nodes. By combining these memory allocation algorithms with known data-interleaving techniques, a complete solution to realize the file storage scheme in tree networks is established.https://authors.library.caltech.edu/records/g4g98-68990Interleaving Schemes on Circulant Graphs
https://resolver.caltech.edu/CaltechPARADISE:2002.ETR046
Authors: Slivkins, Aleksandrs; Bruck, Jehoshua
Year: 2002
Interleaving schemes are used for error correction on a noisy channel. We consider interleaving schemes on infinite circulant graphs with two offsets 1 and d, with the goal to
minimize the interleaving degree. Our constructions are minimal covers of the graph by copies of some subgraph S that can be labeled by a single label. We focus on minimizing the index of S - an inverse of its density rounded up. We establish lower bounds and prove that our constructions are optimal or almost optimal, both for the index of S and for the interleaving degree. We identify related combinatorial questions and advance conjectures.https://authors.library.caltech.edu/records/bak03-gez83DNAS: Dispersed Network Attached Storage for Reliability and Performance
https://resolver.caltech.edu/CaltechPARADISE:2002.ETR043
Authors: Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2002
NOTE: Text or symbols not renderable in plain ASCII are indicated by [...]. Abstract included in .pdf
document.
With the advent of merging between communication and
storage, there is an increasing need for developing
distributed data layout schemes for network attached
storage that address reliability and performance challenges.
This paper proposes a novel scheme for storing
information on networks. In particular, for a fault-free
operation, it provides the ability to retrieve data by accessing
network nodes within a small proximity. In the event of
faults, data is guaranteed to be retrieved by exploring a
slightly larger proximity.
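The single-layer version of this coloring idea can be sketched on a ring: label nodes cyclically with N colors and measure the proximity within which every node finds enough distinct colors. The function names and parameters below are illustrative, not from this paper, and the sketch assumes the ring size is a multiple of the number of colors:

```python
def ring_diversity_coloring(n_nodes, n_colors):
    """Cyclic coloring of a ring of nodes; assumes n_nodes is a multiple
    of n_colors so the pattern wraps around cleanly."""
    return [i % n_colors for i in range(n_nodes)]

def hops_to_k_colors(coloring, node, k):
    """Smallest hop radius within which `node` sees k distinct colors
    (i.e., can retrieve k distinct data pieces), assuming k <= n_colors."""
    n = len(coloring)
    seen = {coloring[node]}
    h = 0
    while len(seen) < k:
        h += 1
        seen.add(coloring[(node + h) % n])
        seen.add(coloring[(node - h) % n])
    return h

coloring = ring_diversity_coloring(12, 4)  # 4 piece types on 12 nodes
print(max(hops_to_k_colors(coloring, v, 3) for v in range(12)))  # prints 1
```

On a ring the cyclic pattern is clearly best possible: no coloring can let every node gather k distinct colors in fewer hops than the cyclic one does.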
The problem of designing layout schemes, namely
providing Dispersed Network Attached Storage (DNAS), is
formulated as a graph coloring problem that we call
Layered Diversity Coloring. Consider the following problem:
given a graph G(V,E) and N colors, how to color vertices
of G so that every vertex can find at least [...]
In this paper we study the layered diversity coloring
problem where the graph G(V,E) is a tree. A coloring
algorithm of time complexity [...] is
presented, and the necessary and sufficient condition for
the existence of a layered diversity coloring on a tree follows
from the algorithm.https://authors.library.caltech.edu/records/2jz7r-g7w76Coding and Scheduling for Efficient Loss-Resilient Data Broadcasting
https://resolver.caltech.edu/CaltechPARADISE:2002.ETR049
Authors: Foltz, Kevin; Xu, Lihao; Bruck, Jehoshua
Year: 2002
We examine the problem of sending data to clients over a broadcast channel in a way that minimizes the expected waiting time of the clients for this data. This channel, however, is not completely reliable, and packets are occasionally lost. This poses a problem, as performance is greatly degraded by even a single packet loss. For example, one lost packet will increase our expected waiting time for an item from .75 to 2, or 167%, when sending two items with equal demands. We propose and analyze two solutions that attempt to minimize this degradation. In the first, we code packets and in the second we code packets and slightly modify our schedule. The resulting degradations are 67% for the first solution and less than 1% for the second. We conclude that using the second scheme is a very effective way to combat single packet losses, and we extend this solution to combat up to t packet losses per data item for any t ≤ k, where k is the number of packets per data item.https://authors.library.caltech.edu/records/18d5j-cqj70Ad hoc wireless networks with noisy links
https://resolver.caltech.edu/CaltechPARADISE:2002.ETR047
Authors: Booth, Lorna; Bruck, Jehoshua; Cook, Matthew; Franceschetti, Massimo
Year: 2002
Models of ad-hoc wireless networks are often based on the geometric disc abstraction: transmission is assumed to be isotropic, and reliable communication channels are assumed to exist (apart from interference) between nodes closer than a given distance. In reality, communication channels are unreliable and communication range is generally not rotationally symmetric. In this paper we examine how these issues affect network connectivity. Using ideas from percolation theory, we compare networks of geometric discs to networks of other simple shapes, including probabilistic connections, and find that when transmission range and node density are normalized across experiments so as to preserve the expected number of connections (ENC) enjoyed by each node, the discs are the "hardest" shape to connect together. In other words, anisotropic radiation patterns and spotty coverage allow an unbounded connected component to appear at lower ENC levels than perfect circular coverage allows. This indicates that connectivity claims made in the literature using the geometric disc abstraction will in general hold also for the more irregular shapes found in practice.https://authors.library.caltech.edu/records/bhdsr-zxz48A Geometric Theorem for Wireless Network Design Optimization
https://resolver.caltech.edu/CaltechPARADISE:2002.ETR044
Authors: Franceschetti, Massimo; Cook, Matthew; Bruck, Jehoshua
Year: 2002
Consider an infinite square grid G. How many
discs of given radius r, centered at the vertices of G, are
required, in the worst case, to completely cover an arbitrary disc of radius r placed on the plane? We show that this number is an integer in the set {3, 4, 5, 6} whose value depends on the ratio of r to the grid spacing.
This result can be applied at the very early design stage of
a wireless cellular network to determine, under the recent
International Telecommunication Union (ITU) proposal for
a traffic load model, and under the assumption that each
client is able to communicate if it is within a certain range from a base station, conditions for which a grid network design is cost effective, for any expected traffic demand.https://authors.library.caltech.edu/records/edv6k-qm527Splitting schedules for Internet broadcast communication
https://resolver.caltech.edu/CaltechAUTHORS:FOLieeetit02.854
Authors: Foltz, Kevin; Bruck, Jehoshua
Year: 2002
DOI: 10.1109/18.978728
The broadcast disk provides an effective way to transmit information from a server to many clients. Work has been done to schedule the broadcast of information in a way that minimizes the expected waiting time of the clients. Much of this work has treated the information as indivisible blocks. We look at splitting items into smaller pieces that need not be broadcast consecutively. This allows us to have better schedules with lower expected waiting times. We look at the case of two items of the same length, each split into two halves, and show how to achieve optimal performance. We prove the surprising result that there are only two possible types of optimal cyclic schedules for items 1 and 2. These start with 1122 and 122122. For example, with demand probabilities p1 = 0.08 and p2 = 0.92, the best order to use in broadcasting the halves of items 1 and 2 is a cyclic schedule with cycle 122122222. We also look at items of different lengths and show that much of the analysis remains the same, resulting in a similar set of optimal schedules.https://authors.library.caltech.edu/records/gvhjj-4rk59Robustness of Time-Division Schedules for Internet Broadcast
https://resolver.caltech.edu/CaltechAUTHORS:20111102-132442999
Authors: Foltz, Kevin; Bruck, Jehoshua
Year: 2002
DOI: 10.1109/ISIT.2002.1023655
The model we consider consists of a server and many clients. The clients have a large incoming bandwidth and little or no outgoing bandwidth. The server repeatedly broadcasts information through the air to the clients. There are two information items with lengths l_1 and l_2, and demand probabilities p_1 and p_2. The demand probability of an item is simply the relative frequency of requests for that item by the clients, scaled such that the sum of the p_i's is 1. These items contain static data. This allows us to receive data out of order and use parts of different broadcasts to reassemble items. The metric we use to evaluate broadcast schedules is expected waiting time. This is the expected time a client must wait for an item, averaged over all items and clients, with weight p_i for item i.https://authors.library.caltech.edu/records/yj1sf-v7v74Power requirements for connectivity in clustered wireless networks
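The expected-waiting-time metric described above can be computed exactly for a simple special case (a sketch under simplifying assumptions: unit-length, indivisible items that must be heard from beginning to end, which is deliberately simpler than the out-of-order reassembly model of the abstract).

```python
def expected_wait(schedule, p):
    """Exact expected waiting time for a cyclic broadcast schedule of
    unit-length, indivisible items. A request for item i arrives
    uniformly in the cycle and waits until the END of the next
    broadcast of i that it hears from the beginning.
    p maps item -> demand probability (summing to 1)."""
    T = len(schedule)
    starts = {}
    for slot, item in enumerate(schedule):
        starts.setdefault(item, []).append(slot)
    total = 0.0
    for item, s in starts.items():
        # gaps between consecutive starts of this item (cyclically)
        gaps = [(s[(k + 1) % len(s)] - s[k]) % T or T for k in range(len(s))]
        # arrivals landing in a gap of length g wait g/2 on average,
        # plus one unit of transmission time; a gap is hit w.p. g/T
        ew = 1.0 + sum(g * g for g in gaps) / (2.0 * T)
        total += p[item] * ew
    return total

# Two items, equal demand, alternating schedule:
assert expected_wait([1, 2], {1: 0.5, 2: 0.5}) == 2.0
```

Under this model, repeating a popular item more often shortens its gaps and lowers its contribution to the weighted sum.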
https://resolver.caltech.edu/CaltechAUTHORS:20111102-090203478
Authors: Booth, L.; Bruck, J.; Franceschetti, M.; Meester, R.
Year: 2002
DOI: 10.1109/ISIT.2002.1023625
We consider wireless networks in which a subset of the nodes provides coverage to clusters of clients and routes data packets from source to destination. We generalize previous work of Gilbert (1961), deriving conditions on the communication range of the nodes and on the placement of the covering stations to provide, with probability one, some long-distance multi-hop communication. One key result is that the network can almost surely (a.s.) provide some
long-distance multi-hop communication, regardless of
the algorithm used to place the covering stations, if
the density of the clients is high enough and their
communication range is less than half the communication
range of the base stations. Once the ratio between the two communication ranges exceeds one half, there exists a malicious covering algorithm that never provides long-distance multi-hop communication in the network, even if we constrain the base stations to be placed at the vertices of a fixed grid, which is the typical scenario in the case of commercial networks.https://authors.library.caltech.edu/records/xhx3n-c9340Interval modulation coding
https://resolver.caltech.edu/CaltechAUTHORS:20111027-155844420
Authors: Mukhtar, Saleem; Bruck, Jehoshua
Year: 2002
DOI: 10.1109/ISIT.2002.1023599
We propose a new modulation scheme and a new architecture for the design of communication and storage systems. The modulation scheme is based on modulating pulse width and the architecture is based on time measurement circuitry.https://authors.library.caltech.edu/records/y20w9-mmk20Diversity Coloring for information storage in networks
https://resolver.caltech.edu/CaltechAUTHORS:20111019-133944822
Authors: Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2002
DOI: 10.1109/ISIT.2002.1023653
We propose a new file placement scheme using MDS codes, and formulate it as the Diversity Coloring problem. We then present an optimal diversity coloring algorithm for trees.https://authors.library.caltech.edu/records/kbvpn-at979Efficient Message Passing Interface (MPI) for Parallel Computing on Clusters of Workstations
https://resolver.caltech.edu/CaltechPARADISE:1994.ETR002
Authors: Bruck, Jehoshua; Dolev, Danny; Ho, Ching-Tien; Roşu, Marcel-Cătălin; Strong, Ray
Year: 2002
DOI: 10.1145/215399.215421
Parallel computing on clusters of workstations and personal computers has very high
potential, since it leverages existing hardware and software. Parallel programming
environments offer the user a convenient way to express parallel computation and communication.
In fact, recently, a Message Passing Interface (MPI) has been proposed as an industrial
standard for writing "portable" message-passing parallel programs. The communication
part of MPI consists of the usual point-to-point communication as well as collective
communication. However, existing implementations of programming environments for clusters
are built on top of a point-to-point communication layer (send and receive) over local
area networks (LANs) and, as a result, suffer from poor performance in the collective
communication part.
In this paper, we present an efficient design and implementation of the collective
communication part in MPI that is optimized for clusters of workstations. Our system consists
of two main components: the MPI-CCL layer that includes the collective communication
functionality of MPI and a User-level Reliable Transport Protocol (URTP) that interfaces
with the LAN Data-link layer and leverages the fact that the LAN is a broadcast medium.
Our system is integrated with the operating system via an efficient kernel extension
mechanism that we developed. The kernel extension significantly improves the performance of
our implementation as it can handle part of the communication overhead without involving
user space.
We have implemented our system on a collection of IBM RS/6000 workstations connected
via a 10 Mbit Ethernet LAN. Our performance measurements are taken from typical
scientific programs that run in a parallel mode by means of the MPI. The hypothesis behind
our design is that the system's performance will be bounded by interactions between the kernel
and user space rather than by the bandwidth delivered by the LAN Data-Link Layer. Our
results indicate that the performance of our MPI Broadcast (on top of Ethernet) is about
twice as fast as a recently published software implementation of broadcast on top of ATM.https://authors.library.caltech.edu/records/y075m-mfm96The synthesis of cyclic combinational circuits
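The performance gap between collectives built on point-to-point messaging and those exploiting a broadcast medium can be sketched with a back-of-envelope round count (an illustration, not the MPI-CCL implementation): a binomial-tree broadcast built on send/receive needs about log2(P) communication rounds, while a single transmission on a shared Ethernet segment reaches all P workstations at once.

```python
import math

def binomial_tree_rounds(P):
    """Rounds for a binomial-tree broadcast built on point-to-point
    send/receive: the set of informed processes doubles each round."""
    rounds, informed = 0, 1
    while informed < P:
        informed *= 2
        rounds += 1
    return rounds  # equals ceil(log2(P))

# On a true broadcast medium (e.g. classic shared Ethernet), one
# transmission reaches all P workstations, independent of P.
def broadcast_medium_rounds(P):
    return 1

for P in (2, 8, 16, 100):
    assert binomial_tree_rounds(P) == math.ceil(math.log2(P))
```

This is only a round count; the paper's point is that in practice kernel/user-space crossings, not LAN bandwidth, dominate the cost of each round.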
https://resolver.caltech.edu/CaltechAUTHORS:20111012-143707754
Authors: Riedel, Marc D.; Bruck, Jehoshua
Year: 2003
Digital circuits are called combinational if they are memoryless: they have outputs that depend only on the current values of the inputs. Combinational circuits are generally thought of as acyclic (i.e., feed-forward) structures. And yet, cyclic circuits can be combinational. Cycles sometimes occur in designs synthesized from high-level descriptions. Feedback in such cases is carefully contrived, typically occurring when functional units are connected in a cyclic topology. Although the premise of cycles in combinational circuits has been accepted, and analysis techniques have been proposed, no one has attempted the synthesis of circuits with feedback at the logic level.
We propose a general methodology for the synthesis of multilevel combinational circuits with cyclic topologies. Our approach is to introduce feedback in the substitution / minimization phase, optimizing a multilevel network description for area. In trials with benchmark circuits, many were optimized significantly, with improvements of up to 30% in the area.
We argue the case for radically rethinking the concept of "combinational" in circuit design: we should no longer think of combinational logic as acyclic in theory or in practice, since nearly all combinational circuits are best designed with cycles.https://authors.library.caltech.edu/records/0pjae-t9h73The Synthesis of Cyclic Combinatorial Circuits
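A minimal example of a cyclic yet combinational circuit (a classic illustration analyzed with three-valued fixed-point simulation, a standard analysis technique rather than the synthesis method of the paper): two cross-coupled gates whose feedback loop is broken, for every input assignment, by a controlling value.

```python
X = None  # "unknown" in three-valued simulation

def AND(a, b):
    if a == 0 or b == 0: return 0          # 0 is controlling
    if a == 1 and b == 1: return 1
    return X

def OR(a, b):
    if a == 1 or b == 1: return 1          # 1 is controlling
    if a == 0 and b == 0: return 0
    return X

def NOT(a):
    return X if a is X else 1 - a

def simulate(s, x1, x2):
    """Cross-coupled cyclic circuit:
         y1 = (s AND x1) OR (NOT s AND y2)
         y2 = (NOT s AND x2) OR (s AND y1)
       Iterate from 'unknown' until a fixed point is reached."""
    y1 = y2 = X
    for _ in range(4):  # the fixed point is reached within a few sweeps
        y1 = OR(AND(s, x1), AND(NOT(s), y2))
        y2 = OR(AND(NOT(s), x2), AND(s, y1))
    return y1, y2

# Despite the structural loop y1 -> y2 -> y1, every input resolves to
# definite outputs, so the circuit is combinational.
for s in (0, 1):
    for x1 in (0, 1):
        for x2 in (0, 1):
            y1, y2 = simulate(s, x1, x2)
            assert y1 is not X and y2 is not X
```

When s = 1 the loop is cut at y1 (which becomes x1); when s = 0 it is cut at y2 (which becomes x2), so no input ever exercises the full cycle.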
https://resolver.caltech.edu/CaltechPARADISE:ETR052
Authors: Riedel, Marc D.; Bruck, Jehoshua
Year: 2003
To be added.https://authors.library.caltech.edu/records/gc4vq-v3988The Synthesis of Cyclic Combinatorial Circuits
https://resolver.caltech.edu/CaltechPARADISE:ETR052a
Authors: Riedel, Marc D.; Bruck, Jehoshua
Year: 2003
To be added.https://authors.library.caltech.edu/records/cy22z-4f190Percolation in Multi-hop Wireless Networks
https://resolver.caltech.edu/CaltechPARADISE:2003.ETR055
Authors: Franceschetti, Massimo; Booth, Lorna; Cook, Matthew; Meester, Ronald; Bruck, Jehoshua
Year: 2003
To be addedhttps://authors.library.caltech.edu/records/azx95-b8232Multi-Cluster interleaving in linear arrays and rings
https://resolver.caltech.edu/CaltechPARADISE:2003.ETR051
Authors: Bruck, Jehoshua; Jiang, Anxiao (Andrew)
Year: 2003
Interleaving codewords is an important method not only for combating burst errors, but also for flexible data retrieval. This paper defines the Multi-Cluster Interleaving (MCI) problem, an interleaving problem for parallel data retrieval. The MCI problems on linear arrays and rings are studied. The following problem is completely solved: how to interleave integers on a linear array or ring such that any m (m ≥ 2) non-overlapping segments of length 2 in the array or ring have at least 3 distinct integers. We then present a scheme using a 'hierarchical-chain structure' to solve the following more general problem for linear arrays: how to interleave integers on a linear array such that any m (m ≥ 2) non-overlapping segments of length L (L ≥ 2) in the array have at least L + 1 distinct integers. It is shown that the scheme using the 'hierarchical-chain structure' solves the second interleaving problem for arrays that are asymptotically as long as the longest array on which an MCI exists, and clearly, for shorter arrays as well.https://authors.library.caltech.edu/records/0prc7-09k21Interleaving Schemes on Circulant Graphs with Two Offsets
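The MCI property defined above is easy to check by brute force on small arrays (a hypothetical checker for illustration, not the paper's construction):

```python
from itertools import combinations

def is_mci(arr, m, L):
    """Check the Multi-Cluster Interleaving property on a linear array:
    every choice of m non-overlapping length-L segments must contain
    at least L + 1 distinct integers."""
    starts = range(len(arr) - L + 1)
    for chosen in combinations(starts, m):
        # combinations() yields sorted starts; segments [a, a+L) and
        # [b, b+L) with a < b are disjoint iff b - a >= L
        if any(b - a < L for a, b in zip(chosen, chosen[1:])):
            continue
        symbols = set()
        for s in chosen:
            symbols.update(arr[s:s + L])
        if len(symbols) < L + 1:
            return False
    return True

assert is_mci([1, 2, 3, 1, 4, 2], m=2, L=2)      # any 2 disjoint pairs see >= 3 ints
assert not is_mci([1, 2, 1, 2, 1, 2], m=2, L=2)  # two disjoint (1,2) windows see only 2
```

The brute force is exponential in m, which is exactly why constructive schemes like the hierarchical-chain structure matter for long arrays.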
https://resolver.caltech.edu/CaltechPARADISE:2003.ETR054
Authors: Slivkins, Aleksandrs; Bruck, Jehoshua
Year: 2003
To be added.https://authors.library.caltech.edu/records/9814j-0a422Algorithmic Aspects of Cyclic Combinational Circuit Synthesis
https://resolver.caltech.edu/CaltechPARADISE:ETR053
Authors: Riedel, Marc D.; Bruck, Jehoshua
Year: 2003
Digital circuits are called combinational if they are memoryless: if they have outputs that depend only on the current values of the inputs. Combinational circuits are generally thought of as acyclic (i.e., feed-forward) structures. And yet, cyclic circuits can be combinational. Cycles sometimes occur in designs synthesized from high-level descriptions, as well as in bus-based designs [16]. Feedback in such cases is carefully contrived, typically occurring when functional units are connected in a cyclic topology. Although the premise of cycles in combinational circuits has been accepted, and analysis techniques have been proposed [7], no one has attempted the synthesis of circuits with feedback at the logic level.
We have argued the case for a paradigm shift in combinational circuit design [10]. We should no longer think of combinational logic as acyclic in theory or in practice, since most combinational circuits are best designed with cycles. We have proposed a general methodology for the synthesis of multilevel networks with cyclic topologies and incorporated it in a general logic synthesis environment. In trials, benchmark circuits were optimized significantly, with improvements of up to 30% in the area. In this paper, we discuss algorithmic aspects of cyclic circuit design. We formulate a symbolic framework for analysis based on a divide-and-conquer strategy. Unlike previous approaches, our method does not require ternary-valued simulation. Our analysis for combinationality is tightly coupled with the synthesis phase, in which we assemble a combinational network from smaller combinational components. We discuss the underpinnings of the heuristic search methods and present examples as well as synthesis results for benchmark circuits.
https://authors.library.caltech.edu/records/j22pr-2a704Algebraic techniques for constructing minimal weight threshold functions
https://resolver.caltech.edu/CaltechAUTHORS:BOHsiamjdm03
Authors: Bohossian, Vasken; Bruck, Jehoshua
Year: 2003
DOI: 10.1137/S0895480197326048
A linear threshold element computes a function that is a sign of a weighted sum of the input variables. The best known lower bounds on the size of threshold circuits are for depth-2 circuits with small (polynomial-size) weights. However, in general, the weights are arbitrary integers and can be of exponential size in the number of input variables. Namely, obtaining progress in lower bounds for threshold circuits seems to be related to understanding the role of large weights. In the present literature, a distinction is made between the two extreme cases of linear threshold functions with polynomial-size weights, as opposed to those with exponential-size weights. Our main contributions are in devising two novel methods for constructing threshold functions with minimal weights and filling up the gap between polynomial and exponential weight growth by further refining the separation. Namely, we prove that the class of linear threshold functions with polynomial-size weights can be divided into subclasses according to the degree of the polynomial. In fact, we prove a more general result — that there exists a minimal weight linear threshold function for any arbitrary number of inputs and any weight size.https://authors.library.caltech.edu/records/namf7-9qd10Optimal Content Placement for En-Route Web Caching
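A small sketch of the objects discussed above: a linear threshold element with unit weights computes majority, while the standard single-element construction for comparing two n-bit numbers uses exponentially growing weights 2^i, the kind of weight-size gap whose intermediate degrees the paper refines.

```python
def threshold(weights, x, t=0):
    """Linear threshold element: 1 if the weighted sum reaches t, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= t else 0

# Majority on n inputs: polynomial (in fact unit) weights suffice.
def majority(x):
    n = len(x)
    return threshold([1] * n, x, t=(n + 1) // 2)

# Comparison of two n-bit numbers, X >= Y: the standard single-element
# construction uses exponentially growing weights 2^i and -2^i.
def geq(xbits, ybits):  # most significant bit first
    n = len(xbits)
    w = [2 ** (n - 1 - i) for i in range(n)]
    return threshold(w + [-wi for wi in w], list(xbits) + list(ybits), t=0)

assert majority([1, 1, 0]) == 1
assert geq([1, 0, 1], [0, 1, 1]) == 1   # 5 >= 3
assert geq([0, 1, 0], [1, 0, 0]) == 0   # 2 >= 4 is false
```

The weighted sum in `geq` is exactly X - Y, so its sign decides the comparison with a single element.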
https://resolver.caltech.edu/CaltechPARADISE:ETR050
Authors: Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2003
DOI: 10.1109/NCA.2003.1201132
This paper studies the optimal placement of web files for en-route web caching. It is shown that existing placement policies are all solving restricted partial problems of the file placement problem, and therefore give only sub-optimal solutions. A dynamic programming algorithm of low complexity which computes the optimal solution is presented. It is shown both analytically and experimentally that the file-placement solution output by our algorithm outperforms existing en-route caching policies. The optimal placement of web files can be implemented with a reasonable level of cache coordination and management overhead for en-route caching; and importantly, it can be achieved with or without using data prefetching.https://authors.library.caltech.edu/records/t4bc1-d9a83Covering algorithms, continuum percolation and the geometry of wireless networks
https://resolver.caltech.edu/CaltechAUTHORS:BOOaoap03
Authors: Booth, Lorna; Bruck, Jehoshua; Franceschetti, Massimo; Meester, Ronald
Year: 2003
DOI: 10.1214/aoap/1050689601
Continuum percolation models in which each point of a two-dimensional Poisson point process is the centre of a disc of given (or random) radius r, have been extensively studied. In this paper, we consider the generalization in which a deterministic algorithm (given the points of the point process) places the discs on the plane, in such a way that each disc covers at least one point of the point process and that each point is covered by at least one disc. This gives a model for wireless communication networks, which was the original motivation to study this class of problems.
We look at the percolation properties of this generalized model, showing that an unbounded connected component of discs does not exist, almost surely, for small values of the density lambda of the Poisson point process, for any covering algorithm. In general, it turns out not to be true that unbounded connected components arise when lambda is taken sufficiently high. However, we identify some large families of covering algorithms, for which such an unbounded component does arise for large values of lambda.
We show how a simple scaling operation can change the percolation properties of the model, leading to the almost sure existence of an unbounded connected component for large values of lambda, for any covering algorithm.
Finally, we show that a large class of covering algorithms, which arise in many practical applications, can get arbitrarily close to achieving a minimal density of covering discs. We also construct an algorithm that achieves this minimal density.https://authors.library.caltech.edu/records/nw85n-cq682Coding and scheduling for efficient loss-resilient data broadcasting
https://resolver.caltech.edu/CaltechAUTHORS:20111005-113409987
Authors: Foltz, Kevin; Xu, Lihao; Bruck, Jehoshua
Year: 2003
DOI: 10.1109/ISIT.2003.1228430
We examine the problem of sending data to clients over a broadcast channel in a way that minimizes the clients' expected waiting time for this data. This channel, however, is not completely reliable, and packets are occasionally lost. If items consist of k packets, k large, the loss of even a single packet can increase the expected waiting time by 167%. We propose and analyze two solutions that use coding to reduce this degradation. The resulting degradation is 67% for the first solution and less than 1% for the second. The second solution is extended to combat up to t packet losses per data item for any t≪k. This solution maintains near-optimal performance even with packet losses.https://authors.library.caltech.edu/records/x2fmn-vkh31Ad hoc wireless networks with noisy links
https://resolver.caltech.edu/CaltechAUTHORS:20111005-091318608
Authors: Booth, Lorna; Bruck, Jehoshua; Cook, Matthew; Franceschetti, Massimo
Year: 2003
DOI: 10.1109/ISIT.2003.1228402
Models of ad-hoc wireless networks are often based on the geometric disc abstraction: transmission is assumed to be isotropic, and reliable communication channels are assumed to exist (apart from interference) between nodes closer than a given distance. In reality communication channels are unreliable and communication range is generally not rotationally symmetric. In this paper we examine how
these issues affect network connectivity.https://authors.library.caltech.edu/records/f2npx-a7660Bridging Paradigm Gaps Between Biology and Engineering
https://resolver.caltech.edu/CaltechAUTHORS:20111025-143359852
Authors: Bruck, Jehoshua
Year: 2003
DOI: 10.1109/CSB.2003.1227290
Computing and communications are well understood topics in engineering. However, we are very much at the beginning of the road to understanding those mechanisms in biological systems. I'll argue that progress in biology will require better understanding of biologically inspired paradigms for computing and communications. In particular, I'll discuss some initial results related to asynchronous circuits with feedback and to delay insensitive communications.https://authors.library.caltech.edu/records/7trtc-y3788Scheduling for Efficient Data Broadcast over Two Channels
https://resolver.caltech.edu/CaltechPARADISE:2004.ETR056
Authors: Foltz, Kevin; Xu, Lihao; Bruck, Jehoshua
Year: 2004
The broadcast disk provides a way to distribute data to many clients simultaneously. A central server fixes a set of data and a schedule for sending it, and then repeatedly sends the data according to the schedule. Clients listen for data until it is broadcast. We look at the problem of scheduling for two separate channels, where each can have a different broadcast schedule. Our metric for measuring schedule performance is expected delivery time (EDT), the expected value of the total elapsed time between when a client starts listening for data and when the client is completely finished receiving the data. We fix the first channel with a schedule that is optimal for an average case, and look at how to schedule for the second channel. We show two interesting results for sending two items over two channels. The first is that all schedules with equal portions of the two items in the second channel have the same EDT. The second is that for a situation that is symmetric in the two items the optimal schedule is asymmetric with respect to these items.https://authors.library.caltech.edu/records/788gb-dba94Optimal Interleaving on Tori
https://resolver.caltech.edu/CaltechPARADISE:2004.ETR059
Authors: Jiang, Anxiao (Andrew); Cook, Matthew; Bruck, Jehoshua
Year: 2004
We study t-interleaving on two-dimensional tori, which is defined by the property that any connected subgraph with t or fewer vertices in the torus is labelled by all distinct integers. It has applications in distributed data storage and burst error correction, and is closely related to Lee metric codes. We say that a torus can be perfectly t-interleaved if its t-interleaving number – the minimum number of distinct integers needed to t-interleave the torus – meets the sphere-packing lower bound. We prove the necessary and sufficient conditions for tori that can be perfectly t-interleaved, and present efficient perfect t-interleaving constructions. The most important contribution of this paper is to prove that the t-interleaving numbers of tori large enough in both dimensions, which constitute by far the majority of all existing cases, are at most one more than
the sphere-packing lower bound, and to present an optimal and efficient t-interleaving scheme for them. Then we prove some bounds on the t-interleaving numbers for other cases, completing a general picture for the t-interleaving problem on 2-dimensional tori.https://authors.library.caltech.edu/records/thsck-vr733A Geometric Theorem for Network Design
https://resolver.caltech.edu/CaltechAUTHORS:FRAieeetc04
Authors: Franceschetti, Massimo; Cook, Matthew; Bruck, Jehoshua
Year: 2004
DOI: 10.1109/TC.2004.1268406
Consider an infinite square grid G. How many discs of given radius r, centered at the vertices of G, are required, in the worst case, to completely cover an arbitrary disc of radius r placed on the plane? We show that this number is an integer in the set {3,4,5,6} whose value depends on the ratio of r to the grid spacing. One application of this result is to design facility location algorithms with constant approximation factors. Another application is to determine if a grid network design, where facilities are placed on a regular grid in a way that each potential customer is within a reasonably small radius around the facility, is cost effective in comparison to a nongrid design. This can be relevant to determine a cost effective design for base station placement in a wireless network.https://authors.library.caltech.edu/records/6ys3y-24856Optimal Universal Schedules for Discrete Broadcast
https://resolver.caltech.edu/CaltechPARADISE:2004.ETR057
Authors: Langberg, Michael; Sprintson, Alexander; Bruck, Jehoshua
Year: 2004
In this paper we study the scenario in which a server sends dynamic data over a single broadcast channel to
a number of passive clients. We consider the data to consist of discrete packets, where each update is sent in a
separate packet. On demand, each client listens to the channel in order to obtain the most recent data packet. Such
scenarios arise in many practical applications such as the distribution of weather and traffic updates to wireless
mobile devices and broadcasting stock price information over the Internet.
To satisfy a request, a client must listen to at least one packet from beginning to end. We thus consider the design
of a broadcast schedule which minimizes the time that passes between a client's request and the time that it hears a
new data packet, i.e., the waiting time of the client. Previous studies have addressed this objective, assuming that
client requests are distributed uniformly over time. However, in the general setting, the clients' behavior is difficult
to predict and might not be known to the server. In this work we consider the design of universal schedules that
guarantee a short waiting time for any possible client behavior. We define the model of dynamic broadcasting in
the universal setting, and prove various results regarding the waiting time achievable in this framework.https://authors.library.caltech.edu/records/37q5t-gxg91A random walk model of wave propagation
https://resolver.caltech.edu/CaltechAUTHORS:FRAieeetap04
Authors: Franceschetti, Massimo; Bruck, Jehoshua; Shulman, Leonard J.
Year: 2004
DOI: 10.1109/TAP.2004.827540
This paper shows that a reasonably accurate description of propagation loss in small urban cells can be obtained with a simple stochastic model based on the theory of random walks, that accounts for only two parameters: the amount of clutter and the amount of absorption in the environment. Despite the simplifications of the model, the derived analytical solution correctly describes the smooth transition of power attenuation from an inverse square law with the distance to the transmitter, to an exponential attenuation as this distance is increased - as it is observed in practice. Our analysis suggests using a simple exponential path loss formula as an alternative to the empirical formulas that are often used for prediction. Results are validated by comparison with experimental data collected in a small urban cell.https://authors.library.caltech.edu/records/ekacz-8vk35Timing Analysis of Cyclic Combinatorial Circuits
https://resolver.caltech.edu/CaltechPARADISE:2004.ETR060.1160
Authors: Riedel, Marc D.; Bruck, Jehoshua
Year: 2004
The accepted wisdom is that combinational circuits must have acyclic (i.e., loop-free or feed-forward) topologies. And yet simple examples suggest that this need not be so. In previous work, we advocated the design of cyclic combinational circuits (i.e., circuits with loops or feedback paths). We proposed a methodology for analyzing and synthesizing such circuits, with an emphasis on the optimization of area.
In this paper, we extend our methodology into the temporal realm. We characterize the true delay of cyclic circuits through symbolic event propagation in the floating mode of operation, according to the up-bounded inertial delay model. We present analysis results for circuits optimized with our program CYCLIFY. Some benchmark circuits were optimized significantly, with simultaneous improvements of up to 10% in the area and 25% in the delay.https://authors.library.caltech.edu/records/33zeb-29k56A Combinatorial Bound on the List Size
https://resolver.caltech.edu/CaltechPARADISE:2004.ETR058
Authors: Cassuto, Yuval; Bruck, Jehoshua
Year: 2004
To be added.https://authors.library.caltech.edu/records/0r8dz-xf638Scheduling for Efficient Data Broadcast over Two Channels
https://resolver.caltech.edu/CaltechAUTHORS:20110921-122835869
Authors: Foltz, Kevin; Xu, Lihao; Bruck, Jehoshua
Year: 2004
DOI: 10.1109/ISIT.2004.1365147
As wireless computer networks grow more popular, we are
faced with the problem of providing scalable,
high-bandwidth service to a growing number of users. In the wireless domain, "data push" promises to provide superior performance for many applications [1]. The broadcast domain that is typical of wireless communication is very effective in distributing information to large audiences. Work has been done to schedule data broadcast from a server to many clients using the broadcast disk model [3]. However, little of it has looked at methods for more than one channel. We examine a simple two-channel broadcast model and present some interesting scheduling results for this model.
https://authors.library.caltech.edu/records/61fev-rtp87
Regulatory modules that generate biphasic signal response in biological systems
https://resolver.caltech.edu/CaltechAUTHORS:20111014-095736081
Authors: Levchenko, A.; Bruck, J.; Sternberg, P. W.
Year: 2004
DOI: 10.1049/sb:20045014
Biochemical networks might be composed of modules. It is still not clear how biochemical modules can be defined and characterised. Here we propose a functional approach to
module definition, considering different classes of biphasic regulation modules, which effect optimal cell response to intermediate signal strength. Each regulation class might possess unique properties that make it especially suitable for particular biological functions.
https://authors.library.caltech.edu/records/v4mec-z2551
Optimal t-Interleaving on Tori
https://resolver.caltech.edu/CaltechAUTHORS:20110818-083929592
Authors: Jiang, Anxiao (Andrew); Cook, Matthew; Bruck, Jehoshua
Year: 2004
DOI: 10.1109/ISIT.2004.1365060
The number of integers needed to t-interleave
a 2-dimensional torus has a sphere-packing
lower bound. We present the necessary and sufficient
conditions for tori to meet that lower bound. We
prove that for tori sufficiently large in both dimensions,
their t-interleaving numbers exceed the lower
bound by at most 1. We then show upper bounds on
t-interleaving numbers for other cases, completing a
general picture for the problem of t-interleaving on
2-dimensional tori. Efficient t-interleaving algorithms
are also presented.
https://authors.library.caltech.edu/records/h1xvx-0qj37
Miscorrection probability beyond the minimum distance
https://resolver.caltech.edu/CaltechAUTHORS:CASisit04
Authors: Cassuto, Yuval; Bruck, Jehoshua
Year: 2004
DOI: 10.1109/ISIT.2004.1365561
The miscorrection probability of a list decoder is the probability that the decoder will have at least one non-causal codeword in its decoding sphere. Evaluating this probability is important when using a list-decoder as a conventional decoder since in that case we require the list to contain at most one codeword for most of the errors. A lower bound on the miscorrection is the main result. The key ingredient in the proof is a new combinatorial upper bound on the list-size for a general q-ary block code. This bound is tighter than the best known on large alphabets, and it is shown to be very close to the algebraic bound for Reed-Solomon codes. Finally we discuss two known upper bounds on the miscorrection probability and unify them for linear MDS codes.
https://authors.library.caltech.edu/records/rjbbz-vxg40
Network File Storage With Graceful Performance Degradation
https://resolver.caltech.edu/CaltechPARADISE:2004.ETR061
Authors: Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2004
A file storage scheme is proposed for networks containing heterogeneous clients. In the scheme, the
performance measured by file-retrieval delays degrades gracefully under increasingly serious faulty
circumstances. The scheme combines coding with storage for better performance. The problem
is NP-hard for general networks, so this paper focuses on tree networks with asymmetric edges
between adjacent nodes. A polynomial-time memory-allocation algorithm is presented, which
determines how much data to store on each node, with the objective of minimizing the total
amount of data stored in the network. Then a polynomial-time data-interleaving algorithm is used
to determine which data to store on each node for satisfying the quality-of-service requirements in
the scheme. By combining the memory-allocation algorithm with the data-interleaving algorithm,
an optimal solution to realize the file storage scheme in tree networks is established.
https://authors.library.caltech.edu/records/a9g9f-9px19
Optimal Schedules for Asynchronous Transmission of Discrete Packets
https://resolver.caltech.edu/CaltechPARADISE:2004.ETR062
Authors: Langberg, Michael; Sprintson, Alexander; Bruck, Jehoshua
Year: 2004
In this paper we study the distribution of dynamic data over a broadcast channel to a large number of
passive clients. Clients obtain the information by accessing the channel and listening for the next available
packet. This scenario, referred to as packet-based or discrete broadcast, has many practical applications such
as the distribution of weather and traffic updates to wireless mobile devices, reconfiguration and reprogramming
of wireless sensors and downloading dynamic task information in battlefield networks.
The optimal broadcast protocols require a high degree of synchronization between the server and the
wireless clients. However, in typical wireless settings such a degree of synchronization is difficult to achieve
due to the inaccuracy of internal clocks. Moreover, in some settings, such as military applications, synchronized
transmission is not desirable due to jamming. The lack of synchronization leads to large delays
and excessive power consumption. Accordingly, in this work we focus on the design of optimal broadcast
schedules that are robust to clock inaccuracy. We present universal schedules for delivery of up-to-date
information with minimum waiting time in asynchronous settings.
https://authors.library.caltech.edu/records/aw3am-f5v45
The Encoding Complexity of Network Coding
https://resolver.caltech.edu/CaltechPARADISE:2004.ETR063
Authors: Langberg, Michael; Sprintson, Alexander; Bruck, Jehoshua
Year: 2004
In the multicast network coding problem, a source s needs to deliver h packets to a set of k terminals over an underlying network G. The nodes of the coding network can be broadly categorized into two groups. The first group includes encoding nodes, i.e., nodes that generate new packets by combining data received from two or more incoming links. The second group includes forwarding nodes that can only duplicate and forward the incoming packets. Encoding nodes are, in general, more expensive due to the need to equip them with encoding capabilities. In addition, encoding nodes incur delay and increase the overall complexity of the network.
Accordingly, in this paper we study the design of multicast coding networks with a limited number of encoding nodes. We prove that in an acyclic coding network, the number of encoding nodes required to achieve the capacity of the network is bounded by h^3k^2. Namely, we present (efficiently constructible) network codes that achieve
capacity in which the total number of encoding nodes is independent of the size of the network and is bounded by h^3k^2. We show that the number of encoding nodes may depend both on h and k as we present acyclic instances of the multicast network coding problem in which Ω(h^2k) encoding nodes are needed.
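As a quick illustration (ours, not the paper's), the stated h^3k^2 bound is easy to evaluate for small instances:

```python
def acyclic_encoding_node_bound(h: int, k: int) -> int:
    """Evaluate the h^3 * k^2 upper bound on the number of encoding
    nodes in an acyclic multicast coding network, as stated in the
    abstract (this helper is purely illustrative)."""
    return h ** 3 * k ** 2

# For h = 2 packets and k = 3 terminals the bound is 8 * 9 = 72,
# independent of the size of the underlying network G.
print(acyclic_encoding_node_bound(2, 3))  # 72
```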
In the general case of coding networks with cycles, we show that the number of encoding nodes is limited by the size of the feedback link set, i.e., the minimum number of links that must be removed from the network in order to eliminate cycles. Specifically, we prove that the number of encoding nodes is bounded by (2B+1)h^3k^2, where B is the minimum size of the feedback link set. Finally, we observe that determining or even crudely approximating the minimum number of encoding nodes needed to achieve the capacity for a given instance of the network coding problem is NP-hard.
https://authors.library.caltech.edu/records/1bv0x-ym004
The encoding complexity of network coding
https://resolver.caltech.edu/CaltechAUTHORS:LANisit05b
Authors: Langberg, Michael; Sprintson, Alexander; Bruck, Jehoshua
Year: 2005
In the multicast network coding problem, a source s needs to deliver h packets to a set of k terminals over an underlying network G. The nodes of the coding network can be broadly categorized into two groups. The first group includes encoding nodes, i.e., nodes that generate new packets by combining data received from two or more incoming links. The second group includes forwarding nodes that can only duplicate and forward the incoming packets. Encoding nodes are, in general, more expensive due to the need to equip them with encoding capabilities. In addition, encoding nodes incur delay and increase the overall complexity of the network. Accordingly, in this paper we study the design of multicast coding networks with a limited number of encoding nodes. We prove that in an acyclic coding network, the number of encoding nodes required to achieve the capacity of the network is bounded by h^3k^2. Namely, we present (efficiently constructible) network codes that achieve capacity in which the total number of encoding nodes is independent of the size of the network and is bounded by h^3k^2. We show that the number of encoding nodes may depend both on h and k as we present acyclic instances of the multicast network coding problem in which Ω(h^2k) encoding nodes are needed. In the general case of coding networks with cycles, we show that the number of encoding nodes is limited by the size of the feedback link set, i.e., the minimum number of links that must be removed from the network in order to eliminate cycles. Specifically, we prove that the number of encoding nodes is bounded by (2B+1)h^3k^2, where B is the minimum size of the feedback link set. Finally, we observe that determining or even crudely approximating the minimum number of encoding nodes needed to achieve the capacity for a given instance of the network coding problem is NP-hard.
https://authors.library.caltech.edu/records/qa637-30s54
Staleness vs. waiting time in universal discrete broadcast
https://resolver.caltech.edu/CaltechAUTHORS:LANisit05a
Authors: Langberg, Michael; Sprintson, Alexander; Bruck, Jehoshua
Year: 2005
In this paper we study the distribution of dynamic data over a broadcast channel to a large number of passive clients. The data is simultaneously distributed to clients in the form of discrete packets; each packet captures the most recent state of the information source. Clients obtain the information by accessing the channel and listening for the next available packet. This scenario, referred to as discrete broadcast, has many practical applications such as the distribution of stock information to wireless mobile devices and downloading up-to-date battle information in military networks.
Our goal is to minimize the amount of time a client has to wait in order to obtain a new data packet, i.e., the waiting time of the client. We show that we can significantly reduce the waiting time by adding redundancy to the schedule. We identify universal schedules that guarantee low waiting time for any client, regardless of the access pattern.
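A toy model (our simplification: equal-length, back-to-back packets) shows how waiting time is measured here: a client arriving mid-packet must wait for the next packet boundary and then listen to one full packet.

```python
import math

def waiting_time(arrival: float, packet_len: float = 1.0) -> float:
    """Time until a client arriving at `arrival` has heard one complete
    packet, assuming equal-length packets transmitted back to back,
    starting at multiples of packet_len (illustrative model only)."""
    next_start = math.ceil(arrival / packet_len) * packet_len
    return (next_start - arrival) + packet_len

print(waiting_time(0.25))  # arrives mid-packet: waits 0.75 + 1.0 = 1.75
```

In this model the worst case approaches 2 × packet_len; it is this kind of baseline that redundant universal schedules are designed to improve on.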
A key point in the design of data distribution systems is to ensure that the transmitted information is always up-to-date. Accordingly, we introduce the notion of staleness that captures the amount of time that passes from the moment the information is generated until it is delivered to the client. We investigate the fundamental trade-off between the staleness and the waiting time. In particular, we present schedules that yield the lowest possible waiting time for any given staleness constraint.
https://authors.library.caltech.edu/records/km3eg-16a62
Optimal universal schedules for discrete broadcast
https://resolver.caltech.edu/CaltechAUTHORS:LANisit04
Authors: Langberg, Michael; Sprintson, Alexander; Bruck, Jehoshua
Year: 2005
DOI: 10.1109/ISIT.2004.1365148
This paper investigates efficient scheduling for sending dynamic data over lossless broadcast channels. A server periodically transmits dynamic data to a number of passive clients, with each updated item sent as a separate discrete packet. The objective of this paper is to design universal schedules that minimize the time that passes between a client's request and the broadcast of a new item, independently of the client's behavior. The results also yield optimal high-transmission-rate schedules for discrete broadcast data when adaptive clients are considered.
https://authors.library.caltech.edu/records/9x5h3-q5j37
Network Coding for Nonuniform Demands
https://resolver.caltech.edu/CaltechPARADISE:2005.ETR064
Authors: Cassuto, Yuval; Bruck, Jehoshua
Year: 2005
In this paper we define nonuniform-demand networks as a useful connection model, in between multicasts and general connections. In these networks, the source has a pool of messages, and each sink demands a certain number of messages, without specifying their identities. We study the solvability of such networks and give a tight bound on the number of sinks that achieve capacity in a worst-case network. We propose constructions to solve networks at, or slightly below capacity, and investigate the effect large alphabets have on the solvability of such networks. We also show that our efficient constructions are suboptimal when used in networks with more sinks, yet this comes with little surprise considering the fact that the general problem is shown to be NP-hard.
https://authors.library.caltech.edu/records/3t5b3-81t27
Monotone Percolation and The Topology Control of Wireless Networks
https://resolver.caltech.edu/CaltechPARADISE:2005.ETR065
Authors: Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2005
This paper addresses the topology control problem for large wireless networks that are modelled by an infinite point process on a two-dimensional plane. Topology control is the process of determining the edges in the network by adjusting
the transmission radii of the nodes. Topology control algorithms should be based on local decisions, be adaptive to changes, guarantee full connectivity and support efficient routing. We present a family of topology control algorithms that, respectively, achieve some or all of these requirements efficiently. The key idea in our algorithms is a concept that we call monotone percolation. In classical percolation theory, we are interested in the emergence of an infinitely large connected component. In contrast, in monotone percolation we are interested in the existence of a relatively short path that makes monotonic progress between any pair of source and destination nodes. Our key contribution is that we demonstrate how local decisions on the transmission radii can lead to monotone percolation and in turn to efficient topology control algorithms.
https://authors.library.caltech.edu/records/qagwh-p5v35
Multicluster interleaving on paths and cycles
https://resolver.caltech.edu/CaltechAUTHORS:JIAieeetit05
Authors: Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2005
DOI: 10.1109/TIT.2004.840893
Interleaving codewords is an important method not only for combatting burst errors, but also for distributed data retrieval. This paper introduces the concept of multicluster interleaving (MCI), a generalization of traditional interleaving problems. MCI problems for paths and cycles are studied. The following problem is solved: how to interleave integers on a path or cycle such that any m (m ≥ 2) nonoverlapping clusters of order 2 in the path or cycle have at least three distinct integers. We then present a scheme using a "hierarchical-chain structure" to solve the following more general problem for paths: how to interleave integers on a path such that any m (m ≥ 2) nonoverlapping clusters of order L (L ≥ 2) in the path have at least L+1 distinct integers. It is shown that the scheme solves the second interleaving problem for paths that are asymptotically as long as the longest path on which an MCI exists, and clearly, for shorter paths as well.
https://authors.library.caltech.edu/records/7b4k9-c3k29
Continuum Percolation with Unreliable and Spread-Out Connections
https://resolver.caltech.edu/CaltechAUTHORS:20191009-093219813
Authors: Franceschetti, Massimo; Booth, Lorna; Cook, Matthew; Meester, Ronald; Bruck, Jehoshua
Year: 2005
DOI: 10.1007/s10955-004-8826-0
We derive percolation results in the continuum plane that lead to what appears to be a general tendency of many stochastic network models. Namely, when the selection mechanism according to which nodes are connected to each other is sufficiently spread out, then a lower density of nodes, or on average fewer connections per node, are sufficient to obtain an unbounded connected component. We look at two different transformations that spread out connections and decrease the critical percolation density while preserving the average node degree. Our results indicate that real networks can exploit the presence of spread-out and unreliable connections to achieve connectivity more easily, provided they can maintain the average number of functioning connections per node.
https://authors.library.caltech.edu/records/nwdmc-pv183
An automated system for measuring parameters of nematode sinusoidal movement
https://resolver.caltech.edu/CaltechAUTHORS:CRObmcg05
Authors: Cronin, Christopher J.; Mendel, Jane E.; Mukhtar, Saleem; Kim, Young-Mee; Stirbl, Robert C.; Bruck, Jehoshua; Sternberg, Paul W.
Year: 2005
DOI: 10.1186/1471-2156-6-5
PMCID: PMC549551
Background: Nematode sinusoidal movement has been used as a phenotype in many studies of C. elegans development, behavior and physiology. A thorough understanding of the ways in which genes control these aspects of biology depends, in part, on the accuracy of phenotypic analysis. While worms that move poorly are relatively easy to describe, description of hyperactive movement and movement modulation presents more of a challenge. An enhanced capability to analyze all the complexities of nematode movement will thus help our understanding of how genes control behavior.
Results: We have developed a user-friendly system to analyze nematode movement in an automated and quantitative manner. In this system nematodes are automatically recognized and a computer-controlled microscope stage ensures that the nematode is kept within the camera field of view while video images from the camera are stored on videotape. In a second step, the images from the videotapes are processed to recognize the worm and to extract its changing position and posture over time. From this information, a variety of movement parameters are calculated. These parameters include the velocity of the worm's centroid, the velocity of the worm along its track, the extent and frequency of body bending, the amplitude and wavelength of the sinusoidal movement, and the propagation of the contraction wave along the body. The length of the worm is also determined and used to normalize the amplitude and wavelength measurements.
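One of the parameters described above, the velocity of the worm's centroid, can be sketched from sampled positions (a minimal illustration of the idea, not the published implementation):

```python
import math

def centroid_speed(track, dt):
    """Mean centroid speed from (x, y) positions sampled every dt
    seconds -- a simplified sketch of one movement parameter."""
    dist = sum(math.dist(a, b) for a, b in zip(track, track[1:]))
    return dist / (dt * (len(track) - 1))

# Three samples one second apart, each step of length 5 units:
print(centroid_speed([(0, 0), (3, 4), (6, 8)], dt=1.0))  # 5.0
```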
To demonstrate the utility of this system, we report here a comparison of movement parameters for a small set of mutants affecting the Go/Gq mediated signaling network that controls acetylcholine release at the neuromuscular junction. The system allows comparison of distinct genotypes that affect movement similarly (activation of Gq-alpha versus loss of Go-alpha function), as well as of different mutant alleles at a single locus (null and dominant negative alleles of the goa-1 gene, which encodes Go-alpha). We also demonstrate the use of this system for analyzing the effects of toxic agents. Concentration-response curves for the toxicants arsenite and aldicarb, both of which affect motility, were determined for wild-type and several mutant strains, identifying P-glycoprotein mutants as not significantly more sensitive to either compound, while cat-4 mutants are more sensitive to arsenite but not aldicarb.
Conclusions: Automated analysis of nematode movement facilitates a broad spectrum of experiments. Detailed genetic analysis of multiple alleles and of distinct genes in a regulatory network is now possible. These studies will facilitate quantitative modeling of C. elegans movement, as well as a comparison of gene function. Concentration-response curves will allow rigorous analysis of toxic agents as well as of pharmacological agents. This type of system thus represents a powerful analytical tool that can be readily coupled with the molecular genetics of nematodes.
https://authors.library.caltech.edu/records/mbs8e-rsm97
An automated system for measuring parameters of nematode sinusoidal movement
https://resolver.caltech.edu/CaltechPARADISE:2005.ETR066
Authors: Cronin, Christopher J.; Mendel, Jane E.; Mukhtar, Saleem; Kim, Young-Mee; Stirbl, Robert C.; Bruck, Jehoshua; Sternberg, Paul W.
Year: 2005
Background: Nematode sinusoidal movement has been used as a phenotype in many studies of C. elegans development, behavior and physiology. A thorough understanding of the ways in which genes control these aspects of biology depends, in part, on the accuracy of phenotypic analysis. While worms that move poorly are relatively easy to describe, description of hyperactive movement and movement modulation presents more of a challenge. An enhanced capability to analyze all the complexities of nematode movement will thus help our understanding of how genes control behavior.
Results: We have developed a user-friendly system to analyze nematode movement in an automated and quantitative manner. In this system nematodes are automatically recognized and a computer-controlled microscope stage ensures that the nematode is kept within the camera field of view while video images from the camera are stored on videotape. In a second step, the images from the videotapes are processed to recognize the worm and to extract its changing position and posture over time. From this information, a variety of movement parameters are calculated. These parameters include the velocity of the worm's centroid, the velocity of the worm along its track, the extent and frequency of body bending, the amplitude and wavelength of the sinusoidal movement, and the propagation of the contraction wave along the body. The length of the worm is also determined and used to normalize the amplitude and wavelength measurements.
To demonstrate the utility of this system, we report here a comparison of movement parameters for a small set of mutants
affecting the Go/Gq mediated signaling network that controls acetylcholine release at the neuromuscular junction. The system allows comparison of distinct genotypes that affect movement similarly (activation of Gq-alpha versus loss of Go-alpha function), as well as of different mutant alleles at a single locus (null and dominant negative alleles of the goa-1 gene, which encodes Go-alpha). We also demonstrate the use of this system for analyzing the effects of toxic agents. Concentration-response curves for the toxicants arsenite and aldicarb, both of which affect motility, were determined for wild-type and several mutant strains,
identifying P-glycoprotein mutants as not significantly more sensitive to either compound, while cat-4 mutants are more sensitive to arsenite but not aldicarb.
Conclusions: Automated analysis of nematode movement facilitates a broad spectrum of experiments. Detailed genetic analysis of multiple alleles and of distinct genes in a regulatory network is now possible. These studies will facilitate quantitative modeling of C. elegans movement, as well as a comparison of gene function. Concentration-response curves will allow rigorous analysis
of toxic agents as well as of pharmacological agents. This type of system thus represents a powerful analytical tool that can be readily coupled with the molecular genetics of nematodes.
https://authors.library.caltech.edu/records/mzz80-z1b49
Monotone Percolation and The Topology Control of Wireless Networks
https://resolver.caltech.edu/CaltechAUTHORS:20110818-114857462
Authors: Jiang, Anxiao; Bruck, Jehoshua
Year: 2005
DOI: 10.1109/INFCOM.2005.1497903
This paper addresses the topology control problem for large wireless networks that are modelled by an infinite point process on a two-dimensional plane. Topology control is the process of determining the edges in the network by adjusting the transmission radii of the nodes. Topology control algorithms should be based on local decisions, be adaptive to changes, guarantee full connectivity and support efficient routing. We present a family of topology control algorithms that, respectively, achieve some or all of these requirements efficiently. The key idea in our algorithms is a concept that we call monotone percolation. In classical percolation theory, we are interested in the emergence of an infinitely large connected component. In contrast, in monotone percolation we are interested in the existence of a relatively short path that makes monotonic progress between any pair of source and destination nodes. Our key contribution is that we demonstrate how local decisions on the transmission radii can lead to monotone percolation and in turn to efficient topology control algorithms.
https://authors.library.caltech.edu/records/3zha7-xnt45
Implementability Among Predicates
https://resolver.caltech.edu/CaltechPARADISE:2005.ETR067
Authors: Cook, Matthew; Bruck, Jehoshua
Year: 2005
Much work has been done to understand when given predicates (relations) on discrete variables can be conjoined to implement other predicates. Indeed, the lattice of "co-clones" (sets of predicates closed under conjunction, variable renaming, and existential quantification of variables) has been investigated steadily from the 1960s to the present. Here, we investigate a more general model, where duplicatability of values is not taken for granted. This model is motivated in part by large scale neural models, where duplicating a value is similar in cost to computing a function, and by quantum mechanics, where values cannot be duplicated. Implementations in this case are naturally given by a graph fragment in which vertices are predicates, internal edges are existentially quantified variables, and "dangling edges" (edges emanating from a vertex but not yet connected to another vertex) are the free variables of the implemented predicate. We examine questions of implementability among predicates in this scenario, and
we present the solution to all implementability problems for single predicates on up to three boolean values. However, we find that a variety of proof methods are required, and the question of implementability indeed becomes undecidable for larger predicates, although this is tricky to prove. We find that most predicates cannot implement the 3-way equality predicate, which reaffirms the view that duplicatability of values should not be assumed a priori.
https://authors.library.caltech.edu/records/dg474-54385
Network file storage with graceful performance degradation
https://resolver.caltech.edu/CaltechAUTHORS:20161107-163620734
Authors: Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2005
DOI: 10.1145/1063786.1063788
A file storage scheme is proposed for networks containing heterogeneous clients. In the scheme, the performance measured by file-retrieval delays degrades gracefully under increasingly serious faulty circumstances. The scheme combines coding with storage for better performance. The problem is NP-hard for general networks, so this article focuses on tree networks with asymmetric edges between adjacent nodes. A polynomial-time memory-allocation algorithm is presented, which determines how much data to store on each node, with the objective of minimizing the total amount of data stored in the network. Then a polynomial-time data-interleaving algorithm is used to determine which data to store on each node for satisfying the quality-of-service requirements in the scheme. By combining the memory-allocation algorithm with the data-interleaving algorithm, an optimal solution to realize the file storage scheme in tree networks is established.
https://authors.library.caltech.edu/records/yq3za-eej92
Localization and routing in sensor networks by local angle information
https://resolver.caltech.edu/CaltechAUTHORS:20160811-163730860
Authors: Bruck, Jehoshua; Gao, Jie; Jiang, Anxiao (Andrew)
Year: 2005
DOI: 10.1145/1062689.1062713
Location information is very useful in the design of sensor network infrastructures. In this paper, we study the anchor-free 2D localization problem by using local angle measurements in a sensor network. We prove that given a unit disk graph and the angles between adjacent edges, it is NP-hard to find a valid embedding in the plane such that neighboring nodes are within distance 1 from
each other and non-neighboring nodes are at least distance 1 away. Despite the negative results, however, one can find a planar spanner of a unit disk graph by using only local angles. The planar spanner can be used to generate a set of virtual coordinates that enable efficient and local routing schemes such as geographical routing or approximate shortest path routing. We also propose a practical anchor-free embedding scheme by solving a linear program.
We show by simulation that not only does it give very good local embedding, i.e., neighboring nodes are close and non-neighboring nodes are far away, but it also gives a quite accurate global view such that geographical routing and approximate shortest path routing on the embedded graph are almost identical to those on the original
(true) embedding. The embedding algorithm can be adapted to
other models of wireless sensor networks and is robust to measurement noise.
https://authors.library.caltech.edu/records/gbnzz-4by52
Localization and Routing in Sensor Networks by Local Angle Information
https://resolver.caltech.edu/CaltechPARADISE:2005.ETR068
Authors: Bruck, Jehoshua; Gao, Jie; Jiang, Anxiao (Andrew)
Year: 2005
DOI: 10.1145/1062689.1062713
Location information is very useful in the design of sensor network infrastructures. In this paper, we study the anchor-free 2D localization problem by using local angle measurements in a sensor network. We prove that given a unit disk graph and the angles between adjacent edges, it is NP-hard to find a valid embedding in the plane such that neighboring nodes are within distance 1 from each other and non-neighboring nodes are at least distance 1 away. Despite the negative results, however, one can find a planar spanner
of a unit disk graph by using only local angles. The planar spanner can be used to generate a set of virtual coordinates that enable efficient and local routing schemes such as geographical routing or approximate shortest path routing. We also propose a practical anchor-free embedding scheme by solving a linear program. We show by simulation that not only does it give very good local embedding, i.e., neighboring nodes are close and non-neighboring nodes are far away, but it also gives a quite accurate global view
such that geographical routing and approximate shortest path routing on the embedded graph are almost identical to those on the original (true) embedding. The embedding algorithm can be adapted to other models of wireless sensor networks and is robust to measurement noise.
https://authors.library.caltech.edu/records/8ybfg-bxc46
MAP: Medial Axis Based Geometric Routing in Sensor Networks
https://resolver.caltech.edu/CaltechPARADISE:2005.ETR069
Authors: Bruck, Jehoshua; Gao, Jie; Jiang, Anxiao (Andrew)
Year: 2005
DOI: 10.1145/1080829.1080839
One of the challenging tasks in the deployment of dense wireless networks (like sensor networks) is devising a routing scheme for node to node communication. Important considerations include scalability, routing complexity, the length of the communication paths and the load sharing of the routes. In this paper, we show that a compact and expressive abstraction of network connectivity by the medial axis enables efficient and localized routing. We propose MAP, a Medial Axis based naming and routing Protocol that does not require locations, makes routing decisions locally, and achieves good load balancing. In its preprocessing phase, MAP constructs the medial axis of the sensor field, defined as the set of nodes with at least two closest boundary nodes. The medial axis of the network captures both the complex geometry and non-trivial topology of the sensor field. It can be represented compactly by a graph whose size is comparable with the complexity of the geometric features (e.g., the number of holes). Each node is then given a name related to its position with respect to the medial axis. The routing scheme is derived through local decisions based on the names of the source and destination nodes and guarantees delivery with reasonable and natural routes. We show by both theoretical analysis and simulations that our medial axis based geometric routing scheme is scalable, produces short routes, achieves excellent load balancing, and is very robust to variations in the network model.
https://authors.library.caltech.edu/records/1cvf7-kcc88
Anti-Jamming Schedules for Wireless Broadcast Systems
https://resolver.caltech.edu/CaltechPARADISE:2005.ETR070
Authors: Codenotti, Paolo; Sprintson, Alexander; Bruck, Jehoshua
Year: 2005
Modern society is heavily dependent on wireless networks for providing voice and data communications. Wireless data broadcast has recently emerged as an attractive way to disseminate data to a large number of clients. In data broadcast systems, the server proactively transmits the information on a downlink channel; the clients access the data by listening to the channel. Wireless data broadcast systems can serve a large number of heterogeneous clients, minimizing power consumption as well as protecting the privacy of the clients' locations.
The availability and relatively low cost of antennas have resulted in a number of potential threats to the integrity of the wireless infrastructure. The existing solutions and schedules for wireless data broadcast are vulnerable to jamming, i.e., the use of active signals to prevent data distribution. The goal of jammers is to disrupt the normal operation of the broadcast system, which results in high waiting time and excessive power consumption for the clients.
In this paper we investigate efficient schedules for wireless data broadcast that perform well in the presence of a jammer. We show that the waiting time of a client can be efficiently reduced by adding redundancy to the schedule. The main challenge in the design of redundant broadcast schedules is to ensure that the transmitted information is always up-to-date. Accordingly, we present schedules that guarantee low waiting time and low staleness of data in the presence of a jammer. We prove that our schedules are optimal if the jamming signal has certain energy limitations.https://authors.library.caltech.edu/records/7wa2a-91w72Network coding for non-uniform demands
https://resolver.caltech.edu/CaltechAUTHORS:CASisit05
Authors: Cassuto, Yuval; Bruck, Jehoshua
Year: 2005
DOI: 10.1109/ISIT.2005.1523639
Non-uniform demand networks are defined as a useful connection model, in between multicasts and general connections. In these networks, each sink demands a certain number of messages, without specifying their identities. We study the solvability of such networks and give a tight bound on the number of sinks for which the min cut condition is sufficient. This sufficiency result is unique to the non-uniform demand model and does not apply to general connection networks. We propose constructions to solve networks at, or slightly below, capacity, and investigate the effect large alphabets have on the solvability of such networks. We also show that our efficient constructions are suboptimal when used in networks with more sinks, yet this is unsurprising given that the general problem is shown to be NP-hard.https://authors.library.caltech.edu/records/26cx3-qxz51MAP: medial axis based geometric routing in sensor networks
https://resolver.caltech.edu/CaltechAUTHORS:20160811-164254714
Authors: Bruck, Jehoshua; Gao, Jie; Jiang, Anxiao (Andrew)
Year: 2005
DOI: 10.1145/1080829.1080839
One of the challenging tasks in the deployment of dense wireless networks (like sensor networks) is in devising a routing scheme for node-to-node communication. Important considerations include scalability, routing complexity, the length of the communication paths and the load sharing of the routes. In this paper, we show that a compact and expressive abstraction of network connectivity by the medial axis enables efficient and localized routing. We propose MAP, a Medial Axis based naming and routing Protocol that does not require locations, makes routing decisions locally, and achieves good load balancing. In its preprocessing phase, MAP constructs the medial axis of the sensor field, defined as the set of nodes with at least two closest boundary nodes. The medial axis of the network captures both the complex geometry and non-trivial topology of the sensor field. It can be represented compactly by a graph whose size is comparable with the complexity of the geometric features (e.g., the number of holes). Each node is then given a name related to its position with respect to the medial axis. The routing scheme is derived through local decisions based on the names of the source and destination nodes and guarantees delivery with reasonable and natural routes. We show by both theoretical analysis and simulations that our medial axis based geometric routing scheme is scalable, produces short routes, achieves excellent load balancing, and is very robust to variations in the network model.https://authors.library.caltech.edu/records/2m2sg-avn58Networks of Relations for Representation, Learning, and Generalization
https://resolver.caltech.edu/CaltechPARADISE:2005.ETR071
Authors: Cook, Matthew; Bruck, Jehoshua
Year: 2005
We propose representing knowledge as a network of relations. Each relation relates only a few continuous or discrete variables, so that any overall relationship among the many variables treated by the network winds up being distributed throughout the network. Each relation encodes which combinations of values correspond to past experience for the variables related by the relation. Variables may or may not correspond to understandable aspects of the situation being modeled by the network. A distributed calculational process can be used to access the information stored in such a network, allowing the network to function as an associative memory. This process in its simplest form is purely inhibitory, narrowing down the space of possibilities as much as possible given the data to be matched. In contrast with methods that always retrieve a best fit for all variables, this method can return values for inferred variables while leaving non-inferable variables in an unknown or partially known state. In contrast with belief propagation methods, this method can be proven to converge quickly and uniformly for any network topology, allowing networks to be as interconnected as the relationships warrant, with no independence assumptions required. The generalization properties of such a memory are aligned with the network's relational representation of how the various aspects of the modeled situation are related.https://authors.library.caltech.edu/records/qmrfh-vb932Network Coding: A Computational Perspective
https://resolver.caltech.edu/CaltechAUTHORS:20110630-145653337
Authors: Langberg, Michael; Sprintson, Alexander; Bruck, Jehoshua
Year: 2006
DOI: 10.1109/CISS.2006.286590
In this work, we study the computational perspective of network coding, focusing on two issues. First, we address the computational complexity of finding a network code for acyclic multicast networks. Second, we address the issue of reducing the amount of computation performed by the network nodes. In particular, we consider the problem of finding a network code with the minimum possible number of encoding nodes, i.e., nodes that generate new packets by combining the packets received over incoming links. We present a deterministic algorithm that finds a feasible network code for a multicast network over an underlying graph G(V, E) in time O(|E|kh+|V|k^2h^2+h^4k^3(k+h)), where k is the number of destinations and h is the number of packets. This improves the best known running time of O(|E|kh+|V|k^2h^2(k+h)) of Jaggi et al. (2005) in the typical case of large communication graphs. In addition, our algorithm guarantees that the number of encoding nodes in the obtained network code is bounded by O(h^3k^2). Next, we address the problem of finding a network code with the minimum number of encoding nodes in both integer and fractional coding networks. We prove that in the majority of settings this problem is NP-hard. However, we show that if h=O(1) and k=O(1) and the underlying communication graph is acyclic, then there exists an algorithm that solves this problem in polynomial time.https://authors.library.caltech.edu/records/4nwht-fpn50Adaptive Bloom filter
https://resolver.caltech.edu/CaltechPARADISE:2006.ETR072
Authors: Bruck, Jehoshua; Gao, Jie; Jiang, Anxiao (Andrew)
Year: 2006
A Bloom filter is a simple randomized data structure that answers membership queries with no false negatives and a small false-positive probability. It is an elegant data compression technique for membership information, and has broad applications. In this paper, we generalize the traditional Bloom filter to the Adaptive Bloom Filter, which incorporates the information on the query frequencies and the membership likelihood of the elements into its optimal design. It has been widely observed that in many applications, some popular elements are queried much more often than the others. The traditional Bloom filter for data sets with irregular query patterns and non-uniform membership likelihood can be further optimized. We derive the optimal configuration of the Bloom filter with query-frequency and membership-likelihood information, and show that the adapted Bloom filter always outperforms the traditional Bloom filter. Under reasonable frequency models such as the step distribution or Zipf's distribution, the improvement of the false-positive probability of the adaptive Bloom filter over that of the traditional Bloom filter is usually orders of magnitude.https://authors.library.caltech.edu/records/cvg8d-d9d25On the Capacity of Precision-Resolution Constrained Systems
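The adaptive design above tunes the filter using per-element query frequencies; as background, here is a minimal sketch of the classic uniform Bloom filter it generalizes, assuming SHA-256-based hashing and the standard choice of k ≈ (m/n)·ln 2 hash functions (all names are illustrative, not from the paper):

```python
import hashlib
import math

class BloomFilter:
    """Classic Bloom filter: no false negatives, small false-positive rate."""
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _hashes(self, item):
        # Derive k independent-looking hash positions from SHA-256.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for h in self._hashes(item):
            self.bits[h] = 1

    def query(self, item):
        return all(self.bits[h] for h in self._hashes(item))

n, m = 100, 1000                      # 100 members, 1000 bits
k = max(1, round(m / n * math.log(2)))  # optimal k for uniform queries
bf = BloomFilter(m, k)
members = [f"item-{i}" for i in range(n)]
for x in members:
    bf.add(x)

# No false negatives, by construction:
print(all(bf.query(x) for x in members))
# Empirical false-positive rate on non-members, near the
# theoretical (1 - e^(-kn/m))^k, which is below 1% here:
fp = sum(bf.query(f"other-{i}") for i in range(10000)) / 10000
print(fp < 0.05)
```

The adaptive variant in the paper replaces the single global k with per-element parameters chosen from the query-frequency and membership-likelihood models.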
https://resolver.caltech.edu/CaltechPARADISE:2006.ETR073
Authors: Schwartz, Moshe; Bruck, Jehoshua
Year: 2006
Arguably, the most famous constrained system is the (d, k)-RLL (Run-Length Limited), in which a stream of bits obeys the constraint that every two 1's are separated by at least d 0's, and there are no more than k consecutive 0's anywhere in the stream. The motivation for this scheme comes from the fact that certain sensor characteristics restrict the minimum time between adjacent 1's or else the two will be merged in the receiver, while a clock drift between transmitter and receiver may cause spurious 0's or missing 0's at the receiver if too many appear consecutively.
The interval-modulation scheme introduced by Mukhtar and Bruck extends the RLL constraint and implicitly suggests a way of taking advantage of higher-precision clocks. Their work, however, deals only with an encoder/decoder construction.
In this work we introduce a more general framework which we call the precision-resolution (PR) constrained system. In PR systems, the encoder has precision constraints, while the decoder has resolution constraints. We examine the capacity of PR systems and show the gain in the presence of a high-precision encoder (thus, we place the PR system with integral encoder, (p=1,alpha,theta)-PR, which turns out to be a simple extension of RLL, and the PR system with infinite-precision encoder, (infinity,alpha,theta)-PR, on two ends of a continuum). We derive an exact expression for their capacity in terms of the precision p, the minimal resolvable measurement at the decoder alpha, and the decoder resolution factor theta. In an analogy to the RLL terminology these are the clock precision, the minimal time between peaks, and the clock drift. Surprisingly, even with an infinite-precision encoder, the capacity is finite.https://authors.library.caltech.edu/records/y8vnv-426982020 Computing: Can computers help to explain biology?
https://resolver.caltech.edu/CaltechAUTHORS:20150319-090453634
Authors: Brent, Roger; Bruck, Jehoshua
Year: 2006
DOI: 10.1038/440416a
The road leading from computer formalisms to explaining biological function will be difficult, but Roger Brent and Jehoshua Bruck suggest three hopeful paths that could take us closer to this goal.https://authors.library.caltech.edu/records/fv21j-z6y85Network Coding: A Computational Perspective
https://resolver.caltech.edu/CaltechPARADISE:2006.ETR074
Authors: Langberg, Michael; Sprintson, Alexander; Bruck, Jehoshua
Year: 2006
In this work, we study the computational perspective of network coding, focusing on two issues. First, we address the computational complexity of finding a network code for acyclic multicast networks. Second, we address the issue of reducing the amount of computation performed by network nodes. In particular, we consider the problem of finding a network code with the minimum possible number of encoding nodes, i.e., nodes that generate new packets by combining the packets received over incoming links.
We present a deterministic algorithm that finds a feasible network code for a multicast network over an underlying graph G(V,E) in time O(|E|kh + |V|k^2h^2 + h^4k^3(k + h)), where k is the number of destinations and h is the number of packets. Our result improves the best known running time of O(|E|kh + |V|k^2h^2(k + h)) of the algorithm due to Jaggi et al. [1] in the typical case of large communication graphs. In addition, our algorithm guarantees that the number of encoding nodes in the obtained network code is bounded by O(h^3k^2).
Next, we address the problem of finding a network code with the minimum number of encoding nodes in both integer and fractional coding networks. We prove that in the majority of settings this problem is NP-hard. However, we show that if h = O(1), k = O(1), and the underlying communication graph is acyclic, then there exists an algorithm that solves this problem in polynomial time.https://authors.library.caltech.edu/records/aykkf-taw59Shortening Array Codes and the Perfect 1-Factorization Conjecture
https://resolver.caltech.edu/CaltechPARADISE:2006.ETR075
Authors: Bohossian, Vasken; Bruck, Jehoshua
Year: 2006
The existence of a perfect 1-factorization of the complete graph K_n, for arbitrary n, is a 40-year-old open problem in graph theory. Two infinite families of perfect 1-factorizations are known for K_(2p) and K_(p+1), where p is a prime. It was shown in [8] that finding a perfect 1-factorization of K_n can be reduced to a problem in coding, i.e. to constructing an MDS, lowest density array code of length n. In this paper, a new method for shortening arbitrary array codes is introduced. It is then used to derive the K_(p+1) family of perfect 1-factorizations from the K_(2p) family, by applying the reduction mentioned above. Namely, techniques from coding theory are used to prove a new result in graph theory.https://authors.library.caltech.edu/records/a320x-97y71Cyclic Low-Density MDS Array Codes
https://resolver.caltech.edu/CaltechPARADISE:2006.ETR076
Authors: Cassuto, Yuval; Bruck, Jehoshua
Year: 2006
We construct two infinite families of low density MDS array codes which are also cyclic. One of these families includes the first such sub-family with redundancy parameter r > 2. The two constructions have different algebraic formulations, though they both have the same indirect structure. First MDS codes that are not cyclic are constructed and then by applying a certain mapping to their parity check matrices, non-equivalent cyclic codes with the same distance and density properties are obtained. Using the same proof techniques, a third infinite family of quasi-cyclic codes can be constructed.https://authors.library.caltech.edu/records/getc1-s1714The encoding complexity of network coding
https://resolver.caltech.edu/CaltechAUTHORS:LANieeetit06
Authors: Langberg, Michael; Sprintson, Alexander; Bruck, Jehoshua
Year: 2006
DOI: 10.1109/TIT.2006.874434
In the multicast network coding problem, a source s needs to deliver h packets to a set of k terminals over an underlying communication network G. The nodes of the multicast network can be broadly categorized into two groups. The first group includes encoding nodes, i.e., nodes that generate new packets by combining data received from two or more incoming links. The second group includes forwarding nodes that can only duplicate and forward the incoming packets. Encoding nodes are, in general, more expensive due to the need to equip them with encoding capabilities. In addition, encoding nodes incur delay and increase the overall complexity of the network. Accordingly, in this paper, we study the design of multicast coding networks with a limited number of encoding nodes. We prove that in a directed acyclic coding network, the number of encoding nodes required to achieve the capacity of the network is bounded by h^3k^2. Namely, we present (efficiently constructible) network codes that achieve capacity in which the total number of encoding nodes is independent of the size of the network and is bounded by h^3k^2. We show that the number of encoding nodes may depend both on h and k by presenting acyclic coding networks that require Ω(h^2k) encoding nodes. In the general case of coding networks with cycles, we show that the number of encoding nodes is limited by the size of the minimum feedback link set, i.e., the minimum number of links that must be removed from the network in order to eliminate cycles. We prove that the number of encoding nodes is bounded by (2B+1)h^3k^2, where B is the minimum size of a feedback link set. Finally, we observe that determining or even crudely approximating the minimum number of required encoding nodes is an NP-hard problem.https://authors.library.caltech.edu/records/kan0p-0j970Shortening Array Codes and the Perfect 1-Factorization Conjecture
https://resolver.caltech.edu/CaltechAUTHORS:20170516-150511939
Authors: Bohossian, Vasken; Bruck, Jehoshua
Year: 2006
DOI: 10.1109/ISIT.2006.261572
The existence of a perfect 1-factorization of the complete graph K_n, for arbitrary n, is a 40-year-old open problem in graph theory. Two infinite families of perfect 1-factorizations are known for K_(2p) and K_(p+1), where p is a prime. It was shown in L. Xu et al. (1999) that finding a perfect 1-factorization of K_n can be reduced to a problem in coding, i.e. to constructing an MDS, lowest density array code of length n. In this paper, a new method for shortening arbitrary array codes is introduced. It is then used to derive the K_(p+1) family of perfect 1-factorizations from the K_(2p) family, by applying the reduction mentioned above. Namely, techniques from coding theory are used to prove a new result in graph theory.https://authors.library.caltech.edu/records/xqx6j-7xc33On the Capacity of Precision-Resolution Constrained Systems
https://resolver.caltech.edu/CaltechAUTHORS:20170509-172831834
Authors: Schwartz, Moshe; Bruck, Jehoshua
Year: 2006
DOI: 10.1109/ISIT.2006.262110
Arguably, the most famous constrained system is the (d, k)-RLL (run-length limited), in which a stream of bits obeys the constraint that every two 1's are separated by at least d 0's, and there are no more than k consecutive 0's anywhere in the stream. The motivation for this scheme comes from the fact that certain sensor characteristics restrict the minimum time between adjacent 1's or else the two will be merged in the receiver, while a clock drift between transmitter and receiver may cause spurious 0's or missing 0's at the receiver if too many appear consecutively.
The interval-modulation scheme introduced by Mukhtar and Bruck extends the RLL constraint and implicitly suggests a way of taking advantage of higher-precision clocks. Their work, however, deals only with an encoder/decoder construction.
In this work we introduce a more general framework which we call the precision-resolution (PR) constrained system. In PR systems, the encoder has precision constraints, while the decoder has resolution constraints. We examine the capacity of PR systems and show the gain in the presence of a high-precision encoder (thus, we place the PR system with integral encoder, (p=1, ɑ, θ)-PR, which turns out to be a simple extension of RLL, and the PR system with infinite-precision encoder, (∞, ɑ, θ)-PR, on two ends of a continuum). We derive an exact expression for their capacity in terms of the precision p, the minimal resolvable measurement at the decoder ɑ, and the decoder resolution factor θ. In an analogy to the RLL terminology, these are the clock precision, the minimal time between peaks, and the clock drift. Surprisingly, even with an infinite-precision encoder, the capacity is finite.https://authors.library.caltech.edu/records/v2vf1-7kn44Cyclic Low-Density MDS Array Codes
https://resolver.caltech.edu/CaltechAUTHORS:20170516-163950291
Authors: Cassuto, Yuval; Bruck, Jehoshua
Year: 2006
DOI: 10.1109/ISIT.2006.261571
We construct two infinite families of low density MDS array codes which are also cyclic. One of these families includes the first such sub-family with redundancy parameter r > 2. The two constructions have different algebraic formulations, though they both have the same indirect structure. First MDS codes that are not cyclic are constructed and then by applying a certain mapping to their parity check matrices, non-equivalent cyclic codes with the same distance and density properties are obtained. Using the same proof techniques, a third infinite family of quasi-cyclic codes can be constructed.https://authors.library.caltech.edu/records/d2y00-smf23Anti-Jamming Schedules for Wireless Data Broadcast Systems
https://resolver.caltech.edu/CaltechAUTHORS:20170510-171516844
Authors: Codenotti, Paolo; Sprintson, Alexander; Bruck, Jehoshua
Year: 2006
DOI: 10.1109/ISIT.2006.261756
Modern society is heavily dependent on wireless networks for providing voice and data communications. Wireless data broadcast has recently emerged as an attractive way to disseminate dynamic data to a large number of clients. In data broadcast systems, the server proactively transmits the information on a downlink channel; the clients access the data by listening to the channel. Wireless data broadcast systems can serve a large number of heterogeneous clients, minimizing power consumption as well as protecting the privacy of the clients' locations. The availability and relatively low cost of antennas have resulted in a number of potential threats to the integrity of the wireless infrastructure. In particular, the data broadcast systems are vulnerable to jamming, i.e., the use of active signals to prevent data broadcast. The goal of jammers is to cause disruption, resulting in long waiting times and excessive power consumption. In this paper we investigate efficient schedules for wireless data broadcast that perform well in the presence of a jammer. We show that the waiting time of a client can be reduced by adding redundancy to the schedule and establish upper and lower bounds on the achievable minimum waiting time under different requirements on the staleness of the transmitted data.https://authors.library.caltech.edu/records/ggkc8-7gh46Optimal Interleaving on Tori
https://resolver.caltech.edu/CaltechAUTHORS:JIAsiamjdm06
Authors: Jiang, Anxiao (Andrew); Cook, Matthew; Bruck, Jehoshua
Year: 2006
DOI: 10.1137/040618655
This paper studies $t$-interleaving on two-dimensional tori. Interleaving has applications in distributed data storage and burst error correction, and is closely related to Lee metric codes. A $t$-interleaving of a graph is defined as a vertex coloring in which any connected subgraph of $t$ or fewer vertices has a distinct color at every vertex. We say that a torus can be perfectly t-interleaved if its t-interleaving number (the minimum number of colors needed for a t-interleaving) meets the sphere-packing lower bound, $\lceil t^2/2 \rceil$. We show that a torus is perfectly t-interleavable if and only if its dimensions are both multiples of $\frac{t^2+1}{2}$ (if t is odd) or t (if t is even). The next natural question is how much bigger the t-interleaving number is for those tori that are not perfectly t-interleavable, and the most important contribution of this paper is to find an optimal interleaving for all sufficiently large tori, proving that when a torus is large enough in both dimensions, its t-interleaving number is at most just one more than the sphere-packing lower bound. We also obtain bounds on t-interleaving numbers for the cases where one or both dimensions are not large, thus completing a general characterization of t-interleaving numbers for two-dimensional tori. Each of our upper bounds is accompanied by an efficient t-interleaving scheme that constructively achieves the bound.https://authors.library.caltech.edu/records/m54rs-art16Exact Stochastic Simulation of Chemical Reactions with Cycle Leaping
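A t-interleaving as defined above is equivalent to requiring that any two vertices with the same color lie at Lee distance at least t, which makes candidate colorings easy to check. A small sketch (helper names are illustrative) verifies that a checkerboard coloring perfectly 2-interleaves a 4×4 torus, meeting the sphere-packing bound ⌈t²/2⌉ = 2:

```python
def lee_dist(a, b, dims):
    """Lee (wrap-around L1) distance between two cells of a torus."""
    return sum(min((x - y) % n, (y - x) % n) for x, y, n in zip(a, b, dims))

def is_t_interleaving(color, dims, t):
    """A coloring is a t-interleaving iff same-colored cells are at Lee distance >= t."""
    cells = [(i, j) for i in range(dims[0]) for j in range(dims[1])]
    return all(lee_dist(a, b, dims) >= t
               for a in cells for b in cells
               if a != b and color(a) == color(b))

dims, t = (4, 4), 2
checkerboard = lambda c: (c[0] + c[1]) % 2   # 2 colors = the sphere-packing bound

print(is_t_interleaving(checkerboard, dims, t))   # perfect 2-interleaving
print(is_t_interleaving(lambda c: 0, dims, t))    # constant coloring fails
```

For t = 2 the theorem quoted above requires both dimensions to be multiples of t, which the 4×4 torus satisfies; larger t requires the more intricate constructions of the paper.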
https://resolver.caltech.edu/CaltechPARADISE:2006.ETR077
Authors: Riedel, Marc D.; Bruck, Jehoshua
Year: 2006
The stochastic simulation algorithm (SSA), first proposed by Gillespie, has become the workhorse of computational biology. It tracks integer quantities of the molecular species, executing reactions at random based on propensity calculations. An estimate for the resulting quantities of the different species is obtained by averaging the results of repeated trials. Unfortunately, for models with many reaction channels and many species, the algorithm requires a prohibitive amount of computation time. Many trials must be performed, each forming a lengthy trajectory through the state space. With coupled or reversible reactions, the simulation often loops through the same sequence of states repeatedly, consuming computing time, but making no forward progress.
We propose an algorithm that reduces the simulation time through cycle leaping: when cycles are encountered, the exit probabilities are calculated. Then, in a single bound, the simulation leaps directly to one of the exit states. The technique is exact, sampling the state space with the expected probability distribution. It is a component of a general framework that we have developed for stochastic simulation based on probabilistic analysis and caching.https://authors.library.caltech.edu/records/tzg9b-d6z05Increasing the Information Density of Storage Systems Using the Precision-Resolution Paradigm
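For reference, the baseline that cycle leaping accelerates is Gillespie's direct-method SSA: fire one reaction at a time, chosen with probability proportional to its propensity, with exponentially distributed waiting times. A minimal sketch on a toy reversible reaction A ⇌ B (function and rate names are illustrative; this is the standard SSA, not the cycle-leaping variant):

```python
import random

def gillespie(stoich, prop_fns, state, t_end, rng):
    """Gillespie direct method: simulate one trajectory up to time t_end."""
    t = 0.0
    state = dict(state)
    while True:
        props = [p(state) for p in prop_fns]
        total = sum(props)
        if total == 0:
            return state                      # no reaction can fire
        t += rng.expovariate(total)           # time to the next reaction
        if t > t_end:
            return state
        # Choose a reaction channel proportionally to its propensity.
        r = rng.random() * total
        for j, a in enumerate(props):
            r -= a
            if r <= 0:
                break
        for species, delta in stoich[j].items():
            state[species] += delta           # apply the stoichiometry

rng = random.Random(1)
# A <-> B with forward rate 1.0 and backward rate 0.5.
stoich = [{"A": -1, "B": +1}, {"A": +1, "B": -1}]
prop_fns = [lambda s: 1.0 * s["A"], lambda s: 0.5 * s["B"]]
final = gillespie(stoich, prop_fns, {"A": 100, "B": 0}, 50.0, rng)
print(final["A"] + final["B"])   # 100: molecule count is conserved
```

A reversible pair like this is exactly the setting where the trajectory loops through the same states repeatedly, which is the waste that cycle leaping removes.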
https://resolver.caltech.edu/CaltechPARADISE:2007.ETR078
Authors: Schwartz, Moshe; Bruck, Jehoshua
Year: 2007
Arguably, the most prominent constrained system in storage applications is the (d, k)-RLL (Run-Length Limited) system, where every binary sequence obeys the constraint that every two adjacent 1's are separated by at least d consecutive 0's and at most k consecutive 0's, namely, runs of 0's are length limited. The motivation for the RLL constraint arises mainly from the physical limitations of the read and write technologies in magnetic and optical storage systems.
We revisit the rationale for the RLL system and reevaluate its relationship to the physical media. As a result, we introduce a new paradigm that better matches the physical constraints. We call the new paradigm the Precision-Resolution (PR) system, where the write operation is limited by precision and the read operation is limited by resolution.
We compute the capacity of a general PR system and demonstrate that it provides a significant increase in the information density compared to the traditional RLL system (for identical physical limitations). For example, the capacity of the (2, 10)-RLL used in CD-ROMs and DVDs is approximately 0.5418, while our PR system provides a capacity of about 0.7725, resulting in a potential increase of about 40% in information density.https://authors.library.caltech.edu/records/vmf0k-77n02Synthesizing Stochasticity in Biochemical Systems
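The RLL baseline figure quoted above is easy to reproduce: the capacity of a (d, k)-RLL system is log₂ of the largest eigenvalue of its constraint-graph adjacency matrix. A pure-Python sketch using power iteration (variable names are illustrative) recovers the ≈ 0.5418 figure for (2, 10)-RLL:

```python
import math

d, k = 2, 10
n = k + 1  # states 0..k track the current run of 0's since the last 1

# Adjacency matrix of the (d,k)-RLL constraint graph.
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    if i < k:
        A[i][i + 1] = 1.0   # emit a 0: the run of 0's grows (at most k)
    if i >= d:
        A[i][0] = 1.0       # emit a 1: allowed only after at least d 0's

# Power iteration for the Perron eigenvalue (the matrix is primitive).
v = [1.0] * n
lam = 0.0
for _ in range(2000):
    w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = max(w)
    v = [x / lam for x in w]

capacity = math.log2(lam)
print(capacity)   # ≈ 0.5418, matching the figure quoted above
```

The same eigenvalue method applies to any (d, k); the PR capacity of the paper needs the more general precision/resolution analysis instead.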
https://resolver.caltech.edu/CaltechPARADISE:2007.ETR081
Authors: Fett, Brian; Bruck, Jehoshua; Riedel, Marc D.
Year: 2007
Randomness is inherent to biochemistry: at each instant, the sequence of reactions that fires is a matter of chance. Some biological systems exploit such randomness, choosing between different outcomes stochastically – in effect, hedging their bets with a portfolio of responses for different environmental conditions. In this paper, we discuss techniques for synthesizing such stochastic behavior in engineered biochemical systems. We propose a general method for designing a set of biochemical reactions that produces different combinations of molecular types according to a specified probability distribution. The response is precise and robust to perturbations. Furthermore, it is programmable: the probability distribution is a function of the quantities of input types. The method is modular and extensible. We discuss strategies for implementing various functional dependencies: linear, logarithmic, exponential, etc. This work has potential applications in domains such as biochemical sensing, drug production, and disease treatment. Moreover, it provides a framework for analyzing and characterizing the stochastic dynamics in natural biochemical systems such as the lysis/lysogeny switch of the lambda bacteriophage.https://authors.library.caltech.edu/records/hr87k-7y455Constrained Codes as Networks of Relations
https://resolver.caltech.edu/CaltechPARADISE:2007.ETR082
Authors: Schwartz, Moshe; Bruck, Jehoshua
Year: 2007
We revisit the well-known problem of determining the capacity of constrained systems. While the one-dimensional case is well understood, the capacity of two-dimensional systems is mostly unknown. When it is non-zero, except for the (1,1)-RLL system on the hexagonal lattice, there are no closed-form analytical solutions known. Furthermore, for the related problem of counting the exact number of constrained arrays of any given size, only exponential-time algorithms are known.
We present a novel approach to finding the exact capacity of two-dimensional constrained systems, as well as efficiently counting the exact number of constrained arrays of any given size. To that end, we borrow graph-theoretic tools originally developed for the field of statistical mechanics, tools for efficiently simulating quantum circuits, as well as tools from the theory of the spectral distribution of Toeplitz matrices.https://authors.library.caltech.edu/records/ecxjb-m2n07Codes for Multi-Level Flash Memories: Correcting Asymmetric Limited-Magnitude Errors
https://resolver.caltech.edu/CaltechPARADISE:2007.ETR079
Authors: Cassuto, Yuval; Schwartz, Moshe; Bohossian, Vasken; Bruck, Jehoshua
Year: 2007
Several physical effects that limit the reliability and performance of Multilevel Flash memories induce errors that have low magnitude and are dominantly asymmetric. This paper studies block codes for asymmetric limited-magnitude errors over q-ary channels. We propose code constructions for such channels when the number of errors is bounded by t. The construction uses known codes for symmetric errors over small alphabets to protect large-alphabet symbols from asymmetric limited-magnitude errors. The encoding and decoding of these codes are performed over the small alphabet whose size depends only on the maximum error magnitude and is independent of the alphabet size of the outer code. An extension of the construction is proposed to include systematic codes as a benefit for practical implementation.https://authors.library.caltech.edu/records/y1jv5-er092Floating Codes for Joint Information Storage in Write Asymmetric Memories
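The core idea of the construction above, protecting only the residues of the q-ary symbols modulo ℓ+1 with a t-error-correcting code over the small alphabet, can be illustrated with toy parameters. Here ℓ = 2, and a length-3 repetition code (majority vote, correcting t = 1 symbol error) stands in for the stronger small-alphabet codes the paper would use; all names and parameters are illustrative:

```python
l = 2           # maximum magnitude of an asymmetric (upward) error
base = l + 1    # small alphabet size: residues mod (l+1)

def decode(y):
    """Correct up to one asymmetric error of magnitude <= l in a cell vector
    whose residues mod (l+1) form a codeword of the repetition code."""
    residues = [v % base for v in y]
    # Majority decoding of the small-alphabet repetition code.
    psi = max(set(residues), key=residues.count)
    # Per-cell error magnitude from the residue discrepancy; since errors
    # only increase values by at most l, subtracting it restores the level.
    return [v - ((v - psi) % base) for v in y]

x = [5, 8, 2]        # stored levels; residues mod 3 are (2, 2, 2), a codeword
y = [5, 10, 2]       # one cell hit by an upward error of magnitude 2
print(decode(y))     # recovers [5, 8, 2]
```

Note the decoder works entirely in the small alphabet, independent of q, which is exactly the property the abstract highlights.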
https://resolver.caltech.edu/CaltechPARADISE:2007.ETR080
Authors: Jiang, Anxiao (Andrew); Bohossian, Vasken; Bruck, Jehoshua
Year: 2007
Memories whose storage cells transit irreversibly between states have been common since the start of data storage technology. In recent years, flash memories and other non-volatile memories based on floating-gate cells have become a very important family of such memories. We model them by the Write Asymmetric Memory (WAM), a memory where each cell is in one of q states – state 0, 1, ... , q-1 – and can only transit from a lower state to a higher state. Data stored in a WAM can be rewritten by shifting the cells to higher states. Since the state transition is irreversible, the number of times of rewriting is limited. When multiple variables are stored in a WAM, we study codes, which we call floating codes, that maximize the total number of times the variables can be written and rewritten.
In this paper, we present several families of floating codes that either are optimal, or approach optimality as the codes get longer. We also present bounds on the performance of general floating codes. The results show that floating codes can integrate the rewriting capabilities of different variables to a surprisingly high degree.https://authors.library.caltech.edu/records/9y5yh-y1f10Buffer Coding for Asymmetric Multi-Level Memory
https://resolver.caltech.edu/CaltechPARADISE:2007.ETR083
Authors: Bohossian, Vasken; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2007
Certain storage media, such as flash memories, use write-asymmetric, multi-level storage elements. In such media, data is stored in a multi-level memory cell whose contents can only be increased, or reset. The reset operation is expensive and should be delayed as much as possible. Mathematically, we consider the problem of writing a binary sequence into write-asymmetric q-ary cells while recording the last r bits written. We want to maximize t, the number of possible writes, before a reset is needed. We introduce the term Buffer Code to describe the solution to this problem. A buffer code is a code that remembers the r most recent values of a variable. We present the construction of a single-cell (n = 1) buffer code that can store a binary (l = 2) variable with t = ⌊q/2^(r-1)⌋ + r - 2, and a universal upper bound on the number of rewrites that a single-cell buffer code can have: ..... We also show a binary buffer code with arbitrary n, q, r; namely, the code uses n q-ary cells to remember the r most recent values of one binary variable, with a number of rewrites that is asymptotically optimal in q and n. We then extend the code construction for the case r = 2, and obtain a code that can rewrite the variable t = (q - 1)(n - 2) + 1 times. When q = 2, the code is strictly optimal.
https://authors.library.caltech.edu/records/kzd49-t8346

Synthesizing stochasticity in biochemical systems
https://resolver.caltech.edu/CaltechAUTHORS:20161019-153750816
Authors: Fett, Brian; Bruck, Jehoshua; Riedel, Marc D.
Year: 2007
DOI: 10.1145/1278480.1278643
Randomness is inherent to biochemistry: at each instant, the sequence of reactions that fires is a matter of chance. Some biological systems exploit such randomness, choosing between different outcomes stochastically - in effect, hedging their bets with a portfolio of responses for different environmental conditions. In this paper, we discuss techniques for synthesizing such stochastic behavior in engineered biochemical systems. We propose a general method for designing a set of biochemical reactions that produces different combinations of molecular types according to a specified probability distribution. The response is precise and robust to perturbations. Furthermore, it is programmable: the probability distribution is a function of the quantities of input types. The method is modular and extensible. We discuss strategies for implementing various functional dependencies: linear, logarithmic, exponential, etc. This work has potential applications in domains such as biochemical sensing, drug production, and disease treatment. Moreover, it provides a framework for analyzing and characterizing the stochastic dynamics in natural biochemical systems such as the lysis/lysogeny switch of the lambda bacteriophage.
https://authors.library.caltech.edu/records/7g96g-byv50

Floating Codes for Joint Information Storage in Write Asymmetric Memories
https://resolver.caltech.edu/CaltechAUTHORS:20170419-152416338
Authors: Jiang, Anxiao (Andrew); Bohossian, Vasken; Bruck, Jehoshua
Year: 2007
DOI: 10.1109/ISIT.2007.4557381
Memories whose storage cells transit irreversibly between states have been common since the start of data storage technology. In recent years, flash memories and other non-volatile memories based on floating-gate cells have become a very important family of such memories. We model them by the write asymmetric memory (WAM), a memory where each cell is in one of q states - state 0, 1, ..., q - 1 - and can only transit from a lower state to a higher state. Data stored in a WAM can be rewritten by shifting the cells to higher states. Since the state transition is irreversible, the number of rewrites is limited. When multiple variables are stored in a WAM, we study codes, which we call floating codes, that maximize the total number of times the variables can be written and rewritten. In this paper, we present several families of floating codes that are either optimal or approach optimality as the codes get longer. We also present bounds on the performance of general floating codes. The results show that floating codes can integrate the rewriting capabilities of different variables to a surprisingly high degree.
https://authors.library.caltech.edu/records/192v0-rz826

Distributed Broadcasting and Mapping Protocols in Directed Anonymous Networks
https://resolver.caltech.edu/CaltechPARADISE:2007.ETR084
Authors: Langberg, Michael; Schwartz, Moshe; Bruck, Jehoshua
Year: 2007
We initiate the study of distributed protocols over directed anonymous networks that are not necessarily strongly connected. In such networks, nodes are aware only of their incoming and outgoing edges, have no unique identity, and have no knowledge of the network topology or even bounds on its parameters, like the number of nodes or the network diameter. Anonymous networks are of interest in various settings such as wireless ad-hoc networks and peer-to-peer networks. Our goal is to create distributed protocols that reduce the uncertainty by distributing the knowledge of the network topology to all the nodes.
We consider two basic protocols: broadcasting and unique label assignment. These two protocols enable a complete mapping of the network and can serve as key building blocks in more advanced protocols. We develop distributed asynchronous protocols as well as derive lower bounds on their communication complexity, total bandwidth complexity, and node label complexity. The resulting lower bounds are sometimes surprisingly high, exhibiting the complexity of topology extraction in directed anonymous networks.
https://authors.library.caltech.edu/records/2y7rc-55t61

Constrained Codes as Networks of Relations
https://resolver.caltech.edu/CaltechAUTHORS:20170424-171247108
Authors: Schwartz, Moshe; Bruck, Jehoshua
Year: 2007
DOI: 10.1109/ISIT.2007.4557416
We revisit the well-known problem of determining the capacity of constrained systems. While the one-dimensional case is well understood, the capacity of two-dimensional systems is mostly unknown. Where it is non-zero, no closed-form analytical solutions are known, except for the (1,∞)-RLL system on the hexagonal lattice. Furthermore, for the related problem of counting the exact number of constrained arrays of any given size, only exponential-time algorithms are known.
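For contrast with the hard two-dimensional case, the well-understood one-dimensional computation can be sketched. The capacity of the 1-D (1,∞)-RLL constraint (binary strings with no two adjacent 1s) equals log2 of the largest eigenvalue of the transfer matrix [[1,1],[1,0]], i.e. log2 of the golden ratio ≈ 0.6942; a direct count of admissible strings confirms it.

```python
import math

# Capacity of the 1-D (1,infinity)-RLL constraint, computed two ways:
# (a) log2 of the Perron eigenvalue of the transfer matrix [[1,1],[1,0]],
#     which is the golden ratio phi;
# (b) a direct dynamic-programming count of admissible strings of length n.

phi = (1 + math.sqrt(5)) / 2          # largest eigenvalue of [[1,1],[1,0]]
capacity = math.log2(phi)             # ~0.6942

def count_rll(n):
    # a = # of admissible strings ending in 0, b = # ending in 1
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = a + b, a               # may append 0 after anything, 1 only after 0
    return a + b

n = 60
estimate = math.log2(count_rll(n)) / n   # converges to the capacity
print(round(capacity, 5), round(estimate, 4))
```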
We present a novel approach to finding the exact capacity of two-dimensional constrained systems, as well as efficiently counting the exact number of constrained arrays of any given size. To that end, we borrow graph-theoretic tools originally developed for the field of statistical mechanics, tools for efficiently simulating quantum circuits, as well as tools from the theory of the spectral distribution of Toeplitz matrices.
https://authors.library.caltech.edu/records/jfxhx-hrc09

Codes for Multi-Level Flash Memories: Correcting Asymmetric Limited-Magnitude Errors
https://resolver.caltech.edu/CaltechAUTHORS:20170426-165849521
Authors: Cassuto, Yuval; Schwartz, Moshe; Bohossian, Vasken; Bruck, Jehoshua
Year: 2007
DOI: 10.1109/ISIT.2007.4557123
Several physical effects that limit the reliability and performance of Multilevel Flash memories induce errors that have low magnitude and are dominantly asymmetric. This paper studies block codes for asymmetric limited-magnitude errors over q-ary channels. We propose code constructions for such channels when the number of errors is bounded by t. The construction uses known codes for symmetric errors over small alphabets to protect large-alphabet symbols from asymmetric limited-magnitude errors. The encoding and decoding of these codes are performed over the small alphabet, whose size depends only on the maximum error magnitude and is independent of the alphabet size of the outer code. An extension of the construction is proposed to include systematic codes as a benefit to practical implementation.
https://authors.library.caltech.edu/records/r5wsg-xt282

Buffer Coding for Asymmetric Multi-Level Memory
https://resolver.caltech.edu/CaltechAUTHORS:20170426-152709376
Authors: Bohossian, Vasken; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2007
DOI: 10.1109/ISIT.2007.4557384
Certain storage media, such as flash memories, use write-asymmetric, multi-level storage elements. In such media, data is stored in a multi-level memory cell whose contents can only be increased, or reset. The reset operation is expensive and should be delayed as much as possible. Mathematically, we consider the problem of writing a binary sequence into write-asymmetric q-ary cells while recording the last r bits written. We want to maximize t, the number of possible writes, before a reset is needed. We introduce the term Buffer Code to describe the solution to this problem. A buffer code is a code that remembers the r most recent values of a variable. We present the construction of a single-cell (n = 1) buffer code that can store a binary (l = 2) variable with t = ⌊q/2^(r-1)⌋ + r - 2, and a universal upper bound on the number of rewrites that a single-cell buffer code can have: ..... We also show a binary buffer code with arbitrary n, q, r; namely, the code uses n q-ary cells to remember the r most recent values of one binary variable, with a number of rewrites that is asymptotically optimal in q and n. We then extend the code construction for the case r = 2, and obtain a code that can rewrite the variable t = (q - 1)(n - 2) + 1 times. When q = 2, the code is strictly optimal.
https://authors.library.caltech.edu/records/j7xa3-njj15

Distributed broadcasting and mapping protocols in directed anonymous networks
https://resolver.caltech.edu/CaltechAUTHORS:20161121-163644968
Authors: Langberg, Michael; Schwartz, Moshe; Bruck, Jehoshua
Year: 2007
DOI: 10.1145/1281100.1281184
In this work we study the fundamental problems of broadcasting and mapping (label assignment and topology extraction) in directed anonymous networks. In such a network G, processors do not have unique identifiers, they execute identical protocols, and they have no knowledge of the topology of the network (even its size, or bounds on it, are unknown). The only knowledge available to a vertex is its own degree.
https://authors.library.caltech.edu/records/tdksz-0tn97

Computation with Finite Stochastic Chemical Reaction Networks
https://resolver.caltech.edu/CaltechPARADISE:2007.ETR085
Authors: Soloveichik, David; Cook, Matthew; Winfree, Erik; Bruck, Jehoshua
Year: 2007
A highly desired part of the synthetic biology toolbox is an embedded chemical microcontroller, capable of autonomously following a logic program specified by a set of instructions, and interacting with its cellular environment. Strategies for incorporating logic in aqueous chemistry have focused primarily on implementing components, such as logic gates, that are composed into larger circuits, with each logic gate in the circuit corresponding to one or more molecular species. With this paradigm, designing and producing new molecular species is necessary to perform larger computations. An alternative approach begins by noticing that chemical systems on the small scale are fundamentally discrete and stochastic. In particular, the exact molecular count of each molecular species present is an intrinsically available form of information. This might appear to be a very weak form of information, perhaps quite difficult for computations to utilize. Indeed, it has been shown that error-free Turing universal computation is impossible in this setting. Nevertheless, we show a design of a chemical computer that achieves fast and reliable Turing-universal computation using molecular counts. Our scheme uses only a small number of different molecular species to do computation of arbitrary complexity. The total probability of error of the computation can be made arbitrarily small (but not zero) by adjusting the initial molecular counts of certain species. While physical implementations would be difficult, these results demonstrate that molecular counts can be a useful form of information for small molecular systems such as those operating within cellular environments.
https://authors.library.caltech.edu/records/f308m-93262

MAP: Medial axis based geometric routing in sensor networks
https://resolver.caltech.edu/CaltechAUTHORS:20100505-134021747
Authors: Bruck, Jehoshua; Gao, Jie; Jiang, Anxiao (Andrew)
Year: 2007
DOI: 10.1007/s11276-006-9857-z
One of the challenging tasks in the deployment of dense wireless networks (like sensor networks) is devising a routing scheme for node-to-node communication. Important considerations include scalability, routing complexity, quality of communication paths, and load sharing across routes. In this paper, we show that a compact and expressive abstraction of network connectivity by the medial axis enables efficient and localized routing. We propose MAP, a Medial Axis based naming and routing Protocol that does not require geographical locations, makes routing decisions locally, and achieves good load balancing. In its preprocessing phase, MAP constructs the medial axis of the sensor field, defined as the set of nodes with at least two closest boundary nodes. The medial axis of the network captures both the complex geometry and non-trivial topology of the sensor field. It can be represented succinctly by a graph whose size is on the order of the complexity of the geometric features (e.g., the number of holes). Each node is then given a name related to its position with respect to the medial axis. The routing scheme is derived through local decisions based on the names of the source and destination nodes and guarantees delivery with reasonable and natural routes. We show by both theoretical analysis and simulations that our medial axis based geometric routing scheme is scalable, produces short routes, achieves excellent load balancing, and is very robust to variations in the network model.
https://authors.library.caltech.edu/records/762qx-x3s09

Rank Modulation for Flash Memories
https://resolver.caltech.edu/CaltechPARADISE:2008.ETR086
Authors: Jiang, Anxiao (Andrew); Mateescu, Robert; Schwartz, Moshe; Bruck, Jehoshua
Year: 2008
We explore a novel data representation scheme for multi-level flash memory cells, in which a set of n cells stores information in the permutation induced by the different charge levels of the individual cells. The only allowed charge-placement mechanism is a "push-to-the-top" operation which takes a single cell of the set and makes it the top-charged cell. The resulting scheme eliminates the need for discrete cell levels, as well as overshoot errors, when programming cells.
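The "push-to-the-top" operation described above can be sketched directly (an illustration of the scheme's state space, not a code from the paper): the state of n cells is a permutation listed from the top-charged cell down, and any target permutation is reachable from any state with at most n pushes by pushing the target's cells from bottom rank to top rank.

```python
# Rank-modulation state = permutation of cell indices, listed from the
# top-charged cell down. The only programming primitive is
# "push-to-the-top": raise one cell's charge above all others.

def push_to_top(state, cell):
    return [cell] + [c for c in state if c != cell]

def program(state, target):
    # Reach any target permutation with at most n pushes:
    # push the target's cells in reverse order (bottom rank first).
    for cell in reversed(target):
        state = push_to_top(state, cell)
    return state

start = [0, 1, 2, 3]            # cell 0 currently holds the highest charge
target = [2, 0, 3, 1]
print(program(start, target))   # [2, 0, 3, 1]
```

Because a push only raises one cell above the rest, discrete threshold levels and overshoot handling are unnecessary, which is the point of the scheme.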
We present unrestricted Gray codes spanning all possible n-cell states and using only "push-to-the-top" operations, and also construct balanced Gray codes. We also investigate optimal rewriting schemes for translating an arbitrary input alphabet into n-cell states which minimize the number of programming operations.
https://authors.library.caltech.edu/records/awqbq-dvx67

Joint Coding for Flash Memory Storage
https://resolver.caltech.edu/CaltechPARADISE:2008.ETR087
Authors: Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2008
Flash memory is an electronic non-volatile memory with wide applications. Due to the substantial impact of block erasure operations on the speed, reliability and longevity of flash memories, writing schemes that enable data to be modified numerous times without incurring a block erasure are desirable. This requirement is addressed by floating codes, a coding scheme that jointly stores and rewrites data and maximizes the rewriting capability of flash memories. In this paper, we present several new floating code constructions. They include both codes with specific parameters and general code constructions that are asymptotically optimal. We also present bounds on the performance of floating codes.
https://authors.library.caltech.edu/records/616vy-hw290

Constrained Codes as Networks of Relations
https://resolver.caltech.edu/CaltechAUTHORS:SCHWieeetit08
Authors: Schwartz, Moshe; Bruck, Jehoshua
Year: 2008
DOI: 10.1109/TIT.2008.920245
We address the well-known problem of determining the capacity of constrained coding systems. While the one-dimensional case is well understood to the extent that there are techniques for rigorously deriving the exact capacity, in contrast, computing the exact capacity of a two-dimensional constrained coding system is still an elusive research challenge. The only known exception in the two-dimensional case is an exact (however, not rigorous) solution to the (1,∞)-run-length limited (RLL) system on the hexagonal lattice. Furthermore, only exponential-time algorithms are known for the related problem of counting the exact number of constrained two-dimensional information arrays.
We present the first known rigorous technique that yields an exact capacity of a two-dimensional constrained coding system. In addition, we devise an efficient (polynomial-time) algorithm for counting the exact number of constrained arrays of any given size. Our approach is a composition of a number of ideas and techniques: describing the capacity problem as a solution to a counting problem in networks of relations, graph-theoretic tools originally developed in the field of statistical mechanics, techniques for efficiently simulating quantum circuits, as well as ideas from the theory related to the spectral distribution of Toeplitz matrices. Using our technique, we derive a closed-form solution to the capacity related to the Path-Cover constraint in a two-dimensional triangular array (the resulting calculated capacity is 0.72399217...). Path-Cover is a generalization of the well-known one-dimensional (0,1)-RLL constraint, for which the capacity is known to be 0.69424...
https://authors.library.caltech.edu/records/g35sa-7k789

Codes for Asymmetric Limited-Magnitude Errors with Application to Multi-Level Flash Memories
https://resolver.caltech.edu/CaltechPARADISE:2008.ETR088
Authors: Cassuto, Yuval; Schwartz, Moshe; Bohossian, Vasken; Bruck, Jehoshua
Year: 2008
Several physical effects that limit the reliability and performance of Multilevel Flash Memories induce errors that have low magnitudes and are dominantly asymmetric. This paper studies block codes for asymmetric limited-magnitude errors over q-ary channels. We propose code constructions and bounds for such channels when the number of errors is bounded by t and the error magnitudes are bounded by ℓ. The constructions utilize known codes for symmetric errors, over small alphabets, to protect large-alphabet symbols from asymmetric limited-magnitude errors. The encoding and decoding of these codes are performed over the small alphabet whose size depends only on the maximum error magnitude and is independent of the alphabet size of the outer code. Moreover, the size of the codes is shown to exceed the sizes of known codes (for related error models), and asymptotic rate-optimality results are proved. Extensions of the construction are proposed to accommodate variations on the error model and to include systematic codes as a benefit to practical implementation.
https://authors.library.caltech.edu/records/ddvmx-zv463

Stochastic switching circuit synthesis
https://resolver.caltech.edu/CaltechAUTHORS:WILisit08
Authors: Wilhelm, Daniel; Bruck, Jehoshua
Year: 2008
DOI: 10.1109/ISIT.2008.4595215
Shannon, in his 1938 Master's thesis, demonstrated that any Boolean function can be realized by a switching relay circuit, leading to the development of deterministic digital logic. Here, we replace each classical switch with a probabilistic switch (pswitch). We present algorithms for synthesizing circuits closed with a desired probability, including an algorithm that generates optimal-size circuits for any binary fraction. We also introduce a new duality property for series-parallel stochastic switching circuits. Finally, we construct a universal probability generator which maps deterministic inputs to arbitrary probabilistic outputs. Potential applications exist in the analysis and design of stochastic networks in biology and engineering.
https://authors.library.caltech.edu/records/8p2vh-7cv44

The Alpha Project: a model system for systems biology research
https://resolver.caltech.edu/CaltechAUTHORS:20090728-082033135
Authors: Yu, R. C.; Resnekov, O.; Abola, A. P.; Andrews, S. S.; Benjamin, K. R.; Bruck, J.; Burbulis, I. E.; Colman-Lerner, A.; Endy, D.; Gordon, A.; Holl, M.; Lok, L.; Pesce, C. G.; Serra, E.; Smith, R. D.; Thomson, T. M.; Tsong, A. E.; Brent, R.
Year: 2008
DOI: 10.1049/iet-syb:20080127
One goal of systems biology is to understand how genome-encoded parts interact to produce quantitative phenotypes. The Alpha Project is a medium-scale, interdisciplinary systems biology effort that aims to achieve this goal by understanding fundamental quantitative behaviours of a prototypic signal transduction pathway, the yeast pheromone response system from Saccharomyces cerevisiae. The Alpha Project distinguishes itself from many other systems biology projects by studying a tightly bounded and well-characterised system that is easily modified by genetic means, and by focusing on deep understanding of a discrete number of important and accessible quantitative behaviours. During the project, the authors have developed tools to measure the appropriate data and develop models at appropriate levels of detail to study a number of these quantitative behaviours. The authors have also developed transportable experimental tools and conceptual frameworks for understanding other signalling systems. In particular, the authors have begun to interpret system behaviours and their underlying molecular mechanisms through the lens of information transmission, a principal function of signalling systems. The Alpha Project demonstrates that interdisciplinary studies that identify key quantitative behaviours and measure important quantities, in the context of well-articulated abstractions of system function and appropriate analytical frameworks, can lead to deeper biological understanding. The authors' experience may provide a productive template for systems biology investigations of other cellular systems.
https://authors.library.caltech.edu/records/scn1j-3n835

Optimal Universal Schedules for Discrete Broadcast
https://resolver.caltech.edu/CaltechAUTHORS:LANieeetit08
Authors: Langberg, Michael; Sprintson, Alexander; Bruck, Jehoshua
Year: 2008
DOI: 10.1109/TIT.2008.928296
We study broadcast systems that distribute a series of data updates to a large number of passive clients. The updates are sent over a broadcast channel in the form of discrete packets. We assume that clients periodically access the channel to obtain the most recent update. Such scenarios arise in many practical applications, such as distribution of traffic information and market updates to mobile wireless devices.
https://authors.library.caltech.edu/records/pznyh-27r04

Programmability of Chemical Reaction Networks
https://resolver.caltech.edu/CaltechPARADISE:2008.ETR090
Authors: Cook, Matthew; Soloveichik, David; Winfree, Erik; Bruck, Jehoshua
Year: 2008
Motivated by the intriguing complexity of biochemical circuitry within individual cells, we study Stochastic Chemical Reaction Networks (SCRNs), a formal model that considers a set of chemical reactions acting on a finite number of molecules in a well-stirred solution according to standard chemical kinetics equations. SCRNs have been widely used for describing naturally occurring (bio)chemical systems, and with the advent of synthetic biology they have become a promising language for the design of artificial biochemical circuits. Our interest here is the computational power of SCRNs and how they relate to more conventional models of computation. We survey known connections and give new connections between SCRNs and Boolean Logic Circuits, Vector Addition Systems, Petri Nets, Gate Implementability, Primitive Recursive Functions, Register Machines, Fractran, and Turing Machines. A theme of these investigations is the thin line between decidable and undecidable questions about SCRN behavior.
https://authors.library.caltech.edu/records/gf8h4-ta232

Graphene-based atomic-scale switches
https://resolver.caltech.edu/CaltechAUTHORS:STAnl08
Authors: Standley, Brian; Bao, Wenzhong; Zhang, Hang; Bruck, Jehoshua; Lau, Chun Ning; Bockrath, Marc
Year: 2008
DOI: 10.1021/nl801774a
Graphene's remarkable mechanical and electrical properties, combined with its compatibility with existing planar silicon-based technology, make it an attractive material for novel computing devices. We report the development of a nonvolatile memory element based on graphene break junctions. Our devices have demonstrated thousands of writing cycles and long retention times. We propose a model for device operation based on the formation and breaking of carbon atomic chains that bridge the junctions. We demonstrate information storage based on the concept of rank coding, in which information is stored in the relative conductance of graphene switches in a memory cell.
https://authors.library.caltech.edu/records/7wy4q-67270

Computation with finite stochastic chemical reaction networks
https://resolver.caltech.edu/CaltechAUTHORS:20111020-132840264
Authors: Soloveichik, David; Cook, Matthew; Winfree, Erik; Bruck, Jehoshua
Year: 2008
DOI: 10.1007/s11047-008-9067-y
A highly desired part of the synthetic biology toolbox is an embedded chemical microcontroller, capable of autonomously following a logic program specified by a set of instructions, and interacting with its cellular environment. Strategies for incorporating logic in aqueous chemistry have focused primarily on implementing components, such as logic gates, that are composed into larger circuits, with each logic gate in the circuit corresponding to one or more molecular species. With this paradigm, designing and producing new molecular species is necessary to perform larger computations. An alternative approach begins by noticing that chemical systems on the small scale are fundamentally discrete and stochastic. In particular, the exact molecular count of each molecular species present is an intrinsically available form of information. This might appear to be a very weak form of information, perhaps quite difficult for computations to utilize. Indeed, it has been shown that error-free Turing universal computation is impossible in this setting. Nevertheless, we show a design of a chemical computer that achieves fast and reliable Turing-universal computation using molecular counts. Our scheme uses only a small number of different molecular species to do computation of arbitrary complexity. The total probability of error of the computation can be made arbitrarily small (but not zero) by adjusting the initial molecular counts of certain species. While physical implementations would be difficult, these results demonstrate that molecular counts can be a useful form of information for small molecular systems such as those operating within cellular environments.
https://authors.library.caltech.edu/records/1abkk-rra71

On the capacity of bounded rank modulation for flash memories
https://resolver.caltech.edu/CaltechPARADISE:2008.ETR091
Authors: Wang, Zhiying; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2009
Rank modulation has been recently introduced as a new information representation scheme for flash memories. Given the charge levels of a group of flash cells, sorting is used to induce a permutation, which in turn represents data. Motivated by the lower sorting complexity of smaller cell groups, we consider bounded rank modulation, where a sequence of permutations of given sizes is used to represent data. We study the capacity of bounded rank modulation under the condition that permutations can overlap for higher capacity.
https://authors.library.caltech.edu/records/pe2zr-nrj54

Network Coding: A Computational Perspective
https://resolver.caltech.edu/CaltechAUTHORS:LANieeetit09
Authors: Langberg, Michael; Sprintson, Alexander; Bruck, Jehoshua
Year: 2009
DOI: 10.1109/TIT.2008.2008135
In this work, we study the computational perspective of network coding, focusing on two issues. First, we address the computational complexity of finding a network code for acyclic multicast networks. Second, we address the issue of reducing the amount of computation performed by network nodes. In particular, we consider the problem of finding a network code with the minimum possible number of encoding nodes, i.e., nodes that generate new packets by performing algebraic operations on packets received over incoming links.
https://authors.library.caltech.edu/records/emahk-qb398

The Robustness of Stochastic Switching Networks
https://resolver.caltech.edu/CaltechPARADISE:2009.ETR092
Authors: Loh, Po-Ling; Zhou, Hongchao; Bruck, Jehoshua
Year: 2009
Many natural systems, including chemical and biological systems, can be modeled using stochastic switching circuits. These circuits consist of stochastic switches, called pswitches, which operate with a fixed probability of being open or closed. We study the effect caused by introducing an error of size ε to each pswitch in a stochastic circuit. We analyze two constructions – simple series-parallel and general series-parallel circuits – and prove that simple series-parallel circuits are robust to small error perturbations, while general series-parallel circuits are not. Specifically, the total error introduced by perturbations of size less than ε is bounded by a constant multiple of ε in a simple series-parallel circuit, independent of the size of the circuit. However, the same result does not hold in the case of more general series-parallel circuits. In the case of a general stochastic circuit, we prove that the overall error probability is bounded by a linear function of the number of pswitches.
https://authors.library.caltech.edu/records/zae12-vg894

Localization and routing in sensor networks by local angle information
https://resolver.caltech.edu/CaltechAUTHORS:20090504-113921187
Authors: Bruck, Jehoshua; Gao, Jie; Jiang, Anxiao (Andrew)
Year: 2009
DOI: 10.1145/1464420.1464427
Location information is useful both for network organization and for sensor data integrity. In this article, we study the anchor-free 2D localization problem by using local angle measurements. We prove that given a unit disk graph and the angles between adjacent edges, it is NP-hard to find a valid embedding in the plane such that neighboring nodes are within distance 1 from each other and non-neighboring nodes are at least distance √2/2 away. Despite the negative results, however, we can find a planar spanner of a unit disk graph by using only local angles. The planar spanner can be used to generate a set of virtual coordinates that enable efficient and local routing schemes such as geographical routing or approximate shortest path routing. We also propose a practical anchor-free embedding scheme by solving a linear program. We show by simulation that it gives both a good local embedding, with neighboring nodes embedded close and non-neighboring nodes far away, and a satisfactory global view such that geographical routing and approximate shortest path routing on the embedded graph are almost identical to those on the original (true) embedding.
https://authors.library.caltech.edu/records/x2rfx-3jc47

On the Expressibility of Stochastic Switching Circuits
https://resolver.caltech.edu/CaltechPARADISE:2009.ETR093
Authors: Zhou, Hongchao; Bruck, Jehoshua
Year: 2009
Stochastic switching circuits are relay circuits that consist of stochastic switches (which we call pswitches). We study the expressive power of these circuits; in particular, we address the following basic question: given an arbitrary integer q and the pswitch set {1/q, 2/q, ..., (q-1)/q}, can we realize any rational probability with denominator q^n (for arbitrary n) by a simple series-parallel stochastic switching circuit? In this paper, we generalize previous results and prove that when q is a multiple of 2 or 3 the answer is positive. We also show that when q is a prime number greater than 3 the answer is negative. In addition, we propose a greedy algorithm to realize desired reachable probabilities, and thousands of experiments show that this algorithm achieves almost optimal size. Finally, we prove that any desired probability can be approximated well by a linear-size circuit.
https://authors.library.caltech.edu/records/3bvx9-zp442

Shortening array codes and the perfect 1-factorization conjecture
https://resolver.caltech.edu/CaltechAUTHORS:20090717-115258499
Authors: Bohossian, Vasken; Bruck, Jehoshua
Year: 2009
DOI: 10.1109/TIT.2008.2009850
The existence of a perfect 1-factorization of the complete graph with n nodes, namely, K_n , for arbitrary even number n, is a 40-year-old open problem in graph theory. So far, two infinite families of perfect 1-factorizations have been shown to exist, namely, the factorizations of K_(p+1) and K_2p , where p is an arbitrary prime number (p > 2). It was shown in previous work that finding a perfect 1-factorization of K_n is related to a problem in coding; specifically, it can be reduced to constructing an MDS (Maximum Distance Separable), lowest density array code. In this paper, a new method for shortening arbitrary array codes is introduced. It is then used to derive the K_(p+1) family of perfect 1-factorizations from the K_2p family. Namely, techniques from coding theory are used to prove a new result in graph theory: that the two factorization families are related.https://authors.library.caltech.edu/records/d12hj-hqr49Universal Rewriting in Constrained Memories
https://resolver.caltech.edu/CaltechPARADISE:2009.ETR096
Authors: Jiang, Anxiao (Andrew); Langberg, Michael; Schwartz, Moshe; Bruck, Jehoshua
Year: 2009
A constrained memory is a storage device whose elements change their states under some constraints. A typical example is flash memories, in which cell levels are easy to increase but hard to decrease. In a general rewriting model, the stored data changes with some pattern determined by the application. In a constrained memory, an appropriate representation is needed for the stored data to enable efficient rewriting.
In this paper, we define the general rewriting problem using a graph model. This model generalizes many known rewriting models such as floating codes, WOM codes, buffer codes, etc. We present a novel rewriting scheme for the flash-memory model and prove it is asymptotically optimal in a wide range of scenarios.
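As illustrative background for the rewriting models named above, the classic Rivest-Shamir WOM (write-once memory) code is perhaps the simplest instance: a 2-bit value can be written twice into 3 write-once cells. This sketch is background only, not the rewriting scheme proposed in the paper:

```python
# Classic Rivest-Shamir WOM code: write a 2-bit value twice into 3 write-once
# cells (cells may change 0 -> 1 but never back). Illustrative background only.

FIRST = {0: (0, 0, 0), 1: (1, 0, 0), 2: (0, 1, 0), 3: (0, 0, 1)}
SECOND = {d: tuple(1 - b for b in cw) for d, cw in FIRST.items()}  # complements

def decode(cells):
    """Both the first-write and second-write codewords of d decode to d."""
    for d in FIRST:
        if cells in (FIRST[d], SECOND[d]):
            return d
    raise ValueError("not a codeword")

def rewrite(cells, data):
    """Second write: keep the state if it already encodes `data`, otherwise
    move up to the complement codeword of `data` (supports two writes total)."""
    new = cells if cells == FIRST[data] else SECOND[data]
    assert all(b2 >= b1 for b1, b2 in zip(cells, new)), "cells only increase"
    return new
```

Every second-write codeword dominates every first-write codeword bitwise, which is exactly what makes the second write legal on write-once cells.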
We further study the application of randomization and probability distributions to data rewriting and analyze the expected performance. We present a randomized code for all rewriting sequences and a deterministic code for rewriting following any i.i.d. distribution. Both codes are shown to be asymptotically optimal.https://authors.library.caltech.edu/records/915ad-wt606Cyclic lowest density MDS array codes
https://resolver.caltech.edu/CaltechAUTHORS:20090514-113712174
Authors: Cassuto, Yuval; Bruck, Jehoshua
Year: 2009
DOI: 10.1109/TIT.2009.2013024
Three new families of lowest density maximum-distance separable (MDS) array codes are constructed, which are cyclic or quasi-cyclic. In addition to their optimal redundancy (MDS) and optimal update complexity (lowest density), the symmetry offered by the new codes can be utilized for simplified implementation in storage applications. The proof of the code properties has an indirect structure: first MDS codes that are not cyclic are constructed, and then transformed to cyclic codes by a minimum-distance preserving transformation.https://authors.library.caltech.edu/records/dnkfk-thb37Correcting Charge-Constrained Errors in the Rank-Modulation Scheme
https://resolver.caltech.edu/CaltechPARADISE:2009.ETR095
Authors: Jiang, Anxiao (Andrew); Schwartz, Moshe; Bruck, Jehoshua
Year: 2009
We investigate error-correcting codes for a novel storage technology for flash memories, the rank-modulation scheme. In this scheme, a set of n cells stores information in the permutation induced by the different charge levels of the individual cells. The resulting scheme eliminates the need for discrete cell levels, overcomes overshoot errors when programming cells (a serious problem that reduces the writing speed), and mitigates the problem of asymmetric errors.
In this paper we study the properties of error-correcting codes for charge-constrained errors in the rank-modulation scheme. In this error model the number of errors corresponds to the minimal number of adjacent transpositions required to change a given stored permutation to another erroneous one – a distance measure known as Kendall's τ-distance.
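Kendall's τ-distance equals the number of pairs that the two permutations order differently. A minimal sketch (function names are mine, and the O(n^2) inversion count is chosen for clarity rather than efficiency), including how a permutation is induced by cell charge levels:

```python
def kendall_tau(p, q):
    """Minimal number of adjacent transpositions turning permutation p into q,
    i.e. the number of pairs that the two permutations order differently."""
    pos = {v: i for i, v in enumerate(q)}   # position of each value in q
    r = [pos[v] for v in p]                 # p rewritten in q's coordinates
    n = len(r)
    # count inversions of r; O(n^2) for clarity (merge sort gives O(n log n))
    return sum(1 for i in range(n) for j in range(i + 1, n) if r[i] > r[j])

def induced_permutation(charges):
    """Rank-modulation readout: cell indices sorted from highest charge down."""
    return sorted(range(len(charges)), key=lambda i: -charges[i])
```

For example, charge levels (0.2, 0.9, 0.5) induce the permutation [1, 2, 0], and a single adjacent transposition corresponds to τ-distance 1.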
We show bounds on the size of such codes, and use metric-embedding techniques to give constructions which translate a wealth of knowledge of binary codes in the Hamming metric, as well as q-ary codes in the Lee metric, to codes over permutations in Kendall's τ-metric. Specifically, the one-error-correcting codes we construct are at least half the ball-packing upper bound.https://authors.library.caltech.edu/records/mkc70-5fd26Stochastic Switching Circuit Synthesis
https://resolver.caltech.edu/CaltechPARADISE:2008.ETR089
Authors: Wilhelm, Daniel; Bruck, Jehoshua
Year: 2009
In his 1938 Master's Thesis, Shannon demonstrated that any Boolean function can be realized by a switching relay circuit, leading to the development of deterministic digital logic. Here, we replace each classical switch with a probabilistic switch (pswitch). We present algorithms for synthesizing circuits closed with a desired probability, including an algorithm that generates optimal size circuits for any binary fraction. We also introduce a new duality property for series-parallel stochastic switching circuits. Finally, we construct a universal probability generator which maps deterministic inputs to arbitrary probabilistic outputs. Potential applications exist in the analysis and design of stochastic networks in biology and engineering.https://authors.library.caltech.edu/records/80cap-8jm07Universal rewriting in constrained memories
https://resolver.caltech.edu/CaltechAUTHORS:20170321-172544029
Authors: Jiang, Anxiao (Andrew); Langberg, Michael; Schwartz, Moshe; Bruck, Jehoshua
Year: 2009
DOI: 10.1109/ISIT.2009.5205981
A constrained memory is a storage device whose elements change their states under some constraints. A typical example is flash memories, in which cell levels are easy to increase but hard to decrease. In a general rewriting model, the stored data changes with some pattern determined by the application. In a constrained memory, an appropriate representation is needed for the stored data to enable efficient rewriting.
In this paper, we define the general rewriting problem using a graph model. This model generalizes many known rewriting models such as floating codes, WOM codes, buffer codes, etc. We present a novel rewriting scheme for the flash-memory model and prove it is asymptotically optimal in a wide range of scenarios.
We further study the application of randomization and probability distributions to data rewriting and analyze the expected performance. We present a randomized code for all rewriting sequences and a deterministic code for rewriting following any i.i.d. distribution. Both codes are shown to be asymptotically optimal.https://authors.library.caltech.edu/records/4k7ny-2dc62Rank Modulation for Flash Memories
https://resolver.caltech.edu/CaltechAUTHORS:20090820-152000947
Authors: Jiang, Anxiao (Andrew); Mateescu, Robert; Schwartz, Moshe; Bruck, Jehoshua
Year: 2009
DOI: 10.1109/TIT.2009.2018336
We explore a novel data representation scheme for multilevel flash memory cells, in which a set of n cells stores information in the permutation induced by the different charge levels of the individual cells. The only allowed charge-placement mechanism is a "push-to-the-top" operation, which takes a single cell of the set and makes it the top-charged cell. The resulting scheme eliminates the need for discrete cell levels, as well as overshoot errors, when programming cells. We present unrestricted Gray codes spanning all possible n-cell states and using only "push-to-the-top" operations, and also construct balanced Gray codes. One important application of the Gray codes is the realization of logic multilevel cells, which is useful in conventional storage solutions. We also investigate rewriting schemes for random data modification. We present both an optimal scheme for the worst case rewrite performance and an approximation scheme for the average-case rewrite performance.https://authors.library.caltech.edu/records/9zr6t-5rz21Data Movement in Flash Memories
https://resolver.caltech.edu/CaltechPARADISE:2009.ETR097
Authors: Jiang, Anxiao (Andrew); Langberg, Michael; Mateesu, Robert; Bruck, Jehoshua
Year: 2009
NAND flash memories are the most widely used non-volatile memories, and data movement is common in flash storage systems. We study data movement solutions that minimize the number of block erasures, which are very important for the efficiency and longevity of flash memories. To move data among n blocks with the help of Δ auxiliary blocks, where every block contains m pages, we present algorithms that use θ(n · min{m, log_Δ n}) erasures without the tool of coding. We prove this is almost the best possible for non-coding solutions by presenting a nearly matching lower bound. Optimal data movement can be achieved using coding, where only θ(n) erasures are needed. We present a coding-based algorithm, which has very low coding complexity, for optimal data movement. We further show the NP-hardness of both coding-based and non-coding schemes when the objective is to optimize data movement on a per-instance basis.https://authors.library.caltech.edu/records/2gmxq-03v21Interleaving schemes on circulant graphs with two offsets
https://resolver.caltech.edu/CaltechAUTHORS:20090925-102048176
Authors: Slivkins, Aleksandrs; Bruck, Jehoshua
Year: 2009
DOI: 10.1016/j.disc.2009.01.020
Interleaving is used for error-correcting on a bursty noisy channel. Given a graph G describing the topology of the channel, we label the vertices of G so that each label-set is sufficiently sparse. The interleaving scheme corrects for any error burst of size at most t; it is a labeling where the distance between any two vertices in the same label-set is at least t.
We consider interleaving schemes on infinite circulant graphs with two offsets 1 and d. In such a graph the vertices are integers; edge ij exists if and only if |i−j|∈{1,d}. Our goal is to minimize the number of labels used.
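As a brute-force sanity check of these definitions (the offset d, label count, and function names below are illustrative choices, not the paper's constructions), one can label vertex i with i mod L on a finite segment of the graph and verify by BFS that same-label vertices are at distance at least t:

```python
from collections import deque

def distance(src, dst, n, d):
    """BFS distance in a finite segment 0..n-1 of the circulant graph
    with offsets {1, d} (edge between i and j iff |i - j| is 1 or d)."""
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        v, dist = frontier.popleft()
        if v == dst:
            return dist
        for w in (v - 1, v + 1, v - d, v + d):
            if 0 <= w < n and w not in seen:
                seen.add(w)
                frontier.append((w, dist + 1))
    return None  # unreachable (cannot happen: offset 1 keeps the segment connected)

def min_same_label_distance(n, d, num_labels):
    """Smallest distance between two vertices sharing label i mod num_labels."""
    return min(distance(i, j, n, d)
               for i in range(n)
               for j in range(i + 1, n)
               if i % num_labels == j % num_labels)
```

With offsets {1, 3} and 4 labels, the minimum same-label distance on a small segment is 2, so this labeling handles bursts of size t = 2. Distances on a finite segment are at least the infinite-graph distances, so this is a demonstration rather than a proof.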
Our constructions are covers of the graph by the minimal number of translates of some label-set S. We focus on minimizing the index of S, which is the inverse of its density rounded up. We establish lower bounds and prove that our constructions are optimal or almost optimal, both for the index of S and for the number of labels.https://authors.library.caltech.edu/records/xejj9-qf205Programmability of Chemical Reaction Networks
https://resolver.caltech.edu/CaltechAUTHORS:20111020-103016495
Authors: Cook, Matthew; Soloveichik, David; Winfree, Erik; Bruck, Jehoshua
Year: 2009
DOI: 10.1007/978-3-540-88869-7_27
Motivated by the intriguing complexity of biochemical circuitry within individual cells, we study Stochastic Chemical Reaction Networks (SCRNs), a formal model that considers a set of chemical reactions acting on a finite number of molecules in a well-stirred solution according to standard chemical kinetics equations. SCRNs have been widely used for describing naturally occurring (bio)chemical systems, and with the advent of synthetic biology they have become a promising language for the design of artificial biochemical circuits. Our interest here is the computational power of SCRNs and how they relate to more conventional models of computation. We survey known connections and give new connections between SCRNs and Boolean Logic Circuits, Vector Addition Systems, Petri nets, Gate Implementability, Primitive Recursive Functions, Register Machines, Fractran, and Turing Machines. A recurring theme in these investigations is the thin line between decidable and undecidable questions about SCRN behavior.https://authors.library.caltech.edu/records/pgm0a-wzh97The robustness of stochastic switching networks
https://resolver.caltech.edu/CaltechAUTHORS:20100816-134022210
Authors: Loh, Po-Ling; Zhou, Hongchao; Bruck, Jehoshua
Year: 2009
DOI: 10.1109/ISIT.2009.5205379
Many natural systems, including chemical and biological systems, can be modeled using stochastic switching circuits. These circuits consist of stochastic switches, called pswitches, which operate with a fixed probability of being open or closed. We study the effect caused by introducing an error of size ε to each pswitch in a stochastic circuit. We analyze two constructions – simple series-parallel and general series-parallel circuits – and prove that simple series-parallel circuits are robust to small error perturbations, while general series-parallel circuits are not. Specifically, the total error introduced by perturbations of size less than ε is bounded by a constant multiple of ε in a simple series-parallel circuit, independent of the size of the circuit. However, the same result does not hold in the case of more general series-parallel circuits. In the case of a general stochastic circuit, we prove that the overall error probability is bounded by a linear function of the number of pswitches.https://authors.library.caltech.edu/records/t7g6r-05h46On the expressibility of stochastic switching circuits
https://resolver.caltech.edu/CaltechAUTHORS:20100816-150432698
Authors: Zhou, Hongchao; Bruck, Jehoshua
Year: 2009
DOI: 10.1109/ISIT.2009.5205401
Stochastic switching circuits are relay circuits that consist of stochastic switches (that we call pswitches). We study the expressive power of these circuits; in particular, we address the following basic question: given an arbitrary integer q, and a pswitch set {1/q, 2/q, ..., (q-1)/q}, can we realize any rational probability with denominator q^n (for arbitrary n) by a simple series-parallel stochastic switching circuit? In this paper, we generalize previous results and prove that when q is a multiple of 2 or 3 the answer is positive. We also show that when q is a prime number the answer is negative. In addition, we prove that any desired probability can be approximated well by a circuit of size linear in n, with error less than q^(-n).https://authors.library.caltech.edu/records/kwnj6-a3h97On the capacity of bounded rank modulation for flash memories
https://resolver.caltech.edu/CaltechAUTHORS:20100816-142932373
Authors: Wang, Zhiying; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2009
DOI: 10.1109/ISIT.2009.5205972
Rank modulation has been introduced as a new information representation scheme for flash memories. Given the charge levels of a group of flash cells, sorting is used to induce a permutation, which in turn represents data. Motivated by the lower sorting complexity of smaller cell groups, we consider bounded rank modulation, where a sequence of permutations of given sizes is used to represent data. We study the capacity of bounded rank modulation under the condition that permutations can overlap for higher capacity.https://authors.library.caltech.edu/records/7f210-bjx91Data movement in flash memories
https://resolver.caltech.edu/CaltechAUTHORS:20170321-173656746
Authors: Jiang, Anxiao (Andrew); Langberg, Michael; Mateescu, Robert; Bruck, Jehoshua
Year: 2009
DOI: 10.1109/ALLERTON.2009.5394879
NAND flash memories are the most widely used non-volatile memories, and data movement is common in flash storage systems. We study data movement solutions that minimize the number of block erasures, which are very important for the efficiency and longevity of flash memories. To move data among n blocks with the help of Δ auxiliary blocks, where every block contains m pages, we present algorithms that use θ(n · min{m, log_Δ n}) erasures without the tool of coding. We prove this is almost the best possible for non-coding solutions by presenting a nearly matching lower bound. Optimal data movement can be achieved using coding, where only θ(n) erasures are needed. We present a coding-based algorithm, which has very low coding complexity, for optimal data movement. We further show the NP-hardness of both coding-based and non-coding schemes when the objective is to optimize data movement on a per-instance basis.https://authors.library.caltech.edu/records/wtbra-ykx96Storage Coding for Wear Leveling in Flash Memories
https://resolver.caltech.edu/CaltechPARADISE:2009.ETR094
Authors: Jiang, Anxiao (Andrew); Mateescu, Robert; Yaakobi, Eitan; Bruck, Jehoshua; Siegel, Paul H.; Vardy, Alexander; Wolf, Jack K.
Year: 2009
NAND flash memories are currently the most widely used type of flash memories. In a NAND flash memory, although a cell block consists of many pages, to rewrite one page, the whole block needs to be erased and reprogrammed. Block erasures determine the longevity and efficiency of flash memories. So when data is frequently reorganized, which can be characterized as a data movement process, how to minimize block erasures becomes an important challenge. In this paper, we show that coding can significantly reduce block erasures for data movement, and present several optimal or nearly optimal algorithms. While the sorting-based non-coding schemes require O(n log n) erasures to move data among n blocks, coding-based schemes use only O(n) erasures and also optimize the utilization of storage space.https://authors.library.caltech.edu/records/yq8rd-mzk34Low-Complexity Codes for Random and Clustered High-Order Failures in Storage Arrays
https://resolver.caltech.edu/CaltechPARADISE:2009.ETR098
Authors: Cassuto, Yuval; Bruck, Jehoshua
Year: 2009
RC (Random/Clustered) codes are a new efficient array-code family for recovering from 4-erasures. RC codes correct most 4-erasures, and essentially all 4-erasures that are clustered. Clustered erasures are introduced as a new erasure model for storage arrays. This model draws its motivation from correlated device failures, which are caused by physical proximity of devices, or by age proximity of endurance-limited solid-state drives. The reliability of storage arrays that employ RC codes is analyzed and compared to known codes. The new RC code is significantly more efficient, in all practical implementation factors, than the best known 4-erasure correcting MDS code. These factors include: small-write update-complexity, full-device update-complexity, decoding complexity and number of supported devices in the array.https://authors.library.caltech.edu/records/90s7z-gkx15Cyclic Boolean circuits
https://resolver.caltech.edu/CaltechPARADISE:2009.ETR099
Authors: Riedel, Marc D.; Bruck, Jehoshua
Year: 2009
A Boolean circuit is a collection of gates and wires that performs a mapping from Boolean inputs to Boolean outputs. The accepted wisdom is that such circuits must have acyclic (i.e., loop-free or feed-forward) topologies. In fact, the model is often defined this way – as a directed acyclic graph (DAG). And yet simple examples suggest that this is incorrect. We advocate that Boolean circuits should have cyclic topologies (i.e., loops or feedback paths). In other work, we demonstrated the practical implications of this view: digital circuits can be designed with fewer gates if they contain cycles. In this paper, we explore the theoretical underpinnings of the idea. We show that the complexity of implementing Boolean functions can be lower with cyclic topologies than with acyclic topologies. With examples, we show that certain Boolean functions can be implemented by cyclic circuits with as little as one-half the number of gates that are required by equivalent acyclic circuits.https://authors.library.caltech.edu/records/xbgvp-n7v04Partial Rank Modulation for Flash Memories
https://resolver.caltech.edu/CaltechAUTHORS:20110331-130545474
Authors: Wang, Zhiying; Bruck, Jehoshua
Year: 2010
DOI: 10.1109/ISIT.2010.5513597
Rank modulation was recently proposed as an information representation for multilevel flash memories, using permutations or ranks of n flash cells. The current decoding process finds the cell with the i-th highest charge level at iteration i, for i = 1, 2,...,n - 1. Motivated by the need to reduce the number of such iterations, we consider k-partial permutations, where only the highest k cell levels are considered for information representation. We propose a generalization of Gray codes for k-partial permutations such that information is updated efficiently.https://authors.library.caltech.edu/records/dzswn-2ar83Generalizing the Blum-Elias Method for Generating Random Bits from Markov Chains
https://resolver.caltech.edu/CaltechAUTHORS:20110331-095348080
Authors: Zhou, Hongchao; Bruck, Jehoshua
Year: 2010
DOI: 10.1109/ISIT.2010.5513679
The problem of random number generation from an uncorrelated random source (of unknown probability distribution) dates back to von Neumann's 1951 work. Elias (1972) generalized von Neumann's scheme and showed how to achieve optimal efficiency in unbiased random bits generation. Hence, a natural question is what if the sources are correlated? Both Elias and Samuelson proposed methods for generating unbiased random bits in the case of correlated sources (of unknown probability distribution), specifically, they considered finite Markov chains. However, their proposed methods are not efficient (Samuelson) or have implementation difficulties (Elias). Blum (1986) devised an algorithm for efficiently generating random bits from degree-2 finite Markov chains in expected linear time, however, his beautiful method is still far from optimality. In this paper, we generalize Blum's algorithm to arbitrary degree finite Markov chains and combine it with Elias's method for efficient generation of unbiased bits. As a result, we provide the first known algorithm that generates unbiased random bits from an arbitrary finite Markov chain, operates in expected linear time and achieves the information-theoretic upper bound on efficiency.https://authors.library.caltech.edu/records/wwjn6-psy10Data Movement and Aggregation in Flash Memories
https://resolver.caltech.edu/CaltechPARADISE:2010.ETR100
Authors: Jiang, Anxiao (Andrew); Langberg, Michael; Mateescu, Robert; Bruck, Jehoshua
Year: 2010
NAND flash memories have become the most widely used type of non-volatile memories. In a NAND flash memory, every block of memory cells consists of numerous pages, and rewriting a single page requires the whole block to be erased. As block erasures significantly reduce the longevity, speed and power efficiency of flash memories, it is critical to minimize the number of erasures when data are reorganized. This leads to the data movement problem, where data need to be switched in blocks, and the objective is to minimize the number of block erasures. It has been shown that optimal solutions can be obtained by coding. However, coding-based algorithms with the minimum coding complexity still remain an important topic to study.
In this paper, we present a very efficient data movement algorithm with coding over GF(2) and with the minimum storage requirement. We also study data movement with more auxiliary blocks and present its corresponding solution. Furthermore, we extend the study to the data aggregation problem, where data can not only be moved but also aggregated. We present both non-coding and coding-based solutions, and rigorously prove the performance gain by using coding.https://authors.library.caltech.edu/records/nemx7-1bs28On the Synthesis of Stochastic Flow Networks
https://resolver.caltech.edu/CaltechPARADISE:2010.ETR101
Authors: Zhou, Hongchao; Chen, Ho-Lin; Bruck, Jehoshua
Year: 2010
DOI: 10.48550/arXiv.1209.0724
A stochastic flow network is a directed graph with incoming edges (inputs) and outgoing edges (outputs); tokens enter through the input edges, travel stochastically in the network and can exit the network through the output edges. Each node in the network is a splitter, namely, a token can enter a node through an incoming edge and exit on one of the output edges according to a predefined probability distribution. We address the following synthesis question: Given a finite set of possible splitters and an arbitrary rational probability distribution, design a stochastic flow network, such that every token that enters the input edge will exit the outputs with the prescribed probability distribution.
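A quick way to see the role of feedback discussed in this abstract: with two unbiased splitters and one feedback edge, a token exits at one output with probability exactly 1/3, which no feedback-free network of fair splitters can realize (only dyadic probabilities a/2^n). This toy network and Monte Carlo simulation are my own illustration, not the paper's construction:

```python
import random

def token_exit(rng):
    """Toy stochastic flow network with two fair splitters and feedback.
    Node A: heads -> exit at output 0, tails -> go to node B.
    Node B: heads -> exit at output 1, tails -> feedback edge back to A.
    Solving p_A = (1/2) * p_B and p_B = 1/2 + (1/2) * p_A for the
    probability of exiting at output 1 gives p_A = 1/3."""
    node = "A"
    while True:
        heads = rng.random() < 0.5   # one unbiased splitter decision
        if node == "A":
            if heads:
                return 0             # exit at output 0
            node = "B"
        else:
            if heads:
                return 1             # exit at output 1
            node = "A"               # feedback edge

# Monte Carlo estimate of P(exit at output 1); expected value is 1/3
rng = random.Random(0)
freq = sum(token_exit(rng) for _ in range(200000)) / 200000
```

Without the feedback edge from B back to A, any finite tree of fair splitters yields only probabilities of the form a/2^n; the cycle is what makes 1/3 reachable.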
The problem of probability synthesis dates back to von Neumann's 1951 work and was followed, among others, by Knuth and Yao in 1976, who demonstrated that arbitrary rational probabilities can be generated with tree networks, where minimizing the expected path length, the expected number of coin tosses in their paradigm, is the key consideration. Motivated by the synthesis of stochastic DNA based molecular systems, we focus on designing optimal size stochastic flow networks (the size of a network is the number of splitters). We assume that each splitter has two outgoing edges and is unbiased (probability 1/2 per output edge). We show that an arbitrary rational probability a/b with a ≤ b ≤ 2^n can be realized by a stochastic flow network of size n; we also show that this is optimal. We note that our stochastic flow networks have feedback (cycles in the network), in fact, we demonstrate that feedback improves the expressibility of stochastic flow networks, since without feedback only probabilities of the form a/2^n (a an integer) can be realized.https://authors.library.caltech.edu/records/tmtgy-rqv10On the Capacity of the Precision-Resolution System
https://resolver.caltech.edu/CaltechAUTHORS:20100407-095238680
Authors: Schwartz, Moshe; Bruck, Jehoshua
Year: 2010
DOI: 10.1109/TIT.2009.2039089
Arguably, the most prominent constrained system in storage applications is the (d,k)-run-length limited (RLL) system, where every binary sequence obeys the constraint that every two adjacent 1's are separated by at least d consecutive 0's and at most k consecutive 0's, namely, runs of 0's are length limited. The motivation for the RLL constraint arises mainly from the physical limitations of the read and write technologies in magnetic and optical storage systems. We revisit the rationale for the RLL system, reevaluate its relationship to the constraints of the physical media and propose a new framework that we call the Precision-Resolution (PR) system. Specifically, in the PR system there is a separation between the encoder constraints (which relate to the precision of writing information into the physical media) and the decoder constraints (which relate to its resolution, namely, the ability to distinguish between two different signals received by reading the physical media). We compute the capacity of a general PR system and compare it to the traditional RLL system.https://authors.library.caltech.edu/records/92wrr-4jm54Codes for Asymmetric Limited-Magnitude Errors With Application to Multilevel Flash Memories
https://resolver.caltech.edu/CaltechAUTHORS:20100415-101346040
Authors: Cassuto, Yuval; Schwartz, Moshe; Bohossian, Vasken; Bruck, Jehoshua
Year: 2010
DOI: 10.1109/TIT.2010.2040971
Several physical effects that limit the reliability and performance of multilevel flash memories induce errors that have low magnitudes and are dominantly asymmetric. This paper studies block codes for asymmetric limited-magnitude errors over q-ary channels. We propose code constructions and bounds for such channels when the number of errors is bounded by t and the error magnitudes are bounded by ℓ. The constructions utilize known codes for symmetric errors, over small alphabets, to protect large-alphabet symbols from asymmetric limited-magnitude errors. The encoding and decoding of these codes are performed over the small alphabet whose size depends only on the maximum error magnitude and is independent of the alphabet size of the outer code. Moreover, the size of the codes is shown to exceed the sizes of known codes (for related error models), and asymptotic rate-optimality results are proved. Extensions of the construction are proposed to accommodate variations on the error model and to include systematic codes as a benefit to practical implementation.https://authors.library.caltech.edu/records/rj4a5-1yq88Correcting Charge-Constrained Errors in the Rank-Modulation Scheme
https://resolver.caltech.edu/CaltechAUTHORS:20100617-092653362
Authors: Jiang, Anxiao (Andrew); Bruck, Jehoshua; Schwartz, Moshe
Year: 2010
DOI: 10.1109/TIT.2010.2043764
We investigate error-correcting codes for the rank-modulation scheme with an application to flash memory devices. In this scheme, a set of n cells stores information in the permutation induced by the different charge levels of the individual cells. The resulting scheme eliminates the need for discrete cell levels, overcomes overshoot errors when programming cells (a serious problem that reduces the writing speed), and mitigates the problem of asymmetric errors. In this paper, we study the properties of error-correcting codes for charge-constrained errors in the rank-modulation scheme. In this error model the number of errors corresponds to the minimal number of adjacent transpositions required to change a given stored permutation to another erroneous one – a distance measure known as Kendall's τ-distance. We show bounds on the size of such codes, and use metric-embedding techniques to give constructions which translate a wealth of knowledge of codes in the Lee metric to codes over permutations in Kendall's τ-metric. Specifically, the one-error-correcting codes we construct are at least half the ball-packing upper bound.https://authors.library.caltech.edu/records/dqy7b-w7b59A Modular Voting Architecture ("Frog Voting")
https://resolver.caltech.edu/CaltechAUTHORS:20100805-151724246
Authors: Bruck, Shuki; Jefferson, David; Rivest, Ronald L.
Year: 2010
DOI: 10.1007/978-3-642-12980-3_5
This paper presents a new framework (a reference architecture) for voting that we feel has many attractive features. It is not a machine design, but rather a framework that will stimulate innovation and design. It is potentially the standard architecture for all future voting equipment. The ideas expressed here are subject to improvement and further research. (An early version of this paper appeared in [2, Part III]. This version of the paper is very similar, but contains a postscript (Section 8) providing commentary and discussion of perspectives on this proposal generated during the intervening years between 2001 and 2008.)https://authors.library.caltech.edu/records/0v6sy-g8e79Data movement and aggregation in flash memories
https://resolver.caltech.edu/CaltechAUTHORS:20170309-135756699
Authors: Jiang, Anxiao (Andrew); Langberg, Michael; Mateescu, Robert; Bruck, Jehoshua
Year: 2010
DOI: 10.1109/ISIT.2010.5513391
NAND flash memories have become the most widely used type of non-volatile memories. In a NAND flash memory, every block of memory cells consists of numerous pages, and rewriting a single page requires the whole block to be erased. As block erasures significantly reduce the longevity, speed and power efficiency of flash memories, it is critical to minimize the number of erasures when data are reorganized. This leads to the data movement problem, where data need to be switched in blocks, and the objective is to minimize the number of block erasures. It has been shown that optimal solutions can be obtained by coding. However, coding-based algorithms with the minimum coding complexity still remain an important topic to study.
In this paper, we present a very efficient data movement algorithm with coding over GF(2) and with the minimum storage requirement. We also study data movement with more auxiliary blocks and present its corresponding solution. Furthermore, we extend the study to the data aggregation problem, where data can not only be moved but also aggregated. We present both non-coding and coding-based solutions, and rigorously prove the performance gain by using coding.https://authors.library.caltech.edu/records/kbrtc-fah11Efficiently Generating Random Bits from Finite State Markov Chains
https://resolver.caltech.edu/CaltechPARADISE:2010.ETR102
Authors: Zhou, Hongchao; Bruck, Jehoshua
Year: 2010
The problem of random number generation from an uncorrelated random source (of unknown probability distribution) dates back to von Neumann's 1951 work. Elias (1972) generalized von Neumann's scheme and showed how to achieve optimal efficiency in unbiased random bits generation. Hence, a natural question is what if the sources are correlated? Both Elias and Samuelson proposed methods for generating unbiased random bits in the case of correlated sources (of unknown probability distribution), specifically, they considered finite Markov chains. However, their proposed methods are not efficient or have implementation difficulties. Blum (1986) devised an algorithm for efficiently generating random bits from degree-2 finite Markov chains in expected linear time, however, his beautiful method is still far from optimality on information-efficiency. In this paper, we generalize Blum's algorithm to arbitrary degree finite Markov chains and combine it with Elias's method for efficient generation of unbiased bits. As a result, we provide the first known algorithm that generates unbiased random bits from an arbitrary finite Markov chain, operates in expected linear time and achieves the information-theoretic upper bound on efficiency.https://authors.library.caltech.edu/records/07mcb-e2741On the Synthesis of Stochastic Flow Networks
https://resolver.caltech.edu/CaltechAUTHORS:20110331-132532031
Authors: Zhou, Hongchao; Chen, Ho-Lin; Bruck, Jehoshua
Year: 2010
DOI: 10.1109/ISIT.2010.5513754
A stochastic flow network is a directed graph with incoming edges (inputs) and outgoing edges (outputs); tokens enter through the input edges, travel stochastically in the network, and can exit the network through the output edges. Each node in the network is a splitter, namely, a token can enter a node through an incoming edge and exit on one of the output edges according to a predefined probability distribution. We address the following synthesis question: Given a finite set of possible splitters and an arbitrary rational probability distribution, design a stochastic flow network such that every token that enters the input edge will exit the outputs with the prescribed probability distribution. The problem of probability synthesis dates back to von Neumann's 1951 work and was followed, among others, by Knuth and Yao in 1976, who demonstrated that arbitrary rational probabilities can be generated with tree networks, where minimizing the expected path length (the expected number of coin tosses in their paradigm) is the key consideration. Motivated by the synthesis of stochastic DNA based molecular systems, we focus on designing optimal-sized stochastic flow networks (the size of a network is the number of splitters). We assume that each splitter has two outgoing edges and is unbiased (probability 1/2 per output edge). We show that an arbitrary rational probability a/b with a ≤ b ≤ 2^n can be realized by a stochastic flow network of size n; we also show that this is optimal. We note that our stochastic flow networks have feedback (cycles in the network); in fact, we demonstrate that feedback improves the expressibility of stochastic flow networks, since without feedback only probabilities of the form a/2^n (a an integer) can be realized.https://authors.library.caltech.edu/records/hccfy-ah066Rebuilding for Array Codes in Distributed Storage Systems
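The role of feedback described in this abstract can be seen in a tiny simulation (our own toy network, not a construction from the paper): with two unbiased splitters and a feedback edge, a token exits a designated output with probability exactly 1/3, which no feedback-free network of unbiased splitters can realize.

```python
import random

def token_exit(rng):
    """Simulate one token through a toy 2-splitter network with a
    feedback edge that realizes probability 1/3. Two unbiased splitters
    route the token; outcome (0,0) sends it back to the input."""
    while True:
        a, b = rng.randint(0, 1), rng.randint(0, 1)
        if (a, b) == (0, 0):
            continue  # feedback edge: the token re-enters the network
        # The three remaining outcomes are equally likely, so one of
        # them (here (0,1)) marks the probability-1/3 output.
        return 1 if (a, b) == (0, 1) else 0

rng = random.Random(7)
n = 90000
hits = sum(token_exit(rng) for _ in range(n))
print(hits / n)  # close to 1/3
```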
https://resolver.caltech.edu/CaltechPARADISE:2010.ETR103
Authors: Wang, Zhiying; Dimakis, Alexandros G.; Bruck, Jehoshua
Year: 2010
In distributed storage systems that use coding, the issue of minimizing the communication required to rebuild a storage node after a failure arises. We consider the problem of repairing an erased node in a distributed storage system that uses an EVENODD code. EVENODD codes are maximum distance separable (MDS) array codes that are used to protect against erasures, and only require XOR operations for encoding and decoding. We show that when there are two redundancy nodes, to rebuild one erased systematic node, only 3/4 of the information needs to be transmitted. Interestingly, in many cases, the required disk I/O is also minimized.https://authors.library.caltech.edu/records/h4jxr-xyf21Storage Coding for Wear Leveling in Flash Memories
https://resolver.caltech.edu/CaltechAUTHORS:20170309-140500073
Authors: Jiang, Anxiao (Andrew); Mateescu, Robert; Yaakobi, Eitan; Bruck, Jehoshua; Siegel, Paul H.; Vardy, Alexander; Wolf, Jack K.
Year: 2010
DOI: 10.1109/TIT.2010.2059833
Flash memory is a nonvolatile computer memory comprised of blocks of cells, wherein each cell is implemented as either a NAND or NOR floating gate. NAND flash is currently the most widely used type of flash memory. In a NAND flash memory, every block of cells consists of numerous pages; rewriting even a single page requires the whole block to be erased and reprogrammed. Block erasures determine both the longevity and the efficiency of a flash memory. Therefore, when data in a NAND flash memory are reorganized, minimizing the total number of block erasures required to achieve the desired data movement is an important goal. This leads to the flash data movement problem studied in this paper. We show that coding can significantly reduce the number of block erasures required for data movement, and present several optimal or nearly optimal data-movement algorithms based upon ideas from coding theory and combinatorics. In particular, we show that the sorting-based (noncoding) schemes require O(n log n) erasures to move data among n blocks, whereas coding-based schemes require only O(n) erasures. Furthermore, coding-based schemes use only one auxiliary block, which is the best possible, and achieve a good balance between the number of erasures in each of the n+1 blocks.https://authors.library.caltech.edu/records/cfs8k-bx907Rewriting Codes for Joint Information Storage in Flash Memories
https://resolver.caltech.edu/CaltechAUTHORS:20101108-153955478
Authors: Jiang, Anxiao; Bruck, Jehoshua
Year: 2010
DOI: 10.1109/TIT.2010.2059530
Memories whose storage cells transit irreversibly between states have been common since the start of data storage technology. In recent years, flash memories have become a very important family of such memories. A flash memory cell has q states, 0, 1, …, q−1, and can only transit from a lower state to a higher state before the expensive erasure operation takes place. We study rewriting codes that enable the data stored in a group of cells to be rewritten by only shifting the cells to higher states. Since the considered state transitions are irreversible, the number of rewrites is bounded. Our objective is to maximize the number of times the data can be rewritten. We focus on the joint storage of data in flash memories, and study two rewriting codes for two different scenarios. The first code, called floating code, is for the joint storage of multiple variables, where every rewrite changes one variable. The second code, called buffer code, is for remembering the most recent data in a data stream. Many of the codes presented here are either optimal or asymptotically optimal. We also present bounds on the performance of general codes. The results show that rewriting codes can integrate a flash memory's rewriting capabilities for different variables to a high degree.https://authors.library.caltech.edu/records/zy6p3-5qb43On a construction for constant-weight Gray codes for local rank modulation
https://resolver.caltech.edu/CaltechAUTHORS:20170309-151553099
Authors: En Gad, Eyal; Langberg, Michael; Schwartz, Moshe; Bruck, Jehoshua
Year: 2010
DOI: 10.1109/EEEI.2010.5661923
We consider the local rank-modulation scheme in which a sliding window going over a sequence of real-valued variables induces a sequence of permutations. Local rank-modulation, a generalization of the rank-modulation scheme, has been recently suggested as a way of storing information in flash memory. We study constant-weight Gray codes for the local rank-modulation scheme in order to simulate conventional multilevel flash cells while retaining the benefits of rank modulation. We describe a construction for codes of rate tending to 1.https://authors.library.caltech.edu/records/hc713-y6h87Rebuilding for Array Codes in Distributed Storage Systems
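The sliding-window demodulation described in these local rank-modulation abstracts can be sketched directly (our own illustrative reading of the scheme, not code from the papers): each window of cell levels induces the permutation that ranks its entries.

```python
def local_rank_modulation(levels, window):
    """For each length-`window` sliding window over the real-valued cell
    levels, compute the induced permutation of ranks (0 = smallest)."""
    perms = []
    for i in range(len(levels) - window + 1):
        w = levels[i:i + window]
        order = sorted(range(window), key=lambda j: w[j])  # indices by value
        rank = [0] * window
        for r, j in enumerate(order):
            rank[j] = r
        perms.append(tuple(rank))
    return perms

print(local_rank_modulation([0.3, 1.2, 0.7, 0.1], window=3))
# [(0, 2, 1), (2, 1, 0)]
```

Note how adjacent windows share most of their cells, which is exactly the overlap parameter the generalized constructions have to handle.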
https://resolver.caltech.edu/CaltechAUTHORS:20110707-082718436
Authors: Wang, Zhiying; Dimakis, Alexandros G.; Bruck, Jehoshua
Year: 2010
DOI: 10.1109/GLOCOMW.2010.5700274
In distributed storage systems that use coding, the issue of minimizing the communication required to rebuild a storage node after a failure arises. We consider the problem of repairing an erased node in a distributed storage system that uses an EVENODD code. EVENODD codes are maximum distance separable (MDS) array codes that are used to protect against erasures, and only require XOR operations for encoding and decoding. We show that when there are two redundancy nodes, to rebuild one erased systematic node, only 3/4 of the information needs to be transmitted. Interestingly, in many cases, the required disk I/O is also minimized.https://authors.library.caltech.edu/records/fd7qm-72d07Constant-Weight Gray Codes for Local Rank Modulation
https://resolver.caltech.edu/CaltechPARADISE:2010.ETR105
Authors: En Gad, Eyal; Langberg, Michael; Schwartz, Moshe; Bruck, Jehoshua
Year: 2010
We consider the local rank-modulation scheme in which a sliding window going over a sequence of real-valued variables induces a sequence of permutations. Local rank-modulation is a generalization of the rank-modulation scheme, which has been recently suggested as a way of storing information in flash memory.
We study constant-weight Gray codes for the local rank-modulation scheme in order to simulate conventional multi-level flash cells while retaining the benefits of rank modulation. We provide necessary conditions for the existence of cyclic and cyclic optimal Gray codes. We then specifically study codes of weight 2 and upper bound their efficiency, thus proving that there are no such asymptotically-optimal cyclic codes. In contrast, we study codes of weight 3 and efficiently construct codes which are asymptotically-optimal. We conclude with a construction of codes with asymptotically-optimal rate and weight asymptotically half the length, thus having an asymptotically-optimal charge difference between adjacent cells.https://authors.library.caltech.edu/records/vhtjs-aqs77Trajectory Codes for Flash Memory
https://resolver.caltech.edu/CaltechPARADISE:2010.ETR104
Authors: Jiang, Anxiao (Andrew); Langberg, Michael; Schwartz, Moshe; Bruck, Jehoshua
Year: 2011
DOI: 10.48550/arXiv.1012.5430
Flash memory is well-known for its inherent asymmetry: the flash-cell charge levels are easy to increase but are hard to decrease. In a general rewriting model, the stored data changes its value with certain patterns. The patterns of data updates are determined by the data structure and the application, and are independent of the constraints imposed by the storage medium. Thus, an appropriate coding scheme is needed so that the data changes can be updated and stored efficiently under the storage-medium's constraints.
In this paper, we define the general rewriting problem using a graph model. It extends many known rewriting models such as floating codes, WOM codes, buffer codes, etc. We present a new rewriting scheme for flash memories, called the trajectory code, for rewriting the stored data as many times as possible without block erasures. We prove that the trajectory code is asymptotically optimal in a wide range of scenarios.
We also present randomized rewriting codes optimized for expected performance (given arbitrary rewriting sequences). Our rewriting codes are shown to be asymptotically optimal.https://authors.library.caltech.edu/records/j1e3x-bk329Generating Probability Distributions using Multivalued Stochastic Relay Circuits
https://resolver.caltech.edu/CaltechPARADISE:2011.ETR106
Authors: Lee, David; Bruck, Jehoshua
Year: 2011
DOI: 10.48550/arXiv.1102.1441
The problem of random number generation dates back to von Neumann's work in 1951. Since then, many algorithms have been developed for generating unbiased bits from complex correlated sources as well as for generating arbitrary distributions from unbiased bits. An equally interesting, but less studied, aspect is the structural component of random number generation as opposed to the algorithmic aspect. That is, given a network structure imposed by nature or physical devices, how can we build networks that generate arbitrary probability distributions in an optimal way?
In this paper, we study the generation of arbitrary probability distributions in multivalued relay circuits, a generalization in which relays can take on any of N states and the logical 'and' and 'or' are replaced with 'min' and 'max', respectively. Previous work was done on two-state relays. We generalize these results, describing a duality property and networks that generate arbitrary rational probability distributions. We prove that these networks are robust to errors and design a universal probability generator which takes input bits and outputs arbitrary binary probability distributions.https://authors.library.caltech.edu/records/ez3f7-4en24Generalized Gray Codes for Local Rank Modulation
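The 'min'/'max' semantics of multivalued relays can be made concrete with a small exact computation (our own illustration, not a construction from the paper): enumerating state combinations gives the output distribution of a series ('min') or parallel ('max') composition of independent relays.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def network_distribution(op, dists):
    """Exact output distribution of independent multivalued relays
    combined with `op`: 'min' for series (logical 'and') or 'max' for
    parallel (logical 'or'), by enumerating all state combinations.
    Each relay is given as a dict {state: probability}."""
    f = min if op == 'min' else max
    out = Counter()
    for states in product(*[d.keys() for d in dists]):
        p = Fraction(1)
        for d, s in zip(dists, states):
            p *= d[s]
        out[f(states)] += p
    return dict(out)

u3 = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}
# min of two uniform 3-state relays: P = 5/9, 1/3, 1/9 for states 0, 1, 2
print(network_distribution('min', [u3, u3]))
```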
https://resolver.caltech.edu/CaltechPARADISE:2011.ETR107
Authors: En Gad, Eyal; Langberg, Michael; Schwartz, Moshe; Bruck, Jehoshua
Year: 2011
DOI: 10.48550/arXiv.1103.0317
We consider the local rank-modulation scheme in which a sliding window going over a sequence of real-valued variables induces a sequence of permutations. Local rank-modulation is a generalization of the rank-modulation scheme, which has been recently suggested as a way of storing information in flash memory.
We study Gray codes for the local rank-modulation scheme in order to simulate conventional multi-level flash cells while retaining the benefits of rank modulation. Unlike the limited scope of previous works, we consider code constructions for the entire range of parameters including the code length, sliding window size, and overlap between adjacent windows. We show our constructed codes have asymptotically-optimal rate. We also provide efficient encoding, decoding, and next-state algorithms.https://authors.library.caltech.edu/records/wdvya-sq235Compressed Encoding for Rank Modulation
https://resolver.caltech.edu/CaltechPARADISE:2011.ETR108
Authors: En Gad, Eyal; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2011
DOI: 10.48550/arXiv.1108.2741
Rank modulation has been recently proposed as a scheme for storing information in flash memories. While rank modulation has advantages in improving write speed and endurance, the current encoding approach is based on the "push to the top" operation that is not efficient in the general case. We propose a new encoding procedure where a cell level is raised to be higher than the minimal necessary subset (instead of all) of the other cell levels. This new procedure leads to a significantly more compressed (lower charge levels) encoding. We derive an upper bound for a family of codes that utilize the proposed encoding procedure, and consider code constructions that achieve that bound for several special cases.https://authors.library.caltech.edu/records/6r95m-v4e81MDS Array Codes with Optimal Rebuilding
https://resolver.caltech.edu/CaltechPARADISE:2011.ETR110
Authors: Tamo, Itzhak; Wang, Zhiying; Bruck, Jehoshua
Year: 2011
DOI: 10.48550/arXiv.1103.3737
MDS array codes are widely used in storage systems to protect data against erasures. We address the rebuilding ratio problem, namely, in the case of erasures, what is the fraction of the remaining information that needs to be accessed in order to rebuild exactly the lost information? It is clear that when the number of erasures equals the maximum number of erasures that an MDS code can correct then the rebuilding ratio is 1 (access all the remaining information). However, the interesting (and more practical) case is when the number of erasures is smaller than the erasure correcting capability of the code. For example, consider an MDS code that can correct two erasures: What is the smallest amount of information that one needs to access in order to correct a single erasure? Previous work showed that the rebuilding ratio is bounded between 1/2 and 3/4; however, the exact value was left as an open problem. In this paper, we solve this open problem and prove that for the case of a single erasure with a 2-erasure correcting code, the rebuilding ratio is 1/2. In general, we construct a new family of r-erasure correcting MDS array codes that has optimal rebuilding ratio of 1/r in the case of a single erasure. Our array codes have efficient encoding and decoding algorithms (for the case r = 2 they use a finite field of size 3) and an optimal update property.https://authors.library.caltech.edu/records/vp0h3-kfs88Patterned cells for phase change memories
https://resolver.caltech.edu/CaltechAUTHORS:20170213-160905267
Authors: Jiang, Anxiao (Andrew); Zhou, Hongchao; Wang, Zhiying; Bruck, Jehoshua
Year: 2011
DOI: 10.1109/ISIT.2011.6033979
Phase-change memory (PCM) is an emerging nonvolatile memory technology that promises very high performance. It currently uses discrete cell levels to represent data, controlled by a single amorphous/crystalline domain in a cell. To improve data density, more levels per cell are needed. There exist a number of challenges, including cell programming noise, drifting of cell levels, and the high power requirement for cell programming. In this paper, we present a new cell structure called patterned cell, and explore its data representation schemes. Multiple domains per cell are used, and their connectivity is used to store data. We analyze its storage capacity, and study its error-correction capability and the construction of error-control codes.https://authors.library.caltech.edu/records/dpzzx-7bf24Nonuniform Codes for Correcting Asymmetric Errors
https://resolver.caltech.edu/CaltechAUTHORS:20120406-093123448
Authors: Zhou, Hongchao; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2011
DOI: 10.1109/ISIT.2011.6033689
Codes that correct asymmetric errors have important applications in storage systems, including optical disks and read-only memories. The construction of asymmetric error-correcting codes is a topic that was studied extensively; however, the existing approach for code construction assumes that every codeword can sustain t asymmetric errors. Our main observation is that in contrast to symmetric errors, where the error probability of a codeword is context independent (since the error probability for 1s and 0s is identical), asymmetric errors are context dependent. For example, the all-1 codeword has a higher error probability than the all-0 codeword (since the only errors are 1 → 0). We call the existing codes uniform codes, while we focus on the notion of nonuniform codes, namely, codes whose codewords can tolerate different numbers of asymmetric errors depending on their Hamming weights. The goal of nonuniform codes is to guarantee the reliability of every codeword, which is important in data storage, where one must be able to retrieve whatever was written. We prove an almost explicit upper bound on the size of nonuniform asymmetric error correcting codes and present two general constructions. We also study the rate of nonuniform codes compared to uniform codes and show that there is a potential performance gain.https://authors.library.caltech.edu/records/bk5h1-qa850MDS Array Codes with Optimal Rebuilding
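The context dependence noted in this abstract is easy to quantify under a simple i.i.d. model (our own illustration, with a hypothetical helper `asym_failure_prob`): if each 1 flips to 0 independently with probability p and 0s never err, the chance of exceeding t errors grows with the codeword's Hamming weight.

```python
from math import comb

def asym_failure_prob(w, p, t):
    """Probability that a codeword of Hamming weight w suffers more than
    t asymmetric 1->0 errors, when each 1 flips independently with
    probability p and 0s are never in error. Illustrative model only."""
    return sum(comb(w, k) * p**k * (1 - p)**(w - k) for k in range(t + 1, w + 1))

p, t = 0.05, 2
print(asym_failure_prob(15, p, t))  # all-1 codeword of length 15: positive
print(asym_failure_prob(0, p, t))   # all-0 codeword: 0, no 1->0 error possible
```

This is the asymmetry that motivates letting the error-tolerance t vary with codeword weight rather than fixing it uniformly.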
https://resolver.caltech.edu/CaltechAUTHORS:20120406-093959188
Authors: Tamo, Itzhak; Wang, Zhiying; Bruck, Jehoshua
Year: 2011
DOI: 10.1109/ISIT.2011.6033733
MDS array codes are widely used in storage systems to protect data against erasures. We address the rebuilding ratio problem, namely, in the case of erasures, what is the fraction of the remaining information that needs to be accessed in order to rebuild exactly the lost information? It is clear that when the number of erasures equals the maximum number of erasures that an MDS code can correct then the rebuilding ratio is 1 (access all the remaining information). However, the interesting (and more practical) case is when the number of erasures is smaller than the erasure correcting capability of the code. For example, consider an MDS code that can correct two erasures: What is the smallest amount of information that one needs to access in order to correct a single erasure? Previous work showed that the rebuilding ratio is bounded between 1/2 and 3/4; however, the exact value was left as an open problem. In this paper, we solve this open problem and prove that for the case of a single erasure with a 2-erasure correcting code, the rebuilding ratio is 1/2. In general, we construct a new family of r-erasure correcting MDS array codes that has optimal rebuilding ratio of 1/r in the case of a single erasure. Our array codes have efficient encoding and decoding algorithms (for the case r = 2 they use a finite field of size 3) and an optimal update property.https://authors.library.caltech.edu/records/2rrs6-0d438Generalized Gray Codes for Local Rank Modulation
https://resolver.caltech.edu/CaltechAUTHORS:20120405-102509210
Authors: En Gad, Eyal; Langberg, Michael; Schwartz, Moshe; Bruck, Jehoshua
Year: 2011
DOI: 10.1109/ISIT.2011.6034262
We consider the local rank-modulation scheme in which a sliding window going over a sequence of real-valued variables induces a sequence of permutations. Local rank-modulation is a generalization of the rank-modulation scheme, which has been recently suggested as a way of storing information in flash memory. We study Gray codes for the local rank-modulation scheme in order to simulate conventional multi-level flash cells while retaining the benefits of rank modulation. Unlike the limited scope of previous works, we consider code constructions for the entire range of parameters including the code length, sliding window size, and overlap between adjacent windows. We show our constructed codes have asymptotically-optimal rate. We also provide efficient encoding, decoding, and next-state algorithms.https://authors.library.caltech.edu/records/e70ns-vgr95Error-Correcting Schemes with Dynamic Thresholds in Nonvolatile Memories
https://resolver.caltech.edu/CaltechAUTHORS:20120406-094817699
Authors: Zhou, Hongchao; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2011
DOI: 10.1109/ISIT.2011.6033936
Predetermined fixed thresholds are commonly used in nonvolatile memories for reading binary sequences, but they usually result in significant asymmetric errors after a long duration, due to voltage or resistance drift. This motivates us to construct error-correcting schemes with dynamic reading thresholds, so that the asymmetric component of errors is minimized. In this paper, we discuss how to select dynamic reading thresholds without knowing cell level distributions, and present several error-correcting schemes. Analysis based on Gaussian noise models reveals that bit error probabilities can be significantly reduced by using dynamic thresholds instead of fixed thresholds, hence leading to a higher information rate.https://authors.library.caltech.edu/records/mct4d-gk305Compressed Encoding for Rank Modulation
https://resolver.caltech.edu/CaltechAUTHORS:20120405-104551517
Authors: En Gad, Eyal; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2011
DOI: 10.1109/ISIT.2011.6034264
Rank modulation has been recently proposed as a scheme for storing information in flash memories. While rank modulation has advantages in improving write speed and endurance, the current encoding approach is based on the "push to the top" operation that is not efficient in the general case. We propose a new encoding procedure where a cell level is raised to be higher than the minimal necessary subset (instead of all) of the other cell levels. This new procedure leads to a significantly more compressed (lower charge levels) encoding. We derive an upper bound for a family of codes that utilize the proposed encoding procedure, and consider code constructions that achieve that bound for several special cases.https://authors.library.caltech.edu/records/6snc1-5vh55On Codes for Optimal Rebuilding Access
https://resolver.caltech.edu/CaltechPARADISE:2011.ETR111
Authors: Wang, Zhiying; Tamo, Itzhak; Bruck, Jehoshua
Year: 2011
DOI: 10.48550/arXiv.1107.1627
MDS (maximum distance separable) array codes are widely used in storage systems due to their computationally efficient encoding and decoding procedures. An MDS code with r redundancy nodes can correct any r erasures by accessing (reading) all the remaining information in both the systematic nodes and the parity (redundancy) nodes. However, in practice, a single erasure is the most likely failure event; hence, a natural question is how much information do we need to access in order to rebuild a single storage node? We define the rebuilding ratio as the fraction of remaining information accessed during the rebuilding of a single erasure. In our previous work we showed that the optimal rebuilding ratio of 1/r is achievable (using our newly constructed array codes) for the rebuilding of any systematic node; however, all the information needs to be accessed for the rebuilding of the parity nodes. Namely, constructing array codes with a rebuilding ratio of 1/r was left as an open problem. In this paper, we solve this open problem and present array codes that achieve the lower bound of 1/r for rebuilding any single systematic or parity node.https://authors.library.caltech.edu/records/vz336-bdm65Systematic Error-Correcting Codes for Rank Modulation
https://resolver.caltech.edu/CaltechPARADISE:2011.ETR112
Authors: Zhou, Hongchao; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2011
DOI: 10.48550/arXiv.1310.6817
The rank modulation scheme has been proposed recently for efficiently writing and storing data in nonvolatile memories. Error-correcting codes are very important for rank modulation; however, existing results have been limited.
In this work, we explore a new approach, systematic error-correcting codes for rank modulation. Systematic codes have the benefits of enabling efficient information retrieval and potentially supporting more efficient encoding and decoding procedures. We study systematic codes for rank modulation equipped with the Kendall τ-distance. We present (k + 2, k) systematic codes for correcting one error, which have optimal rates unless perfect codes exist. We also study the design of multi-error-correcting codes, and prove that for any 2 ≤ k < n, there always exists an (n, k) systematic code of minimum distance n − k. Furthermore, we prove that for rank modulation, systematic codes achieve the same capacity as general error-correcting codes.https://authors.library.caltech.edu/records/xp404-dac89Neural network computation with DNA strand displacement cascades
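The Kendall τ-distance used in this line of work counts the pairs ordered differently by two permutations, which equals the minimum number of adjacent transpositions between them; a direct computation (our own illustration, not code from the paper) is:

```python
from itertools import combinations

def kendall_tau(p, q):
    """Kendall tau distance between two permutations of the same set:
    the number of element pairs ordered differently in p and q
    (equivalently, the minimum number of adjacent transpositions
    turning p into q)."""
    pos_p = {v: i for i, v in enumerate(p)}
    pos_q = {v: i for i, v in enumerate(q)}
    return sum(
        1
        for a, b in combinations(p, 2)
        if (pos_p[a] - pos_p[b]) * (pos_q[a] - pos_q[b]) < 0
    )

print(kendall_tau((0, 1, 2, 3), (1, 0, 2, 3)))  # 1: one adjacent swap
print(kendall_tau((0, 1, 2, 3), (3, 2, 1, 0)))  # 6: all C(4,2) pairs inverted
```

A minimum distance of d under this metric tolerates up to ⌊(d−1)/2⌋ adjacent-transposition errors, the error model relevant to drifting charge levels.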
https://resolver.caltech.edu/CaltechAUTHORS:20110801-112437228
Authors: Qian, Lulu; Winfree, Erik; Bruck, Jehoshua
Year: 2011
DOI: 10.1038/nature10262
The impressive capabilities of the mammalian brain—ranging from perception, pattern recognition and memory formation to decision making and motor activity control—have inspired their re-creation in a wide range of artificial intelligence systems for applications such as face recognition, anomaly detection, medical diagnosis and robotic vehicle control. Yet before neuron-based brains evolved, complex biomolecular circuits provided individual cells with the 'intelligent' behaviour required for survival. However, the study of how molecules can 'think' has not produced an equal variety of computational models and applications of artificial chemical systems. Although biomolecular systems have been hypothesized to carry out neural-network-like computations in vivo and the synthesis of artificial chemical analogues has been proposed theoretically, experimental work has so far fallen short of fully implementing even a single neuron. Here, building on the richness of DNA computing and strand displacement circuitry, we show how molecular systems can exhibit autonomous brain-like behaviours. Using a simple DNA gate architecture that allows experimental scale-up of multilayer digital circuits, we systematically transform arbitrary linear threshold circuits (an artificial neural network model) into DNA strand displacement cascades that function as small neural networks. Our approach even allows us to implement a Hopfield associative memory with four fully connected artificial neurons that, after training in silico, remembers four single-stranded DNA patterns and recalls the most similar one when presented with an incomplete pattern. Our results suggest that DNA strand displacement cascades could be used to endow autonomous chemical systems with the capability of recognizing patterns of molecular events, making decisions and responding to the environment.https://authors.library.caltech.edu/records/fryqk-9d474Linear extractors for extracting randomness from noisy sources
https://resolver.caltech.edu/CaltechAUTHORS:20120330-134402245
Authors: Zhou, Hongchao; Bruck, Jehoshua
Year: 2011
DOI: 10.1109/ISIT.2011.6033845
Linear transformations have many applications in information theory, such as data compression and the design of error-correcting codes. In this paper, we study the power of linear transformations in randomness extraction, namely linear extractors, as another important application. Compared to most existing methods for randomness extraction, linear extractors (especially those constructed with sparse matrices) are computationally fast and can be simply implemented with hardware like FPGAs, which makes them very attractive in practical use. We mainly focus on simple, efficient and sparse constructions of linear extractors. Specifically, we demonstrate that random matrices can generate random bits very efficiently from a variety of noisy sources, including noisy coin sources, bit-fixing sources, noisy (hidden) Markov sources, as well as their mixtures. It shows that low-density random matrices have almost the same efficiency as high-density random matrices when the input sequence is long, which provides a way to simplify hardware/software implementation. Note that although we construct the matrices with randomness, they are deterministic (seedless) extractors: once constructed, the same construction can be used any number of times without using any seeds. Another way to construct linear extractors is based on generator matrices of primitive BCH codes. This method is more explicit, but less practical due to its computational complexity and dimensional constraints.https://authors.library.caltech.edu/records/2p13s-m0316Transforming Probabilities With Combinational Logic
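A linear extractor in this sense is just y = Ax over GF(2) for a fixed 0/1 matrix A. The toy sketch below (our own illustration; the paper's constructions use carefully chosen sparse random or BCH generator matrices) shows the seedless usage: A is generated once and then reused for every input block.

```python
import random

def linear_extract(matrix, bits):
    """Apply a fixed 0/1 matrix to an input bit vector over GF(2):
    each output bit is the parity of the input bits selected by a row."""
    return [sum(a & x for a, x in zip(row, bits)) % 2 for row in matrix]

rng = random.Random(0)
n_in, n_out = 256, 64
# Built once with randomness, then fixed: a deterministic (seedless) extractor.
A = [[rng.randint(0, 1) for _ in range(n_in)] for _ in range(n_out)]

# A heavily biased source: i.i.d. bits with P(1) = 0.9.
source = [1 if rng.random() < 0.9 else 0 for _ in range(n_in)]
y = linear_extract(A, source)
print(len(y), sum(y))  # 64 output bits, roughly half of them ones
```

Each output bit is the parity of about 128 biased bits, and the bias of such a parity decays exponentially in the number of terms, which is why the output looks nearly uniform.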
https://resolver.caltech.edu/CaltechAUTHORS:20110922-104607230
Authors: Qian, Weikang; Riedel, Marc D.; Zhou, Hongchao; Bruck, Jehoshua
Year: 2011
DOI: 10.1109/TCAD.2011.2144630
Schemes for probabilistic computation can exploit physical sources to generate random values in the form of bit streams. Generally, each source has a fixed bias and so provides bits with a specific probability of being one. If many different probability values are required, it can be expensive to generate all of these directly from physical sources. This paper demonstrates novel techniques for synthesizing combinational logic that transforms source probabilities into different target probabilities. We consider three scenarios in terms of whether the source probabilities are specified and whether they can be duplicated. In the case that the source probabilities are not specified and can be duplicated, we provide a specific choice, the set {0.4, 0.5}; we show how to synthesize logic that transforms probabilities from this set into arbitrary decimal probabilities. Further, we show that for any integer n ≥ 2, there exists a single probability that can be transformed into arbitrary base-n fractional probabilities. In the case that the source probabilities are specified and cannot be duplicated, we provide two methods for synthesizing logic to transform them into target probabilities. In the case that the source probabilities are not specified, but once chosen cannot be duplicated, we provide an optimal choice.https://authors.library.caltech.edu/records/x1f81-z2p41Constant-Weight Gray Codes for Local Rank Modulation
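The basic mechanism behind such probability transformations can be checked empirically (our own minimal sketch, not a circuit from the paper): AND of independent bit streams multiplies their probabilities of being one, and NOT complements a probability.

```python
import random

rng = random.Random(3)
n = 200000

# Two independent source streams with P(1) = 0.4 and P(1) = 0.5
# (the source set {0.4, 0.5} highlighted in the abstract).
bits_a = [rng.random() < 0.4 for _ in range(n)]
bits_b = [rng.random() < 0.5 for _ in range(n)]

and_out = [a & b for a, b in zip(bits_a, bits_b)]  # AND: P(1) = 0.4 * 0.5 = 0.2
not_out = [1 - x for x in and_out]                 # NOT: P(1) = 1 - 0.2 = 0.8

p_and = sum(and_out) / n
p_not = sum(not_out) / n
print(round(p_and, 2), round(p_not, 2))  # close to 0.2 and 0.8
```

Composing such gates is what lets a small source set reach many target probabilities.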
https://resolver.caltech.edu/CaltechAUTHORS:20120420-092834437
Authors: En Gad, Eyal; Langberg, Michael; Schwartz, Moshe; Bruck, Jehoshua
Year: 2011
DOI: 10.1109/TIT.2011.2162570
We consider the local rank-modulation (LRM) scheme in which a sliding window going over a sequence of real-valued variables induces a sequence of permutations. LRM is a generalization of the rank-modulation scheme, which has been recently suggested as a way of storing information in flash memory. We study constant-weight Gray codes for the LRM scheme in order to simulate conventional multilevel flash cells while retaining the benefits of rank modulation. We present a practical construction of codes with asymptotically-optimal rate and weight asymptotically half the length, thus having an asymptotically-optimal charge difference between adjacent cells. Next, we turn to examine the existence of optimal codes by specifically studying codes of weight 2 and 3. In the former case, we upper bound the code efficiency, proving that there are no such asymptotically-optimal cyclic codes. In contrast, for the latter case we construct codes which are asymptotically-optimal. We conclude by providing necessary conditions for the existence of cyclic and cyclic optimal Gray codes.
https://authors.library.caltech.edu/records/xa8fe-w9a19
Low-Complexity Array Codes for Random and Clustered 4-Erasures
https://resolver.caltech.edu/CaltechAUTHORS:20120203-154015264
Authors: Cassuto, Yuval; Bruck, Jehoshua
Year: 2012
DOI: 10.1109/TIT.2011.2171518
A new family of low-complexity array codes is proposed for correcting 4 column erasures. The new codes are tailored for the new error model of clustered column erasures that captures the properties of high-order failure combinations in storage arrays. The model of clustered column erasures considers the number of erased columns, together with the number of clusters into which they fall, without pre-defining the sizes of the clusters. This model addresses the problem of correlated device failures in storage arrays, whereby each failure event may affect multiple devices in a single cluster. The new codes correct essentially all combinations of clustered 4 erasures, i.e., those combinations that fall into three or fewer clusters. The new codes are significantly more efficient, in all relevant complexity measures, than the best known 4-erasure correcting codes. These measures include encoding complexity, decoding complexity and update complexity.
https://authors.library.caltech.edu/records/3xrrc-q9d41
On the Capacity and Programming of Flash Memories
https://resolver.caltech.edu/CaltechAUTHORS:20120326-082904019
Authors: Jiang, Anxiao (Andrew); Li, Hao; Bruck, Jehoshua
Year: 2012
DOI: 10.1109/TIT.2011.2177755
Flash memories are currently the most widely used type of nonvolatile memories. A flash memory consists of floating-gate cells as its storage elements, where the charge level stored in a cell is used to represent data. Compared to magnetic recording and optical recording, flash memories have the unique property that the cells are programmed using an iterative procedure that monotonically shifts each cell's charge level upward toward its target value. In this paper, we model the cell as a monotonic storage channel, and explore its capacity and optimal programming. We present two optimal programming algorithms based on a few different noise models and optimization objectives.
https://authors.library.caltech.edu/records/m53em-ky777
Efficient Generation of Random Bits From Finite State Markov Chains
https://resolver.caltech.edu/CaltechAUTHORS:20120503-095551724
Authors: Zhou, Hongchao; Bruck, Jehoshua
Year: 2012
DOI: 10.1109/TIT.2011.2175698
The problem of random number generation from an uncorrelated random source (of unknown probability distribution) dates back to von Neumann's 1951 work. Elias (1972) generalized von Neumann's scheme and showed how to achieve optimal efficiency in unbiased random bit generation. A natural question is: what if the sources are correlated? Both Elias and Samuelson proposed methods for generating unbiased random bits from correlated sources (of unknown probability distribution); specifically, they considered finite Markov chains. However, their proposed methods are either inefficient or have implementation difficulties. Blum (1986) devised an algorithm for efficiently generating random bits from degree-2 finite Markov chains in expected linear time; however, his elegant method is still far from optimal in information efficiency. In this paper, we generalize Blum's algorithm to arbitrary-degree finite Markov chains and combine it with Elias's method for efficient generation of unbiased bits. As a result, we provide the first known algorithm that generates unbiased random bits from an arbitrary finite Markov chain, operates in expected linear time, and achieves the information-theoretic upper bound on efficiency.
https://authors.library.caltech.edu/records/hnf87-01v85
Variable-level Cells for Nonvolatile Memories
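The starting point of the entry above, von Neumann's 1951 scheme, is simple enough to show directly: pairs of i.i.d. biased bits are scanned, unequal pairs emit one unbiased bit, equal pairs are discarded. (The paper's contribution — handling Markov chains efficiently — needs the Blum/Elias machinery and is not captured by this sketch.)

```python
import random

def von_neumann(bits):
    """von Neumann (1951): scan an i.i.d. biased bit stream in pairs;
    emit the first bit of each unequal pair (01 -> 0, 10 -> 1) and
    discard 00 and 11. P(01) = P(10), so the output is unbiased."""
    return [b0 for b0, b1 in zip(bits[0::2], bits[1::2]) if b0 != b1]

rng = random.Random(42)
biased = [1 if rng.random() < 0.8 else 0 for _ in range(20000)]
unbiased = von_neumann(biased)
print(len(unbiased), sum(unbiased) / len(unbiased))  # bias close to 1/2
```

The scheme's inefficiency (it keeps only 2p(1−p) of the pairs) is exactly what Elias's generalization, and the paper's extension to Markov chains, improves on.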
https://resolver.caltech.edu/CaltechAUTHORS:20120502-125850834
Authors: Jiang, Anxiao (Andrew); Zhou, Hongchao; Bruck, Jehoshua
Year: 2012
For many nonvolatile memories, including flash memories and phase-change memories, maximizing the storage capacity is a key challenge. The existing method is to use multi-level cells (MLC) with more and more levels. The number of levels supported by MLC is seriously constrained by the worst-case cell-programming noise and cell heterogeneity. In this paper, we present variable-level cells (VLC), a new scheme for maximizing storage capacity. It adaptively chooses the number of levels and the placement of the levels based on the actual programming performance. We derive its storage capacity, and present an optimal data representation scheme. We also study rewriting schemes for VLC, and present inner and outer bounds on its capacity region.
https://authors.library.caltech.edu/records/n50yx-act24
Patterned Cells for Phase Change Memories
https://resolver.caltech.edu/CaltechAUTHORS:20120502-130441311
Authors: Jiang, Anxiao (Andrew); Zhou, Hongchao; Wang, Zhiying; Bruck, Jehoshua
Year: 2012
Phase-change memory (PCM) is an emerging nonvolatile memory technology that promises very high performance. It currently uses discrete cell levels to represent data, controlled by a single amorphous/crystalline domain in a cell. To improve data density, more levels per cell are needed. There exist a number of challenges, including cell programming noise, drifting of cell levels, and the high power requirement for cell programming. In this paper, we present a new cell structure called patterned cell, and explore its data representation schemes. Multiple domains per cell are used, and their connectivity is used to store data. We analyze its storage capacity, and study its error-correction capability and the construction of error-control codes.
https://authors.library.caltech.edu/records/6n5zz-gsa64
Trade-offs between Instantaneous and Total Capacity in Multi-Cell Flash Memories
https://resolver.caltech.edu/CaltechAUTHORS:20120516-141137208
Authors: En Gad, Eyal; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2012
The limited endurance of flash memories is a major design concern for enterprise storage systems. We propose a method to increase it by using relative (as opposed to fixed) cell levels and by representing the information with Write Asymmetric Memory (WAM) codes. Overall, our new method enables faster writes, improved reliability as well as improved endurance by allowing multiple writes between block erasures. We study the capacity of the new WAM codes with relative levels, where the information is represented by multiset permutations induced by the charge levels, and show that it achieves the capacity of any other WAM codes with the same number of writes. Specifically, we prove that it has the potential to double the total capacity of the memory. Since capacity can be achieved only with cells that have a large number of levels, we propose a new architecture that consists of multi-cells, each an aggregation of a number of floating-gate transistors.
https://authors.library.caltech.edu/records/gxf0c-h9x26
Long MDS Codes for Optimal Repair Bandwidth
https://resolver.caltech.edu/CaltechAUTHORS:20120616-221646611
Authors: Wang, Zhiying; Tamo, Itzhak; Bruck, Jehoshua
Year: 2012
MDS codes are erasure-correcting codes that can correct the maximum number of erasures given the number of redundancy or parity symbols. If an MDS code has r parities and no more than r erasures occur, then by transmitting all the remaining data in the code one can recover the original information. However, it was shown that in order to recover a single symbol erasure, only a fraction of 1/r of the information needs to be transmitted. This fraction is called the repair bandwidth (fraction). Explicit code constructions were given in previous works. If we view each symbol in the code as a vector or a column, then the code forms a 2D array; such codes are especially widely used in storage systems. In this paper, we ask the following question: given the length of the column l, can we construct high-rate MDS array codes with optimal repair bandwidth of 1/r, whose code length is as long as possible? We give code constructions such that the code length is (r + 1)log_r l.
https://authors.library.caltech.edu/records/xt2bt-jdt16
Variable-Length Extractors
https://resolver.caltech.edu/CaltechAUTHORS:20120828-165227181
Authors: Zhou, Hongchao; Bruck, Jehoshua
Year: 2012
DOI: 10.1109/ISIT.2012.6283024
We study the problem of extracting a prescribed number of random bits by reading the smallest possible number of symbols from non-ideal stochastic processes. The related interval algorithm proposed by Han and Hoshi has asymptotically optimal performance; however, it assumes that the distribution of the input stochastic process is known. The motivation for our work is the fact that, in practice, sources of randomness have inherent correlations and are affected by measurement noise; namely, it is hard to obtain an accurate estimate of the distribution. This challenge was addressed by the concepts of seeded and seedless extractors, which can handle general random sources with unknown distributions. However, known seeded and seedless extractors provide extraction efficiencies that are substantially smaller than Shannon's entropy limit. Our main contribution is the design of extractors that have a variable input length and a fixed output length, are efficient in the consumption of symbols from the source, are capable of generating random bits from general stochastic processes, and approach the information-theoretic upper bound on efficiency.
https://authors.library.caltech.edu/records/sd3m9-v3h14
Trade-offs between Instantaneous and Total Capacity in Multi-Cell Flash Memories
https://resolver.caltech.edu/CaltechAUTHORS:20120828-151832251
Authors: En Gad, Eyal; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2012
DOI: 10.1109/ISIT.2012.6284712
The limited endurance of flash memories is a major design concern for enterprise storage systems. We propose a method to increase it by using relative (as opposed to fixed) cell levels and by representing the information with Write Asymmetric Memory (WAM) codes. Overall, our new method enables faster writes, improved reliability as well as improved endurance by allowing multiple writes between block erasures. We study the capacity of the new WAM codes with relative levels, where the information is represented by multiset permutations induced by the charge levels, and show that it achieves the capacity of any other WAM codes with the same number of writes. Specifically, we prove that it has the potential to double the total capacity of the memory. Since capacity can be achieved only with cells that have a large number of levels, we propose a new architecture that consists of multi-cells — each an aggregation of a number of floating gate transistors.
https://authors.library.caltech.edu/records/n5h8x-1m320
Systematic Error-Correcting Codes for Rank Modulation
https://resolver.caltech.edu/CaltechAUTHORS:20120828-151501177
Authors: Zhou, Hongchao; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2012
DOI: 10.1109/ISIT.2012.6284106
The rank modulation scheme has been proposed recently for efficiently writing and storing data in nonvolatile memories. Error-correcting codes are very important for rank modulation, and they have attracted interest among researchers. In this work, we explore a new approach, systematic error-correcting codes for rank modulation. In an (n,k) systematic code, we use the permutation induced by the levels of n cells to store data, and the permutation induced by the first k cells (k < n) has a one-to-one mapping to information bits. Systematic codes have the benefits of enabling efficient information retrieval and potentially supporting more efficient encoding and decoding procedures. We study systematic codes for rank modulation equipped with Kendall's τ-distance. We present (k + 2, k) systematic codes for correcting one error, which have optimal sizes unless perfect codes exist. We also study the design of multi-error-correcting codes, and prove that for any 2 ≤ k < n, there always exists an (n, k) systematic code of minimum distance n − k. Furthermore, we prove that for rank modulation, systematic codes achieve the same capacity as general error-correcting codes.
https://authors.library.caltech.edu/records/92gz1-39x98
On the Uncertainty of Information Retrieval in Associative Memories
https://resolver.caltech.edu/CaltechAUTHORS:20120828-144523977
Authors: Yaakobi, Eitan; Bruck, Jehoshua
Year: 2012
DOI: 10.1109/ISIT.2012.6283016
We (people) are memory machines. Our decision processes, emotions and interactions with the world around us are based on and driven by associations to our memories. This natural association paradigm will become critical in future memory systems, namely, the key question will not be "How do I store more information?" but rather, "Do I have the relevant information? How do I retrieve it?" The focus of this paper is to make a first step in this direction. We define and solve a very basic problem in associative retrieval. Given a word W, the words in the memory that are t-associated with W are the words in the ball of radius t around W. In general, given a set of words, say W, X and Y, the words that are t-associated with {W, X, Y} are those in the memory that are within distance t from all three words. Our main goal is to study the maximum size of the t-associated set as a function of the number of input words and the minimum distance of the words in memory; we call this value the uncertainty of an associative memory. We derive the uncertainty of the associative memory that consists of all the binary vectors with an arbitrary number of input words. In addition, we study the retrieval problem, namely, how do we get the t-associated set given the inputs? We note that this paradigm is a generalization of the sequence reconstruction problem that was proposed by Levenshtein (2001). In this model, a word is transmitted over multiple channels. A decoder receives all the channel outputs and decodes the transmitted word. Levenshtein computed the minimum number of channels that guarantee a successful decoder; this value happens to be the uncertainty of an associative memory with two input words.
https://authors.library.caltech.edu/records/bw7jd-3r945
Long MDS Codes for Optimal Repair Bandwidth
https://resolver.caltech.edu/CaltechAUTHORS:20120829-103740126
Authors: Wang, Zhiying; Tamo, Itzhak; Bruck, Jehoshua
Year: 2012
DOI: 10.1109/ISIT.2012.6283041
MDS codes are erasure-correcting codes that can correct the maximum number of erasures given the number of redundancy or parity symbols. If an MDS code has r parities and no more than r erasures occur, then by transmitting all the remaining data in the code one can recover the original information. However, it was shown that in order to recover a single symbol erasure, only a fraction of 1/r of the information needs to be transmitted. This fraction is called the repair bandwidth (fraction). Explicit code constructions were given in previous works. If we view each symbol in the code as a vector or a column, then the code forms a 2D array; such codes are especially widely used in storage systems. In this paper, we ask the following question: given the length of the column l, can we construct high-rate MDS array codes with optimal repair bandwidth of 1/r, whose code length is as long as possible? We give code constructions such that the code length is (r + 1)log_r l.
https://authors.library.caltech.edu/records/hgwxz-m6n08
Decoding of Cyclic Codes over Symbol-Pair Read Channels
https://resolver.caltech.edu/CaltechAUTHORS:20120828-151322448
Authors: Yaakobi, Eitan; Bruck, Jehoshua; Siegel, Paul H.
Year: 2012
DOI: 10.1109/ISIT.2012.6284053
Symbol-pair read channels, in which the outputs of the read process are pairs of consecutive symbols, were recently studied by Cassuto and Blaum. This new paradigm is motivated by the limitations of the reading process in high density data storage systems. They studied error correction in this new paradigm, specifically, the relationship between the minimum Hamming distance of an error correcting code and the minimum pair distance, which is the minimum Hamming distance between symbol-pair vectors derived from codewords of the code. It was proved that for a linear cyclic code with minimum Hamming distance d_H, the corresponding minimum pair distance is at least d_H + 3. Our main contribution is proving that, for a given linear cyclic code with a minimum Hamming distance d_H, the minimum pair distance is at least d_H + ⌈d_H/2⌉. We also describe decoding algorithms, based upon bounded-distance decoders for the cyclic code, whose pair-symbol error-correcting capabilities reflect the larger minimum pair distance. In addition, we consider the case where a read channel output is a prescribed number, b > 2, of consecutive symbols and provide some generalizations of our results. We note that the symbol-pair read channel problem is a special case of the sequence reconstruction problem that was introduced by Levenshtein.
https://authors.library.caltech.edu/records/7pmwy-6c076
Access vs. Bandwidth in Codes for Storage
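The pair-distance notion above is easy to compute for a small cyclic code. The sketch below (our illustration, using the binary [7,4] Hamming code as an example cyclic code) checks the bound d_p ≥ d_H + ⌈d_H/2⌉ by exhaustion.

```python
from itertools import product
from math import ceil

def pair_weight(c):
    """Symbol-pair weight: positions i where the cyclic read pair
    (c_i, c_{i+1}) is nonzero."""
    n = len(c)
    return sum(1 for i in range(n) if c[i] or c[(i + 1) % n])

# Binary cyclic [7,4] Hamming code with generator g(x) = 1 + x + x^3.
g = [1, 1, 0, 1, 0, 0, 0]

def encode(msg):
    """Multiply the message polynomial by g(x) modulo x^7 - 1."""
    c = [0] * 7
    for i, m in enumerate(msg):
        if m:
            for j, gj in enumerate(g):
                c[(i + j) % 7] ^= gj
    return c

codewords = [encode(m) for m in product([0, 1], repeat=4)]
d_H = min(sum(c) for c in codewords if any(c))
d_p = min(pair_weight(c) for c in codewords if any(c))
print(d_H, d_p, d_p >= d_H + ceil(d_H / 2))
```

For this code d_H = 3 and the minimum pair distance works out to 5 = d_H + ⌈d_H/2⌉, meeting the bound with equality.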
https://resolver.caltech.edu/CaltechAUTHORS:20120829-092120549
Authors: Tamo, Itzhak; Wang, Zhiying; Bruck, Jehoshua
Year: 2012
DOI: 10.1109/ISIT.2012.6283042
Maximum distance separable (MDS) codes are widely used in storage systems to protect against disk (node) failures. An (n, k, l) MDS code uses n nodes of capacity l to store k information nodes. The MDS property guarantees resiliency to any n − k node failures. An optimal-bandwidth (resp. optimal-access) MDS code communicates (resp. accesses) the minimum amount of data during the recovery process of a single failed node. It was shown that this amount equals a fraction of 1/(n − k) of the data stored in each node. In previous optimal-bandwidth constructions, l scaled polynomially with k in codes with asymptotic rate < 1. Moreover, in constructions with a constant number of parities, i.e., rate approaching 1, l scaled exponentially w.r.t. k. In this paper we focus on the practical case of n − k = 2, and ask the following question: given the capacity of a node l, what is the largest (w.r.t. k) optimal-bandwidth (resp. optimal-access) (k + 2, k, l) MDS code? We give an upper bound for the general case, and two tight bounds in the special cases of two important families of codes.
https://authors.library.caltech.edu/records/8ne9e-4q567
Cyclic Boolean circuits
https://resolver.caltech.edu/CaltechAUTHORS:20121109-103154708
Authors: Riedel, Marc D.; Bruck, Jehoshua
Year: 2012
DOI: 10.1016/j.dam.2012.03.039
A Boolean circuit is a collection of gates and wires that performs a mapping from Boolean inputs to Boolean outputs. The accepted wisdom is that such circuits must have acyclic (i.e., loop-free or feed-forward) topologies. In fact, the model is often defined this way, as a directed acyclic graph (DAG). And yet simple examples suggest that this is incorrect. We advocate that Boolean circuits should have cyclic topologies (i.e., loops or feedback paths). In other work, we demonstrated the practical implications of this view: digital circuits can be designed with fewer gates if they contain cycles. In this paper, we explore the theoretical underpinnings of the idea. We show that the complexity of implementing Boolean functions can be lower with cyclic topologies than with acyclic topologies. With examples, we show that certain Boolean functions can be implemented by cyclic circuits with as few as one-half the number of gates that are required by equivalent acyclic circuits. We also show a quadratic upper bound: given a cyclic Boolean circuit with m gates, there exists an equivalent acyclic Boolean circuit with m^2 gates.
https://authors.library.caltech.edu/records/hkdp1-4se80
Content-assisted file decoding for nonvolatile memories
https://resolver.caltech.edu/CaltechAUTHORS:20170207-175141968
Authors: Li, Yue; Wang, Yue; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2012
DOI: 10.1109/ACSSC.2012.6489154
Nonvolatile memories (NVMs) such as flash memories play a significant role in meeting the data storage requirements of today's computation activities. The rapid increase of storage density for NVMs, however, brings reliability issues due to the closer alignment of adjacent cells on chip and the larger number of levels programmed into a cell. We propose a new method for error correction, which uses the random access capability of NVMs and the redundancy that inherently exists in information content. Although it is theoretically possible to remove the redundancy via data compression, existing source-coding algorithms do not remove all of it, for reasons of computational efficiency. We propose a method that can be combined with existing storage solutions for text files, namely content-assisted decoding. Using the statistical properties of words and phrases in the text of a given language, our decoder identifies the location of each subcodeword representing some word in a given noisy input codeword, and flips bits to compute a most likely word sequence. The decoder can be adapted to work together with traditional ECC decoders to keep the number of errors within the correction capability of traditional decoders. The combined decoding framework is evaluated with a set of benchmark files.
https://authors.library.caltech.edu/records/53g4r-0wb20
Sequence Reconstruction for Grassmann Graphs and Permutations
https://resolver.caltech.edu/CaltechAUTHORS:20130215-095250632
Authors: Yaakobi, Eitan; Schwartz, Moshe; Langberg, Michael; Bruck, Jehoshua
Year: 2013
The sequence-reconstruction problem was first proposed by Levenshtein in 2001. This problem studies the model where the same word is transmitted over multiple channels. If the transmitted word belongs to some code of minimum distance d and there are at most r errors in every channel, then the minimum number of channels that guarantees a successful decoder (under the assumption that all channel outputs are distinct) has to be greater than the largest intersection of two balls of radius r whose centers are at distance at least d. This paper studies the combinatorial problem of computing the largest intersection of two balls for two cases. In the first part we solve this problem in the Grassmann graph for all values of d and r. In the second part we derive similar results for permutations under Kendall's τ-metric for some special cases of d and r.
https://authors.library.caltech.edu/records/447km-4es02
Rank-Modulation Rewriting Codes for Flash Memories
https://resolver.caltech.edu/CaltechAUTHORS:20130128-144020108
Authors: En Gad, Eyal; Yaakobi, Eitan; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2013
Current flash memory technology is focused on minimizing the cost of the stored capacity. However, the resulting approach supports a relatively small number of write-erase cycles. This technology is effective for consumer devices (smartphones and cameras), where the number of write-erase cycles is small; however, it is not economical for enterprise storage systems that require a large number of lifetime writes. Our proposed approach for alleviating this problem consists of the efficient integration of two key ideas: (i) improving reliability and endurance by representing the information using relative values via the rank modulation scheme, and (ii) increasing the overall (lifetime) capacity of the flash device via rewriting codes, namely, performing multiple writes per cell before erasure. We propose a new scheme that combines rank modulation with rewriting. The key benefits of the new scheme include: (i) the ability to store close to 2 bits per cell on each write, and to rewrite the memory close to q times, where q is the number of levels in each cell, and (ii) efficient encoding and decoding algorithms that use the recently proposed polar WOM codes.
https://authors.library.caltech.edu/records/h9m1f-ak728
Codes for Network Switches
https://resolver.caltech.edu/CaltechAUTHORS:20130128-153803180
Authors: Wang, Zhiying; Shaked, Omer; Cassuto, Yuval; Bruck, Jehoshua
Year: 2013
A network switch routes data packets between its multiple input and output ports. Packets from input ports are stored upon arrival in a switch fabric comprising multiple memory banks. This can cause memory contention when distinct output ports request packets from the same memory bank, degrading the switching bandwidth. To solve this problem, we propose to add redundant memory banks for storing the incoming packets. The problem we address is how to minimize the number of redundant memory banks given some guaranteed contention-resolution capability. We present constructions of new switch memory architectures based on different coding techniques. The codes allow decreasing the redundancy by 1/2 or 2/3, depending on the request specifications, compared to non-coding solutions.
https://authors.library.caltech.edu/records/7ct46-a3j12
Information-Theoretic Study of Voting Systems
https://resolver.caltech.edu/CaltechAUTHORS:20130215-092855327
Authors: Yaakobi, Eitan; Langberg, Michael; Bruck, Jehoshua
Year: 2013
The typical paradigm in voting theory involves n voters and m candidates. Every voter ranks the candidates, resulting in a permutation of the m candidates. A key problem is to derive the aggregate result of the voting. A popular method for vote aggregation is based on the Condorcet criterion. The Condorcet winner is the candidate who beats every other candidate by pairwise majority. However, the main disadvantage of this approach, known as the Condorcet paradox, is that such a winner does not necessarily exist, since this criterion does not admit transitivity. This paradox is mathematically likely (if voters assign rankings uniformly at random, then with probability approaching one as the number of candidates grows, there will be no Condorcet winner); however, in real-life scenarios such as elections, the Condorcet paradox is rarely encountered. In this paper we attempt to improve our intuition regarding the gap between the mathematics and reality of voting systems. We study a special case where there is global intransitivity between all candidates. We introduce tools from information theory and derive an entropy-based characterization of global intransitivity. In addition, we tighten this characterization by assuming that votes tend to be similar; in particular, they can be modeled as permutations that are confined to a sphere defined by Kendall's τ distance.
https://authors.library.caltech.edu/records/ebd4t-esq24
In-Memory Computing of Akers Logic Array
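The Condorcet criterion and its paradox, discussed in the voting entry above, can be checked mechanically from a ranking profile. The sketch below is a generic illustration of the criterion, not a method from the paper.

```python
def condorcet_winner(rankings):
    """Return the candidate who beats every other candidate by
    pairwise majority, or None when no such candidate exists
    (the Condorcet paradox)."""
    candidates = set(rankings[0])

    def beats(a, b):
        # a beats b if a strict majority of voters rank a above b
        wins = sum(1 for r in rankings if r.index(a) < r.index(b))
        return wins > len(rankings) / 2

    for a in candidates:
        if all(beats(a, b) for b in candidates if b != a):
            return a
    return None

# A transitive profile: A beats both B and C pairwise.
print(condorcet_winner([["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]))
# The classic paradox: A>B>C, B>C>A, C>A>B has no winner.
print(condorcet_winner([["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]))
```

The second profile is the three-candidate cycle that makes pairwise majority intransitive, which is the global-intransitivity setting the paper studies.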
https://resolver.caltech.edu/CaltechAUTHORS:20130215-092157295
Authors: Yaakobi, Eitan; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2013
This work studies memories from a different perspective: the goal is to explore the concept of in-memory computing. Our point of departure is an old study of logic arrays by Akers in 1972. We demonstrate how these arrays can simultaneously store information and perform logic operations. We first extend the structure of these arrays to non-binary alphabets. We then show how a special structure of these arrays can both store elements and output a sorted version of them. We also study other examples of the in-memory computing concept. In this setup, we show how information can be stored and computed with, and how the array can tolerate or detect errors in the stored data.
https://authors.library.caltech.edu/records/8g08j-vwq41
Error-Correcting Codes for Multipermutations
https://resolver.caltech.edu/CaltechAUTHORS:20130215-094523026
Authors: Buzaglo, Sarit; Yaakobi, Eitan; Etzion, Tuvi; Bruck, Jehoshua
Year: 2013
This paper is eligible for the student paper award. Multipermutations appear in various applications in information theory. New applications such as rank modulation for flash memories and voting have suggested the need to consider error-correcting codes for multipermutations. The construction of codes is challenging when permutations are considered, and it becomes an even harder problem for multipermutations. In this paper we discuss the general problem of error-correcting codes for multipermutations. We present some tight bounds on the size of error-correcting codes for several families of multipermutations. We find the capacity of the channels of multipermutations and characterize families of perfect codes in this metric which we believe are the only such perfect codes.
https://authors.library.caltech.edu/records/ft4vp-ka042
Building Consensus via Iterative Voting
https://resolver.caltech.edu/CaltechAUTHORS:20130215-093909657
Authors: Farnoud (Hassanzadeh), Farzad; Yaakobi, Eitan; Touri, Behrouz; Milenkovic, Olgica; Bruck, Jehoshua
Year: 2013
In networked systems comprised of many agents, it is often required to reach a common operating point of all agents, termed the network consensus. We consider two iterative methods for reaching a ranking (ordering) consensus over a voter network, where the initial preference of every voter is a full ordering of the candidates. The voters are allowed, one at a time and based on some random scheme, to change their vote to bring it "closer" to the opinions of selected subsets of peers. The first consensus method is based on changing a vote one adjacent swap at a time; the second method is based on changing a vote via averaging with the votes of peers, potentially leading to many adjacent swaps at a time. For the first model, we characterize convergence points and conditions for convergence. For the second model, we prove convergence to a global ranking and derive the rate of convergence to this consensus.
https://authors.library.caltech.edu/records/81c1h-c4n08
Zigzag Codes: MDS Array Codes With Optimal Rebuilding
https://resolver.caltech.edu/CaltechAUTHORS:20130321-102330661
Authors: Tamo, Itzhak; Wang, Zhiying; Bruck, Jehoshua
Year: 2013
DOI: 10.1109/TIT.2012.2227110
Maximum distance separable (MDS) array codes are widely used in storage systems to protect data against erasures. We address the rebuilding ratio problem, namely, in the case of erasures, what is the fraction of the remaining information that needs to be accessed in order to rebuild exactly the lost information? It is clear that when the number of erasures equals the maximum number of erasures that an MDS code can correct, then the rebuilding ratio is 1 (access all the remaining information). However, the interesting and more practical case is when the number of erasures is smaller than the erasure correcting capability of the code. For example, consider an MDS code that can correct two erasures: What is the smallest amount of information that one needs to access in order to correct a single erasure? Previous work showed that the rebuilding ratio is bounded between 1/2 and 3/4; however, the exact value was left as an open problem. In this paper, we solve this open problem and prove that for the case of a single erasure with a two-erasure correcting code, the rebuilding ratio is 1/2. In general, we construct a new family of r-erasure correcting MDS array codes that has optimal rebuilding ratio of 1/r in the case of a single erasure. Our array codes have efficient encoding and decoding algorithms (for the cases r=2 and r=3, they use finite fields of size 3 and 4, respectively) and an optimal update property.
https://authors.library.caltech.edu/records/c80wj-h6a11
On the Average Complexity of Reed–Solomon List Decoders
https://resolver.caltech.edu/CaltechAUTHORS:20130429-085506668
Authors: Cassuto, Yuval; Bruck, Jehoshua; McEliece, Robert J.
Year: 2013
DOI: 10.1109/TIT.2012.2235522
The number of monomials required to interpolate a received word in an algebraic list decoder for Reed-Solomon codes depends on the instantaneous channel error, and not only on the decoder design parameters. The implications of this fact are that the decoder should be able to exhibit lower decoding complexity for low-weight errors and, consequently, enjoy a better average-case decoding complexity and a higher decoding throughput. On the analytical side, this paper studies the dependence of interpolation costs on instantaneous errors, in both hard- and soft-decision decoders. On the algorithmic side, it provides an efficient interpolation algorithm, based on the state-of-the-art interpolation algorithm, that enjoys reduced running times for reduced interpolation costs.
https://authors.library.caltech.edu/records/z781c-c3a08
Nonuniform Codes for Correcting Asymmetric Errors in Data Storage
https://resolver.caltech.edu/CaltechAUTHORS:20130617-115747407
Authors: Zhou, Hongchao; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2013
DOI: 10.1109/TIT.2013.2241175
The construction of asymmetric error-correcting codes is a topic that has been studied extensively; however, the existing approach for code construction assumes that every codeword should tolerate t asymmetric errors. Our main observation is that in contrast to symmetric errors, asymmetric errors are content dependent. For example, in Z-channels, the all-1 codeword is prone to more errors than the all-0 codeword. This motivates us to develop nonuniform codes whose codewords can tolerate different numbers of asymmetric errors depending on their Hamming weights. The idea in the construction of nonuniform codes is to augment the redundancy in a content-dependent way and guarantee the worst-case reliability while maximizing the code size. In this paper, we first study nonuniform codes for Z-channels, which suffer only one type of error, namely 1 → 0. Specifically, we derive their upper bounds, analyze their asymptotic performance, and introduce two general constructions. Then, we extend the concept and results of nonuniform codes to general binary asymmetric channels, where the error probability for each bit from 0 to 1 is smaller than that from 1 to 0.
https://authors.library.caltech.edu/records/y34mf-tx768
Trajectory Codes for Flash Memory
https://resolver.caltech.edu/CaltechAUTHORS:20130826-103812140
Authors: Jiang, Anxiao (Andrew); Langberg, Michael; Schwartz, Moshe; Bruck, Jehoshua
Year: 2013
DOI: 10.1109/TIT.2013.2251755
A generalized rewriting model is defined for flash memory that represents stored data and permitted rewrite operations by a directed graph. This model is a generalization of previously introduced rewriting models of codes, including floating codes, write-once memory codes, and buffer codes. This model is used to design a new rewriting code for flash memories. The new code, referred to as trajectory code, allows stored data to be rewritten as many times as possible without block erasures. It is proved that the trajectory codes are asymptotically optimal for a wide range of scenarios. In addition, rewriting codes that use a randomized rewriting scheme are presented that obtain good performance with high probability for all possible rewrite sequences.
https://authors.library.caltech.edu/records/gt5x1-9xd26
Sequence reconstruction for Grassmann graphs and permutations
https://resolver.caltech.edu/CaltechAUTHORS:20170125-143159155
Authors: Yaakobi, Eitan; Schwartz, Moshe; Langberg, Michael; Bruck, Jehoshua
Year: 2013
DOI: 10.1109/ISIT.2013.6620351
The sequence-reconstruction problem was first proposed by Levenshtein in 2001. This problem studies the model where the same word is transmitted over multiple channels. If the transmitted word belongs to some code of minimum distance d and there are at most r errors in every channel, then the minimum number of channels that guarantees a successful decoder (under the assumption that all channel outputs are distinct) has to be greater than the largest intersection of two balls of radius r and with distance at least d between their centers.
This paper studies the combinatorial problem of computing the largest intersection of two balls for two cases. In the first part we solve this problem in the Grassmann graph for all values of d and r. In the second part we derive similar results for permutations under Kendall's τ-metric for some special cases of d and r.
https://authors.library.caltech.edu/records/qn4yr-2yw53
Generalized Gray Codes for Local Rank Modulation
https://resolver.caltech.edu/CaltechAUTHORS:20131017-105937816
Authors: En Gad, Eyal; Langberg, Michael; Schwartz, Moshe; Bruck, Jehoshua
Year: 2013
DOI: 10.1109/TIT.2013.2268534
We consider the local rank-modulation scheme, in which a sliding window going over a sequence of real-valued variables induces a sequence of permutations. Local rank-modulation is a generalization of the rank-modulation scheme, which has been recently suggested as a way of storing information in flash memory. We study Gray codes for the local rank-modulation scheme in order to simulate conventional multilevel flash cells while retaining the benefits of rank modulation. Unlike the limited scope of previous works, we consider code constructions for the entire range of parameters including the code length, sliding-window size, and overlap between adjacent windows. We show that the presented codes have asymptotically optimal rate. We also provide efficient encoding, decoding, and next-state algorithms.
https://authors.library.caltech.edu/records/gnz6e-9vx94
Approximate Sorting of Data Streams with Limited Storage
https://resolver.caltech.edu/CaltechAUTHORS:20141203-103210856
Authors: Farnoud (Hassanzadeh), Farzad; Yaakobi, Eitan; Bruck, Jehoshua
Year: 2014
DOI: 10.1007/978-3-319-08783-2_40
We consider the problem of approximate sorting of a data stream (in one pass) with limited internal storage where the goal is not to rearrange data but to output a permutation that reflects the ordering of the elements of the data stream as closely as possible. Our main objective is to study the relationship between the quality of the sorting and the amount of available storage. To measure quality, we use permutation distortion metrics, namely the Kendall tau and Chebyshev metrics, as well as mutual information, between the output permutation and the true ordering of data elements. We provide bounds on the performance of algorithms with limited storage and present a simple algorithm that asymptotically requires only a constant factor more storage than an optimal algorithm in terms of mutual information and average Kendall tau distortion.
https://authors.library.caltech.edu/records/k2hzs-wx406
The Capacity of String-Replication Systems
https://resolver.caltech.edu/CaltechAUTHORS:20140127-105959677
Authors: Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2014
DOI: 10.48550/arXiv.1401.4634
It is known that the majority of the human genome
consists of repeated sequences. Furthermore, it is believed that a significant part of the rest of the genome also originated from repeated sequences and has mutated to its current form. In this paper, we investigate the possibility of constructing an exponentially large number of sequences from a short initial sequence and simple replication rules, including those resembling genomic replication processes. In other words, our goal is to find out the capacity, or the expressive power, of these string-replication
systems. Our results include exact capacities, and
bounds on the capacities, of four fundamental string-replication systems.
https://authors.library.caltech.edu/records/gnct0-whj77
Rate-Distortion for Ranking with Incomplete Information
https://resolver.caltech.edu/CaltechAUTHORS:20140127-104737329
Authors: Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2014
DOI: 10.48550/arXiv.1401.3093
We study the rate-distortion relationship in the set
of permutations endowed with the Kendall τ-metric and the
Chebyshev metric. Our study is motivated by the application of permutation rate-distortion to the average-case and worst-case analysis of algorithms for ranking with incomplete information and approximate sorting algorithms. For the Kendall τ-metric we provide bounds for small, medium, and large distortion regimes, while for the Chebyshev metric we present bounds that are valid for all distortions and are especially accurate for small
distortions. In addition, for the Chebyshev metric, we provide a construction for covering codes.
https://authors.library.caltech.edu/records/xw27t-p9b59
Access Versus Bandwidth in Codes for Storage
https://resolver.caltech.edu/CaltechAUTHORS:20140425-151109319
Authors: Tamo, Itzhak; Wang, Zhiying; Bruck, Jehoshua
Year: 2014
DOI: 10.1109/TIT.2014.2305698
Maximum distance separable (MDS) codes are widely used in storage systems to protect against disk (node) failures. A node is said to have capacity l over some field F if it can store that amount of symbols of the field. An (n, k, l) MDS code uses n nodes of capacity l to store k information nodes. The MDS property guarantees the resiliency to any n-k node failures. An optimal bandwidth (respectively, optimal access) MDS code communicates (respectively, accesses) the minimum amount of data during the repair process of a single failed node. It was shown that this amount equals a fraction of 1/(n - k) of the data stored in each node. In previous optimal bandwidth constructions, l scaled polynomially with k when the asymptotic rate is less than 1. Moreover, in constructions with a constant number of parities, i.e., when the rate approaches 1, l scales exponentially with k. In this paper, we focus on the case of linear codes with linear repair operations and a constant number of parities n - k = r, and ask the following question: given the capacity of a node l, what is the largest number of information disks k in an optimal bandwidth (respectively, access) (k + r, k, l) MDS code? We give an upper bound for the general case, and two tight bounds in the special cases of two important families of codes. The first is a family of codes with the optimal update property, and the second is a family with the optimal access property. Moreover, the bounds show that in some cases optimal-bandwidth codes have larger k than optimal-access codes, and therefore these two measures are not equivalent.
https://authors.library.caltech.edu/records/677nb-sgx08
Synthesis of Stochastic Flow Networks
https://resolver.caltech.edu/CaltechAUTHORS:20140627-103709955
Authors: Zhou, Hongchao; Chen, Ho-Lin; Bruck, Jehoshua
Year: 2014
DOI: 10.1109/TC.2012.270
A stochastic flow network is a directed graph with incoming edges (inputs) and outgoing edges (outputs); tokens enter through the input edges, travel stochastically in the network, and can exit the network through the output edges. Each node in the network is a splitter, namely, a token can enter a node through an incoming edge and exit on one of the output edges according to a predefined probability distribution. Stochastic flow networks can be easily implemented by beam splitters, or by DNA-based chemical reactions, with promising applications in optical computing, molecular computing and stochastic computing. In this paper, we address a fundamental synthesis question: Given a finite set of possible splitters and an arbitrary rational probability distribution, design a stochastic flow network, such that every token that enters the input edge will exit the outputs with the prescribed probability distribution. The problem of probability transformation dates back to von Neumann's 1951 work and was followed, among others, by Knuth and Yao in 1976. Most existing works have focused on the "simulation" of target distributions. In this paper, we design optimal-sized stochastic flow networks for "synthesizing" target distributions. We show that when each splitter has two outgoing edges and is unbiased, an arbitrary rational probability a/b with a ≤ b ≤ 2^n can be realized by a stochastic flow network of size n that is optimal. Compared to the other stochastic systems, feedback (cycles in networks) strongly improves the expressibility of stochastic flow networks.
https://authors.library.caltech.edu/records/edqx4-2j933
Guest Editorial: Communication Methodologies for the Next-Generation Storage Systems
https://resolver.caltech.edu/CaltechAUTHORS:20140529-153422648
Authors: Dolecek, Lara; Blaum, Mario; Bruck, Jehoshua; Jiang, Anxiao (Andrew); Ramchandran, Kannan; Vasic, Bane
Year: 2014
DOI: 10.1109/JSAC.2014.140501
This issue consists of 22 high-caliber papers with contributions from both academia and industry. The papers are organized into the following six sections: (i) Channel Modeling and Signal Processing Algorithms for Emerging Memory Technologies, (ii) Error Control Coding Techniques for Flash Memories, (iii) Algebraic Methods with Applications to Non-Volatile Memories, (iv) Polar Codes with Application to Storage, (v) Performance Limits of Storage Systems, and (vi) Codes for Distributed Network Storage.
https://authors.library.caltech.edu/records/n9ytp-az569
The capacity of string-duplication systems
https://resolver.caltech.edu/CaltechAUTHORS:20150227-082940148
Authors: Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2014
DOI: 10.1109/ISIT.2014.6875043
It is known that the majority of the human genome consists of repeated sequences. Furthermore, it is believed that a significant part of the rest of the genome also originated from repeated sequences and has mutated to its current form. In this paper, we investigate the possibility of constructing an exponentially large number of sequences from a short initial sequence and simple duplication rules, including those resembling genomic duplication processes. In other words, our goal is to find out the capacity, or the expressive power, of these string-duplication systems. Our results include the exact capacities, and bounds on the capacities, of four fundamental string-duplication systems.
https://authors.library.caltech.edu/records/v7rpj-t4x19
Polar coding for noisy write-once memories
https://resolver.caltech.edu/CaltechAUTHORS:20150227-084706095
Authors: En Gad, Eyal; Li, Yue; Kliewer, Joerg; Langberg, Michael; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2014
DOI: 10.1109/ISIT.2014.6875111
We consider the noisy write-once memory (WOM) model to capture the behavior of data-storage devices such as flash memories. The noisy WOM is an asymmetric channel model with non-causal state information at the encoder. We show that a nesting of non-linear polar codes achieves the corresponding Gelfand-Pinsker bound with polynomial complexity.
https://authors.library.caltech.edu/records/ggxqm-zd693
Bounds for Permutation Rate-Distortion
https://resolver.caltech.edu/CaltechAUTHORS:20150227-075642886
Authors: Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2014
DOI: 10.1109/ISIT.2014.6874784
We study the rate-distortion relationship in the set of permutations endowed with the Kendall τ-metric and the Chebyshev metric. Our study is motivated by the application of permutation rate-distortion to the average-case and worst-case distortion analysis of algorithms for ranking with incomplete information and approximate sorting algorithms. For the Kendall τ-metric we provide bounds for small, medium, and large distortion regimes, while for the Chebyshev metric we present bounds that are valid for all distortions and are especially accurate for small distortions. In addition, for the Chebyshev metric, we provide a construction for covering codes.
https://authors.library.caltech.edu/records/2bt4c-2y871
Systematic Error-Correcting Codes for Rank Modulation
https://resolver.caltech.edu/CaltechAUTHORS:20150202-150301749
Authors: Zhou, Hongchao; Schwartz, Moshe; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2014
DOI: 10.1109/TIT.2014.2365499
The rank-modulation scheme has been recently proposed for efficiently storing data in nonvolatile memories. In this paper, we explore [n, k, d] systematic error-correcting codes for rank modulation. Such codes have length n, k information symbols, and minimum distance d. Systematic codes have the benefits of enabling efficient information retrieval in conjunction with memory-scrubbing schemes. We study systematic codes for rank modulation under Kendall's τ-metric as well as under the ℓ∞-metric. In Kendall's τ-metric, we present [k + 2, k, 3] systematic codes for correcting a single error, which have optimal rates, unless systematic perfect codes exist. We also study the design of multierror-correcting codes, and provide a construction of [k + t + 1, k, 2t + 1] systematic codes, for large-enough k. We use nonconstructive arguments to show that for rank modulation, systematic codes achieve the same capacity as general error-correcting codes. Finally, in the ℓ∞-metric, we construct two [n, k, d] systematic multierror-correcting codes, the first for the case of d = O(1) and the second for d = Θ(n). In the latter case, the codes have the same asymptotic rate as the best codes currently known in this metric.
https://authors.library.caltech.edu/records/zcv5z-eg591
Logic operations in memory using a memristive Akers array
https://resolver.caltech.edu/CaltechAUTHORS:20150105-103814566
Authors: Levy, Yifat; Bruck, Jehoshua; Cassuto, Yuval; Friedman, Eby G.; Kolodny, Avinoam; Yaakobi, Eitan; Kvatinsky, Shahar
Year: 2014
DOI: 10.1016/j.mejo.2014.06.006
In-memory computation is one of the most promising features of memristive memory arrays. In this paper, we propose an array architecture that supports in-memory computation based on a logic array first proposed in 1972 by Sheldon Akers. The Akers logic array satisfies this objective since this array can realize any Boolean function, including bit sorting. We present a hardware version of a modified Akers logic array, where the values stored within the array serve as primary inputs. The proposed logic array uses memristors, which are nonvolatile memory devices with noteworthy properties. An Akers logic array with memristors combines memory and logic operations, where the same array stores data and performs computation. This combination opens opportunities for novel non-von Neumann computer architectures, while reducing power and enhancing memory bandwidth.
https://authors.library.caltech.edu/records/ezrw6-4p418
Capacity and expressiveness of genomic tandem duplication
https://resolver.caltech.edu/CaltechAUTHORS:20151012-144840366
Authors: Jain, Siddharth; Farnoud (Hassanzadeh), Farzad; Bruck, Jehoshua
Year: 2015
DOI: 10.1109/ISIT.2015.7282795
The majority of the human genome consists of
repeated sequences. An important type of repeat common in the human genome is the tandem repeat, where identical copies appear next to each other. For example, in the sequence AGTCTGTGC, TGTG is a tandem repeat, namely, it was generated from AGTCTGC by a tandem duplication of length 2. In this work, we investigate the possibility of generating a large number of sequences from a small initial string (called the seed) by tandem duplications of bounded length. Our results include exact capacity values for certain tandem duplication string systems with alphabet sizes 2, 3,
and 4. In addition, motivated by the role of DNA sequences in expressing proteins via RNA and the genetic code, we define the notion of the expressiveness of a tandem duplication system, as the feasibility of expressing arbitrary substrings. We then completely
characterize the expressiveness of tandem duplication systems for general alphabet sizes and duplication lengths. Noticing that a system with capacity = 1 is expressive, we prove that for an alphabet size ≥ 4, the capacity is strictly smaller than 1, independent of the seed and the duplication lengths. The proof of this limit on the capacity (note that the genomic alphabet size
is 4), is related to an interesting result by Axel Thue from 1906 which states that there exist arbitrary length sequences with no tandem repeats (square-free) for alphabet size ≥ 3. Finally, our results illustrate that duplication lengths play a more significant role than the seed in generating a large number of sequences for
these systems.
https://authors.library.caltech.edu/records/j1tbj-7vr53
Is there a new way to correct errors
https://resolver.caltech.edu/CaltechAUTHORS:20161111-153306808
Authors: Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2015
DOI: 10.1109/ITA.2015.7308985
The classic approach for error correction is to add controlled external redundancy to data. This approach, called error-correcting codes (ECCs), has been studied extensively, and the rates of ECCs are approaching theoretical limits. We explore a second approach for error correction in this work, which is to use the redundancy inside data, even if it is just the residual redundancy after data compression. We focus on text data, and show that this approach, based on language processing, can significantly improve the error-correction performance.
https://authors.library.caltech.edu/records/s1kqa-sqd05
Asymmetric Error Correction and Flash-Memory Rewriting using Polar Codes
https://resolver.caltech.edu/CaltechAUTHORS:20150206-113520153
Authors: En Gad, Eyal; Li, Yue; Kliewer, Joerg; Langberg, Michael; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2015
DOI: 10.1109/TIT.2016.2539967
We propose efficient coding schemes for two communication settings: 1. asymmetric channels, and 2. channels
with an informed encoder. These settings are important in non-volatile memories, as well as optical and broadcast
communication. The schemes are based on non-linear polar codes, and they build on and improve recent work
on these settings. In asymmetric channels, we tackle the exponential storage requirement of previously known
schemes, which resulted from the use of large Boolean functions. We propose an improved scheme that achieves the
capacity of asymmetric channels with polynomial computational complexity and storage requirement.
The proposed non-linear scheme is then generalized to the setting of channel coding with an informed encoder,
using a multicoding technique. We consider specific instances of the scheme for flash memories, which incorporate
error-correction capabilities together with rewriting. Since the considered codes are non-linear, they eliminate
the requirement of previously known schemes (called polar write-once-memory codes) for shared randomness
between the encoder and the decoder. Finally, we mention that the multicoding scheme is also useful for broadcast
communication in Marton's region, improving upon previous schemes for this setting.
https://authors.library.caltech.edu/records/t5pap-5z628
Rewriting Flash Memories by Message Passing
https://resolver.caltech.edu/CaltechAUTHORS:20150209-161244506
Authors: En Gad, Eyal; Huang, Wentao; Li, Yue; Bruck, Jehoshua
Year: 2015
DOI: 10.48550/arXiv.1502.00189
This paper constructs WOM codes that combine
rewriting and error correction for mitigating the reliability and the endurance problems in flash memory. We consider a rewriting model that is of practical interest to flash applications where only the second write uses WOM codes. Our WOM code construction is based on binary erasure quantization with LDGM codes, where the rewriting uses message passing and has the potential to share the
efficient hardware implementations with LDPC codes in practice. We show that the coding scheme achieves the capacity of the rewriting model. Extensive simulations show that the rewriting performance of our scheme compares favorably with that of polar WOM code in the rate region where high rewriting success probability is desired. We further augment our coding schemes with error correction capability. By drawing a connection to the
conjugate code pairs studied in the context of quantum error
correction, we develop a general framework for constructing
error-correction WOM codes. Under this framework, we give
an explicit construction of WOM codes whose codewords are
contained in BCH codes.
https://authors.library.caltech.edu/records/7kq5n-d3534
Capacity and Expressiveness of Genomic Tandem Duplication
https://resolver.caltech.edu/CaltechAUTHORS:20150209-155348874
Authors: Jain, Siddharth; Farnoud (Hassanzadeh), Farzad; Bruck, Jehoshua
Year: 2015
DOI: 10.48550/arXiv.1509.06029
The majority of the human genome consists of repeated sequences. An important type of repeat common in the human genome is the tandem repeat, where identical copies appear next to each other. For example, in the sequence AGTCTGTGC, TGTG is a tandem repeat, namely, it was generated from AGTCTGC by tandem duplication of length 2. In this work, we investigate the possibility of generating a large number of sequences from a small initial string (called
the seed) by tandem duplication of length bounded by a constant. Our results include exact capacity values for certain tandem duplication string systems with alphabet sizes 2, 3, and 4. In addition, motivated by the role of DNA sequences in expressing proteins via RNA and the genetic code, we define the notion of the expressiveness of a tandem duplication system as the feasibility of
expressing arbitrary substrings. We then completely characterize the expressiveness of tandem duplication systems for general alphabet sizes and duplication lengths. Noticing that a system with capacity = 1 is expressive, we prove that for an alphabet size ≥ 4, the capacity is strictly smaller than 1, independent of
the seed and the duplication lengths. The proof of this limit on the capacity (note that the genomic alphabet size is 4), is related to an interesting result by Axel Thue from 1906 which states that there exist arbitrary length sequences with no tandem repeats (square-free) for alphabet size ≥ 3. Finally, our results illustrate
that duplication lengths play a more significant role than the seed in generating a large number of sequences for these systems.
https://authors.library.caltech.edu/records/asa2n-zzv53
A Stochastic Model for Genomic Interspersed Duplication
https://resolver.caltech.edu/CaltechAUTHORS:20150209-161532302
Authors: Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2015
Mutation processes such as point mutation, insertion,
deletion, and duplication (including tandem and interspersed
duplication) have an important role in evolution, as
they lead to genomic diversity, and thus to phenotypic variation. In this work, we study the expressive power of interspersed duplication, i.e., its ability to generate diversity, via a simple but fundamental stochastic model, where the length and the location of the subsequence that is duplicated and the point of insertion of the copy are chosen randomly. In contrast to combinatorial models, where the goal is to determine the set of possible outcomes regardless of their likelihood, in stochastic
systems, we investigate the properties of the set of high-probability sequences. In particular, we provide results regarding the asymptotic behavior of frequencies of symbols and short words in a sequence evolving through interspersed duplication. The study of such systems is an important step towards the design and analysis of more realistic and sophisticated models of genomic mutation processes.
https://authors.library.caltech.edu/records/vyjrj-amv90
Error correction through language processing
https://resolver.caltech.edu/CaltechAUTHORS:20160915-122649767
Authors: Jiang, Anxiao (Andrew); Li, Yue; Bruck, Jehoshua
Year: 2015
DOI: 10.1109/ITW.2015.7133145
There are two fundamental approaches for error correction. One approach is to add external redundancy to data. The other approach is to use the redundancy inside data, even if it is only the residual redundancy after a data compression algorithm. The first approach, namely error-correcting codes (ECCs), has been studied actively over the past seventy years. In this work, we explore the second approach, and show that it can substantially enhance the error-correction performance. This work focuses on error correction of texts in English as a case study. It proposes a scheme that combines language-based decoding with ECC decoding. Both analysis and experimental results are presented. The scheme can be extended to content-based decoding for more types of data with rich structures.
https://authors.library.caltech.edu/records/me5vr-pzd35
Communication Efficient Secret Sharing
https://resolver.caltech.edu/CaltechAUTHORS:20150529-105023455
Authors: Huang, Wentao; Langberg, Michael; Kliewer, Joerg; Bruck, Jehoshua
Year: 2015
DOI: 10.1109/TIT.2016.2616144
A secret sharing scheme is a method to store information securely and reliably. Particularly, in the threshold secret sharing scheme (due to Shamir), a secret is divided
into shares, encoded and distributed to parties, such that any large enough collection of parties can decode the secret, and a smaller (than threshold) set of parties cannot
collude to deduce any information about the secret. While Shamir's scheme was studied for more than 35 years, the question of minimizing its communication bandwidth was
not considered. Specifically, assume that a user (or a collection of parties) wishes to decode the secret by receiving information from a set of parties; the question we
study is how to minimize the total amount of communication between the user and the parties. We prove a tight lower bound on the amount of communication necessary for
decoding, and construct secret sharing schemes achieving the bound. The key idea for achieving optimal communication bandwidth is to let the user receive information from
more than the necessary number of parties. In contrast, the current paradigm in secret sharing schemes is to decode from a minimum set of parties. Hence, existing secret
sharing schemes are not optimal in terms of communication bandwidth. In addition, we consider secure distributed storage where our proposed communication efficient secret
sharing schemes improve disk access complexity during decoding.
https://authors.library.caltech.edu/records/0518b-hky47
Rewriting Flash Memories by Message Passing
https://resolver.caltech.edu/CaltechAUTHORS:20151012-142447290
Authors: En Gad, Eyal; Huang, Wentao; Li, Yue; Bruck, Jehoshua
Year: 2015
DOI: 10.1109/ISIT.2015.7282534
This paper constructs WOM codes that combine rewriting and error correction for mitigating the reliability and the endurance problems in flash memory. We consider a rewriting model that is of practical interest to flash applications where only the second write uses WOM codes. Our WOM code construction is based on binary erasure quantization with LDGM codes, where the rewriting uses message passing and has potential to share the efficient hardware implementations with LDPC codes in practice. We show that the coding scheme achieves the capacity of the rewriting model. Extensive simulations show that the rewriting performance of our scheme compares favorably with that of polar WOM codes in the rate region where high rewriting success probability is desired. We further augment our coding schemes with error correction capability. By drawing a connection to the conjugate code pairs studied in the context of quantum error correction, we develop a general framework for constructing error-correcting WOM codes. Under this framework, we give an explicit construction of WOM codes whose codewords are contained in BCH codes.https://authors.library.caltech.edu/records/6vsed-x7y38A Stochastic Model for Genomic Interspersed Duplication
https://resolver.caltech.edu/CaltechAUTHORS:20151012-143650853
Authors: Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2015
DOI: 10.1109/ISIT.2015.7282586
Mutation processes such as point mutation, insertion, deletion, and duplication (including tandem and interspersed duplication) have an important role in evolution, as they lead to genomic diversity, and thus to phenotypic variation. In this work, we study the expressive power of interspersed duplication, i.e., its ability to generate diversity, via a simple but fundamental stochastic model, where the length and the location of the substring that is duplicated and the point of insertion of the copy are chosen randomly. We investigate the properties of the set of high-probability sequences in these stochastic systems. In particular we provide results regarding the asymptotic behavior of frequencies of symbols and strings in a sequence evolving through interspersed duplication. The study of such systems is an important step towards the design and analysis of more realistic and sophisticated models of genomic mutation processes.https://authors.library.caltech.edu/records/gwm2s-gj497Rank-Modulation Rewrite Coding for Flash Memories
https://resolver.caltech.edu/CaltechAUTHORS:20150814-153713652
Authors: En Gad, Eyal; Yaakobi, Eitan; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2015
DOI: 10.1109/TIT.2015.2442579
The current flash memory technology focuses on the cost minimization of its static storage capacity. However, the resulting approach supports a relatively small number of program-erase cycles. This technology is effective for consumer devices (e.g., smartphones and cameras) where the number of program-erase cycles is small. However, it is not economical for enterprise storage systems that require a large number of lifetime writes. The proposed approach in this paper for alleviating this problem consists of the efficient integration of two key ideas: 1) improving reliability and endurance by representing the information using relative values via the rank modulation scheme and 2) increasing the overall (lifetime) capacity of the flash device via rewriting codes, namely, performing multiple writes per cell before erasure. This paper presents a new coding scheme that combines rank-modulation with rewriting. The key benefits of the new scheme include: 1) the ability to store close to 2 bit per cell on each write with minimal impact on the lifetime of the memory and 2) efficient encoding and decoding algorithms that make use of capacity-achieving write-once-memory codes that were proposed recently.https://authors.library.caltech.edu/records/5e76c-xe789Reliability and Hardware Implementation of Rank Modulation Flash Memory
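Rank modulation, the first key idea above, represents data by the permutation induced by the relative charge levels of a group of cells rather than by absolute thresholds. A minimal sketch of the read operation (function and variable names are illustrative, not from the paper):

```python
def read_rank(levels):
    """Induced permutation: cell indices ordered from highest to lowest charge."""
    return tuple(sorted(range(len(levels)), key=lambda i: -levels[i]))

# The stored permutation survives a uniform drift of all charge levels
# (e.g. retention loss), which would corrupt an absolute-threshold MLC read.
v = [3.1, 1.2, 2.4, 0.7]
assert read_rank(v) == read_rank([x - 0.5 for x in v]) == (0, 2, 1, 3)
```

A group of n cells can thus represent up to log2(n!) bits, and rewriting only ever pushes some cell above the current maximum, avoiding erasures.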
https://resolver.caltech.edu/CaltechAUTHORS:20160909-113811678
Authors: Ma, Yanjun; Li, Yue; Kan, Edwin Chihchuan; Bruck, Jehoshua
Year: 2015
DOI: 10.1109/NVMTS.2015.7457493
We review a novel data representation scheme for NAND flash memory named rank modulation (RM), and discuss its hardware implementation. We show that under the normal threshold voltage (Vth) variations, RM has intrinsic read reliability advantage over conventional multiple-level cells. Test results demonstrating superior reliability using commercial flash chips are reviewed and discussed. We then present a read method based on relative sensing time, which can obtain the rank of all cells in the group in one read cycle. The improvement in reliability and read speed enable similar program-and-verify time in RM as that of conventional MLC flash.https://authors.library.caltech.edu/records/dvrcy-z2n42Algorithms for Generating Probabilities with Multivalued Stochastic Relay Circuits
https://resolver.caltech.edu/CaltechAUTHORS:20151221-152740679
Authors: Lee, David T.; Bruck, Jehoshua
Year: 2015
DOI: 10.1109/TC.2015.2401027
The problem of random number generation dates back to Von Neumann's work in 1951. Since then, many algorithms have been developed for generating unbiased bits from complex correlated sources as well as for generating arbitrary distributions from unbiased bits. An equally interesting, but less studied aspect is the structural component of random number generation. That is, given a set of primitive sources of randomness, and given composition rules induced by a device or nature, how can we build networks that generate arbitrary probability distributions? In this paper, we study the generation of arbitrary probability distributions in multivalued relay circuits, a generalization in which relays can take on any of N states and the logical 'and' and 'or' are replaced with 'min' and 'max' respectively. These circuits can be thought of as modeling the timing of events which depend on other event occurrences. We describe a duality property and give algorithms that synthesize arbitrary rational probability distributions. We prove that these networks are robust to errors and design a universal probability generator which takes input bits and outputs any desired binary probability distribution.https://authors.library.caltech.edu/records/9f9ps-p5918The Capacity of String-Duplication Systems
https://resolver.caltech.edu/CaltechAUTHORS:20160119-142638953
Authors: Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2016
DOI: 10.1109/TIT.2015.2505735
It is known that the majority of the human genome consists of duplicated sequences. Furthermore, it is believed that a significant part of the rest of the genome also originated from duplicated sequences and has mutated to its current form. In this paper, we investigate the possibility of constructing an exponentially large number of sequences from a short initial sequence using simple duplication rules, including those resembling genomic-duplication processes. In other words, our goal is to find the capacity, or the expressive power, of these string-duplication systems. Our results include exact capacities, and bounds on the capacities, of four fundamental string-duplication systems. The study of these fundamental biologically inspired systems is an important step toward modeling and analyzing more complex biological processes.https://authors.library.caltech.edu/records/e2hgt-sj167Codes Correcting Erasures and Deletions for Rank Modulation
https://resolver.caltech.edu/CaltechAUTHORS:20160225-140310853
Authors: Gabrys, Ryan; Yaakobi, Eitan; Farnoud (Hassanzadeh), Farzad; Sala, Frederic; Bruck, Jehoshua; Dolecek, Lara
Year: 2016
DOI: 10.1109/TIT.2015.2493147
Error-correcting codes for permutations have received considerable attention in the past few years, especially in applications of the rank modulation scheme for flash memories. While codes over several metrics have been studied, such as the Kendall τ, Ulam, and Hamming distances, no recent research has been carried out for erasures and deletions over permutations. In rank modulation, flash memory cells represent a permutation, which is induced by their relative charge levels. We explore problems that arise when some of the cells are either erased or deleted. In each case, we study how these erasures and deletions affect the information carried by the remaining cells. In particular, we study models that are symbol-invariant, where unaffected elements do not change their corresponding values from those in the original permutation, or permutation-invariant, where the remaining symbols are modified to form a new permutation with fewer elements. Our main approach in tackling these problems is to build upon the existing works of error-correcting codes and leverage them in order to construct codes in each model of deletions and erasures. The codes we develop are in certain cases asymptotically optimal, while in other cases, such as for codes in the Ulam distance, improve upon the state of the art results.https://authors.library.caltech.edu/records/km1t9-d3q89Systematic Codes for Rank Modulation
https://resolver.caltech.edu/CaltechAUTHORS:20160120-084734898
Authors: Buzaglo, Sarit; Yaakobi, Eitan; Etzion, Tuvi; Bruck, Jehoshua
Year: 2016
DOI: 10.48550/arXiv.1311.7113
The goal of this paper is to construct systematic error-correcting codes for permutations and multi-permutations in the Kendall's τ-metric. These codes are important in new applications such as rank modulation for flash memories. The construction is based on error-correcting codes for multi-permutations and a partition of the set of permutations into error-correcting codes. For a given large enough number of information symbols k, and for any integer t, we present a construction for (k + r, k) systematic t-error-correcting codes, for permutations from S_(k+r), with fewer redundancy symbols than in the codes of the known constructions. In particular, for a given t and for sufficiently large k we can obtain r = t + 1. The same construction is also applied to obtain related systematic error-correcting codes for multi-permutations.https://authors.library.caltech.edu/records/zbh85-zrr40Streaming Algorithms for Optimal Generation of Random Bits
https://resolver.caltech.edu/CaltechAUTHORS:20160120-102504919
Authors: Zhou, Hongchao; Bruck, Jehoshua
Year: 2016
DOI: 10.48550/arXiv.1209.0730
Generating random bits from a source of biased coins (the bias is unknown) is a classical question that was originally studied by von Neumann. There are a number of known algorithms that have asymptotically optimal information efficiency, namely, the expected number of generated random bits per input bit is asymptotically close to the entropy of the source. However, only the original von Neumann algorithm has a 'streaming property' - it operates on a single input bit at a time and it generates random bits when possible, alas, it does not have an optimal information efficiency.
The main contribution of this paper is an algorithm that generates random bit streams from biased coins, uses bounded space and runs in expected linear time. As the size of the allotted space increases, the algorithm approaches the information-theoretic upper bound on efficiency. In addition, we discuss how to extend this algorithm to generate random bit streams from m-sided dice or correlated sources such as Markov chains.https://authors.library.caltech.edu/records/g02dz-15w91Efficiently Extracting Randomness from Imperfect Stochastic Processes
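The original von Neumann algorithm referenced above, the only prior method with the streaming property, can be sketched in a few lines (this is the classical pairing trick, not the paper's bounded-space streaming algorithm):

```python
def von_neumann(bits):
    """Read non-overlapping pairs: 01 -> 0, 10 -> 1, 00/11 -> discard.
    Output bits are unbiased for any fixed (unknown) coin bias, but the
    rate is far from the entropy of the source."""
    out = []
    for a, b in zip(bits[::2], bits[1::2]):
        if a != b:
            out.append(a)
    return out
```

For a coin with bias p, each pair yields an output bit with probability 2p(1-p), so the extraction rate peaks at 1/4 bit per input bit; the paper's contribution is approaching the entropy bound while keeping this one-pass behavior.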
https://resolver.caltech.edu/CaltechAUTHORS:20160120-104324682
Authors: Zhou, Hongchao; Bruck, Jehoshua
Year: 2016
DOI: 10.48550/arXiv.1209.0734
We study the problem of extracting a prescribed number of random bits by reading the smallest possible number of symbols from non-ideal stochastic processes. The related interval algorithm proposed by Han and Hoshi has asymptotically optimal performance; however, it assumes that the distribution of the input stochastic process is known. The motivation for our work is the fact that, in practice, sources of randomness have inherent correlations and are affected by measurement noise. Namely, it is hard to obtain an accurate estimation of the distribution. This challenge was addressed by the concepts of seeded and seedless extractors that can handle general random sources with unknown distributions. However, known seeded and seedless extractors provide extraction efficiencies that are substantially smaller than Shannon's entropy limit. Our main contribution is the design of extractors that have a variable input-length and a fixed output length, are efficient in the consumption of symbols from the source, are capable of generating random bits from general stochastic processes and approach the information-theoretic upper bound on efficiency.https://authors.library.caltech.edu/records/hknk5-gcf87Balanced Modulation for Nonvolatile Memories
https://resolver.caltech.edu/CaltechAUTHORS:20160120-083936607
Authors: Zhou, Hongchao; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2016
DOI: 10.48550/arXiv.1209.0744
This paper presents a practical writing/reading scheme for nonvolatile memories, called balanced modulation, for minimizing the asymmetric component of errors. The main idea is to encode data using a balanced error-correcting code. When reading information from a block, it adjusts the reading threshold such that the resulting word is also balanced or approximately balanced. Balanced modulation has suboptimal performance for any cell-level distribution, and it can be easily implemented in the current systems of nonvolatile memories. Furthermore, we study the construction of balanced error-correcting codes, in particular balanced LDPC codes, which have very efficient encoding and decoding algorithms and are more efficient than prior constructions of balanced error-correcting codes.https://authors.library.caltech.edu/records/d8zca-x3206A Universal Scheme for Transforming Binary Algorithms to Generate Random Bits from Loaded Dice
https://resolver.caltech.edu/CaltechAUTHORS:20160120-102042704
Authors: Zhou, Hongchao; Bruck, Jehoshua
Year: 2016
DOI: 10.48550/arXiv.1209.0726
In this paper, we present a universal scheme for transforming an arbitrary algorithm for biased 2-face coins to generate random bits from the general source of an m-sided die, hence enabling the application of existing algorithms to general sources. In addition, we study approaches of efficiently generating a prescribed number of random bits from an arbitrary biased coin. This contrasts with most existing works, which typically assume that the number of coin tosses is fixed, and they generate a variable number of random bits.https://authors.library.caltech.edu/records/abqc2-0en96Explicit MDS Codes for Optimal Repair Bandwidth
https://resolver.caltech.edu/CaltechAUTHORS:20160120-152728882
Authors: Wang, Zhiying; Tamo, Itzhak; Bruck, Jehoshua
Year: 2016
DOI: 10.48550/arXiv.1411.6328
MDS codes are erasure-correcting codes that can correct the maximum number of erasures for a given number of redundancy or parity symbols. If an MDS code has r parities and no more than r erasures occur, then by transmitting all the remaining data in the code, the original information can be recovered. However, it was shown that in order to recover a single symbol erasure, only a fraction of 1/r of the information needs to be transmitted. This fraction is called the repair bandwidth (fraction). Explicit code constructions were given in previous works. If we view each symbol in the code as a vector or a column over some field, then the code forms a 2D array, and such codes are especially widely used in storage systems. In this paper, we address the following question: given the column length l and the number of parities r, can we construct high-rate MDS array codes with optimal repair bandwidth of 1/r whose code length is as long as possible? We give code constructions such that the code length is (r + 1) log_r l.https://authors.library.caltech.edu/records/xmt8m-whw26Secure RAID Schemes for Distributed Storage
https://resolver.caltech.edu/CaltechAUTHORS:20160125-120110556
Authors: Huang, Wentao; Bruck, Jehoshua
Year: 2016
DOI: 10.1109/ISIT.2016.7541529
We propose secure RAID, i.e., low-complexity schemes to store information in a distributed manner that is resilient to node failures and resistant to node eavesdropping. We generalize the concept of systematic encoding to secure RAID and show that systematic schemes have significant advantages in the efficiencies of encoding, decoding and random access. For the practical high rate regime, we construct three XOR-based systematic secure RAID schemes with optimal or almost optimal encoding and decoding complexities, from the EVENODD codes and B codes, which are array codes widely used in the RAID architecture. The schemes can tolerate up to two node failures and two eavesdropping nodes. For more general parameters we construct systematic secure RAID schemes from Reed-Solomon codes, and show that they are significantly more efficient than Shamir's secret sharing scheme. Our results suggest that building "keyless", information-theoretic security into the RAID architecture is practical.https://authors.library.caltech.edu/records/g8ntm-21346Duplication-Correcting Codes for Data Storage in the DNA of Living Organisms
https://resolver.caltech.edu/CaltechAUTHORS:20160125-143414675
Authors: Jain, Siddharth; Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2016
DOI: 10.1109/ISIT.2016.7541455
The ability to store data in the DNA of a living organism has applications in a variety of areas including synthetic biology and watermarking of patented genetically-modified organisms. Data stored in this medium is subject to errors arising from various mutations, such as point mutations, indels, and tandem duplication, which need to be corrected to maintain data integrity. In this paper, we provide error-correcting codes for errors caused by tandem duplications, which create a copy of a block of the sequence and insert it in a tandem manner, i.e., next to the original. In particular, we present two families of codes for correcting errors due to tandem-duplications of a fixed length; the first family can correct any number of errors while the second corrects a bounded number of errors. We also study codes for correcting tandem duplications of length up to a given constant k, where we are primarily focused on the cases of k = 2, 3.https://authors.library.caltech.edu/records/fmjec-t7m90Bounds for Permutation Rate-Distortion
https://resolver.caltech.edu/CaltechAUTHORS:20160119-151223798
Authors: Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2016
DOI: 10.1109/TIT.2015.2504521
We study the rate-distortion relationship in the set of permutations endowed with the Kendall τ-metric and the Chebyshev metric. Our study is motivated by the application of permutation rate-distortion to the average-case and worst-case distortion analysis of algorithms for ranking with incomplete information and approximate sorting algorithms. For the Kendall τ-metric we provide bounds for small, medium, and large distortion regimes, while for the Chebyshev metric we present bounds that are valid for all distortions and are especially accurate for small distortions. In addition, for the Chebyshev metric, we provide a construction for covering codes.https://authors.library.caltech.edu/records/3hq71-2m313The Synthesis and Analysis of Stochastic Switching Circuits
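The Kendall τ-metric used above is the minimum number of adjacent transpositions needed to turn one permutation into another, equivalently the number of pairwise order disagreements. A small illustrative implementation (quadratic for clarity; not taken from the paper):

```python
def kendall_tau(p, q):
    """Kendall tau distance between permutations p and q: the number of
    element pairs whose relative order differs, which equals the minimum
    number of adjacent transpositions taking p to q."""
    pos = {v: i for i, v in enumerate(q)}
    r = [pos[v] for v in p]  # p expressed in q's coordinate system
    n = len(r)
    # Count inversions of r (an O(n log n) merge-sort count also works).
    return sum(1 for i in range(n) for j in range(i + 1, n) if r[i] > r[j])
```

The distance ranges from 0 (identical rankings) to n(n-1)/2 (reversed rankings), which is why the reversal `[2, 1, 0]` is at distance 3 from the identity on three elements.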
https://resolver.caltech.edu/CaltechAUTHORS:20160203-092316194
Authors: Zhou, Hongchao; Loh, Po-Ling; Bruck, Jehoshua
Year: 2016
DOI: 10.48550/arXiv.1209.0715
Stochastic switching circuits are relay circuits that consist of stochastic switches called pswitches. The study of stochastic switching circuits has widespread applications in many fields of computer science, neuroscience, and biochemistry. In this paper, we discuss several properties of stochastic switching circuits, including robustness, expressibility, and probability approximation. First, we study robustness, namely, the effect caused by introducing an error of size ε to each pswitch in a stochastic circuit. We analyze two constructions and prove that simple series-parallel circuits are robust to small error perturbations, while general series-parallel circuits are not. Specifically, the total error introduced by perturbations of size less than ε is bounded by a constant multiple of ε in a simple series-parallel circuit, independent of the size of the circuit. Next, we study the expressibility of stochastic switching circuits: given an integer q and a pswitch set S = {1/q, 2/q, ..., (q-1)/q}, can we synthesize any rational probability with denominator q^n (for arbitrary n) with a simple series-parallel stochastic switching circuit? We generalize previous results and prove that when q is a multiple of 2 or 3, the answer is yes. We also show that when q is a prime number larger than 3, the answer is no. Probability approximation is studied for a general case of an arbitrary pswitch set S = {s_1, s_2, ..., s_(|S|)}. In this case, we propose an algorithm based on local optimization to approximate any desired probability. The analysis reveals that the approximation error of a switching circuit decreases exponentially with increasing circuit size.https://authors.library.caltech.edu/records/rzzjv-4hf17Error Characterization and Mitigation for 16nm MLC NAND Flash Memory under Total Ionizing Dose Effect
https://resolver.caltech.edu/CaltechAUTHORS:20161004-114313111
Authors: Li, Yue; Sheldon, Douglas J.; Ramos, Andre S.; Bruck, Jehoshua
Year: 2016
DOI: 10.1109/IRPS.2016.7574638
This paper studies the system-level reliability of 16nm MLC NAND flash memories under total ionizing dose (TID) effect. Errors that occur in the parts under TID effect are characterized at multiple levels. Results show that faithful data recovery is possible only up to 9k rad. Data errors observed in irradiated flash samples are strongly asymmetric. To improve the reliability of the parts, we study error mitigation methods that consider the specific properties of TID errors. First, we implement a novel data representation scheme that stores data using the relative order of cell voltages. The representation is more robust against uniform asymmetric threshold voltage shift of floating gates. Experimental results show that the scheme reduces errors by at least 50% for blocks with less than 3k program/erase cycles and 10k rad. Second, we conduct empirical evaluations of memory scrubbing schemes. Based on the results, we identify a scheme that refreshes cells without doing block erasure. Evaluation results show that parts under this scrubbing scheme survive up to 8k PECs and 57k rad of total dose.https://authors.library.caltech.edu/records/yzxhn-5qe35Data archiving in 1x-nm NAND flash memories: Enabling long-term storage using rank modulation and scrubbing
https://resolver.caltech.edu/CaltechAUTHORS:20161003-155557445
Authors: Li, Yue; Gad, Eyal En; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2016
DOI: 10.1109/IRPS.2016.7574572
The challenge of using inexpensive and high-density NAND flash for archival storage was posed recently for reducing data center costs. However, such flash memory is becoming more susceptible to noise, and its reliability issues have become the major concern for its adoption by long-term storage systems. This paper studies the system-level reliability of archival storage that uses 1x-nm NAND flash memory. We analyze retention error behavior, and show that 1x-nm MLC and TLC flash do not immediately qualify for long-term storage. We then implement the rank modulation (RM) scheme and memory scrubbing (MS) for retention period (RP) enhancement. The RM scheme provides a new data representation using the relative order of cell voltages, which provides higher reliability against uniform asymmetric threshold voltage shift due to charge leakage. Results show that the new representation reduces the raw bit error rate (RBER) by 45% on average, and using RM and MS together provides up to 196, 171, 146 and 121 years of RPs for blocks with 0, 25, 50 and 75 program/erase cycles, respectively.https://authors.library.caltech.edu/records/0w2v4-pqq54Constructions and Decoding of Cyclic Codes Over b-Symbol Read Channels
https://resolver.caltech.edu/CaltechAUTHORS:20160426-074534474
Authors: Yaakobi, Eitan; Bruck, Jehoshua; Siegel, Paul H.
Year: 2016
DOI: 10.1109/TIT.2016.2522434
Symbol-pair read channels, in which the outputs of the read process are pairs of consecutive symbols, were recently studied by Cassuto and Blaum. This new paradigm is motivated by the limitations of the reading process in high density data storage systems. They studied error correction in this new paradigm, specifically, the relationship between the minimum Hamming distance of an error correcting code and the minimum pair distance, which is the minimum Hamming distance between symbol-pair vectors derived from codewords of the code. It was proved that for a linear cyclic code with minimum Hamming distance d_H, the corresponding minimum pair distance is at least d_H + 3. In this paper, we show that, for a given linear cyclic code with a minimum Hamming distance d_H, the minimum pair distance is at least d_H + ⌈d_H/2⌉. We then describe a decoding algorithm, based upon a bounded distance decoder for the cyclic code, whose symbol-pair error correcting capabilities reflect the larger minimum pair distance. Finally, we consider the case where the read channel output is a larger number, b ≥ 3, of consecutive symbols, and we provide extensions of several concepts, results, and code constructions to this setting.https://authors.library.caltech.edu/records/g86g2-92h26Systematic Error-Correcting Codes for Permutations and Multi-Permutations
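The pair distance discussed above is the Hamming distance between the symbol-pair vectors of two words. A small sketch, assuming the cyclic pair-read convention of the symbol-pair model (function names are illustrative):

```python
def pair_vector(x):
    """Cyclic symbol-pair read: position i yields the pair (x[i], x[i+1])."""
    n = len(x)
    return [(x[i], x[(i + 1) % n]) for i in range(n)]

def pair_distance(x, y):
    """Hamming distance between the symbol-pair vectors of x and y."""
    return sum(a != b for a, b in zip(pair_vector(x), pair_vector(y)))
```

A single symbol error corrupts two overlapping pairs, e.g. flipping one bit of `[0, 0, 0, 0]` gives pair distance 2, which is why the minimum pair distance of a code exceeds its minimum Hamming distance.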
https://resolver.caltech.edu/CaltechAUTHORS:20160825-142242242
Authors: Buzaglo, Sarit; Yaakobi, Eitan; Etzion, Tuvi; Bruck, Jehoshua
Year: 2016
DOI: 10.1109/TIT.2016.2543739
Multi-permutations, and in particular permutations, appear in various applications in information theory. New applications, such as rank modulation for flash memories, have suggested the need to consider error-correcting codes for multi-permutations. In this paper, we study systematic error-correcting codes for multi-permutations in general and for permutations in particular. For a given number of information symbols k, and for any integer t, we present a construction of (k+r, k) systematic t-error-correcting codes, for permutations of length k+r, where the number of redundancy symbols r is relatively small. In particular, for a given t and for sufficiently large k, we obtain r=t+1, while a lower bound on the number of redundancy symbols is shown to be t. The same construction is also applied to obtain related systematic error-correcting codes for all types of multi-permutations.https://authors.library.caltech.edu/records/s703z-6bk56The capacity of some Pólya string models
https://resolver.caltech.edu/CaltechAUTHORS:20160824-102815029
Authors: Elishco, Ohad; Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2016
DOI: 10.1109/ISIT.2016.7541303
We study random string-duplication systems, called Pólya string models, motivated by certain random mutation processes in the genome of living organisms. Unlike previous works that study the combinatorial capacity of string-duplication systems, or peripheral properties such as symbol frequency, this work provides exact capacities, or bounds on them, for several probabilistic models. In particular, we give the exact capacities of the random tandem-duplication and end-duplication systems, and bound the capacity of the complement tandem-duplication system. Interesting connections are drawn between the former systems and the beta distribution common to population genetics, as well as between the latter system and signatures of random permutations.https://authors.library.caltech.edu/records/fqgw6-9mh41Secure RAID Schemes for Distributed Storage
https://resolver.caltech.edu/CaltechAUTHORS:20160823-165433889
Authors: Huang, Wentao; Bruck, Jehoshua
Year: 2016
DOI: 10.1109/ISIT.2016.7541529
We propose secure RAID, i.e., low-complexity schemes to store information in a distributed manner that is resilient to node failures and resistant to node eavesdropping. We generalize the concept of systematic encoding to secure RAID and show that systematic schemes have significant advantages in the efficiencies of encoding, decoding and random access. For the practical high rate regime, we construct three XOR-based systematic secure RAID schemes with optimal encoding and decoding complexities, from the EVENODD codes and B codes, which are array codes widely used in the RAID architecture. These schemes optimally tolerate two node failures and two eavesdropping nodes. For more general parameters, we construct efficient systematic secure RAID schemes from Reed-Solomon codes. Our results suggest that building "keyless", information-theoretic security into the RAID architecture is practical.https://authors.library.caltech.edu/records/5q6h9-6cr80On the duplication distance of binary strings
https://resolver.caltech.edu/CaltechAUTHORS:20160824-101618060
Authors: Alon, Noga; Bruck, Jehoshua; Farnoud (Hassanzadeh), Farzad; Jain, Siddharth
Year: 2016
DOI: 10.1109/ISIT.2016.7541301
We study the tandem duplication distance between binary sequences and their roots. This distance is motivated by genomic tandem duplication mutations and counts the smallest number of tandem duplication events that are required to take one sequence to another. We consider both exact and approximate tandem duplications, the latter leading to a combined duplication/Hamming distance. The paper focuses on the maximum value of the duplication distance to the root. For exact duplication, denoting the maximum distance to the root of a sequence of length n by f(n), we prove that f(n) = Θ(n). For the case of approximate duplication, where a β-fraction of symbols may be duplicated incorrectly, we show using the Plotkin bound that the maximum distance has a sharp transition from linear to logarithmic in n at β = 1/2.https://authors.library.caltech.edu/records/hzp2x-rjp69Duplication-Correcting Codes for Data Storage in the DNA of Living Organisms
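A tandem duplication, as studied above, copies a block of a sequence and inserts the copy directly next to the original; the duplication distance counts the minimum number of such events from a root to a target. A one-line illustrative sketch of the mutation itself (not the paper's distance computation):

```python
def tandem_duplicate(s, i, k):
    """Copy the length-k block starting at position i and insert the copy
    immediately after the original block (a tandem duplication event)."""
    return s[:i + k] + s[i:i + k] + s[i + k:]
```

For example, duplicating the block "CG" of "ACGT" yields "ACGCGT"; a root is a sequence from which no duplicated block can be removed, and f(n) asks how many such events are needed in the worst case.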
https://resolver.caltech.edu/CaltechAUTHORS:20160823-165024070
Authors: Jain, Siddharth; Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2016
DOI: 10.1109/ISIT.2016.7541455
The ability to store data in the DNA of a living organism has applications in a variety of areas including synthetic biology and watermarking of patented genetically-modified organisms. Data stored in this medium is subject to errors arising from various mutations, such as point mutations, indels, and tandem duplication, which need to be corrected to maintain data integrity. In this paper, we provide error-correcting codes for errors caused by tandem duplications, which create a copy of a block of the sequence and insert it in a tandem manner, i.e., next to the original. In particular, we present a family of codes for correcting errors due to tandem-duplications of a fixed length and any number of errors. We also study codes for correcting tandem duplications of length up to a given constant k, where we are primarily focused on the cases of k = 2, 3.https://authors.library.caltech.edu/records/q6hxk-ejv90Asymmetric Error Correction and Flash-Memory Rewriting using Polar Codes
https://resolver.caltech.edu/CaltechAUTHORS:20160622-104849133
Authors: En Gad, Eyal; Li, Yue; Kliewer, Jörg; Langberg, Michael; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2016
DOI: 10.1109/TIT.2016.2539967
We propose efficient coding schemes for two communication settings: 1) asymmetric channels and 2) channels with an informed encoder. These settings are important in non-volatile memories, as well as optical and broadcast communication. The schemes are based on non-linear polar codes, and they build on and improve recent work on these settings. In asymmetric channels, we tackle the exponential storage requirement of previously known schemes that resulted from the use of large Boolean functions. We propose an improved scheme that achieves the capacity of asymmetric channels with polynomial computational complexity and storage requirement. The proposed non-linear scheme is then generalized to the setting of channel coding with an informed encoder using a multicoding technique. We consider specific instances of the scheme for flash memories that incorporate error-correction capabilities together with rewriting. Since the considered codes are non-linear, they eliminate the requirement of previously known schemes (called polar write-once-memory codes) for shared randomness between the encoder and the decoder. Finally, we mention that the multicoding scheme is also useful for broadcast communication in Marton's region, improving upon previous schemes for this setting.https://authors.library.caltech.edu/records/hynhb-7fv07Explicit Minimum Storage Regenerating Codes
https://resolver.caltech.edu/CaltechAUTHORS:20160930-131654865
Authors: Wang, Zhiying; Tamo, Itzhak; Bruck, Jehoshua
Year: 2016
DOI: 10.1109/TIT.2016.2553675
In distributed storage, a file is stored in a set of nodes and protected by erasure-correcting codes. Regenerating code is a type of code with two properties: first, it can reconstruct the entire file in the presence of any r node erasures for some specified integer r; second, it can efficiently repair an erased node from any subset of remaining nodes with a given size. In the repair process, the amount of information transmitted from each node normalized by the storage size per node is termed repair bandwidth (fraction). When the storage size per node is minimized, the repair bandwidth is lower bounded by 1/r, where r is the number of parity nodes. A code attaining this lower bound is said to have optimal repair. We consider codes with minimum storage size per node and optimal repair, called minimum storage regenerating (MSR) codes. In particular, if an MSR code has r parities and any r erasures occur, then by transmitting all the information from the remaining nodes, the original file can be reconstructed. On the other hand, if only one erasure occurs, only a fraction of 1/r of the information in each remaining node needs to be transmitted. If we view each node as a vector or a column over some field, then the code forms a 2-D array. Given the length of the column l and the number of parities r, we explicitly construct high-rate MSR codes. The number of systematic nodes of our construction is (r + 1) log_r l, which is longer than previously known results. In addition, we construct MSR codes with other desirable properties: first, codes with low complexity when the information is updated, and second, codes with low access or storage node I/O cost during repair.https://authors.library.caltech.edu/records/17jmt-mh471Approximate sorting of data streams with limited storage
https://resolver.caltech.edu/CaltechAUTHORS:20161202-085415227
Authors: Farnoud (Hassanzadeh), Farzad; Yaakobi, Eitan; Bruck, Jehoshua
Year: 2016
DOI: 10.1007/s10878-015-9930-6
We consider the problem of approximate sorting of a data stream (in one pass) with limited internal storage where the goal is not to rearrange data but to output a permutation that reflects the ordering of the elements of the data stream as closely as possible. Our main objective is to study the relationship between the quality of the sorting and the amount of available storage. To measure quality, we use permutation distortion metrics, namely the Kendall tau, Chebyshev, and weighted Kendall metrics, as well as mutual information, between the output permutation and the true ordering of data elements. We provide bounds on the performance of algorithms with limited storage and present a simple algorithm that asymptotically requires a constant factor as much storage as an optimal algorithm in terms of mutual information and average Kendall tau distortion. We also study the case in which only information about the most recent elements of the stream is available. This setting has applications to learning user preference rankings in services such as Netflix, where items are presented to the user one at a time.https://authors.library.caltech.edu/records/qqdg7-0na97Duplication Distance to the Root for Binary Sequences
https://resolver.caltech.edu/CaltechAUTHORS:20161108-134615672
Authors: Alon, Noga; Bruck, Jehoshua; Farnoud (Hassanzadeh), Farzad; Jain, Siddharth
Year: 2016
We study the tandem duplication distance between binary sequences and their roots. In other words, the quantity of interest is the number of tandem duplication operations of the form x = abc → y = abbc, where x and y are sequences and a, b, and c are their substrings, needed to generate a binary sequence of length n starting from a square-free sequence from the set {0, 1, 01, 10, 010, 101}. This problem is a restricted case of finding the duplication/deduplication distance between two sequences, defined as the minimum number of duplication and deduplication operations required to transform one sequence to the other. We consider both exact and approximate tandem duplications. For exact duplication, denoting the maximum distance to the root of a sequence of length n by f(n), we prove that f(n) = Θ(n). For the case of approximate duplication, where a β-fraction of symbols may be duplicated incorrectly, we show that the maximum distance has a sharp transition from linear in n to logarithmic at β = 1/2. We also study the duplication distance to the root for sequences with a given root and for special classes of sequences, namely, the de Bruijn sequences, the Thue-Morse sequence, and the Fibonacci words. The problem is motivated by genomic tandem duplication mutations and the smallest number of tandem duplication events required to generate a given biological sequence.https://authors.library.caltech.edu/records/p1xt4-p6969Communication Efficient Secret Sharing
https://resolver.caltech.edu/CaltechAUTHORS:20161011-150403696
Authors: Huang, Wentao; Langberg, Michael; Kliewer, Joerg; Bruck, Jehoshua
Year: 2016
DOI: 10.1109/TIT.2016.2616144
A secret sharing scheme is a method to store information securely and reliably. Particularly, in a threshold secret sharing scheme, a secret is encoded into n shares, such that any set of at least t_1 shares suffice to decode the secret, and any set of at most t_2 < t_1 shares reveal no information about the secret. Assuming that each party holds a share and a user wishes to decode the secret by receiving information from a set of parties; the question we study is how to minimize the amount of communication between the user and the parties. We show that the necessary amount of communication, termed "decoding bandwidth", decreases as the number of parties that participate in decoding increases. We prove a tight lower bound on the decoding bandwidth, and construct secret sharing schemes achieving the bound. Particularly, we design a scheme that achieves the optimal decoding bandwidth when d parties participate in decoding, universally for all t_1 ≤ d ≤ n. The scheme is based on a generalization of Shamir's secret sharing scheme and preserves its simplicity and efficiency. In addition, we consider the setting of secure distributed storage where the proposed communication efficient secret sharing schemes not only improve decoding bandwidth but further improve disk access complexity during decoding.https://authors.library.caltech.edu/records/qtvry-hmz07Noise and Uncertainty in String-Duplication Systems
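The scheme above is described as a generalization of Shamir's secret sharing. As a point of reference, here is a minimal sketch of plain Shamir threshold sharing over a prime field (the communication-efficient generalization from the paper is not shown; the prime and function names are illustrative):

```python
import random

P = 2**61 - 1  # a Mersenne prime; the secret and shares live in GF(P)

def share(secret, t, n):
    """Split `secret` into n shares so that any t of them recover it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    # Share i is the polynomial evaluated at x = i.
    return [(x, sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(P)."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num = den = 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        secret = (secret + yj * num * pow(den, P - 2, P)) % P
    return secret
```

Any t = 3 of the n = 5 shares below suffice; fewer than 3 reveal nothing about the secret, since the random coefficients make the remaining evaluations uniform.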
https://resolver.caltech.edu/CaltechAUTHORS:20170119-133807104
Authors: Jain, Siddharth; Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2017
Duplication mutations play a critical role in the generation of biological sequences. Simultaneously, they have a deleterious effect on data stored using in-vivo DNA data storage. While duplications have been studied both as a sequence-generation mechanism and in the context of error correction, for simplicity these studies have not taken into account the presence of other types of mutations. In this work, we consider the capacity of duplication mutations in the presence of point-mutation noise, and so quantify the generation power of these mutations. We show that if the number of point mutations is vanishingly small compared to the number of duplication mutations of a constant length, the generation capacity of these mutations is zero. However, if the number of point mutations increases to a constant fraction of the number of duplications, then the capacity is nonzero. Lower and upper bounds for this capacity are also presented. Another problem that we study is concerned with the mismatch between code design and channel in data storage in the DNA of living organisms with respect to duplication mutations. In this context, we consider the uncertainty of such a mismatched coding scheme measured as the maximum number of input codewords that can lead to the same output.https://authors.library.caltech.edu/records/vq6gy-wh217Optimal Rebuilding of Multiple Erasures in MDS Codes
https://resolver.caltech.edu/CaltechAUTHORS:20170119-080421044
Authors: Wang, Zhiying; Tamo, Itzhak; Bruck, Jehoshua
Year: 2017
DOI: 10.1109/TIT.2016.2633411
Maximum distance separable (MDS) array codes are widely used in storage systems due to their computationally efficient encoding and decoding procedures. An MDS code with r redundancy nodes can correct any r node erasures by accessing (reading) all the remaining information in the surviving nodes. However, in practice, e erasures are a more likely failure event, for some 1≤ehttps://authors.library.caltech.edu/records/bk2bc-4f252Correcting errors by natural redundancy
https://resolver.caltech.edu/CaltechAUTHORS:20170907-081956775
Authors: Jiang, Anxiao (Andrew); Upadhyaya, Pulakesh; Haratsch, Erich F.; Bruck, Jehoshua
Year: 2017
DOI: 10.1109/ITA.2017.8023455
For the storage of big data, there are significant challenges with its long-term reliability. This paper studies how to use the natural redundancy in data for error correction, and how to combine it with error-correcting codes to effectively improve data reliability. It explores several aspects of natural redundancy, including the discovery of natural redundancy in compressed data, the efficient decoding of codes with random structures, the capacity of error-correcting codes that contain natural redundancy, and the time-complexity tradeoff between source coding and channel coding.https://authors.library.caltech.edu/records/bm7m7-z9t46Switch Codes: Codes for Fully Parallel Reconstruction
https://resolver.caltech.edu/CaltechAUTHORS:20170315-151626957
Authors: Wang, Zhiying; Kiah, Han Mao; Cassuto, Yuval; Bruck, Jehoshua
Year: 2017
DOI: 10.1109/TIT.2017.2664867
Network switches and routers scale in rate by distributing the packet read/write operations across multiple memory banks. Rate scaling is achieved so long as sufficiently many packets can be written and read in parallel. However, due to the non-determinism of the read process, parallel pending read requests may contend on memory banks, and thus significantly lower the switching rate. In this paper, we provide a constructive study of codes that guarantee fully parallel data reconstruction without contention. We call these codes "switch codes," and construct three optimal switch-code families with different parameters. All the constructions use only simple XOR-based encoding and decoding operations, an important advantage when operated in ultra-high speeds. Switch codes achieve their good performance by spanning simultaneous disjoint local-decoding sets for all their information symbols. Switch codes may be regarded as an extreme version of the previously studied batch codes, where the switch version requires parallel reconstruction of all the information symbols.https://authors.library.caltech.edu/records/323t5-36794Secure RAID schemes from EVENODD and STAR codes
https://resolver.caltech.edu/CaltechAUTHORS:20170816-162125720
Authors: Huang, Wentao; Bruck, Jehoshua
Year: 2017
DOI: 10.1109/ISIT.2017.8006600
We study secure RAID, i.e., low-complexity schemes to store information in a distributed manner that is resilient to node failures and resistant to node eavesdropping. We describe a technique to shorten the secure EVENODD scheme in [6], which can optimally tolerate 2 node failures and 2 eavesdropping nodes. The shortening technique allows us to obtain secure EVENODD schemes of arbitrary lengths, which is important for practical application. We also construct a new secure RAID scheme from the STAR code. The scheme can tolerate 3 node failures and 3 eavesdropping nodes with optimal encoding/decoding and random access complexity.https://authors.library.caltech.edu/records/62pqt-f5y63Secret sharing with optimal decoding and repair bandwidth
https://resolver.caltech.edu/CaltechAUTHORS:20170816-153318334
Authors: Huang, Wentao; Bruck, Jehoshua
Year: 2017
DOI: 10.1109/ISIT.2017.8006842
This paper studies the communication efficiency of threshold secret sharing schemes. We construct a family of Shamir's schemes with asymptotically optimal decoding bandwidth for arbitrary parameters. We also construct a family of secret sharing schemes with both optimal decoding and optimal repair bandwidth for arbitrary parameters. The construction leads to a family of regenerating codes allowing centralized repair of multiple node failures with small sub-packetization.https://authors.library.caltech.edu/records/6v3tf-gx561Noise and uncertainty in string-duplication systems
https://resolver.caltech.edu/CaltechAUTHORS:20170816-165117076
Authors: Jain, Siddharth; Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2017
DOI: 10.1109/ISIT.2017.8007104
Duplication mutations play a critical role in the generation of biological sequences. Simultaneously, they have a deleterious effect on data stored using in-vivo DNA data storage. While duplications have been studied both as a sequence-generation mechanism and in the context of error correction, for simplicity these studies have not taken into account the presence of other types of mutations. In this work, we consider the capacity of duplication mutations in the presence of point-mutation noise, and so quantify the generation power of these mutations. We show that if the number of point mutations is vanishingly small compared to the number of duplication mutations of a constant length, the generation capacity of these mutations is zero. However, if the number of point mutations increases to a constant fraction of the number of duplications, then the capacity is nonzero. Lower and upper bounds for this capacity are also presented. Another problem that we study is concerned with the mismatch between code design and channel in data storage in the DNA of living organisms with respect to duplication mutations. In this context, we consider the uncertainty of such a mismatched coding scheme measured as the maximum number of input codewords that can lead to the same output.https://authors.library.caltech.edu/records/vqckw-85z25Generic Secure Repair for Distributed Storage
https://resolver.caltech.edu/CaltechAUTHORS:20170713-092535943
Authors: Huang, Wentao; Bruck, Jehoshua
Year: 2017
DOI: 10.48550/arXiv.1706.00500
This paper studies the problem of repairing secret sharing schemes, i.e., schemes that encode a message into n shares, assigned to n nodes, so that any n − r nodes can decode the message but any colluding z nodes cannot infer any information about the message. In the event of node failures, so that shares held by the failed nodes are lost, the system needs to be repaired by reconstructing and reassigning the lost shares to the failed (or replacement) nodes. This can be achieved trivially by a trustworthy third party that receives the shares of the available nodes, recomputes the lost shares, and reassigns them. The interesting question, studied in this paper, is how to repair without a trustworthy third party. The main issue that arises is repair security: how to maintain the requirement that any colluding z nodes, including the failed nodes, cannot learn any information about the message, during and after the repair process. We solve this secure repair problem from the perspective of secure multi-party computation. Specifically, we design generic repair schemes that can securely repair any (scalar or vector) linear secret sharing scheme. We prove a lower bound on the repair bandwidth of secure repair schemes and show that the proposed secure repair schemes achieve the optimal repair bandwidth up to a small constant factor when n dominates z, or when the secret sharing scheme being repaired has optimal rate. We adopt a formal information-theoretic approach in our analysis and bounds. A main idea in our schemes is to allow a more flexible repair model than the straightforward one-round repair model implicitly assumed by existing secure regenerating codes. In particular, the proposed secure repair schemes are simple and efficient two-round protocols.https://authors.library.caltech.edu/records/28k7n-46f98Duplication-Correcting Codes for Data Storage in the DNA of Living Organisms
https://resolver.caltech.edu/CaltechAUTHORS:20170330-092251690
Authors: Jain, Siddharth; Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2017
DOI: 10.1109/TIT.2017.2688361
The ability to store data in the DNA of a living organism has applications in a variety of areas including synthetic biology and watermarking of patented genetically-modified organisms. Data stored in this medium is subject to errors arising from various mutations, such as point mutations, indels, and tandem duplication, which need to be corrected to maintain data integrity. In this paper, we provide error-correcting codes for errors caused by tandem duplications, which create a copy of a block of the sequence and insert it in a tandem manner, i.e., next to the original. In particular, we present two families of codes for correcting errors due to tandem duplications of a fixed length; the first family can correct any number of errors while the second corrects a bounded number of errors. We also study codes for correcting tandem duplications of length up to a given constant k, where we are primarily focused on the cases of k = 2, 3. Finally, we provide a full classification of the sets of lengths allowed in tandem duplication that result in a unique root for all sequences.https://authors.library.caltech.edu/records/fwh9p-xx682Stopping Set Elimination for LDPC Codes
https://resolver.caltech.edu/CaltechAUTHORS:20180125-132316726
Authors: Jiang, Anxiao (Andrew); Upadhyaya, Pulakesh; Wang, Ying; Narayanan, Krishna R.; Zhou, Hongchao; Sima, Jin; Bruck, Jehoshua
Year: 2017
DOI: 10.1109/ALLERTON.2017.8262806
This work studies the Stopping-Set Elimination Problem: given a stopping set, how to remove the fewest erasures so that the remaining erasures can be decoded by belief propagation in k iterations (including k = ∞). The NP-hardness of the problem is proven, an approximation algorithm is presented for k = 1, and efficient exact algorithms are presented for general k when the stopping sets form trees.https://authors.library.caltech.edu/records/vnf04-m3829Capacity and Expressiveness of Genomic Tandem Duplication
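Belief-propagation decoding over the erasure channel, to which the stopping-set problem applies, reduces to iterative "peeling": any parity check with exactly one erased symbol resolves that symbol; decoding stalls exactly on a stopping set. A minimal sketch of this standard peeling decoder (illustrative, not the paper's elimination algorithm):

```python
def peel(checks, received):
    """Resolve erasures (None) using even-parity checks.

    checks: list of checks, each a list of variable indices.
    received: list of bits, with None marking an erasure.
    Returns the word after peeling; unresolved Nones indicate a stopping set.
    """
    word = list(received)
    while True:
        progress = False
        for check in checks:
            unknown = [i for i in check if word[i] is None]
            if len(unknown) == 1:
                # Parity must be even, so the erased bit is the XOR of the rest.
                word[unknown[0]] = sum(word[i] for i in check
                                       if word[i] is not None) % 2
                progress = True
        if not progress:
            return word
```

With checks {x0+x1+x2 = 0, x2+x3+x4 = 0}, the erasures in positions 2 and 3 peel off in sequence; a check whose erased variables all overlap another stalled check would remain unresolved.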
https://resolver.caltech.edu/CaltechAUTHORS:20170719-165632439
Authors: Jain, Siddharth; Farnoud (Hassanzadeh), Farzad; Bruck, Jehoshua
Year: 2017
DOI: 10.1109/TIT.2017.2728079
The majority of the human genome consists of repeated sequences. An important type of repeated sequence common in the human genome is the tandem repeat, where identical copies appear next to each other. For example, in the sequence AGTCTGTGC, TGTG is a tandem repeat that may be generated from AGTCTGC by a tandem duplication of length 2. In this work, we investigate the possibility of generating a large number of sequences from a seed, i.e., a small initial string, by tandem duplications of bounded length. We study the capacity of such a system, a notion that quantifies the system's generating power. Our results include exact capacity values for certain tandem duplication string systems. In addition, motivated by the role of DNA sequences in expressing proteins via RNA and the genetic code, we define the notion of the expressiveness of a tandem duplication system as the capability of expressing arbitrary substrings. We then completely characterize the expressiveness of tandem duplication systems for general alphabet sizes and duplication lengths. In particular, based on a celebrated result by Axel Thue from 1906, presenting a construction for ternary square-free sequences, we show that for alphabets of size 4 or larger, bounded tandem duplication systems, regardless of the seed and the bound on duplication length, are not fully expressive, i.e., they cannot generate all strings even as substrings of other strings. Note that the alphabet of size 4 is of particular interest as it pertains to the genomic alphabet. Building on this result, we also show that these systems do not have full capacity. In general, our results illustrate that duplication lengths play a more significant role than the seed in generating a large number of sequences for these systems.https://authors.library.caltech.edu/records/f06ka-6gp96Duplication Distance to the Root for Binary Sequences
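A bounded tandem duplication string system like the one above can be explored by brute force for small cases. The sketch below (illustrative only; the paper derives capacity analytically) enumerates every string reachable from a seed by tandem duplications of length at most k, up to a length cap:

```python
def tandem_closure(seed, k, max_len):
    """All strings reachable from `seed` by tandem duplications of
    length <= k, truncated at length max_len (brute-force enumeration)."""
    seen, frontier = {seed}, [seed]
    while frontier:
        s = frontier.pop()
        for L in range(1, min(k, len(s)) + 1):
            for i in range(len(s) - L + 1):
                # Duplicate the block s[i:i+L] in tandem (abc -> abbc).
                t = s[:i + L] + s[i:i + L] + s[i + L:]
                if len(t) <= max_len and t not in seen:
                    seen.add(t)
                    frontier.append(t)
    return seen
```

For instance, with seed "01" and duplication length 1, only strings of the form 0...01...1 are reachable, which already hints at how the duplication-length bound limits generating power.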
https://resolver.caltech.edu/CaltechAUTHORS:20170726-162754925
Authors: Alon, Noga; Bruck, Jehoshua; Farnoud (Hassanzadeh), Farzad; Jain, Siddharth
Year: 2017
DOI: 10.1109/TIT.2017.2730864
We study the tandem duplication distance between binary sequences and their roots. In other words, the quantity of interest is the number of tandem duplication operations of the form x = abc → y = abbc, where x and y are sequences and a, b, and c are their substrings, needed to generate a binary sequence of length n starting from a square-free sequence from the set {0, 1, 01, 10, 010, 101}. This problem is a restricted case of finding the duplication/deduplication distance between two sequences, defined as the minimum number of duplication and deduplication operations required to transform one sequence to the other. We consider both exact and approximate tandem duplications. For exact duplication, denoting the maximum distance to the root of a sequence of length n by f(n), we prove that f(n) = Θ(n). For the case of approximate duplication, where a β-fraction of symbols may be duplicated incorrectly, we show that the maximum distance has a sharp transition from linear in n to logarithmic at β = 1/2. We also study the duplication distance to the root for the set of sequences arising from a given root and for special classes of sequences, namely, the De Bruijn sequences, the Thue-Morse sequence, and the Fibonacci words. The problem is motivated by genomic tandem duplication mutations and the smallest number of tandem duplication events required to generate a given biological sequence.https://authors.library.caltech.edu/records/vnqxp-zs828Probabilistic switching circuits in DNA
https://resolver.caltech.edu/CaltechAUTHORS:20180117-072812871
Authors: Wilhelm, Daniel; Bruck, Jehoshua; Qian, Lulu
Year: 2018
DOI: 10.1073/pnas.1715926115
PMCID: PMC5798357
A natural feature of molecular systems is their inherent stochastic behavior. A fundamental challenge related to the programming of molecular information processing systems is to develop a circuit architecture that controls the stochastic states of individual molecular events. Here we present a systematic implementation of probabilistic switching circuits, using DNA strand displacement reactions. Exploiting the intrinsic stochasticity of molecular interactions, we developed a simple, unbiased DNA switch: An input signal strand binds to the switch and releases an output signal strand with probability one-half. Using this unbiased switch as a molecular building block, we designed DNA circuits that convert an input signal to an output signal with any desired probability. Further, this probability can be switched between 2^n different values by simply varying the presence or absence of n distinct DNA molecules. We demonstrated several DNA circuits that have multiple layers and feedback, including a circuit that converts an input strand to an output strand with eight different probabilities, controlled by the combination of three DNA molecules. These circuits combine the advantages of digital and analog computation: They allow a small number of distinct input molecules to control a diverse signal range of output molecules, while keeping the inputs robust to noise and the outputs at precise values. Moreover, arbitrarily complex circuit behaviors can be implemented with just a single type of molecular building block.https://authors.library.caltech.edu/records/9wjw1-zhb03Attaining the 2nd Chargaff Rule by Tandem Duplications
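The cascade idea above, composing unbiased one-half switches to realize any probability with an n-bit binary expansion, can be mimicked in software. This sketch is an analogue of that standard coin-cascade construction, not the DNA implementation; the function name is illustrative:

```python
import random

def biased_output(bits, rng=random.random):
    """Realize P(output = 1) = 0.b1b2...bn in binary using only fair coins.

    At stage i an unbiased switch flips a coin; the first coin that differs
    from threshold bit b_i decides the output (coin 0 under threshold 1 -> 1).
    """
    for b in bits:
        coin = 1 if rng() < 0.5 else 0
        if coin != b:
            return 1 if coin < b else 0
    return 0  # every coin matched the threshold exactly
```

With bits (1, 0, 1) the output probability is 1/2 + 0/4 + 1/8 = 5/8, and changing the bit pattern switches among 2^n values, mirroring how the presence or absence of n control molecules selects the circuit's probability.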
https://resolver.caltech.edu/CaltechAUTHORS:20180105-092230028
Authors: Jain, Siddharth; Raviv, Netanel; Bruck, Jehoshua
Year: 2018
Erwin Chargaff in 1950 made an experimental observation that the count of A is equal to the count of T and the count of C is equal to the count of G in DNA. This observation played a crucial role in the discovery of the double-stranded helix structure by Watson and Crick. However, this symmetry was also observed in single-stranded DNA. This phenomenon was termed the 2nd Chargaff Rule. This symmetry has been verified experimentally in genomes of several different species, not only for mononucleotides but also for reverse complement pairs of larger lengths, up to a small error. While the symmetry in double-stranded DNA is related to base pairing and replication mechanisms, the symmetry in single-stranded DNA is still a mystery in its function and source. In this work, we define a sequence generation model based on reverse complement tandem duplications. We show that this model generates sequences that satisfy the 2nd Chargaff Rule even when the duplication lengths are very small when compared to the length of sequences. We also provide estimates on the number of generations that are needed by this model to generate sequences that satisfy the 2nd Chargaff Rule. We provide theoretical bounds on the disruption in symmetry for different values of duplication lengths under this model. Moreover, we experimentally compare the disruption in the symmetry incurred by our model with what is observed in human genome data.https://authors.library.caltech.edu/records/cs627-f8728Stash in a Flash
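The reverse-complement tandem duplication model is easy to simulate in a simplified form. The sketch below (a mononucleotide illustration under assumed parameters, not the paper's exact model) repeatedly picks a block and inserts its reverse complement right after it; `pick` is a hypothetical deterministic chooser used for reproducibility:

```python
RC = str.maketrans("ACGT", "TGCA")

def rc(s):
    """Reverse complement of a DNA string."""
    return s.translate(RC)[::-1]

def grow(seed, steps, L, pick):
    """Reverse-complement tandem duplication: at each step a length-L block
    b is chosen via pick(step) and rc(b) is inserted immediately after it."""
    s = seed
    for step in range(steps):
        i = pick(step) % (len(s) - L + 1)
        s = s[:i + L] + rc(s[i:i + L]) + s[i + L:]
    return s
```

Each insertion adds the complement multiset of the chosen block, which is the mechanism pushing A/T and C/G counts toward the single-strand symmetry of the 2nd Chargaff Rule.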
https://resolver.caltech.edu/CaltechAUTHORS:20180308-133517936
Authors: Zuck, Aviad; Li, Yue; Bruck, Jehoshua; Porter, Donald E.; Tsafrir, Dan
Year: 2018
Encryption is a useful tool to protect data confidentiality. Yet it is still challenging to hide the very presence of encrypted, secret data from a powerful adversary. This paper presents a new technique to hide data in flash by manipulating the voltage level of pseudo-randomly selected flash cells to encode two bits (rather than one) in the cell. In this model, we have one "public" bit interpreted using an SLC-style encoding, and extract a private bit using an MLC-style encoding. The locations of cells that encode hidden data are based on a secret key known only to the hiding user. Intuitively, this technique requires that the voltage level in a cell encoding hidden data must be (1) statistically indistinguishable from that of a cell storing only public data, and (2) reliably readable by the user who hid the data. Our key insight is that there is a wide enough variation in the range of voltage levels in a typical flash device to obscure the presence of fine-grained changes to a small fraction of the cells, and that the variation is wide enough to support reliably re-reading hidden data. We demonstrate that our hidden data and underlying voltage manipulations go undetected by support-vector-machine-based supervised learning, which performs similarly to a random guess. The error rates of our scheme are low enough that the data is recoverable months after being stored. Compared to prior work, our technique provides 24x and 50x higher encoding and decoding throughput and doubles the capacity, while being 37x more power efficient.https://authors.library.caltech.edu/records/msxyd-8x553Two Deletion Correcting Codes from Indicator Vectors
https://resolver.caltech.edu/CaltechAUTHORS:20180709-103747373
Authors: Sima, Jin; Raviv, Netanel; Bruck, Jehoshua
Year: 2018
DOI: 10.1109/ISIT.2018.8437868
Construction of capacity-achieving deletion correcting codes has been a baffling challenge for decades. A recent breakthrough by Brakensiek et al., alongside novel applications in DNA storage, has reignited interest in this longstanding open problem. In spite of recent advances, the amount of redundancy in existing codes is still orders of magnitude away from being optimal. In this paper, a novel approach for constructing binary two-deletion correcting codes is proposed. By this approach, parity symbols are computed from indicator vectors (i.e., vectors that indicate the positions of certain patterns) of the encoded message, rather than from the message itself. Most interestingly, the parity symbols and the proof of correctness are a direct generalization of their counterparts in the Varshamov-Tenengolts construction. Our techniques require 7 log(n) + o(log(n)) redundant bits to encode an n-bit message, which is near-optimal.https://authors.library.caltech.edu/records/g7qc8-w6s47Stash in a Flash
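The Varshamov-Tenengolts construction that these parities generalize can be illustrated in the single-deletion case: codewords of length n share a fixed weighted checksum mod n+1, and one deletion is corrected by re-inserting the unique bit that restores it. A minimal sketch (brute-force insertion for clarity, not the standard linear-time decoder):

```python
def vt_syndrome(x):
    """Varshamov-Tenengolts checksum: sum of i * x_i (1-indexed) mod (n+1)."""
    return sum(i * b for i, b in enumerate(x, 1)) % (len(x) + 1)

def vt_correct(received, n, a):
    """Recover the unique length-n word with syndrome a that is a
    supersequence of the one-deletion-corrupted `received`."""
    for pos in range(n):
        for bit in (0, 1):
            cand = received[:pos] + [bit] + received[pos:]
            if vt_syndrome(cand) == a:
                return cand
    return None
```

The VT single-deletion property guarantees that exactly one insertion restores the original syndrome, so the brute-force search returns the transmitted codeword.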
https://resolver.caltech.edu/CaltechAUTHORS:20190328-165908772
Authors: Zuck, Aviad; Li, Yue; Bruck, Jehoshua; Porter, Donald E.; Tsafrir, Dan
Year: 2018
DOI: 10.1145/3211890.3211906
[no abstract]https://authors.library.caltech.edu/records/56bqy-q0610How to Best Share a Big Secret
https://resolver.caltech.edu/CaltechAUTHORS:20180828-142513016
Authors: Shor, Roman; Yadgar, Gala; Huang, Wentao; Yaakobi, Eitan; Bruck, Jehoshua
Year: 2018
DOI: 10.1145/3211890.3211896
When sensitive data is stored in the cloud, the only way to ensure its secrecy is by encrypting it before it is uploaded. The emerging multi-cloud model, in which data is stored redundantly in two or more independent clouds, provides an opportunity to protect sensitive data with secret-sharing schemes. Both data-protection approaches are considered computationally expensive, but recent advances reduce their costs considerably: (1) Hardware acceleration methods promise to eliminate the computational complexity of encryption, but leave clients with the challenge of securely managing encryption keys. (2) Secure RAID, a recently proposed scheme, minimizes the computational overheads of secret sharing, but requires non-negligible storage overhead and random data generation. Each data-protection approach offers different tradeoffs and security guarantees. However, when comparing them, it is difficult to determine which approach will provide the best application-perceived performance, because previous studies were performed before their recent advances were introduced.
To bridge this gap, we present the first end-to-end comparison of state-of-the-art encryption-based and secret sharing data protection approaches. Our evaluation on a local cluster and on a multi-cloud prototype identifies the tipping point at which the bottleneck of data protection shifts from the computational overhead of encoding and random data generation to storage and network bandwidth and global availability.https://authors.library.caltech.edu/records/q4798-q1h04Attaining the 2nd Chargaff Rule by Tandem Duplications
https://resolver.caltech.edu/CaltechAUTHORS:20181126-143839274
Authors: Jain, Siddharth; Raviv, Netanel; Bruck, Jehoshua
Year: 2018
DOI: 10.1109/ISIT.2018.8437526
Erwin Chargaff made an experimental observation in 1950 that in DNA the count of A equals the count of T and the count of C equals the count of G. This observation played a crucial role in the discovery of the double-stranded helix structure by Watson and Crick. However, this symmetry was also observed in single-stranded DNA, a phenomenon termed the 2nd Chargaff Rule. This symmetry has been verified experimentally in the genomes of several different species, not only for mononucleotides but also for reverse complement pairs of larger lengths, up to a small error. While the symmetry in double-stranded DNA is related to base pairing and replication mechanisms, the function and source of the symmetry in single-stranded DNA remain a mystery. In this work, we define a sequence generation model based on reverse complement tandem duplications. We show that this model generates sequences that satisfy the 2nd Chargaff Rule even when the duplication lengths are very small compared to the length of the sequences. We also provide estimates on the number of generations that are needed by this model to generate sequences that satisfy the 2nd Chargaff Rule, and theoretical bounds on the disruption in symmetry for different values of duplication lengths under this model. Moreover, we experimentally compare the disruption in symmetry incurred by our model with what is observed in human genome data.https://authors.library.caltech.edu/records/fa2gc-evf17Two Deletion Correcting Codes from Indicator Vectors
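As a toy illustration (not code from the paper), the reverse-complement tandem duplication model described in the abstract above can be sketched as follows; the function names and the simple symmetry measure are my own choices:

```python
import random

DNA_COMP = {"A": "T", "T": "A", "C": "G", "G": "C"}

def revcomp(s):
    """Reverse complement of a DNA string."""
    return "".join(DNA_COMP[c] for c in reversed(s))

def rc_tandem_duplication_step(seq, k):
    """One mutation step: pick a random length-k block and insert its
    reverse complement immediately after it."""
    i = random.randrange(len(seq) - k + 1)
    block = seq[i:i + k]
    return seq[:i + k] + revcomp(block) + seq[i + k:]

def chargaff_gap(seq):
    """Normalized mononucleotide deviation from the 2nd Chargaff rule:
    (|#A - #T| + |#C - #G|) / length; 0 means perfect symmetry."""
    return (abs(seq.count("A") - seq.count("T"))
            + abs(seq.count("C") - seq.count("G"))) / len(seq)
```

Each step grows the sequence by k while changing the A/T and C/G imbalances by at most k, so running many steps from a short seed and watching `chargaff_gap` shrink is a quick empirical check of the abstract's claim.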
https://resolver.caltech.edu/CaltechAUTHORS:20180709-102730008
Authors: Sima, Jin; Raviv, Netanel; Bruck, Jehoshua
Year: 2018
Construction of capacity achieving deletion correcting codes has been a baffling challenge for decades. A recent breakthrough by Brakensiek et al., alongside novel applications in DNA storage, has reignited the interest in this longstanding open problem. In spite of recent advances, the amount of redundancy in existing codes is still orders of magnitude away from being optimal. In this paper, a novel approach for constructing binary two-deletion correcting codes is proposed. By this approach, parity symbols are computed from indicator vectors (i.e., vectors that indicate the positions of certain patterns) of the encoded message, rather than from the message itself. Most interestingly, the parity symbols and the proof of correctness are a direct generalization of their counterparts in the Varshamov-Tenengolts construction. Our techniques require 7 log(n) + o(log(n)) redundant bits to encode an n-bit message, which is near-optimal.https://authors.library.caltech.edu/records/v820p-vn536Secure RAID Schemes from EVENODD and STAR Codes
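Since the construction above directly generalizes the Varshamov-Tenengolts (VT) parity, a minimal sketch of the classic single-deletion VT code may be a useful point of reference (this is the textbook construction, not the paper's two-deletion code):

```python
def vt_syndrome(x):
    """Varshamov-Tenengolts syndrome: sum of i * x_i (1-indexed) mod (n + 1)."""
    n = len(x)
    return sum(i * b for i, b in enumerate(x, start=1)) % (n + 1)

def vt_decode(y, n, a=0):
    """Recover a length-n VT codeword (syndrome a) from y, a copy of the
    codeword with one bit deleted: try every single-bit insertion and keep
    the unique candidate whose syndrome matches."""
    candidates = {tuple(y[:i]) + (b,) + tuple(y[i:])
                  for i in range(len(y) + 1) for b in (0, 1)}
    hits = [c for c in candidates if vt_syndrome(c) == a]
    assert len(hits) == 1  # VT codes guarantee a unique consistent codeword
    return hits[0]
```

The fixed-syndrome constraint is what costs the code its single log(n)-sized redundant symbol, which is the quantity the 7 log(n) + o(log(n)) figure above generalizes to two deletions.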
https://resolver.caltech.edu/CaltechAUTHORS:20180709-101600551
Authors: Huang, Wentao; Bruck, Jehoshua
Year: 2018
We study secure RAID, i.e., low-complexity schemes to store information in a distributed manner that is resilient to node failures and resistant to node eavesdropping. We describe a technique to shorten the secure EVENODD scheme in [6], which can optimally tolerate 2 node failures and 2 eavesdropping nodes. The shortening technique allows us to obtain secure EVENODD schemes of arbitrary lengths, which is important for practical application. We also construct a new secure RAID scheme from the STAR code. The scheme can tolerate 3 node failures and 3 eavesdropping nodes with optimal encoding/decoding and random access complexity.https://authors.library.caltech.edu/records/j9nc6-ns538Secret Sharing with Optimal Decoding and Repair Bandwidth
https://resolver.caltech.edu/CaltechAUTHORS:20180709-102239656
Authors: Huang, Wentao; Bruck, Jehoshua
Year: 2018
This paper studies the communication efficiency of threshold secret sharing schemes. We construct a family of Shamir's schemes with asymptotically optimal decoding bandwidth for arbitrary parameters. We also construct a family of secret sharing schemes with both optimal decoding bandwidth and optimal repair bandwidth for arbitrary parameters. The construction also leads to a family of regenerating codes allowing centralized repair of multiple node failures with small sub-packetization.https://authors.library.caltech.edu/records/yfm4q-xt198The Capacity of Some Pólya String Models
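For reference, the threshold schemes studied above build on Shamir's secret sharing; a minimal sketch over a small prime field (the field size and parameters are illustrative choices, and this ignores the paper's bandwidth optimizations) is:

```python
import random

P = 2**13 - 1  # prime modulus for the field GF(P); 8191 is an illustrative choice

def share(secret, n, t):
    """Split `secret` into n Shamir shares so that any t recover it:
    evaluate a random degree-(t-1) polynomial with constant term `secret`."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, j, P) for j, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(P); needs at least t shares."""
    secret = 0
    for x_i, y_i in shares:
        num = den = 1
        for x_j, _ in shares:
            if x_j != x_i:
                num = num * (-x_j) % P
                den = den * (x_i - x_j) % P
        secret = (secret + y_i * num * pow(den, P - 2, P)) % P
    return secret
```

In this naive decoder every participating share is downloaded in full; the decoding-bandwidth question the paper addresses is how much of each share actually needs to be communicated when more than t nodes participate.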
https://resolver.caltech.edu/CaltechAUTHORS:20180820-100255874
Authors: Elishco, Ohad; Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2018
We study random string-duplication systems, which we call Pólya string models. These are motivated by DNA storage in living organisms, and certain random mutation processes that affect their genome. Unlike previous works that study the combinatorial capacity of string-duplication systems, or various string statistics, this work provides exact capacity or bounds on it, for several probabilistic models. In particular, we study the capacity of noisy string-duplication systems, including the tandem-duplication, end-duplication, and interspersed-duplication systems. Interesting connections are drawn between some systems and the signature of random permutations, as well as to the beta distribution common in population genetics.https://authors.library.caltech.edu/records/f8j16-pfh10On Coding over Sliced Information
https://resolver.caltech.edu/CaltechAUTHORS:20181002-162910265
Authors: Sima, Jin; Raviv, Netanel; Bruck, Jehoshua
Year: 2018
The interest in channel models in which the data is sent as an unordered set of binary strings has increased lately, due to emerging applications in DNA storage, among others. In this paper we analyze the minimal redundancy of binary codes for this channel under substitution errors, and provide several constructions, some of which are shown to be asymptotically optimal. The surprising result in this paper is that while the information vector is sliced into a set of unordered strings, the amount of redundant bits that are required to correct errors is asymptotically equal to the amount required in the classical error correcting paradigm.https://authors.library.caltech.edu/records/gvax7-79g91Short Tandem Repeats Information in TCGA is Statistically Biased by Amplification
https://resolver.caltech.edu/CaltechAUTHORS:20190114-091231818
Authors: Jain, Siddharth; Mazaheri, Bijan; Raviv, Netanel; Bruck, Jehoshua
Year: 2019
DOI: 10.1101/518878
The current paradigm in data science is based on the belief that given sufficient amounts of data, classifiers are likely to uncover the distinction between true and false hypotheses. In particular, the abundance of genomic data creates opportunities for discovering disease risk associations and for helping in screening and treatment. However, working with large amounts of data is statistically beneficial only if the data is statistically unbiased. Here we demonstrate that amplification methods of DNA samples in TCGA have a substantial effect on short tandem repeat (STR) information. In particular, we design a classifier that uses the STR information and can distinguish between samples that have an analyte code D and an analyte code W. This artificial bias might be detrimental to data-driven approaches, and might undermine the conclusions based on past and future genome-wide studies.https://authors.library.caltech.edu/records/j3fdk-efx04Cancer Classification from Healthy DNA using Machine Learning
https://resolver.caltech.edu/CaltechAUTHORS:20190114-074334836
Authors: Jain, Siddharth; Mazaheri, Bijan; Raviv, Netanel; Bruck, Jehoshua
Year: 2019
DOI: 10.1101/517839
The genome is traditionally viewed as a time-independent source of information; a paradigm that drives researchers to seek correlations between the presence of certain genes and a patient's risk of disease. This analysis neglects genomic temporal changes, which we believe to be a crucial signal for predicting an individual's susceptibility to cancer. We hypothesize that each individual's genome passes through an evolution channel (the term channel is motivated by the notion of a communication channel introduced by Shannon in 1948, which founded the field of Information Theory) that is controlled by hereditary, environmental, and stochastic factors. This channel differs among individuals, giving rise to varying predispositions to developing cancer. We introduce the concept of mutation profiles that are computed without any comparative analysis, but by analyzing the short tandem repeat regions in a single healthy genome and capturing information about the individual's evolution channel. Using machine learning on data from more than 5,000 TCGA cancer patients, we demonstrate that these mutation profiles can accurately distinguish between patients with various types of cancer. For example, the pairwise validation accuracy of the classifier between PAAD (pancreas) patients and GBM (brain) patients is 93%. Our results show that healthy unaffected cells still contain a cancer-specific signal, which opens the possibility of cancer prediction from a healthy genome.https://authors.library.caltech.edu/records/rtdrc-qrh95Estimation of duplication history under a stochastic model for tandem repeats
https://resolver.caltech.edu/CaltechAUTHORS:20190211-084750757
Authors: Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2019
DOI: 10.1186/s12859-019-2603-1
Background: Tandem repeat sequences are common in the genomes of many organisms and are known to cause important phenomena such as gene silencing and rapid morphological changes. Due to the presence of multiple copies of the same pattern in tandem repeats and their high variability, they contain a wealth of information about the mutations that have led to their formation. The ability to extract this information can enhance our understanding of evolutionary mechanisms.
Results: We present a stochastic model for the formation of tandem repeats via tandem duplication and substitution mutations. Based on the analysis of this model, we develop a method for estimating the relative mutation rates of duplications and substitutions, as well as the total number of mutations, in the history of a tandem repeat sequence. We validate our estimation method via Monte Carlo simulation and show that it outperforms the state-of-the-art algorithm for discovering the duplication history. We also apply our method to tandem repeat sequences in the human genome, where it demonstrates the different behaviors of micro- and mini-satellites and can be used to compare mutation rates across chromosomes. It is observed that chromosomes that exhibit the highest mutation activity in tandem repeat regions are the same as those thought to have the highest overall mutation rates. However, unlike previous works that rely on comparing human and chimpanzee genomes to measure mutation rates, the proposed method allows us to find chromosomes with the highest mutation activity based on a single genome, in essence by comparing (approximate) copies of the pattern in tandem repeats.
Conclusion: The prevalence of tandem repeats in most organisms and the efficiency of the proposed method enable studying various aspects of the formation of tandem repeats and the surrounding sequences in a wide range of settings.https://authors.library.caltech.edu/records/wxrqy-7c610Download and Access Trade-offs in Lagrange Coded Computing
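The stochastic model described above combines tandem duplications with substitutions; a toy simulator of that kind of process (not the paper's model or estimator, and with arbitrarily chosen parameters) might look like:

```python
import random

def evolve_tandem_repeat(seed, n_mut, p_dup, k=2, alphabet="ACGT", rng=None):
    """Apply n_mut mutations to `seed`: with probability p_dup a length-k
    tandem duplication (copy a block in place), otherwise a substitution
    of one random symbol."""
    rng = rng or random.Random(0)
    s = list(seed)
    for _ in range(n_mut):
        if rng.random() < p_dup and len(s) >= k:
            i = rng.randrange(len(s) - k + 1)
            s[i + k:i + k] = s[i:i + k]      # tandem duplication
        else:
            i = rng.randrange(len(s))
            s[i] = rng.choice(alphabet)      # substitution
    return "".join(s)
```

Generating repeats under known rates like this is also how one would validate a rate estimator by Monte Carlo simulation, as the abstract describes.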
https://resolver.caltech.edu/CaltechAUTHORS:20190220-123432908
Authors: Raviv, Netanel; Yu, Qian; Bruck, Jehoshua; Avestimehr, Salman
Year: 2019
Lagrange Coded Computing (LCC) is a recently proposed technique for resilient, secure, and private computation of arbitrary polynomials in distributed environments. By mapping such computations to composition of polynomials, LCC allows the master node to complete the computation by accessing a minimal number of workers and downloading all of their content, thus providing resiliency to the remaining stragglers. However, in the most common case in which the number of stragglers is less than in the worst case scenario, much of the computational power of the system remains unexploited. To amend this issue, in this paper we expand LCC by studying a fundamental trade-off between download and access, and present two contributions. In the first contribution, it is shown that without any modification to the encoding process, the master can decode the computations by accessing a larger number of nodes, however downloading less information from each node in comparison with LCC (i.e., trading access for download). This scheme relies on decoding a particular polynomial in the ideal that is generated by the polynomials of interest, a technique we call Ideal Decoding. This new scheme also improves LCC in the sense that for systems with adversaries, the overall downloaded bandwidth is smaller than in LCC. In the second contribution we study a real-time model of this trade-off, in which the data from the workers is downloaded sequentially. By clustering nodes of similar delays and encoding the function with Universally Decodable Matrices, the master can decode once sufficient data is downloaded from every cluster, regardless of the internal delays within that cluster. This allows the master to utilize the partial work that is done by stragglers, rather than to ignore it, a feature that most past works in coded computing are lacking.https://authors.library.caltech.edu/records/ejevm-acb46On the Uncertainty of Information Retrieval in Associative Memories
https://resolver.caltech.edu/CaltechAUTHORS:20181101-121348346
Authors: Yaakobi, Eitan; Bruck, Jehoshua
Year: 2019
DOI: 10.1109/tit.2018.2878750
We (people) are memory machines. Our decision processes, emotions, and interactions with the world around us are based on and driven by associations to our memories. This natural association paradigm will become critical in future memory systems, namely, the key question will not be "How do I store more information?" but rather, "Do I have the relevant information? How do I retrieve it?"
The focus of this paper is to make a first step in this direction. We define and solve a very basic problem in associative retrieval. Given a word W, the words in the memory that are t-associated with W are the words in the ball of radius t around W. In general, given a set of words, say W, X and Y, the words that are t-associated with {W, X, Y} are those in the memory that are within distance t from all the three words. Our main goal is to study the maximum size of the t-associated set as a function of the number of input words and the minimum distance of the words in memory - we call this value the uncertainty of an associative memory. In this work we consider the Hamming distance and derive the uncertainty of the associative memory that consists of all the binary vectors with an arbitrary number of input words. In addition, we study the retrieval problem, namely, how do we get the t-associated set given the inputs? We note that this paradigm is a generalization of the sequences reconstruction problem that was proposed by Levenshtein (2001). In this model, a word is transmitted over multiple channels. A decoder receives all the channel outputs and decodes the transmitted word. Levenshtein computed the minimum number of channels that guarantee a successful decoder - this value happens to be the uncertainty of an associative memory with two input words.https://authors.library.caltech.edu/records/w6ppr-j3j45Optimal k-Deletion Correcting Codes
https://resolver.caltech.edu/CaltechAUTHORS:20190826-143512243
Authors: Sima, Jin; Bruck, Jehoshua
Year: 2019
DOI: 10.1109/ISIT.2019.8849750
Levenshtein introduced the problem of constructing k-deletion correcting codes in 1966, proved that the optimal redundancy of those codes is O(k log N), and proposed an optimal redundancy single-deletion correcting code (using the so-called VT construction). However, the problem of constructing optimal redundancy k-deletion correcting codes remained open. Our key contribution is a solution to this longstanding open problem. We present a k-deletion correcting code that has redundancy 8k log n + o(log n) and encoding/decoding algorithms of complexity O(n^(2k+1)) for constant k.https://authors.library.caltech.edu/records/b8mh8-1ja32On Coding Over Sliced Information
https://resolver.caltech.edu/CaltechAUTHORS:20191004-100333511
Authors: Sima, Jin; Raviv, Netanel; Bruck, Jehoshua
Year: 2019
DOI: 10.1109/isit.2019.8849596
The interest in channel models in which the data is sent as an unordered set of binary strings has increased lately, due to emerging applications in DNA storage, among others. In this paper we analyze the minimal redundancy of binary codes for this channel under substitution errors, and provide a code construction for a single substitution that is shown to be asymptotically optimal up to constants. The surprising result in this paper is that while the information vector is sliced into a set of unordered strings, the amount of redundant bits that are required to correct errors is orderwise equivalent to the amount required in the classical error correcting paradigm.https://authors.library.caltech.edu/records/7mswn-cjs38Download and Access Trade-offs in Lagrange Coded Computing
https://resolver.caltech.edu/CaltechAUTHORS:20191004-100332096
Authors: Raviv, Netanel; Yu, Qian; Bruck, Jehoshua; Avestimehr, Salman
Year: 2019
DOI: 10.1109/isit.2019.8849547
Lagrange Coded Computing (LCC) is a recently proposed technique for resilient, secure, and private computation of arbitrary polynomials in distributed environments. By mapping such computations to composition of polynomials, LCC allows the master node to complete the computation by accessing a minimal number of workers and downloading all of their content, thus providing resiliency to the remaining stragglers. However, in the most common case in which the number of stragglers is less than in the worst case scenario, much of the computational power of the system remains unexploited. To amend this issue, in this paper we expand LCC by studying a fundamental trade-off between download and access, and present two contributions. In the first contribution, it is shown that without any modification to the encoding process, the master can decode the computations by accessing a larger number of nodes, however downloading less information from each node in comparison with LCC (i.e., trading access for download). This scheme relies on decoding a particular polynomial in the ideal that is generated by the polynomials of interest, a technique we call Ideal Decoding. This new scheme also improves LCC in the sense that for systems with adversaries, the overall downloaded bandwidth is smaller than in LCC. In the second contribution we study a real-time model of this trade-off, in which the data from the workers is downloaded sequentially. By clustering nodes of similar delays and encoding the function with Universally Decodable Matrices, the master can decode once sufficient data is downloaded from every cluster, regardless of the internal delays within that cluster. This allows the master to utilize the partial work that is done by stragglers, rather than to ignore it, a feature that most past works in coded computing are lacking.https://authors.library.caltech.edu/records/8grfh-99r86Correcting Deletions in Multiple-Heads Racetrack Memories
https://resolver.caltech.edu/CaltechAUTHORS:20191004-100332823
Authors: Sima, Jin; Bruck, Jehoshua
Year: 2019
DOI: 10.1109/isit.2019.8849783
One of the main challenges in developing racetrack memory systems is the limited precision in controlling the track shifts, which in turn affects the reliability of reading and writing the data. The current proposal for combating deletions in racetrack memories is to use redundant heads per track, resulting in multiple (potentially erroneous) copies and solving a specialized version of a sequence reconstruction problem. Using this approach, k-deletion correcting codes of length n with d heads per track and redundancy log log n + 4 were constructed. However, the code construction requires that k ≤ d. For k > d, the best known construction improves only slightly over the classic one-head deletion code. Here we address the question: What is the best redundancy that can be achieved for a k-deletion code (k is a constant) if the number of heads is fixed at d (due to area limitations)? Our key result is an answer to this question, namely, we construct codes that can correct k deletions for any k beyond the known limit of d. The code has O(k^4 d log log n) redundancy for the case when k ≤ 2d − 1. In addition, when k ≥ 2d, the code has 2⌊k/d⌋ log n + o(log n) redundancy.https://authors.library.caltech.edu/records/ewcv1-ny846Iterative Programming of Noisy Memory Cells
https://resolver.caltech.edu/CaltechAUTHORS:20191004-104451577
Authors: Horovitz, Michal; Yaakobi, Eitan; Gad, Eyal En; Bruck, Jehoshua
Year: 2019
DOI: 10.1109/ITW44776.2019.8989404
In this paper, we study a model, which was first presented by Bunte and Lapidoth, that mimics the programming operation of memory cells. Under this paradigm we assume that cells are programmed sequentially and individually. The programming process is modeled as transmission over a channel, while it is possible to read the cell state in order to determine its programming success, and in case of programming failure, to reprogram the cell again. Reprogramming a cell can reduce the bit error rate, however this comes with the price of increasing the overall programming time and thereby affecting the writing speed of the memory. An iterative programming scheme is an algorithm which specifies the number of attempts to program each cell. Given the programming channel and constraints on the average and maximum number of attempts to program a cell, we study programming schemes which maximize the number of bits that can be reliably stored in the memory. We extend the results by Bunte and Lapidoth and study this problem when the programming channel is either the BSC, BEC, or Z channel. For the BSC and the BEC our analysis is also extended for the case where the error probabilities on consecutive writes are not necessarily the same. Lastly, we also study a related model which is motivated by the synthesis process of DNA molecules.https://authors.library.caltech.edu/records/gj67s-2vh47The Capacity of String-Replication Systems
https://resolver.caltech.edu/CaltechAUTHORS:20191004-144116872
Authors: Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2019
DOI: 10.48550/arXiv.1401.4634
It is known that the majority of the human genome consists of repeated sequences. Furthermore, it is believed that a significant part of the rest of the genome also originated from repeated sequences and has mutated to its current form. In this paper, we investigate the possibility of constructing an exponentially large number of sequences from a short initial sequence and simple replication rules, including those resembling genomic replication processes. In other words, our goal is to find out the capacity, or the expressive power, of these string-replication systems. Our results include exact capacities, and bounds on the capacities, of four fundamental string-replication systems.https://authors.library.caltech.edu/records/sg05y-kb033Rate-Distortion for Ranking with Incomplete Information
https://resolver.caltech.edu/CaltechAUTHORS:20191004-151348066
Authors: Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2019
DOI: 10.48550/arXiv.1401.3093
We study the rate-distortion relationship in the set of permutations endowed with the Kendall Tau metric and the Chebyshev metric. Our study is motivated by the application of permutation rate-distortion to the average-case and worst-case analysis of algorithms for ranking with incomplete information and approximate sorting algorithms. For the Kendall Tau metric we provide bounds for small, medium, and large distortion regimes, while for the Chebyshev metric we present bounds that are valid for all distortions and are especially accurate for small distortions. In addition, for the Chebyshev metric, we provide a construction for covering codes.https://authors.library.caltech.edu/records/kjq75-6ha83On Codes for Optimal Rebuilding Access
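The two permutation metrics studied above can be computed directly; a straightforward sketch (O(n^2) for the Kendall tau distance, which suffices for small examples) is:

```python
def kendall_tau(p, q):
    """Kendall tau distance: number of element pairs ordered differently
    by permutations p and q."""
    pos = {v: i for i, v in enumerate(q)}
    r = [pos[v] for v in p]  # p rewritten in q's coordinate system
    return sum(1 for i in range(len(r))
               for j in range(i + 1, len(r)) if r[i] > r[j])

def chebyshev(p, q):
    """Chebyshev (l-infinity) distance: maximum displacement of any
    element between the two rankings."""
    pos = {v: i for i, v in enumerate(q)}
    return max(abs(i - pos[v]) for i, v in enumerate(p))
```

Kendall tau counts the adjacent transpositions needed to turn one ranking into the other, which is why it models comparison-based approximate sorting in the rate-distortion analysis above.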
https://resolver.caltech.edu/CaltechAUTHORS:20191004-150559432
Authors: Wang, Zhiying; Tamo, Itzhak; Bruck, Jehoshua
Year: 2019
DOI: 10.48550/arXiv.1107.1627
MDS (maximum distance separable) array codes are widely used in storage systems due to their computationally efficient encoding and decoding procedures. An MDS code with r redundancy nodes can correct any r erasures by accessing (reading) all the remaining information in both the systematic nodes and the parity (redundancy) nodes. However, in practice, a single erasure is the most likely failure event; hence, a natural question is how much information do we need to access in order to rebuild a single storage node? We define the rebuilding ratio as the fraction of remaining information accessed during the rebuilding of a single erasure. In our previous work we showed that the optimal rebuilding ratio of 1/r is achievable (using our newly constructed array codes) for the rebuilding of any systematic node, however, all the information needs to be accessed for the rebuilding of the parity nodes. Namely, constructing array codes with a rebuilding ratio of 1/r was left as an open problem. In this paper, we solve this open problem and present array codes that achieve the lower bound of 1/r for rebuilding any single systematic or parity node.https://authors.library.caltech.edu/records/necdh-j6n98Linear Transformations for Randomness Extraction
https://resolver.caltech.edu/CaltechAUTHORS:20191004-151746022
Authors: Zhou, Hongchao; Bruck, Jehoshua
Year: 2019
DOI: 10.48550/arXiv.1209.0732
Information-efficient approaches for extracting randomness from imperfect sources have been extensively studied, but simpler and faster ones are required in high-speed applications of random number generation. In this paper, we focus on linear constructions, namely, applying linear transformations for randomness extraction. We show that linear transformations based on sparse random matrices are asymptotically optimal for extracting randomness from independent sources and bit-fixing sources, and they are efficient (though possibly not optimal) for extracting randomness from hidden Markov sources. Further study demonstrates the flexibility of such constructions across source models as well as their excellent information-preserving capabilities. Since linear transformations based on sparse random matrices are computationally fast and easy to implement using hardware such as FPGAs, they are very attractive in high-speed applications. In addition, we explore explicit constructions of transformation matrices. We show that the generator matrices of primitive BCH codes are good choices, but linear transformations based on such matrices require more computational time due to their high densities.https://authors.library.caltech.edu/records/m07z5-rej76Generic Secure Repair for Distributed Storage
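A minimal sketch of the linear-extraction idea above, applying a sparse random binary matrix over GF(2) (the density value and function names are illustrative choices, not from the paper):

```python
import random

def sparse_random_matrix(m, n, density=0.1, rng=None):
    """m x n binary matrix with roughly a `density` fraction of ones."""
    rng = rng or random.Random(0)
    return [[1 if rng.random() < density else 0 for _ in range(n)]
            for _ in range(m)]

def extract(matrix, bits):
    """y = M x over GF(2): each output bit is the parity of the input
    bits selected by one matrix row."""
    return [sum(a & b for a, b in zip(row, bits)) % 2 for row in matrix]
```

The appeal noted in the abstract is that each output bit is just a parity of a few input bits, so the whole map is a shallow XOR network that is cheap in hardware.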
https://resolver.caltech.edu/CaltechAUTHORS:20191004-142514161
Authors: Huang, Wentao; Bruck, Jehoshua
Year: 2019
DOI: 10.48550/arXiv.1706.00500
This paper studies the problem of repairing secret sharing schemes, i.e., schemes that encode a message into n shares, assigned to n nodes, so that any n−r nodes can decode the message but any colluding z nodes cannot infer any information about the message. In the event of node failures, so that shares held by the failed nodes are lost, the system needs to be repaired by reconstructing and reassigning the lost shares to the failed (or replacement) nodes. This can be achieved trivially by a trustworthy third party that receives the shares of the available nodes, then recomputes and reassigns the lost shares. The interesting question, studied in the paper, is how to repair without a trustworthy third party. The main issue that arises is repair security: how to maintain the requirement that any colluding z nodes, including the failed nodes, cannot learn any information about the message, during and after the repair process? We solve this secure repair problem from the perspective of secure multi-party computation. Specifically, we design generic repair schemes that can securely repair any (scalar or vector) linear secret sharing schemes. We prove a lower bound on the repair bandwidth of secure repair schemes and show that the proposed secure repair schemes achieve the optimal repair bandwidth up to a small constant factor when n dominates z, or when the secret sharing scheme being repaired has optimal rate. We adopt a formal information-theoretic approach in our analysis and bounds. A main idea in our schemes is to allow a more flexible repair model than the straightforward one-round repair model implicitly assumed by existing secure regenerating codes. Particularly, the proposed secure repair schemes are simple and efficient two-round protocols.https://authors.library.caltech.edu/records/9r887-bym78Generating Probability Distributions using Multivalued Stochastic Relay Circuits
https://resolver.caltech.edu/CaltechAUTHORS:20191004-150014222
Authors: Lee, David; Bruck, Jehoshua
Year: 2019
DOI: 10.48550/arXiv.1102.1441
The problem of random number generation dates back to von Neumann's work in 1951. Since then, many algorithms have been developed for generating unbiased bits from complex correlated sources as well as for generating arbitrary distributions from unbiased bits. An equally interesting, but less studied aspect is the structural component of random number generation as opposed to the algorithmic aspect. That is, given a network structure imposed by nature or physical devices, how can we build networks that generate arbitrary probability distributions in an optimal way? In this paper, we study the generation of arbitrary probability distributions in multivalued relay circuits, a generalization in which relays can take on any of N states and the logical 'and' and 'or' are replaced with 'min' and 'max' respectively. Previous work was done on two-state relays. We generalize these results, describing a duality property and networks that generate arbitrary rational probability distributions. We prove that these networks are robust to errors and design a universal probability generator which takes input bits and outputs arbitrary binary probability distributions.https://authors.library.caltech.edu/records/b9hj3-tm168The Entropy Rate of Some Pólya String Models
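The von Neumann 1951 procedure mentioned above, which produces unbiased bits from i.i.d. biased coin flips, is short enough to state in full (the structural relay-network results of the paper are a separate matter):

```python
def von_neumann(bits):
    """von Neumann's 1951 trick: read the stream in pairs, emit the first
    bit of each unequal pair (01 -> 0, 10 -> 1), discard equal pairs.
    For i.i.d. flips, 01 and 10 are equally likely, so outputs are unbiased."""
    out = []
    for b0, b1 in zip(bits[::2], bits[1::2]):
        if b0 != b1:
            out.append(b0)
    return out
```

This is the algorithmic side of random number generation; the abstract's point is that the structural side, building networks of stochastic relays that realize a target distribution, is comparatively unexplored.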
https://resolver.caltech.edu/CaltechAUTHORS:20190829-100210948
Authors: Elishco, Ohad; Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2019
DOI: 10.1109/tit.2019.2936556
We study random string-duplication systems, which we call Pólya string models. These are motivated by a class of mutations that are common in most organisms and lead to an abundance of repeated sequences in their genomes. Unlike previous works that study the combinatorial capacity of string-duplication systems, or in a probabilistic setting, various string statistics, this work provides the exact entropy rate or bounds on it, for several probabilistic models. The entropy rate determines the compressibility of the resulting sequences, as well as quantifying the amount of sequence diversity that these mutations can create. In particular, we study the entropy rate of noisy string-duplication systems, including the tandem-duplication, end-duplication, and interspersed-duplication systems, where in all cases we study duplication of length 1 only. Interesting connections are drawn between some systems and the signature of random permutations, as well as to the beta distribution common in population genetics.https://authors.library.caltech.edu/records/134ne-9h832Improve Robustness of Deep Neural Networks by Coding
https://resolver.caltech.edu/CaltechAUTHORS:20201209-153308085
Authors: Huang, Kunping; Raviv, Netanel; Jain, Siddharth; Upadhyaya, Pulakesh; Bruck, Jehoshua; Siegel, Paul H.; Jiang, Anxiao (Andrew)
Year: 2020
DOI: 10.1109/ita50056.2020.9244998
Deep neural networks (DNNs) typically have many weights. When errors appear in their weights, which are usually stored in non-volatile memories, their performance can degrade significantly. We review two recently presented approaches that improve the robustness of DNNs in complementary ways. In the first approach, we use error-correcting codes as external redundancy to protect the weights from errors. A deep reinforcement learning algorithm is used to optimize the redundancy-performance tradeoff. In the second approach, internal redundancy is added to neurons via coding. It enables neurons to perform robust inference in noisy environments.https://authors.library.caltech.edu/records/jqw53-1hh11Two Deletion Correcting Codes from Indicator Vectors
https://resolver.caltech.edu/CaltechAUTHORS:20191031-124926783
Authors: Sima, Jin; Raviv, Netanel; Bruck, Jehoshua
Year: 2020
DOI: 10.1109/tit.2019.2950290
Construction of capacity achieving deletion correcting codes has been a baffling challenge for decades. A recent breakthrough by Brakensiek et al., alongside novel applications in DNA storage, has reignited the interest in this longstanding open problem. In spite of recent advances, the amount of redundancy in existing codes is still orders of magnitude away from being optimal. In this paper, a novel approach for constructing binary two-deletion correcting codes is proposed. By this approach, parity symbols are computed from indicator vectors (i.e., vectors that indicate the positions of certain patterns) of the encoded message, rather than from the message itself. Most interestingly, the parity symbols and the proof of correctness are a direct generalization of their counterparts in the Varshamov-Tenengolts construction. Our techniques require 7log(n)+o(log(n)) redundant bits to encode an n-bit message, which is closer to optimal than previous constructions. Moreover, the encoding and decoding algorithms have O(n) time complexity.https://authors.library.caltech.edu/records/e5f1z-nwq17Optimal k-Deletion Correcting Codes
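The indicator-vector idea can be made concrete on a toy scale: parities are computed from a vector marking pattern occurrences rather than from the message itself. This is an illustration only; the weighted checksum below is a VT-style sum, not the paper's exact parity:

```python
def indicator_vector(x, pattern):
    """1 at position i iff pattern occurs in x starting at i."""
    k = len(pattern)
    return [1 if x[i : i + k] == pattern else 0
            for i in range(len(x) - k + 1)]

def vt_checksum(v, m):
    """Varshamov-Tenengolts-style weighted checksum, reduced mod m."""
    return sum(i * b for i, b in enumerate(v, start=1)) % m

x = [1, 0, 0, 1, 0, 1, 1, 0]
ind = indicator_vector(x, [1, 0])     # occurrences of the pattern '10'
parity = vt_checksum(ind, len(x) + 1)
```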
https://resolver.caltech.edu/CaltechAUTHORS:20200409-105733198
Authors: Sima, Jin; Bruck, Jehoshua
Year: 2020
Levenshtein introduced the problem of constructing k-deletion correcting codes in 1966, proved that the optimal redundancy of those codes is O(k log n), and proposed an optimal redundancy single-deletion correcting code (using the so-called VT construction). However, the problem of constructing optimal redundancy k-deletion correcting codes remained open. Our key contribution is a solution to this longstanding open problem. We present a k-deletion correcting code that has redundancy 8k log n + o(log n) and encoding/decoding algorithms of complexity O(n^(2k+1)) for constant k.https://authors.library.caltech.edu/records/c66kf-a5m87Cancer Classification from Blood-Derived DNA
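The VT construction mentioned above can be sketched concretely: VT_a(n) is the set of binary words whose weighted checksum equals a mod (n+1), and a single deletion is decoded from the checksum deficiency. A minimal sketch using brute-force enumeration for the codebook:

```python
from itertools import product

def vt_checksum(x):
    return sum(i * b for i, b in enumerate(x, start=1))

def vt_codewords(n, a=0):
    """All length-n binary words in the Varshamov-Tenengolts code VT_a(n)."""
    return [list(x) for x in product([0, 1], repeat=n)
            if vt_checksum(x) % (n + 1) == a]

def vt_decode(y, n, a=0):
    """Recover the unique codeword of VT_a(n) from which one bit was deleted."""
    w = sum(y)                          # weight of the received word
    s = (a - vt_checksum(y)) % (n + 1)  # checksum deficiency
    if s <= w:
        # a 0 was deleted: reinsert it with exactly s ones to its right
        i, ones = len(y), 0
        while ones < s:
            i -= 1
            ones += y[i]
        return y[:i] + [0] + y[i:]
    # a 1 was deleted: reinsert it with exactly s - w - 1 zeros to its left
    i, zeros = 0, 0
    while zeros < s - w - 1:
        zeros += 1 - y[i]
        i += 1
    return y[:i] + [1] + y[i:]

code = vt_codewords(6)  # VT_0(6)
```

Every codeword is recovered exactly after any single deletion; this is the base case that the k-deletion construction above generalizes.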
https://resolver.caltech.edu/CaltechAUTHORS:20200423-125742527
Authors: Jain, Siddharth; Mazaheri, Bijan; Raviv, Netanel; Bruck, Jehoshua
Year: 2020
DOI: 10.1101/517839
The genome is traditionally viewed as a time-independent source of information; a paradigm that drives researchers to seek correlations between the presence of certain genes and a patient's risk of disease. This analysis neglects genomic temporal changes, which we believe to be a crucial signal for predicting an individual's susceptibility to cancer. We hypothesize that each individual's genome passes through an evolution channel (the term channel is motivated by the notion of a communication channel, introduced by Shannon in 1948, which started the area of information theory) that is controlled by hereditary, environmental, and stochastic factors. This channel differs among individuals, giving rise to varying predispositions to developing cancer. We introduce the concept of mutation profiles, which are computed without any comparative analysis, but by analyzing the short tandem repeat regions in a single healthy genome, and which capture information about the individual's evolution channel. Using machine learning on data from more than 5,000 TCGA cancer patients, we demonstrate that these mutation profiles can accurately distinguish between patients with various types of cancer. For example, the pairwise validation accuracy of the classifier between PAAD (pancreas) patients and GBM (brain) patients is 93%. Our results show that healthy unaffected cells still contain a cancer-specific signal, which opens the possibility of cancer prediction from a healthy genome.https://authors.library.caltech.edu/records/cmmk8-vm578CodNN - Robust Neural Networks From Coded Classification
https://resolver.caltech.edu/CaltechAUTHORS:20200427-091132325
Authors: Raviv, Netanel; Jain, Siddharth; Upadhyaya, Pulakesh; Bruck, Jehoshua; Jiang, Anxiao (Andrew)
Year: 2020
Deep Neural Networks (DNNs) are a revolutionary force in the ongoing information revolution, and yet their intrinsic properties remain a mystery. In particular, it is widely known that DNNs are highly sensitive to noise, whether adversarial or random. This poses a fundamental challenge for hardware implementations of DNNs, and for their deployment in critical applications such as autonomous driving.
In this paper we construct robust DNNs via error correcting codes. By our approach, either the data or internal layers of the DNN are coded with error correcting codes, and successful computation under noise is guaranteed. Since DNNs can be seen as a layered concatenation of classification tasks, our research begins with the core task of classifying noisy coded inputs, and progresses towards robust DNNs.
We focus on binary data and linear codes. Our main result is that the prevalent parity code can guarantee robustness for a large family of DNNs, which includes the recently popularized binarized neural networks. Further, we show that the coded classification problem has a deep connection to Fourier analysis of Boolean functions.
In contrast to existing solutions in the literature, our results do not rely on altering the training process of the DNN, and provide mathematically rigorous guarantees rather than experimental evidence.https://authors.library.caltech.edu/records/xp0ns-zr574Evolution of k-mer Frequencies and Entropy in Duplication and Substitution Mutation Systems
https://resolver.caltech.edu/CaltechAUTHORS:20191004-142813980
Authors: Lou, Hao; Schwartz, Moshe; Bruck, Jehoshua; Farnoud (Hassanzadeh), Farzad
Year: 2020
DOI: 10.1109/TIT.2019.2946846
Genomic evolution can be viewed as string-editing processes driven by mutations. An understanding of the statistical properties resulting from these mutation processes is of value in a variety of tasks related to biological sequence data, e.g., estimation of model parameters and compression. At the same time, due to the complexity of these processes, designing tractable stochastic models and analyzing them are challenging. In this paper, we study two kinds of systems, each representing a set of mutations. In the first system, tandem duplications and substitution mutations are allowed and in the other, interspersed duplications. We provide stochastic models and, via stochastic approximation, study the evolution of substring frequencies for these two systems separately. Specifically, we show that k-mer frequencies converge almost surely and determine the limit set. Furthermore, we present a method for finding upper bounds on entropy for such systems.https://authors.library.caltech.edu/records/5ywab-5d764Coding for Optimized Writing Rate in DNA Storage
https://resolver.caltech.edu/CaltechAUTHORS:20200511-120246633
Authors: Jain, Siddharth; Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2020
A method for encoding information in DNA sequences is described. The method is based on the precision-resolution framework, and is designed to work in conjunction with a recently suggested terminator-free template-independent DNA synthesis method. The suggested method optimizes the amount of information bits per synthesis time unit, namely, the writing rate. Additionally, the encoding scheme studied here takes into account the existence of multiple copies of the DNA sequence, which are independently distorted. Finally, quantizers for various run-length distributions are designed.https://authors.library.caltech.edu/records/6k379-qfk69What is the Value of Data? on Mathematical Methods for Data Quality Estimation
https://resolver.caltech.edu/CaltechAUTHORS:20200831-142053055
Authors: Raviv, Netanel; Jain, Siddharth; Bruck, Jehoshua
Year: 2020
DOI: 10.1109/isit44484.2020.9174311
Data is one of the most important assets of the information age, and its societal impact is undisputed. Yet, rigorous methods of assessing the quality of data are lacking. In this paper, we propose a formal definition for the quality of a given dataset. We assess a dataset's quality by a quantity we call the expected diameter, which measures the expected disagreement between two randomly chosen hypotheses that explain it, and has recently found applications in active learning. We focus on Boolean hyperplanes, and utilize a collection of Fourier analytic, algebraic, and probabilistic methods to come up with theoretical guarantees and practical solutions for the computation of the expected diameter. We also study the behaviour of the expected diameter on algebraically structured datasets, conduct experiments that validate this notion of quality, and demonstrate the feasibility of our techniques.https://authors.library.caltech.edu/records/hjtbp-gxn16Syndrome Compression for Optimal Redundancy Codes
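The expected diameter can be made concrete on a toy scale: enumerate a small class of Boolean halfspaces, keep those consistent with the dataset, and average pairwise disagreement over the domain. This is an illustrative sketch under assumed small integer weights, not the paper's algorithms:

```python
from itertools import product

def halfspaces(dim, weight_range=(-2, 2)):
    """Enumerate Boolean halfspaces h(x) = [w.x >= theta] with small
    integer weights (a toy hypothesis class for illustration)."""
    points = list(product([0, 1], repeat=dim))
    seen = set()
    for w in product(range(weight_range[0], weight_range[1] + 1), repeat=dim):
        for theta in range(weight_range[0] * dim, weight_range[1] * dim + 2):
            h = tuple(1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
                      for x in points)
            seen.add(h)
    return points, sorted(seen)

def expected_diameter(dataset, dim):
    """Average fraction of the domain on which two uniformly random
    hypotheses consistent with the dataset disagree."""
    points, hyps = halfspaces(dim)
    idx = {x: i for i, x in enumerate(points)}
    consistent = [h for h in hyps
                  if all(h[idx[x]] == y for x, y in dataset)]
    total = 0
    for a in consistent:
        for b in consistent:
            total += sum(ai != bi for ai, bi in zip(a, b))
    # assumes the dataset is realizable, so consistent is non-empty
    return total / (len(consistent) ** 2 * len(points))

and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
diam_pinned = expected_diameter(and_data, 2)  # fully labeled: no ambiguity
diam_free = expected_diameter([], 2)          # no labels: maximal ambiguity
```

A fully labeled, realizable dataset pins the hypothesis down (diameter 0), while an empty dataset leaves maximal disagreement, matching the intuition that lower expected diameter means higher-quality data.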
https://resolver.caltech.edu/CaltechAUTHORS:20200831-142617575
Authors: Sima, Jin; Gabrys, Ryan; Bruck, Jehoshua
Year: 2020
DOI: 10.1109/isit44484.2020.9174009
We introduce a general technique that we call syndrome compression, for designing low-redundancy error correcting codes. The technique allows us to boost the redundancy efficiency of hash/labeling-based codes by further compressing the labeling. We apply syndrome compression to different types of adversarial deletion channels and present code constructions that correct up to a constant number of errors. Our code constructions achieve a redundancy of twice the Gilbert-Varshamov bound, which improves upon the state of the art for these channels. The encoding/decoding complexity of our constructions is of order equal to the size of the corresponding deletion balls, namely, it is polynomial in the code length.https://authors.library.caltech.edu/records/hs589-fp998Robust Indexing - Optimal Codes for DNA Storage
https://resolver.caltech.edu/CaltechAUTHORS:20200831-134827466
Authors: Sima, Jin; Raviv, Netanel; Bruck, Jehoshua
Year: 2020
DOI: 10.1109/isit44484.2020.9174447
The channel model of encoding data as a set of unordered strings is receiving great attention as it captures the basic features of DNA storage systems. However, the challenge of constructing optimal redundancy codes for this channel remained elusive. In this paper, we solve this open problem and present an order-wise optimal construction of codes that correct multiple substitution errors for this channel model. The key ingredient in the code construction is a technique we call robust indexing: instead of using fixed indices to create order in unordered strings, we use indices that are information dependent and thus eliminate unnecessary redundancy. In addition, our robust indexing technique can be applied to the construction of optimal deletion/insertion codes for this channel.https://authors.library.caltech.edu/records/wxyrd-yh839Optimal Systematic t-Deletion Correcting Codes
https://resolver.caltech.edu/CaltechAUTHORS:20200831-144630883
Authors: Sima, Jin; Gabrys, Ryan; Bruck, Jehoshua
Year: 2020
DOI: 10.1109/isit44484.2020.9173986
Systematic deletion correcting codes play an important role in applications of document exchange. Yet despite a series of recent advances made in deletion correcting codes, most of them are non-systematic. To the best of the authors' knowledge, the only known deterministic systematic t-deletion correcting code constructions with rate approaching 1 achieve O(t log² n) bits of redundancy for constant t, where n is the code length. In this paper, we propose a systematic t-deletion correcting code construction that achieves 4t log n + o(log n) bits of redundancy, which is asymptotically within a factor of 4 from being optimal. Our encoding and decoding algorithms have complexity O(n^(2t+1)), which is polynomial for constant t.https://authors.library.caltech.edu/records/5v2ns-fsm86Optimal Codes for the q-ary Deletion Channel
https://resolver.caltech.edu/CaltechAUTHORS:20200831-150933262
Authors: Sima, Jin; Gabrys, Ryan; Bruck, Jehoshua
Year: 2020
DOI: 10.1109/isit44484.2020.9174241
The problem of constructing optimal multiple-deletion correcting codes had long been open until a recent breakthrough in the binary case. Yet comparatively less progress has been made in the non-binary setting, with the only rate-one non-binary deletion codes being Tenengolts' construction, which corrects a single deletion. In this paper, we present several q-ary t-deletion correcting codes of length n that achieve optimal redundancy up to a constant factor, based on the value of the alphabet size q. For small q, our constructions have O(n^(2t) q^t) encoding/decoding complexity. For large q, we take a different approach and the construction has polynomial time complexity.https://authors.library.caltech.edu/records/df556-wb795Coding for Optimized Writing Rate in DNA Storage
https://resolver.caltech.edu/CaltechAUTHORS:20200511-090541146
Authors: Jain, Siddharth; Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2020
DOI: 10.1109/ISIT44484.2020.9174253
A method for encoding information in DNA sequences is described. The method is based on the precision-resolution framework, and is designed to work in conjunction with a recently suggested terminator-free template-independent DNA synthesis method. The suggested method optimizes the amount of information bits per synthesis time unit, namely, the writing rate. Additionally, the encoding scheme studied here takes into account the existence of multiple copies of the DNA sequence, which are independently distorted. Finally, quantizers for various run-length distributions are designed.https://authors.library.caltech.edu/records/vsvcw-7sn61CodNN – Robust Neural Networks From Coded Classification
https://resolver.caltech.edu/CaltechAUTHORS:20200427-091804171
Authors: Raviv, Netanel; Jain, Siddharth; Upadhyaya, Pulakesh; Bruck, Jehoshua; Jiang, Anxiao (Andrew)
Year: 2020
DOI: 10.1109/ISIT44484.2020.9174480
Deep Neural Networks (DNNs) are a revolutionary force in the ongoing information revolution, and yet their intrinsic properties remain a mystery. In particular, it is widely known that DNNs are highly sensitive to noise, whether adversarial or random. This poses a fundamental challenge for hardware implementations of DNNs, and for their deployment in critical applications such as autonomous driving. In this paper we construct robust DNNs via error correcting codes. By our approach, either the data or internal layers of the DNN are coded with error correcting codes, and successful computation under noise is guaranteed. Since DNNs can be seen as a layered concatenation of classification tasks, our research begins with the core task of classifying noisy coded inputs, and progresses towards robust DNNs. We focus on binary data and linear codes. Our main result is that the prevalent parity code can guarantee robustness for a large family of DNNs, which includes the recently popularized binarized neural networks. Further, we show that the coded classification problem has a deep connection to Fourier analysis of Boolean functions. In contrast to existing solutions in the literature, our results do not rely on altering the training process of the DNN, and provide mathematically rigorous guarantees rather than experimental evidence.https://authors.library.caltech.edu/records/psvjm-vmv70On Coding over Sliced Information
https://resolver.caltech.edu/CaltechAUTHORS:20210315-103437288
Authors: Sima, Jin; Raviv, Netanel; Bruck, Jehoshua
Year: 2021
DOI: 10.1109/tit.2021.3063709
The interest in channel models in which the data is sent as an unordered set of binary strings has increased lately, due to emerging applications in DNA storage, among others. In this paper we analyze the minimal redundancy of binary codes for this channel under substitution errors, and provide several constructions, some of which are shown to be asymptotically optimal up to constants. The surprising result in this paper is that while the information vector is sliced into a set of unordered strings, the amount of redundant bits that are required to correct errors is order-wise equivalent to the amount required in the classical error correcting paradigm.https://authors.library.caltech.edu/records/w65dp-23509On Optimal k-Deletion Correcting Codes
https://resolver.caltech.edu/CaltechAUTHORS:20201008-083807800
Authors: Sima, Jin; Bruck, Jehoshua
Year: 2021
DOI: 10.1109/TIT.2020.3028702
Levenshtein introduced the problem of constructing k-deletion correcting codes in 1966, proved that the optimal redundancy of those codes is O(k log n) for constant k, and proposed an optimal redundancy single-deletion correcting code (using the so-called VT construction). However, the problem of constructing optimal redundancy k-deletion correcting codes remained open. Our key contribution is a major step towards a complete solution to this longstanding open problem for constant k. We present a k-deletion correcting code that has redundancy 8k log n + o(log n) when k = o(√(log log n)) and encoding/decoding algorithms of complexity O(n^(2k+1)).https://authors.library.caltech.edu/records/2b9rp-5tt90Synthesizing New Expertise via Collaboration
https://resolver.caltech.edu/CaltechAUTHORS:20210624-214158214
Authors: Mazaheri, Bijan; Jain, Siddharth; Bruck, Jehoshua
Year: 2021
Consider a set of classes and an uncertain input. Suppose we do not have access to data and only have knowledge of perfect experts between a few classes in the set. What constitutes a consistent set of opinions? How can we use this to predict the opinions of experts on missing sub-domains? In this paper, we define a framework to analyze this problem. In particular, we define an expert graph where vertices represent classes and edges represent binary experts on the topics of their vertices. We derive necessary conditions for an expert graph to be valid. Further, we show that these conditions are also sufficient if the graph is a cycle, which can yield unintuitive results. Using these conditions, we provide an algorithm to obtain upper and lower bounds on the weights of unknown edges in an expert graph.https://authors.library.caltech.edu/records/bkec1-w3e38Robust Correction of Sampling Bias Using Cumulative Distribution Functions
https://resolver.caltech.edu/CaltechAUTHORS:20210624-211933517
Authors: Mazaheri, Bijan; Jain, Siddharth; Bruck, Jehoshua
Year: 2021
Varying domains and biased datasets can lead to differences between the training and the target distributions, known as covariate shift. Current approaches for alleviating this often rely on estimating the ratio of training and target probability density functions. These techniques require parameter tuning and can be unstable across different datasets. We present a new method for handling covariate shift using the empirical cumulative distribution function estimates of the target distribution by a rigorous generalization of a recent idea proposed by Vapnik and Izmailov. Further, we show experimentally that our method is more robust in its predictions, is not reliant on parameter tuning and shows similar classification performance compared to the current state-of-the-art techniques on synthetic and real datasets.https://authors.library.caltech.edu/records/dp1g8-cv171Trace Reconstruction with Bounded Edit Distance
https://resolver.caltech.edu/CaltechAUTHORS:20210624-215307865
Authors: Sima, Jin; Bruck, Jehoshua
Year: 2021
The trace reconstruction problem studies the number of noisy samples needed to recover an unknown string x ∈ {0, 1}^n with high probability, where the samples are independently obtained by passing x through a random deletion channel with deletion probability p. The problem is receiving significant attention recently due to its applications in DNA sequencing and DNA storage. Yet, there is still an exponential gap between upper and lower bounds for the trace reconstruction problem. In this paper we study the trace reconstruction problem when x is confined to an edit distance ball of radius k, which is essentially equivalent to distinguishing two strings with edit distance at most k. It is shown that n^(O(k)) samples suffice to achieve this task with high probability.https://authors.library.caltech.edu/records/gyvxr-1zd71Neural Networks Computations with DOMINATION Functions
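The sampling model in the trace reconstruction problem can be sketched directly: each trace is obtained by deleting every symbol of x independently with probability p (a minimal illustration; names are hypothetical):

```python
import random

def deletion_channel(x, p, rng):
    """Pass x through a deletion channel: each symbol is deleted
    independently with probability p."""
    return [c for c in x if rng.random() >= p]

def traces(x, p, m, rng):
    """Draw m independent noisy samples (traces) of x."""
    return [deletion_channel(x, p, rng) for _ in range(m)]

rng = random.Random(1)
x = [0, 1, 1, 0, 1, 0, 0, 1]
samples = traces(x, 0.3, 5, rng)
```

Reconstruction asks how large m must be to recover x from such samples with high probability.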
https://resolver.caltech.edu/CaltechAUTHORS:20210624-214748102
Authors: Kilic, Kordag Mehmet; Bruck, Jehoshua
Year: 2021
We study a new representation of neural networks based on DOMINATION functions. Specifically, we show that a threshold function can be computed by its variables connected via an unweighted bipartite graph to a universal gate computing a DOMINATION function. The DOMINATION function consists of fixed weights that are ascending powers of 2. We derive circuit-size upper and lower bounds for circuits with small weights that compute DOMINATION functions. Interestingly, the circuit-size bounds are dependent on the sparsity of the bipartite graph. In particular, functions with sparsity 1 (like the EQUALITY function) can be implemented by small-size constant-weight circuits.https://authors.library.caltech.edu/records/zy6qg-qnq04Trace Reconstruction with Bounded Edit Distance
https://resolver.caltech.edu/CaltechAUTHORS:20211110-153719711
Authors: Sima, Jin; Bruck, Jehoshua
Year: 2021
DOI: 10.1109/isit45174.2021.9518244
The trace reconstruction problem studies the number of noisy samples needed to recover an unknown string x ∈ {0,1}^n with high probability, where the samples are independently obtained by passing x through a random deletion channel with deletion probability q. The problem is receiving significant attention recently due to its applications in DNA sequencing and DNA storage. Yet, there is still an exponential gap between upper and lower bounds for the trace reconstruction problem. In this paper we study the trace reconstruction problem when x is confined to an edit distance ball of radius k, which is essentially equivalent to distinguishing two strings with edit distance at most k. It is shown that n^(O(k)) samples suffice to achieve this task with high probability.https://authors.library.caltech.edu/records/2hnx9-xmk44Synthesizing New Expertise via Collaboration
https://resolver.caltech.edu/CaltechAUTHORS:20211110-153150519
Authors: Mazaheri, Bijan; Jain, Siddharth; Bruck, Jehoshua
Year: 2021
DOI: 10.1109/isit45174.2021.9517822
Consider a set of classes and an uncertain input. Suppose we do not have access to data and only have knowledge of perfect experts between a few classes in the set. What constitutes a consistent set of opinions? How can we use this to predict the opinions of experts on missing sub-domains? In this paper, we define a framework to analyze this problem. In particular, we define an expert graph where vertices represent classes and edges represent binary experts on the topics of their vertices. We derive necessary conditions for an expert graph to be valid. Further, we show that these conditions are also sufficient if the graph is a cycle, which can yield unintuitive results. Using these conditions, we provide an algorithm to obtain upper and lower bounds on the weights of unknown edges in an expert graph.https://authors.library.caltech.edu/records/458ja-2qy93Neural Network Computations with DOMINATION Functions
https://resolver.caltech.edu/CaltechAUTHORS:20211110-155100881
Authors: Kilic, Kordag Mehmet; Bruck, Jehoshua
Year: 2021
DOI: 10.1109/isit45174.2021.9517872
We study a new representation of neural networks based on DOMINATION functions. Specifically, we show that a threshold function can be computed by its variables connected via an unweighted bipartite graph to a universal gate computing a DOMINATION function. The DOMINATION function consists of fixed weights that are ascending powers of 2. We derive circuit-size upper and lower bounds for circuits with small weights that compute DOMINATION functions. Interestingly, the circuit-size bounds are dependent on the sparsity of the bipartite graph. In particular, functions with sparsity 1 (like the EQUALITY function) can be implemented by small-size constant-weight circuits.https://authors.library.caltech.edu/records/93r5d-6xs30Glioblastoma signature in the DNA of blood-derived cells
https://resolver.caltech.edu/CaltechAUTHORS:20211007-150341511
Authors: Jain, Siddharth; Mazaheri, Bijan; Raviv, Netanel; Bruck, Jehoshua
Year: 2021
DOI: 10.1371/journal.pone.0256831
PMCID: PMC8425531
The current approach to the detection of cancer is based on identifying genetic mutations typical to tumor cells. This approach is effective only when cancer has already emerged; however, it might then be at a stage too advanced for effective treatment. Cancer is caused by the continuous accumulation of mutations; is it possible to measure the time-dependent information of mutation accumulation and predict the emergence of cancer? We hypothesize that the mutation history derived from the tandem repeat regions in blood-derived DNA carries information about the accumulation of the cancer driver mutations in other tissues. To validate our hypothesis, we computed the mutation histories from the tandem repeat regions in blood-derived exomic DNA of 3874 TCGA patients with different cancer types and found a statistically significant signal with specificity ranging from 66% to 93% differentiating Glioblastoma patients from other cancer patients. Our approach and findings offer a new direction for future cancer prediction and early cancer detection based on information derived from blood-derived DNA.https://authors.library.caltech.edu/records/a4n5r-1f829Generator based approach to analyze mutations in genomic datasets
https://resolver.caltech.edu/CaltechAUTHORS:20200728-093329251
Authors: Jain, Siddharth; Xiao, Xiongye; Bogdan, Paul; Bruck, Jehoshua
Year: 2021
DOI: 10.1038/s41598-021-00609-8
PMCID: PMC8548350
In contrast to the conventional approach of directly comparing genomic sequences using sequence alignment tools, we propose a computational approach that performs comparisons between sequence generators. These sequence generators are learned via a data-driven approach that empirically computes the state machine generating the genomic sequence of interest. As the state-machine-based generator of the sequence is independent of the sequence length, it provides us with an efficient method to compute the statistical distance between large sets of genomic sequences. Moreover, our technique provides a fast and efficient method to cluster large datasets of genomic sequences, characterize their temporal and spatial evolution in a continuous manner, and gain insight into locality-sensitive information about the sequences, all without any need for alignment. Furthermore, we show that the technique can be used to detect local regions with mutation activity, which can then be applied to aid alignment techniques for the fast discovery of mutations. To demonstrate the efficacy of our technique on real genomic data, we cluster different strains of SARS-CoV-2 viral sequences, characterize their evolution and identify regions of the viral sequence with mutations.https://authors.library.caltech.edu/records/vemqv-kzm85Iterative Programming of Noisy Memory Cells
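The generator-comparison idea can be sketched with an empirical k-mer transition model standing in for the learned state machine; this is an illustrative sketch only, and the paper's actual generator and distance are more refined:

```python
from collections import Counter

def transition_model(seq, k=2):
    """Empirical (k-mer context -> next symbol) probabilities; a crude
    stand-in for a learned state-machine generator of the sequence."""
    counts, totals = Counter(), Counter()
    for i in range(len(seq) - k):
        ctx, nxt = seq[i:i + k], seq[i + k]
        counts[(ctx, nxt)] += 1
        totals[ctx] += 1
    return {key: c / totals[key[0]] for key, c in counts.items()}

def model_distance(m1, m2):
    """Crude L1 distance between two transition models; independent of
    the lengths of the sequences the models were learned from."""
    keys = set(m1) | set(m2)
    return sum(abs(m1.get(key, 0.0) - m2.get(key, 0.0)) for key in keys)

m_rep = transition_model("ACGTACGTACGTACGT", k=1)
m_same = transition_model("ACGTACGT", k=1)       # same generator, shorter sequence
m_diff = transition_model("AAAACCCCGGGGTTTT", k=1)
```

Two sequences of different lengths produced by the same generator yield identical models, while a differently structured sequence is far away, which is the property that makes generator comparison length-independent.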
https://resolver.caltech.edu/CaltechAUTHORS:20220104-235424700
Authors: Horovitz, Michal; Yaakobi, Eitan; Gad, Eyal En; Bruck, Jehoshua
Year: 2022
DOI: 10.1109/tcomm.2021.3130660
In this paper, we study a model that mimics the programming operation of memory cells. This model was first introduced by Lastras-Montano et al. for continuous-alphabet channels, and later by Bunte and Lapidoth for discrete memoryless channels (DMC). Under this paradigm we assume that cells are programmed sequentially and individually. The programming process is modeled as transmission over a channel, such that it is possible to read the cell state in order to determine its programming success, and in case of programming failure, to reprogram the cell again. Reprogramming a cell can reduce the bit error rate, however this comes with the price of increasing the overall programming time and thereby affecting the writing speed of the memory. An iterative programming scheme is an algorithm which specifies the number of attempts to program each cell. Given the programming channel and constraints on the average and maximum number of attempts to program a cell, we study programming schemes which maximize the number of bits that can be reliably stored in the memory. We extend the results by Bunte and Lapidoth and study this problem when the programming channel is either a discrete-input memoryless symmetric channel (including the BSC, BEC, and BI-AWGN) or the Z channel. For the BSC and the BEC our analysis is also extended to the case where the error probabilities on consecutive writes are not necessarily the same. Lastly, we also study a related model which is motivated by the synthesis process of DNA molecules.https://authors.library.caltech.edu/records/7s6dj-4z436On Algebraic Constructions of Neural Networks with Small Weights
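The read-and-reprogram loop in this model can be sketched for a BSC; this is a minimal illustration of the mechanism (the paper's treatment is information-theoretic, not a simulation):

```python
import random

def program_cell(bit, flip_prob, max_attempts, rng):
    """Program one cell through a BSC(flip_prob): after each write the
    cell is read back, and on a mismatch it is reprogrammed, up to
    max_attempts attempts. Returns (stored value, attempts used)."""
    stored = None
    for attempt in range(1, max_attempts + 1):
        stored = bit ^ (rng.random() < flip_prob)  # BSC flip on write
        if stored == bit:                          # read-back check
            return stored, attempt
    return stored, max_attempts

rng = random.Random(0)
# After t attempts the residual error probability drops from flip_prob
# to flip_prob**t, at the cost of a longer expected programming time --
# the tradeoff the iterative programming schemes optimize.
```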
https://resolver.caltech.edu/CaltechAUTHORS:20220804-765672000
Authors: Kilic, Kordag Mehmet; Sima, Jin; Bruck, Jehoshua
Year: 2022
DOI: 10.1109/isit50566.2022.9834401
Neural gates compute functions based on weighted sums of the input variables. The expressive power of neural gates (the number of distinct functions they can compute) depends on the weight sizes and, in general, large weights (exponential in the number of inputs) are required. The trade-offs among the weight sizes, circuit size, and depth are a well-studied topic both in circuit complexity theory and in the practice of neural computation. We propose a new approach for studying these complexity trade-offs by considering a related algebraic framework. Specifically, given a single linear equation with arbitrary coefficients, we would like to express it using a system of linear equations with smaller (even constant) coefficients. The techniques we developed are based on Siegel's Lemma for the bounds, anti-concentration inequalities for the existential results, and extensions of Sylvester-type Hadamard matrices for the constructions. We explicitly construct a constant-weight, optimal-size matrix to compute the EQUALITY function (checking if two integers expressed in binary are equal). Computing EQUALITY with a single linear equation requires exponentially large weights. In addition, we prove the existence of the best-known weight size (linear) matrices to compute the COMPARISON function (comparing between two integers expressed in binary). In the context of circuit complexity theory, our results improve the upper bounds on the weight sizes for the best-known circuit sizes for EQUALITY and COMPARISON.https://authors.library.caltech.edu/records/39fbv-c0b37Expert Graphs: Synthesizing New Expertise via Collaboration
https://resolver.caltech.edu/CaltechAUTHORS:20220804-201308000
Authors: Mazaheri, Bijan; Jain, Siddharth; Bruck, Jehoshua
Year: 2022
DOI: 10.48550/arXiv.2107.07054
Consider multiple experts with overlapping expertise working on a classification problem under uncertain input. What constitutes a consistent set of opinions? How can we predict the opinions of experts on missing sub-domains? In this paper, we define a framework to analyze this problem, termed "expert graphs." In an expert graph, vertices represent classes and edges represent binary opinions on the topics of their vertices. We derive necessary conditions for expert graph validity and use them to create "synthetic experts" which describe opinions consistent with the observed opinions of other experts. We show this framework to be equivalent to the well-studied linear ordering polytope. We show our conditions are not sufficient for describing all expert graphs on cliques, but are sufficient for cycles.https://authors.library.caltech.edu/records/kd0zb-ppq34Correcting k Deletions and Insertions in Racetrack Memory
https://resolver.caltech.edu/CaltechAUTHORS:20220804-201302433
Authors: Sima, Jin; Bruck, Jehoshua
Year: 2022
DOI: 10.48550/arXiv.2207.08372
One of the main challenges in developing racetrack memory systems is the limited precision in controlling the track shifts, which in turn affects the reliability of reading and writing the data. A current proposal for combating deletions in racetrack memories is to use redundant heads per track, resulting in multiple (potentially erroneous) copies and recovering the data by solving a specialized version of a sequence reconstruction problem. Using this approach, k-deletion-correcting codes of length n, with d ≥ 2 heads per track and redundancy log log n + 4, were constructed. However, the known approach requires that k ≤ d, namely, that the number of heads (d) is larger than or equal to the number of correctable deletions (k). Here we address the question: What is the best redundancy that can be achieved for a k-deletion code (k is a constant) if the number of heads is fixed at d (due to implementation constraints)? One of our key results is an answer to this question: we construct codes that can correct k deletions for any k beyond the known limit of d. The code has 4k log log n + o(log log n) redundancy for k ≤ 2d − 1. In addition, when k ≥ 2d, our codes have 2⌊k/d⌋ log n + o(log n) redundancy, which we prove is order-wise optimal; specifically, we prove that the redundancy required for correcting k deletions is at least ⌊k/d⌋ log n + o(log n). The encoding/decoding complexity of our codes is O(n log²ᵏ n). Finally, we ask a general question: What is the optimal redundancy for codes correcting a combination of at most k deletions and insertions in a d-head racetrack memory? We prove that the redundancy sufficient to correct a combination of k deletion and insertion errors is similar to that of the k-deletion case.https://authors.library.caltech.edu/records/ardhk-tej30Timing Analysis of Cyclic Combinational Circuits
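The read model behind the sequence reconstruction problem above can be sketched in a few lines: each of the d heads reads the same stored track, but imprecise shifts may delete symbols, so the decoder sees d (possibly different) subsequences of the data. The track contents and deletion patterns below are invented for illustration; the paper's codes and decoder are far more involved than this channel setup.

```python
# Toy model of the d-head racetrack read channel.
# Each head outputs the stored track with some positions deleted
# (a subsequence); the decoder must reconstruct the track from the
# d corrupted copies. Illustrative only; not the paper's construction.

def read_with_deletions(track, deleted_positions):
    """Simulate one head's output: the track with given positions deleted."""
    return [b for i, b in enumerate(track) if i not in deleted_positions]

track = [1, 0, 1, 1, 0, 0, 1, 0]
d = 3  # number of heads per track

# Hand-picked deletion patterns, one per head (a head may also read cleanly):
reads = [read_with_deletions(track, {2}),      # head 1 drops position 2
         read_with_deletions(track, {5}),      # head 2 drops position 5
         read_with_deletions(track, set())]    # head 3 reads without error

# Each read is a subsequence of the track; the decoder's task is to
# recover `track` from these d copies, which the paper shows is possible
# with low redundancy even when the number of deletions k exceeds d.
assert all(len(r) <= len(track) for r in reads)
```

Note that in the regime the paper studies, no head is guaranteed to be error-free; the clean third read above is just to keep the toy example small.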
https://resolver.caltech.edu/CaltechPARADISE:2004.ETR060.1159
Authors: Riedel, Marc D.; Bruck, Jehoshua
Year: 2023
The accepted wisdom is that combinational circuits must have acyclic (i.e., loop-free or feed-forward) topologies. And yet simple examples suggest that this need not be so. In previous work, we advocated the design of cyclic combinational circuits (i.e., circuits with loops or feedback paths). We proposed a methodology for analyzing and synthesizing such circuits, with an emphasis on the optimization of area.
In this paper, we extend our methodology into the temporal realm. We characterize the true delay of cyclic circuits through symbolic event propagation in the floating mode of operation, according to the up-bounded inertial delay model. We present analysis results for circuits optimized with our program CYCLIFY. Some benchmark circuits were optimized significantly, with simultaneous improvements of up to 10% in area and 25% in delay.https://authors.library.caltech.edu/records/we1qr-hkc81
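The idea that a circuit with a topological cycle can still be combinational, which underlies the analysis above, can be seen in a tiny three-valued simulation: starting every wire at "unknown" (the floating mode) and iterating to a fixed point, every complete input assignment resolves to definite outputs because the loop is never sensitized. The two cross-coupled multiplexers below are a textbook-style example invented for illustration, not a benchmark from the paper.

```python
# Three-valued (0, 1, X) fixed-point simulation of a small cyclic circuit:
#   g1 = s ? a : g2      g2 = s ? g1 : b
# There is a structural cycle g1 -> g2 -> g1, yet for every value of s
# the cycle is broken, so the circuit is combinational.
# Illustrative sketch; the paper's symbolic delay analysis is richer.

X = 'X'  # the "unknown" value of ternary simulation

def mux(s, a, b):
    """Ternary multiplexer: a if s == 1, b if s == 0, else X unless a == b."""
    if s == 1:
        return a
    if s == 0:
        return b
    return a if a == b else X

def evaluate(s, a, b):
    """Iterate both gates (simultaneously) to a fixed point from all-X."""
    g1, g2 = X, X  # floating mode: start every internal wire at unknown
    for _ in range(4):  # tiny circuit: the fixed point is reached quickly
        g1, g2 = mux(s, a, g2), mux(s, g1, b)
    return g1, g2

# Every complete input assignment yields definite (non-X) outputs:
for s in (0, 1):
    for a in (0, 1):
        for b in (0, 1):
            g1, g2 = evaluate(s, a, b)
            assert X not in (g1, g2)
```

When s = 1 the signal flows a -> g1 -> g2, and when s = 0 it flows b -> g2 -> g1; the ternary fixed point captures exactly this case analysis, which is the qualitative basis for the timing analysis the abstract describes.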