Book Section records
https://feeds.library.caltech.edu/people/Bruck-J/book_section.rss
A Caltech Library Repository Feed (generated by python-feedgen; last build Thu, 30 Nov 2023 17:51:58 +0000)

Some new EC/AUED codes
https://resolver.caltech.edu/CaltechAUTHORS:20120524-150825080
Authors: Bruck, Jehoshua; Blaum, Mario
Year: 1989
DOI: 10.1109/FTCS.1989.105568
A novel construction that differs from the traditional way of constructing systematic EC/AUED (error-correcting/all unidirectional error-detecting) codes is presented. The usual method is to take a systematic t-error-correcting code and then append a tail so that the code can detect more than t errors when they are unidirectional. In the authors' construction, the t-error-correcting code is modified in such a way that the weight distribution of the original code is reduced. The authors then need to add only a smaller tail. Frequently the resulting codes have less redundancy than the best available systematic t-EC/AUED codes.
https://authors.library.caltech.edu/records/9e9r1-rdr73

Polynomial Threshold Elements?
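The "Some new EC/AUED codes" abstract above refers to the traditional tail-appending method. A minimal sketch of that classical idea (a Berger-style tail; this toy code is an illustration, not the paper's construction):

```python
def berger_encode(info_bits):
    """Append a tail that encodes the number of 0s among the information bits.

    A unidirectional error pattern (all flips 1->0, or all 0->1) shifts the
    recomputed tail and the received tail in opposite directions, so any
    number of unidirectional errors is detected.
    """
    tail_len = len(info_bits).bit_length()  # enough bits to count 0..n zeros
    zeros = info_bits.count(0)
    tail = [int(b) for b in format(zeros, f"0{tail_len}b")]
    return info_bits + tail

def error_detected(word, n):
    """Recompute the tail from the received information bits and compare."""
    return berger_encode(word[:n]) != word
```

For example, `berger_encode([1, 0, 1, 1])` yields `[1, 0, 1, 1, 0, 0, 1]`, and flipping an information bit 1->0 makes `error_detected` return True.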
https://resolver.caltech.edu/CaltechAUTHORS:20120524-151412678
Authors: Bruck, Jehoshua
Year: 1989
DOI: 10.1109/ITW.1989.761437
https://authors.library.caltech.edu/records/6aary-38x86

Harmonic analysis of neural networks
https://resolver.caltech.edu/CaltechAUTHORS:20120524-090912809
Authors: Bruck, Jehoshua
Year: 1989
DOI: 10.1109/ACSSC.1989.1200767
Neural network models have attracted a lot of interest in recent years, mainly because they were perceived as a new idea for computing. These models can be described as a network in which every node computes a linear threshold function. One of the main difficulties in analyzing the properties of these networks is the fact that they consist of nonlinear elements. I will present a novel approach, based on harmonic analysis of Boolean functions, to analyze neural networks. In particular, I will show how this technique can be applied to answer the following two fundamental questions: (i) what is the computational power of a polynomial threshold element with respect to linear threshold elements? (ii) is it possible to get exponentially many spurious memories when we use the outer-product method for programming the Hopfield model?
https://authors.library.caltech.edu/records/aw7d3-cej07

Fast arithmetic computing with neural networks
https://resolver.caltech.edu/CaltechAUTHORS:20120509-132855976
Authors: Siu, Kai-Yeung; Bruck, Jehoshua
Year: 1990
DOI: 10.1109/TENCON.1990.152559
The authors introduce a restricted model of a neuron which is more practical as a model of computation than the classical model of a neuron. The authors define a model of neural networks as a feedforward network of such neurons. Whereas any logic circuit of polynomial size (in n) that computes the product of two n-bit numbers requires unbounded delay, such computations can be done in a neural network with constant delay. The authors improve some known results by showing that the product of two n-bit numbers and sorting of n n-bit numbers can both be computed by a polynomial size neural network using only four unit delays, independent of n. Moreover, the weights of each threshold element in the neural networks require only O(log n)-bit (instead of n-bit) accuracy.
https://authors.library.caltech.edu/records/f1tk4-a1h18

Polynomial Threshold Functions, AC^0 Functions and Spectral Norms
https://resolver.caltech.edu/CaltechAUTHORS:20120425-065829076
Authors: Bruck, Jehoshua; Smolensky, Roman
Year: 1990
DOI: 10.1109/FSCS.1990.89585
The class of polynomial-threshold functions is studied using harmonic analysis, and the results are used to derive lower bounds related to AC^0 functions. A Boolean function is polynomial threshold if it can be represented as a sign function of a sparse polynomial (one that consists of a polynomial number of terms). The main result is that polynomial-threshold functions can be characterized by means of their spectral representation. In particular, it is proved that a Boolean function whose L_1 spectral norm is bounded by a polynomial in n is a polynomial-threshold function, and that a Boolean function whose L_∞^(-1) spectral norm is not bounded by a polynomial in n is not a polynomial-threshold function. Some results for AC^0 functions are derived.
https://authors.library.caltech.edu/records/ebydz-kpv88

On the Power of Threshold Circuits with Small Weights
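To make the spectral quantities in the "Polynomial Threshold Functions, AC^0 Functions and Spectral Norms" abstract above concrete, here is a small brute-force computation of a Boolean function's Fourier coefficients and its L_1 spectral norm (illustrative only; a practical implementation would use a fast Walsh-Hadamard transform):

```python
from itertools import product

def fourier_coeffs(f, n):
    """Brute-force Fourier coefficients of f: {0,1}^n -> {-1,+1}.

    f_hat(S) = average over x of f(x) * chi_S(x), where
    chi_S(x) = (-1)^{sum_{i in S} x_i}; S is encoded as an n-bit mask.
    """
    coeffs = {}
    for S in range(1 << n):
        total = 0
        for x in product((0, 1), repeat=n):
            chi = (-1) ** sum(x[i] for i in range(n) if (S >> i) & 1)
            total += f(x) * chi
        coeffs[S] = total / (1 << n)
    return coeffs

def l1_spectral_norm(f, n):
    """L_1 spectral norm: sum of absolute values of all Fourier coefficients."""
    return sum(abs(c) for c in fourier_coeffs(f, n).values())

# PARITY is maximally sparse: a single nonzero coefficient, so L_1 norm 1.
parity = lambda x: (-1) ** sum(x)
```

For instance, `l1_spectral_norm(parity, 3)` is 1.0, consistent with parity being a (trivial) polynomial-threshold function.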
https://resolver.caltech.edu/CaltechAUTHORS:20120424-103938302
Authors: Siu, Kai-Yeung; Bruck, Jehoshua
Year: 1991
DOI: 10.1109/ISIT.1991.695138
Linear threshold elements (LTEs) are the basic processing elements in artificial neural networks. An LTE computes a function that is the sign of a weighted sum of the input variables. The weights are arbitrary integers; in fact, they can be very big integers, exponential in the number of input variables. However, in practice, it is very difficult to implement big weights. So the natural question is whether there is an efficient way to simulate a network of LTEs with big weights by a network of LTEs with small weights. We prove the following results: 1) every LTE with big weights can be simulated by a depth-3, polynomial size network of LTEs with small weights; 2) every depth-d polynomial size network of LTEs with big weights can be simulated by a depth-(2d+1), polynomial size network of LTEs with small weights. To prove these results, we use tools from harmonic analysis of Boolean functions. Our technique is quite general and provides insights into some other problems. For example, we were able to improve the best known results on the depth of a network of threshold elements that computes the COMPARISON, ADDITION and PRODUCT of two n-bit numbers, and the MAXIMUM and the SORTING of n n-bit numbers.
https://authors.library.caltech.edu/records/cfm27-k7g38

New Techniques For Constructing EC/AUED Codes
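The COMPARISON function mentioned in the "On the Power of Threshold Circuits with Small Weights" abstract above is the standard example of a single LTE that uses exponential weights. A minimal sketch (bit vectors are least-significant-bit first; a toy illustration of the model, not the paper's simulation):

```python
def lte(weights, x, threshold=0):
    """Linear threshold element: 1 iff the weighted sum reaches the threshold."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= threshold else 0

def comparison(x_bits, y_bits):
    """COMPARISON(x, y) = 1 iff x >= y, computed by a single LTE whose
    weights (+-2^i) are exponential in the number of input bits."""
    n = len(x_bits)
    weights = [2 ** i for i in range(n)] + [-(2 ** i) for i in range(n)]
    return lte(weights, x_bits + y_bits)
```

For example, `comparison([1, 1], [0, 1])` compares 3 with 2 and returns 1.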
https://resolver.caltech.edu/CaltechAUTHORS:20120418-090627310
Authors: Bruck, Jehoshua; Blaum, Mario
Year: 1991
DOI: 10.1109/ISIT.1991.695194
We present two new techniques for constructing t-EC/AUED codes. The combination of the two techniques reduces the total redundancy of the best constructions by one bit or more in many cases.
https://authors.library.caltech.edu/records/10mas-ed427

Harmonic Analysis And The Complexity Of Computing With Threshold (Neural) Elements
https://resolver.caltech.edu/CaltechAUTHORS:20120417-094637010
Authors: Bruck, Jehoshua; Smolensky, Roman
Year: 1991
DOI: 10.1109/ISIT.1991.695142
The main purpose of this talk is to introduce a useful tool for the analysis of discrete neural networks in which every node is a Boolean threshold gate. The difficulty in the analysis of neural networks arises from the fact that the basic processing elements (linear threshold gates) are nonlinear. The key idea in harmonic analysis of threshold functions is to represent the functions as polynomials over the field of real numbers. Answering different questions regarding neural networks becomes equivalent to answering questions related to the coefficients of these polynomials. We have applied these techniques and obtained many interesting and surprising results [1, 2, 3, 4]. The focus of this talk will be on presenting a theorem that characterizes, using spectral norms, the complexity of computing a Boolean function with threshold circuits [2, 3]. This result establishes the first known link between harmonic analysis and the complexity of computing with neural networks.
https://authors.library.caltech.edu/records/34v3g-9e396

Construction of asymptotically good low-rate error-correcting codes through pseudo-random graphs
https://resolver.caltech.edu/CaltechAUTHORS:ALOisit91
Authors: Alon, Noga; Bruck, Jehoshua; Naor, Joseph; Naor, Moni; Roth, Ron M.
Year: 1991
A new technique, based on the pseudo-random properties of certain graphs, known as expanders, is used to obtain new simple explicit constructions of asymptotically good codes.
https://authors.library.caltech.edu/records/erbp5-vqp12

Fault-tolerant meshes with minimal numbers of spares
https://resolver.caltech.edu/CaltechAUTHORS:BRUispdp91
Authors: Bruck, Jehoshua; Cypher, Robert; Ho, Ching-Tien
Year: 1991
DOI: 10.1109/SPDP.1991.218267
This paper presents several techniques for adding fault-tolerance to distributed memory parallel computers. More formally, given a target graph with n nodes, we create a fault-tolerant graph with n + k nodes such that given any set of k or fewer faulty nodes, the remaining graph is guaranteed to contain the target graph as a fault-free subgraph. As a result, any algorithm designed for the target graph will run with no slowdown in the presence of k or fewer node faults, regardless of their distribution. We present fault-tolerant graphs for target graphs which are 2-dimensional meshes, tori, eight-connected meshes and hexagonal meshes. In all cases our fault-tolerant graphs have smaller degree than any previously known graphs with the same properties.
https://authors.library.caltech.edu/records/6475m-mqy50

Fault tolerant graphs, perfect hash functions and disjoint paths
https://resolver.caltech.edu/CaltechAUTHORS:ATJfocs92
Authors: Ajtai, M.; Alon, N.; Bruck, J.; Cypher, R.; Ho, C.T.; Naor, M.; Szemerédi, E.
Year: 1992
DOI: 10.1109/SFCS.1992.267781
Given a graph G on n nodes, the authors say that a graph T on n + k nodes is a k-fault-tolerant version of G if one can embed G in any n-node induced subgraph of T. Thus T can sustain k faults and still emulate G without any performance degradation. They show that for a wide range of values of n, k and d, for any graph on n nodes with maximum degree d there is a k-fault-tolerant graph with maximum degree O(kd). They provide lower bounds as well: there are graphs G with maximum degree d such that any k-fault-tolerant version of them has maximum degree at least Ω(d√k).
https://authors.library.caltech.edu/records/nbkty-71x39

Tolerating faults in a mesh with a row of spare nodes
https://resolver.caltech.edu/CaltechAUTHORS:BRUispdp92a
Authors: Bruck, Jehoshua; Cypher, Robert; Ho, Ching-Tien
Year: 1992
DOI: 10.1109/SPDP.1992.242768
We present an efficient method for tolerating faults in a two-dimensional mesh architecture. Our approach is based on adding spare components (nodes) and extra links (edges) such that the resulting architecture can be reconfigured as a mesh in the presence of faults. We optimize the cost of the fault-tolerant mesh architecture by adding about one row of redundant nodes in addition to a set of k spare nodes (while tolerating up to k node faults) and minimizing the number of links per node. Our results are surprisingly efficient and seem to be practical for small values of k. The degree of the fault-tolerant architecture is k + 5 for odd k, and k + 6 for even k. Our results can be generalized to d-dimensional meshes such that the number of spare nodes is less than the length of the shortest axis plus k, and the degree of the fault-tolerant mesh is (d-1)k+d+3 when k is odd and (d-1)k+2d+2 when k is even.
https://authors.library.caltech.edu/records/5ngvx-9te20

Multiple message broadcasting with generalized Fibonacci trees
https://resolver.caltech.edu/CaltechAUTHORS:BRUispdp92b
Authors: Bruck, Jehoshua; Cypher, Robert; Ho, Ching-Tien
Year: 1992
DOI: 10.1109/SPDP.1992.242714
We present efficient algorithms for broadcasting multiple messages. We assume n processors, one of which contains m packets that it must broadcast to each of the remaining n - 1 processors. The processors communicate in rounds. In one round each processor is able to send one packet to any other processor and receive one packet from any other processor. We give a broadcasting algorithm which requires m + log n + 3 log log n + 15 rounds. In addition, we show a simple lower bound of m + ⌈log n⌉ - 1 rounds for broadcasting in this model.
https://authors.library.caltech.edu/records/qhpp8-43j98

Unordered Error-Correcting Codes and their Applications
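A quick numeric check of the two bounds in the "Multiple message broadcasting with generalized Fibonacci trees" abstract above, using the formulas stated there (logs base 2; evaluating the algorithm's log terms without ceilings is an assumption for this illustration):

```python
from math import ceil, log2

def algorithm_rounds(m, n):
    """Upper bound from the abstract: m + log n + 3 log log n + 15."""
    return m + log2(n) + 3 * log2(log2(n)) + 15

def lower_bound(m, n):
    """Lower bound from the abstract: m + ceil(log n) - 1."""
    return m + ceil(log2(n)) - 1
```

For n = 1024 processors and m = 100 packets, the lower bound is 109 rounds while the algorithm uses about 135, so the additive gap stays small relative to m.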
https://resolver.caltech.edu/CaltechAUTHORS:20120309-145816573
Authors: Blaum, Mario; Bruck, Jehoshua
Year: 1993
DOI: 10.1109/FTCS.1992.243585
We give efficient constructions for error-correcting unordered (ECU) codes, i.e., codes such that any pair of codewords are at a certain minimal distance apart and at the same time are unordered. These codes are used for detecting a predetermined number of (symmetric) errors and for detecting all unidirectional errors. We also give an application in parallel asynchronous communications.
https://authors.library.caltech.edu/records/pka3c-chr23

Efficient checkpointing over local area networks
https://resolver.caltech.edu/CaltechAUTHORS:ZIVftpds94
Authors: Ziv, Avi; Bruck, Jehoshua
Year: 1994
DOI: 10.1109/FTPDS.1994.494471
Parallel and distributed computing on clusters of workstations is becoming very popular as it provides a cost-effective way for high performance computing. In these systems, the bandwidth of the communication subsystem (using Ethernet technology) is about an order of magnitude smaller than the bandwidth of the storage subsystem. Hence, storing a state in a checkpoint is much more efficient than comparing states over the network.
In this paper we present a novel checkpointing approach that enables efficient performance over local area networks. The main idea is that we use two types of checkpoints: compare-checkpoints (comparing the states of the redundant processes to detect faults) and store-checkpoints (where the state is only stored). The store-checkpoints reduce the rollback needed after a fault is detected, without performing many unnecessary comparisons.
As a particular example of this approach we analyzed the DMR checkpointing scheme with store-checkpoints. Our main result is that the overhead of the execution time can be significantly reduced when store-checkpoints are introduced. We have implemented a prototype of the new DMR scheme and run it on workstations connected by a LAN. The experimental results we obtained match the analytical results and show that in some cases the overhead of DMR checkpointing schemes over LANs can be improved by as much as 20%.
https://authors.library.caltech.edu/records/1sc9z-98g70

Analysis of checkpointing schemes for multiprocessor systems
https://resolver.caltech.edu/CaltechAUTHORS:ZIVreldis94
Authors: Ziv, Avi; Bruck, Jehoshua
Year: 1994
DOI: 10.1109/RELDIS.1994.336909
Parallel computing systems provide hardware redundancy that helps to achieve low-cost fault tolerance by duplicating the task on more than a single processor and comparing the states of the processors at checkpoints. This paper suggests a novel technique, based on a Markov Reward Model (MRM), for analyzing the performance of checkpointing schemes with task duplication. We show how this technique can be used to derive the average execution time of a task and other important parameters related to the performance of checkpointing schemes. Our analytical results agree well with the values we obtained using a simulation program. We compare the average task execution time and total work of four checkpointing schemes, and show that generally increasing the number of processors reduces the average execution time, but increases the total work done by the processors. However, in cases where there is a big difference between the times it takes to perform different operations, those results can change.
https://authors.library.caltech.edu/records/f7ez8-2s723

PCODE: an efficient and reliable collective communication protocol for unreliable broadcast domain
https://resolver.caltech.edu/CaltechAUTHORS:BRUipps95
Authors: Bruck, Jehoshua; Dolev, Danny; Ho, Ching-Tien; Orni, Rimon; Strong, Ray
Year: 1995
DOI: 10.1109/IPPS.1995.395924
Existing programming environments for clusters are typically built on top of a point-to-point communication layer (send and receive) over local area networks (LANs) and, as a result, suffer from poor performance in the collective communication part. For example, a broadcast that is implemented using a TCP/IP protocol (which is a point-to-point protocol) over a LAN is obviously inefficient as it is not utilizing the fact that the LAN is a broadcast medium. We have observed that the main difference between a distributed computing paradigm and a message passing parallel computing paradigm is that, in a distributed environment the activity of every processor is independent while in a parallel environment the collection of the user-communication layers in the processors can be modeled as a single global program. We have formalized the requirements by defining the notion of a correct global program. This notion provides a precise specification of the interface between the transport layer and the user-communication layer. We have developed PCODE, a new communication protocol that is driven by a global program and proved its correctness.
We have implemented the PCODE protocol on a collection of IBM RS/6000 workstations and on a collection of Silicon Graphics Indigo workstations, both communicating via UDP broadcast. The experimental results we obtained indicate that the performance advantage of PCODE over the current point-to-point approach (TCP) can be as high as an order of magnitude on a cluster of 16 workstations.
https://authors.library.caltech.edu/records/2rat0-mbt87

Efficient Message Passing Interface (MPI) for Parallel Computing on Clusters of Workstations
https://resolver.caltech.edu/CaltechAUTHORS:20160811-162638038
Authors: Bruck, Jehoshua; Dolev, Danny; Ho, Ching-Tien; Roşu, Marcel-Cătălin
Year: 1995
DOI: 10.1145/215399.215421
Parallel computing on clusters of workstations and personal computers has very high potential, since it leverages existing hardware and software. Parallel programming environments offer the user a convenient way to express parallel computation and communication. In fact, recently, a Message Passing Interface (MPI) has been proposed as an industrial standard for writing "portable" message-passing parallel programs. The communication part of MPI consists of the usual point-to-point communication as well as collective communication. However, existing implementations of programming environments for clusters are built on top of a point-to-point communication layer (send and receive) over local area networks (LANs) and, as a result, suffer from poor performance in the collective communication part.

In this paper, we present an efficient design and implementation of the collective communication part in MPI that is optimized for clusters of workstations. Our system consists of two main components: the MPI-CCL layer, which includes the collective communication functionality of MPI, and a User-level Reliable Transport Protocol (URTP), which interfaces with the LAN data-link layer and leverages the fact that the LAN is a broadcast medium. Our system is integrated with the operating system via an efficient kernel extension mechanism that we developed. The kernel extension significantly improves the performance of our implementation as it can handle part of the communication overhead without involving user space.

We have implemented our system on a collection of IBM RS/6000 workstations connected via a 10 Mbit Ethernet LAN. Our performance measurements are taken from typical scientific programs that run in a parallel mode by means of the MPI. The hypothesis behind our design is that the system's performance will be bounded by interactions between the kernel and user space rather than by the bandwidth delivered by the LAN data-link layer. Our results indicate that the performance of our MPI broadcast (on top of Ethernet) is about twice as fast as a recently published software implementation of broadcast on top of ATM.
https://authors.library.caltech.edu/records/y0fve-83x68

MDS Array Codes with Independent Parity Symbols
https://resolver.caltech.edu/CaltechAUTHORS:20120216-070547466
Authors: Blaum, Mario; Bruck, Jehoshua; Vardy, Alexander
Year: 1995
DOI: 10.1109/ISIT.1995.535761
A new family of maximum distance separable (MDS) array codes is presented. The code arrays contain p information columns and r independent parity columns, where p is a prime. We give necessary and sufficient conditions for our codes to be MDS, and then prove that if p belongs to a certain class of primes these conditions are satisfied up to r⩽8. We also develop efficient decoding procedures for the case of two and three column errors, and any number of column erasures. Finally, we present upper and lower bounds on the average number of parity bits which have to be updated in an MDS code over GF(2^m), following an update in a single information bit. We show that the upper bound obtained from our codes is close to the lower bound and does not depend on the size of the code symbols.
https://authors.library.caltech.edu/records/rpfxf-ddw17

On Neural Networks with Minimal Weights
https://resolver.caltech.edu/CaltechAUTHORS:20160223-114401229
Authors: Bohossian, Vasken; Bruck, Jehoshua
Year: 1996
Linear threshold elements are the basic building blocks of artificial neural networks. A linear threshold element computes a function that is the sign of a weighted sum of the input variables. The weights are arbitrary integers; in fact, they can be very big integers, exponential in the number of the input variables. However, in practice, it is difficult to implement big weights. In the present literature a distinction is made between the two extreme cases: linear threshold functions with polynomial-size weights as opposed to those with exponential-size weights. The main contribution of this paper is to fill the gap by further refining that separation. Namely, we prove that the class of linear threshold functions with polynomial-size weights can be divided into subclasses according to the degree of the polynomial. In fact, we prove a more general result: that there exists a minimal-weight linear threshold function for any arbitrary number of inputs and any weight size. To prove those results we have developed a novel technique for constructing linear threshold functions with minimal weights.
https://authors.library.caltech.edu/records/fn4q8-j9y21

An on-line algorithm for checkpoint placement
https://resolver.caltech.edu/CaltechAUTHORS:ZIVissre96
Authors: Ziv, Avi; Bruck, Jehoshua
Year: 1996
DOI: 10.1109/ISSRE.1996.558869
Checkpointing is a common technique for reducing the time to recover from faults in computer systems. By saving intermediate states of programs in reliable storage, checkpointing makes it possible to reduce the lost processing time caused by faults. The length of the intervals between checkpoints affects the execution time of programs. Long intervals lead to long reprocessing time, while too-frequent checkpointing leads to high checkpointing overhead. In this paper we present an on-line algorithm for placement of checkpoints. The algorithm uses on-line knowledge of the current cost of a checkpoint when it decides whether or not to place a checkpoint. We show how the execution time of a program using this algorithm can be analyzed. The total overhead of the execution time when the proposed algorithm is used is smaller than the overhead when fixed intervals are used. Although the proposed algorithm uses only on-line knowledge about the cost of checkpointing, its behavior is close to that of the off-line optimal algorithm that uses complete knowledge of checkpointing cost.
https://authors.library.caltech.edu/records/mn9hj-yks12

On Optimal Placements of Processors in Tori Networks
https://resolver.caltech.edu/CaltechAUTHORS:20120207-113452642
Authors: Blaum, Mario; Bruck, Jehoshua; Pifarré, Gustavo D.; Sanz, Jorge L. C.
Year: 1996
DOI: 10.1109/SPDP.1996.570382
Two- and three-dimensional k-tori are among the most used topologies in the design of new parallel computers. Traditionally (with the exception of the Tera parallel computer), these networks have been used as fully populated networks, in the sense that every routing node in the topology is subject to message injection. However, fully populated tori and meshes exhibit a theoretical throughput which degrades as the network size increases. In contrast, multistage networks (which are partially populated) scale well with the network size. Introducing slackness in fully populated tori, i.e., reducing the number of processors, and studying optimal routing strategies for the resulting interconnections are the central subjects of the paper. The key concept is the placement of the processors in a network together with a routing algorithm between them, where a placement is the subset of the nodes in the interconnection network that are attached to processors. The main contribution is the construction of optimal placements for d-dimensional k-tori networks, of sizes k and k^2, and the corresponding routing algorithms for the cases d=2 and d=3, respectively.
https://authors.library.caltech.edu/records/xvv2d-73504

Array Codes for Correction of Criss-Cross Errors
https://resolver.caltech.edu/CaltechAUTHORS:20120119-113410577
Authors: Blaum, Mario; Bruck, Jehoshua
Year: 1997
DOI: 10.1109/ISIT.1997.613349
We present MDS array codes of size (p-1)×(p-1), where p is a prime number, that can correct any row or column in error without a priori knowledge of which type of error has occurred. The complexity of the encoding and decoding algorithms is lower than that of known codes with the same error-correcting power, since our algorithms are based on exclusive-OR operations over lines of different slopes, as opposed to algebraic operations over a finite field.
https://authors.library.caltech.edu/records/npsne-szw82

Partial-sum queries in OLAP data cubes using covering codes
https://resolver.caltech.edu/CaltechAUTHORS:20161103-134218465
Authors: Ho, Ching-Tien; Bruck, Jehoshua; Agrawal, Rakesh
Year: 1997
DOI: 10.1145/263661.263686
A partial-sum query obtains the summation over a set of specified cells of a data cube. We establish a connection between the covering problem in the theory of covering codes and the partial-sum problem, and use this connection to devise algorithms for the partial-sum problem with efficient space-time trade-offs. For example, using our algorithms, with 44% additional storage, the query response time can be improved by about 12%; by roughly doubling the storage requirement, the query response time can be improved by about 34%.
https://authors.library.caltech.edu/records/5asp5-sky26

Two-dimensional interleaving schemes with repetitions
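The space-time trade-off in the "Partial-sum queries in OLAP data cubes" abstract above can be seen in its simplest one-dimensional form with prefix sums (this naive sketch handles contiguous ranges only; the paper's covering-code method addresses arbitrary cell subsets of a cube):

```python
class PrefixSums:
    """Trade space for time: n + 1 stored partial sums answer any
    contiguous range-sum query in O(1) instead of O(range length)."""

    def __init__(self, cells):
        self.prefix = [0]
        for v in cells:
            self.prefix.append(self.prefix[-1] + v)

    def range_sum(self, lo, hi):
        """Sum of cells[lo:hi]."""
        return self.prefix[hi] - self.prefix[lo]
```

For example, `PrefixSums([3, 1, 4, 1, 5]).range_sum(1, 4)` returns 6 with a single subtraction.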
https://resolver.caltech.edu/CaltechAUTHORS:20120119-135511706
Authors: Blaum, Mario; Bruck, Jehoshua; Farrell, Patrick G.
Year: 1997
DOI: 10.1109/ISIT.1997.613272
We present 2-dimensional interleaving schemes, with repetition, for correcting 2-dimensional bursts (or clusters) of errors, where a cluster of errors is characterized by its area. Known interleaving schemes are based on arrays of integers with the property that every connected component of area t consists of distinct integers. Namely, they are based on the use of 1-error-correcting codes. We extend this concept by allowing repetitions within the arrays, hence providing a trade-off between the error-correcting capability of the codes and the degree of the interleaving schemes.
https://authors.library.caltech.edu/records/cn0x9-6er65

Programmable neural logic
https://resolver.caltech.edu/CaltechAUTHORS:BOHiciss97
Authors: Bohossian, Vasken; Hasler, Paul; Bruck, Jehoshua
Year: 1997
DOI: 10.1109/ICISS.1997.630242
Circuits of threshold elements (Boolean-input, Boolean-output neurons) have been shown to be surprisingly powerful. Useful functions such as XOR, ADD and MULTIPLY can be implemented by such circuits more efficiently than by traditional AND/OR circuits. In view of that, we have designed and built a programmable threshold element. The weights are stored on polysilicon floating gates, providing long-term retention without refresh. The weight value is increased using tunneling and decreased via hot electron injection. A weight is stored on a single transistor, allowing the development of dense arrays of threshold elements. A 16-input programmable neuron was fabricated in the standard 2 μm double-poly analog process available from MOSIS. A long-term goal of this research is to incorporate programmable threshold elements as building blocks in Field Programmable Gate Arrays.
https://authors.library.caltech.edu/records/6fmjr-vpz17

Multiple Threshold Neural Logic
https://resolver.caltech.edu/CaltechAUTHORS:20160224-141437128
Authors: Bohossian, Vasken; Bruck, Jehoshua
Year: 1998
We introduce a new Boolean computing element related to the linear threshold (LT) element, which is the Boolean version of the neuron. Instead of the sign function, it computes an arbitrary (with polynomially many transitions) Boolean function of the weighted sum of its inputs. We call the new computing element an LTM element, which stands for Linear Threshold with Multiple transitions.

The paper consists of the following main contributions related to our study of LTM circuits: (i) efficient designs of LTM circuits for the addition of a multiple number of integers and the product of two integers; in particular, we show how to compute the addition of m integers with a single layer of LTM elements; (ii) a proof that the area of the VLSI layout is reduced from O(n^2) in LT circuits to O(n) in LTM circuits, for n-input symmetric Boolean functions; and (iii) a characterization of the computing power of LTM relative to LT circuits.
https://authors.library.caltech.edu/records/tb591-xtq92

Fault-tolerant switched local area networks
https://resolver.caltech.edu/CaltechAUTHORS:20111215-115455804
Authors: LeMahieu, Paul; Bohossian, Vasken; Bruck, Jehoshua
Year: 1998
DOI: 10.1109/IPPS.1998.670011
The RAIN (Reliable Array of Independent Nodes) project at Caltech is focusing on creating highly reliable distributed systems by leveraging commercially available personal computers, workstations and interconnect technologies. In particular, the issue of reliable communication is addressed by introducing redundancy in the form of multiple network interfaces per compute node. When using compute nodes with multiple network connections, the question of how to best connect these nodes to a given network of switches arises. We examine networks of switches (e.g. based on Myrinet technology) and focus on degree-two compute nodes (two network adaptor cards per node). Our primary goal is to create networks that are as resistant as possible to partitioning. Our main contributions are: (i) a construction for degree-2 compute nodes connected by a ring network of switches of degree 4 that can tolerate any 3 switch failures without partitioning the nodes into disjoint sets; (ii) a proof that this construction is optimal in the sense that no construction can tolerate more switch failures while avoiding partitioning; and (iii) generalizations of this construction to arbitrary switch and node degrees and to other switch networks, in particular to a fully connected network of switches.
https://authors.library.caltech.edu/records/8ev4h-nkh42

A consistent history link connectivity protocol
https://resolver.caltech.edu/CaltechAUTHORS:20161122-142619200
Authors: LeMahieu, Paul; Bruck, Jehoshua
Year: 1998
DOI: 10.1145/277697.277757
Given the prevalence of powerful personal workstations connected over local area networks, it is only natural that people are exploring distributed computing over such systems. Whenever systems become distributed, the issue of fault tolerance becomes an important consideration. In the context of the RAIN project (Reliable Arrays of Independent Nodes) at Caltech, we have been looking into fault tolerance in several elements of the distributed system. One important aspect of this is the introduction of fault tolerance into the communication system by introducing redundant network elements and redundant network interfaces.
https://authors.library.caltech.edu/records/a7akm-zzc65

Efficient digital to analog encoding
https://resolver.caltech.edu/CaltechAUTHORS:20111215-111208543
Authors: Gibson, Michael; Bruck, Jehoshua
Year: 1998
DOI: 10.1109/ISIT.1998.708930
An important issue in analog circuit design is the problem of digital to analog conversion, namely, the encoding of Boolean variables into a single analog value which contains enough information to reconstruct the values of the Boolean variables. Wegener (1996) proved that [3n-1/2] 2-input arithmetic gates are necessary and sufficient for implementing the encoding function of n Boolean variables. However, the proof of the upper bound is not constructive. We present an explicit construction of a digital to analog encoder that is optimal in the number of 2-input arithmetic gates.https://authors.library.caltech.edu/records/ffgb2-xdf10Coding for skew correcting and detecting in parallel asynchronous communications
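The encoding problem in the record above can be illustrated with a minimal sketch. This is a naive positional encoding, not the paper's optimal construction (which achieves Wegener's gate count); it only shows that n Boolean variables can be packed into one value and recovered exactly.

```python
def encode(bits):
    # Naive digital-to-analog encoding: interpret the Boolean
    # vector as a binary integer, Horner-style (one multiply and
    # one add per bit). Uses more 2-input arithmetic gates than
    # the optimal construction in the paper.
    value = 0
    for b in reversed(bits):
        value = 2 * value + b
    return value

def decode(value, n):
    # Recover the n Boolean variables from the single value.
    return [(value >> i) & 1 for i in range(n)]
```

For example, `encode([1, 0, 1])` yields 5, and `decode(5, 3)` returns `[1, 0, 1]`.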
https://resolver.caltech.edu/CaltechAUTHORS:20120112-112036174
Authors: Blaum, Mario; Bruck, Jehoshua
Year: 1998
DOI: 10.1109/ISIT.1998.708659
We study the problem of pipelined transmission in parallel asynchronous communications allowing a certain amount of skew. We redefine the concept of skew in a way that extends previously known results in this area. Using the new definition of skew, we derive the necessary and sufficient conditions for codes that can tolerate a certain amount of skew and detect a larger amount of skew when the tolerating threshold is exceeded.https://authors.library.caltech.edu/records/n6axc-apn80Low density MDS codes and factors of complete graphs
https://resolver.caltech.edu/CaltechAUTHORS:XULisit98
Authors: Xu, Lihao; Bohossian, Vasken; Bruck, Jehoshua; Wagner, David G.
Year: 1998
DOI: 10.1109/ISIT.1998.708599
We reveal an equivalence relation between the construction of a new class of low density MDS array codes, that we call B-Code, and a combinatorial problem known as perfect one-factorization of complete graphs. We use known perfect one-factors of complete graphs to create constructions and decoding algorithms for both B-Code and its dual code. B-Code and its dual are optimal in the sense that (i) they are MDS, (ii) they have an optimal encoding property, i.e., the number of the parity bits that are affected by a change of a single information bit is minimal, and (iii) they have optimal length. The existence of perfect one-factorizations for every complete graph with an even number of nodes is a 35-year-old conjecture in graph theory. The construction of B-Codes of arbitrary odd length would provide an affirmative answer to the conjecture.https://authors.library.caltech.edu/records/21bjn-9hj28Highly available distributed storage systems
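For readers unfamiliar with one-factorizations, the classical round-robin (circle-method) construction below partitions the edges of the complete graph K_m (m even) into m-1 perfect matchings. Whether such a factorization can always be made *perfect* (any two factors forming a Hamiltonian cycle) is exactly the long-standing conjecture mentioned in the abstract; this sketch only illustrates the combinatorial object, not the B-Code construction itself.

```python
def one_factorization(m):
    """Round-robin (circle method) one-factorization of K_m,
    m even: partitions the m*(m-1)/2 edges into m-1 perfect
    matchings. Vertex m-1 stays fixed; the others rotate."""
    assert m % 2 == 0
    n = m - 1
    factors = []
    for r in range(n):
        matching = [(n, r)]  # fixed vertex paired with r
        for i in range(1, m // 2):
            matching.append(((r + i) % n, (r - i) % n))
        factors.append(matching)
    return factors
```

For K_8 this produces 7 matchings of 4 edges each, covering all 28 edges exactly once.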
https://resolver.caltech.edu/CaltechAUTHORS:20200709-082541313
Authors: Xu, Lihao; Bruck, Jehoshua
Year: 1999
DOI: 10.1007/bfb0110096
Information is generated, processed, transmitted and stored in various forms: text, voice, image, video and multimedia types. Here all these forms will be treated as general data. As the need for data increases exponentially with the passage of time and the increase of computing power, data storage becomes more and more important. From scientific computing to business transactions, data is the most precious asset. How to store data reliably and efficiently is the essential issue, and it is the focus of this chapter.https://authors.library.caltech.edu/records/ssd41-n0y60A Consistent History Link Connectivity Protocol
https://resolver.caltech.edu/CaltechAUTHORS:20111207-091413566
Authors: LeMahieu, Paul; Bruck, Jehoshua
Year: 1999
DOI: 10.1109/IPPS.1999.760448
The RAIN (Reliable Array of Independent Nodes) project at Caltech is focusing on creating reliable distributed systems by leveraging commercially available personal computers and interconnect technologies. Fault-tolerance is introduced into the communication infrastructure by using multiple network interfaces per compute node. When using multiple network connections per compute node, the question of how to monitor connectivity between nodes arises. We examine a connectivity protocol that guarantees that each side of a point-to-point connection sees the same history of activity over the communication channel. In other words, we maintain a consistent history of the state of the channel. The history of channel-state is guaranteed to be identical at each endpoint within some bounded slack. Our main contributions are: (i) a simple, stable protocol for monitoring connectivity that maintains a consistent history with bounded slack, and (ii) proofs that this protocol exhibits correctness, bounded slack, and stability.https://authors.library.caltech.edu/records/8k1s5-da382Tolerating Faults in Counting Networks
https://resolver.caltech.edu/CaltechAUTHORS:20190830-101628653
Authors: Riedel, Marc D.; Bruck, Jehoshua
Year: 2000
DOI: 10.1007/978-1-4615-4549-1_12
Counting networks were proposed by Aspnes, Herlihy and Shavit [3] as a low-contention concurrent data structure for multiprocessor coordination. We address the issue of tolerating faults in counting networks. In our fault model, balancer objects experience responsive crash failures: they behave correctly until they fail, and thereafter they are inaccessible. We propose two methods for tolerating such faults. The first is based on a construction of a k-fault-tolerant balancer with 2(k + 1) bits of memory. All balancers in a counting network are replaced by fault-tolerant ones. Thus, a counting network with depth O(log^2 n), where n is the width, is transformed into a k-fault-tolerant counting network with depth O(k log^2 n).
We also consider the case where inaccessible balancers can be remapped to spare balancers. We present a bound on the error in the output token distribution of counting networks with remapped faulty balancers (a generalization of the error bound for sorting networks with faulty comparators presented by Yao & Yao [10]).
Our second method for tolerating faults is based on the construction of a correction network. Given a token distribution with a bounded error, the correction network produces a token distribution that is smooth (i.e., the number of tokens on each output wire differs by at most one — a weaker condition than the step property of counting networks). The correction network is constructed with fault-tolerant balancers. It is appended to a counting network in which faulty balancers are remapped to spare balancers. In order to tolerate k faults, the correction network has depth 2k(k + 1)(log n + 1), for a network of width n. Therefore, this method results in a network with a smaller depth provided that k is asymptotically smaller than log n. However, it is only applicable if it is possible to remap faulty balancers.https://authors.library.caltech.edu/records/gd1gn-tsn15On the Possibility of Group Membership Protocols
https://resolver.caltech.edu/CaltechAUTHORS:20200127-124216616
Authors: Franceschetti, Massimo; Bruck, Jehoshua
Year: 2000
DOI: 10.1007/978-1-4615-4549-1_4
Chandra et al. [5] showed that the group membership problem cannot be solved in asynchronous systems with crash failures. We identify the main assumptions required for their proof and show how to circumvent this impossibility result by building a weaker, yet non-trivial, specification. We provide an algorithm that solves this specification and show that our solution is an improvement upon previous attempts to solve this problem using a weaker specification.https://authors.library.caltech.edu/records/cmjqd-6vz06Computing in the RAIN: A Reliable Array of Independent Nodes
https://resolver.caltech.edu/CaltechAUTHORS:20190828-102317828
Authors: Bohossian, Vasken; Fan, Charles C.; LeMahieu, Paul S.; Riedel, Marc D.; Xu, Lihao; Bruck, Jehoshua
Year: 2000
DOI: 10.1007/3-540-45591-4_167
The RAIN project is a research collaboration between Caltech and NASA-JPL on distributed computing and data storage systems for future spaceborne missions. The goal of the project is to identify and develop key building blocks for reliable distributed systems built with inexpensive off-the-shelf components. The RAIN platform consists of a heterogeneous cluster of computing and/or storage nodes connected via multiple interfaces to networks configured in fault-tolerant topologies. The RAIN software components run in conjunction with operating system services and standard network protocols. Through software-implemented fault tolerance, the system tolerates multiple node, link, and switch failures, with no single point of failure. The RAIN technology has been transferred to Rainfinity, a start-up company focusing on creating clustered solutions for improving the performance and availability of Internet data centers. In this paper we describe the following contributions: 1) fault-tolerant interconnect topologies and communication protocols providing consistent error reporting of link failures; 2) fault management techniques based on group membership; and 3) data storage schemes based on computationally efficient error-control codes. We present several proof-of-concept applications: highly available video and web servers, and a distributed checkpointing system.https://authors.library.caltech.edu/records/20fm6-qhq29Splitting the Scheduling Headache
https://resolver.caltech.edu/CaltechAUTHORS:20111117-110955967
Authors: Foltz, Kevin; Bruck, Jehoshua
Year: 2000
DOI: 10.1109/ISIT.2000.866787
The broadcast disk provides an effective way to transmit information from a server to many clients. Information is broadcast cyclically and clients pick the information they need out of the broadcast. An example of such a system is a wireless Web service where Web servers broadcast to browsing clients. Work has been done to schedule the information broadcast so as to minimize the expected waiting time of the clients. This work has treated the information as indivisible blocks that are transmitted in their entirety. We propose a new way to schedule the broadcast of information, which involves splitting items into smaller sub-items, which need not be broadcast consecutively. This relaxes the restrictions on scheduling and allows for better schedules. We look at the case of two items of the same length, each split into two halves, and show that we can achieve optimal performance by choosing the appropriate schedule from a small set of schedules.https://authors.library.caltech.edu/records/2bykj-fba55The Raincore Distributed Session Service for Networking Elements
https://resolver.caltech.edu/CaltechAUTHORS:20111110-152519883
Authors: Fan, Chenggong Charles; Bruck, Jehoshua
Year: 2001
DOI: 10.1109/IPDPS.2001.925154
Motivated by the explosive growth of the Internet, we study efficient and fault-tolerant distributed session layer protocols for networking elements. These protocols are designed to enable a network cluster to share the state information necessary for balancing network traffic and computation load among a group of networking elements. In addition, in the presence of failures, they allow network traffic to fail over from failed networking elements to healthy ones. To maximize the overall network throughput of the networking cluster, we assume a unicast communication medium for these protocols. The Raincore Distributed Session Service is based on a fault-tolerant token protocol, and provides group membership, reliable multicast and mutual exclusion services in a networking environment. We show that this service provides atomic reliable multicast with consistent ordering. We also show that the Raincore token protocol consumes less overhead than a broadcast-based protocol in this environment in terms of CPU task-switching. The Raincore technology was transferred to Rainfinity, a startup company that is focusing on software for Internet reliability and performance. Rainwall, Rainfinity's first product, was developed using the Raincore Distributed Session Service. We present initial performance results of the Rainwall product that validate our design assumptions and goals.https://authors.library.caltech.edu/records/2hcmq-02r44Time Division is Better Than Frequency Division for Periodic Internet Broadcast of Dynamic Data
https://resolver.caltech.edu/CaltechAUTHORS:20111117-083940997
Authors: Foltz, Kevin; Bruck, Jehoshua
Year: 2001
DOI: 10.1109/ISIT.2001.936021
We consider two ways to send items over a broadcast channel and compare them using the metric of expected waiting time. The first is frequency division, where each item is broadcast on its own subchannel of lower bandwidth. We find the optimal allocation of bandwidth to the subchannels for this method. Then we look at time division, where items are sent sequentially on a single full-bandwidth channel. We show that for any frequency division broadcast schedule, we can find a better time division schedule. Thus time division is better than frequency division.https://authors.library.caltech.edu/records/gx3g7-23g16Robustness of Time-Division Schedules for Internet Broadcast
https://resolver.caltech.edu/CaltechAUTHORS:20111102-132442999
Authors: Foltz, Kevin; Bruck, Jehoshua
Year: 2002
DOI: 10.1109/ISIT.2002.1023655
The model we consider consists of a server and many clients. The clients have a large incoming bandwidth and little or no outgoing bandwidth. The server repeatedly broadcasts information through the air to the clients. There are two information items with lengths l_1 and l_2, and demand probabilities p_1 and p_2. The demand probability of an item is simply the relative frequency of requests for that item by the clients, scaled such that the sum of the p_i's is 1. These items contain static data. This allows us to receive data out of order and use parts of different broadcasts to reassemble items. The metric we use to evaluate broadcast schedules is expected waiting time. This is the expected time a client must wait for an item, averaged over all items and clients, with weight p_i for item i.https://authors.library.caltech.edu/records/yj1sf-v7v74Power requirements for connectivity in clustered wireless networks
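The expected-waiting-time metric used throughout these broadcast-scheduling records can be made concrete with a small sketch. Under the simplifying assumption that a client must receive an item's broadcast from its start (no out-of-order reassembly, unlike the static-data model above), the expected wait for a cyclic schedule has a closed form based on the gaps between consecutive broadcasts of each item.

```python
def expected_wait(schedule, demand):
    """Expected waiting time for a cyclic broadcast schedule.
    schedule: list of (item, length) slots broadcast in order.
    demand:   dict item -> request probability (sums to 1).
    A client arriving at a uniformly random time waits until the
    end of the next complete transmission of its item."""
    T = sum(length for _, length in schedule)
    starts, t = {}, 0.0
    for item, length in schedule:
        starts.setdefault(item, []).append(t)
        t += length
    lengths = {item: length for item, length in schedule}
    ew = 0.0
    for item, p in demand.items():
        s = sorted(starts[item])
        gaps = [s[i + 1] - s[i] for i in range(len(s) - 1)]
        gaps.append(T - s[-1] + s[0])  # wraparound gap
        # E[time to next start] = sum(g^2) / (2T) for uniform arrival.
        mean_to_next_start = sum(g * g for g in gaps) / (2 * T)
        ew += p * (mean_to_next_start + lengths[item])
    return ew
```

For a single unit-length item broadcast back-to-back, this gives 0.5 (average time to the next start) plus 1 (transmission), i.e. 1.5.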
https://resolver.caltech.edu/CaltechAUTHORS:20111102-090203478
Authors: Booth, L.; Bruck, J.; Franceschetti, M.; Meester, R.
Year: 2002
DOI: 10.1109/ISIT.2002.1023625
We consider wireless networks in which a subset of the nodes provide coverage to clusters of clients and route data packets from source to destination. We generalize previous work of Gilbert (1961), deriving conditions on the communication range of the nodes and on the placement of the covering stations to provide, with probability one, some long distance multi-hop communication. One key result is that the network can almost surely (a.s.) provide some long distance multi-hop communication, regardless of the algorithm used to place the covering stations, if the density of the clients is high enough and their communication range is less than half the communication range of the base stations. When the ratio between the two communication ranges becomes greater than half, a malicious covering algorithm that never provides long distance, multi-hop communication in the network exists, even if we constrain the base stations to be placed at the vertices of a fixed grid, which is the typical scenario in the case of commercial networks.https://authors.library.caltech.edu/records/xhx3n-c9340Interval modulation coding
https://resolver.caltech.edu/CaltechAUTHORS:20111027-155844420
Authors: Mukhtar, Saleem; Bruck, Jehoshua
Year: 2002
DOI: 10.1109/ISIT.2002.1023599
We propose a new modulation scheme and a new architecture for the design of communication and storage systems. The modulation scheme is based on modulating pulse width and the architecture is based on time measurement circuitry.https://authors.library.caltech.edu/records/y20w9-mmk20Diversity Coloring for information storage in networks
https://resolver.caltech.edu/CaltechAUTHORS:20111019-133944822
Authors: Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2002
DOI: 10.1109/ISIT.2002.1023653
We propose a new file placement scheme using MDS codes, and formulate it as the diversity coloring problem. We then present an optimal diversity coloring algorithm for trees.https://authors.library.caltech.edu/records/kbvpn-at979The synthesis of cyclic combinational circuits
https://resolver.caltech.edu/CaltechAUTHORS:20111012-143707754
Authors: Riedel, Marc D.; Bruck, Jehoshua
Year: 2003
Digital circuits are called combinational if they are memoryless: they have outputs that depend only on the current values of the inputs. Combinational circuits are generally thought of as acyclic (i.e., feed-forward) structures. And yet, cyclic circuits can be combinational. Cycles sometimes occur in designs synthesized from high-level descriptions. Feedback in such cases is carefully contrived, typically occurring when functional units are connected in a cyclic topology. Although the premise of cycles in combinational circuits has been accepted, and analysis techniques have been proposed, no one has attempted the synthesis of circuits with feedback at the logic level.
We propose a general methodology for the synthesis of multilevel combinational circuits with cyclic topologies. Our approach is to introduce feedback in the substitution/minimization phase, optimizing a multilevel network description for area. In trials with benchmark circuits, many were optimized significantly, with improvements of up to 30% in area.
We argue the case for radically rethinking the concept of "combinational" in circuit design: we should no longer think of combinational logic as acyclic in theory or in practice, since nearly all combinational circuits are best designed with cycles.https://authors.library.caltech.edu/records/0pjae-t9h73Optimal Content Placement for En-Route Web Caching
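A cyclic-yet-combinational circuit can be demonstrated with a textbook-style example in the spirit of the abstract above (this specific two-multiplexer circuit is an illustration of mine, not taken from the paper): y1 and y2 feed each other, yet every input assignment yields unique, stable outputs.

```python
def eval_cyclic(x, a, b):
    # Cyclic circuit: y1 = (a if x else y2); y2 = (y1 if x else b).
    # Despite the feedback loop y1 -> y2 -> y1, the outputs are
    # well-defined for every input, so the circuit is combinational.
    # We find the fixed point by iteration from "unknown" (None).
    y1 = y2 = None
    for _ in range(4):
        y1 = a if x else y2
        y2 = y1 if x else b
    return y1, y2
```

When x = 1 the cycle resolves through a (both outputs equal a); when x = 0 it resolves through b.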
https://resolver.caltech.edu/CaltechPARADISE:ETR050
Authors: Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2003
DOI: 10.1109/NCA.2003.1201132
This paper studies the optimal placement of web files for en-route web caching. It is shown that existing placement policies are all solving restricted partial problems of the file placement problem, and therefore give only sub-optimal solutions. A dynamic programming algorithm of low complexity which computes the optimal solution is presented. It is shown both analytically and experimentally that the file-placement solution output by our algorithm outperforms existing en-route caching policies. The optimal placement of web files can be implemented with a reasonable level of cache coordination and management overhead for en-route caching; and importantly, it can be achieved with or without using data prefetching.https://authors.library.caltech.edu/records/t4bc1-d9a83Coding and scheduling for efficient loss-resilient data broadcasting
https://resolver.caltech.edu/CaltechAUTHORS:20111005-113409987
Authors: Foltz, Kevin; Xu, Lihao; Bruck, Jehoshua
Year: 2003
DOI: 10.1109/ISIT.2003.1228430
We examine the problem of sending data to clients over a broadcast channel in a way that minimizes the clients' expected waiting time for this data. This channel, however, is not completely reliable, and packets are occasionally lost. If items consist of k packets, k large, the loss of even a single packet can increase the expected waiting time by 167%. We propose and analyze two solutions that use coding to reduce this degradation. The resulting degradation is 67% for the first solution and less than 1% for the second. The second solution is extended to combat up to t packet losses per data item for any t≪k. This solution maintains near-optimal performance even with packet losses.https://authors.library.caltech.edu/records/x2fmn-vkh31Ad hoc wireless networks with noisy links
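The idea of protecting a k-packet item against packet loss can be sketched with the simplest erasure code: a single XOR parity packet, i.e. the t = 1 case (the paper's general scheme tolerates up to t losses per item using stronger codes; this sketch is only an illustration of the principle).

```python
def add_parity(packets):
    # Append one XOR parity packet so that any single lost packet
    # can be repaired (all packets assumed equal length).
    parity = bytes(len(packets[0]))
    for p in packets:
        parity = bytes(x ^ y for x, y in zip(parity, p))
    return packets + [parity]

def recover(received):
    # XOR of the k surviving packets (data + parity) equals the
    # single missing data packet, since all k+1 XOR to zero.
    out = bytes(len(received[0]))
    for p in received:
        out = bytes(x ^ y for x, y in zip(out, p))
    return out
```

For example, encoding three packets and losing the second one, the XOR of the remaining three reproduces it exactly.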
https://resolver.caltech.edu/CaltechAUTHORS:20111005-091318608
Authors: Booth, Lorna; Bruck, Jehoshua; Cook, Matthew; Franceschetti, Massimo
Year: 2003
DOI: 10.1109/ISIT.2003.1228402
Models of ad-hoc wireless networks are often based on the geometric disc abstraction: transmission is assumed to be isotropic, and reliable communication channels are assumed to exist (apart from interference) between nodes closer than a given distance. In reality, communication channels are unreliable and communication range is generally not rotationally symmetric. In this paper we examine how these issues affect network connectivity.https://authors.library.caltech.edu/records/f2npx-a7660Bridging Paradigm Gaps Between Biology and Engineering
https://resolver.caltech.edu/CaltechAUTHORS:20111025-143359852
Authors: Bruck, Jehoshua
Year: 2003
DOI: 10.1109/CSB.2003.1227290
Computing and communications are well understood topics in engineering. However, we are very much at the beginning of the road to understanding those mechanisms in biological systems. I'll argue that progress in biology will require better understanding of biologically inspired paradigms for computing and communications. In particular, I'll discuss some initial results related to asynchronous circuits with feedback and to delay insensitive communications.https://authors.library.caltech.edu/records/7trtc-y3788Scheduling for Efficient Data Broadcast over Two Channels
https://resolver.caltech.edu/CaltechAUTHORS:20110921-122835869
Authors: Foltz, Kevin; Xu, Lihao; Bruck, Jehoshua
Year: 2004
DOI: 10.1109/ISIT.2004.1365147
As wireless computer networks grow more popular, we are faced with the problem of providing scalable, high-bandwidth service to a growing number of users. In the wireless domain, "data push" promises to provide superior performance for many applications [1]. The broadcast domain that is typical of wireless communication is very effective in distributing information to large audiences. Work has been done to schedule data broadcast from a server to many clients using the broadcast disk model [3]. However, little of it has looked at methods for more than one channel. We examine a simple two-channel broadcast model and present some interesting scheduling results for this model.https://authors.library.caltech.edu/records/61fev-rtp87Optimal t-Interleaving on Tori
https://resolver.caltech.edu/CaltechAUTHORS:20110818-083929592
Authors: Jiang, Anxiao (Andrew); Cook, Matthew; Bruck, Jehoshua
Year: 2004
DOI: 10.1109/ISIT.2004.1365060
The number of integers needed to t-interleave a 2-dimensional torus has a sphere-packing lower bound. We present the necessary and sufficient conditions for tori to meet that lower bound. We prove that for tori sufficiently large in both dimensions, their t-interleaving numbers exceed the lower bound by at most 1. We then show upper bounds on t-interleaving numbers for other cases, completing a general picture for the problem of t-interleaving on 2-dimensional tori. Efficient t-interleaving algorithms are also presented.https://authors.library.caltech.edu/records/h1xvx-0qj37Miscorrection probability beyond the minimum distance
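As a concrete reading of the definition behind the record above (my paraphrase, not code from the paper): a labeling of an m x n torus is a t-interleaving if any two cells carrying the same integer lie at Lee (wraparound L1) distance at least t. Below is a checker, plus a checkerboard labeling that 2-interleaves a 4 x 4 torus with two integers.

```python
from collections import defaultdict

def is_t_interleaving(grid, t):
    # grid: 2D list of integer labels on an m x n torus.
    # Valid iff any two cells with equal labels are at Lee
    # (wraparound L1) distance >= t.
    m, n = len(grid), len(grid[0])
    def lee(p, q):
        di, dj = abs(p[0] - q[0]), abs(p[1] - q[1])
        return min(di, m - di) + min(dj, n - dj)
    positions = defaultdict(list)
    for i in range(m):
        for j in range(n):
            positions[grid[i][j]].append((i, j))
    return all(lee(p, q) >= t
               for pts in positions.values()
               for a, p in enumerate(pts)
               for q in pts[a + 1:])

# Checkerboard: a valid 2-interleaving of the 4 x 4 torus
# using only two integers.
checker = [[(i + j) % 2 for j in range(4)] for i in range(4)]
```

Equal labels in the checkerboard share parity, so their Lee distance is even and hence at least 2; the same labeling fails for t = 3.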
https://resolver.caltech.edu/CaltechAUTHORS:CASisit04
Authors: Cassuto, Yuval; Bruck, Jehoshua
Year: 2004
DOI: 10.1109/ISIT.2004.1365561
The miscorrection probability of a list decoder is the probability that the decoder will have at least one non-causal codeword in its decoding sphere. Evaluating this probability is important when using a list decoder as a conventional decoder, since in that case we require the list to contain at most one codeword for most of the errors. A lower bound on the miscorrection probability is the main result. The key ingredient in the proof is a new combinatorial upper bound on the list size for a general q-ary block code. This bound is tighter than the best known on large alphabets, and it is shown to be very close to the algebraic bound for Reed-Solomon codes. Finally we discuss two known upper bounds on the miscorrection probability and unify them for linear MDS codes.https://authors.library.caltech.edu/records/rjbbz-vxg40The encoding complexity of network coding
https://resolver.caltech.edu/CaltechAUTHORS:LANisit05b
Authors: Langberg, Michael; Sprintson, Alexander; Bruck, Jehoshua
Year: 2005
In the multicast network coding problem, a source s needs to deliver h packets to a set of k terminals over an underlying network G. The nodes of the coding network can be broadly categorized into two groups. The first group includes encoding nodes, i.e., nodes that generate new packets by combining data received from two or more incoming links. The second group includes forwarding nodes that can only duplicate and forward the incoming packets. Encoding nodes are, in general, more expensive due to the need to equip them with encoding capabilities. In addition, encoding nodes incur delay and increase the overall complexity of the network. Accordingly, in this paper we study the design of multicast coding networks with a limited number of encoding nodes. We prove that in an acyclic coding network, the number of encoding nodes required to achieve the capacity of the network is bounded by h^3k^2. Namely, we present (efficiently constructible) network codes that achieve capacity in which the total number of encoding nodes is independent of the size of the network and is bounded by h^3k^2. We show that the number of encoding nodes may depend both on h and k as we present acyclic instances of the multicast network coding problem in which Ω(h^2k) encoding nodes are needed. In the general case of coding networks with cycles, we show that the number of encoding nodes is limited by the size of the feedback link set, i.e., the minimum number of links that must be removed from the network in order to eliminate cycles. Specifically, we prove that the number of encoding nodes is bounded by (2B+1)h^3k^2, where B is the minimum size of the feedback link set. Finally, we observe that determining or even crudely approximating the minimum number of encoding nodes needed to achieve the capacity for a given instance of the network coding problem is NP-hard.https://authors.library.caltech.edu/records/qa637-30s54Staleness vs. waiting time in universal discrete broadcast
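The distinction between encoding and forwarding nodes can be seen in the classic butterfly network (a standard textbook example, not specific to this paper): the single bottleneck node XORs the two source packets, and each sink recovers both packets from one plain copy and the mix.

```python
def xor_bytes(p, q):
    # The only "encoding" operation in the butterfly example:
    # bitwise XOR of two equal-length packets (combining over GF(2)).
    return bytes(x ^ y for x, y in zip(p, q))

# Source packets; the bottleneck encoding node sends a XOR b,
# while all other nodes merely duplicate and forward.
a, b = b"hello", b"world"
mixed = xor_bytes(a, b)

# Sink 1 receives a and mixed; sink 2 receives b and mixed.
recovered_b = xor_bytes(a, mixed)
recovered_a = xor_bytes(b, mixed)
```

With only forwarding, one of the two sinks would have to go without a packet; the single XOR doubles the multicast throughput.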
https://resolver.caltech.edu/CaltechAUTHORS:LANisit05a
Authors: Langberg, Michael; Sprintson, Alexander; Bruck, Jehoshua
Year: 2005
In this paper we study the distribution of dynamic data over a broadcast channel to a large number of passive clients. The data is simultaneously distributed to clients in the form of discrete packets, each packet captures the most recent state of the information source. Clients obtain the information by accessing the channel and listening for the next available packet. This scenario, referred to as discrete broadcast, has many practical applications such as the distribution of stock information to wireless mobile devices and downloading up-to-date battle information in military networks.
Our goal is to minimize the amount of time a client has to wait in order to obtain a new data packet, i.e., the waiting time of the client. We show that we can significantly reduce the waiting time by adding redundancy to the schedule. We identify universal schedules that guarantee low waiting time for any client, regardless of the access pattern.
A key point in the design of data distribution systems is to ensure that the transmitted information is always up-to-date. Accordingly, we introduce the notion of staleness that captures the amount of time that passes from the moment the information is generated, until it is delivered to the client. We investigate the fundamental trade-off between the staleness and the waiting time. In particular, we present schedules that yield lowest possible waiting time for any given staleness constraint.https://authors.library.caltech.edu/records/km3eg-16a62Optimal universal schedules for discrete broadcast
https://resolver.caltech.edu/CaltechAUTHORS:LANisit04
Authors: Langberg, Michael; Sprintson, Alexander; Bruck, Jehoshua
Year: 2005
DOI: 10.1109/ISIT.2004.1365148
This paper investigates efficient scheduling for sending dynamic data over lossless broadcast channels. A server transmits dynamic data periodically to a number of passive clients, with each update sent as a separate discrete packet. The objective is to design universal schedules that minimize the time that passes between a client's request and the broadcast of a new item, independently of the client's behavior. The results yield optimal high-rate schedules for discrete broadcast data, obtained by considering adaptive clients.https://authors.library.caltech.edu/records/9x5h3-q5j37Monotone Percolation and The Topology Control of Wireless Networks
https://resolver.caltech.edu/CaltechAUTHORS:20110818-114857462
Authors: Jiang, Anxiao; Bruck, Jehoshua
Year: 2005
DOI: 10.1109/INFCOM.2005.1497903
This paper addresses the topology control problem for large wireless networks that are modelled by an infinite point process on a two-dimensional plane. Topology control is the process of determining the edges in the network by adjusting the transmission radii of the nodes. Topology control algorithms should be based on local decisions, be adaptive to changes, guarantee full connectivity and support efficient routing. We present a family of topology control algorithms that, respectively, achieve some or all of these requirements efficiently. The key idea in our algorithms is a concept that we call monotone percolation. In classical percolation theory, we are interested in the emergence of an infinitely large connected component. In contrast, in monotone percolation we are interested in the existence of a relatively short path that makes monotonic progress between any pair of source and destination nodes. Our key contribution is that we demonstrate how local decisions on the transmission radii can lead to monotone percolation and in turn to efficient topology control algorithms.https://authors.library.caltech.edu/records/3zha7-xnt45Localization and routing in sensor networks by local angle information
https://resolver.caltech.edu/CaltechAUTHORS:20160811-163730860
Authors: Bruck, Jehoshua; Gao, Jie; Jiang, Anxiao (Andrew)
Year: 2005
DOI: 10.1145/1062689.1062713
Location information is very useful in the design of sensor network infrastructures. In this paper, we study the anchor-free 2D localization problem by using local angle measurements in a sensor network. We prove that given a unit disk graph and the angles between adjacent edges, it is NP-hard to find a valid embedding in the plane such that neighboring nodes are within distance 1 from each other and non-neighboring nodes are at least distance 1 away. Despite the negative results, however, one can find a planar spanner of a unit disk graph by using only local angles. The planar spanner can be used to generate a set of virtual coordinates that enable efficient and local routing schemes such as geographical routing or approximate shortest path routing. We also propose a practical anchor-free embedding scheme by solving a linear program. We show by simulation that not only does it give very good local embedding, i.e., neighboring nodes are close and non-neighboring nodes are far away, but it also gives a quite accurate global view such that geographical routing and approximate shortest path routing on the embedded graph are almost identical to those on the original (true) embedding. The embedding algorithm can be adapted to other models of wireless sensor networks and is robust to measurement noise.https://authors.library.caltech.edu/records/gbnzz-4by52Network coding for non-uniform demands
https://resolver.caltech.edu/CaltechAUTHORS:CASisit05
Authors: Cassuto, Yuval; Bruck, Jehoshua
Year: 2005
DOI: 10.1109/ISIT.2005.1523639
Non-uniform demand networks are defined as a useful connection model, in between multicasts and general connections. In these networks, each sink demands a certain number of messages, without specifying their identities. We study the solvability of such networks and give a tight bound on the number of sinks for which the min cut condition is sufficient. This sufficiency result is unique to the non-uniform demand model and does not apply to general connection networks. We propose constructions to solve networks at, or slightly below, capacity, and investigate the effect large alphabets have on the solvability of such networks. We also show that our efficient constructions are suboptimal when used in networks with more sinks, yet this comes with little surprise considering the fact that the general problem is shown to be NP-hard.https://authors.library.caltech.edu/records/26cx3-qxz51MAP: medial axis based geometric routing in sensor networks
https://resolver.caltech.edu/CaltechAUTHORS:20160811-164254714
Authors: Bruck, Jehoshua; Gao, Jie; Jiang, Anxiao (Andrew)
Year: 2005
DOI: 10.1145/1080829.1080839
One of the challenging tasks in the deployment of dense wireless networks (like sensor networks) is devising a routing scheme for node-to-node communication. Important considerations include scalability, routing complexity, the length of the communication paths, and the load sharing of the routes. In this paper, we show that a compact and expressive abstraction of network connectivity by the medial axis enables efficient and localized routing. We propose MAP, a Medial Axis based naming and routing Protocol that does not require locations, makes routing decisions locally, and achieves good load balancing. In its preprocessing phase, MAP constructs the medial axis of the sensor field, defined as the set of nodes with at least two closest boundary nodes. The medial axis of the network captures both the complex geometry and non-trivial topology of the sensor field. It can be represented compactly by a graph whose size is comparable with the complexity of the geometric features (e.g., the number of holes). Each node is then given a name related to its position with respect to the medial axis. The routing scheme is derived through local decisions based on the names of the source and destination nodes and guarantees delivery with reasonable and natural routes. We show by both theoretical analysis and simulations that our medial axis based geometric routing scheme is scalable, produces short routes, achieves excellent load balancing, and is very robust to variations in the network model.https://authors.library.caltech.edu/records/2m2sg-avn58Network Coding: A Computational Perspective
https://resolver.caltech.edu/CaltechAUTHORS:20110630-145653337
Authors: Langberg, Michael; Sprintson, Alexander; Bruck, Jehoshua
Year: 2006
DOI: 10.1109/CISS.2006.286590
In this work, we study the computational perspective of network coding, focusing on two issues. First, we address the computational complexity of finding a network code for acyclic multicast networks. Second, we address the issue of reducing the amount of computation performed by the network nodes. In particular, we consider the problem of finding a network code with the minimum possible number of encoding nodes, i.e., nodes that generate new packets by combining the packets received over incoming links. We present a deterministic algorithm that finds a feasible network code for a multicast network over an underlying graph G(V, E) in time O(|E|kh+|V|k^2h^2+h^4k^3(k+h)), where k is the number of destinations and h is the number of packets. This improves the best known running time of O(|E|kh+|V|k^2h^2(k+h)) of Jaggi et al. (2005) in the typical case of large communication graphs. In addition, our algorithm guarantees that the number of encoding nodes in the obtained network code is bounded by O(h^3k^2). Next, we address the problem of finding a network code with the minimum number of encoding nodes in both integer and fractional coding networks. We prove that in the majority of settings this problem is NP-hard. However, we show that if h=O(1) and k=O(1) and the underlying communication graph is acyclic, then there exists an algorithm that solves this problem in polynomial time.https://authors.library.caltech.edu/records/4nwht-fpn50Shortening Array Codes and the Perfect 1-Factorization Conjecture
https://resolver.caltech.edu/CaltechAUTHORS:20170516-150511939
Authors: Bohossian, Vasken; Bruck, Jehoshua
Year: 2006
DOI: 10.1109/ISIT.2006.261572
The existence of a perfect 1-factorization of the complete graph K_n, for arbitrary n, is a 40-year-old open problem in graph theory. Two infinite families of perfect 1-factorizations are known for K_(2p) and K_(p+1), where p is a prime. It was shown in L. Xu et al. (1999) that finding a perfect 1-factorization of K_n can be reduced to a problem in coding, i.e. to constructing an MDS, lowest density array code of length n. In this paper, a new method for shortening arbitrary array codes is introduced. It is then used to derive the K_(p+1) family of perfect 1-factorizations from the K_(2p) family, by applying the reduction mentioned above. Namely, techniques from coding theory are used to prove a new result in graph theory.https://authors.library.caltech.edu/records/xqx6j-7xc33On the Capacity of Precision-Resolution Constrained Systems
https://resolver.caltech.edu/CaltechAUTHORS:20170509-172831834
Authors: Schwartz, Moshe; Bruck, Jehoshua
Year: 2006
DOI: 10.1109/ISIT.2006.262110
Arguably, the most famous constrained system is the (d, k)-RLL (run-length limited), in which a stream of bits obeys the constraint that every two 1's are separated by at least d 0's, and there are no more than k consecutive 0's anywhere in the stream. The motivation for this scheme comes from the fact that certain sensor characteristics restrict the minimum time between adjacent 1's or else the two will be merged in the receiver, while a clock drift between transmitter and receiver may cause spurious 0's or missing 0's at the receiver if too many appear consecutively.
The interval-modulation scheme introduced by Mukhtar and Bruck extends the RLL constraint and implicitly suggests a way of taking advantage of higher-precision clocks. Their work, however, deals only with an encoder/decoder construction.
In this work we introduce a more general framework which we call the precision-resolution (PR) constrained system. In PR systems, the encoder has precision constraints, while the decoder has resolution constraints. We examine the capacity of PR systems and show the gain in the presence of a high-precision encoder (thus, we place the PR system with integral encoder, (p=1, ɑ, θ)-PR, which turns out to be a simple extension of RLL, and the PR system with infinite-precision encoder, (∞, ɑ, θ)-PR, on two ends of a continuum). We derive an exact expression for their capacity in terms of the precision p, the minimal resolvable measurement at the decoder ɑ, and the decoder resolution factor θ. In analogy to the RLL terminology, these are the clock precision, the minimal time between peaks, and the clock drift. Surprisingly, even with an infinite-precision encoder, the capacity is finite.https://authors.library.caltech.edu/records/v2vf1-7kn44Cyclic Low-Density MDS Array Codes
https://resolver.caltech.edu/CaltechAUTHORS:20170516-163950291
Authors: Cassuto, Yuval; Bruck, Jehoshua
Year: 2006
DOI: 10.1109/ISIT.2006.261571
We construct two infinite families of low-density MDS array codes which are also cyclic. One of these families includes the first such sub-family with redundancy parameter r > 2. The two constructions have different algebraic formulations, though they both have the same indirect structure: first, MDS codes that are not cyclic are constructed; then, by applying a certain mapping to their parity-check matrices, non-equivalent cyclic codes with the same distance and density properties are obtained. Using the same proof techniques, a third infinite family of quasi-cyclic codes can be constructed.https://authors.library.caltech.edu/records/d2y00-smf23Anti-Jamming Schedules for Wireless Data Broadcast Systems
https://resolver.caltech.edu/CaltechAUTHORS:20170510-171516844
Authors: Codenotti, Paolo; Sprintson, Alexander; Bruck, Jehoshua
Year: 2006
DOI: 10.1109/ISIT.2006.261756
Modern society is heavily dependent on wireless networks for providing voice and data communications. Wireless data broadcast has recently emerged as an attractive way to disseminate dynamic data to a large number of clients. In data broadcast systems, the server proactively transmits the information on a downlink channel; the clients access the data by listening to the channel. Wireless data broadcast systems can serve a large number of heterogeneous clients, minimizing power consumption as well as protecting the privacy of the clients' locations. The availability and relatively low cost of antennas has resulted in a number of potential threats to the integrity of the wireless infrastructure. In particular, data broadcast systems are vulnerable to jamming, i.e., the use of active signals to prevent data broadcast. The goal of jammers is to cause disruption, resulting in long waiting times and excessive power consumption. In this paper we investigate efficient schedules for wireless data broadcast that perform well in the presence of a jammer. We show that the waiting time of a client can be reduced by adding redundancy to the schedule, and establish upper and lower bounds on the achievable minimum waiting time under different requirements on the staleness of the transmitted data.https://authors.library.caltech.edu/records/ggkc8-7gh46Synthesizing stochasticity in biochemical systems
https://resolver.caltech.edu/CaltechAUTHORS:20161019-153750816
Authors: Fett, Brian; Bruck, Jehoshua; Riedel, Marc D.
Year: 2007
DOI: 10.1145/1278480.1278643
Randomness is inherent to biochemistry: at each instant, the sequence of reactions that fires is a matter of chance. Some biological systems exploit such randomness, choosing between different outcomes stochastically - in effect, hedging their bets with a portfolio of responses for different environmental conditions. In this paper, we discuss techniques for synthesizing such stochastic behavior in engineered biochemical systems. We propose a general method for designing a set of biochemical reactions that produces different combinations of molecular types according to a specified probability distribution. The response is precise and robust to perturbations. Furthermore, it is programmable: the probability distribution is a function of the quantities of input types. The method is modular and extensible. We discuss strategies for implementing various functional dependencies: linear, logarithmic, exponential, etc. This work has potential applications in domains such as biochemical sensing, drug production, and disease treatment. Moreover, it provides a framework for analyzing and characterizing the stochastic dynamics in natural biochemical systems such as the lysis/lysogeny switch of the lambda bacteriophage.https://authors.library.caltech.edu/records/7g96g-byv50Floating Codes for Joint Information Storage in Write Asymmetric Memories
https://resolver.caltech.edu/CaltechAUTHORS:20170419-152416338
Authors: Jiang, Anxiao (Andrew); Bohossian, Vasken; Bruck, Jehoshua
Year: 2007
DOI: 10.1109/ISIT.2007.4557381
Memories whose storage cells transit irreversibly between states have been common since the start of data storage technology. In recent years, flash memories and other non-volatile memories based on floating-gate cells have become a very important family of such memories. We model them by the write asymmetric memory (WAM), a memory where each cell is in one of q states, 0, 1, ..., q − 1, and can only transit from a lower state to a higher state. Data stored in a WAM can be rewritten by shifting the cells to higher states. Since the state transition is irreversible, the number of rewrites is limited. When multiple variables are stored in a WAM, we study codes, which we call floating codes, that maximize the total number of times the variables can be written and rewritten. In this paper, we present several families of floating codes that either are optimal, or approach optimality as the codes get longer. We also present bounds on the performance of general floating codes. The results show that floating codes can integrate the rewriting capabilities of different variables to a surprisingly high degree.https://authors.library.caltech.edu/records/192v0-rz826Constrained Codes as Networks of Relations
https://resolver.caltech.edu/CaltechAUTHORS:20170424-171247108
Authors: Schwartz, Moshe; Bruck, Jehoshua
Year: 2007
DOI: 10.1109/ISIT.2007.4557416
We revisit the well-known problem of determining the capacity of constrained systems. While the one-dimensional case is well understood, the capacity of two-dimensional systems is mostly unknown. When it is non-zero, except for the (1,∞)-RLL system on the hexagonal lattice, there are no closed-form analytical solutions known. Furthermore, for the related problem of counting the exact number of constrained arrays of any given size, only exponential-time algorithms are known.
We present a novel approach to finding the exact capacity of two-dimensional constrained systems, as well as efficiently counting the exact number of constrained arrays of any given size. To that end, we borrow graph-theoretic tools originally developed for the field of statistical mechanics, tools for efficiently simulating quantum circuits, as well as tools from the theory of the spectral distribution of Toeplitz matrices.https://authors.library.caltech.edu/records/jfxhx-hrc09Codes for Multi-Level Flash Memories: Correcting Asymmetric Limited-Magnitude Errors
https://resolver.caltech.edu/CaltechAUTHORS:20170426-165849521
Authors: Cassuto, Yuval; Schwartz, Moshe; Bohossian, Vasken; Bruck, Jehoshua
Year: 2007
DOI: 10.1109/ISIT.2007.4557123
Several physical effects that limit the reliability and performance of Multilevel Flash memories induce errors that have low magnitude and are dominantly asymmetric. This paper studies block codes for asymmetric limited-magnitude errors over q-ary channels. We propose code constructions for such channels when the number of errors is bounded by t. The construction uses known codes for symmetric errors over small alphabets to protect large-alphabet symbols from asymmetric limited-magnitude errors. The encoding and decoding of these codes are performed over the small alphabet whose size depends only on the maximum error magnitude and is independent of the alphabet size of the outer code. An extension of the construction is proposed to include systematic codes as a benefit to practical implementation.https://authors.library.caltech.edu/records/r5wsg-xt282Buffer Coding for Asymmetric Multi-Level Memory
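The key reduction in this construction can be sketched in a few lines (our illustration, not the paper's notation): if every error is an upward shift of magnitude at most l, then the residue of the written value modulo l+1 pins down the error exactly. Assuming the inner symmetric code over the small alphabet has already recovered that residue r, one cell is corrected as follows:

```python
def correct_cell(y, r, l):
    """Correct one q-ary cell read as y after an asymmetric error e, 0 <= e <= l.

    r is the residue (mod l+1) of the originally written value, assumed to be
    recovered by the inner symmetric code over the small alphabet {0, ..., l}.
    Because 0 <= e <= l, the error is exactly (y - r) mod (l + 1).
    """
    e = (y - r) % (l + 1)  # recover the error magnitude from the residue gap
    return y - e           # undo the upward shift

# With l = 3, a cell written as 13 (residue 1 mod 4) and read as 15
# is corrected back to 13.
assert correct_cell(15, 13 % 4, 3) == 13
```

The point of the construction is that this per-cell step depends only on the maximum error magnitude l, not on the (possibly much larger) cell alphabet q.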
https://resolver.caltech.edu/CaltechAUTHORS:20170426-152709376
Authors: Bohossian, Vasken; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2007
DOI: 10.1109/ISIT.2007.4557384
Certain storage media such as flash memories use write-asymmetric, multi-level storage elements. In such media, data is stored in a multi-level memory cell whose contents can only be increased, or reset. The reset operation is expensive and should be delayed as much as possible. Mathematically, we consider the problem of writing a binary sequence into write-asymmetric q-ary cells, while recording the last r bits written. We want to maximize t, the number of possible writes before a reset is needed. We introduce the term buffer code to describe the solution to this problem: a buffer code is a code that remembers the r most recent values of a variable. We present the construction of a single-cell (n = 1) buffer code that can store a binary (l = 2) variable with t = ⌊q/2^(r - 1)⌋ + r - 2, and a universal upper bound on the number of rewrites that a single-cell buffer code can have: ..... We also show a binary buffer code with arbitrary n, q, r; namely, the code uses n q-ary cells to remember the r most recent values of one binary variable. The number of times this code can rewrite the variable is asymptotically optimal in q and n. We then extend the code construction for the case r = 2, and obtain a code that can rewrite the variable t = (q - 1)(n - 2) + 1 times. When q = 2, the code is strictly optimal.https://authors.library.caltech.edu/records/j7xa3-njj15Distributed broadcasting and mapping protocols in directed anonymous networks
https://resolver.caltech.edu/CaltechAUTHORS:20161121-163644968
Authors: Langberg, Michael; Schwartz, Moshe; Bruck, Jehoshua
Year: 2007
DOI: 10.1145/1281100.1281184
In this work we study the fundamental problems of broadcasting and mapping (label assignment and topology extraction) in directed anonymous networks. In such a network G, processors do not have unique identifiers, they execute identical protocols, and they have no knowledge of the topology of the network (even the size or bounds on it are unknown). The only knowledge available to a vertex is its own degree.https://authors.library.caltech.edu/records/tdksz-0tn97Stochastic switching circuit synthesis
https://resolver.caltech.edu/CaltechAUTHORS:WILisit08
Authors: Wilhelm, Daniel; Bruck, Jehoshua
Year: 2008
DOI: 10.1109/ISIT.2008.4595215
Shannon in his 1938 Master's thesis demonstrated that any Boolean function can be realized by a switching relay circuit, leading to the development of deterministic digital logic. Here, we replace each classical switch with a probabilistic switch (pswitch). We present algorithms for synthesizing circuits that are closed with a desired probability, including an algorithm that generates optimal-size circuits for any binary fraction. We also introduce a new duality property for series-parallel stochastic switching circuits. Finally, we construct a universal probability generator which maps deterministic inputs to arbitrary probabilistic outputs. Potential applications exist in the analysis and design of stochastic networks in biology and engineering.https://authors.library.caltech.edu/records/8p2vh-7cv44Universal rewriting in constrained memories
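The closure probability of a series-parallel pswitch circuit composes by elementary probability rules, which is enough to see how a binary fraction can be realized; a minimal sketch (function names are ours, not the paper's):

```python
from fractions import Fraction

def series(*ps):
    """A series connection is closed only when every pswitch is closed."""
    prob = Fraction(1)
    for p in ps:
        prob *= p
    return prob

def parallel(*ps):
    """A parallel connection is closed when at least one pswitch is closed."""
    prob_open = Fraction(1)
    for p in ps:
        prob_open *= 1 - p
    return 1 - prob_open

# Realizing the binary fraction 3/8 with three fair (p = 1/2) pswitches:
# parallel(1/2, 1/2) = 3/4, and in series with one more pswitch, 1/2 * 3/4 = 3/8.
half = Fraction(1, 2)
p_closed = series(half, parallel(half, half))
assert p_closed == Fraction(3, 8)
```

Exact rational arithmetic (`Fraction`) makes it easy to check a candidate circuit against a target probability before committing to a construction.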
https://resolver.caltech.edu/CaltechAUTHORS:20170321-172544029
Authors: Jiang, Anxiao (Andrew); Langberg, Michael; Schwartz, Moshe; Bruck, Jehoshua
Year: 2009
DOI: 10.1109/ISIT.2009.5205981
A constrained memory is a storage device whose elements change their states under some constraints. A typical example is flash memories, in which cell levels are easy to increase but hard to decrease. In a general rewriting model, the stored data changes with some pattern determined by the application. In a constrained memory, an appropriate representation is needed for the stored data to enable efficient rewriting.
In this paper, we define the general rewriting problem using a graph model. This model generalizes many known rewriting models such as floating codes, WOM codes, buffer codes, etc. We present a novel rewriting scheme for the flash-memory model and prove it is asymptotically optimal in a wide range of scenarios.
We further study the application of randomization and probability distributions to data rewriting, and study the expected performance. We present a randomized code for all rewriting sequences and a deterministic code for rewriting following any i.i.d. distribution. Both codes are shown to be asymptotically optimal.https://authors.library.caltech.edu/records/4k7ny-2dc62Programmability of Chemical Reaction Networks
https://resolver.caltech.edu/CaltechAUTHORS:20111020-103016495
Authors: Cook, Matthew; Soloveichik, David; Winfree, Erik; Bruck, Jehoshua
Year: 2009
DOI: 10.1007/978-3-540-88869-7_27
Motivated by the intriguing complexity of biochemical circuitry within individual cells we study Stochastic Chemical Reaction Networks (SCRNs), a formal model that considers a set of chemical reactions acting on a finite number of molecules in a well-stirred solution according to standard chemical kinetics equations. SCRNs have been widely used for describing naturally occurring (bio)chemical systems, and with the advent of synthetic biology they become a promising language for the design of artificial biochemical circuits. Our interest here is the computational power of SCRNs and how they relate to more conventional models of computation. We survey known connections and give new connections between SCRNs and Boolean Logic Circuits, Vector Addition Systems, Petri nets, Gate Implementability, Primitive Recursive Functions, Register Machines, Fractran, and Turing Machines. A theme to these investigations is the thin line between decidable and undecidable questions about SCRN behavior.https://authors.library.caltech.edu/records/pgm0a-wzh97The robustness of stochastic switching networks
https://resolver.caltech.edu/CaltechAUTHORS:20100816-134022210
Authors: Loh, Po-Ling; Zhou, Hongchao; Bruck, Jehoshua
Year: 2009
DOI: 10.1109/ISIT.2009.5205379
Many natural systems, including chemical and biological systems, can be modeled using stochastic switching circuits. These circuits consist of stochastic switches, called pswitches, which operate with a fixed probability of being open or closed. We study the effect caused by introducing an error of size ε to each pswitch in a stochastic circuit. We analyze two constructions, simple series-parallel and general series-parallel circuits, and prove that simple series-parallel circuits are robust to small error perturbations, while general series-parallel circuits are not. Specifically, the total error introduced by perturbations of size less than ε is bounded by a constant multiple of ε in a simple series-parallel circuit, independent of the size of the circuit. However, the same result does not hold in the case of more general series-parallel circuits. In the case of a general stochastic circuit, we prove that the overall error probability is bounded by a linear function of the number of pswitches.https://authors.library.caltech.edu/records/t7g6r-05h46On the expressibility of stochastic switching circuits
https://resolver.caltech.edu/CaltechAUTHORS:20100816-150432698
Authors: Zhou, Hongchao; Bruck, Jehoshua
Year: 2009
DOI: 10.1109/ISIT.2009.5205401
Stochastic switching circuits are relay circuits that consist of stochastic switches (that we call pswitches). We study the expressive power of these circuits; in particular, we address the following basic question: given an arbitrary integer q and the pswitch set {1/q, 2/q, ..., (q-1)/q}, can we realize any rational probability with denominator q^n (for arbitrary n) by a simple series-parallel stochastic switching circuit? In this paper, we generalize previous results and prove that when q is a multiple of 2 or 3 the answer is positive. We also show that when q is a prime number greater than 3 the answer is negative. In addition, we prove that any desired probability can be approximated well by a circuit whose size is linear in n, with error less than q^(-n).https://authors.library.caltech.edu/records/kwnj6-a3h97On the capacity of bounded rank modulation for flash memories
https://resolver.caltech.edu/CaltechAUTHORS:20100816-142932373
Authors: Wang, Zhiying; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2009
DOI: 10.1109/ISIT.2009.5205972
Rank modulation has been introduced as a new information representation scheme for flash memories. Given the charge levels of a group of flash cells, sorting is used to induce a permutation, which in turn represents data. Motivated by the lower sorting complexity of smaller cell groups, we consider bounded rank modulation, where a sequence of permutations of given sizes is used to represent data. We study the capacity of bounded rank modulation under the condition that permutations can overlap, which yields higher capacity.https://authors.library.caltech.edu/records/7f210-bjx91Data movement in flash memories
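The sorting step that turns analog charge levels into a permutation is the common core of the rank-modulation schemes above; a sketch under our own naming (not code from the papers):

```python
def rank_demodulate(levels, k=None):
    """Induce a permutation from cell charge levels by sorting.

    Returns the cell indices ordered from highest to lowest charge.
    With k set, only the top-k ranks are kept, as in partial/bounded
    variants that decode fewer ranks to reduce sorting work.
    """
    order = sorted(range(len(levels)), key=lambda i: levels[i], reverse=True)
    return order if k is None else order[:k]

# Three cells with charges 0.3, 0.9, 0.5 induce the permutation (1, 2, 0):
assert rank_demodulate([0.3, 0.9, 0.5]) == [1, 2, 0]
```

Because only the relative order of the levels matters, the representation is insensitive to common-mode drift of the absolute charge values, which is the main appeal of rank modulation.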
https://resolver.caltech.edu/CaltechAUTHORS:20170321-173656746
Authors: Jiang, Anxiao (Andrew); Langberg, Michael; Mateescu, Robert; Bruck, Jehoshua
Year: 2009
DOI: 10.1109/ALLERTON.2009.5394879
NAND flash memories are the most widely used non-volatile memories, and data movement is common in flash storage systems. We study data movement solutions that minimize the number of block erasures, which is very important for the efficiency and longevity of flash memories. To move data among n blocks with the help of Δ auxiliary blocks, where every block contains m pages, we present algorithms that use Θ(n · min{m, log_Δ n}) erasures without the tool of coding. We prove this is almost the best possible for non-coding solutions by presenting a nearly matching lower bound. Optimal data movement can be achieved using coding, where only Θ(n) erasures are needed. We present a coding-based algorithm, which has very low coding complexity, for optimal data movement. We further show the NP-hardness of both coding-based and non-coding schemes when the objective is to optimize data movement on a per-instance basis.https://authors.library.caltech.edu/records/wtbra-ykx96Partial Rank Modulation for Flash Memories
https://resolver.caltech.edu/CaltechAUTHORS:20110331-130545474
Authors: Wang, Zhiying; Bruck, Jehoshua
Year: 2010
DOI: 10.1109/ISIT.2010.5513597
Rank modulation was recently proposed as an information representation for multilevel flash memories, using permutations or ranks of n flash cells. The current decoding process finds the cell with the i-th highest charge level at iteration i, for i = 1, 2, ..., n − 1. Motivated by the need to reduce the number of such iterations, we consider k-partial permutations, where only the highest k cell levels are considered for information representation. We propose a generalization of Gray codes for k-partial permutations such that information is updated efficiently.https://authors.library.caltech.edu/records/dzswn-2ar83Generalizing the Blum-Elias Method for Generating Random Bits from Markov Chains
https://resolver.caltech.edu/CaltechAUTHORS:20110331-095348080
Authors: Zhou, Hongchao; Bruck, Jehoshua
Year: 2010
DOI: 10.1109/ISIT.2010.5513679
The problem of random number generation from an uncorrelated random source (of unknown probability distribution) dates back to von Neumann's 1951 work. Elias (1972) generalized von Neumann's scheme and showed how to achieve optimal efficiency in unbiased random bits generation. Hence, a natural question is: what if the sources are correlated? Both Elias and Samuelson proposed methods for generating unbiased random bits in the case of correlated sources (of unknown probability distribution); specifically, they considered finite Markov chains. However, their proposed methods are not efficient (Samuelson) or have implementation difficulties (Elias). Blum (1986) devised an algorithm for efficiently generating random bits from degree-2 finite Markov chains in expected linear time; however, his beautiful method is still far from optimality. In this paper, we generalize Blum's algorithm to arbitrary-degree finite Markov chains and combine it with Elias's method for efficient generation of unbiased bits. As a result, we provide the first known algorithm that generates unbiased random bits from an arbitrary finite Markov chain, operates in expected linear time, and achieves the information-theoretic upper bound on efficiency.https://authors.library.caltech.edu/records/wwjn6-psy10A Modular Voting Architecture ("Frog Voting")
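Von Neumann's 1951 scheme, which this line of work generalizes, fits in a few lines; a sketch of the classical pairing trick (our code, not the paper's):

```python
def von_neumann_extract(bits):
    """Turn independent flips of a biased coin into unbiased bits.

    Scan non-overlapping pairs: 01 -> 0, 10 -> 1, and discard 00 and 11.
    For an i.i.d. source with P(1) = p, both 01 and 10 occur with
    probability p(1-p), so each emitted bit is unbiased.
    """
    out = []
    for a, b in zip(bits[0::2], bits[1::2]):
        if a != b:       # keep only the asymmetric pairs
            out.append(a)
    return out

# Pairs (0,1) (1,0) (1,1) (0,0) (0,1) yield the output bits 0, 1, 0:
assert von_neumann_extract([0, 1, 1, 0, 1, 1, 0, 0, 0, 1]) == [0, 1, 0]
```

The scheme discards the 00 and 11 pairs entirely, which is exactly the inefficiency that Elias's refinement and the Markov-chain generalizations above address.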
https://resolver.caltech.edu/CaltechAUTHORS:20100805-151724246
Authors: Bruck, Shuki; Jefferson, David; Rivest, Ronald L.
Year: 2010
DOI: 10.1007/978-3-642-12980-3_5
This paper presents a new framework, a reference architecture, for voting that we feel has many attractive features. It is not a machine design, but rather a framework that will stimulate innovation and design. It is potentially the standard architecture for all future voting equipment. The ideas expressed here are subject to improvement and further research. (An early version of this paper appeared in [2, Part III]. This version of the paper is very similar, but contains a postscript (Section 8) providing commentary and discussion of perspectives on this proposal generated during the intervening years between 2001 and 2008.)https://authors.library.caltech.edu/records/0v6sy-g8e79Data movement and aggregation in flash memories
https://resolver.caltech.edu/CaltechAUTHORS:20170309-135756699
Authors: Jiang, Anxiao (Andrew); Langberg, Michael; Mateescu, Robert; Bruck, Jehoshua
Year: 2010
DOI: 10.1109/ISIT.2010.5513391
NAND flash memories have become the most widely used type of non-volatile memories. In a NAND flash memory, every block of memory cells consists of numerous pages, and rewriting a single page requires the whole block to be erased. As block erasures significantly reduce the longevity, speed and power efficiency of flash memories, it is critical to minimize the number of erasures when data are reorganized. This leads to the data movement problem, where data need to be switched in blocks, and the objective is to minimize the number of block erasures. It has been shown that optimal solutions can be obtained by coding. However, coding-based algorithms with the minimum coding complexity still remain an important topic to study.
In this paper, we present a very efficient data movement algorithm with coding over GF(2) and with the minimum storage requirement. We also study data movement with more auxiliary blocks and present its corresponding solution. Furthermore, we extend the study to the data aggregation problem, where data can not only be moved but also aggregated. We present both non-coding and coding-based solutions, and rigorously prove the performance gain by using coding.https://authors.library.caltech.edu/records/kbrtc-fah11On the Synthesis of Stochastic Flow Networks
https://resolver.caltech.edu/CaltechAUTHORS:20110331-132532031
Authors: Zhou, Hongchao; Chen, Ho-Lin; Bruck, Jehoshua
Year: 2010
DOI: 10.1109/ISIT.2010.5513754
A stochastic flow network is a directed graph with incoming edges (inputs) and outgoing edges (outputs); tokens enter through the input edges, travel stochastically in the network, and can exit the network through the output edges. Each node in the network is a splitter, namely, a token can enter a node through an incoming edge and exit on one of the output edges according to a predefined probability distribution. We address the following synthesis question: Given a finite set of possible splitters and an arbitrary rational probability distribution, design a stochastic flow network, such that every token that enters the input edge will exit the outputs with the prescribed probability distribution. The problem of probability synthesis dates back to von Neumann's 1951 work and was followed, among others, by Knuth and Yao in 1976, who demonstrated that arbitrary rational probabilities can be generated with tree networks; minimizing the expected path length, the expected number of coin tosses in their paradigm, is the key consideration. Motivated by the synthesis of stochastic DNA based molecular systems, we focus on designing optimal-sized stochastic flow networks (the size of a network is the number of splitters). We assume that each splitter has two outgoing edges and is unbiased (probability 1/2 per output edge). We show that an arbitrary rational probability a/b with a ≤ b ≤ 2^n can be realized by a stochastic flow network of size n; we also show that this is optimal. We note that our stochastic flow networks have feedback (cycles in the network); in fact, we demonstrate that feedback improves the expressibility of stochastic flow networks, since without feedback only probabilities of the form a/2^n (a an integer) can be realized.https://authors.library.caltech.edu/records/hccfy-ah066On a construction for constant-weight Gray codes for local rank modulation
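The Knuth-Yao paradigm mentioned above can be illustrated with a fair-coin sampler for an arbitrary rational probability; this is our own sketch (names and structure are ours), comparing a uniform binary expansion against that of a/b one digit at a time:

```python
def bernoulli(a, b, coin):
    """Return 1 with probability exactly a/b, else 0, given 0 <= a <= b.

    coin() must return a fair random bit (0 or 1). Each round produces the
    next binary digit of a/b and compares it with a fresh uniform bit; the
    expected number of coin flips is at most 2.
    """
    while True:
        a *= 2
        d = 1 if a >= b else 0   # next binary digit of a/b
        if d:
            a -= b
        u = coin()
        if u < d:
            return 1             # uniform draw fell below a/b
        if u > d:
            return 0             # uniform draw fell above a/b
        # digits agree: keep refining

# With the fixed flip sequence 1, 0 the sampler for 3/4 decides after two flips:
bits = iter([1, 0])
assert bernoulli(3, 4, lambda: next(bits)) == 1
```

The flow-network view replaces this digit-by-digit tree walk with a fixed-size network of unbiased splitters, where allowing feedback is what lets n splitters reach any denominator b ≤ 2^n rather than only powers of two.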
https://resolver.caltech.edu/CaltechAUTHORS:20170309-151553099
Authors: En Gad, Eyal; Langberg, Michael; Schwartz, Moshe; Bruck, Jehoshua
Year: 2010
DOI: 10.1109/EEEI.2010.5661923
We consider the local rank-modulation scheme in which a sliding window going over a sequence of real-valued variables induces a sequence of permutations. Local rank-modulation, as a generalization of the rank-modulation scheme, has been recently suggested as a way of storing information in flash memory. We study constant-weight Gray codes for the local rank-modulation scheme in order to simulate conventional multilevel flash cells while retaining the benefits of rank modulation. We describe a construction for codes of rate tending to 1.https://authors.library.caltech.edu/records/hc713-y6h87Rebuilding for Array Codes in Distributed Storage Systems
https://resolver.caltech.edu/CaltechAUTHORS:20110707-082718436
Authors: Wang, Zhiying; Dimakis, Alexandros G.; Bruck, Jehoshua
Year: 2010
DOI: 10.1109/GLOCOMW.2010.5700274
In distributed storage systems that use coding, the issue of minimizing the communication required to rebuild a storage node after a failure arises. We consider the problem of repairing an erased node in a distributed storage system that uses an EVENODD code. EVENODD codes are maximum distance separable (MDS) array codes that are used to protect against erasures, and only require XOR operations for encoding and decoding. We show that when there are two redundancy nodes, to rebuild one erased systematic node, only 3/4 of the information needs to be transmitted. Interestingly, in many cases, the required disk I/O is also minimized.https://authors.library.caltech.edu/records/fd7qm-72d07Patterned cells for phase change memories
https://resolver.caltech.edu/CaltechAUTHORS:20170213-160905267
Authors: Jiang, Anxiao (Andrew); Zhou, Hongchao; Wang, Zhiying; Bruck, Jehoshua
Year: 2011
DOI: 10.1109/ISIT.2011.6033979
Phase-change memory (PCM) is an emerging nonvolatile memory technology that promises very high performance. It currently uses discrete cell levels to represent data, controlled by a single amorphous/crystalline domain in a cell. To improve data density, more levels per cell are needed. There exist a number of challenges, including cell programming noise, drifting of cell levels, and the high power requirement for cell programming. In this paper, we present a new cell structure called patterned cell, and explore its data representation schemes. Multiple domains per cell are used, and their connectivity is used to store data. We analyze its storage capacity, and study its error-correction capability and the construction of error-control codes.https://authors.library.caltech.edu/records/dpzzx-7bf24Nonuniform Codes for Correcting Asymmetric Errors
https://resolver.caltech.edu/CaltechAUTHORS:20120406-093123448
Authors: Zhou, Hongchao; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2011
DOI: 10.1109/ISIT.2011.6033689
Codes that correct asymmetric errors have important applications in storage systems, including optical disks and Read Only Memories. The construction of asymmetric error-correcting codes is a topic that has been studied extensively; however, the existing approach for code construction assumes that every codeword can sustain t asymmetric errors. Our main observation is that in contrast to symmetric errors, where the error probability of a codeword is context independent (since the error probability for 1s and 0s is identical), asymmetric errors are context dependent. For example, the all-1 codeword has a higher error probability than the all-0 codeword (since the only errors are 1 → 0). We call the existing codes uniform codes, while we focus on the notion of nonuniform codes, namely, codes whose codewords can tolerate different numbers of asymmetric errors depending on their Hamming weights. The goal of nonuniform codes is to guarantee the reliability of every codeword, which is important in data storage, where one must be able to retrieve whatever was written. We prove an almost explicit upper bound on the size of nonuniform asymmetric error-correcting codes and present two general constructions. We also study the rate of nonuniform codes compared to uniform codes and show that there is a potential performance gain.https://authors.library.caltech.edu/records/bk5h1-qa850MDS Array Codes with Optimal Rebuilding
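The context dependence described above can be made concrete: when each 1 flips to 0 with probability p and 0s never flip, only the w ones of a codeword are at risk, so the chance of exceeding a correction radius t grows with the Hamming weight w. A small illustrative calculation (function name hypothetical):

```python
from math import comb

def prob_more_than_t_flips(w, t, p):
    """P(more than t of the w ones flip), each 1 -> 0 flip having prob. p."""
    return 1.0 - sum(comb(w, i) * p**i * (1 - p)**(w - i) for i in range(t + 1))

# Heavier codewords are more error-prone: the all-0 word (w = 0) never errs.
print(prob_more_than_t_flips(0, 1, 0.1))   # 0.0
print(prob_more_than_t_flips(2, 1, 0.1) < prob_more_than_t_flips(10, 1, 0.1))  # True
```

This weight dependence is exactly what motivates letting the tolerated number of errors vary with the codeword's weight.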
https://resolver.caltech.edu/CaltechAUTHORS:20120406-093959188
Authors: Tamo, Itzhak; Wang, Zhiying; Bruck, Jehoshua
Year: 2011
DOI: 10.1109/ISIT.2011.6033733
MDS array codes are widely used in storage systems to protect data against erasures. We address the rebuilding ratio problem, namely, in the case of erasures, what is the fraction of the remaining information that needs to be accessed in order to rebuild exactly the lost information? It is clear that when the number of erasures equals the maximum number of erasures that an MDS code can correct then the rebuilding ratio is 1 (access all the remaining information). However, the interesting (and more practical) case is when the number of erasures is smaller than the erasure correcting capability of the code. For example, consider an MDS code that can correct two erasures: What is the smallest amount of information that one needs to access in order to correct a single erasure? Previous work showed that the rebuilding ratio is bounded between 1/2 and 3/4; however, the exact value was left as an open problem. In this paper, we solve this open problem and prove that for the case of a single erasure with a 2-erasure correcting code, the rebuilding ratio is 1/2. In general, we construct a new family of r-erasure correcting MDS array codes that has optimal rebuilding ratio of 1/r in the case of a single erasure. Our array codes have efficient encoding and decoding algorithms (for the case r = 2 they use a finite field of size 3) and an optimal update property.https://authors.library.caltech.edu/records/2rrs6-0d438Generalized Gray Codes for Local Rank Modulation
https://resolver.caltech.edu/CaltechAUTHORS:20120405-102509210
Authors: En Gad, Eyal; Langberg, Michael; Schwartz, Moshe; Bruck, Jehoshua
Year: 2011
DOI: 10.1109/ISIT.2011.6034262
We consider the local rank-modulation scheme in which a sliding window going over a sequence of real-valued variables induces a sequence of permutations. Local rank-modulation is a generalization of the rank-modulation scheme, which has been recently suggested as a way of storing information in flash memory. We study Gray codes for the local rank-modulation scheme in order to simulate conventional multi-level flash cells while retaining the benefits of rank modulation. Unlike the limited scope of previous works, we consider code constructions for the entire range of parameters including the code length, sliding window size, and overlap between adjacent windows. We show our constructed codes have asymptotically-optimal rate. We also provide efficient encoding, decoding, and next-state algorithms.https://authors.library.caltech.edu/records/e70ns-vgr95Error-Correcting Schemes with Dynamic Thresholds in Nonvolatile Memories
https://resolver.caltech.edu/CaltechAUTHORS:20120406-094817699
Authors: Zhou, Hongchao; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2011
DOI: 10.1109/ISIT.2011.6033936
Predetermined fixed thresholds are commonly used in nonvolatile memories for reading binary sequences, but they usually result in significant asymmetric errors after a long duration, due to voltage or resistance drift. This motivates us to construct error-correcting schemes with dynamic reading thresholds, so that the asymmetric component of errors is minimized. In this paper, we discuss how to select dynamic reading thresholds without knowing cell level distributions, and present several error-correcting schemes. Analysis based on Gaussian noise models reveals that bit error probabilities can be significantly reduced by using dynamic thresholds instead of fixed thresholds, hence leading to a higher information rate.https://authors.library.caltech.edu/records/mct4d-gk305Compressed Encoding for Rank Modulation
https://resolver.caltech.edu/CaltechAUTHORS:20120405-104551517
Authors: En Gad, Eyal; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2011
DOI: 10.1109/ISIT.2011.6034264
Rank modulation has been recently proposed as a scheme for storing information in flash memories. While rank modulation has advantages in improving write speed and endurance, the current encoding approach is based on the "push to the top" operation that is not efficient in the general case. We propose a new encoding procedure where a cell level is raised to be higher than the minimal necessary subset - instead of all - of the other cell levels. This new procedure leads to a significantly more compressed (lower charge levels) encoding. We derive an upper bound for a family of codes that utilize the proposed encoding procedure, and consider code constructions that achieve that bound for several special cases.https://authors.library.caltech.edu/records/6snc1-5vh55Linear extractors for extracting randomness from noisy sources
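For readers unfamiliar with rank modulation, the baseline "push to the top" write raises one cell strictly above all others, and the stored value is the permutation induced by sorting cells by charge. A minimal sketch of that baseline (helper names hypothetical; the paper's compressed encoding instead raises a cell above only a minimal subset):

```python
def induced_permutation(levels):
    """Cells ranked from highest to lowest charge level."""
    return sorted(range(len(levels)), key=lambda i: -levels[i])

def push_to_top(levels, i):
    """Classic rank-modulation write: raise cell i above every other cell."""
    levels[i] = max(levels) + 1

levels = [3.0, 1.0, 2.0]
print(induced_permutation(levels))  # [0, 2, 1]
push_to_top(levels, 1)              # cell 1 now carries the highest charge
print(induced_permutation(levels))  # [1, 0, 2]
```

Because `push_to_top` always charges above the current maximum, repeated writes drive levels ever higher; the compressed encoding's point is to avoid exactly this growth.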
https://resolver.caltech.edu/CaltechAUTHORS:20120330-134402245
Authors: Zhou, Hongchao; Bruck, Jehoshua
Year: 2011
DOI: 10.1109/ISIT.2011.6033845
Linear transformations have many applications in information theory, such as data compression and the design of error-correcting codes. In this paper, we study the power of linear transformations in randomness extraction, namely linear extractors, as another important application. Compared to most existing methods for randomness extraction, linear extractors (especially those constructed with sparse matrices) are computationally fast and can be simply implemented in hardware such as FPGAs, which makes them very attractive in practical use. We mainly focus on simple, efficient and sparse constructions of linear extractors. Specifically, we demonstrate that random matrices can generate random bits very efficiently from a variety of noisy sources, including noisy coin sources, bit-fixing sources, noisy (hidden) Markov sources, as well as their mixtures. We show that low-density random matrices have almost the same efficiency as high-density random matrices when the input sequence is long, which provides a way to simplify hardware/software implementation. Note that although the matrices are constructed with randomness, they are deterministic (seedless) extractors - once constructed, the same matrix can be used any number of times without any seed. Another way to construct linear extractors is based on generator matrices of primitive BCH codes. This method is more explicit, but less practical due to its computational complexity and dimensional constraints.https://authors.library.caltech.edu/records/2p13s-m0316Variable-Length Extractors
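A linear extractor in this sense is just a binary matrix applied to the raw bits over GF(2). The sketch below applies a fixed, hand-picked illustrative matrix to a short input block; in the paper's setting the matrix would be a random low-density matrix and the input a long noisy source sequence.

```python
def linear_extract(matrix, bits):
    """Output y = Mx over GF(2): each output bit is a parity of input bits."""
    return [sum(m & x for m, x in zip(row, bits)) % 2 for row in matrix]

# A 2x4 illustrative matrix: compresses 4 weakly random bits into 2 output bits.
M = [[1, 0, 1, 1],
     [0, 1, 1, 0]]
print(linear_extract(M, [1, 1, 0, 0]))  # [1, 1]
```

Sparsity pays off directly here: each output parity touches only the positions where its row has a 1, so a low-density matrix means few XORs per output bit.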
https://resolver.caltech.edu/CaltechAUTHORS:20120828-165227181
Authors: Zhou, Hongchao; Bruck, Jehoshua
Year: 2012
DOI: 10.1109/ISIT.2012.6283024
We study the problem of extracting a prescribed number of random bits by reading the smallest possible number of symbols from non-ideal stochastic processes. The related interval algorithm proposed by Han and Hoshi has asymptotically optimal performance; however, it assumes that the distribution of the input stochastic process is known. The motivation for our work is the fact that, in practice, sources of randomness have inherent correlations and are affected by measurement noise; namely, it is hard to obtain an accurate estimate of the distribution. This challenge was addressed by the concepts of seeded and seedless extractors, which can handle general random sources with unknown distributions. However, known seeded and seedless extractors provide extraction efficiencies that are substantially smaller than Shannon's entropy limit. Our main contribution is the design of extractors that have a variable input length and a fixed output length, are efficient in the consumption of symbols from the source, are capable of generating random bits from general stochastic processes, and approach the information-theoretic upper bound on efficiency.https://authors.library.caltech.edu/records/sd3m9-v3h14Trade-offs between Instantaneous and Total Capacity in Multi-Cell Flash Memories
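As a simple point of reference for variable-length extraction, the classic von Neumann trick (not this paper's construction) already consumes a variable number of source symbols per output bit: it reads a biased source in pairs and emits a bit only when the pair is unequal.

```python
def von_neumann_extract(bits):
    """Emit the first bit of each unequal pair; (0,0)/(1,1) pairs are dropped.

    For i.i.d. biased bits, P(01) = P(10), so every emitted bit is unbiased.
    """
    out = []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if a != b:
            out.append(a)
    return out

print(von_neumann_extract([0, 1, 1, 0, 1, 1, 0, 0, 1, 0]))  # [0, 1, 1]
```

Its efficiency is far below the entropy limit (it discards all equal pairs), which is the gap the extractors in this record aim to close.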
https://resolver.caltech.edu/CaltechAUTHORS:20120828-151832251
Authors: En Gad, Eyal; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2012
DOI: 10.1109/ISIT.2012.6284712
The limited endurance of flash memories is a major design concern for enterprise storage systems. We propose a method to increase it by using relative (as opposed to fixed) cell levels and by representing the information with Write Asymmetric Memory (WAM) codes. Overall, our new method enables faster writes, improved reliability as well as improved endurance by allowing multiple writes between block erasures. We study the capacity of the new WAM codes with relative levels, where the information is represented by multiset permutations induced by the charge levels, and show that it achieves the capacity of any other WAM codes with the same number of writes. Specifically, we prove that it has the potential to double the total capacity of the memory. Since capacity can be achieved only with cells that have a large number of levels, we propose a new architecture that consists of multi-cells — each an aggregation of a number of floating gate transistors.https://authors.library.caltech.edu/records/n5h8x-1m320Systematic Error-Correcting Codes for Rank Modulation
https://resolver.caltech.edu/CaltechAUTHORS:20120828-151501177
Authors: Zhou, Hongchao; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2012
DOI: 10.1109/ISIT.2012.6284106
The rank modulation scheme has been proposed recently for efficiently writing and storing data in nonvolatile memories. Error-correcting codes are very important for rank modulation, and they have attracted interest among researchers. In this work, we explore a new approach, systematic error-correcting codes for rank modulation. In an (n,k) systematic code, we use the permutation induced by the levels of n cells to store data, and the permutation induced by the first k cells (k < n) has a one-to-one mapping to information bits. Systematic codes have the benefits of enabling efficient information retrieval and potentially supporting more efficient encoding and decoding procedures. We study systematic codes for rank modulation equipped with the Kendall τ-distance. We present (k + 2, k) systematic codes for correcting one error, which have optimal sizes unless perfect codes exist. We also study the design of multi-error-correcting codes, and prove that for any 2 ≤ k < n, there always exists an (n, k) systematic code of minimum distance n-k. Furthermore, we prove that for rank modulation, systematic codes achieve the same capacity as general error-correcting codes.https://authors.library.caltech.edu/records/92gz1-39x98On the Uncertainty of Information Retrieval in Associative Memories
https://resolver.caltech.edu/CaltechAUTHORS:20120828-144523977
Authors: Yaakobi, Eitan; Bruck, Jehoshua
Year: 2012
DOI: 10.1109/ISIT.2012.6283016
We (people) are memory machines. Our decision processes, emotions and interactions with the world around us are based on and driven by associations to our memories. This
natural association paradigm will become critical in future memory systems, namely, the key question will not be "How do I store more information?" but rather, "Do I have the relevant information? How do I retrieve it?" The focus of this paper is to make a first step in this direction.
We define and solve a very basic problem in associative
retrieval. Given a word W, the words in the memory that are
t-associated with W are the words in the ball of radius t around W. In general, given a set of words, say W, X and Y, the words that are t-associated with {W, X, Y} are those in the memory that are within distance t from all three words. Our main goal is to study the maximum size of the t-associated set as a function of the number of input words and the minimum distance of the words in memory - we call this value the uncertainty of an associative memory. We derive the uncertainty of the associative memory that consists of all the binary vectors with an arbitrary number of input words. In addition, we study the retrieval
problem, namely, how do we get the t-associated set given
the inputs? We note that this paradigm is a generalization of the sequences reconstruction problem that was proposed by Levenshtein (2001). In this model, a word is transmitted over multiple channels. A decoder receives all the channel outputs and decodes the transmitted word. Levenshtein computed the minimum number of channels that guarantee a successful decoder - this value happens to be the uncertainty of an associative memory with two input words.https://authors.library.caltech.edu/records/bw7jd-3r945Long MDS Codes for Optimal Repair Bandwidth
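For small memories, the t-associated set defined above can be computed by brute force; a sketch over binary words with Hamming distance (function names hypothetical):

```python
from itertools import product

def hamming(u, v):
    """Number of positions where words u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def t_associated(memory, inputs, t):
    """Words in memory within distance t of every input word."""
    return [w for w in memory if all(hamming(w, u) <= t for u in inputs)]

# Memory of all binary words of length 3; inputs 000 and 111 with t = 2
# leave exactly the six words of weight 1 or 2.
memory = list(product([0, 1], repeat=3))
print(len(t_associated(memory, [(0, 0, 0), (1, 1, 1)], 2)))  # 6
```

With a single input word this reduces to a Hamming ball of radius t, matching the definition in the record.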
https://resolver.caltech.edu/CaltechAUTHORS:20120829-103740126
Authors: Wang, Zhiying; Tamo, Itzhak; Bruck, Jehoshua
Year: 2012
DOI: 10.1109/ISIT.2012.6283041
MDS codes are erasure-correcting codes that can correct the maximum number of erasures given the number of redundancy or parity symbols. If an MDS code has r parities and no more than r erasures occur, then by transmitting all the remaining data in the code one can recover the original information. However, it was shown that in order to recover a single symbol erasure, only a fraction of 1/r of the information needs to be transmitted. This fraction is called the repair bandwidth (fraction). Explicit code constructions were given in previous works. If we view each symbol in the code as a vector or a column, then the code forms a 2D array and such codes are especially widely used in storage systems. In this paper, we ask the following question: given the length of the column l, can we construct high-rate MDS array codes with optimal repair bandwidth of 1/r, whose code length is as long as possible? We give code constructions such that the code length is (r + 1)log_r l.https://authors.library.caltech.edu/records/hgwxz-m6n08Decoding of Cyclic Codes over Symbol-Pair Read Channels
https://resolver.caltech.edu/CaltechAUTHORS:20120828-151322448
Authors: Yaakobi, Eitan; Bruck, Jehoshua; Siegel, Paul H.
Year: 2012
DOI: 10.1109/ISIT.2012.6284053
Symbol-pair read channels, in which the outputs of the read process are pairs of consecutive symbols, were recently studied by Cassuto and Blaum. This new paradigm is motivated by the limitations of the reading process in high density data storage systems. They studied error correction in this new paradigm, specifically, the relationship between the minimum Hamming distance of an error correcting code and the minimum pair distance, which is the minimum Hamming distance between symbol-pair vectors derived from codewords of the code. It was proved that for a linear cyclic code with minimum Hamming distance d_H, the corresponding minimum pair distance is at least d_H + 3. Our main contribution is proving that, for a given linear cyclic code with a minimum Hamming distance d_H, the minimum pair distance is at least d_H + ⌈d_H/2⌉. We also describe decoding algorithms, based upon bounded distance decoders for the cyclic code, whose pair-symbol error correcting capability reflects the larger minimum pair distance. In addition, we consider the case where a read channel output is a prescribed number, b > 2, of consecutive symbols and provide some generalizations of our results. We note that the symbol-pair read channel problem is a special case of the sequence reconstruction problem that was introduced by Levenshtein.https://authors.library.caltech.edu/records/7pmwy-6c076Access vs. Bandwidth in Codes for Storage
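The pair-distance notion can be illustrated directly: a length-n word is read as the cyclic sequence of its n consecutive symbol pairs, and pair distance is the Hamming distance between these pair vectors. A small sketch (function names illustrative):

```python
def pair_vector(x):
    """Cyclic sequence of consecutive symbol pairs read from x."""
    n = len(x)
    return [(x[i], x[(i + 1) % n]) for i in range(n)]

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def pair_distance(x, y):
    return hamming(pair_vector(x), pair_vector(y))

a = [0, 0, 0, 0, 0]
b = [1, 0, 0, 0, 0]
# A single symbol error corrupts the two overlapping pairs containing it.
print(hamming(a, b), pair_distance(a, b))  # 1 2
```

Because every symbol appears in two pairs, pair distance is always at least the Hamming distance, and the record's results sharpen this gap for cyclic codes.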
https://resolver.caltech.edu/CaltechAUTHORS:20120829-092120549
Authors: Tamo, Itzhak; Wang, Zhiying; Bruck, Jehoshua
Year: 2012
DOI: 10.1109/ISIT.2012.6283042
Maximum distance separable (MDS) codes are widely used in storage systems to protect against disk (node) failures. An (n, k, l) MDS code uses n nodes of capacity l to store k information nodes. The MDS property guarantees the resiliency to any n − k node failures. An optimal bandwidth (resp. optimal access) MDS code communicates (resp. accesses) the minimum amount of data during the recovery process of a single failed node. It was shown that this amount equals a fraction of 1/(n − k) of data stored in each node. In previous optimal bandwidth constructions, l scaled polynomially with k in codes with asymptotic rate < 1. Moreover, in constructions with a constant number of parities, i.e., rate approaching 1, l scaled exponentially w.r.t. k. In this paper we focus on the practical case of n − k = 2, and ask the following question: Given the capacity of a node l, what is the largest (w.r.t. k) optimal bandwidth (resp. access) (k + 2, k, l) MDS code? We give an upper bound for the general case, and two tight bounds in the special cases of two important families of codes.https://authors.library.caltech.edu/records/8ne9e-4q567Content-assisted file decoding for nonvolatile memories
https://resolver.caltech.edu/CaltechAUTHORS:20170207-175141968
Authors: Li, Yue; Wang, Yue; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2012
DOI: 10.1109/ACSSC.2012.6489154
Nonvolatile memories (NVMs) such as flash memories play a significant role in meeting the data storage requirements of today's computation activities. The rapid increase of storage density for NVMs, however, brings reliability issues due to closer alignment of adjacent cells on chip, and more levels that are programmed into a cell. We propose a new method for error correction, which uses the random access capability of NVMs and the redundancy that inherently exists in information content. Although it is theoretically possible to remove the redundancy via data compression, existing source coding algorithms do not remove all of it for efficient computation. We propose a method that can be combined with existing storage solutions for text files, namely content-assisted decoding. Using the statistical properties of words and phrases in the text of a given language, our decoder identifies the location of each subcodeword representing some word in a given input noisy codeword, and flips the bits to compute a most likely word sequence. The decoder can be adapted to work together with traditional ECC decoders to keep the number of errors within the correction capability of traditional decoders. The combined decoding framework is evaluated with a set of benchmark files.https://authors.library.caltech.edu/records/53g4r-0wb20Sequence reconstruction for Grassmann graphs and permutations
https://resolver.caltech.edu/CaltechAUTHORS:20170125-143159155
Authors: Yaakobi, Eitan; Schwartz, Moshe; Langberg, Michael; Bruck, Jehoshua
Year: 2013
DOI: 10.1109/ISIT.2013.6620351
The sequence-reconstruction problem was first proposed by Levenshtein in 2001. This problem studies the model where the same word is transmitted over multiple channels. If the transmitted word belongs to some code of minimum distance d and there are at most r errors in every channel, then the minimum number of channels that guarantees a successful decoder (under the assumption that all channel outputs are distinct) has to be greater than the largest intersection of two balls of radius r and with distance at least d between their centers.
This paper studies the combinatorial problem of computing the largest intersection of two balls for two cases. In the first part we solve this problem in the Grassmann graph for all values of d and r. In the second part we derive similar results for permutations under Kendall's τ-metric for some special cases of d and r.https://authors.library.caltech.edu/records/qn4yr-2yw53Approximate Sorting of Data Streams with Limited Storage
https://resolver.caltech.edu/CaltechAUTHORS:20141203-103210856
Authors: Farnoud (Hassanzadeh), Farzad; Yaakobi, Eitan; Bruck, Jehoshua
Year: 2014
DOI: 10.1007/978-3-319-08783-2_40
We consider the problem of approximate sorting of a data stream (in one pass) with limited internal storage where the goal is not to rearrange data but to output a permutation that reflects the ordering of the elements of the data stream as closely as possible. Our main objective is to study the relationship between the quality of the sorting and the amount of available storage. To measure quality, we use permutation distortion metrics, namely the Kendall tau and Chebyshev metrics, as well as mutual information, between the output permutation and the true ordering of data elements. We provide bounds on the performance of algorithms with limited storage and present a simple algorithm that asymptotically requires a constant factor as much storage as an optimal algorithm in terms of mutual information and average Kendall tau distortion.https://authors.library.caltech.edu/records/k2hzs-wx406The capacity of string-duplication systems
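The two distortion metrics used above are straightforward to compute for permutations given as lists; a sketch (helper names hypothetical):

```python
def kendall_tau(p, q):
    """Number of element pairs ordered differently by permutations p and q."""
    pos = {v: i for i, v in enumerate(q)}
    r = [pos[v] for v in p]  # p expressed in q's coordinates
    n = len(r)
    return sum(1 for i in range(n) for j in range(i + 1, n) if r[i] > r[j])

def chebyshev(p, q):
    """Maximum displacement of any single element between p and q."""
    pos_p = {v: i for i, v in enumerate(p)}
    pos_q = {v: i for i, v in enumerate(q)}
    return max(abs(pos_p[v] - pos_q[v]) for v in pos_p)

# Full reversal of three elements: every pair is inverted, max displacement is 2.
print(kendall_tau([2, 1, 0], [0, 1, 2]))  # 3
print(chebyshev([2, 1, 0], [0, 1, 2]))    # 2
```

In the approximate-sorting setting, p would be the algorithm's output permutation and q the true ordering of the stream's elements.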
https://resolver.caltech.edu/CaltechAUTHORS:20150227-082940148
Authors: Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2014
DOI: 10.1109/ISIT.2014.6875043
It is known that the majority of the human genome consists of repeated sequences. Furthermore, it is believed that a significant part of the rest of the genome also originated from repeated sequences and has mutated to its current form. In this paper, we investigate the possibility of constructing an exponentially large number of sequences from a short initial sequence and simple duplication rules, including those resembling genomic duplication processes. In other words, our goal is to find out the capacity, or the expressive power, of these string-duplication systems. Our results include the exact capacities, and bounds on the capacities, of four fundamental string-duplication systems.https://authors.library.caltech.edu/records/v7rpj-t4x19Polar coding for noisy write-once memories
https://resolver.caltech.edu/CaltechAUTHORS:20150227-084706095
Authors: En Gad, Eyal; Li, Yue; Kliewer, Joerg; Langberg, Michael; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2014
DOI: 10.1109/ISIT.2014.6875111
We consider the noisy write-once memory (WOM) model to capture the behavior of data-storage devices such as flash memories. The noisy WOM is an asymmetric channel model with non-causal state information at the encoder. We show that a nesting of non-linear polar codes achieves the corresponding Gelfand-Pinsker bound with polynomial complexity.https://authors.library.caltech.edu/records/ggxqm-zd693Bounds for Permutation Rate-Distortion
https://resolver.caltech.edu/CaltechAUTHORS:20150227-075642886
Authors: Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2014
DOI: 10.1109/ISIT.2014.6874784
We study the rate-distortion relationship in the set of permutations endowed with the Kendall τ-metric and the Chebyshev metric. Our study is motivated by the application of permutation rate-distortion to the average-case and worst-case distortion analysis of algorithms for ranking with incomplete information and approximate sorting algorithms. For the Kendall τ-metric we provide bounds for small, medium, and large distortion regimes, while for the Chebyshev metric we present bounds that are valid for all distortions and are especially accurate for small distortions. In addition, for the Chebyshev metric, we provide a construction for covering codes.https://authors.library.caltech.edu/records/2bt4c-2y871Capacity and expressiveness of genomic tandem duplication
https://resolver.caltech.edu/CaltechAUTHORS:20151012-144840366
Authors: Jain, Siddharth; Farnoud (Hassanzadeh), Farzad; Bruck, Jehoshua
Year: 2015
DOI: 10.1109/ISIT.2015.7282795
The majority of the human genome consists of repeated sequences. An important type of repeats common in the human genome are tandem repeats, where identical copies appear next to each other. For example, in the sequence AGTCTGTGC, TGTG is a tandem repeat, namely, generated from AGTCTGC by a tandem duplication of length 2. In this work, we investigate the possibility of generating a large number of sequences from a small initial string (called the seed) by tandem duplications of bounded length. Our results include exact capacity values for certain tandem duplication string systems with alphabet sizes 2, 3, and 4. In addition, motivated by the role of DNA sequences in expressing proteins via RNA and the genetic code, we define the notion of the expressiveness of a tandem duplication system as the feasibility of expressing arbitrary substrings. We then completely characterize the expressiveness of tandem duplication systems for general alphabet sizes and duplication lengths. Noticing that a system with capacity = 1 is expressive, we prove that for an alphabet size ≥ 4, the capacity is strictly smaller than 1, independent of the seed and the duplication lengths. The proof of this limit on the capacity (note that the genomic alphabet size is 4) is related to an interesting result by Axel Thue from 1906, which states that there exist arbitrarily long sequences with no tandem repeats (square-free sequences) for alphabet size ≥ 3. Finally, our results illustrate that duplication lengths play a more significant role than the seed in generating a large number of sequences for these systems.https://authors.library.caltech.edu/records/j1tbj-7vr53Is there a new way to correct errors
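The generation step described above is simple to state in code; a sketch that checks the abstract's own example, where the seed AGTCTGC yields AGTCTGTGC by one tandem duplication of length 2 (function name hypothetical):

```python
def tandem_duplicates(s, k):
    """All strings reachable from s by ONE tandem duplication of length <= k."""
    out = set()
    for length in range(1, k + 1):
        for i in range(len(s) - length + 1):
            block = s[i:i + length]
            # Insert a second copy of the block immediately after the first.
            out.add(s[:i + length] + block + s[i + length:])
    return out

# Duplicating the substring TG at position 4 of AGTCTGC gives AGTCTGTGC.
print("AGTCTGTGC" in tandem_duplicates("AGTCTGC", 2))  # True
```

Iterating this step from a seed generates the string-duplication system whose growth rate (capacity) the record analyzes.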
https://resolver.caltech.edu/CaltechAUTHORS:20161111-153306808
Authors: Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2015
DOI: 10.1109/ITA.2015.7308985
The classic approach for error correction is to add controlled external redundancy to data. This approach, called error-correcting codes (ECCs), has been studied extensively, and the rates of ECCs are approaching theoretical limits. In this work, we explore a second approach for error correction: using the redundancy inside data, even if it is just the residual redundancy after data compression. We focus on text data, and show that this approach, based on language processing, can significantly improve the error-correction performance.https://authors.library.caltech.edu/records/s1kqa-sqd05Error correction through language processing
https://resolver.caltech.edu/CaltechAUTHORS:20160915-122649767
Authors: Jiang, Anxiao (Andrew); Li, Yue; Bruck, Jehoshua
Year: 2015
DOI: 10.1109/ITW.2015.7133145
There are two fundamental approaches for error correction. One approach is to add external redundancy to data. The other approach is to use the redundancy inside data, even if it is only the residual redundancy after a data compression algorithm. The first approach, namely error-correcting codes (ECCs), has been studied actively over the past seventy years. In this work, we explore the second approach, and show that it can substantially enhance the error-correction performance. This work focuses on error correction of texts in English as a case study. It proposes a scheme that combines language-based decoding with ECC decoding. Both analysis and experimental results are presented. The scheme can be extended to content-based decoding for more types of data with rich structures.https://authors.library.caltech.edu/records/me5vr-pzd35Rewriting Flash Memories by Message Passing
https://resolver.caltech.edu/CaltechAUTHORS:20151012-142447290
Authors: En Gad, Eyal; Huang, Wentao; Li, Yue; Bruck, Jehoshua
Year: 2015
DOI: 10.1109/ISIT.2015.7282534
This paper constructs WOM codes that combine rewriting and error correction for mitigating the reliability and endurance problems in flash memory. We consider a rewriting model of practical interest to flash applications, in which only the second write uses WOM codes. Our WOM code construction is based on binary erasure quantization with LDGM codes, where the rewriting uses message passing and has the potential to share efficient hardware implementations with LDPC codes in practice. We show that the coding scheme achieves the capacity of the rewriting model. Extensive simulations show that the rewriting performance of our scheme compares favorably with that of polar WOM codes in the rate region where high rewriting success probability is desired. We further augment our coding schemes with error-correction capability. By drawing a connection to the conjugate code pairs studied in the context of quantum error correction, we develop a general framework for constructing error-correcting WOM codes. Under this framework, we give an explicit construction of WOM codes whose codewords are contained in BCH codes.https://authors.library.caltech.edu/records/6vsed-x7y38A Stochastic Model for Genomic Interspersed Duplication
https://resolver.caltech.edu/CaltechAUTHORS:20151012-143650853
Authors: Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2015
DOI: 10.1109/ISIT.2015.7282586
Mutation processes such as point mutation, insertion, deletion, and duplication (including tandem and interspersed duplication) have an important role in evolution, as they lead to genomic diversity, and thus to phenotypic variation. In this work, we study the expressive power of interspersed duplication, i.e., its ability to generate diversity, via a simple but fundamental stochastic model, where the length and the location of the substring that is duplicated and the point of insertion of the copy are chosen randomly. We investigate the properties of the set of high-probability sequences in these stochastic systems. In particular we provide results regarding the asymptotic behavior of frequencies of symbols and strings in a sequence evolving through interspersed duplication. The study of such systems is an important step towards the design and analysis of more realistic and sophisticated models of genomic mutation processes.https://authors.library.caltech.edu/records/gwm2s-gj497Reliability and Hardware Implementation of Rank Modulation Flash Memory
https://resolver.caltech.edu/CaltechAUTHORS:20160909-113811678
Authors: Ma, Yanjun; Li, Yue; Kan, Edwin Chihchuan; Bruck, Jehoshua
Year: 2015
DOI: 10.1109/NVMTS.2015.7457493
We review a novel data representation scheme for NAND flash memory named rank modulation (RM) and discuss its hardware implementation. We show that under normal threshold voltage (Vth) variations, RM has an intrinsic read-reliability advantage over conventional multi-level cells. Test results demonstrating superior reliability on commercial flash chips are reviewed and discussed. We then present a read method based on relative sensing time, which obtains the ranks of all cells in a group in one read cycle. The improvements in reliability and read speed enable program-and-verify times in RM similar to those of conventional MLC flash.https://authors.library.caltech.edu/records/dvrcy-z2n42Error Characterization and Mitigation for 16nm MLC NAND Flash Memory under Total Ionizing Dose Effect
https://resolver.caltech.edu/CaltechAUTHORS:20161004-114313111
Authors: Li, Yue; Sheldon, Douglas J.; Ramos, Andre S.; Bruck, Jehoshua
Year: 2016
DOI: 10.1109/IRPS.2016.7574638
This paper studies the system-level reliability of 16nm MLC NAND flash memories under the total ionizing dose (TID) effect. Errors that occur in parts under the TID effect are characterized at multiple levels. Results show that faithful data recovery only lasts until 9k rad. Data errors observed in irradiated flash samples are strongly asymmetric. To improve the reliability of the parts, we study error mitigation methods that consider the specific properties of TID errors. First, we implement a novel data representation scheme that stores data using the relative order of cell voltages. This representation is more robust against uniform asymmetric threshold voltage shifts of floating gates. Experimental results show that the scheme reduces errors by at least 50% for blocks with fewer than 3k program/erase cycles (PECs) and 10k rad. Second, we conduct empirical evaluations of memory scrubbing schemes. Based on the results, we identify a scheme that refreshes cells without block erasure. Evaluation results show that parts under this scrubbing scheme survive up to 8k PECs and a 57k rad total dose.https://authors.library.caltech.edu/records/yzxhn-5qe35Data archiving in 1x-nm NAND flash memories: Enabling long-term storage using rank modulation and scrubbing
https://resolver.caltech.edu/CaltechAUTHORS:20161003-155557445
Authors: Li, Yue; Gad, Eyal En; Jiang, Anxiao (Andrew); Bruck, Jehoshua
Year: 2016
DOI: 10.1109/IRPS.2016.7574572
The challenge of using inexpensive and high-density NAND flash for archival storage was posed recently as a way to reduce data center costs. However, such flash memory is becoming more susceptible to noise, and its reliability issues have become the major concern for its adoption in long-term storage systems. This paper studies the system-level reliability of archival storage that uses 1x-nm NAND flash memory. We analyze retention error behavior and show that 1x-nm MLC and TLC flash do not immediately qualify for long-term storage. We then implement the rank modulation (RM) scheme and memory scrubbing (MS) for retention period (RP) enhancement. The RM scheme provides a new data representation based on the relative order of cell voltages, which offers higher reliability against uniform asymmetric threshold voltage shifts due to charge leakage. Results show that the new representation reduces the raw bit error rate (RBER) by 45% on average, and that using RM and MS together provides RPs of up to 196, 171, 146, and 121 years for blocks with 0, 25, 50, and 75 program/erase cycles, respectively.https://authors.library.caltech.edu/records/0w2v4-pqq54The capacity of some Pólya string models
https://resolver.caltech.edu/CaltechAUTHORS:20160824-102815029
Authors: Elishco, Ohad; Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2016
DOI: 10.1109/ISIT.2016.7541303
We study random string-duplication systems, called Pólya string models, motivated by certain random mutation processes in the genome of living organisms. Unlike previous works that study the combinatorial capacity of string-duplication systems, or peripheral properties such as symbol frequency, this work provides exact capacity or bounds on it, for several probabilistic models. In particular, we give the exact capacity of the random tandem-duplication system, and the end-duplication system, and bound the capacity of the complement tandem-duplication system. Interesting connections are drawn between the former and the beta distribution common to population genetics, as well as between the latter system and signatures of random permutations.https://authors.library.caltech.edu/records/fqgw6-9mh41Secure RAID Schemes for Distributed Storage
https://resolver.caltech.edu/CaltechAUTHORS:20160823-165433889
Authors: Huang, Wentao; Bruck, Jehoshua
Year: 2016
DOI: 10.1109/ISIT.2016.7541529
We propose secure RAID, i.e., low-complexity schemes to store information in a distributed manner that is resilient to node failures and resistant to node eavesdropping. We generalize the concept of systematic encoding to secure RAID and show that systematic schemes have significant advantages in the efficiency of encoding, decoding, and random access. For the practical high-rate regime, we construct three XOR-based systematic secure RAID schemes with optimal encoding and decoding complexities from the EVENODD codes and B codes, which are array codes widely used in the RAID architecture. These schemes optimally tolerate two node failures and two eavesdropping nodes. For more general parameters, we construct efficient systematic secure RAID schemes from Reed-Solomon codes. Our results suggest that building "keyless", information-theoretic security into the RAID architecture is practical.https://authors.library.caltech.edu/records/5q6h9-6cr80On the duplication distance of binary strings
https://resolver.caltech.edu/CaltechAUTHORS:20160824-101618060
Authors: Alon, Noga; Bruck, Jehoshua; Farnoud (Hassanzadeh), Farzad; Jain, Siddharth
Year: 2016
DOI: 10.1109/ISIT.2016.7541301
We study the tandem duplication distance between binary sequences and their roots. This distance is motivated by genomic tandem duplication mutations and counts the smallest number of tandem duplication events that are required to take one sequence to another. We consider both exact and approximate tandem duplications, the latter leading to a combined duplication/Hamming distance. The paper focuses on the maximum value of the duplication distance to the root. For exact duplication, denoting the maximum distance to the root of a sequence of length n by f(n), we prove that f(n) = Θ(n). For the case of approximate duplication, where a β-fraction of symbols may be duplicated incorrectly, we show using the Plotkin bound that the maximum distance has a sharp transition from linear to logarithmic in n at β = 1/2.https://authors.library.caltech.edu/records/hzp2x-rjp69Duplication-Correcting Codes for Data Storage in the DNA of Living Organisms
https://resolver.caltech.edu/CaltechAUTHORS:20160823-165024070
Authors: Jain, Siddharth; Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2016
DOI: 10.1109/ISIT.2016.7541455
The ability to store data in the DNA of a living organism has applications in a variety of areas including synthetic biology and watermarking of patented genetically-modified organisms. Data stored in this medium is subject to errors arising from various mutations, such as point mutations, indels, and tandem duplication, which need to be corrected to maintain data integrity. In this paper, we provide error-correcting codes for errors caused by tandem duplications, which create a copy of a block of the sequence and insert it in a tandem manner, i.e., next to the original. In particular, we present a family of codes for correcting errors due to tandem-duplications of a fixed length and any number of errors. We also study codes for correcting tandem duplications of length up to a given constant k, where we are primarily focused on the cases of k = 2, 3.https://authors.library.caltech.edu/records/q6hxk-ejv90Correcting errors by natural redundancy
https://resolver.caltech.edu/CaltechAUTHORS:20170907-081956775
Authors: Jiang, Anxiao (Andrew); Upadhyaya, Pulakesh; Haratsch, Erich F.; Bruck, Jehoshua
Year: 2017
DOI: 10.1109/ITA.2017.8023455
For the storage of big data, there are significant challenges with its long-term reliability. This paper studies how to use the natural redundancy in data for error correction, and how to combine it with error-correcting codes to effectively improve data reliability. It explores several aspects of natural redundancy, including the discovery of natural redundancy in compressed data, the efficient decoding of codes with random structures, the capacity of error-correcting codes that contain natural redundancy, and the time-complexity tradeoff between source coding and channel coding.https://authors.library.caltech.edu/records/bm7m7-z9t46Secure RAID schemes from EVENODD and STAR codes
https://resolver.caltech.edu/CaltechAUTHORS:20170816-162125720
Authors: Huang, Wentao; Bruck, Jehoshua
Year: 2017
DOI: 10.1109/ISIT.2017.8006600
We study secure RAID, i.e., low-complexity schemes to store information in a distributed manner that is resilient to node failures and resistant to node eavesdropping. We describe a technique to shorten the secure EVENODD scheme in [6], which can optimally tolerate 2 node failures and 2 eavesdropping nodes. The shortening technique allows us to obtain secure EVENODD schemes of arbitrary lengths, which is important for practical applications. We also construct a new secure RAID scheme from the STAR code. The scheme can tolerate 3 node failures and 3 eavesdropping nodes with optimal encoding/decoding and random-access complexity.https://authors.library.caltech.edu/records/62pqt-f5y63Secret sharing with optimal decoding and repair bandwidth
https://resolver.caltech.edu/CaltechAUTHORS:20170816-153318334
Authors: Huang, Wentao; Bruck, Jehoshua
Year: 2017
DOI: 10.1109/ISIT.2017.8006842
This paper studies the communication efficiency of threshold secret sharing schemes. We construct a family of Shamir's schemes with asymptotically optimal decoding bandwidth for arbitrary parameters. We also construct a family of secret sharing schemes with both optimal decoding and optimal repair bandwidth for arbitrary parameters. The construction leads to a family of regenerating codes allowing centralized repair of multiple node failures with small sub-packetization.https://authors.library.caltech.edu/records/6v3tf-gx561Noise and uncertainty in string-duplication systems
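The threshold schemes in the abstract above build on Shamir's classic construction. As a hedged illustration (a minimal sketch of plain Shamir sharing only, not the bandwidth-optimal construction of the paper; the field modulus and parameter names are our own choices):

```python
import random

P = 2**61 - 1  # a Mersenne prime; the finite field GF(P) for the shares

def make_shares(secret, k, n):
    """Split `secret` into n shares so that any k of them reconstruct it:
    evaluate a random degree-(k-1) polynomial with constant term `secret`."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(P)."""
    secret = 0
    for xj, yj in shares:
        num = den = 1
        for xm, _ in shares:
            if xm != xj:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem)
        secret = (secret + yj * num * pow(den, P - 2, P)) % P
    return secret
```

Any k or more shares recover the secret exactly; the paper's contribution concerns how little each share must communicate during decoding and repair, which this sketch does not attempt to model.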
https://resolver.caltech.edu/CaltechAUTHORS:20170816-165117076
Authors: Jain, Siddharth; Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2017
DOI: 10.1109/ISIT.2017.8007104
Duplication mutations play a critical role in the generation of biological sequences. Simultaneously, they have a deleterious effect on data stored using in-vivo DNA data storage. While duplications have been studied both as a sequence-generation mechanism and in the context of error correction, for simplicity these studies have not taken into account the presence of other types of mutations. In this work, we consider the capacity of duplication mutations in the presence of point-mutation noise, and so quantify the generation power of these mutations. We show that if the number of point mutations is vanishingly small compared to the number of duplication mutations of a constant length, the generation capacity of these mutations is zero. However, if the number of point mutations increases to a constant fraction of the number of duplications, then the capacity is nonzero. Lower and upper bounds for this capacity are also presented. Another problem that we study is concerned with the mismatch between code design and channel in data storage in the DNA of living organisms with respect to duplication mutations. In this context, we consider the uncertainty of such a mismatched coding scheme measured as the maximum number of input codewords that can lead to the same output.https://authors.library.caltech.edu/records/vqckw-85z25Stopping Set Elimination for LDPC Codes
https://resolver.caltech.edu/CaltechAUTHORS:20180125-132316726
Authors: Jiang, Anxiao (Andrew); Upadhyaya, Pulakesh; Wang, Ying; Narayanan, Krishna R.; Zhou, Hongchao; Sima, Jin; Bruck, Jehoshua
Year: 2017
DOI: 10.1109/ALLERTON.2017.8262806
This work studies the Stopping-Set Elimination Problem: given a stopping set, remove the fewest erasures so that the remaining erasures can be decoded by belief propagation in k iterations (including k = ∞). The NP-hardness of the problem is proven, an approximation algorithm is presented for k = 1, and efficient exact algorithms are presented for general k when the stopping sets form trees.https://authors.library.caltech.edu/records/vnf04-m3829Stash in a Flash
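Over the erasure channel, the belief-propagation decoding that the stopping-set problem refers to reduces to iterative peeling: any check equation with exactly one erased neighbor recovers that neighbor. A minimal sketch (the toy check-node lists below are our own example, not from the paper):

```python
def peel(checks, erased, max_iters):
    """Iterative BP over the erasure channel: each round, every check with
    exactly one erased variable recovers it. Returns the erasures that
    remain after max_iters rounds (non-empty iff peeling got stuck)."""
    erased = set(erased)
    for _ in range(max_iters):
        recovered = set()
        for vars_of_check in checks:
            hit = set(vars_of_check) & erased
            if len(hit) == 1:
                recovered |= hit
        if not recovered:
            break  # either done, or stuck on a stopping set
        erased -= recovered
    return erased
```

On the toy graph `[[0, 1], [1, 2], [0, 2]]`, the erasure set {0, 1, 2} is a stopping set (every check sees two erasures, so peeling is stuck), while removing the single erasure 0 lets one round of peeling clear the rest, which is exactly the flavor of elimination the abstract describes.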
https://resolver.caltech.edu/CaltechAUTHORS:20180308-133517936
Authors: Zuck, Aviad; Li, Yue; Bruck, Jehoshua; Porter, Donald E.; Tsafrir, Dan
Year: 2018
Encryption is a useful tool to protect data confidentiality. Yet it is still challenging to hide the very presence of encrypted, secret data from a powerful adversary. This paper presents a new technique to hide data in flash by manipulating the voltage levels of pseudo-randomly selected flash cells to encode two bits (rather than one) per cell. In this model, one "public" bit is interpreted using an SLC-style encoding, and a private bit is extracted using an MLC-style encoding. The locations of the cells that encode hidden data are based on a secret key known only to the hiding user.
Intuitively, this technique requires that (1) the voltage level of a cell encoding hidden data be statistically indistinguishable from that of a cell storing only public data, and (2) the user be able to reliably read the hidden data from that cell. Our key insight is that the variation in the range of voltage levels in a typical flash device is wide enough both to obscure the presence of fine-grained changes to a small fraction of the cells and to support reliably re-reading hidden data. We demonstrate that our hidden data and the underlying voltage manipulations go undetected by supervised learning based on support vector machines, which performs similarly to a random guess. The error rates of our scheme are low enough that the data is recoverable months after being stored. Compared to prior work, our technique provides 24x and 50x higher encoding and decoding throughput, respectively, and doubles the capacity, while being 37x more power efficient.https://authors.library.caltech.edu/records/msxyd-8x553Two Deletion Correcting Codes from Indicator Vectors
https://resolver.caltech.edu/CaltechAUTHORS:20180709-103747373
Authors: Sima, Jin; Raviv, Netanel; Bruck, Jehoshua
Year: 2018
DOI: 10.1109/ISIT.2018.8437868
Construction of capacity-achieving deletion correcting codes has been a baffling challenge for decades. A recent breakthrough by Brakensiek et al., alongside novel applications in DNA storage, has reignited the interest in this longstanding open problem. In spite of recent advances, the amount of redundancy in existing codes is still orders of magnitude away from being optimal. In this paper, a novel approach for constructing binary two-deletion correcting codes is proposed. By this approach, parity symbols are computed from indicator vectors (i.e., vectors that indicate the positions of certain patterns) of the encoded message, rather than from the message itself. Most interestingly, the parity symbols and the proof of correctness are a direct generalization of their counterparts in the Varshamov-Tenengolts construction. Our techniques require 7 log(n) + o(log(n)) redundant bits to encode an n-bit message, which is near-optimal.https://authors.library.caltech.edu/records/g7qc8-w6s47Stash in a Flash
https://resolver.caltech.edu/CaltechAUTHORS:20190328-165908772
Authors: Zuck, Aviad; Li, Yue; Bruck, Jehoshua; Porter, Donald E.; Tsafrir, Dan
Year: 2018
DOI: 10.1145/3211890.3211906
[no abstract]https://authors.library.caltech.edu/records/56bqy-q0610How to Best Share a Big Secret
https://resolver.caltech.edu/CaltechAUTHORS:20180828-142513016
Authors: Shor, Roman; Yadgar, Gala; Huang, Wentao; Yaakobi, Eitan; Bruck, Jehoshua
Year: 2018
DOI: 10.1145/3211890.3211896
When sensitive data is stored in the cloud, the only way to ensure its secrecy is by encrypting it before it is uploaded. The emerging multi-cloud model, in which data is stored redundantly in two or more independent clouds, provides an opportunity to protect sensitive data with secret-sharing schemes. Both data-protection approaches are considered computationally expensive, but recent advances reduce their costs considerably: (1) Hardware acceleration methods promise to eliminate the computational complexity of encryption, but leave clients with the challenge of securely managing encryption keys. (2) Secure RAID, a recently proposed scheme, minimizes the computational overheads of secret sharing, but requires non-negligible storage overhead and random data generation. Each data-protection approach offers different tradeoffs and security guarantees. However, when comparing them, it is difficult to determine which approach will provide the best application-perceived performance, because previous studies were performed before their recent advances were introduced.
To bridge this gap, we present the first end-to-end comparison of state-of-the-art encryption-based and secret sharing data protection approaches. Our evaluation on a local cluster and on a multi-cloud prototype identifies the tipping point at which the bottleneck of data protection shifts from the computational overhead of encoding and random data generation to storage and network bandwidth and global availability.https://authors.library.caltech.edu/records/q4798-q1h04Attaining the 2nd Chargaff Rule by Tandem Duplications
https://resolver.caltech.edu/CaltechAUTHORS:20181126-143839274
Authors: Jain, Siddharth; Raviv, Netanel; Bruck, Jehoshua
Year: 2018
DOI: 10.1109/ISIT.2018.8437526
Erwin Chargaff in 1950 made the experimental observation that in DNA the count of A equals the count of T and the count of C equals the count of G. This observation played a crucial role in the discovery of the double-stranded helix structure by Watson and Crick. However, this symmetry was also observed in single-stranded DNA, a phenomenon termed the 2nd Chargaff Rule. The symmetry has been verified experimentally in the genomes of several different species, not only for mononucleotides but also for reverse-complement pairs of larger lengths, up to a small error. While the symmetry in double-stranded DNA is related to base pairing and replication mechanisms, the function and source of the symmetry in single-stranded DNA remain a mystery. In this work, we define a sequence-generation model based on reverse-complement tandem duplications. We show that this model generates sequences that satisfy the 2nd Chargaff Rule even when the duplication lengths are very small compared to the length of the sequences. We also provide estimates on the number of generations this model needs to generate sequences that satisfy the 2nd Chargaff Rule, and we provide theoretical bounds on the disruption in symmetry for different duplication lengths under this model. Moreover, we experimentally compare the disruption in symmetry incurred by our model with what is observed in human genome data.https://authors.library.caltech.edu/records/fa2gc-evf17Optimal k-Deletion Correcting Codes
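The 2nd Chargaff symmetry described in the abstract above is simple to measure empirically. The sketch below is an illustrative aside (function names and the normalization are our own choices, not the paper's estimator): it computes the relative gap between k-mer counts and their reverse-complement counts, and performs one reverse-complement tandem-duplication step of the kind the generation model uses.

```python
import random
from collections import Counter

COMP = {"A": "T", "T": "A", "C": "G", "G": "C"}

def revcomp(s):
    """Reverse complement of a DNA string."""
    return "".join(COMP[c] for c in reversed(s))

def chargaff2_deviation(seq, k):
    """Relative L1 gap between k-mer counts and reverse-complement k-mer
    counts: 0 means perfect 2nd-Chargaff symmetry, 1 means fully asymmetric."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = set(counts) | {revcomp(w) for w in counts}
    gap = sum(abs(counts[w] - counts[revcomp(w)]) for w in words)
    return gap / (2 * sum(counts.values()))

def rc_tandem_step(seq, dup_len):
    """One generation step: copy a random block as its reverse complement,
    inserted right next to the original (a reverse-complement tandem duplication)."""
    i = random.randrange(len(seq) - dup_len + 1)
    block = seq[i:i + dup_len]
    return seq[:i + dup_len] + revcomp(block) + seq[i + dup_len:]
```

Iterating `rc_tandem_step` from a short seed and tracking `chargaff2_deviation` is one way to observe the convergence toward symmetry that the paper analyzes.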
https://resolver.caltech.edu/CaltechAUTHORS:20190826-143512243
Authors: Sima, Jin; Bruck, Jehoshua
Year: 2019
DOI: 10.1109/ISIT.2019.8849750
Levenshtein introduced the problem of constructing k-deletion correcting codes in 1966, proved that the optimal redundancy of such codes is O(k log n), and proposed a single-deletion correcting code with optimal redundancy (using the so-called VT construction). However, the problem of constructing optimal-redundancy k-deletion correcting codes remained open. Our key contribution is a solution to this longstanding open problem. We present a k-deletion correcting code that has redundancy 8k log n + o(log n) and encoding/decoding algorithms of complexity O(n^(2k+1)) for constant k.https://authors.library.caltech.edu/records/b8mh8-1ja32On Coding Over Sliced Information
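The VT construction named in the abstract above admits a compact single-deletion decoder, which is worth seeing alongside Levenshtein's result. A minimal sketch (our own illustrative implementation of the standard VT_a(n) decoding rule, not the paper's k-deletion code):

```python
def vt_decode(y, n, a=0):
    """Recover the VT_a(n) codeword x (satisfying sum of i*x_i = a mod n+1,
    positions 1-indexed) from y, a copy of x with exactly one bit deleted."""
    m = n + 1
    w = sum(y)                                         # weight of received word
    s = sum((i + 1) * b for i, b in enumerate(y)) % m  # syndrome of received word
    deficiency = (a - s) % m
    if deficiency <= w:
        # Deleted bit was 0: reinsert it with `deficiency` ones to its right.
        bit, idx, suffix_ones = 0, 0, w
        while suffix_ones != deficiency:
            suffix_ones -= y[idx]
            idx += 1
    else:
        # Deleted bit was 1: reinsert it with `deficiency - w - 1` zeros to its left.
        bit, idx, zeros = 1, 0, 0
        while zeros != deficiency - w - 1:
            zeros += 1 - y[idx]
            idx += 1
    return y[:idx] + [bit] + y[idx:]
```

An exhaustive check over all VT_0(10) codewords and all deletion positions confirms that a single deletion is always corrected; the open problem the paper solves is achieving comparable (order-optimal) redundancy for k simultaneous deletions.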
https://resolver.caltech.edu/CaltechAUTHORS:20191004-100333511
Authors: Sima, Jin; Raviv, Netanel; Bruck, Jehoshua
Year: 2019
DOI: 10.1109/isit.2019.8849596
The interest in channel models in which the data is sent as an unordered set of binary strings has increased lately, due to emerging applications in DNA storage, among others. In this paper we analyze the minimal redundancy of binary codes for this channel under substitution errors, and provide a code construction for a single substitution that is shown to be asymptotically optimal up to constants. The surprising result in this paper is that while the information vector is sliced into a set of unordered strings, the amount of redundant bits that are required to correct errors is orderwise equivalent to the amount required in the classical error correcting paradigm.https://authors.library.caltech.edu/records/7mswn-cjs38Download and Access Trade-offs in Lagrange Coded Computing
https://resolver.caltech.edu/CaltechAUTHORS:20191004-100332096
Authors: Raviv, Netanel; Yu, Qian; Bruck, Jehoshua; Avestimehr, Salman
Year: 2019
DOI: 10.1109/isit.2019.8849547
Lagrange Coded Computing (LCC) is a recently proposed technique for resilient, secure, and private computation of arbitrary polynomials in distributed environments. By mapping such computations to compositions of polynomials, LCC allows the master node to complete the computation by accessing a minimal number of workers and downloading all of their content, thus providing resiliency to the remaining stragglers. However, in the most common case, in which the number of stragglers is less than in the worst-case scenario, much of the computational power of the system remains unexploited. To address this issue, in this paper we expand LCC by studying a fundamental trade-off between download and access, and present two contributions. In the first contribution, it is shown that without any modification to the encoding process, the master can decode the computations by accessing a larger number of nodes while downloading less information from each node in comparison with LCC (i.e., trading access for download). This scheme relies on decoding a particular polynomial in the ideal that is generated by the polynomials of interest, a technique we call Ideal Decoding. This new scheme also improves LCC in the sense that for systems with adversaries, the overall downloaded bandwidth is smaller than in LCC. In the second contribution we study a real-time model of this trade-off, in which the data from the workers is downloaded sequentially. By clustering nodes of similar delays and encoding the function with Universally Decodable Matrices, the master can decode once sufficient data is downloaded from every cluster, regardless of the internal delays within that cluster. This allows the master to utilize the partial work done by stragglers rather than ignore it, a feature that most past works in coded computing lack.https://authors.library.caltech.edu/records/8grfh-99r86Correcting Deletions in Multiple-Heads Racetrack Memories
https://resolver.caltech.edu/CaltechAUTHORS:20191004-100332823
Authors: Sima, Jin; Bruck, Jehoshua
Year: 2019
DOI: 10.1109/isit.2019.8849783
One of the main challenges in developing racetrack memory systems is the limited precision in controlling the track shifts, which in turn affects the reliability of reading and writing the data. The current proposal for combating deletions in racetrack memories is to use redundant heads per track, resulting in multiple (potentially erroneous) copies, and solving a specialized version of a sequence reconstruction problem. Using this approach, k-deletion correcting codes of length n, with d heads per track, with redundancy log log n + 4 were constructed. However, the code construction requires that k ≤ d. For k > d, the best known construction improves only slightly over the classic one-head deletion code. Here we address the question: what is the best redundancy that can be achieved for a k-deletion code (k a constant) if the number of heads is fixed at d (due to area limitations)? Our key result is an answer to this question: we construct codes that can correct k deletions for any k beyond the known limit of d. The code has O(k^4 d log log n) redundancy when k ≤ 2d − 1. In addition, when k ≥ 2d, the code has 2⌊k/d⌋ log n + o(log n) redundancy.https://authors.library.caltech.edu/records/ewcv1-ny846Iterative Programming of Noisy Memory Cells
https://resolver.caltech.edu/CaltechAUTHORS:20191004-104451577
Authors: Horovitz, Michal; Yaakobi, Eitan; Gad, Eyal En; Bruck, Jehoshua
Year: 2019
DOI: 10.1109/ITW44776.2019.8989404
In this paper, we study a model, first presented by Bunte and Lapidoth, that mimics the programming operation of memory cells. Under this paradigm we assume that cells are programmed sequentially and individually. The programming process is modeled as transmission over a channel, where it is possible to read the cell state in order to determine its programming success and, in case of programming failure, to reprogram the cell. Reprogramming a cell can reduce the bit error rate, but this comes at the price of increasing the overall programming time and thereby affecting the writing speed of the memory. An iterative programming scheme is an algorithm that specifies the number of attempts to program each cell. Given the programming channel and constraints on the average and maximum number of attempts to program a cell, we study programming schemes that maximize the number of bits that can be reliably stored in the memory. We extend the results of Bunte and Lapidoth and study this problem when the programming channel is the BSC, BEC, or Z channel. For the BSC and the BEC our analysis is also extended to the case where the error probabilities on consecutive writes are not necessarily the same. Lastly, we also study a related model motivated by the synthesis process of DNA molecules.https://authors.library.caltech.edu/records/gj67s-2vh47Improve Robustness of Deep Neural Networks by Coding
https://resolver.caltech.edu/CaltechAUTHORS:20201209-153308085
Authors: Huang, Kunping; Raviv, Netanel; Jain, Siddharth; Upadhyaya, Pulakesh; Bruck, Jehoshua; Siegel, Paul H.; Jiang, Anxiao (Andrew)
Year: 2020
DOI: 10.1109/ita50056.2020.9244998
Deep neural networks (DNNs) typically have many weights. When errors appear in their weights, which are usually stored in non-volatile memories, their performance can degrade significantly. We review two recently presented approaches that improve the robustness of DNNs in complementary ways. In the first approach, we use error-correcting codes as external redundancy to protect the weights from errors. A deep reinforcement learning algorithm is used to optimize the redundancy-performance tradeoff. In the second approach, internal redundancy is added to neurons via coding. It enables neurons to perform robust inference in noisy environments.https://authors.library.caltech.edu/records/jqw53-1hh11What is the Value of Data? on Mathematical Methods for Data Quality Estimation
https://resolver.caltech.edu/CaltechAUTHORS:20200831-142053055
Authors: Raviv, Netanel; Jain, Siddharth; Bruck, Jehoshua
Year: 2020
DOI: 10.1109/isit44484.2020.9174311
Data is one of the most important assets of the information age, and its societal impact is undisputed. Yet, rigorous methods of assessing the quality of data are lacking. In this paper, we propose a formal definition for the quality of a given dataset. We assess a dataset's quality by a quantity we call the expected diameter, which measures the expected disagreement between two randomly chosen hypotheses that explain it, and has recently found applications in active learning. We focus on Boolean hyperplanes, and utilize a collection of Fourier analytic, algebraic, and probabilistic methods to come up with theoretical guarantees and practical solutions for the computation of the expected diameter. We also study the behaviour of the expected diameter on algebraically structured datasets, conduct experiments that validate this notion of quality, and demonstrate the feasibility of our techniques.https://authors.library.caltech.edu/records/hjtbp-gxn16Syndrome Compression for Optimal Redundancy Codes
https://resolver.caltech.edu/CaltechAUTHORS:20200831-142617575
Authors: Sima, Jin; Gabrys, Ryan; Bruck, Jehoshua
Year: 2020
DOI: 10.1109/isit44484.2020.9174009
We introduce a general technique that we call syndrome compression for designing low-redundancy error correcting codes. The technique allows us to boost the redundancy efficiency of hash/labeling-based codes by further compressing the labeling. We apply syndrome compression to different types of adversarial deletion channels and present code constructions that correct up to a constant number of errors. Our code constructions achieve a redundancy of twice the Gilbert-Varshamov bound, improving upon the state of the art for these channels. The encoding/decoding complexity of our constructions is of order equal to the size of the corresponding deletion balls, namely, it is polynomial in the code length.https://authors.library.caltech.edu/records/hs589-fp998Robust Indexing - Optimal Codes for DNA Storage
https://resolver.caltech.edu/CaltechAUTHORS:20200831-134827466
Authors: Sima, Jin; Raviv, Netanel; Bruck, Jehoshua
Year: 2020
DOI: 10.1109/isit44484.2020.9174447
The channel model of encoding data as a set of unordered strings is receiving great attention as it captures the basic features of DNA storage systems. However, the challenge of constructing optimal redundancy codes for this channel remained elusive. In this paper, we solve this open problem and present an order-wise optimal construction of codes that correct multiple substitution errors for this channel model. The key ingredient in the code construction is a technique we call robust indexing: instead of using fixed indices to create order in unordered strings, we use indices that are information dependent and thus eliminate unnecessary redundancy. In addition, our robust indexing technique can be applied to the construction of optimal deletion/insertion codes for this channel.
https://authors.library.caltech.edu/records/wxyrd-yh839
Optimal Systematic t-Deletion Correcting Codes
https://resolver.caltech.edu/CaltechAUTHORS:20200831-144630883
Authors: Sima, Jin; Gabrys, Ryan; Bruck, Jehoshua
Year: 2020
DOI: 10.1109/isit44484.2020.9173986
Systematic deletion correcting codes play an important role in applications such as document exchange. Yet despite a series of recent advances in deletion correcting codes, most constructions are non-systematic. To the best of the authors' knowledge, the only known deterministic systematic t-deletion correcting code constructions with rate approaching 1 achieve O(t log² n) bits of redundancy for constant t, where n is the code length. In this paper, we propose a systematic t-deletion correcting code construction that achieves 4t log n + o(log n) bits of redundancy, which is asymptotically within a factor of 4 of optimal. Our encoding and decoding algorithms have complexity O(n^(2t+1)), which is polynomial for constant t.
https://authors.library.caltech.edu/records/5v2ns-fsm86
Optimal Codes for the q-ary Deletion Channel
https://resolver.caltech.edu/CaltechAUTHORS:20200831-150933262
Authors: Sima, Jin; Gabrys, Ryan; Bruck, Jehoshua
Year: 2020
DOI: 10.1109/isit44484.2020.9174241
The problem of constructing optimal multiple-deletion correcting codes remained open until a recent breakthrough in the binary case. Comparatively less progress has been made in the non-binary setting, where the only rate-one non-binary deletion code is Tenengolts' construction, which corrects a single deletion. In this paper, we present several q-ary t-deletion correcting codes of length n that achieve optimal redundancy up to a constant factor, depending on the alphabet size q. For small q, our constructions have O(n^(2t) q^t) encoding/decoding complexity. For large q, we take a different approach and the construction has polynomial time complexity.
https://authors.library.caltech.edu/records/df556-wb795
Coding for Optimized Writing Rate in DNA Storage
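For concreteness, the classic binary Varshamov-Tenengolts (VT) code that Tenengolts' q-ary construction generalizes shows how single-deletion correction works. The brute-force decoder below is a minimal sketch (a direct O(n) reinsertion rule also exists, which this does not implement).

```python
def vt_syndrome(x):
    # VT checksum: sum of i * x_i over 1-indexed positions, mod (n + 1)
    n = len(x)
    return sum(i * xi for i, xi in enumerate(x, start=1)) % (n + 1)

def vt_decode(y, n, a=0):
    # Recover the unique codeword of VT_a(n) from y, a received word
    # with exactly one deletion, by trying every single reinsertion.
    candidates = set()
    for pos in range(n):
        for bit in (0, 1):
            x = y[:pos] + (bit,) + y[pos:]
            if vt_syndrome(x) == a:
                candidates.add(x)
    assert len(candidates) == 1  # deletion balls of codewords are disjoint
    return candidates.pop()
```

For example, x = (0, 1, 0, 0, 1, 0) lies in VT_0(6) since 2 + 5 ≡ 0 (mod 7), and deleting any single bit leaves a word from which `vt_decode` recovers x.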
https://resolver.caltech.edu/CaltechAUTHORS:20200511-090541146
Authors: Jain, Siddharth; Farnoud (Hassanzadeh), Farzad; Schwartz, Moshe; Bruck, Jehoshua
Year: 2020
DOI: 10.1109/ISIT44484.2020.9174253
A method for encoding information in DNA sequences is described. The method is based on the precision-resolution framework, and is designed to work in conjunction with a recently suggested terminator-free, template-independent DNA synthesis method. The suggested method optimizes the amount of information bits per synthesis time unit, namely, the writing rate. Additionally, the encoding scheme studied here takes into account the existence of multiple copies of the DNA sequence, which are independently distorted. Finally, quantizers for various run-length distributions are designed.
https://authors.library.caltech.edu/records/vsvcw-7sn61
CodNN – Robust Neural Networks From Coded Classification
https://resolver.caltech.edu/CaltechAUTHORS:20200427-091804171
Authors: Raviv, Netanel; Jain, Siddharth; Upadhyaya, Pulakesh; Bruck, Jehoshua; Jiang, Anxiao (Andrew)
Year: 2020
DOI: 10.1109/ISIT44484.2020.9174480
Deep Neural Networks (DNNs) are a revolutionary force in the ongoing information revolution, and yet their intrinsic properties remain a mystery. In particular, it is widely known that DNNs are highly sensitive to noise, whether adversarial or random. This poses a fundamental challenge for hardware implementations of DNNs, and for their deployment in critical applications such as autonomous driving. In this paper we construct robust DNNs via error correcting codes. By our approach, either the data or internal layers of the DNN are coded with error correcting codes, and successful computation under noise is guaranteed. Since DNNs can be seen as a layered concatenation of classification tasks, our research begins with the core task of classifying noisy coded inputs, and progresses towards robust DNNs. We focus on binary data and linear codes. Our main result is that the prevalent parity code can guarantee robustness for a large family of DNNs, which includes the recently popularized binarized neural networks. Further, we show that the coded classification problem has a deep connection to Fourier analysis of Boolean functions. In contrast to existing solutions in the literature, our results do not rely on altering the training process of the DNN, and provide mathematically rigorous guarantees rather than experimental evidence.
https://authors.library.caltech.edu/records/psvjm-vmv70
Trace Reconstruction with Bounded Edit Distance
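To fix ideas, the parity code mentioned in the abstract above is the simplest binary linear code. The sketch below shows only the code itself (even-parity encoding and a single-bit-flip check), not the paper's DNN construction.

```python
def parity_encode(bits):
    # even-parity encoding: append one bit so the total number of 1s is even
    return bits + [sum(bits) % 2]

def parity_check(word):
    # True iff the overall parity is even; any single bit flip makes it odd,
    # so a single error is always detected (though not located)
    return sum(word) % 2 == 0
```

The parity code detects one flipped bit at the cost of one redundant bit, which is why it is an attractive building block for low-overhead robustness.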
https://resolver.caltech.edu/CaltechAUTHORS:20211110-153719711
Authors: Sima, Jin; Bruck, Jehoshua
Year: 2021
DOI: 10.1109/isit45174.2021.9518244
The trace reconstruction problem studies the number of noisy samples needed to recover an unknown string x ∈ {0,1}^n with high probability, where the samples are independently obtained by passing x through a random deletion channel with deletion probability q. The problem has recently received significant attention due to its applications in DNA sequencing and DNA storage. Yet, there is still an exponential gap between the upper and lower bounds for the trace reconstruction problem. In this paper we study the trace reconstruction problem when x is confined to an edit distance ball of radius k, which is essentially equivalent to distinguishing two strings with edit distance at most k. It is shown that n^(O(k)) samples suffice to achieve this task with high probability.
https://authors.library.caltech.edu/records/2hnx9-xmk44
Synthesizing New Expertise via Collaboration
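The sampling model in the abstract above is straightforward to simulate; the helper names below are illustrative, not from the paper.

```python
import random

def deletion_channel(x, q, rng):
    # each symbol of x is deleted independently with probability q
    return [b for b in x if rng.random() >= q]

def traces(x, q, m, seed=0):
    # m independent traces of x through the deletion channel
    rng = random.Random(seed)
    return [deletion_channel(x, q, rng) for _ in range(m)]
```

With q = 0 every trace equals x, with q = 1 every trace is empty, and in between each trace is a random subsequence of x; trace reconstruction asks how many such subsequences are needed to pin down x.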
https://resolver.caltech.edu/CaltechAUTHORS:20211110-153150519
Authors: Mazaheri, Bijan; Jain, Siddharth; Bruck, Jehoshua
Year: 2021
DOI: 10.1109/isit45174.2021.9517822
Consider a set of classes and an uncertain input. Suppose we do not have access to data and only have knowledge of perfect experts between a few classes in the set. What constitutes a consistent set of opinions? How can we use this to predict the opinions of experts on missing sub-domains? In this paper, we define a framework to analyze this problem. In particular, we define an expert graph where vertices represent classes and edges represent binary experts on the topics of their vertices. We derive necessary conditions for an expert graph to be valid. Further, we show that these conditions are also sufficient if the graph is a cycle, which can yield unintuitive results. Using these conditions, we provide an algorithm to obtain upper and lower bounds on the weights of unknown edges in an expert graph.
https://authors.library.caltech.edu/records/458ja-2qy93
Neural Network Computations with DOMINATION Functions
https://resolver.caltech.edu/CaltechAUTHORS:20211110-155100881
Authors: Kilic, Kordag Mehmet; Bruck, Jehoshua
Year: 2021
DOI: 10.1109/isit45174.2021.9517872
We study a new representation of neural networks based on DOMINATION functions. Specifically, we show that a threshold function can be computed by its variables connected via an unweighted bipartite graph to a universal gate computing a DOMINATION function. The DOMINATION function consists of fixed weights that are ascending powers of 2. We derive circuit-size upper and lower bounds for circuits with small weights that compute DOMINATION functions. Interestingly, the circuit-size bounds are dependent on the sparsity of the bipartite graph. In particular, functions with sparsity 1 (like the EQUALITY function) can be implemented by small-size constant-weight circuits.
https://authors.library.caltech.edu/records/93r5d-6xs30
On Algebraic Constructions of Neural Networks with Small Weights
https://resolver.caltech.edu/CaltechAUTHORS:20220804-765672000
Authors: Kilic, Kordag Mehmet; Sima, Jin; Bruck, Jehoshua
Year: 2022
DOI: 10.1109/isit50566.2022.9834401
Neural gates compute functions based on weighted sums of the input variables. The expressive power of neural gates (the number of distinct functions they can compute) depends on the weight sizes and, in general, large weights (exponential in the number of inputs) are required. Studying the trade-offs among the weight sizes, circuit size and depth is a well-studied topic both in circuit complexity theory and in the practice of neural computation. We propose a new approach for studying these complexity trade-offs by considering a related algebraic framework. Specifically, given a single linear equation with arbitrary coefficients, we would like to express it using a system of linear equations with smaller (even constant) coefficients. The techniques we developed are based on Siegel's Lemma for the bounds, anti-concentration inequalities for the existential results, and extensions of Sylvester-type Hadamard matrices for the constructions. We explicitly construct a constant-weight, optimal-size matrix to compute the EQUALITY function (checking if two integers expressed in binary are equal). Computing EQUALITY with a single linear equation requires exponentially large weights. In addition, we prove the existence of the best-known weight size (linear) matrices to compute the COMPARISON function (comparing two integers expressed in binary). In the context of circuit complexity theory, our results improve the upper bounds on the weight sizes for the best-known circuit sizes for EQUALITY and COMPARISON.
https://authors.library.caltech.edu/records/39fbv-c0b37
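To illustrate the exponential-weight baseline mentioned in the abstract above (not the paper's small-weight construction): EQUALITY of two n-bit strings can be computed by a single linear equation whose coefficients are powers of 2.

```python
def equality_single_equation(x, y):
    # EQ(x, y) = 1 iff sum_i 2^i * (x_i - y_i) == 0, i.e. the two bit
    # strings encode the same integer; the weights 2^i grow exponentially
    # in the number of inputs, which is the cost the paper's system of
    # small-coefficient equations avoids
    return int(sum((2 ** i) * (xi - yi)
                   for i, (xi, yi) in enumerate(zip(x, y))) == 0)
```

The weights 2^i are forced here: with smaller coefficients a single linear form cannot separate all 2^n pairs, which is why trading one exponential-weight equation for a system of constant-weight equations is the interesting direction.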