Abstract: Randomized block Krylov subspace methods form a powerful class of algorithms for computing the extreme eigenvalues of a symmetric matrix or the extreme singular values of a general matrix. The purpose of this paper is to develop new theoretical bounds on the performance of randomized block Krylov subspace methods for these problems. For matrices with polynomial spectral decay, the randomized block Krylov method can obtain an accurate spectral norm estimate using only a constant number of steps (that depends on the decay rate and the accuracy). Furthermore, the analysis reveals that the behavior of the algorithm depends in a delicate way on the block size. Numerical evidence confirms these predictions.
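
As a rough illustration of the class of methods analyzed above, here is a minimal NumPy sketch of a randomized block Krylov estimate of the spectral norm; the function name and parameter choices are illustrative, not taken from the paper.

```python
import numpy as np

def block_krylov_norm_estimate(A, block_size=4, depth=10, seed=0):
    """Estimate ||A||_2 from the block Krylov subspace spanned by
    [A @ Omega, (A @ A.T) @ A @ Omega, ...] for a random Gaussian
    starting block Omega. The estimate never exceeds the true norm,
    and it improves as the depth and the block size grow."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], block_size))
    Y = A @ Omega
    blocks = []
    for _ in range(depth):
        blocks.append(Y)
        Y = A @ (A.T @ Y)                        # one block Krylov step
    Q, _ = np.linalg.qr(np.hstack(blocks))       # orthonormal basis of the Krylov space
    return np.linalg.norm(Q.T @ A, 2)            # Rayleigh-Ritz estimate of the norm
```

The interplay between `block_size` and `depth` is exactly the delicate dependence on the block size that the abstract mentions.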

Publication: Numerische Mathematik Vol.: 150 No.: 1 ISSN: 0029-599X

ID: CaltechAUTHORS:20220107-890431800


Abstract: This paper develops a new storage-optimal algorithm that provably solves almost all semidefinite programs (SDPs). This method is particularly effective for weakly constrained SDPs under appropriate regularity conditions. The key idea is to formulate an approximate complementarity principle: Given an approximate solution to the dual SDP, the primal SDP has an approximate solution whose range is contained in the eigenspace with small eigenvalues of the dual slack matrix. For weakly constrained SDPs, this eigenspace has very low dimension, so this observation significantly reduces the search space for the primal solution. This result suggests an algorithmic strategy that can be implemented with minimal storage: (1) solve the dual SDP approximately; (2) compress the primal SDP to the eigenspace with small eigenvalues of the dual slack matrix; (3) solve the compressed primal SDP. The paper also provides numerical experiments showing that this approach is successful for a range of interesting large-scale SDPs.
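
The compression step (2) of the strategy above is easy to state in code; the following sketch (function name and interface are ours) extracts the small eigenspace of the dual slack matrix.

```python
import numpy as np

def small_eigenspace(S, r):
    """Return an orthonormal basis V for the eigenspace of the r
    smallest eigenvalues of the symmetric dual slack matrix S.
    The compressed primal variable is then X = V @ Y @ V.T for a
    small r-by-r PSD matrix Y."""
    w, U = np.linalg.eigh(S)     # eigenvalues in ascending order
    return U[:, :r]
```

For weakly constrained SDPs, r is very small, so the compressed problem can be solved with minimal storage.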

Publication: SIAM Journal on Optimization Vol.: 31 No.: 4 ISSN: 1052-6234

ID: CaltechAUTHORS:20220204-680252000


Abstract: This paper develops nonasymptotic growth and concentration bounds for a product of independent random matrices. These results sharpen and generalize recent work of Henriksen–Ward, and they are similar in spirit to the results of Ahlswede–Winter and of Tropp for a sum of independent random matrices. The argument relies on the uniform smoothness properties of the Schatten trace classes.

Publication: Foundations of Computational Mathematics ISSN: 1615-3375

ID: CaltechAUTHORS:20201218-154434116


Abstract: This paper deduces exponential matrix concentration from a Poincaré inequality via a short, conceptual argument. Among other examples, this theory applies to matrix-valued functions of a uniformly log-concave random vector. The proof relies on the subadditivity of Poincaré inequalities and a chain rule inequality for the trace of the matrix Dirichlet form. It also uses a symmetrization technique to avoid difficulties associated with a direct extension of the classic scalar argument.

Publication: Bernoulli Vol.: 27 No.: 3 ISSN: 1350-7265

ID: CaltechAUTHORS:20201218-154430753


Abstract: Matrix concentration inequalities provide information about the probability that a random matrix is close to its expectation with respect to the ℓ₂ operator norm. This paper uses semigroup methods to derive sharp nonlinear matrix inequalities. The main result is that the classical Bakry–Émery curvature criterion implies subgaussian concentration for “matrix Lipschitz” functions. This argument circumvents the need to develop a matrix version of the log-Sobolev inequality, a technical obstacle that has blocked previous attempts to derive matrix concentration inequalities in this setting. The approach unifies and extends much of the previous work on matrix concentration. When applied to a product measure, the theory reproduces the matrix Efron–Stein inequalities due to Paulin et al. It also handles matrix-valued functions on a Riemannian manifold with uniformly positive Ricci curvature.

Publication: Electronic Journal of Probability Vol.: 26 ISSN: 1083-6489

ID: CaltechAUTHORS:20201218-154427353


Abstract: Projected least squares is an intuitive and numerically cheap technique for quantum state tomography: compute the least-squares estimator and project it onto the space of states. The main result of this paper equips this point estimator with rigorous, non-asymptotic convergence guarantees expressed in terms of the trace distance. The estimator's sample complexity is comparable to the strongest convergence guarantees available in the literature and—in the case of the uniform POVM—saturates fundamental lower bounds. Numerical simulations support these competitive features.
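
For concreteness, the projection step (onto the set of density matrices, in Frobenius norm) reduces to projecting the eigenvalue vector onto the probability simplex. A NumPy sketch, with our own naming, follows.

```python
import numpy as np

def project_to_state(H):
    """Project a Hermitian matrix H onto the set of quantum states
    (PSD, trace one) in Frobenius norm: keep the eigenvectors and
    replace the eigenvalues by their Euclidean projection onto the
    probability simplex."""
    w, U = np.linalg.eigh(H)
    # simplex projection of the eigenvalue vector (Duchi et al. style)
    s = np.sort(w)[::-1]                          # sorted descending
    css = np.cumsum(s)
    j = np.arange(1, len(s) + 1)
    rho = np.nonzero(s + (1.0 - css) / j > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    p = np.maximum(w + theta, 0.0)                # projected eigenvalues
    return (U * p) @ U.conj().T
```

Applying this map to the unconstrained least-squares estimator yields the point estimator studied in the paper.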

Publication: Journal of Physics A: Mathematical and Theoretical Vol.: 53 No.: 20 ISSN: 1751-8113

ID: CaltechAUTHORS:20190212-160252658


Abstract: This survey describes probabilistic algorithms for linear algebraic computations, such as factorizing matrices and solving linear systems. It focuses on techniques that have a proven track record for real-world problems. The paper treats both the theoretical foundations of the subject and practical computational issues. Topics include norm estimation, matrix approximation by sampling, structured and unstructured random embeddings, linear regression problems, low-rank approximation, subspace iteration and Krylov methods, error estimation and adaptivity, interpolatory and CUR factorizations, Nyström approximation of positive semidefinite matrices, single-view (‘streaming’) algorithms, full rank-revealing factorizations, solvers for linear systems, and approximation of kernel matrices that arise in machine learning and in scientific computing.

Publication: Acta Numerica Vol.: 29 ISSN: 0962-4929

ID: CaltechAUTHORS:20201217-104322985


Abstract: This paper argues that randomized linear sketching is a natural tool for on-the-fly compression of data matrices that arise from large-scale scientific simulations and data collection. The technical contribution consists in a new algorithm for constructing an accurate low-rank approximation of a matrix from streaming data. This method is accompanied by an a priori analysis that allows the user to set algorithm parameters with confidence and an a posteriori error estimator that allows the user to validate the quality of the reconstructed matrix. In comparison to previous techniques, the new method achieves smaller relative approximation errors and is less sensitive to parameter choices. As concrete applications, the paper outlines how the algorithm can be used to compress a Navier–Stokes simulation and a sea surface temperature dataset.
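
A toy version of the sketch-and-reconstruct pattern described above (function names and sketch sizes are ours): the data matrix is never stored, only two small linear sketches that can absorb streaming additive updates.

```python
import numpy as np

def init_sketch(m, n, k, s, seed=0):
    """Draw random test matrices for a linear sketch of an m-by-n
    matrix A: Y = A @ Omega captures the range, W = Psi @ A captures
    the co-range. k and s are sketch sizes with s >= k."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n, k)), rng.standard_normal((s, m))

def absorb(Y, W, Omega, Psi, H):
    """Process one additive update A <- A + H using only the sketches."""
    return Y + H @ Omega, W + Psi @ H

def reconstruct(Y, W, Psi):
    """Low-rank reconstruction A_hat = Q @ lstsq(Psi @ Q, W)."""
    Q, _ = np.linalg.qr(Y)
    X, *_ = np.linalg.lstsq(Psi @ Q, W, rcond=None)
    return Q @ X
```

When the streamed matrix has rank at most k, this reconstruction recovers it exactly (up to roundoff); for general matrices it produces a near-optimal low-rank approximation.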

Publication: SIAM Journal on Scientific Computing Vol.: 41 No.: 4 ISSN: 1064-8275

ID: CaltechAUTHORS:20190920-085459909


Abstract: This paper concerns the facial geometry of the set of n×n correlation matrices. The main result states that almost every set of r vertices generates a simplicial face, provided that r ≤ √(cn), where c is an absolute constant. This bound is qualitatively sharp because the set of correlation matrices has no simplicial face generated by more than √(2n) vertices.

Publication: Discrete and Computational Geometry Vol.: 60 No.: 2 ISSN: 0179-5376

ID: CaltechAUTHORS:20180103-154009197


Abstract: This paper describes a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image, or sketch, of the matrix. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by numerical experiments with real and synthetic data.
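
For instance, a Nyström-type sketch (one member of this family of methods) produces an approximation that is positive semidefinite by construction; the sketch below, with our own naming, illustrates the idea.

```python
import numpy as np

def nystrom_psd(A, k, seed=0):
    """Nystrom-style low-rank approximation of a PSD matrix:
    A_hat = Y @ pinv(Omega.T @ Y) @ Y.T with Y = A @ Omega.
    A_hat is PSD by construction and has rank at most k."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[0], k))
    Y = A @ Omega
    M = Omega.T @ Y
    core = np.linalg.pinv((M + M.T) / 2, rcond=1e-10)  # symmetrized core inverse
    return Y @ core @ Y.T
```

When rank(A) ≤ k, the approximation is exact (up to roundoff), reflecting the structure-preserving, fixed-rank guarantees described in the abstract.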

Publication: SIAM Journal on Matrix Analysis and Applications Vol.: 38 No.: 4 ISSN: 0895-4798

ID: CaltechAUTHORS:20180111-134219270


Abstract: A mathematical introduction to compressive sensing by Simon Foucart and Holger Rauhut [FR13] is about sparse solutions to systems of random linear equations. To begin, let me describe some striking phenomena that take place in this context. Afterward, I shall try to explain why these facts have captivated so many researchers over the last decade. I shall conclude with some comments on the book.

Publication: Bulletin of the American Mathematical Society Vol.: 54 No.: 1 ISSN: 0273-0979

ID: CaltechAUTHORS:20170913-080352752


Abstract: This paper establishes new concentration inequalities for random matrices constructed from independent random variables. These results are analogous to the generalized Efron–Stein inequalities developed by Boucheron et al. The proofs rely on the method of exchangeable pairs.

Publication: Annals of Probability Vol.: 44 No.: 5 ISSN: 0091-1798

ID: CaltechAUTHORS:20161103-151616710


Abstract: This paper establishes that every positive-definite matrix can be written as a positive linear combination of outer products of integer-valued vectors whose entries are bounded by the geometric mean of the condition number and the dimension of the matrix.

Publication: SIAM Journal on Discrete Mathematics Vol.: 29 No.: 4 ISSN: 0895-4801

ID: CaltechAUTHORS:20160115-141425609


Abstract: This paper proposes a tradeoff between computational time, sample complexity, and statistical accuracy that applies to statistical estimators based on convex optimization. When we have a large amount of data, we can exploit excess samples to decrease statistical risk, to decrease computational cost, or to trade off between the two. We propose to achieve this tradeoff by varying the amount of smoothing applied to the optimization problem. This work uses regularized linear regression as a case study to argue for the existence of this tradeoff both theoretically and experimentally. We also apply our method to describe a tradeoff in an image interpolation problem.

Publication: IEEE Journal of Selected Topics in Signal Processing Vol.: 9 No.: 4 ISSN: 1932-4553

ID: CaltechAUTHORS:20150611-103104163


Abstract: Random matrices now play a role in many areas of theoretical, applied, and computational mathematics. Therefore, it is desirable to have tools for studying random matrices that are flexible, easy to use, and powerful. Over the last fifteen years, researchers have developed a remarkable family of results, called matrix concentration inequalities, that achieve all of these goals. This monograph offers an invitation to the field of matrix concentration inequalities. It begins with some history of random matrix theory; it describes a flexible model for random matrices that is suitable for many problems; and it discusses the most important matrix concentration results. To demonstrate the value of these techniques, the presentation includes examples drawn from statistics, machine learning, optimization, combinatorics, algorithms, scientific computing, and beyond.

Publication: Foundations and Trends in Machine Learning Vol.: 8 No.: 1-2 ISSN: 1935-8237

ID: CaltechAUTHORS:20150714-140245621


Abstract: Ptychography is a powerful computational imaging technique that transforms a collection of low-resolution images into a high-resolution sample reconstruction. Unfortunately, algorithms that currently solve this reconstruction problem lack stability, robustness, and theoretical guarantees. Recently, convex optimization algorithms have improved the accuracy and reliability of several related reconstruction efforts. This paper proposes a convex formulation of the ptychography problem. This formulation has no local minima, it can be solved using a wide range of algorithms, it can incorporate appropriate noise models, and it can include multiple a priori constraints. The paper considers a specific algorithm, based on low-rank factorization, whose runtime and memory usage are near-linear in the size of the output image. Experiments demonstrate that this approach offers a 25% lower background variance on average than alternating projections, the ptychographic reconstruction algorithm that is currently in widespread use.

Publication: New Journal of Physics Vol.: 17 No.: 5 ISSN: 1367-2630

ID: CaltechAUTHORS:20150619-160809918


Abstract: Consider a data set of vector-valued observations that consists of noisy inliers, which are explained well by a low-dimensional subspace, along with some number of outliers. This work describes a convex optimization problem, called reaper, that can reliably fit a low-dimensional model to this type of data. This approach parameterizes linear subspaces using orthogonal projectors and uses a relaxation of the set of orthogonal projectors to reach the convex formulation. The paper provides an efficient algorithm for solving the reaper problem, and it documents numerical experiments that confirm that reaper can dependably find linear structure in synthetic and natural data. In addition, when the inliers lie near a low-dimensional subspace, there is a rigorous theory that describes when reaper can approximate this subspace.

Publication: Foundations of Computational Mathematics Vol.: 15 No.: 2 ISSN: 1615-3375

ID: CaltechAUTHORS:20150416-134303719


Abstract: Recent research indicates that many convex optimization problems with random constraints exhibit a phase transition as the number of constraints increases. For example, this phenomenon emerges in the ℓ_1 minimization method for identifying a sparse vector from random linear measurements. Indeed, the ℓ_1 approach succeeds with high probability when the number of measurements exceeds a threshold that depends on the sparsity level; otherwise, it fails with high probability. This paper provides the first rigorous analysis that explains why phase transitions are ubiquitous in random convex optimization problems. It also describes tools for making reliable predictions about the quantitative aspects of the transition, including the location and the width of the transition region. These techniques apply to regularized linear inverse problems with random measurements, to demixing problems under a random incoherence model, and also to cone programs with random affine constraints. The applied results depend on foundational research in conic geometry. This paper introduces a summary parameter, called the statistical dimension, that canonically extends the dimension of a linear subspace to the class of convex cones. The main technical result demonstrates that the sequence of intrinsic volumes of a convex cone concentrates sharply around the statistical dimension. This fact leads to accurate bounds on the probability that a randomly rotated cone shares a ray with a fixed cone.
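
The statistical dimension has a convenient probabilistic formula, δ(C) = E‖Π_C(g)‖², where g is a standard Gaussian vector and Π_C is the Euclidean projection onto the cone, so it is easy to estimate by Monte Carlo. A small sketch with our own naming:

```python
import numpy as np

def statistical_dimension_mc(project, n, trials=2000, seed=0):
    """Monte Carlo estimate of the statistical dimension
    delta(C) = E ||Pi_C(g)||^2 for a standard Gaussian g in R^n.
    `project` maps an array of row vectors to their Euclidean
    projections onto the cone C."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((trials, n))
    P = project(G)
    return float(np.mean(np.sum(P * P, axis=1)))
```

For the nonnegative orthant, the projection is the coordinatewise positive part and the statistical dimension is exactly n/2, which makes a handy sanity check.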

Publication: Information and Inference Vol.: 3 No.: 3 ISSN: 2049-8772

ID: CaltechAUTHORS:20150422-120051110


Abstract: The intrinsic volumes of a convex cone are geometric functionals that return basic structural information about the cone. Recent research has demonstrated that conic intrinsic volumes are valuable for understanding the behavior of random convex optimization problems. This paper develops a systematic technique for studying conic intrinsic volumes using methods from probability. At the heart of this approach is a general Steiner formula for cones. This result converts questions about the intrinsic volumes into questions about the projection of a Gaussian random vector onto the cone, which can then be resolved using tools from Gaussian analysis. The approach leads to new identities and bounds for the intrinsic volumes of a cone, including a near-optimal concentration inequality.

Publication: Discrete and Computational Geometry Vol.: 51 No.: 4 ISSN: 0179-5376

ID: CaltechAUTHORS:20140703-103651135


Abstract: Demixing refers to the challenge of identifying two structured signals given only the sum of the two signals and prior information about their structures. Examples include the problem of separating a signal that is sparse with respect to one basis from a signal that is sparse with respect to a second basis, and the problem of decomposing an observed matrix into a low-rank matrix plus a sparse matrix. This paper describes and analyzes a framework, based on convex optimization, for solving these demixing problems, and many others. This work introduces a randomized signal model that ensures that the two structures are incoherent, i.e., generically oriented. For an observation from this model, this approach identifies a summary statistic that reflects the complexity of a particular signal. The difficulty of separating two structured, incoherent signals depends only on the total complexity of the two structures. Some applications include (1) demixing two signals that are sparse in mutually incoherent bases, (2) decoding spread-spectrum transmissions in the presence of impulsive errors, and (3) removing sparse corruptions from a low-rank matrix. In each case, the theoretical analysis of the convex demixing method closely matches its empirical behavior.

Publication: Foundations of Computational Mathematics Vol.: 14 No.: 3 ISSN: 1615-3375

ID: CaltechAUTHORS:20140606-140701312


Abstract: We combine resolvent-mode decomposition with techniques from convex optimization to optimally approximate velocity spectra in a turbulent channel. The velocity is expressed as a weighted sum of resolvent modes that are dynamically significant, non-empirical, and scalable with Reynolds number. To optimally represent direct numerical simulation (DNS) data at friction Reynolds number 2003, we determine the weights of resolvent modes as the solution of a convex optimization problem. Using only 12 modes per wall-parallel wavenumber pair and temporal frequency, we obtain close agreement with the DNS spectra, reducing the wall-normal and temporal resolutions used in the simulation by three orders of magnitude.

Publication: Physics of Fluids Vol.: 26 No.: 5 ISSN: 1070-6631

ID: CaltechAUTHORS:20140519-153243814


Abstract: This paper derives exponential concentration inequalities and polynomial moment inequalities for the spectral norm of a random matrix. The analysis requires a matrix extension of the scalar concentration theory developed by Sourav Chatterjee using Stein’s method of exchangeable pairs. When applied to a sum of independent random matrices, this approach yields matrix generalizations of the classical inequalities due to Hoeffding, Bernstein, Khintchine and Rosenthal. The same technique delivers bounds for sums of dependent random matrices and more general matrix-valued functions of dependent random variables.

Publication: Annals of Probability Vol.: 42 No.: 3 ISSN: 0091-1798

ID: CaltechAUTHORS:20140605-070821101


Abstract: This paper considers a class of entropy functionals defined for random matrices, and it demonstrates that these functionals satisfy a subadditivity property. Several matrix concentration inequalities are derived as an application of this result.

Publication: Electronic Journal of Probability Vol.: 19 ISSN: 1083-6489

ID: CaltechAUTHORS:20140918-084717665


Abstract: The block Kaczmarz method is an iterative scheme for solving overdetermined least-squares problems. At each step, the algorithm projects the current iterate onto the solution space of a subset of the constraints. This paper describes a block Kaczmarz algorithm that uses a randomized control scheme to choose the subset at each step. This algorithm is the first block Kaczmarz method with an (expected) linear rate of convergence that can be expressed in terms of the geometric properties of the matrix and its submatrices. The analysis reveals that the algorithm is most effective when it is given a good row paving of the matrix, a partition of the rows into well-conditioned blocks. The operator theory literature provides detailed information about the existence and construction of good row pavings. Together, these results yield an efficient block Kaczmarz scheme that applies to many overdetermined least-squares problems.
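
A compact NumPy sketch of a randomized block Kaczmarz iteration of this flavor (the uniform block-selection rule and the names here are illustrative):

```python
import numpy as np

def block_kaczmarz(A, b, blocks, iters=500, seed=0):
    """Randomized block Kaczmarz: at each step, pick a row block at
    random and project the current iterate onto the solution space of
    that subset of the equations, via the pseudoinverse of the
    submatrix. `blocks` is a row paving: a list of index arrays."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        tau = blocks[rng.integers(len(blocks))]
        x = x + np.linalg.pinv(A[tau]) @ (b[tau] - A[tau] @ x)
    return x
```

On a consistent system, each projection moves the iterate closer to the solution, and the expected convergence rate is governed by the conditioning of the blocks in the paving.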

Publication: Linear Algebra and its Applications Vol.: 441 ISSN: 0024-3795

ID: CaltechAUTHORS:20140207-094442254


Abstract: Compressive sampling is a well-known tool for resolving the energetic content of signals that admit a sparse representation. The broadband temporal spectrum acquired from point measurements in wall-bounded turbulence has precluded the prior use of compressive sampling in this kind of flow; however, it is shown here that the frequency content of flow fields that have been Fourier transformed in the homogeneous spatial (wall-parallel) directions is approximately sparse, giving rise to a compact representation of the velocity field. As such, compressive sampling is an ideal tool for reducing the amount of information required to approximate the velocity field. Further, the success of the compressive sampling approach provides strong evidence that this representation is both physically meaningful and indicative of special properties of wall turbulence. Another advantage of compressive sampling over periodic sampling becomes evident at high Reynolds numbers, since the number of samples required to resolve a given bandwidth with compressive sampling scales as the logarithm of the dynamically significant bandwidth, instead of linearly as for periodic sampling. The combination of the Fourier decomposition in the wall-parallel directions, the approximate sparsity in frequency, and empirical bounds on the convection velocity leads to a compact representation of an otherwise broadband distribution of energy in the space defined by streamwise and spanwise wavenumber, frequency, and wall-normal location. The data storage requirements for reconstruction of the full field using compressive sampling are shown to be significantly less than for periodic sampling, in which the Nyquist criterion limits the maximum frequency that can be resolved. Conversely, compressive sampling maximizes the frequency range that can be recovered if the number of samples is limited, resolving frequencies up to several times higher than the mean sampling rate.
It is proposed that the approximate sparsity in frequency and the corresponding structure in the spatial domain can be exploited to design simulation schemes for canonical wall turbulence with significantly reduced computational expense compared with current techniques.

Publication: Physics of Fluids Vol.: 26 No.: 1 ISSN: 1070-6631

ID: CaltechAUTHORS:20140320-104900351


Abstract: We study the Reynolds-number scaling and the geometric self-similarity of a gain-based, low-rank approximation to turbulent channel flows, determined by the resolvent formulation of McKeon & Sharma (J. Fluid Mech., vol. 658, 2010, pp. 336–382), in order to obtain a description of the streamwise turbulence intensity from direct consideration of the Navier–Stokes equations. Under this formulation, the velocity field is decomposed into propagating waves (with single streamwise and spanwise wavelengths and wave speed) whose wall-normal shapes are determined from the principal singular function of the corresponding resolvent operator. Using the accepted scalings of the mean velocity in wall-bounded turbulent flows, we establish that the resolvent operator admits three classes of wave parameters that induce universal behaviour with Reynolds number in the low-rank model, and which are consistent with scalings proposed throughout the wall turbulence literature. In addition, it is shown that a necessary condition for geometrically self-similar resolvent modes is the presence of a logarithmic turbulent mean velocity. Under the practical assumption that the mean velocity consists of a logarithmic region, we identify the scalings that constitute hierarchies of self-similar modes that are parameterized by the critical wall-normal location where the speed of the mode equals the local turbulent mean velocity. For the rank-1 model subject to broadband forcing, the integrated streamwise energy density takes a universal form which is consistent with the dominant near-wall turbulent motions. When the shape of the forcing is optimized to enforce matching with results from direct numerical simulations at low turbulent Reynolds numbers, further similarity appears. Representation of these weight functions using similarity laws enables prediction of the Reynolds number and wall-normal variations of the streamwise energy intensity at high Reynolds numbers (Re_τ ≈ 10^3–10^10).
Results from this low-rank model of the Navier–Stokes equations compare favourably with experimental results in the literature.

Publication: Journal of Fluid Mechanics Vol.: 734 ISSN: 0022-1120

ID: CaltechAUTHORS:20131121-132351952


Abstract: This paper establishes the restricted isometry property for a Gabor system generated by n^2 time–frequency shifts of a random window function in n dimensions. The sth order restricted isometry constant of the associated n × n^2 Gabor synthesis matrix is small provided that s ≤ cn^(2/3) / log^2 n. This bound provides a qualitative improvement over previous estimates, which achieve only quadratic scaling of the sparsity s with respect to n. The proof depends on an estimate for the expected supremum of a second-order chaos.

Publication: Probability Theory and Related Fields Vol.: 156 No.: 3-4 ISSN: 0178-8051

ID: CaltechAUTHORS:20130815-130448316


Abstract: This note demonstrates that it is possible to bound the expectation of an arbitrary norm of a random matrix drawn from the Stiefel manifold in terms of the expected norm of a standard Gaussian matrix with the same dimensions. A related comparison holds for any convex function of a random matrix drawn from the Stiefel manifold. For certain norms, a reversed inequality is also valid.

Publication: Probability Theory and Related Fields Vol.: 153 No.: 3-4 ISSN: 0178-8051

ID: CaltechAUTHORS:20120820-074343297


Abstract: This paper presents new probability inequalities for sums of independent, random, self-adjoint matrices. These results place simple and easily verifiable hypotheses on the summands, and they deliver strong conclusions about the large-deviation behavior of the maximum eigenvalue of the sum. Tail bounds for the norm of a sum of random rectangular matrices follow as an immediate corollary. The proof techniques also yield some information about matrix-valued martingales. In other words, this paper provides noncommutative generalizations of the classical bounds associated with the names Azuma, Bennett, Bernstein, Chernoff, Hoeffding, and McDiarmid. The matrix inequalities promise the same diversity of application, ease of use, and strength of conclusion that have made the scalar inequalities so valuable.
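
To convey the flavor of these results, the matrix Bernstein tail bound P(λ_max(Σᵢ Xᵢ) ≥ t) ≤ d·exp(−t²/(2v + 2Lt/3)) can be evaluated and compared against simulation. The scalar-flavored example below is our own construction (it takes Xᵢ = εᵢ·I_d), not an example from the paper.

```python
import numpy as np

def matrix_bernstein_tail(t, d, v, L):
    """Matrix Bernstein tail bound for a sum of independent, centered,
    self-adjoint d-by-d random matrices: if each summand has norm at
    most L and the variance proxy is v, then
        P( lambda_max(sum_i X_i) >= t ) <= d * exp(-t**2 / (2*v + 2*L*t/3)).
    """
    return d * np.exp(-(t ** 2) / (2.0 * v + 2.0 * L * t / 3.0))
```

The hypotheses (a norm bound on the summands, a variance proxy) are exactly the "simple and easily verifiable" inputs the abstract refers to.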

Publication: Foundations of Computational Mathematics Vol.: 12 No.: 4 ISSN: 1615-3375

ID: CaltechAUTHORS:20120821-072332716


Abstract: This paper provides a succinct proof of a 1973 theorem of Lieb that establishes the concavity of a certain trace function. The development relies on a deep result from quantum information theory, the joint convexity of quantum relative entropy, as well as a recent argument due to Carlen and Lieb.

Publication: Proceedings of the American Mathematical Society Vol.: 140 No.: 5 ISSN: 0002-9939

ID: CaltechAUTHORS:20120515-094709707


Abstract: In the theory of compressed sensing, restricted isometry analysis has become a standard tool for studying how efficiently a measurement matrix acquires information about sparse and compressible signals. Many recovery algorithms are known to succeed when the restricted isometry constants of the sampling matrix are small. Many potential applications of compressed sensing involve a data-acquisition process that proceeds by convolution with a random pulse followed by (nonrandom) subsampling. At present, the theoretical analysis of this measurement technique is lacking. This paper demonstrates that the sth-order restricted isometry constant is small when the number m of samples satisfies m ≳ (s log n)^(3/2), where n is the length of the pulse. This bound improves on previous estimates, which exhibit quadratic scaling.

Publication: Applied and Computational Harmonic Analysis Vol.: 32 No.: 2 ISSN: 1063-5203

ID: CaltechAUTHORS:20120302-134838002


Abstract: Freedman's inequality is a martingale counterpart to Bernstein's inequality. This result shows that the large-deviation behavior of a martingale is controlled by the predictable quadratic variation and a uniform upper bound for the martingale difference sequence. Oliveira has recently established a natural extension of Freedman's inequality that provides tail bounds for the maximum singular value of a matrix-valued martingale. This note describes a different proof of the matrix Freedman inequality that depends on a deep theorem of Lieb from matrix analysis. This argument delivers sharp constants in the matrix Freedman inequality, and it also yields tail bounds for other types of matrix martingales. The new techniques are adapted from recent work by the present author.

Publication: Electronic Communications in Probability Vol.: 16 ISSN: 1083-589X

ID: CaltechAUTHORS:20120105-135621556


Abstract: This paper presents an improved analysis of a structured dimension-reduction map called the subsampled randomized Hadamard transform. This argument demonstrates that the map preserves the Euclidean geometry of an entire subspace of vectors. The new proof is much simpler than previous approaches, and it offers, for the first time, optimal constants in the estimate on the number of dimensions required for the embedding.
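
A minimal implementation of the transform itself (for dimensions that are powers of two) combines random signs, a fast Walsh–Hadamard transform, and uniform subsampling; the code below is a generic sketch, not the paper's.

```python
import numpy as np

def srht(x, m, seed=0):
    """Subsampled randomized Hadamard transform of a vector x whose
    length n is a power of two: random sign flips, an in-place fast
    Walsh-Hadamard transform, then m coordinates sampled without
    replacement, scaled so that norms are preserved on average
    (exactly, when m = n)."""
    rng = np.random.default_rng(seed)
    n = x.size
    y = x * rng.choice([-1.0, 1.0], n)        # random sign flips
    h = 1
    while h < n:                               # fast Walsh-Hadamard transform
        y = y.reshape(-1, 2 * h)
        a = y[:, :h].copy()
        y[:, :h] = a + y[:, h:]
        y[:, h:] = a - y[:, h:]
        y = y.reshape(-1)
        h *= 2
    idx = rng.choice(n, m, replace=False)      # uniform subsampling
    return y[idx] / np.sqrt(m)
```

The structure of the transform is what allows it to be applied in O(n log n) time, in contrast to a dense Gaussian embedding.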

Publication: Advances in Adaptive Data Analysis Vol.: 3 No.: 1-2 ISSN: 1793-5369

ID: CaltechAUTHORS:20180831-112120288


Abstract: Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed—either explicitly or implicitly—to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, robustness, and/or speed. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log(k)) floating-point operations (flops) in contrast to O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multiprocessor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
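
The two-stage framework described in this survey can be sketched in a few lines of NumPy (the oversampling parameter and the names are illustrative): random sampling finds a subspace capturing most of the action of the matrix, and the compressed matrix is then factored deterministically.

```python
import numpy as np

def randomized_svd(A, k, p=5, seed=0):
    """Prototype two-stage randomized SVD: (1) a random range finder
    identifies a subspace capturing most of the action of A;
    (2) the compressed matrix Q.T @ A is factored deterministically.
    p is a small oversampling parameter."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], k + p))
    Q, _ = np.linalg.qr(A @ Omega)                           # stage 1: range finder
    U, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)   # stage 2: deterministic
    return Q @ U[:, :k], s[:k], Vt[:k]
```

For a matrix of rank at most k, this recovers the truncated SVD essentially exactly; for general matrices, the survey's error analysis quantifies the effect of the oversampling p.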

Publication: SIAM Review Vol.: 53 No.: 2 ISSN: 0036-1445

ID: CaltechAUTHORS:20111025-085943917

]]>

Abstract: This paper presents an improved analysis of a structured dimension-reduction map called the subsampled randomized Hadamard transform. This argument demonstrates that the map preserves the Euclidean geometry of an entire subspace of vectors. The new proof is much simpler than previous approaches, and it offers — for the first time — optimal constants in the estimate on the number of dimensions required for the embedding.

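The map analyzed in this abstract has the form sqrt(n/s) · R H D: random signs, a normalized Walsh–Hadamard transform, and a uniform subsample of coordinates. The sketch below is a dense, didactic version (assuming the ambient dimension is a power of two); a practical implementation would use a fast O(n log n) transform instead of forming H explicitly.

```python
import numpy as np
from scipy.linalg import hadamard

def srht(X, s, seed=0):
    """Subsampled randomized Hadamard transform, embedding R^n into R^s.

    Applies random signs D, the orthogonal Hadamard matrix
    H = hadamard(n)/sqrt(n), then keeps a uniform subsample of s
    coordinates, rescaled so Euclidean norms are preserved in expectation.
    """
    n = X.shape[0]                       # n must be a power of two
    rng = np.random.default_rng(seed)
    D = rng.choice([-1.0, 1.0], size=n)
    H = hadamard(n) / np.sqrt(n)
    rows = rng.choice(n, size=s, replace=False)
    return np.sqrt(n / s) * (H * D)[rows] @ X
```

As a sanity check, taking s = n makes the map a signed, permuted Hadamard matrix, which is exactly orthogonal, so norms are preserved with no error at all.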

Publication: Advances in Adaptive Data Analysis Vol.: 3 No.: 1-2 ISSN: 1793-5369

ID: CaltechAUTHORS:20120321-092826256

]]>

Abstract: The performance of principal component analysis suffers badly in the presence of outliers. This paper proposes two novel approaches for robust principal component analysis based on semidefinite programming. The first method, maximum mean absolute deviation rounding, seeks directions of large spread in the data while damping the effect of outliers. The second method produces a low-leverage decomposition of the data that attempts to form a low-rank model for the data by separating out corrupted observations. This paper also presents efficient computational methods for solving these semidefinite programs. Numerical experiments confirm the value of these new techniques.

Publication: Electronic Journal of Statistics Vol.: 5 ISSN: 1935-7524

ID: CaltechAUTHORS:20111021-161307161

]]>

Abstract: Compressive sampling (CoSa) is a new paradigm for developing data sampling technologies. It is based on the principle that many types of vector-space data are compressible, which is a term of art in mathematical signal processing. The key ideas are that randomized dimension reduction preserves the information in a compressible signal and that it is possible to develop hardware devices that implement this dimension reduction efficiently. The main computational challenge in CoSa is to reconstruct a compressible signal from the reduced representation acquired by the sampling device. This extended abstract describes a recent algorithm, called CoSaMP, that accomplishes the data recovery task. It was the first known method to offer near-optimal guarantees on resource usage.

Publication: Communications of the ACM Vol.: 53 No.: 12 ISSN: 0001-0782

ID: CaltechAUTHORS:20110201-090245482

]]>

Abstract: The goal of the sparse approximation problem is to approximate a target signal using a linear combination of a few elementary signals drawn from a fixed collection. This paper surveys the major practical algorithms for sparse approximation. Specific attention is paid to computational issues, to the circumstances in which individual methods tend to perform well, and to the theoretical guarantees available. Many fundamental questions in electrical engineering, statistics, and applied mathematics can be posed as sparse approximation problems, making these algorithms versatile and relevant to a plethora of applications.

Publication: Proceedings of the IEEE Vol.: 98 No.: 6 ISSN: 0018-9219

ID: CaltechAUTHORS:20100608-080853280

]]>

Abstract: Wideband analog signals push contemporary analog-to-digital conversion (ADC) systems to their performance limits. In many applications, however, sampling at the Nyquist rate is inefficient because the signals of interest contain only a small number of significant frequencies relative to the band limit, although the locations of the frequencies may not be known a priori. For this type of sparse signal, other sampling strategies are possible. This paper describes a new type of data acquisition system, called a random demodulator, that is constructed from robust, readily available components. Let K denote the total number of frequencies in the signal, and let W denote its band limit in hertz. Simulations suggest that the random demodulator requires just O(K log(W/K)) samples per second to stably reconstruct the signal. This sampling rate is exponentially lower than the Nyquist rate of W hertz. In contrast to Nyquist sampling, one must use nonlinear methods, such as convex programming, to recover the signal from the samples taken by the random demodulator. This paper provides a detailed theoretical analysis of the system's performance that supports the empirical observations.

Publication: IEEE Transactions on Information Theory Vol.: 56 No.: 1 ISSN: 0018-9448

ID: CaltechAUTHORS:20100119-103356110

]]>

Abstract: Compressive sampling offers a new paradigm for acquiring signals that are compressible with respect to an orthonormal basis. The major algorithmic challenge in compressive sampling is to approximate a compressible signal from noisy samples. This paper describes a new iterative recovery algorithm called CoSaMP that delivers the same guarantees as the best optimization-based approaches. Moreover, this algorithm offers rigorous bounds on computational cost and storage. It is likely to be extremely efficient for practical problems because it requires only matrix–vector multiplies with the sampling matrix. For compressible signals, the running time is just O(N log^2 N), where N is the length of the signal.
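The iteration described in this abstract — proxy formation, support merging, least squares, and pruning — can be sketched directly in NumPy. This is a simplified noiseless illustration, not the paper's analyzed implementation: it uses a dense least-squares solve where a large-scale version would use iterative methods and the matrix–vector multiplies the abstract mentions.

```python
import numpy as np

def cosamp(Phi, u, s, iters=30):
    """CoSaMP sketch: estimate an s-sparse x from samples u = Phi @ x.

    Each iteration forms the signal proxy Phi.T @ residual, merges its
    2s largest coordinates with the current support, solves least
    squares on the merged support, and prunes the result back to s terms.
    """
    n = Phi.shape[1]
    x = np.zeros(n)
    for _ in range(iters):
        proxy = Phi.T @ (u - Phi @ x)
        omega = np.argsort(np.abs(proxy))[-2 * s:]        # 2s proxy indices
        T = np.union1d(omega, np.flatnonzero(x)).astype(int)
        b = np.zeros(n)
        b[T] = np.linalg.lstsq(Phi[:, T], u, rcond=None)[0]
        keep = np.argsort(np.abs(b))[-s:]                 # prune to s terms
        x = np.zeros(n)
        x[keep] = b[keep]
    return x

# Example: recover a 5-sparse signal from 80 random samples.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((80, 256)) / np.sqrt(80)
x_true = np.zeros(256)
x_true[rng.choice(256, 5, replace=False)] = rng.standard_normal(5)
x_hat = cosamp(Phi, Phi @ x_true, s=5)
```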

Publication: Applied and Computational Harmonic Analysis Vol.: 26 No.: 3 ISSN: 1063-5203

ID: CaltechAUTHORS:20090706-113128889

]]>

Abstract: Many problems in the theory of sparse approximation require bounds on operator norms of a random submatrix drawn from a fixed matrix. The purpose of this Note is to collect estimates for several different norms that are most important in the analysis of ℓ1 minimization algorithms. Several of these bounds have not appeared in detail.

Publication: Comptes Rendus Mathematique Vol.: 346 No.: 23-24 ISSN: 1631-073X

ID: CaltechAUTHORS:TROcrm08

]]>

Abstract: The purpose of this work is to survey what is known about the linear independence of spikes and sines. The paper provides new results for the case where the locations of the spikes and the frequencies of the sines are chosen at random. This problem is equivalent to studying the spectral norm of a random submatrix drawn from the discrete Fourier transform matrix. The proof depends on an extrapolation argument of Bourgain and Tzafriri.

Publication: Journal of Fourier Analysis and Applications Vol.: 14 No.: 5-6 ISSN: 1069-5869

ID: CaltechAUTHORS:TROjfaa08

]]>

Abstract: Metric nearness refers to the problem of optimally restoring metric properties to distance measurements that happen to be nonmetric due to measurement errors or otherwise. Metric data can be important in various settings, for example, in clustering, classification, metric-based indexing, query processing, and graph theoretic approximation algorithms. This paper formulates and solves the metric nearness problem: Given a set of pairwise dissimilarities, find a “nearest” set of distances that satisfy the properties of a metric—principally the triangle inequality. For solving this problem, the paper develops efficient triangle fixing algorithms that are based on an iterative projection method. An intriguing aspect of the metric nearness problem is that a special case turns out to be equivalent to the all pairs shortest paths problem. The paper exploits this equivalence and develops a new algorithm for the latter problem using a primal-dual method. Applications to graph clustering are provided as an illustration. We include experiments that demonstrate the computational superiority of triangle fixing over general purpose convex programming software. Finally, we conclude by suggesting various useful extensions and generalizations to metric nearness.
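The special case alluded to in this abstract — where distances may only be decreased — is equivalent to all pairs shortest paths: replacing each dissimilarity by the shortest-path distance through the graph yields the nearest metric that does not increase any entry. A minimal sketch of that special case (via Floyd–Warshall, not the paper's triangle-fixing or primal-dual algorithms):

```python
import numpy as np

def decrease_only_metric_repair(D):
    """Decrease-only metric nearness via all pairs shortest paths.

    Runs a vectorized Floyd-Warshall: each entry is replaced by the
    shortest-path distance, which enforces every triangle inequality
    while only ever decreasing dissimilarities.
    """
    M = D.astype(float).copy()
    n = M.shape[0]
    for k in range(n):
        # Relax all pairs through intermediate vertex k at once.
        M = np.minimum(M, M[:, [k]] + M[[k], :])
    return M
```

For example, the dissimilarities d(0,1) = d(1,2) = 1, d(0,2) = 5 violate the triangle inequality, and the repair lowers d(0,2) to the path length 2.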

Publication: SIAM Journal on Matrix Analysis and Applications Vol.: 30 No.: 1 ISSN: 0895-4798

ID: CaltechAUTHORS:20110513-152152857

]]>

Abstract: This article describes a computational method, called the Fourier sampling algorithm, that exploits the spectral sparsity of signals [10]. The algorithm takes a small number of (correlated) random samples from a signal and processes them efficiently to produce an approximation of the discrete Fourier transform (DFT) of the signal. The algorithm offers provable guarantees on the number of samples, the running time, and the amount of storage. As we will see, these requirements are exponentially better than the FFT for some cases of interest. This article describes in detail how to implement a version of Fourier sampling, it presents some evidence of its empirical performance, and it explains the theoretical ideas that underlie the analysis. Our hope is that this tutorial will allow engineers to apply Fourier sampling to their own problems. We also hope that it will stimulate further research on practical implementations and extensions of the algorithm.

Publication: IEEE Signal Processing Magazine Vol.: 25 No.: 2 ISSN: 1053-5888

ID: CaltechAUTHORS:GILieeespm08

]]>

Abstract: This paper describes a numerical method for finding good packings in Grassmannian manifolds equipped with various metrics. This investigation also encompasses packing in projective spaces. In each case, producing a good packing is equivalent to constructing a matrix that has certain structural and spectral properties. By alternately enforcing the structural condition and then the spectral condition, it is often possible to reach a matrix that satisfies both. One may then extract a packing from this matrix. This approach is both powerful and versatile. In cases in which experiments have been performed, the alternating projection method yields packings that compete with the best packings recorded. It also extends to problems that have not been studied numerically. For example, it can be used to produce packings of subspaces in real and complex Grassmannian spaces equipped with the Fubini–Study distance; these packings are valuable in wireless communications. One can prove that some of the novel configurations constructed by the algorithm have packing diameters that are nearly optimal.

Publication: Experimental Mathematics Vol.: 17 No.: 1 ISSN: 1058-6458

ID: CaltechAUTHORS:20180831-112106252

]]>

Abstract: This note presents a new proof of an important result due to Bourgain and Tzafriri that provides a partial solution to the Kadison-Singer problem. The result shows that every unit-norm matrix whose entries are relatively small in comparison with its dimension can be paved by a partition of constant size. That is, the coordinates can be partitioned into a constant number of blocks so that the restriction of the matrix to each block of coordinates has norm less than one half. The original proof of Bourgain and Tzafriri involves a long, delicate calculation. The new proof relies on the systematic use of symmetrization and (noncommutative) Khinchin inequalities to estimate the norms of some random matrices.

Publication: Studia Mathematica Vol.: 185 No.: 1 ISSN: 1730-6337

ID: CaltechAUTHORS:20170408-150838584

]]>

Abstract: This paper demonstrates theoretically and empirically that a greedy algorithm called Orthogonal Matching Pursuit (OMP) can reliably recover a signal with m nonzero entries in dimension d given O(m ln d) random linear measurements of that signal. This is a massive improvement over previous results, which require O(m^2) measurements. The new results for OMP are comparable with recent results for another approach called Basis Pursuit (BP). In some settings, the OMP algorithm is faster and easier to implement, so it is an attractive alternative to BP for signal recovery problems.

Publication: IEEE Transactions on Information Theory Vol.: 53 No.: 12 ISSN: 0018-9448

ID: CaltechAUTHORS:TROieeetit07

]]>

Abstract: This paper discusses a new class of matrix nearness problems that measure approximation error using a directed distance measure called a Bregman divergence. Bregman divergences offer an important generalization of the squared Frobenius norm and relative entropy, and they all share fundamental geometric properties. In addition, these divergences are intimately connected with exponential families of probability distributions. Therefore, it is natural to study matrix approximation problems with respect to Bregman divergences. This article proposes a framework for studying these problems, discusses some specific matrix nearness problems, and provides algorithms for solving them numerically. These algorithms apply to many classical and novel problems, and they admit a striking geometric interpretation.

Publication: SIAM Journal on Matrix Analysis and Applications Vol.: 29 No.: 4 ISSN: 0895-4798

ID: CaltechAUTHORS:DHIsiamjmaa07

]]>

Abstract: This paper studies a difficult and fundamental problem that arises throughout electrical engineering, applied mathematics, and statistics. Suppose that one forms a short linear combination of elementary signals drawn from a large, fixed collection. Given an observation of the linear combination that has been contaminated with additive noise, the goal is to identify which elementary signals participated and to approximate their coefficients. Although many algorithms have been proposed, there is little theory which guarantees that these algorithms can accurately and efficiently solve the problem. This paper studies a method called convex relaxation, which attempts to recover the ideal sparse signal by solving a convex program. This approach is powerful because the optimization can be completed in polynomial time with standard scientific software. The paper provides general conditions which ensure that convex relaxation succeeds. As evidence of the broad impact of these results, the paper describes how convex relaxation can be used for several concrete signal recovery problems. It also describes applications to channel coding, linear regression, and numerical analysis.

Publication: IEEE Transactions on Information Theory Vol.: 52 No.: 3 ISSN: 0018-9448

ID: CaltechAUTHORS:TROieeetit06

]]>

Abstract: In this paper, we present new algorithms that can replace the diagonal entries of a Hermitian matrix by any set of diagonal entries that majorize the original set without altering the eigenvalues of the matrix. They perform this feat by applying a sequence of (N-1) or fewer plane rotations, where N is the dimension of the matrix. Both the Bendel-Mickey and the Chan-Li algorithms are special cases of the proposed procedures. Using the fact that a positive semidefinite matrix can always be factored as X^*X, we also provide more efficient versions of the algorithms that can directly construct factors with specified singular values and column norms. We conclude with some open problems related to the construction of Hermitian matrices with joint diagonal and spectral properties.

Publication: SIAM Journal on Matrix Analysis and Applications Vol.: 27 No.: 1 ISSN: 0895-4798

ID: CaltechAUTHORS:DHIsiamjmaa05

]]>

Abstract: This note provides a condition under which ℓ1 minimization (also known as basis pursuit) can recover short linear combinations of complex vectors chosen from a fixed, overcomplete collection. This condition has already been established in the real setting by Fuchs, who used convex analysis. The proof given here is more direct.
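In the real case, ℓ1 minimization subject to linear measurement constraints reduces to a linear program via the standard positive/negative split x = p − q. The sketch below illustrates that reduction with SciPy's general-purpose LP solver; it is a real-valued illustration of basis pursuit, not the complex setting or the proof technique of the note.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, u):
    """Basis pursuit as a linear program.

    min ||x||_1  s.t.  Phi @ x = u   becomes, with x = p - q and
    p, q >= 0:   min 1'p + 1'q  s.t.  [Phi, -Phi] [p; q] = u.
    """
    m, n = Phi.shape
    res = linprog(c=np.ones(2 * n),
                  A_eq=np.hstack([Phi, -Phi]),
                  b_eq=u,
                  bounds=[(0, None)] * (2 * n))
    return res.x[:n] - res.x[n:]

# Example: recover a 2-sparse vector from 12 random measurements.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((12, 30))
x_true = np.zeros(30)
x_true[[3, 17]] = [1.0, -2.0]
x_hat = basis_pursuit(Phi, Phi @ x_true)
```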

Publication: IEEE Transactions on Information Theory Vol.: 51 No.: 4 ISSN: 0018-9448

ID: CaltechAUTHORS:TROieeetit05b

]]>

Abstract: Tight frames, also known as general Welch-bound-equality sequences, generalize orthonormal systems. Numerous applications (including communications, coding, and sparse approximation) require finite-dimensional tight frames that possess additional structural properties. This paper proposes an alternating projection method that is versatile enough to solve a huge class of inverse eigenvalue problems (IEPs), which includes the frame design problem. To apply this method, one needs only to solve a matrix nearness problem that arises naturally from the design specifications. Therefore, it is fast and easy to develop versions of the algorithm that target new design problems. Alternating projection will often succeed even if algebraic constructions are unavailable. To demonstrate that alternating projection is an effective tool for frame design, the paper studies some important structural properties in detail. First, it addresses the most basic design problem: constructing tight frames with prescribed vector norms. Then, it discusses equiangular tight frames, which are natural dictionaries for sparse approximation. Finally, it examines tight frames whose individual vectors have low peak-to-average-power ratio (PAR), which is a valuable property for code-division multiple-access (CDMA) applications. Numerical experiments show that the proposed algorithm succeeds in each of these three cases. The appendices investigate the convergence properties of the algorithm.
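The alternating projection idea in this abstract can be sketched for the most basic design problem it mentions: a unit-norm tight frame. The code alternates between the structural constraint (unit-norm columns) and the spectral constraint (tightness, i.e., all singular values equal to sqrt(n/d)). This is a simplified sketch under a fixed random start, not the paper's full algorithm with its nearness-problem machinery.

```python
import numpy as np

def unit_norm_tight_frame(d, n, iters=500, seed=0):
    """Alternating projection sketch for unit-norm tight frame design.

    Structural step: normalize each column of F.
    Spectral step: project onto tight frames by resetting every
    singular value of F to sqrt(n/d), so F @ F.T = (n/d) * I exactly.
    """
    rng = np.random.default_rng(seed)
    F = rng.standard_normal((d, n))
    for _ in range(iters):
        F = F / np.linalg.norm(F, axis=0)          # unit-norm columns
        U, _, Vt = np.linalg.svd(F, full_matrices=False)
        F = np.sqrt(n / d) * U @ Vt                 # nearest tight frame
    return F
```

Ending on the spectral step makes the tightness condition hold exactly, while the column norms converge toward one as the iteration proceeds.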

Publication: IEEE Transactions on Information Theory Vol.: 51 No.: 1 ISSN: 0018-9448

ID: CaltechAUTHORS:TROieeetit05a

]]>

Abstract: A description of optimal sequences for direct-spread code-division multiple access (DS-CDMA) is a byproduct of recent characterizations of the sum capacity. This paper restates the sequence design problem as an inverse singular value problem and shows that the problem can be solved with finite-step algorithms from matrix theory. It proposes a new one-sided algorithm that is numerically stable and faster than previous methods.

Publication: IEEE Transactions on Information Theory Vol.: 50 No.: 11 ISSN: 0018-9448

ID: CaltechAUTHORS:TROieeetit04b

]]>

Abstract: This article presents new results on using a greedy algorithm, orthogonal matching pursuit (OMP), to solve the sparse approximation problem over redundant dictionaries. It provides a sufficient condition under which both OMP and Donoho's basis pursuit (BP) paradigm can recover the optimal representation of an exactly sparse signal. It leverages this theory to show that both OMP and BP succeed for every sparse input signal from a wide class of dictionaries. These quasi-incoherent dictionaries offer a natural generalization of incoherent dictionaries, and the cumulative coherence function is introduced to quantify the level of incoherence. This analysis unifies all the recent results on BP and extends them to OMP. Furthermore, the paper develops a sufficient condition under which OMP can identify atoms from an optimal approximation of a nonsparse signal. From there, it argues that OMP is an approximation algorithm for the sparse problem over a quasi-incoherent dictionary. That is, for every input signal, OMP calculates a sparse approximant whose error is only a small factor worse than the minimal error that can be attained with the same number of terms.
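The greedy procedure analyzed in this abstract is short enough to sketch directly. This is a textbook-style NumPy illustration of OMP, not the paper's analysis setting: each step selects the atom most correlated with the residual, then re-fits all selected atoms by least squares so the residual stays orthogonal to their span (which also prevents an atom from being selected twice).

```python
import numpy as np

def omp(Phi, u, m):
    """Orthogonal matching pursuit sketch: m greedy selection steps."""
    support, r = [], u.astype(float)
    for _ in range(m):
        # Atom most correlated with the current residual.
        support.append(int(np.argmax(np.abs(Phi.T @ r))))
        # Re-solve least squares over all atoms chosen so far.
        coef = np.linalg.lstsq(Phi[:, support], u, rcond=None)[0]
        r = u - Phi[:, support] @ coef
    x = np.zeros(Phi.shape[1])
    x[support] = coef
    return x

# Example: recover a 4-sparse representation from 64 random measurements.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((64, 256)) / np.sqrt(64)
x_true = np.zeros(256)
x_true[rng.choice(256, 4, replace=False)] = rng.standard_normal(4)
x_hat = omp(Phi, Phi @ x_true, m=4)
```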

Publication: IEEE Transactions on Information Theory Vol.: 50 No.: 10 ISSN: 0018-9448

ID: CaltechAUTHORS:TROieeetit04a

]]>