Abstract: These lecture notes document ACM 204 as taught in Winter 2022, and they are primarily intended as a reference for students who have taken the class. The notes are prepared by student scribes with feedback from the instructor. The notes have been edited by the instructor to try to correct his own failures of presentation. All remaining errors and omissions are the fault of the instructor. Please be aware that these notes reflect material presented in a classroom, rather than a formal scholarly publication. In some places, the notes may lack appropriate citations to the literature. There is no claim that the arrangement or presentation of the material is primarily due to the instructor. The notes also contain the projects of students who wished to share their work. They received feedback and made revisions, but the projects have not been edited. They represent the students’ individual work.

No.: 2022-01
ID: CaltechAUTHORS:20220412-221559139

]]>

Abstract: Randomized block Krylov subspace methods form a powerful class of algorithms for computing the extreme eigenvalues of a symmetric matrix or the extreme singular values of a general matrix. The purpose of this paper is to develop new theoretical bounds on the performance of randomized block Krylov subspace methods for these problems. For matrices with polynomial spectral decay, the randomized block Krylov method can obtain an accurate spectral norm estimate using only a constant number of steps (that depends on the decay rate and the accuracy). Furthermore, the analysis reveals that the behavior of the algorithm depends in a delicate way on the block size. Numerical evidence confirms these predictions.

Publication: Numerische Mathematik Vol.: 150 No.: 1 ISSN: 0029-599X

ID: CaltechAUTHORS:20220107-890431800

]]>

Abstract: This paper develops a new class of algorithms for general linear systems and eigenvalue problems. These algorithms apply fast randomized sketching to accelerate subspace projection methods, such as GMRES and Rayleigh–Ritz. This approach offers great flexibility in designing the basis for the approximation subspace, which can improve scalability in many computational environments. The resulting algorithms outperform the classic methods with minimal loss of accuracy. For model problems, numerical experiments show large advantages over MATLAB's optimized routines, including a 100× speedup over gmres and a 10× speedup over eigs.

No.: 2021-01
ID: CaltechAUTHORS:20220909-161413582

]]>

Abstract: This paper develops a new storage-optimal algorithm that provably solves almost all semidefinite programs (SDPs). This method is particularly effective for weakly constrained SDPs under appropriate regularity conditions. The key idea is to formulate an approximate complementarity principle: Given an approximate solution to the dual SDP, the primal SDP has an approximate solution whose range is contained in the eigenspace with small eigenvalues of the dual slack matrix. For weakly constrained SDPs, this eigenspace has very low dimension, so this observation significantly reduces the search space for the primal solution. This result suggests an algorithmic strategy that can be implemented with minimal storage: (1) solve the dual SDP approximately; (2) compress the primal SDP to the eigenspace with small eigenvalues of the dual slack matrix; (3) solve the compressed primal SDP. The paper also provides numerical experiments showing that this approach is successful for a range of interesting large-scale SDPs.

Publication: SIAM Journal on Optimization Vol.: 31 No.: 4 ISSN: 1052-6234

ID: CaltechAUTHORS:20220204-680252000

]]>

Abstract: We develop an approach to recover the underlying properties of fluid-dynamical processes from sparse measurements. We are motivated by the task of imaging the stochastically evolving environment surrounding black holes, and demonstrate how flow parameters can be estimated from sparse interferometric measurements used in radio astronomical imaging. To model the stochastic flow we use spatio-temporal Gaussian Random Fields (GRFs). The high dimensionality of the underlying source video makes direct representation via a GRF’s full covariance matrix intractable. In contrast, stochastic partial differential equations are able to capture correlations at multiple scales by specifying only local interaction coefficients. Our approach estimates the coefficients of a space-time diffusion equation that dictates the stationary statistics of the dynamical process. We analyze our approach on realistic simulations of black hole evolution and demonstrate its advantage over state-of-the-art dynamic black hole imaging techniques.

ID: CaltechAUTHORS:20220307-188412000

]]>

Abstract: This paper develops nonasymptotic growth and concentration bounds for a product of independent random matrices. These results sharpen and generalize recent work of Henriksen–Ward, and they are similar in spirit to the results of Ahlswede–Winter and of Tropp for a sum of independent random matrices. The argument relies on the uniform smoothness properties of the Schatten trace classes.

Publication: Foundations of Computational Mathematics ISSN: 1615-3375

ID: CaltechAUTHORS:20201218-154434116

]]>

Abstract: This paper deduces exponential matrix concentration from a Poincaré inequality via a short, conceptual argument. Among other examples, this theory applies to matrix-valued functions of a uniformly log-concave random vector. The proof relies on the subadditivity of Poincaré inequalities and a chain rule inequality for the trace of the matrix Dirichlet form. It also uses a symmetrization technique to avoid difficulties associated with a direct extension of the classic scalar argument.

Publication: Bernoulli Vol.: 27 No.: 3 ISSN: 1350-7265

ID: CaltechAUTHORS:20201218-154430753

]]>

Abstract: Random projections reduce the dimension of a set of vectors while preserving structural information, such as distances between vectors in the set. This paper proposes a novel use of row-product random matrices in random projection, which we call the Tensor Random Projection (TRP). It requires substantially less memory than existing dimension reduction maps. The TRP map is formed as the Khatri-Rao product of several smaller random projections, and it is compatible with any base random projection, including sparse maps, which enable dimension reduction with very low query cost and no floating point operations. We also develop a reduced-variance extension. We provide a theoretical analysis of the bias and variance of the TRP, and a non-asymptotic error analysis for a TRP composed of two smaller maps. Experiments on both synthetic and MNIST data show that our method performs as well as conventional methods with substantially less storage.
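The construction is compact enough to sketch. In the snippet below (a minimal illustration with assumed dimensions, not code from the paper), row i of the TRP map is the Kronecker product of row i of two smaller Gaussian maps, so the full k × d₁d₂ matrix never needs to be stored independently.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, k = 20, 30, 50                      # factor dims (d = d1*d2), sketch size
A = rng.standard_normal((k, d1)) / np.sqrt(k)  # scaling makes the map unbiased
B = rng.standard_normal((k, d2))

# Row-wise Khatri-Rao product: row i of Omega is kron(A[i], B[i]).
Omega = np.einsum('ki,kj->kij', A, B).reshape(k, d1 * d2)

x = rng.standard_normal(d1 * d2)
y = Omega @ x                                # sketch of x; E||y||^2 = ||x||^2
```

In practice one would apply the map without materializing Omega, by reshaping x into a d₁ × d₂ matrix and contracting against A and B directly; the explicit form above is just for clarity.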

Publication: arXiv
ID: CaltechAUTHORS:20210621-223135493

]]>

Abstract: ACM 217 is a second-year graduate course on high-dimensional probability, designed for students in computing and mathematical sciences. We discuss phenomena that emerge from probability models with many degrees of freedom, tools for working with these models, and a selection of applications to computational mathematics. The Winter 2021 edition of ACM 217 is the fourth instantiation of a class that initially focused on concentration inequalities and that has expanded to include other topics in high-dimensional probability. This year, the course was more mathematical than some previous editions, with less attention to tools and applications. This slant may not serve applied students well, and it is likely that future versions of the course will strike a different balance between theory and practice. These lecture notes document ACM 217 as it was taught in Winter 2021. The notes are being transcribed by the students as part of their coursework, and they are edited lightly by the instructor. They are intended as a record for the students who have taken the course. Other readers should beware that this course is neither refined nor especially coherent. There is no warranty about correctness. Furthermore, these notes have been prepared using many sources and without appropriate scholarly citations.

No.: 2021-01
ID: CaltechAUTHORS:20220412-221302767

]]>

Abstract: Matrix concentration inequalities provide information about the probability that a random matrix is close to its expectation with respect to the ℓ₂ operator norm. This paper uses semigroup methods to derive sharp nonlinear matrix inequalities. The main result is that the classical Bakry–Émery curvature criterion implies subgaussian concentration for “matrix Lipschitz” functions. This argument circumvents the need to develop a matrix version of the log-Sobolev inequality, a technical obstacle that has blocked previous attempts to derive matrix concentration inequalities in this setting. The approach unifies and extends much of the previous work on matrix concentration. When applied to a product measure, the theory reproduces the matrix Efron–Stein inequalities due to Paulin et al. It also handles matrix-valued functions on a Riemannian manifold with uniformly positive Ricci curvature.

Publication: Electronic Journal of Probability Vol.: 26 ISSN: 1083-6489

ID: CaltechAUTHORS:20201218-154427353

]]>

Abstract: Quantum simulation has wide applications in quantum chemistry and physics. Recently, scientists have begun exploring the use of randomized methods for accelerating quantum simulation. Among them, a simple and powerful technique, called qDRIFT, is known to generate random product formulas for which the average quantum channel approximates the ideal evolution. This work provides a comprehensive analysis of a single realization of the random product formula produced by qDRIFT. The main results prove that a typical realization of the randomized product formula approximates the ideal unitary evolution up to a small diamond-norm error. The gate complexity is independent of the number of terms in the Hamiltonian, but it depends on the system size and the sum of the interaction strengths in the Hamiltonian. Remarkably, the same random evolution starting from an arbitrary, but fixed, input state yields a much shorter circuit suitable for that input state. If the observable is also fixed, the same random evolution provides an even shorter product formula. The proofs depend on concentration inequalities for vector and matrix martingales. Numerical experiments verify the theoretical predictions.

Publication: arXiv
ID: CaltechAUTHORS:20201218-154423869

]]>

Abstract: The intrinsic volumes are measures of the content of a convex body. This paper applies probabilistic and information-theoretic methods to study the sequence of intrinsic volumes. The main result states that the intrinsic volume sequence concentrates sharply around a specific index, called the central intrinsic volume. Furthermore, among all convex bodies whose central intrinsic volume is fixed, an appropriately scaled cube has the intrinsic volume sequence with maximum entropy.

Publication: Geometric Aspects of Functional Analysis No.: 2266 ISSN: 0075-8434

ID: CaltechAUTHORS:20200821-083421883

]]>

Abstract: The intrinsic volumes of a convex body are fundamental invariants that capture information about the average volume of the projection of the convex body onto a random subspace of fixed dimension. The intrinsic volumes also play a central role in integral geometry formulas that describe how moving convex bodies interact. Recent work has demonstrated that the sequence of intrinsic volumes concentrates sharply around its centroid, which is called the central intrinsic volume. The purpose of this paper is to derive finer concentration inequalities for the intrinsic volumes and related sequences. These concentration results have striking implications for high-dimensional integral geometry. In particular, they uncover new phase transitions in formulas for random projections, rotation means, random slicing, and the kinematic formula. In each case, the location of the phase transition is determined by reducing each convex body to a single summary parameter.

No.: 2020-01
ID: CaltechAUTHORS:20220829-181401723

]]>

Abstract: Projected least squares is an intuitive and numerically cheap technique for quantum state tomography: compute the least-squares estimator and project it onto the space of states. The main result of this paper equips this point estimator with rigorous, non-asymptotic convergence guarantees expressed in terms of the trace distance. The estimator's sample complexity is comparable to the strongest convergence guarantees available in the literature and—in the case of the uniform POVM—saturates fundamental lower bounds. Numerical simulations support these competitive features.
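The projection step the abstract describes reduces to a standard computation: diagonalize the least-squares estimate and project its eigenvalues onto the probability simplex. A minimal NumPy sketch of that step (illustrative only; the function name and test input are assumptions, and the paper's full estimator also includes the least-squares stage):

```python
import numpy as np

def project_to_states(H):
    """Project a Hermitian matrix onto the density matrices (psd, trace one)
    in Frobenius norm: project the eigenvalues onto the probability simplex
    and reassemble with the same eigenvectors."""
    w, V = np.linalg.eigh(H)
    # Euclidean projection of w onto the simplex {p >= 0, sum(p) = 1}
    u = np.sort(w)[::-1]
    css = np.cumsum(u)
    rho = np.max(np.nonzero(u + (1 - css) / (np.arange(len(u)) + 1) > 0))
    theta = (1 - css[rho]) / (rho + 1)
    p = np.maximum(w + theta, 0)
    return (V * p) @ V.conj().T

# Example: a Hermitian estimate with a negative eigenvalue and trace 1.2
est = project_to_states(np.diag([0.9, 0.4, -0.1]))
```

The negative eigenvalue is clipped to zero and the remaining mass is redistributed so the output is a valid quantum state.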

Publication: Journal of Physics A: Mathematical and General Vol.: 53 No.: 20 ISSN: 0305-4470

ID: CaltechAUTHORS:20190212-160252658

]]>

Abstract: This survey describes probabilistic algorithms for linear algebraic computations, such as factorizing matrices and solving linear systems. It focuses on techniques that have a proven track record for real-world problems. The paper treats both the theoretical foundations of the subject and practical computational issues. Topics include norm estimation, matrix approximation by sampling, structured and unstructured random embeddings, linear regression problems, low-rank approximation, subspace iteration and Krylov methods, error estimation and adaptivity, interpolatory and CUR factorizations, Nyström approximation of positive semidefinite matrices, single-view (‘streaming’) algorithms, full rank-revealing factorizations, solvers for linear systems, and approximation of kernel matrices that arise in machine learning and in scientific computing.

Publication: Acta Numerica Vol.: 29 ISSN: 0962-4929

ID: CaltechAUTHORS:20201217-104322985

]]>

Abstract: ACM 204 is a graduate course on randomized algorithms for matrix computations. It was taught for the first time in Winter 2020. The course begins with Monte Carlo algorithms for trace estimation. This is a relatively simple setting that allows us to explore how randomness can be used for matrix computations. We continue with a discussion of the randomized power method and the Lanczos method for estimating the largest eigenvalue of a symmetric matrix. For these algorithms, the randomized starting point regularizes the trajectory of the iterations. The Lanczos iteration and randomized trace estimation fuse together in the stochastic Lanczos quadrature method for estimating the trace of a matrix function. Then we turn to Monte Carlo sampling methods for matrix approximation. This approach is justified by the matrix Bernstein inequality, a powerful tool for matrix approximation. As a simple example, we develop sampling methods for approximate matrix multiplication. In the next part of the course, we study random linear embeddings. These are random matrices that can reduce the dimension of a dataset while approximately preserving its geometry. First, we treat Gaussian embeddings in detail, and then we discuss structured embeddings that can be implemented using fewer computational resources. Afterward, we describe several ways to use random embeddings to solve over-determined least-squares problems. We continue with a detailed treatment of the randomized SVD algorithm, the most widely used technique from this area. We give a complete a priori analysis with detailed error bounds. Then we show how to modify this algorithm for the streaming setting, where the matrix is presented as a sequence of linear updates. Last, we show how to develop an effective algorithm for selecting influential columns and rows from a matrix to obtain skeleton or CUR factorizations. The next section of the course studies kernel matrices that arise in high-dimensional data analysis. We discuss positive-definite kernels and outline the computational issues associated with solving linear algebra problems involving kernels. We introduce random feature approximations and Nyström approximations based on randomized sampling. This area is still not fully developed. The last part of the course gives a complete presentation of the sparse Cholesky algorithm of Kyng & Sachdeva [KS16], including a full proof of correctness.
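The course's opening topic, Monte Carlo trace estimation, fits in a few lines. The following NumPy snippet is an illustration (the test matrix and sample count are arbitrary choices, not material from the notes) of the Girard–Hutchinson estimator, which averages quadratic forms over random sign vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
n, s = 100, 2000                       # dimension, number of random probes
G = rng.standard_normal((n, n))
A = G @ G.T                            # symmetric psd test matrix

# Girard-Hutchinson: E[z^T A z] = tr(A) when z has iid random-sign entries
Z = rng.choice([-1.0, 1.0], size=(n, s))
est = np.mean(np.sum(Z * (A @ Z), axis=0))
```

Each probe costs one matrix-vector product, which is why the method matters for implicit matrices, such as matrix functions accessed only through products.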

No.: 2020-01
ID: CaltechAUTHORS:20210421-101607288

]]>

Abstract: Semidefinite programming (SDP) is a powerful framework from convex optimization that has striking potential for data science applications. This paper develops a provably correct algorithm for solving large SDP problems by economizing on both the storage and the arithmetic costs. Numerical evidence shows that the method is effective for a range of applications, including relaxations of MaxCut, abstract phase retrieval, and quadratic assignment. Running on a laptop, the algorithm can handle SDP instances where the matrix variable has over 10¹³ entries.

Publication: arXiv
ID: CaltechAUTHORS:20201218-154437706

]]>

Abstract: This paper studies the problem of decomposing a low-rank positive-semidefinite matrix into symmetric factors with binary entries, either {±1} or {0,1}. This research answers fundamental questions about the existence and uniqueness of these decompositions. It also leads to tractable factorization algorithms that succeed under a mild deterministic condition. A companion paper addresses the related problem of decomposing a low-rank rectangular matrix into a binary factor and an unconstrained factor.

Publication: arXiv
ID: CaltechAUTHORS:20201218-154441081

]]>

Abstract: This paper studies the problem of decomposing a low-rank matrix into a factor with binary entries, either from {±1} or from {0,1}, and an unconstrained factor. The research answers fundamental questions about the existence and uniqueness of these decompositions. It also leads to tractable factorization algorithms that succeed under a mild deterministic condition. This work builds on a companion paper that addresses the related problem of decomposing a low-rank positive-semidefinite matrix into symmetric binary factors.

Publication: arXiv
ID: CaltechAUTHORS:20201218-154444454

]]>

Abstract: This paper argues that randomized linear sketching is a natural tool for on-the-fly compression of data matrices that arise from large-scale scientific simulations and data collection. The technical contribution consists in a new algorithm for constructing an accurate low-rank approximation of a matrix from streaming data. This method is accompanied by an a priori analysis that allows the user to set algorithm parameters with confidence and an a posteriori error estimator that allows the user to validate the quality of the reconstructed matrix. In comparison to previous techniques, the new method achieves smaller relative approximation errors and is less sensitive to parameter choices. As concrete applications, the paper outlines how the algorithm can be used to compress a Navier–Stokes simulation and a sea surface temperature dataset.

Publication: SIAM Journal on Scientific Computing Vol.: 41 No.: 4 ISSN: 1064-8275

ID: CaltechAUTHORS:20190920-085459909

]]>

Abstract: These lecture notes were written to support the short course Matrix Concentration & Computational Linear Algebra delivered by the author at École Normale Supérieure in Paris from 1–5 July 2019 as part of the summer school “High-dimensional probability and algorithms.” The aim of this course is to present some practical computational applications of matrix concentration.

No.: 2019-01
ID: CaltechAUTHORS:20190715-125341188

]]>

Abstract: ACM 204, Winter 2019

No.: 2019-02
ID: CaltechAUTHORS:20220412-220319430

]]>

Abstract: This paper describes new algorithms for constructing a low-rank approximation of an input matrix from a sketch, a random low-dimensional linear image of the matrix. These algorithms come with rigorous performance guarantees. Empirically, the proposed methods achieve significantly smaller relative errors than other approaches that have appeared in the literature. For a concrete application, the paper outlines how the algorithms support on-the-fly compression of data from a direct Navier-Stokes (DNS) simulation.

No.: 2018-01
ID: CaltechAUTHORS:20220826-183609942

]]>

Abstract: This paper concerns the facial geometry of the set of n×n correlation matrices. The main result states that almost every set of r vertices generates a simplicial face, provided that r ≤ √cn, where c is an absolute constant. This bound is qualitatively sharp because the set of correlation matrices has no simplicial face generated by more than √2n vertices.

Publication: Discrete and Computational Geometry Vol.: 60 No.: 2 ISSN: 0179-5376

ID: CaltechAUTHORS:20180103-154009197

]]>

Abstract: Randomized block Krylov subspace methods form a powerful class of algorithms for computing information about the spectrum of a matrix. The purpose of this paper is to develop new theoretical bounds on the performance of randomized block Krylov subspace methods for three related problems: estimating the spectral norm, computing a number of extreme eigenvalues and singular values, and constructing a low-rank approximation of a matrix. The results demonstrate that, for many matrices, it is possible to obtain accurate approximations using only a constant number of steps of the randomized block Krylov method. Furthermore, the analysis reveals that the behavior of the algorithm depends in a delicate way on the block size.

No.: 2018-02
ID: CaltechAUTHORS:20210624-180721369

]]>

Abstract: This paper describes a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image, or sketch, of the matrix. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by numerical experiments with real and synthetic data.
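The sketch-and-reconstruct pattern behind these methods can be illustrated in a few lines. The NumPy snippet below is a minimal sketch under assumed sketch sizes (the paper's actual algorithms, stabilization, and parameter recommendations differ in detail): two random linear images of the matrix are recorded, and a low-rank approximation is reconstructed from them alone.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, r = 100, 80, 5
k, l = 15, 25                     # sketch sizes; l > k for a well-posed solve
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r input

Omega = rng.standard_normal((n, k))
Psi = rng.standard_normal((l, m))
Y = A @ Omega                     # range sketch
W = Psi @ A                       # co-range sketch (both update linearly)

Q, _ = np.linalg.qr(Y)            # orthonormal basis for the captured range
X = np.linalg.lstsq(Psi @ Q, W, rcond=None)[0]
A_hat = Q @ X                     # low-rank approximation assembled from sketches
```

Because both sketches are linear in A, they can be maintained under streaming updates A ← A + H without ever storing A itself.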

Publication: SIAM Journal on Matrix Analysis and Applications Vol.: 38 No.: 4 ISSN: 0895-4798

ID: CaltechAUTHORS:20180111-134219270

]]>

Abstract: Several important applications, such as streaming PCA and semidefinite programming, involve a large-scale positive-semidefinite (psd) matrix that is presented as a sequence of linear updates. Because of storage limitations, it may only be possible to retain a sketch of the psd matrix. This paper develops a new algorithm for fixed-rank psd approximation from a sketch. The approach combines the Nyström approximation with a novel mechanism for rank truncation. Theoretical analysis establishes that the proposed method can achieve any prescribed relative error in the Schatten 1-norm and that it exploits the spectral decay of the input matrix. Computer experiments show that the proposed method dominates alternative techniques for fixed-rank psd matrix approximation across a wide range of examples.

ID: CaltechAUTHORS:20180829-073029156

]]>

Abstract: This recording is of the presentation “Sketchy decisions: convex optimization with optimal storage,” part of the SPIE symposium “SPIE Optical Engineering + Applications.”

No.: 10394
ID: CaltechAUTHORS:20190826-161640394

]]>

Abstract: Several important applications, such as streaming PCA and semidefinite programming, involve a large-scale positive-semidefinite (psd) matrix that is presented as a sequence of linear updates. Because of storage limitations, it may only be possible to retain a sketch of the psd matrix. This paper develops a new algorithm for fixed-rank psd approximation from a sketch. The approach combines the Nyström approximation with a novel mechanism for rank truncation. Theoretical analysis establishes that the proposed method can achieve any prescribed relative error in the Schatten 1-norm and that it exploits the spectral decay of the input matrix. Computer experiments show that the proposed method dominates alternative techniques for fixed-rank psd matrix approximation across a wide range of examples.

No.: 2017-03
ID: CaltechAUTHORS:20170620-081901312

]]>

Abstract: This paper concerns a fundamental class of convex matrix optimization problems. It presents the first algorithm that uses optimal storage and provably computes a low-rank approximation of a solution. In particular, when all solutions have low rank, the algorithm converges to a solution. This algorithm, SketchyCGM, modifies a standard convex optimization scheme, the conditional gradient method, to store only a small randomized sketch of the matrix variable. After the optimization terminates, the algorithm extracts a low-rank approximation of the solution from the sketch. In contrast to nonconvex heuristics, the guarantees for SketchyCGM do not rely on statistical models for the problem data. Numerical work demonstrates the benefits of SketchyCGM over heuristics.

ID: CaltechAUTHORS:20180828-145534045

]]>

Abstract: Demixing is the problem of identifying multiple structured signals from a superimposed, undersampled, and noisy observation. This work analyzes a general framework, based on convex optimization, for solving demixing problems. When the constituent signals follow a generic incoherence model, this analysis leads to precise recovery guarantees. These results admit an attractive interpretation: each signal possesses an intrinsic degrees-of-freedom parameter, and demixing can succeed if and only if the dimension of the observation exceeds the total degrees of freedom present in the observation.

No.: 2017-02
ID: CaltechAUTHORS:20170314-110228775

]]>

Abstract: A Mathematical Introduction to Compressive Sensing, by Simon Foucart and Holger Rauhut [FR13], is about sparse solutions to systems of random linear equations. To begin, let me describe some striking phenomena that take place in this context. Afterward, I shall try to explain why these facts have captivated so many researchers over the last decade. I shall conclude with some comments on the book.

Publication: Bulletin of the American Mathematical Society Vol.: 54 No.: 1 ISSN: 0273-0979

ID: CaltechAUTHORS:20170913-080352752

]]>

Abstract: This paper develops a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image of the matrix, called a sketch. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by computer experiments.

No.: 2017-01
ID: CaltechAUTHORS:20170215-154809329

]]>

Abstract: In contemporary applied and computational mathematics, a frequent challenge is to bound the expectation of the spectral norm of a sum of independent random matrices. This quantity is controlled by the norm of the expected square of the random matrix and the expectation of the maximum squared norm achieved by one of the summands; there is also a weak dependence on the dimension of the random matrix. The purpose of this paper is to give a complete, elementary proof of this important inequality.

No.: 71 ISSN: 1050-6977

ID: CaltechAUTHORS:20170214-075417526

]]>

Abstract: This paper establishes new concentration inequalities for random matrices constructed from independent random variables. These results are analogous to the generalized Efron–Stein inequalities developed by Boucheron et al. The proofs rely on the method of exchangeable pairs.

Publication: Annals of Probability Vol.: 44 No.: 5 ISSN: 0091-1798

ID: CaltechAUTHORS:20161103-151616710

]]>

Abstract: Dimension reduction is the process of embedding high-dimensional data into a lower dimensional space to facilitate its analysis. In the Euclidean setting, one fundamental technique for dimension reduction is to apply a random linear map to the data. This dimension reduction procedure succeeds when it preserves certain geometric features of the set. The question is how large the embedding dimension must be to ensure that randomized dimension reduction succeeds with high probability. This paper studies a natural family of randomized dimension reduction maps and a large class of data sets. It proves that there is a phase transition in the success probability of the dimension reduction map as the embedding dimension increases. For a given data set, the location of the phase transition is the same for all maps in this family. Furthermore, each map has the same stability properties, as quantified through the restricted minimum singular value. These results can be viewed as new universality laws in high-dimensional stochastic geometry. Universality laws for randomized dimension reduction have many applications in applied mathematics, signal processing, and statistics. They yield design principles for numerical linear algebra algorithms, for compressed sensing measurement ensembles, and for random linear codes. Furthermore, these results have implications for the performance of statistical estimation methods under a large class of random experimental designs.

Publication: arXiv
ID: CaltechAUTHORS:20180831-112137332

]]>

Abstract: This paper establishes that every positive-definite matrix can be written as a positive linear combination of outer products of integer-valued vectors whose entries are bounded by the geometric mean of the condition number and the dimension of the matrix.

Publication: SIAM Journal on Discrete Mathematics Vol.: 29 No.: 4 ISSN: 0895-4801

ID: CaltechAUTHORS:20160115-141425609

]]>

Abstract: This paper proposes a tradeoff between computational time, sample complexity, and statistical accuracy that applies to statistical estimators based on convex optimization. When we have a large amount of data, we can exploit excess samples to decrease statistical risk, to decrease computational cost, or to trade off between the two. We propose to achieve this tradeoff by varying the amount of smoothing applied to the optimization problem. This work uses regularized linear regression as a case study to argue for the existence of this tradeoff both theoretically and experimentally. We also apply our method to describe a tradeoff in an image interpolation problem.

Publication: IEEE Journal of Selected Topics in Signal Processing Vol.: 9 No.: 4 ISSN: 1932-4553

ID: CaltechAUTHORS:20150611-103104163

]]>

Abstract: Random matrices now play a role in many areas of theoretical, applied, and computational mathematics. Therefore, it is desirable to have tools for studying random matrices that are flexible, easy to use, and powerful. Over the last fifteen years, researchers have developed a remarkable family of results, called matrix concentration inequalities, that achieve all of these goals. This monograph offers an invitation to the field of matrix concentration inequalities. It begins with some history of random matrix theory; it describes a flexible model for random matrices that is suitable for many problems; and it discusses the most important matrix concentration results. To demonstrate the value of these techniques, the presentation includes examples drawn from statistics, machine learning, optimization, combinatorics, algorithms, scientific computing, and beyond.

Publication: Foundations and Trends in Machine Learning Vol.: 8 No.: 1-2 ISSN: 1935-8237

ID: CaltechAUTHORS:20150714-140245621

]]>

Abstract: Ptychography is a powerful computational imaging technique that transforms a collection of low-resolution images into a high-resolution sample reconstruction. Unfortunately, algorithms that currently solve this reconstruction problem lack stability, robustness, and theoretical guarantees. Recently, convex optimization algorithms have improved the accuracy and reliability of several related reconstruction efforts. This paper proposes a convex formulation of the ptychography problem. This formulation has no local minima, it can be solved using a wide range of algorithms, it can incorporate appropriate noise models, and it can include multiple a priori constraints. The paper considers a specific algorithm, based on low-rank factorization, whose runtime and memory usage are near-linear in the size of the output image. Experiments demonstrate that this approach offers a 25% lower background variance on average than alternating projections, the ptychographic reconstruction algorithm that is currently in widespread use.

Publication: New Journal of Physics Vol.: 17 No.: 5 ISSN: 1367-2630

ID: CaltechAUTHORS:20150619-160809918

]]>

Abstract: Matrix concentration inequalities give bounds for the spectral-norm deviation of a random matrix from its expected value. These results have a weak dimensional dependence that is sometimes, but not always, necessary. This paper identifies one of the sources of the dimensional term and exploits this insight to develop sharper matrix concentration inequalities. In particular, this analysis delivers two refinements of the matrix Khintchine inequality that use information beyond the matrix variance to reduce or eliminate the dimensional dependence.

Publication: arXiv
ID: CaltechAUTHORS:20180831-112133957

]]>

Abstract: Consider a data set of vector-valued observations that consists of noisy inliers, which are explained well by a low-dimensional subspace, along with some number of outliers. This work describes a convex optimization problem, called reaper, that can reliably fit a low-dimensional model to this type of data. This approach parameterizes linear subspaces using orthogonal projectors and uses a relaxation of the set of orthogonal projectors to reach the convex formulation. The paper provides an efficient algorithm for solving the reaper problem, and it documents numerical experiments that confirm that reaper can dependably find linear structure in synthetic and natural data. In addition, when the inliers lie near a low-dimensional subspace, there is a rigorous theory that describes when reaper can approximate this subspace.

Publication: Foundations of Computational Mathematics Vol.: 15 No.: 2 ISSN: 1615-3375

ID: CaltechAUTHORS:20150416-134303719

]]>

Abstract: This chapter develops a theoretical analysis of the convex programming method for recovering a structured signal from independent random linear measurements. This technique delivers bounds for the sampling complexity that are similar to recent results for standard Gaussian measurements, but the argument applies to a much wider class of measurement ensembles. To demonstrate the power of this approach, the chapter presents a short analysis of phase retrieval by trace-norm minimization. The key technical tool is a framework, due to Mendelson and coauthors, for bounding a nonnegative empirical process.

Vol.: I
ID: CaltechAUTHORS:20160818-084011981

]]>

Abstract: An analytical framework for studying the logarithmic region of turbulent channels is formulated. We build on recent findings (Moarref et al., J. Fluid Mech., 734, 2013) that the velocity fluctuations in the logarithmic region can be decomposed into a weighted sum of geometrically self-similar resolvent modes. The resolvent modes and the weights represent the linear amplification mechanisms and the scaling influence of the nonlinear interactions in the Navier-Stokes equations (NSE), respectively (McKeon & Sharma, J. Fluid Mech., 658, 2010). Originating from the NSE, this framework provides analytical support for Townsend’s attached-eddy model. Our main result is that self-similarity enables order reduction in modeling the logarithmic region by establishing a quantitative link between the self-similar structures and the velocity spectra. Specifically, the energy intensities, the Reynolds stresses, and the energy budget are expressed in terms of the resolvent modes with speeds corresponding to the top of the logarithmic region. The weights of the triad modes (the modes that directly interact via the quadratic nonlinearity in the NSE) are coupled via the interaction coefficients that depend solely on the resolvent modes (McKeon et al., Phys. Fluids, 25, 2013). We use the hierarchies of self-similar modes in the logarithmic region to extend the notion of triad modes to triad hierarchies. It is shown that the interaction coefficients for the triad modes that belong to a triad hierarchy follow an exponential function. The combination of these findings can be used to better understand the dynamics and interaction of flow structures in the logarithmic region. The compatibility of the proposed model with theoretical and experimental results is further discussed.

Publication: arXiv
ID: CaltechAUTHORS:20180831-112157832

]]>

Abstract: Recent research indicates that many convex optimization problems with random constraints exhibit a phase transition as the number of constraints increases. For example, this phenomenon emerges in the ℓ_1 minimization method for identifying a sparse vector from random linear measurements. Indeed, the ℓ_1 approach succeeds with high probability when the number of measurements exceeds a threshold that depends on the sparsity level; otherwise, it fails with high probability. This paper provides the first rigorous analysis that explains why phase transitions are ubiquitous in random convex optimization problems. It also describes tools for making reliable predictions about the quantitative aspects of the transition, including the location and the width of the transition region. These techniques apply to regularized linear inverse problems with random measurements, to demixing problems under a random incoherence model, and also to cone programs with random affine constraints. The applied results depend on foundational research in conic geometry. This paper introduces a summary parameter, called the statistical dimension, that canonically extends the dimension of a linear subspace to the class of convex cones. The main technical result demonstrates that the sequence of intrinsic volumes of a convex cone concentrates sharply around the statistical dimension. This fact leads to accurate bounds on the probability that a randomly rotated cone shares a ray with a fixed cone.
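
The statistical dimension has closed forms for simple cones; for the nonnegative orthant in R^d it equals d/2, since projecting a standard normal vector onto the orthant clamps negative coordinates to zero and each coordinate contributes E[max(g,0)^2] = 1/2. As an illustration only (not drawn from the paper), a minimal Monte Carlo sketch in Python/NumPy checks this value:

```python
import numpy as np

rng = np.random.default_rng(8)
d, trials = 20, 20000

# Statistical dimension: delta(C) = E || Pi_C(g) ||^2 for standard normal g.
# For the nonnegative orthant, the projection clamps negatives to zero.
g = rng.standard_normal((trials, d))
proj = np.maximum(g, 0.0)
delta_est = np.mean(np.sum(proj ** 2, axis=1))   # should be close to d / 2
```

For a linear subspace the same recipe returns the usual dimension, which is the sense in which the statistical dimension canonically extends it to cones.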

Publication: Information and Inference Vol.: 3 No.: 3 ISSN: 2049-8772

ID: CaltechAUTHORS:20150422-120051110

]]>

Abstract: Randomized matrix sparsification has proven to be a fruitful technique for producing faster algorithms in applications ranging from graph partitioning to semidefinite programming. In the decade or so of research into this technique, the focus has been—with few exceptions—on ensuring the quality of approximation in the spectral and Frobenius norms. For certain graph algorithms, however, the ∞→1 norm may be a more natural measure of performance. This paper addresses the problem of approximating a real matrix A by a sparse random matrix X with respect to several norms. It provides the first results on approximation error in the ∞→1 and ∞→2 norms, and it uses a result of Latała to study approximation error in the spectral norm. These bounds hold for a reasonable family of random sparsification schemes, those which ensure that the entries of X are independent and average to the corresponding entries of A. Optimality of the ∞→1 and ∞→2 error estimates is established. Concentration results for the three norms hold when the entries of X are uniformly bounded. The spectral error bound is used to predict the performance of several sparsification and quantization schemes that have appeared in the literature; the results are competitive with the performance guarantees given by earlier scheme-specific analyses.
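
The family of schemes covered by the analysis (independent entries that average to the corresponding entries of A) includes the simple Bernoulli keep-and-rescale rule sketched below; the sizes and parameters are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def sparsify(A, p):
    """Keep each entry independently with probability p and rescale by
    1/p, so the entries of X are independent with E[X] = A entrywise."""
    mask = rng.random(A.shape) < p
    return np.where(mask, A / p, 0.0)

A = rng.standard_normal((50, 50))
frac_nonzero = np.count_nonzero(sparsify(A, 0.3)) / A.size

# Unbiasedness check: the average of many draws approaches A entrywise.
acc = np.zeros_like(A)
for _ in range(3000):
    acc += sparsify(A, 0.3)
max_dev = np.abs(acc / 3000 - A).max()
```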

No.: 2014-01
ID: CaltechAUTHORS:20140828-082707636

]]>

Abstract: This work introduces the minimax Laplace transform method, a modification of the cumulant-based matrix Laplace transform method developed in [Tro11c] that yields both upper and lower bounds on each eigenvalue of a sum of random self-adjoint matrices. This machinery is used to derive eigenvalue analogs of the classical Chernoff, Bennett, and Bernstein bounds. Two examples demonstrate the efficacy of the minimax Laplace transform. The first concerns the effects of column sparsification on the spectrum of a matrix with orthonormal rows. Here, the behavior of the singular values can be described in terms of coherence-like quantities. The second example addresses the question of relative accuracy in the estimation of eigenvalues of the covariance matrix of a random process. Standard results on the convergence of sample covariance matrices provide bounds on the number of samples needed to obtain relative accuracy in the spectral norm, but these results only guarantee relative accuracy in the estimate of the maximum eigenvalue. The minimax Laplace transform argument establishes that if the lowest eigenvalues decay sufficiently fast, Ω(ε^(-2)κ^2_ℓ ℓ log p) samples, where κ_ℓ = λ_1(C)/λ_ℓ(C), are sufficient to ensure that the dominant ℓ eigenvalues of the covariance matrix of a N(0,C) random vector are estimated to within a factor of 1 ± ε with high probability.

No.: 2014-02
ID: CaltechAUTHORS:20140828-084239607

]]>

Abstract: The intrinsic volumes of a convex cone are geometric functionals that return basic structural information about the cone. Recent research has demonstrated that conic intrinsic volumes are valuable for understanding the behavior of random convex optimization problems. This paper develops a systematic technique for studying conic intrinsic volumes using methods from probability. At the heart of this approach is a general Steiner formula for cones. This result converts questions about the intrinsic volumes into questions about the projection of a Gaussian random vector onto the cone, which can then be resolved using tools from Gaussian analysis. The approach leads to new identities and bounds for the intrinsic volumes of a cone, including a near-optimal concentration inequality.

Publication: Discrete and Computational Geometry Vol.: 51 No.: 4 ISSN: 0179-5376

ID: CaltechAUTHORS:20140703-103651135

]]>

Abstract: Demixing refers to the challenge of identifying two structured signals given only the sum of the two signals and prior information about their structures. Examples include the problem of separating a signal that is sparse with respect to one basis from a signal that is sparse with respect to a second basis, and the problem of decomposing an observed matrix into a low-rank matrix plus a sparse matrix. This paper describes and analyzes a framework, based on convex optimization, for solving these demixing problems, and many others. This work introduces a randomized signal model that ensures that the two structures are incoherent, i.e., generically oriented. For an observation from this model, this approach identifies a summary statistic that reflects the complexity of a particular signal. The difficulty of separating two structured, incoherent signals depends only on the total complexity of the two structures. Some applications include (1) demixing two signals that are sparse in mutually incoherent bases, (2) decoding spread-spectrum transmissions in the presence of impulsive errors, and (3) removing sparse corruptions from a low-rank matrix. In each case, the theoretical analysis of the convex demixing method closely matches its empirical behavior.

Publication: Foundations of Computational Mathematics Vol.: 14 No.: 3 ISSN: 1615-3375

ID: CaltechAUTHORS:20140606-140701312

]]>

Abstract: We combine resolvent-mode decomposition with techniques from convex optimization to optimally approximate velocity spectra in a turbulent channel. The velocity is expressed as a weighted sum of resolvent modes that are dynamically significant, non-empirical, and scalable with Reynolds number. To optimally represent direct numerical simulation (DNS) data at friction Reynolds number 2003, we determine the weights of resolvent modes as the solution of a convex optimization problem. Using only 12 modes per wall-parallel wavenumber pair and temporal frequency, we obtain close agreement with the DNS spectra, reducing the wall-normal and temporal resolutions used in the simulation by three orders of magnitude.

Publication: Physics of Fluids Vol.: 26 No.: 5 ISSN: 1070-6631

ID: CaltechAUTHORS:20140519-153243814

]]>

Abstract: This paper derives exponential concentration inequalities and polynomial moment inequalities for the spectral norm of a random matrix. The analysis requires a matrix extension of the scalar concentration theory developed by Sourav Chatterjee using Stein’s method of exchangeable pairs. When applied to a sum of independent random matrices, this approach yields matrix generalizations of the classical inequalities due to Hoeffding, Bernstein, Khintchine and Rosenthal. The same technique delivers bounds for sums of dependent random matrices and more general matrix-valued functions of dependent random variables.

Publication: Annals of Probability Vol.: 42 No.: 3 ISSN: 0091-1798

ID: CaltechAUTHORS:20140605-070821101

]]>

Abstract: This paper considers a class of entropy functionals defined for random matrices, and it demonstrates that these functionals satisfy a subadditivity property. Several matrix concentration inequalities are derived as an application of this result.

Publication: Electronic Journal of Probability Vol.: 19 ISSN: 1083-6489

ID: CaltechAUTHORS:20140918-084717665

]]>

Abstract: The block Kaczmarz method is an iterative scheme for solving overdetermined least-squares problems. At each step, the algorithm projects the current iterate onto the solution space of a subset of the constraints. This paper describes a block Kaczmarz algorithm that uses a randomized control scheme to choose the subset at each step. This algorithm is the first block Kaczmarz method with an (expected) linear rate of convergence that can be expressed in terms of the geometric properties of the matrix and its submatrices. The analysis reveals that the algorithm is most effective when it is given a good row paving of the matrix, a partition of the rows into well-conditioned blocks. The operator theory literature provides detailed information about the existence and construction of good row pavings. Together, these results yield an efficient block Kaczmarz scheme that applies to many overdetermined least-squares problems.
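
A minimal sketch of the iteration in Python/NumPy, with an arbitrary equal-size row partition standing in for a proper row paving (all names and sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def block_kaczmarz(A, b, blocks, iters=1000):
    """At each step, project the current iterate onto the solution
    space of a randomly chosen block of rows."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        rows = blocks[rng.integers(len(blocks))]
        A_blk, b_blk = A[rows], b[rows]
        # Orthogonal projection onto the affine set {y : A_blk y = b_blk}.
        x = x + np.linalg.pinv(A_blk) @ (b_blk - A_blk @ x)
    return x

m, n = 60, 20
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true                              # consistent system
blocks = np.array_split(np.arange(m), 10)   # naive row partition
err = np.linalg.norm(block_kaczmarz(A, b, blocks) - x_true)
```

For Gaussian matrices a naive partition is already well conditioned; the point of the paper is that a good row paving guarantees this kind of behavior for general matrices.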

Publication: Linear Algebra and its Applications Vol.: 441 ISSN: 0024-3795

ID: CaltechAUTHORS:20140207-094442254

]]>

Abstract: Compressive sampling is a well-known tool for resolving the energetic content of signals that admit a sparse representation. The broadband temporal spectrum acquired from point measurements in wall-bounded turbulence has precluded the prior use of compressive sampling in this kind of flow; however, it is shown here that the frequency content of flow fields that have been Fourier transformed in the homogeneous spatial (wall-parallel) directions is approximately sparse, giving rise to a compact representation of the velocity field. As such, compressive sampling is an ideal tool for reducing the amount of information required to approximate the velocity field. Further, success of the compressive sampling approach provides strong evidence that this representation is both physically meaningful and indicative of special properties of wall turbulence. Another advantage of compressive sampling over periodic sampling becomes evident at high Reynolds numbers, since the number of samples required to resolve a given bandwidth with compressive sampling scales as the logarithm of the dynamically significant bandwidth instead of linearly for periodic sampling. The combination of the Fourier decomposition in the wall-parallel directions, the approximate sparsity in frequency, and empirical bounds on the convection velocity leads to a compact representation of an otherwise broadband distribution of energy in the space defined by streamwise and spanwise wavenumber, frequency, and wall-normal location. The data storage requirements for reconstruction of the full field using compressive sampling are shown to be significantly less than for periodic sampling, in which the Nyquist criterion limits the maximum frequency that can be resolved. Conversely, compressive sampling maximizes the frequency range that can be recovered if the number of samples is limited, resolving frequencies up to several times higher than the mean sampling rate.
It is proposed that the approximate sparsity in frequency and the corresponding structure in the spatial domain can be exploited to design simulation schemes for canonical wall turbulence with significantly reduced computational expense compared with current techniques.

Publication: Physics of Fluids Vol.: 26 No.: 1 ISSN: 1070-6631

ID: CaltechAUTHORS:20140320-104900351

]]>

Abstract: This paper proposes a tradeoff between sample complexity and computation time that applies to statistical estimators based on convex optimization. As the amount of data increases, we can smooth optimization problems more and more aggressively to achieve accurate estimates more quickly. This work provides theoretical and experimental evidence of this tradeoff for a class of regularized linear inverse problems.

No.: 27
ID: CaltechAUTHORS:20160401-170735760

]]>

Abstract: We study the Reynolds-number scaling and the geometric self-similarity of a gain-based, low-rank approximation to turbulent channel flows, determined by the resolvent formulation of McKeon & Sharma (J. Fluid Mech., vol. 658, 2010, pp. 336–382), in order to obtain a description of the streamwise turbulence intensity from direct consideration of the Navier–Stokes equations. Under this formulation, the velocity field is decomposed into propagating waves (with single streamwise and spanwise wavelengths and wave speed) whose wall-normal shapes are determined from the principal singular function of the corresponding resolvent operator. Using the accepted scalings of the mean velocity in wall-bounded turbulent flows, we establish that the resolvent operator admits three classes of wave parameters that induce universal behaviour with Reynolds number in the low-rank model, and which are consistent with scalings proposed throughout the wall turbulence literature. In addition, it is shown that a necessary condition for geometrically self-similar resolvent modes is the presence of a logarithmic turbulent mean velocity. Under the practical assumption that the mean velocity consists of a logarithmic region, we identify the scalings that constitute hierarchies of self-similar modes that are parameterized by the critical wall-normal location where the speed of the mode equals the local turbulent mean velocity. For the rank-1 model subject to broadband forcing, the integrated streamwise energy density takes a universal form which is consistent with the dominant near-wall turbulent motions. When the shape of the forcing is optimized to enforce matching with results from direct numerical simulations at low turbulent Reynolds numbers, further similarity appears. Representation of these weight functions using similarity laws enables prediction of the Reynolds number and wall-normal variations of the streamwise energy intensity at high Reynolds numbers (Re_τ ≈ 10^3–10^(10)).
Results from this low-rank model of the Navier–Stokes equations compare favourably with experimental results in the literature.

Publication: Journal of Fluid Mechanics Vol.: 734 ISSN: 0022-1120

ID: CaltechAUTHORS:20131121-132351952

]]>

Abstract: Demixing is the problem of identifying multiple structured signals from a superimposed, undersampled, and noisy observation. This work analyzes a general framework, based on convex optimization, for solving demixing problems. When the constituent signals follow a generic incoherence model, this analysis leads to precise recovery guarantees. These results admit an attractive interpretation: each signal possesses an intrinsic degrees-of-freedom parameter, and demixing can succeed if and only if the dimension of the observation exceeds the total degrees of freedom present in the observation.

Publication: arXiv
ID: CaltechAUTHORS:20180831-112130540

]]>

Abstract: This paper establishes the restricted isometry property for a Gabor system generated by n^2 time–frequency shifts of a random window function in n dimensions. The sth order restricted isometry constant of the associated n × n^2 Gabor synthesis matrix is small provided that s ≤ cn^(2/3) / log^2 n. This bound provides a qualitative improvement over previous estimates, which achieve only quadratic scaling of the sparsity s with respect to n. The proof depends on an estimate for the expected supremum of a second-order chaos.
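
The Gabor synthesis matrix in question collects all n^2 time-frequency shifts of the window as columns. A small illustrative construction in Python/NumPy (the window normalization here is an assumption, not the paper's exact probabilistic model):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 16
t = np.arange(n)

# Random complex window, normalized to the unit sphere.
g = rng.standard_normal(n) + 1j * rng.standard_normal(n)
g /= np.linalg.norm(g)

# Columns: cyclic time shift by k, then modulation at frequency l.
cols = [np.roll(g, k) * np.exp(2j * np.pi * l * t / n)
        for k in range(n) for l in range(n)]
Phi = np.column_stack(cols)            # the n x n^2 Gabor synthesis matrix
col_norms = np.linalg.norm(Phi, axis=0)
```

Every column is a shifted, modulated copy of the window, so all columns inherit the window's unit norm.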

Publication: Probability Theory and Related Fields Vol.: 156 No.: 3-4 ISSN: 0178-8051

ID: CaltechAUTHORS:20130815-130448316

]]>

Abstract: We evaluate the efficacy of a gain-based rank-1 model, developed by McKeon & Sharma (J. Fluid Mech., 2010), for representing the energy spectra and the streamwise/wall-normal co-spectrum in a turbulent channel. This is motivated by our previous observation that the streamwise turbulent energy intensity is well approximated by the rank-1 model subject to a broadband forcing in the wall-parallel directions and a properly selected temporal intensity. In the present study, the evaluation is based on finding the optimal forcing spectrum that minimizes the deviation between the two-dimensional velocity spectra at different wall-normal locations obtained from direct numerical simulations at friction Reynolds number 2003 (Hoyas & Jiménez, Phys. Fluids, 2006) and from the rank-1 model at equal Reynolds number. It is shown that the optimally forced rank-1 model captures the streamwise energy spectrum for streamwise wavelengths smaller than approximately 1000 viscous units throughout the channel. For larger wavelengths, the streamwise spectrum is matched in the outer region of the channel, i.e. wall-normal distances larger than approximately 0.15 times the channel half-height, and the mismatch close to the wall results in less than 5 percent error in the inner-scaled peak of the streamwise energy intensity. In addition, we show that the rank-1 model with optimal forcing captures the essential features of the wall-normal and spanwise spectra and the streamwise/wall-normal co-spectrum. We observe that the predicted magnitudes of the latter three spectra are smaller in the rank-1 model compared to the simulation results, suggesting that a higher-order or different rank-1 model may be necessary for accurate representation of these spectra.

ID: CaltechAUTHORS:20150218-094025300

]]>

Abstract: This paper derives exponential tail bounds and polynomial moment inequalities for the spectral norm deviation of a random matrix from its mean value. The argument depends on a matrix extension of Stein's method of exchangeable pairs for concentration of measure, as introduced by Chatterjee. Recent work of Mackey et al. uses these techniques to analyze random matrices with additive structure, while the enhancements in this paper cover a wider class of matrix-valued random elements. In particular, these ideas lead to a bounded differences inequality that applies to random matrices constructed from weakly dependent random variables. The proofs require novel trace inequalities that may be of independent interest.

Publication: arXiv
ID: CaltechAUTHORS:20180831-112127106

]]>

Abstract: This note demonstrates that it is possible to bound the expectation of an arbitrary norm of a random matrix drawn from the Stiefel manifold in terms of the expected norm of a standard Gaussian matrix with the same dimensions. A related comparison holds for any convex function of a random matrix drawn from the Stiefel manifold. For certain norms, a reversed inequality is also valid.

Publication: Probability Theory and Related Fields Vol.: 153 No.: 3-4 ISSN: 0178-8051

ID: CaltechAUTHORS:20120820-074343297

]]>

Abstract: This paper presents new probability inequalities for sums of independent, random, self-adjoint matrices. These results place simple and easily verifiable hypotheses on the summands, and they deliver strong conclusions about the large-deviation behavior of the maximum eigenvalue of the sum. Tail bounds for the norm of a sum of random rectangular matrices follow as an immediate corollary. The proof techniques also yield some information about matrix-valued martingales. In other words, this paper provides noncommutative generalizations of the classical bounds associated with the names Azuma, Bennett, Bernstein, Chernoff, Hoeffding, and McDiarmid. The matrix inequalities promise the same diversity of application, ease of use, and strength of conclusion that have made the scalar inequalities so valuable.

Publication: Foundations of Computational Mathematics Vol.: 12 No.: 4 ISSN: 1615-3375

ID: CaltechAUTHORS:20120821-072332716

]]>

Abstract: This paper provides a succinct proof of a 1973 theorem of Lieb that establishes the concavity of a certain trace function. The development relies on a deep result from quantum information theory, the joint convexity of quantum relative entropy, as well as a recent argument due to Carlen and Lieb.

Publication: Proceedings of the American Mathematical Society Vol.: 140 No.: 5 ISSN: 0002-9939

ID: CaltechAUTHORS:20120515-094709707

]]>

Abstract: In the theory of compressed sensing, restricted isometry analysis has become a standard tool for studying how efficiently a measurement matrix acquires information about sparse and compressible signals. Many recovery algorithms are known to succeed when the restricted isometry constants of the sampling matrix are small. Many potential applications of compressed sensing involve a data-acquisition process that proceeds by convolution with a random pulse followed by (nonrandom) subsampling. At present, the theoretical analysis of this measurement technique is lacking. This paper demonstrates that the sth-order restricted isometry constant is small when the number m of samples satisfies m ≳ (s log n)^(3/2), where n is the length of the pulse. This bound improves on previous estimates, which exhibit quadratic scaling.
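
The measurement process, convolution with a random pulse followed by nonrandom subsampling, can be applied in O(n log n) time via the FFT. A minimal sketch (sizes and the equispaced sampling pattern are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 128, 32

pulse = rng.choice([-1.0, 1.0], size=n)   # random +-1 pulse of length n
rows = np.arange(0, n, n // m)            # nonrandom, equispaced subsampling

def measure(x):
    """Circular convolution with the pulse via the FFT, keep m samples."""
    y = np.fft.ifft(np.fft.fft(pulse) * np.fft.fft(x)).real
    return y[rows]

# Sanity check: the fast map agrees with the subsampled circulant matrix.
C = np.column_stack([np.roll(pulse, j) for j in range(n)])
x = rng.standard_normal(n)
dev = np.linalg.norm(measure(x) - C[rows] @ x)
```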

Publication: Applied and Computational Harmonic Analysis Vol.: 32 No.: 2 ISSN: 1063-5203

ID: CaltechAUTHORS:20120302-134838002

]]>

Abstract: Covariance estimation becomes challenging in the regime where the number p of variables outstrips the number n of samples available to construct the estimate. One way to circumvent this problem is to assume that the covariance matrix is nearly sparse and to focus on estimating only the significant entries. To analyze this approach, Levina and Vershynin (2011) introduce a formalism called masked covariance estimation, where each entry of the sample covariance estimator is reweighted to reflect an a priori assessment of its importance. This paper provides a new analysis of the masked sample covariance estimator based on the matrix Laplace transform method. The main result applies to general subgaussian distributions. Specialized to the case of a Gaussian distribution, the theory offers qualitative improvements over earlier work. For example, the new results show that n = O(B log^2 p) samples suffice to estimate a banded covariance matrix with bandwidth B up to a relative spectral-norm error, in contrast to the sample complexity n = O(B log^5 p) obtained by Levina and Vershynin.

No.: 2012-01
ID: CaltechAUTHORS:20120411-102106234

]]>

Abstract: This paper describes a new approach, based on linear programming, for computing nonnegative matrix factorizations (NMFs). The key idea is a data-driven model for the factorization where the most salient features in the data are used to express the remaining features. More precisely, given a data matrix X, the algorithm identifies a matrix C that satisfies X ≈ CX and some linear constraints. The constraints are chosen to ensure that the matrix C selects features; these features can then be used to find a low-rank NMF of X. A theoretical analysis demonstrates that this approach has guarantees similar to those of the recent NMF algorithm of Arora et al. (2012). In contrast with this earlier work, the proposed method extends to more general noise models and leads to efficient, scalable algorithms. Experiments with synthetic and real datasets provide evidence that the new approach is also superior in practice. An optimized C++ implementation can factor a multigigabyte matrix in a matter of minutes.

No.: 25
ID: CaltechAUTHORS:20160401-165447853

]]>

Abstract: Covariance estimation becomes challenging in the regime where the number p of variables outstrips the number n of samples available to construct the estimate. One way to circumvent this problem is to assume that the covariance matrix is nearly sparse and to focus on estimating only the significant entries. To analyze this approach, Levina and Vershynin (2011) introduce a formalism called masked covariance estimation, where each entry of the sample covariance estimator is reweighted to reflect an a priori assessment of its importance. This paper provides a short analysis of the masked sample covariance estimator by means of a matrix concentration inequality. The main result applies to general distributions with at least four moments. Specialized to the case of a Gaussian distribution, the theory offers qualitative improvements over earlier work. For example, the new results show that n = O(B log^2 p) samples suffice to estimate a banded covariance matrix with bandwidth B up to a relative spectral-norm error, in contrast to the sample complexity n = O(B log^5 p) obtained by Levina and Vershynin.

Publication: arXiv
ID: CaltechAUTHORS:20180831-112123699

]]>

Abstract: This paper describes a new thresholding technique for constructing sparse principal components. Large-scale implementation issues are addressed, and a mathematical analysis describes situations where the algorithm is effective. In experiments, this method compares favorably with more sophisticated algorithms.
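
One simple thresholding scheme of this flavor (an illustrative rule, not necessarily the paper's exact algorithm) computes the leading eigenvector of the sample covariance and keeps only its largest entries:

```python
import numpy as np

rng = np.random.default_rng(5)
p, n, k = 100, 500, 10

# Planted sparse component supported on the first k coordinates.
v = np.zeros(p)
v[:k] = 1 / np.sqrt(k)
X = rng.standard_normal((n, p)) + 3.0 * rng.standard_normal((n, 1)) * v
S = X.T @ X / n                          # sample covariance

w = np.linalg.eigh(S)[1][:, -1]          # leading eigenvector
idx = np.argsort(np.abs(w))[-k:]         # keep the k largest entries
v_hat = np.zeros(p)
v_hat[idx] = w[idx]
v_hat /= np.linalg.norm(v_hat)
overlap = abs(v_hat @ v)                 # alignment with the planted component
```

On this spiked instance the signal is strong, so the thresholded vector aligns closely with the planted sparse direction while being exactly k-sparse.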

No.: 2011-02
ID: CaltechAUTHORS:20220826-185558571

]]>

Abstract: Freedman's inequality is a martingale counterpart to Bernstein's inequality. This result shows that the large-deviation behavior of a martingale is controlled by the predictable quadratic variation and a uniform upper bound for the martingale difference sequence. Oliveira has recently established a natural extension of Freedman's inequality that provides tail bounds for the maximum singular value of a matrix-valued martingale. This note describes a different proof of the matrix Freedman inequality that depends on a deep theorem of Lieb from matrix analysis. This argument delivers sharp constants in the matrix Freedman inequality, and it also yields tail bounds for other types of matrix martingales. The new techniques are adapted from recent work by the present author.

Publication: Electronic Communications in Probability Vol.: 16 ISSN: 1083-589X

ID: CaltechAUTHORS:20120105-135621556

]]>

Abstract: Freedman's inequality is a martingale counterpart to Bernstein's inequality. This result shows that the large-deviation behavior of a martingale is controlled by the predictable quadratic variation and a uniform upper bound for the martingale difference sequence. Oliveira has recently established a natural extension of Freedman's inequality that provides tail bounds for the maximum singular value of a matrix-valued martingale. This note describes a different proof of the matrix Freedman inequality that depends on a deep theorem of Lieb from matrix analysis. This argument delivers sharp constants in the matrix Freedman inequality, and it also yields tail bounds for other types of matrix martingales. The new techniques are adapted from recent work by the present author.

Publication: Electronic Communications in Probability Vol.: 16 No.: 25 ISSN: 1083-589X

ID: CaltechAUTHORS:20110606-111746280

]]>

Abstract: This paper presents an improved analysis of a structured dimension-reduction map called the subsampled randomized Hadamard transform. This argument demonstrates that the map preserves the Euclidean geometry of an entire subspace of vectors. The new proof is much simpler than previous approaches, and it offers — for the first time — optimal constants in the estimate on the number of dimensions required for the embedding.
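As a concrete illustration of the map analyzed here, the following is a minimal numerical sketch of a subsampled randomized Hadamard transform: random signs, an orthonormal Walsh-Hadamard transform, then a uniform row subsample. The function name `srht`, the seed, and the parameter choices are illustrative, not from the paper.

```python
import numpy as np

def srht(X, r, seed=0):
    """Sketch of a subsampled randomized Hadamard transform applied to
    the rows of X (the row count must be a power of two)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Walsh-Hadamard matrix via Sylvester's recursion, normalized to be orthonormal.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    H /= np.sqrt(n)
    D = rng.choice([-1.0, 1.0], size=n)          # random diagonal signs
    rows = rng.choice(n, size=r, replace=False)  # uniform row subsample
    # Rescaling makes the map an approximate isometry on low-dimensional subspaces.
    return np.sqrt(n / r) * (H @ (D[:, None] * X))[rows]

# Euclidean norms of vectors in a 5-dimensional subspace are nearly preserved.
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((1024, 5)))  # orthonormal basis, unit columns
Y = srht(U, r=256)
print(np.linalg.norm(Y[:, 0]))  # close to 1
```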

Publication: Advances in Adaptive Data Analysis Vol.: 3 No.: 1-2 ISSN: 1793-5369

ID: CaltechAUTHORS:20180831-112120288

]]>

Abstract: This report presents probability inequalities for sums of adapted sequences of random, self-adjoint matrices. The results frame simple, easily verifiable hypotheses on the summands, and they yield strong conclusions about the large-deviation behavior of the maximum eigenvalue of the sum. The methods also specialize to sums of independent random matrices.

No.: 2011-01
ID: CaltechAUTHORS:20111012-114710310

]]>

Abstract: Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed—either explicitly or implicitly—to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, robustness, and/or speed. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log(k)) floating-point operations (flops) in contrast to O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multiprocessor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
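The two-stage template described in this abstract (random sampling to find a subspace, then deterministic factorization of the compressed matrix) can be sketched in a few lines. The function name, the oversampling parameter, and the test matrix below are illustrative choices, not prescriptions from the paper.

```python
import numpy as np

def randomized_svd(A, k, p=10, seed=0):
    """Prototype two-stage randomized partial SVD.

    Stage 1: random sampling identifies a subspace capturing most of
    the action of A.  Stage 2: the compressed matrix is factored
    classically.  p is an oversampling parameter.
    """
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], k + p))   # random test matrix
    Q, _ = np.linalg.qr(A @ Omega)                     # orthonormal basis for the sample
    B = Q.T @ A                                        # compress to the subspace
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)  # small, deterministic SVD
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]              # lift back to m dimensions

# Sanity check on a matrix with rapidly decaying spectrum.
rng = np.random.default_rng(2)
A = rng.standard_normal((200, 50)) * (0.5 ** np.arange(50))  # decaying column scales
U, s, Vt = randomized_svd(A, k=5)
err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(err)
```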

Publication: SIAM Review Vol.: 53 No.: 2 ISSN: 0036-1445

ID: CaltechAUTHORS:20111025-085943917

]]>

Abstract: This paper presents an improved analysis of a structured dimension-reduction map called the subsampled randomized Hadamard transform. This argument demonstrates that the map preserves the Euclidean geometry of an entire subspace of vectors. The new proof is much simpler than previous approaches, and it offers — for the first time — optimal constants in the estimate on the number of dimensions required for the embedding.

Publication: Advances in Adaptive Data Analysis Vol.: 3 No.: 1-2 ISSN: 1793-5369

ID: CaltechAUTHORS:20120321-092826256

]]>

Abstract: The performance of principal component analysis suffers badly in the presence of outliers. This paper proposes two novel approaches for robust principal component analysis based on semidefinite programming. The first method, maximum mean absolute deviation rounding, seeks directions of large spread in the data while damping the effect of outliers. The second method produces a low-leverage decomposition of the data that attempts to form a low-rank model for the data by separating out corrupted observations. This paper also presents efficient computational methods for solving these semidefinite programs. Numerical experiments confirm the value of these new techniques.

Publication: Electronic Journal of Statistics Vol.: 5 ISSN: 1935-7524

ID: CaltechAUTHORS:20111021-161307161

]]>

Abstract: Compressive sampling (CoSa) is a new paradigm for developing data sampling technologies. It is based on the principle that many types of vector-space data are compressible, which is a term of art in mathematical signal processing. The key ideas are that randomized dimension reduction preserves the information in a compressible signal and that it is possible to develop hardware devices that implement this dimension reduction efficiently. The main computational challenge in CoSa is to reconstruct a compressible signal from the reduced representation acquired by the sampling device. This extended abstract describes a recent algorithm, called CoSaMP, that accomplishes the data recovery task. It was the first known method to offer near-optimal guarantees on resource usage.

Publication: Communications of the ACM Vol.: 53 No.: 12 ISSN: 0001-0782

ID: CaltechAUTHORS:20110201-090245482

]]>

Abstract: The goal of the sparse approximation problem is to approximate a target signal using a linear combination of a few elementary signals drawn from a fixed collection. This paper surveys the major practical algorithms for sparse approximation. Specific attention is paid to computational issues, to the circumstances in which individual methods tend to perform well, and to the theoretical guarantees available. Many fundamental questions in electrical engineering, statistics, and applied mathematics can be posed as sparse approximation problems, making these algorithms versatile and relevant to a plethora of applications.

Publication: Proceedings of the IEEE Vol.: 98 No.: 6 ISSN: 0018-9219

ID: CaltechAUTHORS:20100608-080853280

]]>

Abstract: This work presents probability inequalities for sums of independent, random, self-adjoint matrices. The results frame simple, easily verifiable hypotheses on the summands, and they yield strong conclusions about the large-deviation behavior of the maximum eigenvalue of the sum. Tail bounds for the norm of a sum of rectangular matrices follow as an immediate corollary, and similar techniques yield information about matrix-valued martingales. In other words, this paper provides noncommutative generalizations of the classical bounds associated with the names Azuma, Bennett, Bernstein, Chernoff, Hoeffding, and McDiarmid. The matrix inequalities promise the same ease of use, diversity of application, and strength of conclusion that have made the scalar inequalities so valuable.
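The flavor of these inequalities can be checked numerically. For a matrix Rademacher series sum_k eps_k A_k with fixed self-adjoint A_k and independent random signs eps_k, this line of work gives a tail bound of the form P{lmax >= t} <= d exp(-t^2 / (2 sigma^2)) with sigma^2 = ||sum_k A_k^2||. The Monte Carlo sketch below (dimensions and seed are arbitrary choices, and the stated constant is quoted from memory) compares the empirical tail with that bound.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, trials = 10, 100, 2000

# Fixed self-adjoint summand patterns A_k.
A = rng.standard_normal((n, d, d))
A = (A + A.transpose(0, 2, 1)) / 2

sigma2 = np.linalg.norm((A @ A).sum(axis=0), 2)  # sigma^2 = ||sum_k A_k^2||
t = 3.0 * np.sqrt(sigma2)
bound = d * np.exp(-t**2 / (2 * sigma2))         # d * exp(-t^2 / 2 sigma^2)

hits = 0
for _ in range(trials):
    eps = rng.choice([-1.0, 1.0], size=n)
    S = (eps[:, None, None] * A).sum(axis=0)     # random series sum_k eps_k A_k
    hits += np.linalg.eigvalsh(S)[-1] >= t       # did lmax exceed the threshold?

print(hits / trials, bound)
```

The empirical exceedance frequency should fall well below the theoretical bound, which is typically quite loose.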

No.: 2010-01
ID: CaltechAUTHORS:20111012-112125900

]]>

Abstract: In an incoherent dictionary, most signals that admit a sparse representation admit a unique sparse representation. In other words, there is no way to express the signal without using strictly more atoms. This work demonstrates that sparse signals typically enjoy a higher privilege: each nonoptimal representation of the signal requires far more atoms than the sparsest representation, unless it contains many of the same atoms as the sparsest representation. One impact of this finding is to confer a certain degree of legitimacy on the particular atoms that appear in a sparse representation. This result can also be viewed as an uncertainty principle for random sparse signals over an incoherent dictionary.

ID: CaltechAUTHORS:20180831-112116678

]]>

Abstract: Wideband analog signals push contemporary analog-to-digital conversion (ADC) systems to their performance limits. In many applications, however, sampling at the Nyquist rate is inefficient because the signals of interest contain only a small number of significant frequencies relative to the band limit, although the locations of the frequencies may not be known a priori. For this type of sparse signal, other sampling strategies are possible. This paper describes a new type of data acquisition system, called a random demodulator, that is constructed from robust, readily available components. Let K denote the total number of frequencies in the signal, and let W denote its band limit in hertz. Simulations suggest that the random demodulator requires just O(K log(W/K)) samples per second to stably reconstruct the signal. This sampling rate is exponentially lower than the Nyquist rate of W hertz. In contrast to Nyquist sampling, one must use nonlinear methods, such as convex programming, to recover the signal from the samples taken by the random demodulator. This paper provides a detailed theoretical analysis of the system's performance that supports the empirical observations.

Publication: IEEE Transactions on Information Theory Vol.: 56 No.: 1 ISSN: 0018-9448

ID: CaltechAUTHORS:20100119-103356110

]]>

Abstract: The max-norm was proposed as a convex matrix regularizer in [1] and was shown to be empirically superior to the trace-norm for collaborative filtering problems. Although the max-norm can be computed in polynomial time, there are currently no practical algorithms for solving large-scale optimization problems that incorporate the max-norm. The present work uses a factorization technique of Burer and Monteiro [2] to devise scalable first-order algorithms for convex programs involving the max-norm. These algorithms are applied to solve huge collaborative filtering, graph cut, and clustering problems. Empirically, the new methods outperform mature techniques from all three areas.

No.: 23
ID: CaltechAUTHORS:20160331-164724199

]]>

Abstract: Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. In particular, these techniques offer a route toward principal component analysis (PCA) for petascale data. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed—either explicitly or implicitly—to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log(k)) floating-point operations (flops) in contrast with O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can be reorganized to exploit multi-processor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.

No.: 2009-05
ID: CaltechAUTHORS:20111012-111324407

]]>

Abstract: Compressive sampling offers a new paradigm for acquiring signals that are compressible with respect to an orthonormal basis. The major algorithmic challenge in compressive sampling is to approximate a compressible signal from noisy samples. This paper describes a new iterative recovery algorithm called CoSaMP that delivers the same guarantees as the best optimization-based approaches. Moreover, this algorithm offers rigorous bounds on computational cost and storage. It is likely to be extremely efficient for practical problems because it requires only matrix–vector multiplies with the sampling matrix. For compressible signals, the running time is just O(N log^2 N), where N is the length of the signal.
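The CoSaMP iteration (identification, support merger, least-squares estimation, pruning, sample update) can be sketched directly from this description. The problem sizes, seed, and fixed-iteration halting rule below are illustrative, and a serious implementation would use iterative least-squares solves rather than a dense `lstsq`.

```python
import numpy as np

def cosamp(Phi, u, s, iters=20):
    """Minimal CoSaMP sketch for s-sparse recovery from samples u = Phi @ x."""
    a = np.zeros(Phi.shape[1])                    # current approximation
    for _ in range(iters):
        y = Phi.T @ (u - Phi @ a)                 # signal proxy from the residual
        Omega = np.argsort(np.abs(y))[-2 * s:]    # identification: 2s largest entries
        T = np.union1d(Omega, np.flatnonzero(a))  # support merger
        b, *_ = np.linalg.lstsq(Phi[:, T], u, rcond=None)  # least-squares estimate
        a = np.zeros(Phi.shape[1])
        keep = np.argsort(np.abs(b))[-s:]         # prune to the s largest entries
        a[T[keep]] = b[keep]
    return a

# Recover a 5-sparse signal in dimension 256 from 80 random samples.
rng = np.random.default_rng(3)
Phi = rng.standard_normal((80, 256)) / np.sqrt(80)
x = np.zeros(256)
x[rng.choice(256, size=5, replace=False)] = rng.standard_normal(5)
xhat = cosamp(Phi, Phi @ x, s=5)
print(np.linalg.norm(x - xhat))
```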

Publication: Applied and Computational Harmonic Analysis Vol.: 26 No.: 3 ISSN: 1063-5203

ID: CaltechAUTHORS:20090706-113128889

]]>

Abstract: In sparse approximation problems, the goal is to find an approximate representation of a target signal using a linear combination of a few elementary signals drawn from a fixed collection. This paper surveys the major algorithms that are used for solving sparse approximation problems in practice. Specific attention is paid to computational issues, to the circumstances in which individual methods tend to perform well, and to the theoretical guarantees available. Many fundamental questions in electrical engineering, statistics, and applied mathematics can be posed as sparse approximation problems, which makes the algorithms discussed in this paper versatile tools with a wealth of applications.

No.: 2009-01
ID: CaltechAUTHORS:20111011-163243421

]]>

Abstract: Given a fixed matrix, the problem of column subset selection requests a column submatrix that has favorable spectral properties. Most research from the algorithms and numerical linear algebra communities focuses on a variant called rank-revealing QR, which seeks a well-conditioned collection of columns that spans the (numerical) range of the matrix. The functional analysis literature contains another strand of work on column selection whose algorithmic implications have not been explored. In particular, a celebrated result of Bourgain and Tzafriri demonstrates that each matrix with normalized columns contains a large column submatrix that is exceptionally well conditioned. Unfortunately, standard proofs of this result cannot be regarded as algorithmic. This paper presents a randomized, polynomial-time algorithm that produces the submatrix promised by Bourgain and Tzafriri. The method involves random sampling of columns, followed by a matrix factorization that exposes the well-conditioned subset of columns. This factorization, which is due to Grothendieck, is regarded as a central tool in modern functional analysis. The primary novelty in this work is an algorithm, based on eigenvalue minimization, for constructing the Grothendieck factorization. These ideas also result in an approximation algorithm for the (∞, 1) norm of a matrix, which is generally NP-hard to compute exactly. As an added bonus, this work reveals a surprising connection between matrix factorization and the famous maxcut semidefinite program.

ID: CaltechAUTHORS:20100921-101535590

]]>

Abstract: Periodic nonuniform sampling is a known method to sample spectrally sparse signals below the Nyquist rate. This strategy relies on the implicit assumption that the individual samplers are exposed to the entire frequency range. This assumption becomes impractical for wideband sparse signals. The current paper proposes an alternative sampling stage that does not require a full-band front end. Instead, signals are captured with an analog front end that consists of a bank of multipliers and lowpass filters whose cutoff is much lower than the Nyquist rate. The problem of recovering the original signal from the low-rate samples can be studied within the framework of compressive sampling. An appropriate parameter selection ensures that the samples uniquely determine the analog input. Moreover, the analog input can be stably reconstructed with digital algorithms. Numerical experiments support the theoretical analysis.

ID: CaltechAUTHORS:20180831-112113124

]]>

Abstract: Many problems in the theory of sparse approximation require bounds on operator norms of a random submatrix drawn from a fixed matrix. The purpose of this Note is to collect estimates for several different norms that are most important in the analysis of ℓ1 minimization algorithms. Several of these bounds have not appeared in detail.

Publication: Comptes Rendus Mathematique Vol.: 346 No.: 23-24 ISSN: 1631-073X

ID: CaltechAUTHORS:TROcrm08

]]>

Abstract: The purpose of this work is to survey what is known about the linear independence of spikes and sines. The paper provides new results for the case where the locations of the spikes and the frequencies of the sines are chosen at random. This problem is equivalent to studying the spectral norm of a random submatrix drawn from the discrete Fourier transform matrix. The proof depends on an extrapolation argument of Bourgain and Tzafriri.

Publication: Journal of Fourier Analysis and Applications Vol.: 14 No.: 5-6 ISSN: 1069-5869

ID: CaltechAUTHORS:TROjfaa08

]]>

Abstract: The two major approaches to sparse recovery are L_1-minimization and greedy methods. Recently, Needell and Vershynin developed regularized orthogonal matching pursuit (ROMP) that has bridged the gap between these two approaches. ROMP is the first stable greedy algorithm providing uniform guarantees. Even more recently, Needell and Tropp developed the stable greedy algorithm compressive sampling matching pursuit (CoSaMP). CoSaMP provides uniform guarantees and improves upon the stability bounds and RIC requirements of ROMP. CoSaMP offers rigorous bounds on computational cost and storage. In many cases, the running time is just O(N log N), where N is the ambient dimension of the signal. This review summarizes these major advances.

ID: CaltechAUTHORS:20180831-112109709

]]>

Abstract: Compressive sampling offers a new paradigm for acquiring signals that are compressible with respect to an orthonormal basis. The major algorithmic challenge in compressive sampling is to approximate a compressible signal from noisy samples. This paper describes a new iterative recovery algorithm called CoSaMP that delivers the same guarantees as the best optimization-based approaches. Moreover, this algorithm offers rigorous bounds on computational cost and storage. It is likely to be extremely efficient for practical problems because it requires only matrix-vector multiplies with the sampling matrix. For compressible signals, the running time is just O(N log^2 N), where N is the length of the signal.

No.: 2008-01
ID: CaltechAUTHORS:20111011-160707642

]]>

Abstract: Given a fixed matrix, the problem of column subset selection requests a column submatrix that has favorable spectral properties. Most research from the algorithms and numerical linear algebra communities focuses on a variant called rank-revealing QR, which seeks a well-conditioned collection of columns that spans the (numerical) range of the matrix. The functional analysis literature contains another strand of work on column selection whose algorithmic implications have not been explored. In particular, a celebrated result of Bourgain and Tzafriri demonstrates that each matrix with normalized columns contains a large column submatrix that is exceptionally well conditioned. Unfortunately, standard proofs of this result cannot be regarded as algorithmic. This paper presents a randomized, polynomial-time algorithm that produces the submatrix promised by Bourgain and Tzafriri. The method involves random sampling of columns, followed by a matrix factorization that exposes the well-conditioned subset of columns. This factorization, which is due to Grothendieck, is regarded as a central tool in modern functional analysis. The primary novelty in this work is an algorithm, based on eigenvalue minimization, for constructing the Grothendieck factorization. These ideas also result in a novel approximation algorithm for the (∞, 1) norm of a matrix, which is generally NP-hard to compute exactly. As an added bonus, this work reveals a surprising connection between matrix factorization and the famous MAXCUT semidefinite program.

No.: 2008-02
ID: CaltechAUTHORS:20111011-161421093

]]>

Abstract: Metric nearness refers to the problem of optimally restoring metric properties to distance measurements that happen to be nonmetric due to measurement errors or otherwise. Metric data can be important in various settings, for example, in clustering, classification, metric-based indexing, query processing, and graph theoretic approximation algorithms. This paper formulates and solves the metric nearness problem: Given a set of pairwise dissimilarities, find a “nearest” set of distances that satisfy the properties of a metric—principally the triangle inequality. For solving this problem, the paper develops efficient triangle fixing algorithms that are based on an iterative projection method. An intriguing aspect of the metric nearness problem is that a special case turns out to be equivalent to the all pairs shortest paths problem. The paper exploits this equivalence and develops a new algorithm for the latter problem using a primal-dual method. Applications to graph clustering are provided as an illustration. We include experiments that demonstrate the computational superiority of triangle fixing over general purpose convex programming software. Finally, we conclude by suggesting various useful extensions and generalizations to metric nearness.
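The all-pairs-shortest-paths connection mentioned above is easy to see in code. In the special case where dissimilarities may only be decreased (the decrease-only reading of that special case is my interpretation), replacing each entry by the shortest-path distance through the dissimilarity graph produces a matrix satisfying every triangle inequality, and Floyd–Warshall computes exactly that.

```python
import numpy as np

def shortest_path_metric(D):
    """Replace each dissimilarity by the shortest-path distance
    (Floyd-Warshall), enforcing every triangle inequality while
    only ever decreasing entries."""
    M = D.astype(float).copy()
    for k in range(M.shape[0]):
        # Relax all pairs through intermediate point k.
        M = np.minimum(M, M[:, [k]] + M[[k], :])
    return M

# d(0,2) = 5 violates the triangle inequality d(0,2) <= d(0,1) + d(1,2) = 2.
D = np.array([[0., 1., 5.],
              [1., 0., 1.],
              [5., 1., 0.]])
M = shortest_path_metric(D)
print(M)  # the (0, 2) entry drops to 2
```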

Publication: SIAM Journal on Matrix Analysis and Applications Vol.: 30 No.: 1 ISSN: 0895-4798

ID: CaltechAUTHORS:20110513-152152857

]]>

Abstract: This article describes a computational method, called the Fourier sampling algorithm, that exploits this insight [10]. The algorithm takes a small number of (correlated) random samples from a signal and processes them efficiently to produce an approximation of the DFT of the signal. The algorithm offers provable guarantees on the number of samples, the running time, and the amount of storage. As we will see, these requirements are exponentially better than the FFT for some cases of interest. This article describes in detail how to implement a version of Fourier sampling, it presents some evidence of its empirical performance, and it explains the theoretical ideas that underlie the analysis. Our hope is that this tutorial will allow engineers to apply Fourier sampling to their own problems. We also hope that it will stimulate further research on practical implementations and extensions of the algorithm.

Publication: IEEE Signal Processing Magazine Vol.: 25 No.: 2 ISSN: 1053-5888

ID: CaltechAUTHORS:GILieeespm08

]]>

Abstract: This paper describes a numerical method for finding good packings in Grassmannian manifolds equipped with various metrics. This investigation also encompasses packing in projective spaces. In each case, producing a good packing is equivalent to constructing a matrix that has certain structural and spectral properties. By alternately enforcing the structural condition and then the spectral condition, it is often possible to reach a matrix that satisfies both. One may then extract a packing from this matrix. This approach is both powerful and versatile. In cases in which experiments have been performed, the alternating projection method yields packings that compete with the best packings recorded. It also extends to problems that have not been studied numerically. For example, it can be used to produce packings of subspaces in real and complex Grassmannian spaces equipped with the Fubini–Study distance; these packings are valuable in wireless communications. One can prove that some of the novel configurations constructed by the algorithm have packing diameters that are nearly optimal.

Publication: Experimental Mathematics Vol.: 17 No.: 1 ISSN: 1058-6458

ID: CaltechAUTHORS:20180831-112106252

]]>

Abstract: This note presents a new proof of an important result due to Bourgain and Tzafriri that provides a partial solution to the Kadison-Singer problem. The result shows that every unit-norm matrix whose entries are relatively small in comparison with its dimension can be paved by a partition of constant size. That is, the coordinates can be partitioned into a constant number of blocks so that the restriction of the matrix to each block of coordinates has norm less than one half. The original proof of Bourgain and Tzafriri involves a long, delicate calculation. The new proof relies on the systematic use of symmetrization and (noncommutative) Khinchin inequalities to estimate the norms of some random matrices.

Publication: Studia Mathematica Vol.: 185 No.: 1 ISSN: 1730-6337

ID: CaltechAUTHORS:20170408-150838584

]]>

Abstract: This paper demonstrates theoretically and empirically that a greedy algorithm called Orthogonal Matching Pursuit (OMP) can reliably recover a signal with m nonzero entries in dimension d given O(m ln d) random linear measurements of that signal. This is a massive improvement over previous results, which require O(m^2) measurements. The new results for OMP are comparable with recent results for another approach called Basis Pursuit (BP). In some settings, the OMP algorithm is faster and easier to implement, so it is an attractive alternative to BP for signal recovery problems.
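The recovery claim is easy to exercise numerically. Below is a minimal OMP sketch: greedily select the column most correlated with the residual, then re-solve least squares over the selected columns. The dimensions, coefficients, and seed are illustrative.

```python
import numpy as np

def omp(Phi, u, m):
    """Minimal Orthogonal Matching Pursuit sketch: m greedy selection steps."""
    support = []
    r = u.copy()
    for _ in range(m):
        support.append(int(np.argmax(np.abs(Phi.T @ r))))  # most correlated column
        b, *_ = np.linalg.lstsq(Phi[:, support], u, rcond=None)
        r = u - Phi[:, support] @ b                         # new residual
    x = np.zeros(Phi.shape[1])
    x[support] = b
    return x

# Recover a 4-sparse signal in dimension d = 256 from 50 random measurements.
rng = np.random.default_rng(5)
Phi = rng.standard_normal((50, 256)) / np.sqrt(50)
x = np.zeros(256)
x[[7, 70, 140, 200]] = [1.0, -2.0, 1.5, 0.5]
xhat = omp(Phi, Phi @ x, m=4)
print(np.linalg.norm(x - xhat))
```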

Publication: IEEE Transactions on Information Theory Vol.: 53 No.: 12 ISSN: 0018-9448

ID: CaltechAUTHORS:TROieeetit07

]]>

Abstract: This paper discusses a new class of matrix nearness problems that measure approximation error using a directed distance measure called a Bregman divergence. Bregman divergences offer an important generalization of the squared Frobenius norm and relative entropy, and they all share fundamental geometric properties. In addition, these divergences are intimately connected with exponential families of probability distributions. Therefore, it is natural to study matrix approximation problems with respect to Bregman divergences. This article proposes a framework for studying these problems, discusses some specific matrix nearness problems, and provides algorithms for solving them numerically. These algorithms apply to many classical and novel problems, and they admit a striking geometric interpretation.

Publication: SIAM Journal on Matrix Analysis and Applications Vol.: 29 No.: 4 ISSN: 0895-4798

ID: CaltechAUTHORS:DHIsiamjmaa07

]]>

Abstract: This report demonstrates theoretically and empirically that a greedy algorithm called Orthogonal Matching Pursuit (OMP) can reliably recover a signal with m nonzero entries in dimension d given O(m ln d) random linear measurements of that signal. This is a massive improvement over previous results, which require O(m^2) measurements. The new results for OMP are comparable with recent results for another approach called Basis Pursuit (BP). In some settings, the OMP algorithm is faster and easier to implement, so it is an attractive alternative to BP for signal recovery problems.

No.: 2007-01
ID: CaltechAUTHORS:20111010-134929077

]]>

Abstract: This paper discusses random filtering, a recently proposed method for directly acquiring a compressed version of a digital signal. The technique is based on convolution of the signal with a fixed FIR filter having random taps, followed by downsampling. Experiments show that random filtering is effective at acquiring sparse and compressible signals. This process has the potential for implementation in analog hardware, and so it may have a role to play in new types of analog/digital converters.

ID: CaltechAUTHORS:TROciss06.975

]]>

Abstract: This paper develops a new method for recovering m-sparse signals that is simultaneously uniform and quick. We present a reconstruction algorithm whose run time, O(m log^2(m) log^2(d)), is sublinear in the length d of the signal. The reconstruction error is within a logarithmic factor (in m) of the optimal m-term approximation error in l_1. In particular, the algorithm recovers m-sparse signals perfectly and noisy signals are recovered with polylogarithmic distortion. Our algorithm makes O(m log^2 (d)) measurements, which is within a logarithmic factor of optimal. We also present a small-space implementation of the algorithm. These sketching techniques and the corresponding reconstruction algorithms provide an algorithmic dimension reduction in the l_1 norm. In particular, vectors of support m in dimension d can be linearly embedded into O(m log^2 d) dimensions with polylogarithmic distortion. We can reconstruct a vector from its low-dimensional sketch in time O(m log^2(m) log^2(d)). Furthermore, this reconstruction is stable and robust under small perturbations.

ID: CaltechAUTHORS:20180828-150010838

]]>

Abstract: We propose and study a new technique for efficiently acquiring and reconstructing signals based on convolution with a fixed FIR filter having random taps. The method is designed for sparse and compressible signals, i.e., ones that are well approximated by a short linear combination of vectors from an orthonormal basis. Signal reconstruction involves a non-linear Orthogonal Matching Pursuit algorithm that we implement efficiently by exploiting the nonadaptive, time-invariant structure of the measurement process. While simpler and more efficient than other random acquisition techniques like Compressed Sensing, random filtering is sufficiently generic to summarize many types of compressible signals and generalizes to streaming and continuous-time signals. Extensive numerical experiments demonstrate its efficacy for acquiring and reconstructing signals sparse in the time, frequency, and wavelet domains, as well as piecewise smooth signals and Poisson processes.

Vol.: III
ID: CaltechAUTHORS:TROicassp06.977

]]>

Abstract: Compressed Sensing uses a small number of random, linear measurements to acquire a sparse signal. Nonlinear algorithms, such as ℓ1 minimization, are used to reconstruct the signal from the measured data. This paper proposes row-action methods as a computational approach to solving the ℓ1 optimization problem. This paper presents a specific row-action method and provides extensive empirical evidence that it is an effective technique for signal reconstruction. This approach offers several advantages over interior-point methods, including minimal storage and computational requirements, scalability, and robustness.

Vol.: III
ID: CaltechAUTHORS:SRAicassp06.963

]]>

Abstract: The well-known shrinkage technique is still relevant for contemporary signal processing problems over redundant dictionaries. We present theoretical and empirical analyses for two iterative algorithms for sparse approximation that use shrinkage. The GENERAL IT algorithm amounts to a Landweber iteration with nonlinear shrinkage at each iteration step. The BLOCK IT algorithm arises in morphological components analysis. A sufficient condition for which General IT exactly recovers a sparse signal is presented, in which the cumulative coherence function naturally arises. This analysis extends previous results concerning the Orthogonal Matching Pursuit (OMP) and Basis Pursuit (BP) algorithms to IT algorithms.

Vol.: III
ID: CaltechAUTHORS:HERicassp06.876

]]>

Abstract: This paper studies a difficult and fundamental problem that arises throughout electrical engineering, applied mathematics, and statistics. Suppose that one forms a short linear combination of elementary signals drawn from a large, fixed collection. Given an observation of the linear combination that has been contaminated with additive noise, the goal is to identify which elementary signals participated and to approximate their coefficients. Although many algorithms have been proposed, there is little theory which guarantees that these algorithms can accurately and efficiently solve the problem. This paper studies a method called convex relaxation, which attempts to recover the ideal sparse signal by solving a convex program. This approach is powerful because the optimization can be completed in polynomial time with standard scientific software. The paper provides general conditions which ensure that convex relaxation succeeds. As evidence of the broad impact of these results, the paper describes how convex relaxation can be used for several concrete signal recovery problems. It also describes applications to channel coding, linear regression, and numerical analysis.

Publication: IEEE Transactions on Information Theory Vol.: 52 No.: 3 ISSN: 0018-9448

ID: CaltechAUTHORS:TROieeetit06

]]>

Abstract: Sparse approximation problems abound in many scientific, mathematical, and engineering applications. These problems are defined by two competing notions: we approximate a signal vector as a linear combination of elementary atoms and we require that the approximation be both as accurate and as concise as possible. We introduce two natural, direct applications of these problems and their algorithmic solutions in communications. We do so by constructing enhanced codebooks from base codebooks. We show that we can decode these enhanced codebooks in the presence of Gaussian noise. For MIMO wireless communication channels, we construct simultaneous sparse approximation problems and demonstrate that our algorithms can both decode the transmitted signals and estimate the channel parameters.

ID: CaltechAUTHORS:GILisit05

]]>

Abstract: In this paper, we present new algorithms that can replace the diagonal entries of a Hermitian matrix by any set of diagonal entries that majorize the original set without altering the eigenvalues of the matrix. They perform this feat by applying a sequence of (N-1) or fewer plane rotations, where N is the dimension of the matrix. Both the Bendel-Mickey and the Chan-Li algorithms are special cases of the proposed procedures. Using the fact that a positive semidefinite matrix can always be factored as $X^\ast X$, we also provide more efficient versions of the algorithms that can directly construct factors with specified singular values and column norms. We conclude with some open problems related to the construction of Hermitian matrices with joint diagonal and spectral properties.
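To illustrate the plane-rotation mechanism these algorithms rely on, the sketch below applies a single rotation in one coordinate plane of a real symmetric matrix, moving one diagonal entry to a prescribed value while leaving the eigenvalues unchanged. The function name and the restriction to the real symmetric case are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def rotate_diagonal(A, i, j, t):
    """Rotate in the (i, j) coordinate plane so that entry (i, i) of the
    rotated matrix equals t.  The rotation is orthogonal, so the
    eigenvalues of A are unchanged.  Real symmetric case only."""
    a, b, x = A[i, i], A[j, j], A[i, j]
    # The new (i, i) entry is c^2*a + 2*c*s*x + s^2*b with c = cos(theta),
    # s = sin(theta); setting it equal to t gives a quadratic in tan(theta):
    #     (b - t) tan^2 + 2 x tan + (a - t) = 0.
    disc = x * x - (b - t) * (a - t)
    if disc < 0:
        raise ValueError("t is not attainable by a rotation in this plane")
    if abs(b - t) > 1e-12:
        tan = (-x + np.sqrt(disc)) / (b - t)
    else:
        tan = (t - a) / (2 * x)  # degenerate linear case (requires x != 0)
    c = 1.0 / np.sqrt(1.0 + tan * tan)
    s = tan * c
    G = np.eye(A.shape[0])
    G[i, i] = G[j, j] = c
    G[i, j], G[j, i] = -s, s
    return G.T @ A @ G
```

The target t is attainable exactly when it lies between the eigenvalues of the 2x2 principal submatrix in coordinates (i, j), which is where the discriminant condition comes from.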

Publication: SIAM Journal on Matrix Analysis and Applications Vol.: 27 No.: 1 ISSN: 0895-4798

ID: CaltechAUTHORS:DHIsiamjmaa05

]]>

Abstract: A simple sparse approximation problem requests an approximation of a given input signal as a linear combination of T elementary signals drawn from a large, linearly dependent collection. An important generalization is simultaneous sparse approximation. Now one must approximate several input signals at once using different linear combinations of the same T elementary signals. This formulation appears, for example, when analyzing multiple observations of a sparse signal that have been contaminated with noise. A new approach to this problem is presented here: a greedy pursuit algorithm called simultaneous orthogonal matching pursuit. The paper proves that the algorithm calculates simultaneous approximations whose error is within a constant factor of the optimal simultaneous approximation error. This result requires that the collection of elementary signals be weakly correlated, a property that is also known as incoherence. Numerical experiments demonstrate that the algorithm often succeeds, even when the inputs do not meet the hypotheses of the proof.
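The greedy selection rule of simultaneous orthogonal matching pursuit can be sketched as follows, assuming a dictionary matrix `Phi` and a matrix `Y` whose columns are the input signals. Scoring atoms by summed absolute correlations is one common variant; the names are illustrative.

```python
import numpy as np

def somp(Phi, Y, m):
    """Simultaneous OMP: choose one support shared by all columns of Y."""
    R, support = Y.copy(), []
    for _ in range(m):
        # Score each atom by its total correlation with all residuals.
        scores = np.sum(np.abs(Phi.T @ R), axis=1)
        support.append(int(np.argmax(scores)))
        # Re-fit all signals by least squares over the chosen atoms.
        C, *_ = np.linalg.lstsq(Phi[:, support], Y, rcond=None)
        R = Y - Phi[:, support] @ C
    return support, C
```

The only change from ordinary OMP is the scoring step, which aggregates evidence for each atom across every observation before committing it to the shared support.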

Vol.: 5
ID: CaltechAUTHORS:TROicassp05

]]>

Abstract: This note provides a condition under which ℓ1 minimization (also known as basis pursuit) can recover short linear combinations of complex vectors chosen from a fixed, overcomplete collection. This condition has already been established in the real setting by Fuchs, who used convex analysis. The proof given here is more direct.

Publication: IEEE Transactions on Information Theory Vol.: 51 No.: 4 ISSN: 0018-9448

ID: CaltechAUTHORS:TROieeetit05b

]]>

Abstract: Welch bound equality (WBE) signature sequences maximize the uplink sum capacity in direct-spread synchronous code division multiple access (CDMA) systems. WBE sequences have a nice interference invariance property that typically holds only when the system is fully loaded, and, to maintain this property, the signature set must be redesigned and reassigned as the number of active users changes. An additional equiangular constraint on the signature set, however, maintains interference invariance. Finding such signatures requires equiangular side constraints to be imposed on an inverse eigenvalue problem. The paper presents an alternating projection algorithm that can design WBE sequences that satisfy equiangular side constraints. The proposed algorithm can be used to find Grassmannian frames as well as equiangular tight frames. Although one projection is onto a closed but nonconvex set, it is shown that this algorithm converges to a fixed point, and these fixed points are partially characterized.

ID: CaltechAUTHORS:HEAissta04

]]>

Abstract: A description of optimal sequences for direct-sequence code division multiple access is a byproduct of recent characterizations of the sum capacity. The paper restates the sequence design problem as an inverse singular value problem and shows that it can be solved with finite-step algorithms from matrix analysis. Relevant algorithms are reviewed and a new one-sided construction is proposed that obtains the sequences directly instead of computing the Gram matrix of the optimal signatures.

ID: CaltechAUTHORS:TROissta04

]]>

Abstract: Tight frames, also known as general Welch-bound-equality sequences, generalize orthonormal systems. Numerous applications, including communications, coding, and sparse approximation, require finite-dimensional tight frames that possess additional structural properties. This paper proposes an alternating projection method that is versatile enough to solve a huge class of inverse eigenvalue problems (IEPs), which includes the frame design problem. To apply this method, one needs only to solve a matrix nearness problem that arises naturally from the design specifications. Therefore, it is fast and easy to develop versions of the algorithm that target new design problems. Alternating projection will often succeed even if algebraic constructions are unavailable. To demonstrate that alternating projection is an effective tool for frame design, the paper studies some important structural properties in detail. First, it addresses the most basic design problem: constructing tight frames with prescribed vector norms. Then, it discusses equiangular tight frames, which are natural dictionaries for sparse approximation. Finally, it examines tight frames whose individual vectors have a low peak-to-average-power ratio (PAR), which is a valuable property for code-division multiple-access (CDMA) applications. Numerical experiments show that the proposed algorithm succeeds in each of these three cases. The appendices investigate the convergence properties of the algorithm.
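For the most basic design problem mentioned above, constructing unit-norm tight frames, the alternating projection idea can be sketched as follows: alternate between normalizing the columns and replacing the matrix by its nearest tight frame, obtained from the polar factor of an SVD. The function names, iteration count, and starting point are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def nearest_tight_frame(X, alpha):
    """Nearest alpha-tight frame to X: alpha times the polar factor of X."""
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    return alpha * U @ Vt

def unit_norm_tight_frame(d, N, n_iter=500, seed=0):
    """Alternating projection between unit-norm matrices and tight frames."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((d, N))
    alpha = np.sqrt(N / d)                  # tight-frame constant for unit norms
    for _ in range(n_iter):
        X /= np.linalg.norm(X, axis=0)      # project onto unit-norm columns
        X = nearest_tight_frame(X, alpha)   # project onto alpha-tight frames
    X /= np.linalg.norm(X, axis=0)
    return X
```

At convergence the columns have unit norm and X @ X.T is close to (N/d) times the identity, which is the defining property of a unit-norm tight frame.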

Publication: IEEE Transactions on Information Theory Vol.: 51 No.: 1 ISSN: 0018-9448

ID: CaltechAUTHORS:TROieeetit05a

]]>

Abstract: A description of optimal sequences for direct-spread code-division multiple access (DS-CDMA) is a byproduct of recent characterizations of the sum capacity. This paper restates the sequence design problem as an inverse singular value problem and shows that the problem can be solved with finite-step algorithms from matrix theory. It proposes a new one-sided algorithm that is numerically stable and faster than previous methods.

Publication: IEEE Transactions on Information Theory Vol.: 50 No.: 11 ISSN: 0018-9448

ID: CaltechAUTHORS:TROieeetit04b

]]>

Abstract: This article presents new results on using a greedy algorithm, orthogonal matching pursuit (OMP), to solve the sparse approximation problem over redundant dictionaries. It provides a sufficient condition under which both OMP and Donoho's basis pursuit (BP) paradigm can recover the optimal representation of an exactly sparse signal. It leverages this theory to show that both OMP and BP succeed for every sparse input signal from a wide class of dictionaries. These quasi-incoherent dictionaries offer a natural generalization of incoherent dictionaries, and the cumulative coherence function is introduced to quantify the level of incoherence. This analysis unifies all the recent results on BP and extends them to OMP. Furthermore, the paper develops a sufficient condition under which OMP can identify atoms from an optimal approximation of a nonsparse signal. From there, it argues that OMP is an approximation algorithm for the sparse problem over a quasi-incoherent dictionary. That is, for every input signal, OMP calculates a sparse approximant whose error is only a small factor worse than the minimal error that can be attained with the same number of terms.
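A minimal sketch of the orthogonal matching pursuit algorithm studied above, assuming a dictionary matrix `Phi` with unit-norm columns: at each step, select the atom most correlated with the current residual, then re-fit the coefficients by least squares over all selected atoms.

```python
import numpy as np

def omp(Phi, y, m):
    """Orthogonal matching pursuit: greedily select m atoms of Phi."""
    residual, support = y.copy(), []
    for _ in range(m):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(Phi.T @ residual)))
        support.append(k)
        # Re-fit coefficients by least squares over the chosen atoms, so
        # the residual stays orthogonal to every selected atom.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x = np.zeros(Phi.shape[1])
    x[support] = coef
    return x
```

The least-squares re-fit is what distinguishes orthogonal matching pursuit from plain matching pursuit, and it guarantees that an atom is never selected twice.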

Publication: IEEE Transactions on Information Theory Vol.: 50 No.: 10 ISSN: 0018-9448

ID: CaltechAUTHORS:TROieeetit04a

]]>

Abstract: Several algorithms have been proposed to construct optimal signature sequences that maximize the sum capacity of the uplink in a direct-spread synchronous code division multiple access (CDMA) system. These algorithms produce signatures with real-valued or complex-valued entries that generally have a large peak-to-average power ratio (PAR). This paper presents an alternating projection algorithm that can design optimal signature sequences that satisfy PAR side constraints. This algorithm converges to a fixed point, and these fixed points are partially characterized.

Vol.: 1
ID: CaltechAUTHORS:TROasilo03

]]>

Abstract: Optimal signature sequences for the synchronous code-division multiple-access (S-CDMA) channel in the presence of white noise and uniform received powers are known to be Welch-bound-equality sequences, also called unit-norm tight frames. This paper applies these matrix-theoretic ideas to restate the construction of such sequences as an inverse eigenvalue problem, and it shows how alternating minimization of the total squared correlation yields signature sequences that maximize the sum capacity of the S-CDMA channel.

ID: CaltechAUTHORS:TROisit03

]]>

Abstract: This paper discusses a new greedy algorithm for solving the sparse approximation problem over quasi-incoherent dictionaries. These dictionaries consist of waveforms that are uncorrelated "on average," and they provide a natural generalization of incoherent dictionaries. The algorithm provides strong guarantees on the quality of the approximations it produces, unlike most other methods for sparse approximation. Moreover, very efficient implementations are possible via approximate nearest-neighbor data structures.

Vol.: 1
ID: CaltechAUTHORS:TROicip03

]]>