CaltechAUTHORS: Article

Greed is good: algorithmic results for sparse approximation

Year: 2004 DOI: 10.1109/TIT.2004.834793 This article presents new results on using a greedy algorithm, orthogonal matching pursuit (OMP), to solve the sparse approximation problem over redundant dictionaries. It provides a sufficient condition under which both OMP and Donoho's basis pursuit (BP) paradigm can recover the optimal representation of an exactly sparse signal. It leverages this theory to show that both OMP and BP succeed for every sparse input signal from a wide class of dictionaries. These quasi-incoherent dictionaries offer a natural generalization of incoherent dictionaries, and the cumulative coherence function is introduced to quantify the level of incoherence. This analysis unifies all the recent results on BP and extends them to OMP. Furthermore, the paper develops a sufficient condition under which OMP can identify atoms from an optimal approximation of a nonsparse signal. From there, it argues that OMP is an approximation algorithm for the sparse problem over a quasi-incoherent dictionary. That is, for every input signal, OMP calculates a sparse approximant whose error is only a small factor worse than the minimal error that can be attained with the same number of terms.
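
The greedy selection described in this abstract can be illustrated with a minimal sketch of orthogonal matching pursuit. This is not the paper's code: the function names `dot` and `omp`, the stopping tolerance, and the use of Gram-Schmidt to maintain the orthogonal projection are assumptions made for illustration, and the atoms are assumed to have unit norm.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def omp(atoms, signal, num_steps):
    """Greedy atom selection with orthogonal projection of the residual.

    atoms: list of unit-norm vectors (the dictionary); signal: target vector.
    Returns the chosen atom indices and the final residual.
    """
    residual = list(signal)
    chosen = []
    basis = []  # orthonormal basis for the span of the chosen atoms
    for _ in range(num_steps):
        # Greedy step: pick the unused atom most correlated with the residual.
        scores = [0.0 if j in chosen else abs(dot(atoms[j], residual))
                  for j in range(len(atoms))]
        best = max(range(len(atoms)), key=lambda j: scores[j])
        if scores[best] < 1e-12:  # hypothetical tolerance
            break  # residual is (numerically) orthogonal to every atom
        chosen.append(best)
        # Orthogonalize the new atom against the current basis (Gram-Schmidt).
        q = list(atoms[best])
        for b in basis:
            c = dot(q, b)
            q = [qi - c * bi for qi, bi in zip(q, b)]
        norm = math.sqrt(dot(q, q))
        q = [qi / norm for qi in q]
        basis.append(q)
        # Orthogonal step: remove the component of the residual along q.
        c = dot(residual, q)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
    return chosen, residual
```

For example, with the three standard basis vectors of R^3 plus the normalized all-ones vector as the dictionary, two steps on the signal (3, 0, 2) select the first and third atoms and leave a zero residual.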

Finite-step algorithms for constructing optimal CDMA signature sequences

Year: 2004 DOI: 10.1109/TIT.2004.836698 A description of optimal sequences for direct-spread code-division multiple access (DS-CDMA) is a byproduct of recent characterizations of the sum capacity. This paper restates the sequence design problem as an inverse singular value problem and shows that the problem can be solved with finite-step algorithms from matrix theory. It proposes a new one-sided algorithm that is numerically stable and faster than previous methods.

Designing structured tight frames via an alternating projection method

Year: 2005 DOI: 10.1109/TIT.2004.839492 Tight frames, also known as general Welch-bound-equality sequences, generalize orthonormal systems. Numerous applications, including communications, coding, and sparse approximation, require finite-dimensional tight frames that possess additional structural properties. This paper proposes an alternating projection method that is versatile enough to solve a huge class of inverse eigenvalue problems (IEPs), which includes the frame design problem. To apply this method, one needs only to solve a matrix nearness problem that arises naturally from the design specifications. Therefore, it is fast and easy to develop versions of the algorithm that target new design problems. Alternating projection will often succeed even if algebraic constructions are unavailable. To demonstrate that alternating projection is an effective tool for frame design, the paper studies some important structural properties in detail. First, it addresses the most basic design problem: constructing tight frames with prescribed vector norms. Then, it discusses equiangular tight frames, which are natural dictionaries for sparse approximation. Finally, it examines tight frames whose individual vectors have low peak-to-average-power ratio (PAR), which is a valuable property for code-division multiple-access (CDMA) applications. Numerical experiments show that the proposed algorithm succeeds in each of these three cases. The appendices investigate the convergence properties of the algorithm.

Recovery of short, complex linear combinations via ℓ1 minimization

Year: 2005 DOI: 10.1109/TIT.2005.844057 This note provides a condition under which ℓ1 minimization (also known as basis pursuit) can recover short linear combinations of complex vectors chosen from a fixed, overcomplete collection. This condition has already been established in the real setting by Fuchs, who used convex analysis. The proof given here is more direct.

Generalized Finite Algorithms for Constructing Hermitian Matrices with Prescribed Diagonal and Spectrum

Year: 2005 DOI: 10.1137/S0895479803438183 In this paper, we present new algorithms that can replace the diagonal entries of a Hermitian matrix by any set of diagonal entries that majorize the original set without altering the eigenvalues of the matrix. They perform this feat by applying a sequence of (N-1) or fewer plane rotations, where N is the dimension of the matrix. Both the Bendel-Mickey and the Chan-Li algorithms are special cases of the proposed procedures. Using the fact that a positive semidefinite matrix can always be factored as $X^{*} X$, we also provide more efficient versions of the algorithms that can directly construct factors with specified singular values and column norms. We conclude with some open problems related to the construction of Hermitian matrices with joint diagonal and spectral properties.

Just relax: convex programming methods for identifying sparse signals in noise

Year: 2006 DOI: 10.1109/TIT.2005.864420 This paper studies a difficult and fundamental problem that arises throughout electrical engineering, applied mathematics, and statistics. Suppose that one forms a short linear combination of elementary signals drawn from a large, fixed collection. Given an observation of the linear combination that has been contaminated with additive noise, the goal is to identify which elementary signals participated and to approximate their coefficients. Although many algorithms have been proposed, there is little theory which guarantees that these algorithms can accurately and efficiently solve the problem. This paper studies a method called convex relaxation, which attempts to recover the ideal sparse signal by solving a convex program. This approach is powerful because the optimization can be completed in polynomial time with standard scientific software. The paper provides general conditions which ensure that convex relaxation succeeds. As evidence of the broad impact of these results, the paper describes how convex relaxation can be used for several concrete signal recovery problems. It also describes applications to channel coding, linear regression, and numerical analysis.

Matrix Nearness Problems with Bregman Divergences

Year: 2007 DOI: 10.1137/060649021 This paper discusses a new class of matrix nearness problems that measure approximation error using a directed distance measure called a Bregman divergence. Bregman divergences offer an important generalization of the squared Frobenius norm and relative entropy, and they all share fundamental geometric properties. In addition, these divergences are intimately connected with exponential families of probability distributions. Therefore, it is natural to study matrix approximation problems with respect to Bregman divergences. This article proposes a framework for studying these problems, discusses some specific matrix nearness problems, and provides algorithms for solving them numerically. These algorithms apply to many classical and novel problems, and they admit a striking geometric interpretation.

Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit

Year: 2007 DOI: 10.1109/TIT.2007.909108 This paper demonstrates theoretically and empirically that a greedy algorithm called Orthogonal Matching Pursuit (OMP) can reliably recover a signal with $m$ nonzero entries in dimension $d$ given $\mathrm{O}(m \ln d)$ random linear measurements of that signal. This is a massive improvement over previous results, which require $\mathrm{O}(m^{2})$ measurements. The new results for OMP are comparable with recent results for another approach called Basis Pursuit (BP). In some settings, the OMP algorithm is faster and easier to implement, so it is an attractive alternative to BP for signal recovery problems.

The random paving property for uniformly bounded matrices

Year: 2008 DOI: 10.4064/sm185-1-4 This note presents a new proof of an important result due to Bourgain and Tzafriri that provides a partial solution to the Kadison-Singer problem. The result shows that every unit-norm matrix whose entries are relatively small in comparison with its dimension can be paved by a partition of constant size. That is, the coordinates can be partitioned into a constant number of blocks so that the restriction of the matrix to each block of coordinates has norm less than one half. The original proof of Bourgain and Tzafriri involves a long, delicate calculation. The new proof relies on the systematic use of symmetrization and (noncommutative) Khinchin inequalities to estimate the norms of some random matrices.

Constructing Packings in Grassmannian Manifolds via Alternating Projection

Year: 2008 DOI: 10.1080/10586458.2008.10129018 This paper describes a numerical method for finding good packings in Grassmannian manifolds equipped with various metrics. This investigation also encompasses packing in projective spaces. In each case, producing a good packing is equivalent to constructing a matrix that has certain structural and spectral properties. By alternately enforcing the structural condition and then the spectral condition, it is often possible to reach a matrix that satisfies both. One may then extract a packing from this matrix. This approach is both powerful and versatile. In cases in which experiments have been performed, the alternating projection method yields packings that compete with the best packings recorded. It also extends to problems that have not been studied numerically. For example, it can be used to produce packings of subspaces in real and complex Grassmannian spaces equipped with the Fubini–Study distance; these packings are valuable in wireless communications. One can prove that some of the novel configurations constructed by the algorithm have packing diameters that are nearly optimal.

A Tutorial on Fast Fourier Sampling [How to apply it to problems]

Year: 2008 DOI: 10.1109/MSP.2007.915000 This article describes a computational method, called the Fourier sampling algorithm, that exploits this insight [10]. The algorithm takes a small number of (correlated) random samples from a signal and processes them efficiently to produce an approximation of the DFT of the signal. The algorithm offers provable guarantees on the number of samples, the running time, and the amount of storage. As we will see, these requirements are exponentially better than the FFT for some cases of interest. This article describes in detail how to implement a version of Fourier sampling, it presents some evidence of its empirical performance, and it explains the theoretical ideas that underlie the analysis. Our hope is that this tutorial will allow engineers to apply Fourier sampling to their own problems. We also hope that it will stimulate further research on practical implementations and extensions of the algorithm.

The Metric Nearness Problem

Year: 2008 DOI: 10.1137/060653391 Metric nearness refers to the problem of optimally restoring metric properties to distance measurements that happen to be nonmetric due to measurement errors or otherwise. Metric data can be important in various settings, for example, in clustering, classification, metric-based indexing, query processing, and graph theoretic approximation algorithms. This paper formulates and solves the metric nearness problem: Given a set of pairwise dissimilarities, find a "nearest" set of distances that satisfy the properties of a metric—principally the triangle inequality. For solving this problem, the paper develops efficient triangle fixing algorithms that are based on an iterative projection method. An intriguing aspect of the metric nearness problem is that a special case turns out to be equivalent to the all pairs shortest paths problem. The paper exploits this equivalence and develops a new algorithm for the latter problem using a primal-dual method. Applications to graph clustering are provided as an illustration. We include experiments that demonstrate the computational superiority of triangle fixing over general purpose convex programming software. Finally, we conclude by suggesting various useful extensions and generalizations to metric nearness.
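
The link to all pairs shortest paths can be illustrated with a short sketch. The abstract does not name the special case; the sketch assumes it is the decrease-only variant, in which dissimilarities may be lowered but never raised. It uses the classical Floyd-Warshall recursion rather than the primal-dual algorithm the paper develops, and the function name `decrease_only_metric_nearness` is hypothetical.

```python
def decrease_only_metric_nearness(dissim):
    """Shortest-path closure of a symmetric dissimilarity matrix.

    dissim: square list of lists with zero diagonal and nonnegative entries.
    Returns a matrix whose entries never exceed the input entries and that
    satisfies the triangle inequality.
    """
    n = len(dissim)
    dist = [row[:] for row in dissim]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Shorten d(i, j) whenever the detour through k is cheaper.
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist
```

For the input [[0, 1, 5], [1, 0, 1], [5, 1, 0]], the entry 5 violates the triangle inequality (5 > 1 + 1) and is reduced to 2, while the other entries are unchanged.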

On the Linear Independence of Spikes and Sines

Year: 2008 DOI: 10.1007/s00041-008-9042-0 The purpose of this work is to survey what is known about the linear independence of spikes and sines. The paper provides new results for the case where the locations of the spikes and the frequencies of the sines are chosen at random. This problem is equivalent to studying the spectral norm of a random submatrix drawn from the discrete Fourier transform matrix. The proof depends on an extrapolation argument of Bourgain and Tzafriri.

Norms of random submatrices and sparse approximation

Year: 2008 DOI: 10.1016/j.crma.2008.10.008 Many problems in the theory of sparse approximation require bounds on operator norms of a random submatrix drawn from a fixed matrix. The purpose of this Note is to collect estimates for several different norms that are most important in the analysis of ℓ1 minimization algorithms. Several of these bounds have not appeared in detail.

CoSaMP: Iterative signal recovery from incomplete and inaccurate samples

Year: 2009 DOI: 10.1016/j.acha.2008.07.002 Compressive sampling offers a new paradigm for acquiring signals that are compressible with respect to an orthonormal basis. The major algorithmic challenge in compressive sampling is to approximate a compressible signal from noisy samples. This paper describes a new iterative recovery algorithm called CoSaMP that delivers the same guarantees as the best optimization-based approaches. Moreover, this algorithm offers rigorous bounds on computational cost and storage. It is likely to be extremely efficient for practical problems because it requires only matrix–vector multiplies with the sampling matrix. For compressible signals, the running time is just O(N log^2 N), where N is the length of the signal.

Beyond Nyquist: Efficient Sampling of Sparse Bandlimited Signals

Year: 2010 DOI: 10.1109/TIT.2009.2034811 Wideband analog signals push contemporary analog-to-digital conversion (ADC) systems to their performance limits. In many applications, however, sampling at the Nyquist rate is inefficient because the signals of interest contain only a small number of significant frequencies relative to the band limit, although the locations of the frequencies may not be known a priori. For this type of sparse signal, other sampling strategies are possible. This paper describes a new type of data acquisition system, called a random demodulator, that is constructed from robust, readily available components. Let K denote the total number of frequencies in the signal, and let W denote its band limit in hertz. Simulations suggest that the random demodulator requires just O(K log (W/K)) samples per second to stably reconstruct the signal. This sampling rate is exponentially lower than the Nyquist rate of W hertz. In contrast to Nyquist sampling, one must use nonlinear methods, such as convex programming, to recover the signal from the samples taken by the random demodulator. This paper provides a detailed theoretical analysis of the system's performance that supports the empirical observations.

Computational Methods for Sparse Solution of Linear Inverse Problems

Year: 2010 DOI: 10.1109/JPROC.2010.2044010 The goal of the sparse approximation problem is to approximate a target signal using a linear combination of a few elementary signals drawn from a fixed collection. This paper surveys the major practical algorithms for sparse approximation. Specific attention is paid to computational issues, to the circumstances in which individual methods tend to perform well, and to the theoretical guarantees available. Many fundamental questions in electrical engineering, statistics, and applied mathematics can be posed as sparse approximation problems, making these algorithms versatile and relevant to a plethora of applications.

CoSaMP: iterative signal recovery from incomplete and inaccurate samples

Year: 2010 DOI: 10.1145/1859204.1859229 Compressive sampling (CoSa) is a new paradigm for developing data sampling technologies. It is based on the principle that many types of vector-space data are compressible, which is a term of art in mathematical signal processing. The key ideas are that randomized dimension reduction preserves the information in a compressible signal and that it is possible to develop hardware devices that implement this dimension reduction efficiently. The main computational challenge in CoSa is to reconstruct a compressible signal from the reduced representation acquired by the sampling device. This extended abstract describes a recent algorithm, called CoSaMP, that accomplishes the data recovery task. It was the first known method to offer near-optimal guarantees on resource usage.

Two proposals for robust PCA using semidefinite programming

Year: 2011 DOI: 10.1214/11-EJS636 The performance of principal component analysis suffers badly in the presence of outliers. This paper proposes two novel approaches for robust principal component analysis based on semidefinite programming. The first method, maximum mean absolute deviation rounding, seeks directions of large spread in the data while damping the effect of outliers. The second method produces a low-leverage decomposition of the data that attempts to form a low-rank model for the data by separating out corrupted observations. This paper also presents efficient computational methods for solving these semidefinite programs. Numerical experiments confirm the value of these new techniques.

Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions

Year: 2011 DOI: 10.1137/090771806 Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed—either explicitly or implicitly—to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, robustness, and/or speed. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log(k)) floating-point operations (flops) in contrast to O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multiprocessor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. 
In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
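
The two-stage framework in this abstract (sample a subspace at random, then compress the matrix to it) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the paper's implementation: the Gaussian test matrix, the Gram-Schmidt orthonormalization, the tolerance, and the names `matmul`, `transpose`, `orthonormalize`, and `randomized_low_rank` are all choices made here. It stops at the factorization A ≈ QB and leaves the deterministic post-processing of B (for example, an SVD) to the reader.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def orthonormalize(vectors, tol=1e-10):
    """Modified Gram-Schmidt; drops vectors that are numerically dependent."""
    basis = []
    for vec in vectors:
        v = list(vec)
        original_norm = sum(x * x for x in v) ** 0.5
        for q in basis:
            c = sum(x * y for x, y in zip(v, q))
            v = [x - c * y for x, y in zip(v, q)]
        norm = sum(x * x for x in v) ** 0.5
        if norm > tol * original_norm:
            basis.append([x / norm for x in v])
    return basis


def randomized_low_rank(a, num_samples, seed=0):
    """Return (q, b) with orthonormal columns in q and a approximately q b."""
    rng = random.Random(seed)
    n = len(a[0])
    # Stage A: probe the range of `a` with a Gaussian test matrix.
    omega = [[rng.gauss(0.0, 1.0) for _ in range(num_samples)] for _ in range(n)]
    y = matmul(a, omega)                    # m x num_samples sample matrix
    q_rows = orthonormalize(transpose(y))   # rows of Q^T
    # Stage B: compress `a` to the sampled subspace.
    b = matmul(q_rows, a)                   # reduced matrix Q^T a
    return transpose(q_rows), b
```

When the input has exact rank k and `num_samples` is at least k, the sampled subspace captures the whole range with probability one, so the product of the two factors reproduces the input up to rounding error.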

Improved Analysis of the Subsampled Randomized Hadamard Transform

Year: 2011 DOI: 10.1142/S1793536911000787 This paper presents an improved analysis of a structured dimension-reduction map called the subsampled randomized Hadamard transform. This argument demonstrates that the map preserves the Euclidean geometry of an entire subspace of vectors. The new proof is much simpler than previous approaches, and it offers — for the first time — optimal constants in the estimate on the number of dimensions required for the embedding.
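
A minimal sketch of the map itself may help fix ideas. It assumes the usual construction: flip the sign of each coordinate at random, apply a normalized Walsh-Hadamard transform, keep a uniformly random subset of coordinates, and rescale by the square root of (dimension / number of kept coordinates). The names `fwht` and `make_srht` are hypothetical, and the input length is assumed to be a power of two.

```python
import math
import random


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform; len(x) must be a power of 2."""
    y = list(x)
    h = 1
    while h < len(y):
        for start in range(0, len(y), 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b
        h *= 2
    return y


def make_srht(n, num_rows, seed=0):
    """Build a function that maps length-n vectors to length-num_rows vectors."""
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # random sign flips
    rows = rng.sample(range(n), num_rows)                # coordinates to keep
    # sqrt(n / num_rows) rescaling times the 1 / sqrt(n) Hadamard normalization
    scale = 1.0 / math.sqrt(num_rows)

    def apply(x):
        flipped = [s * xi for s, xi in zip(signs, x)]
        transformed = fwht(flipped)
        return [scale * transformed[r] for r in rows]

    return apply
```

With `num_rows` equal to `n`, the map is an orthogonal transformation and preserves Euclidean norms exactly; the paper's analysis concerns how small `num_rows` can be while norms over a whole subspace are still approximately preserved.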

Freedman's inequality for matrix martingales

Year: 2011 DOI: 10.1214/ECP.v16-1624 Freedman's inequality is a martingale counterpart to Bernstein's inequality. This result shows that the large-deviation behavior of a martingale is controlled by the predictable quadratic variation and a uniform upper bound for the martingale difference sequence. Oliveira has recently established a natural extension of Freedman's inequality that provides tail bounds for the maximum singular value of a matrix-valued martingale. This note describes a different proof of the matrix Freedman inequality that depends on a deep theorem of Lieb from matrix analysis. This argument delivers sharp constants in the matrix Freedman inequality, and it also yields tail bounds for other types of matrix martingales. The new techniques are adapted from recent work by the present author.

Restricted isometries for partial random circulant matrices

Year: 2012 DOI: 10.1016/j.acha.2011.05.001 In the theory of compressed sensing, restricted isometry analysis has become a standard tool for studying how efficiently a measurement matrix acquires information about sparse and compressible signals. Many recovery algorithms are known to succeed when the restricted isometry constants of the sampling matrix are small. Many potential applications of compressed sensing involve a data-acquisition process that proceeds by convolution with a random pulse followed by (nonrandom) subsampling. At present, the theoretical analysis of this measurement technique is lacking. This paper demonstrates that the sth-order restricted isometry constant is small when the number m of samples satisfies m ≳ (s log n)^(3/2), where n is the length of the pulse. This bound improves on previous estimates, which exhibit quadratic scaling.

From joint convexity of quantum relative entropy to a concavity theorem of Lieb

Year: 2012 DOI: 10.1090/S0002-9939-2011-11141-9 This paper provides a succinct proof of a 1973 theorem of Lieb that establishes the concavity of a certain trace function. The development relies on a deep result from quantum information theory, the joint convexity of quantum relative entropy, as well as a recent argument due to Carlen and Lieb.

User-Friendly Tail Bounds for Sums of Random Matrices

Year: 2012 DOI: 10.1007/s10208-011-9099-z This paper presents new probability inequalities for sums of independent, random, self-adjoint matrices. These results place simple and easily verifiable hypotheses on the summands, and they deliver strong conclusions about the large-deviation behavior of the maximum eigenvalue of the sum. Tail bounds for the norm of a sum of random rectangular matrices follow as an immediate corollary. The proof techniques also yield some information about matrix-valued martingales. In other words, this paper provides noncommutative generalizations of the classical bounds associated with the names Azuma, Bennett, Bernstein, Chernoff, Hoeffding, and McDiarmid. The matrix inequalities promise the same diversity of application, ease of use, and strength of conclusion that have made the scalar inequalities so valuable.

A comparison principle for functions of a uniformly random subspace

Year: 2012 DOI: 10.1007/s00440-011-0360-9 This note demonstrates that it is possible to bound the expectation of an arbitrary norm of a random matrix drawn from the Stiefel manifold in terms of the expected norm of a standard Gaussian matrix with the same dimensions. A related comparison holds for any convex function of a random matrix drawn from the Stiefel manifold. For certain norms, a reversed inequality is also valid.

The restricted isometry property for time-frequency structured random matrices

Year: 2013 DOI: 10.1007/s00440-012-0441-4 This paper establishes the restricted isometry property for a Gabor system generated by n^2 time–frequency shifts of a random window function in n dimensions. The sth order restricted isometry constant of the associated n × n^2 Gabor synthesis matrix is small provided that s ≤ cn^(2/3) / log^2 n. This bound provides a qualitative improvement over previous estimates, which achieve only quadratic scaling of the sparsity s with respect to n. The proof depends on an estimate for the expected supremum of a second-order chaos.

Model-based scaling of the streamwise energy density in high-Reynolds-number turbulent channels

Year: 2013 DOI: 10.1017/jfm.2013.457 We study the Reynolds-number scaling and the geometric self-similarity of a gain-based, low-rank approximation to turbulent channel flows, determined by the resolvent formulation of McKeon & Sharma (J. Fluid Mech., vol. 658, 2010, pp. 336–382), in order to obtain a description of the streamwise turbulence intensity from direct consideration of the Navier–Stokes equations. Under this formulation, the velocity field is decomposed into propagating waves (with single streamwise and spanwise wavelengths and wave speed) whose wall-normal shapes are determined from the principal singular function of the corresponding resolvent operator. Using the accepted scalings of the mean velocity in wall-bounded turbulent flows, we establish that the resolvent operator admits three classes of wave parameters that induce universal behaviour with Reynolds number in the low-rank model, and which are consistent with scalings proposed throughout the wall turbulence literature. In addition, it is shown that a necessary condition for geometrically self-similar resolvent modes is the presence of a logarithmic turbulent mean velocity. Under the practical assumption that the mean velocity consists of a logarithmic region, we identify the scalings that constitute hierarchies of self-similar modes that are parameterized by the critical wall-normal location where the speed of the mode equals the local turbulent mean velocity. For the rank-1 model subject to broadband forcing, the integrated streamwise energy density takes a universal form which is consistent with the dominant near-wall turbulent motions. When the shape of the forcing is optimized to enforce matching with results from direct numerical simulations at low turbulent Reynolds numbers, further similarity appears.
Representation of these weight functions using similarity laws enables prediction of the Reynolds number and wall-normal variations of the streamwise energy intensity at high Reynolds numbers (Re_τ ≈ 10^3–10^(10)). Results from this low-rank model of the Navier–Stokes equations compare favourably with experimental results in the literature.

Compact representation of wall-bounded turbulence using compressive sampling

Year: 2014 DOI: 10.1063/1.4862303 Compressive sampling is well known to be a useful tool for resolving the energetic content of signals that admit a sparse representation. The broadband temporal spectrum acquired from point measurements in wall-bounded turbulence has precluded the prior use of compressive sampling in this kind of flow; however, it is shown here that the frequency content of flow fields that have been Fourier transformed in the homogeneous spatial (wall-parallel) directions is approximately sparse, giving rise to a compact representation of the velocity field. As such, compressive sampling is an ideal tool for reducing the amount of information required to approximate the velocity field. Further, success of the compressive sampling approach provides strong evidence that this representation is both physically meaningful and indicative of special properties of wall turbulence. Another advantage of compressive sampling over periodic sampling becomes evident at high Reynolds numbers, since the number of samples required to resolve a given bandwidth with compressive sampling scales as the logarithm of the dynamically significant bandwidth instead of linearly for periodic sampling. The combination of the Fourier decomposition in the wall-parallel directions, the approximate sparsity in frequency, and empirical bounds on the convection velocity leads to a compact representation of an otherwise broadband distribution of energy in the space defined by streamwise and spanwise wavenumber, frequency, and wall-normal location. The data storage requirements for reconstruction of the full field using compressive sampling are shown to be significantly less than for periodic sampling, in which the Nyquist criterion limits the maximum frequency that can be resolved. Conversely, compressive sampling maximizes the frequency range that can be recovered if the number of samples is limited, resolving frequencies up to several times higher than the mean sampling rate.
It is proposed that the approximate sparsity in frequency and the corresponding structure in the spatial domain can be exploited to design simulation schemes for canonical wall turbulence with significantly reduced computational expense compared with current techniques.

Paved with good intentions: Analysis of a randomized block Kaczmarz method

Year: 2014 DOI: 10.1016/j.laa.2012.12.022 The block Kaczmarz method is an iterative scheme for solving overdetermined least-squares problems. At each step, the algorithm projects the current iterate onto the solution space of a subset of the constraints. This paper describes a block Kaczmarz algorithm that uses a randomized control scheme to choose the subset at each step. This algorithm is the first block Kaczmarz method with an (expected) linear rate of convergence that can be expressed in terms of the geometric properties of the matrix and its submatrices. The analysis reveals that the algorithm is most effective when it is given a good row paving of the matrix, a partition of the rows into well-conditioned blocks. The operator theory literature provides detailed information about the existence and construction of good row pavings. Together, these results yield an efficient block Kaczmarz scheme that applies to many overdetermined least-squares problems.
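
The block projection step can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes a consistent system, a paving supplied by the caller as lists of row indices, linearly independent rows within each block, and a block drawn uniformly at random at each step. The names `solve` and `block_kaczmarz` are hypothetical.

```python
import random


def solve(g, r):
    """Solve the small square system g z = r by Gaussian elimination."""
    n = len(r)
    aug = [list(row) + [rhs] for row, rhs in zip(g, r)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= f * aug[col][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def block_kaczmarz(a, b, paving, num_iters, seed=0):
    """Randomized block Kaczmarz iteration for the system a x = b."""
    rng = random.Random(seed)
    x = [0.0] * len(a[0])
    for _ in range(num_iters):
        block = rng.choice(paving)  # randomized control: uniform over blocks
        rows = [a[i] for i in block]
        resid = [b[i] - sum(aij * xj for aij, xj in zip(a[i], x)) for i in block]
        # Project x onto {z : a_i . z = b_i for i in block}:
        # x <- x + A_blk^T (A_blk A_blk^T)^{-1} resid
        gram = [[sum(p * q for p, q in zip(u, v)) for v in rows] for u in rows]
        coef = solve(gram, resid)
        for c, row in zip(coef, rows):
            x = [xj + c * rj for xj, rj in zip(x, row)]
    return x
```

For the system with rows (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), right-hand side (1, 2, 3, 6), and paving [[0, 1], [2, 3]], the iterate reaches the solution (1, 2, 3) as soon as both blocks have been visited.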

Subadditivity of matrix φ-entropy and concentration of random matrices

Year: 2014 DOI: 10.1214/EJP.v19-2964 This paper considers a class of entropy functionals defined for random matrices, and it demonstrates that these functionals satisfy a subadditivity property. Several matrix concentration inequalities are derived as an application of this result.

A low-order decomposition of turbulent channel flow via resolvent analysis and convex optimization

Year: 2014 DOI: 10.1063/1.4876195 We combine resolvent-mode decomposition with techniques from convex optimization to optimally approximate velocity spectra in a turbulent channel. The velocity is expressed as a weighted sum of resolvent modes that are dynamically significant, non-empirical, and scalable with Reynolds number. To optimally represent direct numerical simulation (DNS) data at friction Reynolds number 2003, we determine the weights of resolvent modes as the solution of a convex optimization problem. Using only 12 modes per wall-parallel wavenumber pair and temporal frequency, we obtain close agreement with DNS spectra, reducing the wall-normal and temporal resolutions used in the simulation by three orders of magnitude.

Matrix concentration inequalities via the method of exchangeable pairs

Year: 2014 DOI: 10.1214/13-AOP892 This paper derives exponential concentration inequalities and polynomial moment inequalities for the spectral norm of a random matrix. The analysis requires a matrix extension of the scalar concentration theory developed by Sourav Chatterjee using Stein's method of exchangeable pairs. When applied to a sum of independent random matrices, this approach yields matrix generalizations of the classical inequalities due to Hoeffding, Bernstein, Khintchine and Rosenthal. The same technique delivers bounds for sums of dependent random matrices and more general matrix-valued functions of dependent random variables.

From Steiner Formulas for Cones to Concentration of Intrinsic Volumes

Year: 2014 DOI: 10.1007/s00454-014-9595-4 The intrinsic volumes of a convex cone are geometric functionals that return basic structural information about the cone. Recent research has demonstrated that conic intrinsic volumes are valuable for understanding the behavior of random convex optimization problems. This paper develops a systematic technique for studying conic intrinsic volumes using methods from probability. At the heart of this approach is a general Steiner formula for cones. This result converts questions about the intrinsic volumes into questions about the projection of a Gaussian random vector onto the cone, which can then be resolved using tools from Gaussian analysis. The approach leads to new identities and bounds for the intrinsic volumes of a cone, including a near-optimal concentration inequality.

Sharp Recovery Bounds for Convex Demixing, with Applications

Year: 2014 DOI: 10.1007/s10208-014-9191-2 Demixing refers to the challenge of identifying two structured signals given only the sum of the two signals and prior information about their structures. Examples include the problem of separating a signal that is sparse with respect to one basis from a signal that is sparse with respect to a second basis, and the problem of decomposing an observed matrix into a low-rank matrix plus a sparse matrix. This paper describes and analyzes a framework, based on convex optimization, for solving these demixing problems, and many others. This work introduces a randomized signal model that ensures that the two structures are incoherent, i.e., generically oriented. For an observation from this model, this approach identifies a summary statistic that reflects the complexity of a particular signal. The difficulty of separating two structured, incoherent signals depends only on the total complexity of the two structures. Some applications include (1) demixing two signals that are sparse in mutually incoherent bases, (2) decoding spread-spectrum transmissions in the presence of impulsive errors, and (3) removing sparse corruptions from a low-rank matrix. In each case, the theoretical analysis of the convex demixing method closely matches its empirical behavior.

Living on the edge: phase transitions in convex programs with random data

Year: 2014 DOI: 10.1093/imaiai/iau005 Recent research indicates that many convex optimization problems with random constraints exhibit a phase transition as the number of constraints increases. For example, this phenomenon emerges in the ℓ₁ minimization method for identifying a sparse vector from random linear measurements. Indeed, the ℓ₁ approach succeeds with high probability when the number of measurements exceeds a threshold that depends on the sparsity level; otherwise, it fails with high probability. This paper provides the first rigorous analysis that explains why phase transitions are ubiquitous in random convex optimization problems. It also describes tools for making reliable predictions about the quantitative aspects of the transition, including the location and the width of the transition region. These techniques apply to regularized linear inverse problems with random measurements, to demixing problems under a random incoherence model, and also to cone programs with random affine constraints. The applied results depend on foundational research in conic geometry. This paper introduces a summary parameter, called the statistical dimension, that canonically extends the dimension of a linear subspace to the class of convex cones. The main technical result demonstrates that the sequence of intrinsic volumes of a convex cone concentrates sharply around the statistical dimension. This fact leads to accurate bounds on the probability that a randomly rotated cone shares a ray with a fixed cone.

Robust Computation of Linear Models by Convex Relaxation

Year: 2015 DOI: 10.1007/s10208-014-9221-0 Consider a data set of vector-valued observations that consists of noisy inliers, which are explained well by a low-dimensional subspace, along with some number of outliers. This work describes a convex optimization problem, called reaper, that can reliably fit a low-dimensional model to this type of data. This approach parameterizes linear subspaces using orthogonal projectors and uses a relaxation of the set of orthogonal projectors to reach the convex formulation. The paper provides an efficient algorithm for solving the reaper problem, and it documents numerical experiments that confirm that reaper can dependably find linear structure in synthetic and natural data. In addition, when the inliers lie near a low-dimensional subspace, there is a rigorous theory that describes when reaper can approximate this subspace.

An Introduction to Matrix Concentration Inequalities

Year: 2015 DOI: 10.1561/2200000048 Random matrices now play a role in many areas of theoretical, applied, and computational mathematics. Therefore, it is desirable to have tools for studying random matrices that are flexible, easy to use, and powerful. Over the last fifteen years, researchers have developed a remarkable family of results, called matrix concentration inequalities, that achieve all of these goals. This monograph offers an invitation to the field of matrix concentration inequalities. It begins with some history of random matrix theory; it describes a flexible model for random matrices that is suitable for many problems; and it discusses the most important matrix concentration results. To demonstrate the value of these techniques, the presentation includes examples drawn from statistics, machine learning, optimization, combinatorics, algorithms, scientific computing, and beyond.

Solving ptychography with a convex relaxation

Year: 2015 DOI: 10.1088/1367-2630/17/5/053044 PMCID: PMC4486359 Ptychography is a powerful computational imaging technique that transforms a collection of low-resolution images into a high-resolution sample reconstruction. Unfortunately, algorithms that currently solve this reconstruction problem lack stability, robustness, and theoretical guarantees. Recently, convex optimization algorithms have improved the accuracy and reliability of several related reconstruction efforts. This paper proposes a convex formulation of the ptychography problem. This formulation has no local minima, it can be solved using a wide range of algorithms, it can incorporate appropriate noise models, and it can include multiple a priori constraints. The paper considers a specific algorithm, based on low-rank factorization, whose runtime and memory usage are near-linear in the size of the output image. Experiments demonstrate that this approach offers a 25% lower background variance on average than alternating projections, the ptychographic reconstruction algorithm that is currently in widespread use.

Designing Statistical Estimators That Balance Sample Size, Risk, and Computational Cost

Year: 2015 DOI: 10.1109/JSTSP.2015.2400412 This paper proposes a tradeoff between computational time, sample complexity, and statistical accuracy that applies to statistical estimators based on convex optimization. When we have a large amount of data, we can exploit excess samples to decrease statistical risk, to decrease computational cost, or to trade off between the two. We propose to achieve this tradeoff by varying the amount of smoothing applied to the optimization problem. This work uses regularized linear regression as a case study to argue for the existence of this tradeoff both theoretically and experimentally. We also apply our method to describe a tradeoff in an image interpolation problem.

Integer Factorization of a Positive-Definite Matrix

Year: 2015 DOI: 10.1137/15M1024718 This paper establishes that every positive-definite matrix can be written as a positive linear combination of outer products of integer-valued vectors whose entries are bounded by the geometric mean of the condition number and the dimension of the matrix.

Efron–Stein inequalities for random matrices

Year: 2016 DOI: 10.1214/15-AOP1054 This paper establishes new concentration inequalities for random matrices constructed from independent random variables. These results are analogous to the generalized Efron–Stein inequalities developed by Boucheron et al. The proofs rely on the method of exchangeable pairs.

A mathematical introduction to compressive sensing [Book Review]

Year: 2017 DOI: 10.1090/bull/1546 A mathematical introduction to compressive sensing by Simon Foucart and Holger Rauhut [FR13] is about sparse solutions to systems of random linear equations. To begin, let me describe some striking phenomena that take place in this context. Afterward, I shall try to explain why these facts have captivated so many researchers over the last decade. I shall conclude with some comments on the book.

Practical Sketching Algorithms for Low-Rank Matrix Approximation

Year: 2017 DOI: 10.1137/17M1111590 This paper describes a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image, or sketch, of the matrix. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by numerical experiments with real and synthetic data.
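
To make the idea of reconstructing a matrix from a random linear image concrete, here is a small NumPy sketch of one common two-sided scheme: capture a range sketch Y = AΩ and a co-range sketch W = ΨA, then form the approximation Q(ΨQ)⁺W, where Q is an orthonormal basis for the columns of Y. The function names, the Gaussian test matrices, and the sketch sizes k and l are assumptions chosen for illustration; the paper's algorithms and parameter recommendations may differ in detail.

```python
import numpy as np

def two_sided_sketch(A, k, l, seed=0):
    """Hypothetical helper: draw Gaussian test matrices and sketch A once."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, k))   # range test matrix (assumed Gaussian)
    Psi = rng.standard_normal((l, m))     # co-range test matrix (assumed Gaussian)
    Y = A @ Omega                         # captures the column space of A
    W = Psi @ A                           # captures the row space of A
    return Psi, Y, W

def low_rank_from_sketch(Psi, Y, W):
    """Return factors (Q, X) with A approximately equal to Q @ X."""
    Q, _ = np.linalg.qr(Y)                               # m x k orthonormal basis
    X, *_ = np.linalg.lstsq(Psi @ Q, W, rcond=None)      # solves (Psi Q) X = W
    return Q, X

# Exactly rank-3 input (assumed data), sketched with k = 5 and l = 11.
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 30))
Psi, Y, W = two_sided_sketch(A, k=5, l=11)
Q, X = low_rank_from_sketch(Psi, Y, W)
rel_err = np.linalg.norm(A - Q @ X) / np.linalg.norm(A)
```

Only the sketches Y and W (plus the seed for Ψ) are needed after the single pass over A, which is what makes this style of method attractive when A is too large to revisit.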

Simplicial Faces of the Set of Correlation Matrices

Year: 2018 DOI: 10.1007/s00454-017-9961-0 This paper concerns the facial geometry of the set of n×n correlation matrices. The main result states that almost every set of r vertices generates a simplicial face, provided that r ≤ √(cn), where c is an absolute constant. This bound is qualitatively sharp because the set of correlation matrices has no simplicial face generated by more than √(2n) vertices.

Streaming Low-Rank Matrix Approximation with an Application to Scientific Simulation

Year: 2019 DOI: 10.1137/18m1201068 This paper argues that randomized linear sketching is a natural tool for on-the-fly compression of data matrices that arise from large-scale scientific simulations and data collection. The technical contribution consists in a new algorithm for constructing an accurate low-rank approximation of a matrix from streaming data. This method is accompanied by an a priori analysis that allows the user to set algorithm parameters with confidence and an a posteriori error estimator that allows the user to validate the quality of the reconstructed matrix. In comparison to previous techniques, the new method achieves smaller relative approximation errors and is less sensitive to parameter choices. As concrete applications, the paper outlines how the algorithm can be used to compress a Navier–Stokes simulation and a sea surface temperature dataset.

Randomized numerical linear algebra: Foundations and algorithms

Year: 2020 DOI: 10.1017/s0962492920000021 This survey describes probabilistic algorithms for linear algebraic computations, such as factorizing matrices and solving linear systems. It focuses on techniques that have a proven track record for real-world problems. The paper treats both the theoretical foundations of the subject and practical computational issues. Topics include norm estimation, matrix approximation by sampling, structured and unstructured random embeddings, linear regression problems, low-rank approximation, subspace iteration and Krylov methods, error estimation and adaptivity, interpolatory and CUR factorizations, Nyström approximation of positive semidefinite matrices, single-view ('streaming') algorithms, full rank-revealing factorizations, solvers for linear systems, and approximation of kernel matrices that arise in machine learning and in scientific computing.

Fast state tomography with optimal error bounds

Year: 2020 DOI: 10.1088/1751-8121/ab8111 Projected least squares is an intuitive and numerically cheap technique for quantum state tomography: compute the least-squares estimator and project it onto the space of states. The main result of this paper equips this point estimator with rigorous, non-asymptotic convergence guarantees expressed in terms of the trace distance. The estimator's sample complexity is comparable to the strongest convergence guarantees available in the literature and—in the case of the uniform POVM—saturates fundamental lower bounds. Numerical simulations support these competitive features.
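
The "project it onto the space of states" step can be illustrated as follows, assuming the projection is taken in Frobenius norm onto the set of density matrices (positive semidefinite, unit trace). Under that assumption, the projection reduces to projecting the eigenvalues onto the probability simplex. The function names are hypothetical, and the least-squares estimator itself is not shown because it depends on the measurement scheme.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of a real vector onto the probability simplex."""
    u = np.sort(v)[::-1]                       # entries in decreasing order
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]  # last index kept positive
    theta = css[rho] / (rho + 1)               # common shift
    return np.maximum(v - theta, 0.0)

def project_to_states(L):
    """Frobenius-norm projection of a matrix onto the density matrices."""
    H = (L + L.conj().T) / 2                   # symmetrize against rounding
    evals, evecs = np.linalg.eigh(H)
    p = project_to_simplex(evals)              # valid spectrum of a state
    return (evecs * p) @ evecs.conj().T        # sum_i p_i v_i v_i^*

# A trace-one but indefinite "estimate" (assumed data, for illustration).
L_hat = np.array([[1.2, 0.0], [0.0, -0.2]])
rho_hat = project_to_states(L_hat)
```

In this example the negative eigenvalue is truncated and the remaining spectrum is shifted so that the trace returns to one, giving the pure state diag(1, 0).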

Nonlinear matrix concentration via semigroup methods

Year: 2021 DOI: 10.1214/20-EJP578 Matrix concentration inequalities provide information about the probability that a random matrix is close to its expectation with respect to the ℓ₂ operator norm. This paper uses semigroup methods to derive sharp nonlinear matrix inequalities. The main result is that the classical Bakry–Émery curvature criterion implies subgaussian concentration for "matrix Lipschitz" functions. This argument circumvents the need to develop a matrix version of the log-Sobolev inequality, a technical obstacle that has blocked previous attempts to derive matrix concentration inequalities in this setting. The approach unifies and extends much of the previous work on matrix concentration. When applied to a product measure, the theory reproduces the matrix Efron–Stein inequalities due to Paulin et al. It also handles matrix-valued functions on a Riemannian manifold with uniformly positive Ricci curvature.

From Poincaré inequalities to nonlinear matrix concentration

Year: 2021 DOI: 10.3150/20-BEJ1289 This paper deduces exponential matrix concentration from a Poincaré inequality via a short, conceptual argument. Among other examples, this theory applies to matrix-valued functions of a uniformly log-concave random vector. The proof relies on the subadditivity of Poincaré inequalities and a chain rule inequality for the trace of the matrix Dirichlet form. It also uses a symmetrization technique to avoid difficulties associated with a direct extension of the classic scalar argument.

Matrix Concentration for Products

Year: 2021 DOI: 10.1007/s10208-021-09533-9 This paper develops nonasymptotic growth and concentration bounds for a product of independent random matrices. These results sharpen and generalize recent work of Henriksen–Ward, and they are similar in spirit to the results of Ahlswede–Winter and of Tropp for a sum of independent random matrices. The argument relies on the uniform smoothness properties of the Schatten trace classes.

An Optimal-Storage Approach to Semidefinite Programming Using Approximate Complementarity

Year: 2021 DOI: 10.1137/19m1244603 This paper develops a new storage-optimal algorithm that provably solves almost all semidefinite programs (SDPs). This method is particularly effective for weakly constrained SDPs under appropriate regularity conditions. The key idea is to formulate an approximate complementarity principle: Given an approximate solution to the dual SDP, the primal SDP has an approximate solution whose range is contained in the eigenspace with small eigenvalues of the dual slack matrix. For weakly constrained SDPs, this eigenspace has very low dimension, so this observation significantly reduces the search space for the primal solution. This result suggests an algorithmic strategy that can be implemented with minimal storage: (1) solve the dual SDP approximately; (2) compress the primal SDP to the eigenspace with small eigenvalues of the dual slack matrix; (3) solve the compressed primal SDP. The paper also provides numerical experiments showing that this approach is successful for a range of interesting large-scale SDPs.

Randomized block Krylov methods for approximating extreme eigenvalues

Year: 2022 DOI: 10.1007/s00211-021-01250-3 Randomized block Krylov subspace methods form a powerful class of algorithms for computing the extreme eigenvalues of a symmetric matrix or the extreme singular values of a general matrix. The purpose of this paper is to develop new theoretical bounds on the performance of randomized block Krylov subspace methods for these problems. For matrices with polynomial spectral decay, the randomized block Krylov method can obtain an accurate spectral norm estimate using only a constant number of steps (that depends on the decay rate and the accuracy). Furthermore, the analysis reveals that the behavior of the algorithm depends in a delicate way on the block size. Numerical evidence confirms these predictions.

Sparse Random Hamiltonians Are Quantumly Easy

Year: 2024 DOI: 10.1103/physrevx.14.011014

A candidate application for quantum computers is to simulate the low-temperature properties of quantum systems. For this task, there is a well-studied quantum algorithm that performs quantum phase estimation on an initial trial state that has a non-negligible overlap with a low-energy state. However, it is notoriously hard to give theoretical guarantees that such a trial state can be prepared efficiently. Moreover, the heuristic proposals that are currently available, such as with adiabatic state preparation, appear insufficient in practical cases. This paper shows that, for most random sparse Hamiltonians, the maximally mixed state is a sufficiently good trial state, and phase estimation efficiently prepares states with energy arbitrarily close to the ground energy. Furthermore, any low-energy state must have non-negligible quantum circuit complexity, suggesting that low-energy states are classically nontrivial and phase estimation is the optimal method for preparing such states (up to polynomial factors). These statements hold for two models of random Hamiltonians: (i) a sum of random signed Pauli strings and (ii) a random signed 𝑑-sparse Hamiltonian. The main technical argument is based on some new results in nonasymptotic random matrix theory. In particular, a refined concentration bound for the spectral density is required to obtain complexity guarantees for these random Hamiltonians.

Randomly pivoted Cholesky: Practical approximation of a kernel matrix with few entry evaluations

Year: 2024 DOI: 10.1002/cpa.22234

The randomly pivoted Cholesky algorithm (RPCholesky) computes a factorized rank-k approximation of an N×N positive-semidefinite (psd) matrix. RPCholesky requires only (k+1)N entry evaluations and O(k²N) additional arithmetic operations, and it can be implemented with just a few lines of code. The method is particularly useful for approximating a kernel matrix. This paper offers a thorough new investigation of the empirical and theoretical behavior of this fundamental algorithm. For matrix approximation problems that arise in scientific machine learning, experiments show that RPCholesky matches or beats the performance of alternative algorithms. Moreover, RPCholesky provably returns low-rank approximations that are nearly optimal. The simplicity, effectiveness, and robustness of RPCholesky strongly support its use in scientific computing and machine learning applications.
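
As a rough illustration of the "few lines of code" remark, the following NumPy sketch implements a randomly pivoted partial Cholesky factorization for a real symmetric psd matrix. The interface (a `get_column` callback plus the diagonal) is an assumption made to highlight the entry-evaluation count: the diagonal costs N evaluations and each of the k pivot columns costs N more. It should be read as a sketch under these assumptions rather than the authors' reference code.

```python
import numpy as np

def rp_cholesky(get_column, diag, k, seed=0):
    """Hypothetical helper: return F (N x k) with A approximately F @ F.T."""
    rng = np.random.default_rng(seed)
    d = np.array(diag, dtype=float)        # diagonal of the current residual
    N = d.size
    F = np.zeros((N, k))
    pivots = []
    for i in range(k):
        total = d.sum()
        if total <= 0:                     # residual exhausted: A has rank < k
            break
        # Sample a pivot with probability proportional to the residual diagonal.
        s = rng.choice(N, p=d / total)
        # Residual column at the pivot: one new column of A is evaluated.
        g = get_column(s) - F[:, :i] @ F[s, :i]
        F[:, i] = g / np.sqrt(g[s])
        # Update the residual diagonal; clip tiny negatives caused by rounding.
        d = np.clip(d - F[:, i] ** 2, 0.0, None)
        pivots.append(s)
    return F, pivots

# Exactly rank-3 psd matrix (assumed data); k = 3 steps reproduce it.
rng = np.random.default_rng(1)
B = rng.standard_normal((50, 3))
A = B @ B.T
F, pivots = rp_cholesky(lambda j: A[:, j], np.diag(A), k=3)
```

For a kernel matrix, `get_column` would evaluate the kernel between one data point and all others, so the full N×N matrix never needs to be formed.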