Abstract: This paper develops a new class of algorithms for general linear systems and eigenvalue problems. These algorithms apply fast randomized sketching to accelerate subspace projection methods, such as GMRES and Rayleigh--Ritz. This approach offers great flexibility in designing the basis for the approximation subspace, which can improve scalability in many computational environments. The resulting algorithms outperform the classic methods with minimal loss of accuracy. For model problems, numerical experiments show large advantages over MATLAB's optimized routines, including a 100× speedup over gmres and a 10× speedup over eigs.

No.: 2021-01
ID: CaltechAUTHORS:20220909-161413582

]]>

Abstract: Random projections reduce the dimension of a set of vectors while preserving structural information, such as distances between vectors in the set. This paper proposes a novel use of row-product random matrices in random projection, where we call it Tensor Random Projection (TRP). It requires substantially less memory than existing dimension reduction maps. The TRP map is formed as the Khatri-Rao product of several smaller random projections, and is compatible with any base random projection including sparse maps, which enable dimension reduction with very low query cost and no floating point operations. We also develop a reduced variance extension. We provide a theoretical analysis of the bias and variance of the TRP, and a non-asymptotic error analysis for a TRP composed of two smaller maps. Experiments on both synthetic and MNIST data show that our method performs as well as conventional methods with substantially less storage.

Publication: arXiv
ID: CaltechAUTHORS:20210621-223135493

]]>

Abstract: Quantum simulation has wide applications in quantum chemistry and physics. Recently, scientists have begun exploring the use of randomized methods for accelerating quantum simulation. Among them, a simple and powerful technique, called qDRIFT, is known to generate random product formulas for which the average quantum channel approximates the ideal evolution. This work provides a comprehensive analysis of a single realization of the random product formula produced by qDRIFT. The main results prove that a typical realization of the randomized product formula approximates the ideal unitary evolution up to a small diamond-norm error. The gate complexity is independent of the number of terms in the Hamiltonian, but it depends on the system size and the sum of the interaction strengths in the Hamiltonian. Remarkably, the same random evolution starting from an arbitrary, but fixed, input state yields a much shorter circuit suitable for that input state. If the observable is also fixed, the same random evolution provides an even shorter product formula. The proofs depend on concentration inequalities for vector and matrix martingales. Numerical experiments verify the theoretical predictions.

Publication: arXiv
ID: CaltechAUTHORS:20201218-154423869

]]>

Abstract: The intrinsic volumes of a convex body are fundamental invariants that capture information about the average volume of the projection of the convex body onto a random subspace of fixed dimension. The intrinsic volumes also play a central role in integral geometry formulas that describe how moving convex bodies interact. Recent work has demonstrated that the sequence of intrinsic volumes concentrates sharply around its centroid, which is called the central intrinsic volume. The purpose of this paper is to derive finer concentration inequalities for the intrinsic volumes and related sequences. These concentration results have striking implications for high-dimensional integral geometry. In particular, they uncover new phase transitions in formulas for random projections, rotation means, random slicing, and the kinematic formula. In each case, the location of the phase transition is determined by reducing each convex body to a single summary parameter.

No.: 2020-01
ID: CaltechAUTHORS:20220829-181401723

]]>

Abstract: Semidefinite programming (SDP) is a powerful framework from convex optimization that has striking potential for data science applications. This paper develops a provably correct algorithm for solving large SDP problems by economizing on both the storage and the arithmetic costs. Numerical evidence shows that the method is effective for a range of applications, including relaxations of MaxCut, abstract phase retrieval, and quadratic assignment. Running on a laptop, the algorithm can handle SDP instances where the matrix variable has over 10¹³ entries.

Publication: arXiv
ID: CaltechAUTHORS:20201218-154437706

]]>

Abstract: This paper studies the problem of decomposing a low-rank positive-semidefinite matrix into symmetric factors with binary entries, either {±1} or {0,1}. This research answers fundamental questions about the existence and uniqueness of these decompositions. It also leads to tractable factorization algorithms that succeed under a mild deterministic condition. A companion paper addresses the related problem of decomposing a low-rank rectangular matrix into a binary factor and an unconstrained factor.

Publication: arXiv
ID: CaltechAUTHORS:20201218-154441081

]]>

Abstract: This paper studies the problem of decomposing a low-rank matrix into a factor with binary entries, either from {±1} or from {0,1}, and an unconstrained factor. The research answers fundamental questions about the existence and uniqueness of these decompositions. It also leads to tractable factorization algorithms that succeed under a mild deterministic condition. This work builds on a companion paper that addresses the related problem of decomposing a low-rank positive-semidefinite matrix into symmetric binary factors.

Publication: arXiv
ID: CaltechAUTHORS:20201218-154444454

]]>

Abstract: This paper describes new algorithms for constructing a low-rank approximation of an input matrix from a sketch, a random low-dimensional linear image of the matrix. These algorithms come with rigorous performance guarantees. Empirically, the proposed methods achieve significantly smaller relative errors than other approaches that have appeared in the literature. For a concrete application, the paper outlines how the algorithms support on-the-fly compression of data from a direct Navier-Stokes (DNS) simulation.

No.: 2018-01
ID: CaltechAUTHORS:20220826-183609942

]]>

Abstract: Randomized block Krylov subspace methods form a powerful class of algorithms for computing the (extreme) eigenvalues and singular values of a matrix. The purpose of this paper is to develop new theoretical bounds on the performance of randomized block Krylov subspace methods for these problems. The results demonstrate that, for many matrices, it is possible to obtain an accurate spectral norm estimate using only a constant number of steps of the randomized block Krylov method. Furthermore, the analysis reveals that the behavior of the algorithm depends in a delicate way on the block size. Randomized block Krylov subspace methods are a powerful class of algorithms for computing information about the spectrum of a matrix. The purpose of this note is to develop new theoretical bounds on the performance of randomized block Krylov subspace methods for estimating a number of extreme eigenvalues. The results demonstrate that, for many matrices, it is possible to obtain accurate approximations using only a constant number of steps of the randomized block Krylov method. Randomized block Krylov subspace methods are a powerful class of techniques for computing information about the spectrum of a matrix. The purpose of this paper is to develop new theoretical bounds on the performance of randomized block Krylov subspace methods for computing a low-rank approximation of a matrix. The results demonstrate that, for many matrices, it is possible to obtain accurate approximations using only a constant number of steps of the randomized block Krylov method.

No.: 2018-02
ID: CaltechAUTHORS:20210624-180721369

]]>

Abstract: Several important applications, such as streaming PCA and semidefinite programming, involve a large-scale positive-semidefinite (psd) matrix that is presented as a sequence of linear updates. Because of storage limitations, it may only be possible to retain a sketch of the psd matrix. This paper develops a new algorithm for fixed-rank psd approximation from a sketch. The approach combines the Nyström approximation with a novel mechanism for rank truncation. Theoretical analysis establishes that the proposed method can achieve any prescribed relative error in the Schatten 1-norm and that it exploits the spectral decay of the input matrix. Computer experiments show that the proposed method dominates alternative techniques for fixed-rank psd matrix approximation across a wide range of examples.

No.: 2017-03
ID: CaltechAUTHORS:20170620-081901312

]]>

Abstract: This paper concerns a fundamental class of convex matrix optimization problems. It presents the first algorithm that uses optimal storage and provably computes a low-rank approximation of a solution. In particular, when all solutions have low rank, the algorithm converges to a solution. This algorithm, SketchyCGM, modifies a standard convex optimization scheme, the conditional gradient method, to store only a small randomized sketch of the matrix variable. After the optimization terminates, the algorithm extracts a low-rank approximation of the solution from the sketch. In contrast to nonconvex heuristics, the guarantees for SketchyCGM do not rely on statistical models for the problem data. Numerical work demonstrates the benefits of SketchyCGM over heuristics.

ID: CaltechAUTHORS:20180828-145534045

]]>

Abstract: Demixing is the problem of identifying multiple structured signals from a superimposed, undersampled, and noisy observation. This work analyzes a general framework, based on convex optimization, for solving demixing problems. When the constituent signals follow a generic incoherence model, this analysis leads to precise recovery guarantees. These results admit an attractive interpretation: each signal possesses an intrinsic degrees-of-freedom parameter, and demixing can succeed if and only if the dimension of the observation exceeds the total degrees of freedom present in the observation.

No.: 2017-02
ID: CaltechAUTHORS:20170314-110228775

]]>

Abstract: This paper develops a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image of the matrix, called a sketch. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by computer experiments.

No.: 2017-01
ID: CaltechAUTHORS:20170215-154809329

]]>

Abstract: Dimension reduction is the process of embedding high-dimensional data into a lower dimensional space to facilitate its analysis. In the Euclidean setting, one fundamental technique for dimension reduction is to apply a random linear map to the data. This dimension reduction procedure succeeds when it preserves certain geometric features of the set. The question is how large the embedding dimension must be to ensure that randomized dimension reduction succeeds with high probability. This paper studies a natural family of randomized dimension reduction maps and a large class of data sets. It proves that there is a phase transition in the success probability of the dimension reduction map as the embedding dimension increases. For a given data set, the location of the phase transition is the same for all maps in this family. Furthermore, each map has the same stability properties, as quantified through the restricted minimum singular value. These results can be viewed as new universality laws in high-dimensional stochastic geometry. Universality laws for randomized dimension reduction have many applications in applied mathematics, signal processing, and statistics. They yield design principles for numerical linear algebra algorithms, for compressed sensing measurement ensembles, and for random linear codes. Furthermore, these results have implications for the performance of statistical estimation methods under a large class of random experimental designs.

Publication: arXiv
ID: CaltechAUTHORS:20180831-112137332

]]>

Abstract: Matrix concentration inequalities give bounds for the spectral-norm deviation of a random matrix from its expected value. These results have a weak dimensional dependence that is sometimes, but not always, necessary. This paper identifies one of the sources of the dimensional term and exploits this insight to develop sharper matrix concentration inequalities. In particular, this analysis delivers two refinements of the matrix Khintchine inequality that use information beyond the matrix variance to reduce or eliminate the dimensional dependence.

Publication: arXiv
ID: CaltechAUTHORS:20180831-112133957

]]>

Abstract: An analytical framework for studying the logarithmic region of turbulent channels is formulated. We build on recent findings (Moarref et al., J. Fluid Mech., 734, 2013) that the velocity fluctuations in the logarithmic region can be decomposed into a weighted sum of geometrically self-similar resolvent modes. The resolvent modes and the weights represent the linear amplification mechanisms and the scaling influence of the nonlinear interactions in the Navier-Stokes equations (NSE), respectively (McKeon & Sharma, J. Fluid Mech., 658, 2010). Originating from the NSE, this framework provides an analytical support for Townsend’s attached-eddy model. Our main result is that self-similarity enables order reduction in modeling the logarithmic region by establishing a quantitative link between the self-similar structures and the velocity spectra. Specifically, the energy intensities, the Reynolds stresses, and the energy budget are expressed in terms of the resolvent modes with speeds corresponding to the top of the logarithmic region. The weights of the triad modes -the modes that directly interact via the quadratic nonlinearity in the NSE- are coupled via the interaction coefficients that depend solely on the resolvent modes (McKeon et al., Phys. Fluids, 25, 2013). We use the hierarchies of self-similar modes in the logarithmic region to extend the notion of triad modes to triad hierarchies. It is shown that the interaction coefficients for the triad modes that belong to a triad hierarchy follow an exponential function. The combination of these findings can be used to better understand the dynamics and interaction of flow structures in the logarithmic region. The compatibility of the proposed model with theoretical and experimental results is further discussed.

Publication: arXiv
ID: CaltechAUTHORS:20180831-112157832

]]>

Abstract: Randomized matrix sparsification has proven to be a fruitful technique for producing faster algorithms in applications ranging from graph partitioning to semidefinite programming. In the decade or so of research into this technique, the focus has been—with few exceptions—on ensuring the quality of approximation in the spectral and Frobenius norms. For certain graph algorithms, however, the ∞→1 norm may be a more natural measure of performance. This paper addresses the problem of approximating a real matrix A by a sparse random matrix X with respect to several norms. It provides the first results on approximation error in the ∞→1 and ∞→2 norms, and it uses a result of Lata la to study approximation error in the spectral norm. These bounds hold for a reasonable family of random sparsification schemes, those which ensure that the entries of X are independent and average to the corresponding entries of A. Optimality of the ∞→1 and ∞→2 error estimates is established. Concentration results for the three norms hold when the entries of X are uniformly bounded. The spectral error bound is used to predict the performance of several sparsification and quantization schemes that have appeared in the literature; the results are competitive with the performance guarantees given by earlier scheme-specific analyses.

No.: 2014-01
ID: CaltechAUTHORS:20140828-082707636

]]>

Abstract: This work introduces the minimax Laplace transform method, a modification of the cumulant-based matrix Laplace transform method developed in [Tro11c] that yields both upper and lower bounds on each eigenvalue of a sum of random self-adjoint matrices. This machinery is used to derive eigenvalue analogs of the classical Chernoff, Bennett, and Bernstein bounds. Two examples demonstrate the efficacy of the minimax Laplace transform. The first concerns the effects of column sparsification on the spectrum of a matrix with orthonormal rows. Here, the behavior of the singular values can be described in terms of coherence-like quantities. The second example addresses the question of relative accuracy in the estimation of eigenvalues of the covariance matrix of a random process. Standard results on the convergence of sample covariance matrices provide bounds on the number of samples needed to obtain relative accuracy in the spectral norm, but these results only guarantee relative accuracy in the estimate of the maximum eigenvalue. The minimax Laplace transform argument establishes that if the lowest eigenvalues decay sufficiently fast, Ω(ε^(-2)κ^2_ℓ ℓ log p) samples, where κ_ℓ = λ_1(C)/λ_ℓ(C), are sufficient to ensure that the dominant ℓ eigenvalues of the covariance matrix of a N(0,C) random vector are estimated to within a factor of 1 ± ε with high probability.

No.: 2014-02
ID: CaltechAUTHORS:20140828-084239607

]]>

Abstract: Demixing is the problem of identifying multiple structured signals from a superimposed, undersampled, and noisy observation. This work analyzes a general framework, based on convex optimization, for solving demixing problems. When the constituent signals follow a generic incoherence model, this analysis leads to precise recovery guarantees. These results admit an attractive interpretation: each signal possesses an intrinsic degrees-of-freedom parameter, and demixing can succeed if and only if the dimension of the observation exceeds the total degrees of freedom present in the observation.

Publication: arXiv
ID: CaltechAUTHORS:20180831-112130540

]]>

Abstract: This paper derives exponential tail bounds and polynomial moment inequalities for the spectral norm deviation of a random matrix from its mean value. The argument depends on a matrix extension of Stein's method of exchangeable pairs for concentration of measure, as introduced by Chatterjee. Recent work of Mackey et al. uses these techniques to analyze random matrices with additive structure, while the enhancements in this paper cover a wider class of matrix-valued random elements. In particular, these ideas lead to a bounded differences inequality that applies to random matrices constructed from weakly dependent random variables. The proofs require novel trace inequalities that may be of independent interest.

Publication: arXiv
ID: CaltechAUTHORS:20180831-112127106

]]>

Abstract: Covariance estimation becomes challenging in the regime where the number p of variables outstrips the number n of samples available to construct the estimate. One way to circumvent this problem is to assume that the covariance matrix is nearly sparse and to focus on estimating only the significant entries. To analyze this approach, Levina and Vershynin (2011) introduce a formalism called masked covariance estimation, where each entry of the sample covariance estimator is reweighed to reflect an a priori assessment of its importance. This paper provides a new analysis of the masked sample covariance estimator based on the matrix Laplace transform method. The main result applies to general subgaussian distributions. Specialized to the case of a Gaussian distribution, the theory offers qualitative improvements over earlier work. For example, the new results show that n = O(B log ^2 p) samples suffice to estimate a banded covariance matrix with bandwidth B up to a relative spectral-norm error, in contrast to the sample complexity n = O(B log ^5 p) obtained by Levina and Vershynin.

No.: 2012-01
ID: CaltechAUTHORS:20120411-102106234

]]>

Abstract: Covariance estimation becomes challenging in the regime where the number p of variables outstrips the number n of samples available to construct the estimate. One way to circumvent this problem is to assume that the covariance matrix is nearly sparse and to focus on estimating only the significant entries. To analyze this approach, Levina and Vershynin (2011) introduce a formalism called masked covariance estimation, where each entry of the sample covariance estimator is reweighted to reflect an a priori assessment of its importance. This paper provides a short analysis of the masked sample covariance estimator by means of a matrix concentration inequality. The main result applies to general distributions with at least four moments. Specialized to the case of a Gaussian distribution, the theory offers qualitative improvements over earlier work. For example, the new results show that n = O(B log^2 p) samples suffice to estimate a banded covariance matrix with bandwidth B up to a relative spectral-norm error, in contrast to the sample complexity n = O(B log^5 p) obtained by Levina and Vershynin.

Publication: arXiv
ID: CaltechAUTHORS:20180831-112123699

]]>

Abstract: This paper describes a new thresholding technique for constructing sparse principal components. Large-scale implementation issues are addressed, and a mathematical analysis describes situations where the algorithm is effective. In experiments, this method compares favorably with more sophisticated algorithms.

No.: 2011-02
ID: CaltechAUTHORS:20220826-185558571

]]>

Abstract: This report presents probability inequalities for sums of adapted sequences of random, self-adjoint matrices. The results frame simple, easily verifiable hypotheses on the summands, and they yield strong conclusions about the large-deviation behavior of the maximum eigenvalue of the sum. The methods also specialize to sums of independent random matrices.

No.: 2011-01
ID: CaltechAUTHORS:20111012-114710310

]]>

Abstract: This work presents probability inequalities for sums of independent, random, self-adjoint matrices. The results frame simple, easily verifiable hypotheses on the summands, and they yield strong conclusions about the large-deviation behavior of the maximum eigenvalue of the sum. Tail bounds for the norm of a sum of rectangular matrices follow as an immediate corollary, and similar techniques yield information about matrix-valued martingales. In other words, this paper provides noncommutative generalizations of the classical bounds associated with the names Azuma, Bennett, Bernstein, Chernoff, Hoeffding, and McDiarmid. The matrix inequalities promise the same ease of use, diversity of application, and strength of conclusion that have made the scalar inequalities so valuable.

No.: 2010-01
ID: CaltechAUTHORS:20111012-112125900

]]>

Abstract: Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. In particular, these techniques o®er a route toward principal component analysis (PCA) for petascale data. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed|either explicitly or implicitly|to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m x n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log(k)) floating-point operations (flops) in contrast with O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can be reorganized to exploit multi-processor architectures. (iii) For a matrix that is too large to fit in slow memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.

No.: 2009-05
ID: CaltechAUTHORS:20111012-111324407

]]>

Abstract: In sparse approximation problems, the goal is to find an approximate representation of a target signal using a linear combination of a few elementary signals drawn from a fixed collection. This paper surveys the major algorithms that are used for solving sparse approximation problems in practice. Specific attention is paid to computational issues, to the circumstances in which individual methods tend to perform well, and to the theoretical guarantees available. Many fundamental questions in electrical engineering, statistics, and applied mathematics can be posed as sparse approximation problems, which makes the algorithms discussed in this paper versatile tools with a wealth of applications.

No.: 2009-01
ID: CaltechAUTHORS:20111011-163243421

]]>

Abstract: Compressive sampling offers a new paradigm for acquiring signals that are compressible with respect to an orthonormal basis. The major algorithmic challenge in compressive sampling is to approximate a compressible signal from noisy samples. This paper describes a new iterative recovery algorithm called CoSaMP that delivers the same guarantees as the best optimization-based approaches. Moreover, this algorithm offers rigorous bounds on computational cost and storage. It is likely to be extremely efficient for practical problems because it requires only matrix-vector multiplies with the sampling matrix. For compressible signals, the running time is just O(N log^2 N), where N is the length of the signal.

No.: 2008-01
ID: CaltechAUTHORS:20111011-160707642

]]>

Abstract: Given a fixed matrix, the problem of column subset selection requests a column submatrix that has favorable spectral properties. Most research from the algorithms and numerical linear algebra communities focuses on a variant called rank-revealing QR, which seeks a well-conditioned collection of columns that spans the (numerical) range of the matrix. The functional analysis literature contains another strand of work on column selection whose algorithmic implications have not been explored. In particular, a celebrated result of Bourgain and Tzafriri demonstrates that each matrix with normalized columns contains a large column submatrix that is exceptionally well conditioned. Unfortunately, standard proofs of this result cannot be regarded as algorithmic. This paper presents a randomized, polynomial-time algorithm that produces the submatrix promised by Bourgain and Tzafriri. The method involves random sampling of columns, followed by a matrix factorization that exposes the well-conditioned subset of columns. This factorization, which is due to Grothendieck, is regarded as a central tool in modern functional analysis. The primary novelty in this work is an algorithm, based on eigenvalue minimization, for constructing the Grothendieck factorization. These ideas also result in a novel approximation algorithm for the (∞, 1) norm of a matrix, which is generally NP-hard to compute exactly. As an added bonus, this work reveals a surprising connection between matrix factorization and the famous MAXCUT semidefinite program.

No.: 2008-02
ID: CaltechAUTHORS:20111011-161421093

]]>

Abstract: This report demonstrates theoretically and empirically that a greedy algorithm called Orthogonal Matching Pursuit (OMP) can reliably recover a signal with m nonzero entries in dimension d given O(mln d) random linear measurements of that signal. This is a massive improvement over previous results, which require O(m2) measurements. The new results for OMP are comparable with recent results for another approach called Basis Pursuit (BP). In some settings, the OMP algorithm is faster and easier to implement, so it is an attractive alternative to BP for signal recovery problems.

No.: 2007-01
ID: CaltechAUTHORS:20111010-134929077

]]>

Abstract: This paper develops a new method for recovering m-sparse signals that is simultaneously uniform and quick. We present a reconstruction algorithm whose run time, O(m log^2(m) log^2(d)), is sublinear in the length d of the signal. The reconstruction error is within a logarithmic factor (in m) of the optimal m-term approximation error in l_1. In particular, the algorithm recovers m-sparse signals perfectly and noisy signals are recovered with polylogarithmic distortion. Our algorithm makes O(m log^2 (d)) measurements, which is within a logarithmic factor of optimal. We also present a small-space implementation of the algorithm. These sketching techniques and the corresponding reconstruction algorithms provide an algorithmic dimension reduction in the l_1 norm. In particular, vectors of support m in dimension d can be linearly embedded into O(m log^2 d) dimensions with polylogarithmic distortion. We can reconstruct a vector from its low-dimensional sketch in time O(m log^2(m) log^2(d)). Furthermore, this reconstruction is stable and robust under small perturbations.

ID: CaltechAUTHORS:20180828-150010838

]]>