Committee Feed
https://feeds.library.caltech.edu/people/Tropp-J-A/committee.rss
A Caltech Library Repository Feed (generated by python-feedgen; last build Tue, 16 Apr 2024 16:08:16 +0000)

Asymptotic Weight Analysis of Low-Density Parity Check (LDPC) Code Ensembles
https://resolver.caltech.edu/CaltechETD:etd-05202008-094714
Authors: Sarah Lynne Sweatlock
Year: 2008
DOI: 10.7907/86BY-MA30
<p>With the invention of turbo codes in 1993 came increased interest in codes and iterative decoding schemes. Gallager's Regular codes were rediscovered, and irregular codes were introduced. Protograph codes were introduced and analyzed by NASA's Jet Propulsion Laboratory in the early years of this century. Part of this thesis continues that work, investigating the decoding of specific protograph codes and extending existing tools for analyzing codes to protograph codes.</p>
<p>The rest of this work focuses on a previously unknown relationship between the binary entropy function and the asymptotic ensemble average weight enumerator, which we call the spectral shape of the ensemble. This result can be seen as an extension of the Pless power-moment identities based on the discovery that the convex hull of the spectral shape is the Legendre transform of a function closely related to the moment-generating function of a codeword's weight. </p>
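The Legendre-transform identity in this abstract can be illustrated numerically. The sketch below (an illustrative check, not the thesis's derivation) computes a grid-based Legendre transform and verifies the textbook pairing between the negative binary entropy and the softplus function ln(1 + e^s); the function names and grid parameters are ours.

```python
import numpy as np

def binary_entropy(x):
    # Natural-log binary entropy H(x) = -x ln x - (1-x) ln(1-x).
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return -x * np.log(x) - (1 - x) * np.log(1 - x)

def legendre(f_vals, x_grid, s):
    # Numerical Legendre transform  f*(s) = sup_x [ s x - f(x) ]  over a grid.
    return np.max(s * x_grid - f_vals)

x = np.linspace(1e-6, 1 - 1e-6, 20001)
neg_H = -binary_entropy(x)              # convex; its transform has a closed form
for s in (-2.0, 0.0, 1.5):
    closed_form = np.log1p(np.exp(s))   # sup_x [s x + H(x)] = ln(1 + e^s)
    assert abs(legendre(neg_H, x, s) - closed_form) < 1e-3
```

At s = 0 this recovers sup H = ln 2, the maximum of the binary entropy.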
<p>In order to fully investigate this new relationship, tools needed to be designed to calculate the derivatives of the spectral shape, as the equation describing an ensemble's spectral shape is rarely straightforward. For Gallager's regular ensembles, a formula for calculating derivatives of functions defined parametrically was required. For repeat-accumulate (RA) codes, a formula was needed for functions defined implicitly through a second function. Both formulas are similar to Faà di Bruno's formula for derivatives of compositions of functions.</p>
https://thesis.library.caltech.edu/id/eprint/1898

Compressive Sensing for Sparse Approximations: Constructions, Algorithms, and Analysis
https://resolver.caltech.edu/CaltechTHESIS:10262009-081233260
Authors: Weiyu Xu (weiyu@systems.caltech.edu)
Year: 2010
DOI: 10.7907/F63K-GT12
<p>Compressive sensing is an emerging research field with applications in signal processing, error correction, medical imaging, seismology, and many other areas. It promises to efficiently recover a sparse signal vector from a much smaller number of linear measurements than its dimension. Naturally, how to design these linear measurements, how to reconstruct the original high-dimensional signals efficiently and accurately, and how to analyze the sparse signal recovery algorithms are important issues in the development of compressive sensing. This thesis is devoted to addressing these fundamental issues.</p>
<p>In compressive sensing, random measurement matrices are generally used, and ℓ₁ minimization algorithms often rely on linear programming or other optimization methods to recover the sparse signal vectors. But explicitly constructible measurement matrices providing performance guarantees have been elusive, and ℓ₁ minimization algorithms are often computationally demanding for applications with very large problem dimensions. In chapter 2, we propose and discuss a compressive sensing scheme with deterministic performance guarantees that uses deterministic, explicitly constructible expander-graph-based measurement matrices, and we show that sparse signal recovery can be achieved with linear complexity. This is the first compressive sensing scheme to combine linear decoding complexity, deterministic guarantees of linear-sparsity recovery, and deterministic, explicitly constructible measurement matrices.</p>
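To illustrate the flavor of graph-based measurement matrices, here is a toy sketch. It builds a random left-regular bipartite graph as the measurement matrix (a random stand-in for the deterministic expander constructions discussed above) and decodes an exactly 1-sparse signal by a naive consistency check; the real schemes recover linear sparsity in linear time, which this toy does not attempt.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 40, 20, 5          # signal length, measurements, left degree

# Each signal coordinate connects to d random measurement nodes.
A = np.zeros((m, n))
for j in range(n):
    rows = rng.choice(m, size=d, replace=False)
    A[rows, j] = 1.0

# Measure a 1-sparse signal; the matvec costs O(d) per nonzero entry.
x = np.zeros(n)
true_index, value = 7, 3.7
x[true_index] = value
y = A @ x

# Toy decoder for exactly 1-sparse signals: a column "explains" y if
# scaling it by a single value reproduces y exactly.
candidates = {}
for j in range(n):
    support = A[:, j] > 0
    v = y[support].mean()
    if np.allclose(A[:, j] * v, y):
        candidates[j] = v

assert true_index in candidates
assert abs(candidates[true_index] - value) < 1e-12
```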
<p>The popular and powerful ℓ₁ minimization algorithms generally give better sparsity recovery performance than known greedy decoding algorithms. In chapter 3, starting from a necessary and sufficient null-space condition for achieving a certain signal recovery accuracy, using high-dimensional geometry, we give a unified <i>null-space Grassmann angle</i>-based analytical framework for compressive sensing. This new framework gives sharp quantitative trade-offs between the signal sparsity and the recovery accuracy of ℓ₁ optimization for approximately sparse signals. Our results concern the fundamental "balancedness" properties of linear subspaces and so may be of independent mathematical interest.</p>
<p>The conventional approach to compressed sensing assumes no prior information on the unknown signal other than the fact that it is sufficiently sparse over a particular basis. In many applications, however, additional prior information is available. In chapter 4, we will consider a particular model for the sparse signal that assigns a probability of being zero or nonzero to each entry of the unknown vector. The standard compressed sensing model is therefore a special case where these probabilities are all equal. Following the introduction of the <i>null-space Grassmann angle</i>-based analytical framework in this thesis, we are able to characterize the optimal recoverable sparsity thresholds using weighted ℓ₁ minimization algorithms with the prior information.</p>
<p>The role of ℓ₁ minimization algorithms in recovering sparse signals from incomplete measurements is now well understood, and sharp recoverable sparsity thresholds for ℓ₁ minimization have been obtained. Iterative reweighted ℓ₁ minimization algorithms and related algorithms have been empirically observed to boost the recoverable sparsity thresholds for certain types of signals, but no rigorous theoretical results had been established to prove this fact. In chapter 5, we provide a theoretical foundation for analyzing the iterative reweighted ℓ₁ algorithms. In particular, we show that for a nontrivial class of signals, iterative reweighted ℓ₁ minimization can indeed deliver recoverable sparsity thresholds larger than those of plain ℓ₁ minimization. Again, our results are based on the null-space Grassmann angle-based analytical framework.</p>
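A minimal sketch of the reweighting idea, assuming the standard LP lift of weighted ℓ₁ minimization (this is the generic algorithm, not the thesis's Grassmann-angle analysis; the instance size and the weight update w_i = 1/(|x_i| + ε) are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def weighted_l1(A, y, w):
    """min sum_i w_i |x_i|  s.t.  A x = y, via the standard LP lift
    with variables [x; t] and constraints  -t <= x <= t."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), w])
    # Encode x - t <= 0 and -x - t <= 0.
    G = np.block([[np.eye(n), -np.eye(n)], [-np.eye(n), -np.eye(n)]])
    h = np.zeros(2 * n)
    Aeq = np.hstack([A, np.zeros((m, n))])
    res = linprog(c, A_ub=G, b_ub=h, A_eq=Aeq, b_eq=y, bounds=(None, None))
    assert res.success
    return res.x[:n]

rng = np.random.default_rng(1)
n, m = 16, 10
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[[2, 9]] = [1.0, -2.0]          # a 2-sparse ground truth
y = A @ x_true

x = weighted_l1(A, y, np.ones(n))     # plain l1 (basis pursuit)
for _ in range(3):                    # reweight: w_i = 1/(|x_i| + eps)
    x = weighted_l1(A, y, 1.0 / (np.abs(x) + 1e-3))

assert np.linalg.norm(A @ x - y) < 1e-4   # final iterate is feasible
```

The reweighting steps penalize small coordinates more heavily, pushing the iterates toward sparser feasible points.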
<p>Evolving from compressive sensing problems, where we are interested in recovering sparse vector signals from compressed linear measurements, we turn our attention in chapter 6 to recovering low-rank matrices from compressed linear measurements, a challenging problem that arises in many applications in machine learning, control theory, and discrete geometry. This class of optimization problems is NP-hard, and for most practical problems there are no efficient algorithms that yield exact solutions. A popular heuristic replaces the rank function with the nuclear norm of the decision variable and has been shown to provide the optimal low-rank solution in a variety of scenarios. We analytically assess the practical performance of this heuristic for finding the minimum-rank matrix subject to linear constraints. We start from the characterization of a necessary and sufficient condition that determines when this heuristic finds the minimum-rank solution. We then obtain probabilistic bounds on the matrix dimensions, the rank, and the number of constraints such that our conditions for success are satisfied for almost all linear constraint sets as the matrix dimensions tend to infinity. Empirical evidence shows that these probabilistic bounds provide accurate predictions of the heuristic's performance in non-asymptotic scenarios.</p>

https://thesis.library.caltech.edu/id/eprint/5329

Compressed Sensing, Sparse Approximation, and Low-Rank Matrix Estimation
https://resolver.caltech.edu/CaltechTHESIS:02272011-233144146
Authors: Yaniv Plan
Year: 2011
DOI: 10.7907/K8W9-RS71
<p>The importance of sparse signal structures has been recognized in a plethora of applications ranging from medical imaging to group disease testing to radar technology. It has been shown in practice that various signals of interest may be (approximately) sparsely modeled, and that sparse modeling is often beneficial, or even indispensable to signal recovery. Alongside an increase in applications, a rich theory of sparse and compressible signal recovery has recently been developed under the names compressed sensing (CS) and sparse approximation (SA). This revolutionary research has demonstrated that many signals can be recovered from severely undersampled measurements by taking advantage of their inherent low-dimensional structure. More recently, an offshoot of CS and SA has been a focus of research on other low-dimensional signal structures such as matrices of low rank. Low-rank matrix recovery (LRMR) is demonstrating a rapidly growing array of important applications such as quantum state tomography, triangulation from incomplete distance measurements, recommender systems (e.g., the Netflix problem), and system identification and control.</p>
<p>In this dissertation, we examine CS, SA, and LRMR from a theoretical perspective. We consider a variety of different measurement and signal models, both random and deterministic, and mainly ask two questions.</p>
<p>How many measurements are necessary? How large is the recovery error?</p>
<p>We give theoretical lower bounds for both of these questions, including oracle and minimax lower bounds for the error. However, the main emphasis of the thesis is to demonstrate the efficacy of convex optimization, in particular ℓ₁- and nuclear-norm-minimization-based programs, in CS, SA, and LRMR. We derive upper bounds on the number of measurements required and on the error achieved by convex optimization, which in many cases match the lower bounds up to constant or logarithmic factors. The majority of these results do not require the restricted isometry property (RIP), a ubiquitous condition in the literature.</p>

https://thesis.library.caltech.edu/id/eprint/6259

Network Coding for Error Correction
https://resolver.caltech.edu/CaltechTHESIS:06032011-153909265
Authors: Svitlana S. Vyetrenko
Year: 2011
DOI: 10.7907/D2ZM-V541
<p>In this thesis, network error correction is considered from both theoretical and practical viewpoints. Theoretical parameters such as network structure and type of connection (multicast vs. nonmulticast) have a profound effect on network error correction capability. This work is also motivated by the practical network issues that arise in wireless ad hoc networks, networks with limited computational power (e.g., sensor networks), and real-time data streaming systems (e.g., video/audio conferencing or media streaming).</p>
<p>Firstly, multicast network scenarios with probabilistic error and erasure occurrence are considered. In particular, it is shown that in networks with both random packet erasures and errors, increasing the relative occurrence of erasures compared to errors favors network coding over forwarding at network nodes, and vice versa. Also, fountain-like error-correcting codes, for which redundancy is incrementally added until decoding succeeds, are constructed. These codes are appropriate for use in scenarios where the upper bound on the number of errors is unknown a priori.</p>
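The incremental-redundancy idea can be sketched with a random linear fountain over GF(2) (a generic stand-in for the error-correcting constructions above; only erasure-free decoding by Gaussian elimination is shown): the sender keeps emitting random combinations of the source packets until the received coefficient matrix reaches full rank.

```python
import numpy as np

def gf2_solve(M, B):
    """Reduce [M | B] over GF(2). Returns X with M X = B (mod 2) when M has
    full column rank, else None (meaning: collect more packets)."""
    M, B = M.copy(), B.copy()
    rows, cols = M.shape
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i, c]), None)
        if pivot is None:
            return None                  # rank deficient so far
        M[[r, pivot]] = M[[pivot, r]]
        B[[r, pivot]] = B[[pivot, r]]
        for i in range(rows):
            if i != r and M[i, c]:
                M[i] ^= M[r]
                B[i] ^= B[r]
        r += 1
    return B[:cols]                      # top block of M is now the identity

rng = np.random.default_rng(2)
k, L = 8, 16                             # source packets, bits per packet
source = rng.integers(0, 2, size=(k, L), dtype=np.uint8)

# Fountain-like transmission: send random GF(2) combinations of the source
# packets until the receiver can decode (no redundancy level fixed a priori).
coeffs, data = [], []
decoded = None
while decoded is None:
    g = rng.integers(0, 2, size=k, dtype=np.uint8)
    coeffs.append(g)
    data.append((g @ source) % 2)
    decoded = gf2_solve(np.array(coeffs), np.array(data, dtype=np.uint8))

assert np.array_equal(decoded, source)
```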
<p>Secondly, network error correction in multisource multicast and nonmulticast network scenarios is discussed. Capacity regions for multisource multicast network error correction with both known and unknown topologies (coherent and noncoherent network coding) are derived. Several approaches to lower- and upper-bounding the error-correction capacity regions of general nonmulticast networks are given. For 3-layer two-sink and nested-demand nonmulticast network topologies, some of the given lower and upper bounds match. For these network topologies, code constructions that employ only intrasession coding are designed. These designs can be applied to streaming erasure correction code constructions.</p>

https://thesis.library.caltech.edu/id/eprint/6497

Practical Compressed Sensing: Modern Data Acquisition and Signal Processing
https://resolver.caltech.edu/CaltechTHESIS:06022011-152525054
Authors: Stephen R. Becker (srbecker@wesleyan.edu; ORCID 0000-0002-1932-8159)
Year: 2011
DOI: 10.7907/DC16-K322
<p>Since 2004, the field of compressed sensing has grown quickly and seen tremendous interest because it provides a theoretically sound and computationally tractable method to stably recover signals by sampling at the information rate. This thesis presents in detail the design of one of the world's first compressed sensing hardware devices, the random modulation pre-integrator (RMPI). The RMPI is an analog-to-digital converter (ADC) that bypasses a current limitation in ADC technology and achieves an unprecedented 8 effective bits of resolution over a bandwidth of 2.5 GHz. Subtle but important design considerations are discussed, and state-of-the-art reconstruction techniques are presented.</p>
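As a rough illustration of the RMPI measurement model (an idealized discrete-time sketch with made-up signal content and sizes, not the hardware's actual specifications): each channel multiplies the input by a pseudorandom ±1 chipping sequence and integrates, so the aggregate sample rate sits well below the Nyquist rate.

```python
import numpy as np

rng = np.random.default_rng(7)
N, channels, decim = 256, 8, 32      # Nyquist samples, channels, integration length

# A spectrally sparse test input: two tones (illustrative only).
t = np.arange(N)
x = np.cos(2 * np.pi * 17 * t / N) + 0.5 * np.cos(2 * np.pi * 61 * t / N)

# Each channel mixes with a +/-1 chipping sequence, integrates over windows
# of `decim` Nyquist samples, and outputs one sample per window.
chips = rng.choice([-1.0, 1.0], size=(channels, N))
y = (chips * x).reshape(channels, N // decim, decim).sum(axis=2)

assert y.shape == (channels, N // decim)   # 8 x 8 = 64 samples vs. 256
```

Reconstruction from y (via ℓ₁-type solvers such as NESTA/TFOCS below) is omitted; this shows only the sub-Nyquist acquisition step.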
<p>Inspired by the need for a fast method to solve reconstruction problems for the RMPI, we develop two efficient large-scale optimization methods, NESTA and TFOCS, that are applicable to a wide range of other problems, such as image denoising and deblurring, MRI reconstruction, and matrix completion (including the famous Netflix problem). While many algorithms solve unconstrained ℓ<sub>1</sub> problems, NESTA and TFOCS can solve the constrained form of ℓ<sub>1</sub> minimization and allow weighted norms. In addition to ℓ<sub>1</sub> minimization problems such as the LASSO, both NESTA and TFOCS solve the total-variation minimization problem. TFOCS also solves the Dantzig selector and most variants of the nuclear-norm minimization problem. A common theme in both NESTA and TFOCS is the use of smoothing techniques, which make the problem tractable, and of optimal first-order methods that have an accelerated convergence rate yet the same cost per iteration as gradient descent. The conic dual methodology introduced in TFOCS proves to be extremely flexible, covering such generic problems as linear programming, quadratic programming, and semi-definite programming. A novel continuation scheme is presented, and it is shown that the Dantzig selector benefits from an exact-penalty property. Both NESTA and TFOCS are released as software packages available freely for academic use.</p>

https://thesis.library.caltech.edu/id/eprint/6492

Random Matrix Recursions in Estimation, Control, and Adaptive Filtering
https://resolver.caltech.edu/CaltechTHESIS:06022011-214438378
Authors: Ali Vakili
Year: 2011
DOI: 10.7907/HCKN-7W53
<p>This dissertation is devoted to the study of estimation and control over systems that can be described by linear time-varying state-space models. Examples of such systems are encountered frequently in systems theory, e.g., wireless sensor networks, adaptive filtering, distributed control, etc. Recent developments in distributed catastrophe surveillance, smart transportation, and power grid control systems further motivate such a study.</p>
<p>While linear time-invariant systems are well understood, there is no general theory that captures the various aspects of their time-varying counterparts. With few exceptions, tackling these problems boils down to studying time-varying linear or nonlinear recursive matrix equations, known as Lyapunov and Riccati recursions, which are notoriously hard to analyze. We employ the theory of random matrices to elucidate different facets of these recursions and answer several important questions about the performance, stability, and convergence of estimation and control over such systems.</p>
<p>We make two general assumptions. First, we assume that the coefficient matrices are drawn from jointly stationary matrix-valued random processes. The stationarity assumption hardly restricts the analysis, since almost all cases of practical interest fall into this category. We further assume that the state vector size, n, is relatively large; the law of large numbers, however, guarantees fast convergence to the asymptotic results for n as small as 10. Under these assumptions, we develop a framework capable of characterizing the steady-state and transient behavior of adaptive filters and of control and estimation over communication networks. This framework proves promising, successfully tackling several problems for the first time in the literature.</p>
<p>We first study random Lyapunov recursions and characterize their transient and steady-state behavior. Lyapunov recursions appear in several classes of adaptive filters and also as lower bounds of random Riccati recursions in distributed Kalman filtering. We then look at random Riccati recursions whose nonlinearity makes them much more complicated to study. We investigate standard recursive-least-squares (RLS) filtering and extend our analysis beyond the standard case to filtering with multiple measurements, as well as the case of intermittent measurements. Finally, we study Kalman filtering with intermittent observations, which is frequently used to model wireless sensor networks. In all of these cases we obtain interesting universal laws that depend on the structure of the problem, rather than specific model parameters. We verify the accuracy of our results through various simulations for systems with as few as 10 states.</p>
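A scalar sketch of a random Riccati recursion with intermittent measurements (illustrative parameters; the thesis treats the matrix-valued, large-n regime): the error covariance follows the time update every step and the measurement update only when an observation arrives.

```python
import numpy as np

rng = np.random.default_rng(3)
a, q, r, p_arrival = 0.95, 0.1, 0.2, 0.5   # dynamics, noise variances, arrival prob

P = 1.0                # prediction error covariance
trace = []
for t in range(500):
    P = a * P * a + q                      # Riccati time update
    if rng.random() < p_arrival:           # measurement received this step
        P = P - P**2 / (P + r)             # Riccati measurement update
    trace.append(P)

# For a stable system (|a| < 1) the random recursion stays bounded by the
# no-measurement fixed point q / (1 - a^2), here about 1.03.
assert 0.0 < min(trace) and max(trace) < q / (1 - a**2) + 1.0
```

Replacing a with an unstable value (|a| > 1) makes boundedness depend on the arrival probability, which is the intermittent-observation phenomenon studied above.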
https://thesis.library.caltech.edu/id/eprint/6495

Combinatorial Regression and Improved Basis Pursuit for Sparse Estimation
https://resolver.caltech.edu/CaltechTHESIS:06072012-001523622
Authors: M. Amin Khajehnejad
Year: 2012
DOI: 10.7907/04J6-Y832
<p>Sparse representations accurately model many real-world data sets. Some form of sparsity is conceivable in almost every practical application, from image and video processing, to spectral sensing in radar detection, to bio-computation and genomic signal processing. Modern statistics and estimation theory have come up with ways of efficiently accounting for sparsity in enhanced information retrieval systems. In particular, <i>compressed sensing</i> and <i>matrix rank minimization</i> are two newly born branches of dimensionality reduction techniques, with very promising horizons. Compressed sensing addresses the reconstruction of sparse signals from ill-conditioned linear measurements, a mathematical problem that arises in practical applications in one of the following forms: model fitting (regression), analog data compression, sub-Nyquist sampling, and data privacy. Low-rank matrix estimation addresses the reconstruction of multi-dimensional data (matrices) with strong coherence properties (low rank) under restricted sensing. This model is motivated by modern problems in machine learning, dynamic systems, and quantum computing.</p>
<p>This thesis provides an in-depth study of recent developments in the fields of compressed sensing and matrix rank minimization, and sets forth new directions for improved sparse recovery techniques. The contributions are threefold: the design of combinatorial structures for sparse encoding, the development of improved recovery algorithms, and extension of sparse vector recovery techniques to other problems.</p>
<p>We propose combinatorial structures for the measurement matrix that facilitate compressing sparse analog signal representations with better guarantees than any of the currently existing architectures. Our constructions are mostly deterministic and are based on ideas from expander graphs, LDPC error-correcting codes and combinatorial separators.</p>
<p>We propose novel reconstruction algorithms that are amenable to the combinatorial structures we study, and have various advantages over the conventional convex optimization techniques for sparse recovery. In addition, we separately study the convex optimization Basis Pursuit method for compressed sensing, and propose regularization schemes that expand the success domain for such algorithms. Our studies contain rigorous analysis, numerical simulations, and examples from practical applications.</p>
<p>Lastly, we extend some of our proposed techniques to low-rank matrix estimation and channel coding. These generalizations lead to the development of a novel and fast reconstruction algorithm for matrix rank minimization, and a modified regularized linear-programming-based decoding algorithm for detecting codewords of a linear LDPC code during an erroneous communication.</p>
https://thesis.library.caltech.edu/id/eprint/7142

Compressed Sensing Receivers: Theory, Design, and Performance Limits
https://resolver.caltech.edu/CaltechTHESIS:06122012-144158047
Authors: Juhwan Yoo (juhwan@gmail.com)
Year: 2012
DOI: 10.7907/Y3FA-VB87
<p>The past 50 years have seen tremendous developments in electronics due to the rise and rapid development of IC-fabrication technology [1]. In addition to the production of cheap and abundant computing resources, another area of rapid advancement has been wireless technologies. While the central focus of wireless research has been mobile communication, an area of increasing importance concerns the development of sensing/spectral applications over bandwidths exceeding multiple GHz. Such systems have many applications ranging from scientific to military. Although some solutions exist, their large size, weight, and power make more-efficient solutions desirable.</p>
<p>At present, one of the principal bottlenecks in designing such systems is the power consumption of the back-end ADCs at the required digitization rate. ADCs are a dominant source of power consumption; it is also often the case that ADC block specifications are used to determine parameters for the rest of the signal chain, such as the RF front-end and the DSP-core which processes the digitized samples [2]. Historically, increases in system bandwidth have come from developing ADCs with superior performance.</p>
<p>In contrast to improving ADC performance, this work presents a system-level approach with the goal of minimizing the required digitization rate for observation of a given effective instantaneous bandwidth (EIBW). The approach was inspired by the field of compressed sensing (CS) [3–5]. Loosely stated, CS asserts that samples which represent random projections can be used to recover sparse and/or compressible signals from what was previously thought to be insufficient information. The primary contributions of this thesis are the establishment of the physical feasibility of CS-based receivers through implementation of the first fully integrated high-speed CS-based front-end, known as the random-modulation pre-integrator (RMPI) [6–9], and the development of a principled design methodology based on a rigorous analytical and empirical feasibility study of the system.</p>
<p>The 8-channel RMPI was implemented in 90 nm CMOS and was validated by physical measurements of the fabricated chip. The implemented RMPI achieves an EIBW of 2 GHz, with > 54 dB of dynamic range. Most notably, the aggregate digitization rate is fs = 320 Msps, 12.5× lower than the Nyquist rate.</p>
https://thesis.library.caltech.edu/id/eprint/7163

A Geometric Analysis of Convex Demixing
https://resolver.caltech.edu/CaltechTHESIS:05202013-091317123
Authors: Michael Brian McCoy (ORCID 0000-0002-9479-2090)
Year: 2013
DOI: 10.7907/156S-EZ89
<p>Demixing is the task of identifying multiple signals given only their sum and prior information about their structures. Examples of demixing problems include (i) separating a signal that is sparse with respect to one basis from a signal that is sparse with respect to a second basis; (ii) decomposing an observed matrix into low-rank and sparse components; and (iii) identifying a binary codeword with impulsive corruptions. This thesis describes and analyzes a convex optimization framework for solving an array of demixing problems.</p>
<p>Our framework includes a random orientation model for the constituent signals that ensures the structures are incoherent. This work introduces a summary parameter, the statistical dimension, that reflects the intrinsic complexity of a signal. The main result indicates that the difficulty of demixing under this random model depends only on the total complexity of the constituent signals involved: demixing succeeds with high probability when the sum of the complexities is less than the ambient dimension; otherwise, it fails with high probability.</p>
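The statistical dimension can be estimated by Monte Carlo as the expected squared norm of the projection of a standard Gaussian vector onto the cone. For the nonnegative orthant in R^d the exact value is d/2 (a standard example, ours rather than the thesis's), which the sketch below reproduces:

```python
import numpy as np

rng = np.random.default_rng(4)
d, trials = 40, 20000

# delta(C) = E || Pi_C(g) ||^2 for standard Gaussian g; projecting onto the
# nonnegative orthant just clips negative coordinates to zero.
g = rng.standard_normal((trials, d))
proj = np.maximum(g, 0.0)
delta_hat = (proj**2).sum(axis=1).mean()

assert abs(delta_hat - d / 2) < 1.0    # delta(R^d_+) = d/2 exactly
```

A linear subspace of dimension k has statistical dimension exactly k, which is the sense in which a cone "behaves like a subspace" in the phase-transition result below.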
<p>The fact that a phase transition between success and failure occurs in demixing is a consequence of a new inequality in conic integral geometry. Roughly speaking, this inequality asserts that a convex cone behaves like a subspace whose dimension is equal to the statistical dimension of the cone. When combined with a geometric optimality condition for demixing, this inequality provides precise quantitative information about the phase transition, including the location and width of the transition region.</p>

https://thesis.library.caltech.edu/id/eprint/7726

Convex Analysis for Minimizing and Learning Submodular Set Functions
https://resolver.caltech.edu/CaltechTHESIS:05312013-151014984
Authors: Peter Stobbe
Year: 2013
DOI: 10.7907/1A1J-SA64
<p>The connections between convexity and submodularity are explored, for purposes of minimizing and learning submodular set functions.</p>
<p>First, we develop a novel method for minimizing a particular class of submodular functions, which can be expressed as a sum of concave functions composed with modular functions. The basic algorithm uses an accelerated first order method applied to a smoothed version of its convex extension. The smoothing algorithm is particularly novel as it allows us to treat general concave potentials without needing to construct a piecewise linear approximation as with graph-based techniques.</p>
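For context, the convex extension referred to here is the Lovász extension, which for a submodular set function can be evaluated by the standard sort-and-telescope formula (the smoothing and acceleration steps of the method above are omitted in this sketch; the example function is ours):

```python
def lovasz(F, x):
    """Lovász (convex) extension of a set function F on {0,1}^n,
    evaluated at x via the sort-and-telescope formula."""
    order = sorted(range(len(x)), key=lambda i: -x[i])  # coords, decreasing
    val, S, prev = 0.0, set(), F(frozenset())
    for i in order:
        S.add(i)
        cur = F(frozenset(S))
        val += x[i] * (cur - prev)     # telescoping marginal gains
        prev = cur
    return val

# Example: the cut function of the path graph 0-1-2 (submodular).
edges = [(0, 1), (1, 2)]
def cut(S):
    return sum((u in S) != (v in S) for u, v in edges)

# On {0,1} vertices the extension agrees with F itself.
for S in [set(), {0}, {1}, {0, 2}, {0, 1, 2}]:
    x = [1.0 if i in S else 0.0 for i in range(3)]
    assert lovasz(cut, x) == cut(S)
```

Minimizing this piecewise-linear extension over the cube recovers a minimizer of F; the smoothing discussed above replaces it with a differentiable surrogate so that accelerated first-order methods apply.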
<p>Second, we derive the general conditions under which it is possible to find a minimizer of a submodular function via a convex problem. This provides a framework for developing submodular minimization algorithms. The framework is then used to develop several algorithms that can be run in a distributed fashion. This is particularly useful for applications where the submodular objective function consists of a sum of many terms, each term dependent on a small part of a large data set.</p>
<p>Lastly, we approach the problem of learning set functions from an unorthodox perspective---sparse reconstruction. We demonstrate an explicit connection between the problem of learning set functions from random evaluations and that of sparse signals. Based on the observation that the Fourier transform for set functions satisfies exactly the conditions needed for sparse reconstruction algorithms to work, we examine some different function classes under which uniform reconstruction is possible.</p>
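The Fourier transform of a set function mentioned here is the Walsh-Hadamard expansion with coefficients F^(T) = 2^{-n} Σ_S F(S) (-1)^{|S∩T|}. A brute-force sketch (our toy example) shows why sparsity helps: a graph cut function is supported only on the empty set and its edge pairs.

```python
from itertools import combinations

n = 4
edges = [(0, 1), (1, 2), (2, 3)]        # a path graph on 4 nodes
def cut(S):
    return sum((u in S) != (v in S) for u, v in edges)

subsets = [frozenset(c) for r in range(n + 1)
           for c in combinations(range(n), r)]

def fourier_coeff(T):
    # F^(T) = 2^{-n} sum_S F(S) * (-1)^{|S intersect T|}
    return sum(cut(S) * (-1) ** len(S & T) for S in subsets) / 2 ** n

coeffs = {T: fourier_coeff(T) for T in subsets}
support = {T for T, c in coeffs.items() if abs(c) > 1e-9}

# The cut function is Fourier-sparse: support on the empty set and the edges.
assert support == {frozenset()} | {frozenset(e) for e in edges}
```

With only 4 of 16 coefficients nonzero, random evaluations of F suffice for sparse-reconstruction algorithms to learn it, which is the observation exploited above.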
https://thesis.library.caltech.edu/id/eprint/7798

Topics in Randomized Numerical Linear Algebra
https://resolver.caltech.edu/CaltechTHESIS:06102013-100609092
Authors: Alex A. Gittens
Year: 2013
DOI: 10.7907/3K1S-R458
<p>This thesis studies three classes of randomized numerical linear algebra algorithms, namely: (i) randomized matrix sparsification algorithms, (ii) low-rank approximation algorithms that use randomized unitary transformations, and (iii) low-rank approximation algorithms for positive-semidefinite (PSD) matrices. </p>
<p>Randomized matrix sparsification algorithms set randomly chosen entries of the input matrix to zero. When the approximant is substituted for the original matrix in computations, its sparsity allows one to employ faster sparsity-exploiting algorithms. This thesis contributes bounds on the approximation error of nonuniform randomized sparsification schemes, measured in the spectral norm and two NP-hard norms that are of interest in computational graph theory and subset selection applications.</p>
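A minimal sketch of nonuniform sparsification (keep probabilities proportional to entry magnitudes, with 1/p rescaling for unbiasedness; this is a generic scheme, not necessarily the exact one analyzed in the thesis):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
A = rng.standard_normal((n, n))

# Keep a_ij with probability proportional to |a_ij|, capped at 1, and
# rescale survivors by 1/p_ij so the sparsifier is unbiased entrywise.
target_nnz = 0.3 * n * n
p = np.minimum(1.0, target_nnz * np.abs(A) / np.abs(A).sum())
keep = rng.random((n, n)) < p
A_sp = np.where(keep, A / np.maximum(p, 1e-300), 0.0)

# Every surviving entry is the original divided by its keep probability.
assert np.allclose(A_sp[keep] * p[keep], A[keep])

# The quantity the error bounds control: spectral-norm deviation.
rel_err = np.linalg.norm(A - A_sp, 2) / np.linalg.norm(A, 2)
```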
<p> Low-rank approximations based on randomized unitary transformations have several desirable properties: they have low communication costs, are amenable to parallel implementation, and exploit the existence of fast transform algorithms. This thesis investigates the tradeoff between the accuracy and cost of generating such approximations. State-of-the-art spectral and Frobenius-norm error bounds are provided. </p>
<p>The last class of algorithms considered is PSD "sketching" algorithms. Such sketches can be computed faster than approximations based on projecting onto mixtures of the columns of the matrix. The performance of several such sketching schemes is empirically evaluated using a suite of canonical matrices drawn from machine learning and data analysis applications, and a framework is developed for establishing theoretical error bounds.</p>
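One common sketch of this kind is the Nyström-type approximation C W⁺ Cᵀ with C = AS and W = SᵀAS. A small sketch with a Gaussian test matrix (our illustrative choice of S and of the input spectrum):

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 80, 10

# A PSD matrix with rapidly decaying spectrum.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = 0.5 ** np.arange(n)
A = (U * eigs) @ U.T

# PSD sketch: C = A S, W = S^T A S, approximation C W^+ C^T.
S = rng.standard_normal((n, k))          # Gaussian sketching matrix
C = A @ S
W = S.T @ A @ S
A_hat = C @ np.linalg.pinv(W) @ C.T

rel_err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
assert rel_err < 0.1     # fast spectral decay makes a rank-10 sketch accurate
```

Forming C and W needs only k matrix-vector products with A, which is the source of the speed advantage mentioned above.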
<p>In addition to studying these algorithms, this thesis extends the Matrix Laplace Transform framework to derive Chernoff and Bernstein inequalities that apply to all the eigenvalues of certain classes of random matrices. These inequalities are used to investigate the behavior of the singular values of a matrix under random sampling, and to derive convergence rates for each individual eigenvalue of a sample covariance matrix.</p>

https://thesis.library.caltech.edu/id/eprint/7880

Random Propagation in Complex Systems: Nonlinear Matrix Recursions and Epidemic Spread
https://resolver.caltech.edu/CaltechTHESIS:05232014-172754261
Authors: {'items': [{'email': 'ctznahj@gmail.com', 'id': 'Ahn-Hyoung-Jun', 'name': {'family': 'Ahn', 'given': 'Hyoung Jun'}, 'show_email': 'NO'}]}
Year: 2014
DOI: 10.7907/MC7M-EE22
This dissertation studies the long-term behavior of random Riccati recursions and of a mathematical model of epidemic spread. Riccati recursions arise in Kalman filtering: the error covariance matrix of the Kalman filter satisfies a Riccati recursion. Convergence conditions for time-invariant Riccati recursions are well studied. We focus on the time-varying case and assume that the regressor matrix is random and independent and identically distributed according to a given distribution whose probability density function is continuous, supported on the whole space, and decays faster than any polynomial. We study the geometric convergence of the resulting probability distribution. We also study the global dynamics of epidemic spread over complex networks for various models. For instance, in the discrete-time Markov chain model, each node is either healthy or infected at any given time, so the number of states grows exponentially with the size of the network. The Markov chain has a unique stationary distribution, in which all nodes are healthy with probability 1. Since the probability distribution of a finite-state Markov chain converges to its stationary distribution, this model predicts that the epidemic dies out after a long enough time. To analyze the Markov chain model, we study a nonlinear epidemic model whose state at any given time is the vector of marginal infection probabilities of the nodes in the network at that time. Convergence to the origin under the epidemic map implies extinction of the epidemic. The nonlinear model is upper-bounded by its linearization at the origin. As a result, the origin is the globally stable unique fixed point of the nonlinear model whenever the linear upper bound is stable. When the linear upper bound is unstable, the nonlinear model has a second fixed point, and we carry out a stability analysis of this second fixed point for both discrete-time and continuous-time models.
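The role of the linear upper bound can be illustrated numerically: for a discrete-time SIS-type model, the linearization at the origin is stable exactly when the spectral radius of (1 − δ)I + βA is below one. The function and parameter names below are illustrative, not taken from the thesis:

```python
import numpy as np

def extinction_threshold(A, beta, delta):
    """True if the linear upper bound (1-delta)*I + beta*A of a
    discrete-time SIS-type epidemic is stable, i.e. its spectral
    radius is < 1, which implies fast extinction of the epidemic."""
    M = (1 - delta) * np.eye(A.shape[0]) + beta * A
    return np.max(np.abs(np.linalg.eigvals(M))) < 1

# Ring network of 6 nodes: stable for weak infection, unstable otherwise.
A = np.roll(np.eye(6), 1, axis=1) + np.roll(np.eye(6), -1, axis=1)
print(extinction_threshold(A, beta=0.1, delta=0.5))  # spectral radius 0.7
print(extinction_threshold(A, beta=0.4, delta=0.5))  # spectral radius 1.3
```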
Returning to the Markov chain model, we claim that the stability of the linear upper bound for the nonlinear model is strongly related to the extinction time of the Markov chain. We show that a stable linear upper bound is a sufficient condition for fast extinction, and that the probability of survival is bounded by the nonlinear epidemic map.https://thesis.library.caltech.edu/id/eprint/8391Community Sense and Response Systems
https://resolver.caltech.edu/CaltechTHESIS:04152014-111007328
Authors: {'items': [{'email': 'mnfaulk@gmail.com', 'id': 'Faulkner-Matthew-Nicholas', 'name': {'family': 'Faulkner', 'given': 'Matthew Nicholas'}, 'show_email': 'NO'}]}
Year: 2014
DOI: 10.7907/QFM5-FH06
<p>The proliferation of smartphones and other internet-enabled, sensor-equipped consumer devices enables us to sense and act upon the physical environment in unprecedented ways. This thesis considers Community Sense-and-Response (CSR) systems, a new class of web application for acting on sensory data gathered from participants' personal smart devices. The thesis describes how rare events can be reliably detected using a decentralized anomaly detection architecture that performs client-side anomaly detection and server-side event detection. After analyzing this decentralized anomaly detection approach, the thesis describes how weak but spatially structured events can be detected, despite significant noise, when the events have a sparse representation in an alternative basis. Finally, the thesis describes how the statistical models needed for client-side anomaly detection may be learned efficiently, using limited space, via coresets.</p>
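The two-tier architecture can be sketched as follows; the thresholds and decision rule here are illustrative placeholders, not CSN's actual statistical models:

```python
import numpy as np

# Sketch of decentralized detection: clients flag local anomalies in their
# own sensor streams, and the server declares an event only when enough
# clients flag simultaneously. All parameter values are illustrative.
def client_flags(readings, threshold):
    """Client-side: flag samples whose magnitude looks anomalous."""
    return np.abs(readings) > threshold

def server_event(flags, min_reports):
    """Server-side: event iff at least min_reports clients flag together."""
    return flags.sum(axis=0) >= min_reports

rng = np.random.default_rng(0)
readings = rng.standard_normal((20, 100))   # 20 clients, 100 time steps
readings[:, 60] += 5.0                      # shared spatial event at t = 60
flags = client_flags(readings, threshold=4.0)
events = server_event(flags, min_reports=5)
# an event is declared at t = 60, where many clients flag at once
```

Aggregating weak per-client evidence at the server is what lets the system detect events that no single noisy sensor could confirm on its own.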
<p>The Caltech Community Seismic Network (CSN) is a prototypical example of a CSR system that harnesses accelerometers in volunteers' smartphones and consumer electronics. Using CSN, this thesis presents the systems and algorithmic techniques to design, build and evaluate a scalable network for real-time awareness of spatial phenomena such as dangerous earthquakes.</p>https://thesis.library.caltech.edu/id/eprint/8188Convex Relaxation for Low-Dimensional Representation: Phase Transitions and Limitations
https://resolver.caltech.edu/CaltechTHESIS:08182014-091546460
Authors: {'items': [{'email': 'sametoymak@gmail.com', 'id': 'Oymak-Samet', 'name': {'family': 'Oymak', 'given': 'Samet'}, 'show_email': 'NO'}]}
Year: 2015
DOI: 10.7907/Z9S46PWX
<p>There is a growing interest in taking advantage of possible patterns and structures in data so as to extract the desired information and overcome the curse of dimensionality. In a wide range of applications, including computer vision, machine learning, medical imaging, and social networks, the signal that gives rise to the observations can be modeled to be approximately sparse and exploiting this fact can be very beneficial. This has led to an immense interest in the problem of efficiently reconstructing a sparse signal from limited linear observations. More recently, low-rank approximation techniques have become prominent tools to approach problems arising in machine learning, system identification and quantum tomography.</p>
<p>In sparse and low-rank estimation problems, the challenge is the inherent intractability of the objective function, and one needs efficient methods to capture the low-dimensionality of these models. Convex optimization is often a promising tool to attack such problems. An intractable problem with a combinatorial objective can often be "relaxed" to obtain a tractable but almost as powerful convex optimization problem. This dissertation studies convex optimization techniques that can take advantage of low-dimensional representations of the underlying high-dimensional data. We provide provable guarantees that ensure that the proposed algorithms will succeed under reasonable conditions, and answer questions of the following flavor:</p>
<UL>
<LI> For a given number of measurements, can we reliably estimate the true signal?</LI>
<LI> If so, how good is the reconstruction as a function of the model parameters?</LI>
</UL>
<p>More specifically, i) Focusing on linear inverse problems, we generalize the classical error bounds known for the least-squares technique to the lasso formulation, which incorporates the signal model. ii) We show that intuitive convex approaches do not perform as well as expected when it comes to signals that have multiple low-dimensional structures simultaneously. iii) Finally, we propose convex relaxations for the graph clustering problem and give sharp performance guarantees for a family of graphs arising from the so-called stochastic block model. We pay particular attention to the following aspects. For i) and ii), we aim to provide a general geometric framework, in which the results on sparse and low-rank estimation can be obtained as special cases. For i) and iii), we investigate the precise performance characterization, which yields the right constants in our bounds and the true dependence between the problem parameters.</p>https://thesis.library.caltech.edu/id/eprint/8635Computational Microscopy: Turning Megapixels into Gigapixels
https://resolver.caltech.edu/CaltechTHESIS:10202015-173005082
Authors: {'items': [{'email': 'roarke.horstmeyer@gmail.com', 'id': 'Horstmeyer-Roarke-William', 'name': {'family': 'Horstmeyer', 'given': 'Roarke William'}, 'orcid': '0000-0002-2480-9141', 'show_email': 'YES'}]}
Year: 2016
DOI: 10.7907/Z95Q4T1W
The layout of a typical optical microscope has remained effectively unchanged over the past century. Besides the widespread adoption of digital focal plane arrays, relatively few innovations have helped improve standard imaging with bright-field microscopes. This thesis presents a new microscope imaging method, termed Fourier ptychography, which uses an LED array to provide variable sample illumination and post-processing algorithms to recover useful sample information. Examples include increasing the resolution of megapixel-scale images to one gigapixel, measuring quantitative phase, achieving oil-immersion-quality resolution without an immersion medium, and recovering complex three-dimensional sample structure.https://thesis.library.caltech.edu/id/eprint/9231Convex Programming-Based Phase Retrieval: Theory and Applications
https://resolver.caltech.edu/CaltechTHESIS:05312016-051759406
Authors: {'items': [{'email': 'kishorejaganathan@gmail.com', 'id': 'Jaganathan-Kishore', 'name': {'family': 'Jaganathan', 'given': 'Kishore'}, 'show_email': 'YES'}]}
Year: 2016
DOI: 10.7907/Z9C82775
<p>Phase retrieval is the problem of recovering a signal from its Fourier magnitude. This inverse problem arises in many areas of engineering and applied physics, and has been studied for nearly a century. Due to the absence of Fourier phase, the available information is incomplete in general. Classic identifiability results state that phase retrieval of one-dimensional signals is impossible, and that phase retrieval of higher-dimensional signals is almost surely possible under mild conditions. However, there are no efficient recovery algorithms with theoretical guarantees. Classic algorithms are based on the method of alternating projections. These algorithms do not have theoretical guarantees, and have limited recovery abilities due to the issue of convergence to local optima.</p>
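A minimal sketch of the alternating-projection (error-reduction) idea, for a one-dimensional signal with known support; the variable names are ours, and, as noted above, the method carries no global convergence guarantee:

```python
import numpy as np

def error_reduction(mag, support, iters=200, rng=None):
    """Classic alternating-projection phase retrieval (error reduction).

    Alternates between enforcing the measured Fourier magnitudes `mag`
    and the known support constraint in the signal domain. May stall
    at a local optimum rather than recover the true signal.
    """
    rng = np.random.default_rng(rng)
    x = rng.standard_normal(mag.shape)
    for _ in range(iters):
        X = np.fft.fft(x)
        X = mag * np.exp(1j * np.angle(X))   # impose Fourier magnitudes
        x = np.real(np.fft.ifft(X))
        x = x * support                       # impose support constraint
    return x

# Toy instance: a signal supported on its first 8 coordinates.
n = 64
support = np.zeros(n); support[:8] = 1
x_true = support * np.random.default_rng(0).standard_normal(n)
mag = np.abs(np.fft.fft(x_true))
x_hat = error_reduction(mag, support, rng=1)
```

Even when the iterates satisfy both constraints, recovery is only up to the trivial ambiguities (global phase, shift, reflection), which is part of why convex formulations with guarantees are attractive.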
<p>Recently, there has been a renewed interest in phase retrieval due to technological advances in measurement systems and theoretical developments in structured signal recovery. In particular, it is now possible to obtain specific kinds of additional magnitude-only information about the signal, depending on the application. The premise is that, by carefully redesigning the measurement process, one could potentially overcome the issues of phase retrieval. To this end, another approach could be to impose certain kinds of priors on the signal, depending on the application. On the algorithmic side, convex programming-based approaches have played a key role in modern phase retrieval, inspired by their success in provably solving several quadratically constrained problems.</p>
<p>In this work, we study several variants of phase retrieval using modern tools, with a focus on applications such as X-ray crystallography, diffraction imaging, optics, astronomy and radar. In the one-dimensional setup, we first develop conditions which, when satisfied, allow unique reconstruction. Then, we develop efficient recovery algorithms based on convex programming, and provide theoretical guarantees. The theory and algorithms we develop are independent of the dimension of the signal, and hence can be used in all the aforementioned applications. We also perform a comparative numerical study of the convex programming and the alternating projection based algorithms. Numerical simulations clearly demonstrate the superior ability of the convex programming-based methods, both in terms of successful recovery in the noiseless setting and stable reconstruction in the noisy setting.</p>https://thesis.library.caltech.edu/id/eprint/9814Kinematics and Local Motion Planning for Quasi-static Whole-body Mobile Manipulation
https://resolver.caltech.edu/CaltechTHESIS:05222016-095145651
Authors: {'items': [{'email': 'krishnashankar+thesis@gmail.com', 'id': 'Shankar-Krishna', 'name': {'family': 'Shankar', 'given': 'Krishna'}, 'show_email': 'YES'}]}
Year: 2016
DOI: 10.7907/Z9KK98RX
<p>This thesis studies mobile robotic manipulators, where one or more robot manipulator arms are
integrated with a mobile robotic base. The base could be a wheeled or tracked vehicle, or it might be a
multi-limbed locomotor. As robots are increasingly deployed in complex and unstructured environments,
the need for mobile manipulation increases. Mobile robotic assistants have the potential to revolutionize human
lives in a large variety of settings including home, industrial and outdoor environments.</p>
<p>Mobile Manipulation is the use or study of such mobile robots as they interact with physical
objects in their environment. As compared to fixed base manipulators, mobile manipulators can take
advantage of the base mechanism’s added degrees of freedom in the task planning and execution process.
But their use also poses new problems in the analysis and control of base system stability, and the
planning of coordinated base and arm motions. For mobile manipulators to be successfully and
efficiently used, a thorough understanding of their kinematics, stability, and capabilities is required.
Moreover, because mobile manipulators typically possess a large number of actuators, new and efficient
methods to coordinate their large numbers of degrees of freedom are needed to make them practically
deployable. This thesis develops new kinematic and stability analyses of mobile manipulation, and new
algorithms to efficiently plan their motions.</p>
<p>I first develop detailed and novel descriptions of the kinematics governing the operation of multi-
limbed legged robots working in the presence of gravity, and whose limbs may also be simultaneously
used for manipulation. The fundamental stance constraint that arises from simple assumptions about
friction and the ground contact and feasible motions is derived. Thereafter, a local relationship between
joint motions and motions of the robot abdomen and reaching limbs is developed. Based on these
relationships, one can define and analyze local kinematic qualities including limberness, wrench
resistance and local dexterity. While previous researchers have noted the similarity between multi-
fingered grasping and quasi-static manipulation, this thesis makes explicit connections between these two
problems.</p>
<p>The kinematic expressions form the basis for a local motion planning problem that
determines the joint motions to achieve several simultaneous objectives while maintaining stance stability
in the presence of gravity. This problem is translated into a convex quadratic program called the
balanced priority solution, whose existence and uniqueness properties are developed. This problem is
related in spirit to the classical redundancy resolution and task-priority approaches. With some simple
modifications, this local planning and optimization problem can be extended to handle a large variety of
goals and constraints that arise in mobile manipulation. This local planning problem applies readily to
other mobile bases including wheeled and articulated bases. This thesis describes the use of the local
planning techniques to generate global plans, as well as for use within a feedback loop. The work in this
thesis is motivated in part by many practical tasks involving the Surrogate and RoboSimian robots at
NASA/JPL, and a large number of examples involving the two robots, both real and simulated, are
provided.</p>
<p>Finally, this thesis provides an analysis of simultaneous force and motion control for multi-
limbed legged robots. Starting with a classical linear stiffness relationship, an analysis of this problem for
multiple point contacts is described. The local velocity planning problem is extended to include
generation of forces, as well as to maintain stability using force-feedback. This thesis also provides a
concise, novel definition of static stability, and proves some conditions under which it is satisfied.</p>https://thesis.library.caltech.edu/id/eprint/9731Recovering Structured Signals in High Dimensions via Non-Smooth Convex Optimization: Precise Performance Analysis
https://resolver.caltech.edu/CaltechTHESIS:06032016-144604076
Authors: {'items': [{'email': 'thramboc@gmail.com', 'id': 'Thrampoulidis-Christos', 'name': {'family': 'Thrampoulidis', 'given': 'Christos'}, 'orcid': '0000-0001-9053-9365', 'show_email': 'NO'}]}
Year: 2016
DOI: 10.7907/Z998850V
<p>The typical scenario that arises in modern large-scale inference problems is one where the ambient dimension of the unknown signal is very large (e.g., high-resolution images, recommendation systems), yet its desired properties lie in some low-dimensional structure, such as sparsity or low-rankness. In the past couple of decades, non-smooth convex optimization methods have emerged as a powerful tool to extract those structures, since they are often computationally efficient, and they also offer enough flexibility while simultaneously being amenable to performance analysis. In particular, since the advent of Compressed Sensing (CS), there has been significant progress towards this direction. One of the key ideas is that random linear measurements offer an efficient way to acquire structured signals. When the measurement matrix has entries iid from a wide class of distributions (including Gaussians), a series of recent works have established a complete and transparent theory that precisely captures the performance in the noiseless setting. In the more practical scenario of noisy measurements, the performance analysis task becomes significantly more challenging, and corresponding precise and unifying results have hitherto remained scarce. The available class of optimization methods, often referred to as regularized M-estimators, is now richer; additional factors (e.g., the noise distribution, the loss function, and the regularizer parameter) and several different measures of performance (e.g., squared-error, probability of support recovery) need to be taken into account.</p>
<p>This thesis develops a novel analytical framework that overcomes these challenges, and establishes precise asymptotic performance guarantees for regularized M-estimators under Gaussian measurement matrices. In particular, the framework allows for a unifying analysis among different instances (such as the Generalized LASSO, and the LAD, to name a few) and accounts for a wide class of performance measures. Among others, we show results on the mean-squared-error of the Generalized-LASSO method and make insightful connections to the classical theory of ordinary least squares and to noiseless CS. Empirical evidence is presented that suggests the Gaussian assumption is not necessary. Beyond iid measurement matrices, motivated by practical considerations, we study certain classes of random matrices with orthogonal rows and establish their superior performance when compared to Gaussians.</p>
<p>A prominent application of this generic theory is on the analysis of the bit-error rate (BER) of the popular convex-relaxation of the Maximum Likelihood decoder for recovering BPSK signals in a massive Multiple Input Multiple Output setting. Our precise BER analysis allows comparison of these schemes to the unattainable Matched-filter bound, and further suggests means to provably boost their performance. </p>
<p>The last challenge is to evaluate the performance under non-linear measurements. For the Generalized LASSO, it is shown that this is (asymptotically) equivalent to the one under noisy linear measurements with appropriately scaled variance. This encompasses state-of-the-art theoretical results of one-bit CS, and is also used to prove that the optimal quantizer of the measurements that minimizes the estimation error of the Generalized LASSO is the celebrated Lloyd-Max quantizer.</p>
<p>The framework is based on Gaussian process methods; in particular, on a new strong and tight version of a classical comparison inequality (due to Gordon, 1988) in the presence of additional convexity assumptions. We call this the Convex Gaussian Min-max Theorem (CGMT).</p>https://thesis.library.caltech.edu/id/eprint/9836Recovering Structured Low-rank Operators Using Nuclear Norms
https://resolver.caltech.edu/CaltechTHESIS:02082017-062956314
Authors: {'items': [{'email': 'john.bruer@gmail.com', 'id': 'Bruer-John-Jacob', 'name': {'family': 'Bruer', 'given': 'John Jacob'}, 'orcid': '0000-0003-4590-3038', 'show_email': 'NO'}]}
Year: 2017
DOI: 10.7907/Z9F18WQS
<p>This work considers the problem of recovering matrices and operators from limited and/or noisy observations. Whereas matrices result from summing tensor products of vectors, operators result from summing tensor products of matrices. These constructions lead to viewing both matrices and operators as the sum of "simple" rank-1 factors.</p>
<p>A popular line of work in this direction is low-rank matrix recovery, i.e., using linear measurements of a matrix to reconstruct it as the sum of few rank-1 factors. Rank minimization problems are hard in general, and a popular approach to avoid them is convex relaxation. Using the trace norm as a surrogate for rank, the low-rank matrix recovery problem becomes convex.</p>
<p>While the trace norm has received much attention in the literature, other convexifications are possible. This thesis focuses on the class of nuclear norms—a class that includes the trace norm itself. Much as the trace norm is a convex surrogate for the matrix rank, other nuclear norms provide convex complexity measures for additional matrix structure. Namely, nuclear norms measure the structure of the factors used to construct the matrix.</p>
<p>Transitioning to the operator framework allows for novel uses of nuclear norms in recovering these structured matrices. In particular, this thesis shows how to lift structured matrix factorization problems to rank-1 operator recovery problems. This new viewpoint allows nuclear norms to measure richer types of structures present in matrix factorizations.</p>
<p>This work also includes a Python software package to model and solve structured operator recovery problems. Systematic numerical experiments in operator denoising demonstrate the effectiveness of nuclear norms in recovering structured operators. In particular, choosing a specific nuclear norm that corresponds to the underlying factor structure of the operator improves the performance of the recovery procedures when compared, for instance, to the trace norm.
Applications in hyperspectral imaging and self-calibration demonstrate the additional flexibility gained by utilizing operator (as opposed to matrix) factorization models.</p>https://thesis.library.caltech.edu/id/eprint/10048Concentration Inequalities of Random Matrices and Solving Ptychography with a Convex Relaxation
https://resolver.caltech.edu/CaltechTHESIS:09022016-135721172
Authors: {'items': [{'email': 'richardchen100@gmail.com', 'id': 'Chen-Yuhua-Richard', 'name': {'family': 'Chen', 'given': 'Yuhua Richard'}, 'show_email': 'NO'}]}
Year: 2017
DOI: 10.7907/Z9M906MF
<p>Random matrix theory has seen rapid development in recent years. In particular, researchers have developed many non-asymptotic matrix concentration inequalities that parallel powerful scalar concentration inequalities. In this thesis, we focus on three topics: 1) estimating a sparse covariance matrix using matrix concentration inequalities, 2) constructing a matrix phi-entropy to derive matrix concentration inequalities, and 3) developing scalable algorithms to solve the phase recovery problem of ptychography based on low-rank matrix factorization.</p>
<p>Estimation of the covariance matrix is an important subject. In the setting of high-dimensional statistics, the number of samples can be small in comparison to the dimension of the problem, thus estimating the complete covariance matrix is infeasible. By assuming that the covariance matrix satisfies some sparsity assumptions, prior work has proved that it is feasible to estimate the sparse covariance matrix of a Gaussian distribution using the masked sample covariance estimator. In this thesis, we use a new approach and apply non-asymptotic matrix concentration inequalities to obtain tight sample bounds for estimating the sparse covariance matrix of subgaussian distributions.</p>
<p>The entropy method is a powerful approach in developing scalar concentration inequalities. The key ingredient is the subadditivity property that scalar entropy function exhibits. In this thesis, we construct a new concept of matrix phi-entropy and prove that matrix phi-entropy also satisfies a subadditivity property similar to the scalar form. We apply this new concept of matrix phi-entropy to derive non-asymptotic matrix concentration inequalities.</p>
<p>Ptychography is a computational imaging technique which transforms low-resolution intensity-only images into a high-resolution complex recovery of the signal. Conventional algorithms are based on alternating projection, which lacks theoretical guarantees for their performance. In this thesis, we construct two new algorithms. The first algorithm relies on a convex formulation of the ptychography problem and on low-rank matrix recovery. This algorithm improves traditional approaches' performance but has high computational cost. The second algorithm achieves near-linear runtime and memory complexity by factorizing the objective matrix into its low-rank components and approximates the first algorithm's imaging quality.</p>https://thesis.library.caltech.edu/id/eprint/9911Fitting Convex Sets to Data: Algorithms and Applications
https://resolver.caltech.edu/CaltechTHESIS:09282018-091842941
Authors: {'items': [{'email': 'sohyongsheng87@gmail.com', 'id': 'Soh-Yong-Sheng', 'name': {'family': 'Soh', 'given': 'Yong Sheng'}, 'orcid': '0000-0003-3367-1401', 'show_email': 'YES'}]}
Year: 2019
DOI: 10.7907/jkmq-b430
<p>This thesis concerns the geometric problem of finding a convex set that best fits a given dataset. Our question serves as an abstraction for data-analytical tasks arising in a range of scientific and engineering applications. We focus on two specific instances:</p>
<p>1. A key challenge that arises in solving inverse problems is ill-posedness due to a lack of measurements. A prominent family of methods for addressing such issues is based on augmenting optimization-based approaches with a convex penalty function so as to induce a desired structure in the solution. These functions are typically chosen using prior knowledge about the data. In Chapter 2, we study the problem of learning convex penalty functions directly from data for settings in which we lack the domain expertise to choose a penalty function. Our solution relies on suitably transforming the problem of learning a penalty function into a fitting task.</p>
<p>2. In Chapter 3, we study the problem of fitting tractably-described convex sets given the optimal value of linear functionals evaluated in different directions.</p>
<p>Our computational procedures for fitting convex sets are based on a broader framework in which we search among families of sets that are parameterized as linear projections of a fixed structured convex set. The utility of such a framework is that our procedures reduce to the computation of simple primitives at each iteration, and these primitives can be further performed in parallel. In addition, by choosing structured sets that are non-polyhedral, our framework provides a principled way to search over expressive collections of non-polyhedral descriptions; in particular, convex sets that can be described via semidefinite programming provide a rich source of non-polyhedral sets, and such sets feature prominently in this thesis.</p>
<p>We provide performance guarantees for our procedures. Our analyses rely on understanding geometrical aspects of determinantal varieties, building on ideas from empirical processes as well as random matrix theory. We demonstrate the utility of our framework with numerical experiments on synthetic data as well as applications in image denoising and computational geometry.</p>
<p>As secondary contributions, we consider the following:</p>
<p>1. In Chapter 4, we consider the problem of optimally approximating a convex set as a spectrahedron of a given size. Spectrahedra are sets that can be expressed as feasible regions of a semidefinite program.</p>
<p>2. In Chapter 5, we consider change-point estimation in a sequence of high-dimensional signals given noisy observations. Our method integrates classical approaches with a convex optimization-based step that is useful for exploiting structure in high-dimensional data.</p>https://thesis.library.caltech.edu/id/eprint/11208Riemannian Optimization for Convex and Non-Convex Signal Processing and Machine Learning Applications
https://resolver.caltech.edu/CaltechTHESIS:06012020-120425051
Authors: {'items': [{'email': 'douik.ahmed@gmail.com', 'id': 'Douik-Ahmed', 'name': {'family': 'Douik', 'given': 'Ahmed'}, 'orcid': '0000-0001-7791-9443', 'show_email': 'NO'}]}
Year: 2020
DOI: 10.7907/jt3c-0m30
The performance of most algorithms for signal processing and machine learning applications highly depends on the underlying optimization algorithms. Multiple techniques have been proposed for solving convex and non-convex problems, such as interior-point methods and semidefinite programming. However, it is well known that these algorithms are not ideally suited for large-scale optimization with a high number of variables and/or constraints. This thesis exploits a novel optimization method, known as Riemannian optimization, for efficiently solving convex and non-convex problems with signal processing and machine learning applications. Unlike most optimization techniques whose complexities increase with the number of constraints, Riemannian methods smartly exploit the structure of the search space, i.e., the set of feasible solutions, to reduce the embedded dimension and efficiently solve optimization problems in a reasonable time. However, such efficiency comes at the expense of universality, as the geometry of each manifold needs to be investigated individually. This thesis explains the steps of designing first- and second-order Riemannian optimization methods for smooth matrix manifolds through the study and design of optimization algorithms for various applications. In particular, this thesis is interested in contemporary applications in signal processing and machine learning, such as community detection, graph-based clustering, phase retrieval, and indoor and outdoor location determination. Simulation results are provided to attest to the efficiency of the proposed methods against popular generic and specialized solvers for each of the above applications.https://thesis.library.caltech.edu/id/eprint/13758Universality Laws and Performance Analysis of the Generalized Linear Models
https://resolver.caltech.edu/CaltechTHESIS:06092020-005908250
Authors: {'items': [{'email': 'eabbasia@gmail.com', 'id': 'Abbasi-Ehsan', 'name': {'family': 'Abbasi', 'given': 'Ehsan'}, 'orcid': '0000-0002-0185-7933', 'show_email': 'NO'}]}
Year: 2020
DOI: 10.7907/873c-ej41
<p>In the past couple of decades, non-smooth convex optimization has emerged as a powerful tool for the recovery of structured signals (sparse, low rank, etc.) from noisy linear or non-linear measurements in a variety of applications in genomics, signal processing, wireless communications, machine learning, etc. Taking advantage of the particular structure of the unknown signal of interest is critical since in most of these applications, the dimension <i>p</i> of the signal to be estimated is comparable to, or even larger than, the number of observations <i>n</i>. With the advent of Compressive Sensing there has been a very large number of theoretical results that study the estimation performance of non-smooth convex optimization in such a <i>high-dimensional setting</i>.</p>
<p>A popular approach for estimating an unknown signal β₀ ∈ ℝ<i>ᵖ</i> in a <i>generalized linear model</i>, with observations <b>y</b> = g(<b>X</b>β₀) ∈ ℝ<i>ⁿ</i>, is via solving the estimator β̂ = arg min<sub>β</sub> <i>L</i>(<b>y</b>, <b>X</b>β) + <i>λf</i>(β). Here, <i>L</i>(•,•) is a loss function which is convex with respect to its second argument, and <i>f</i>(•) is a regularizer that enforces the structure of the unknown β₀. We first analyze the generalization error performance of this estimator, for the case where the entries of <b>X</b> are drawn <i>independently from a real standard Gaussian</i> distribution. The <i>precise</i> nature of our analysis permits an accurate performance comparison between different instances of these estimators, and allows one to optimally tune the hyperparameters based on the model parameters. We apply our result to some of the most popular cases of generalized linear models, such as M-estimators in linear regression, logistic regression and generalized margin maximizers in binary classification problems, and Poisson regression in count data models. The key ingredient of our proof is the <i>Convex Gaussian Min-max Theorem (CGMT)</i>, which is a tight version of the Gaussian comparison inequality proved by Gordon in 1988. Unfortunately, having real iid entries in the features matrix <b>X</b> is crucial in this theorem, and it cannot be naturally extended to other cases.</p>
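A minimal instance of the displayed estimator takes a squared loss and an ℓ₁ regularizer, solved here by proximal gradient descent (ISTA); the step size, iteration count, and problem sizes are illustrative choices of ours:

```python
import numpy as np

def ista(X, y, lam, iters=500):
    """Proximal gradient (ISTA) for the program
    min_b 0.5 * ||y - X b||^2 + lam * ||b||_1,
    i.e. the generic estimator above with squared loss L and l1 regularizer f."""
    n, p = X.shape
    step = 1.0 / np.linalg.norm(X, 2) ** 2    # 1/L, L = Lipschitz constant
    b = np.zeros(p)
    for _ in range(iters):
        g = X.T @ (X @ b - y)                 # gradient of the smooth loss
        z = b - step * g
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0)  # soft-threshold
    return b

# Sparse recovery in the high-dimensional regime n < p.
rng = np.random.default_rng(0)
n, p = 80, 200
X = rng.standard_normal((n, p))
b0 = np.zeros(p); b0[:5] = 3.0                # 5-sparse ground truth
y = X @ b0 + 0.1 * rng.standard_normal(n)
b_hat = ista(X, y, lam=3.0)
# b_hat is sparse and close to b0 despite having fewer observations than unknowns
```

The precise analysis described above characterizes exactly how the estimation error of such programs depends on n, p, the noise level, and λ.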
<p>For some special cases, however, we prove universality properties and indirectly extend these results to more general designs of the feature matrix <b>X</b>, in which the entries are not necessarily real, independent, or identically distributed. This extension enables us to analyze problems that the CGMT could not handle, such as models with quadratic measurements, phase-lift in phase retrieval, and data recovery in massive MIMO, and helps us settle a few long-standing open problems in these areas.</p>https://thesis.library.caltech.edu/id/eprint/13804
Inference, Computation, and Games
https://resolver.caltech.edu/CaltechTHESIS:06082021-005706263
Authors: {'items': [{'email': 'flotosch@gmail.com', 'id': 'Schäfer-Florian-Tobias', 'name': {'family': 'Schäfer', 'given': 'Florian Tobias'}, 'orcid': '0000-0002-4891-0172', 'show_email': 'YES'}]}
Year: 2021
DOI: 10.7907/esyv-2181
<p>In this thesis, we use statistical inference and competitive games to design algorithms for computational mathematics.</p>
<p>In the first part, comprising chapters two through six, we use ideas from Gaussian process statistics to obtain fast solvers for differential and integral equations. We begin by observing the equivalence of conditional (near-)independence of a Gaussian process and the (near-)sparsity of the Cholesky factors of its precision and covariance matrices. This implies the existence of a large class of <em>dense</em> matrices with almost <em>sparse</em> Cholesky factors, thereby greatly increasing the scope of application of sparse Cholesky factorization. Using an elimination ordering and sparsity pattern motivated by the <em>screening effect</em> in spatial statistics, we can compute approximate Cholesky factors of the covariance matrices of Gaussian processes admitting a screening effect in near-linear computational complexity. These include many popular smoothness priors such as the Matérn class of covariance functions.
In the special case of Green's matrices of elliptic boundary value problems (with possibly unknown elliptic operators of arbitrarily high order, with possibly rough coefficients), we can use tools from numerical homogenization to prove the exponential accuracy of our method. This result improves the state-of-the-art for solving general elliptic integral equations and provides the first proof of an exponential screening effect. We also derive a fast solver for elliptic partial differential equations, with accuracy-vs-complexity guarantees that improve upon the state-of-the-art. Furthermore, the resulting solver is performant in practice, frequently beating established algebraic multigrid libraries such as AMGCL and Trilinos on a series of challenging problems in two and three dimensions.
Finally, for any given covariance matrix, we obtain a closed-form expression for its <em>optimal</em> (in terms of Kullback-Leibler divergence) approximate inverse-Cholesky factorization subject to a sparsity constraint, recovering the Vecchia approximation and factorized sparse approximate inverses. Our method is highly robust, embarrassingly parallel, and further improves our asymptotic results on the solution of elliptic integral equations. We also provide a way to apply our techniques to sums of independent Gaussian processes, resolving a major limitation of existing methods based on the screening effect. As a result, we obtain fast algorithms for large-scale Gaussian process regression problems with possibly noisy measurements.</p>
<p>In the second part of this thesis, comprising chapters seven through nine, we study continuous optimization through the lens of competitive games. In particular, we consider <em>competitive optimization</em>, where multiple agents attempt to minimize conflicting objectives. In the single-agent case, the updates of gradient descent are minimizers of quadratically regularized linearizations of the loss function. We propose to generalize this idea by using the Nash equilibria of quadratically regularized linearizations of the competitive game as updates (<em>linearize the game</em>). We provide fundamental reasons why the natural notion of linearization for competitive optimization problems is given by the <em>multilinear</em> (as opposed to linear) approximation of the agents' loss functions. The resulting algorithm, which we call <em>competitive gradient descent</em> (CGD), thus provides a natural generalization of gradient descent to competitive optimization. By using ideas from information geometry, we extend CGD to competitive mirror descent (CMD), which can be applied to a vast range of constrained competitive optimization problems. CGD and CMD resolve the cycling problem of simultaneous gradient descent and show promising results on problems arising in constrained optimization, robust control theory, and generative adversarial networks. Finally, we point out the <em>GAN-dilemma</em>, which refutes the common interpretation of GANs as approximate minimizers of a divergence obtained in the limit of a fully trained discriminator. Instead, we argue that GAN performance relies on the <em>implicit competitive regularization</em> (ICR) due to the simultaneous optimization of generator and discriminator, and support this hypothesis with results on low-dimensional model problems and GANs on CIFAR10.</p>https://thesis.library.caltech.edu/id/eprint/14261
Applications of Convex Analysis to Signomial and Polynomial Nonnegativity Problems
https://resolver.caltech.edu/CaltechTHESIS:05202021-194439071
Authors: {'items': [{'email': 'rjmurray201693@gmail.com', 'id': 'Murray-Riley-John', 'name': {'family': 'Murray', 'given': 'Riley John'}, 'orcid': '0000-0003-1461-6458', 'show_email': 'NO'}]}
Year: 2021
DOI: 10.7907/vn9x-xj10
<p>Here is a question that is easy to state, but often hard to answer:</p>
<p><i>Is this function nonnegative on this set?</i></p>
<p>When faced with such a question, one often makes appeals to known inequalities. One crafts arguments that are <i>sufficient</i> to establish the nonnegativity of the function, rather than determining the function's precise range of values. This thesis studies sufficient conditions for nonnegativity of signomials and polynomials. Conceptually, signomials may be viewed as generalized polynomials that feature arbitrary real exponents, but with variables restricted to the positive orthant.</p>
<p>Our methods leverage efficient algorithms for a type of convex optimization known as relative entropy programming (REP). By virtue of this integration with REP, our methods can help answer questions like the following:</p>
<p><i>Is there some function, in this particular space of functions, that is nonnegative on this set?</i></p>
<p>The ability to answer such questions is <i>extremely</i> useful in applied mathematics.
Alternative approaches in this same vein (e.g., methods for polynomials based on semidefinite programming)
have been used successfully as convex relaxation frameworks for nonconvex optimization, as mechanisms for analyzing dynamical systems, and even as tools for solving nonlinear partial differential equations.</p>
<p>This thesis builds from the <i>sums of arithmetic-geometric exponentials</i> or <i>SAGE</i> approach to signomial nonnegativity. The term "exponential" appears in the SAGE acronym because SAGE parameterizes signomials in terms of exponential functions.</p>
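To give a flavor of these certificates, here is a minimal numerical check of the relative-entropy condition behind the AM/GM-exponential idea on a single signomial (our own toy example and notation, not the thesis's sageopt code): a signomial with one negative term is nonnegative when a vector ν ≥ 0 balances the exponents and satisfies a relative-entropy inequality against the positive coefficients.

```python
import numpy as np

# Signomial f(x) = exp(2x) + exp(-2x) - 2: two positive terms and one
# negative term (exponent alpha_k = 0, coefficient c_k = -2).
alpha = np.array([2.0, -2.0])   # exponents of the positive terms
c     = np.array([1.0,  1.0])   # their (positive) coefficients
alpha_k, c_k = 0.0, -2.0        # the lone negative term

# A relative-entropy certificate: nu >= 0 with
#   sum_i nu_i * alpha_i == (sum_i nu_i) * alpha_k   (balance condition)
#   sum_i nu_i * log(nu_i / (e * c_i)) <= c_k        (entropy condition)
nu = np.array([1.0, 1.0])
balance = np.isclose(nu @ alpha, nu.sum() * alpha_k)
entropy = np.sum(nu * np.log(nu / (np.e * c)))       # equals -2 here
certified = bool(balance and entropy <= c_k + 1e-12)

# sanity check against a direct grid evaluation
x = np.linspace(-3, 3, 1001)
f = np.exp(2 * x) + np.exp(-2 * x) - 2
print(certified, f.min())
```

Here the certificate ν = (1, 1) makes the entropy condition hold with equality, mirroring the fact that the underlying AM/GM inequality e²ˣ + e⁻²ˣ ≥ 2 is tight at x = 0. Searching over ν is a relative entropy program, which is what makes the approach computationally tractable.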
<p>Our first round of contributions concerns the original SAGE approach. We employ basic techniques in convex analysis and convex geometry to derive structural results for spaces of SAGE signomials and exactness results for SAGE-based REP relaxations of nonconvex signomial optimization problems.
We frame our analysis primarily in terms of the coefficients of a signomial's basis expansion rather than in terms of signomials themselves.
The effect of this framing is that our results for signomials readily transfer to polynomials. In particular, we are led to define a new concept of <i>SAGE polynomials</i>. For sparse polynomials, this method offers an exponential efficiency improvement relative to certificates of nonnegativity obtained through semidefinite programming.</p>
<p>We go on to create the <i>conditional SAGE</i> methodology for exploiting convex substructure in constrained signomial nonnegativity problems.
The basic insight here is that since the standard relative entropy representation of SAGE signomials is obtained by a suitable application of convex duality, we are free to add additional convex constraints into the duality argument. In the course of explaining this idea we provide some illustrative examples in signomial optimization and analysis of chemical dynamics.</p>
<p>The majority of this thesis is dedicated to exploring fundamental questions surrounding conditional SAGE signomials. We approach these questions through analysis frameworks of <i>sublinear circuits</i> and <i>signomial rings</i>. These sublinear circuits generalize simplicial circuits of affine-linear matroids, and lead to rich modes of analysis for sets that are simultaneously convex in the usual sense and convex under a logarithmic transformation. The concept of signomial rings lets us develop a powerful signomial Positivstellensatz and an elementary signomial moment theory. The Positivstellensatz provides for an effective hierarchy of REP relaxations for approaching the value of a nonconvex signomial minimization problem from below, as well as a first-of-its-kind hierarchy for approaching the same value from above.</p>
<p>In parallel with our mathematical work, we have developed the sageopt Python package. Sageopt drives all the examples and experiments used throughout this thesis, and has been used by engineers to solve high-degree polynomial optimization problems at scales unattainable by alternative methods.
We conclude this thesis with an explanation of how our theoretical results affected sageopt's design.</p>https://thesis.library.caltech.edu/id/eprint/14169
Optimisation & Generalisation in Networks of Neurons
https://resolver.caltech.edu/CaltechTHESIS:10132022-000100592
Authors: {'items': [{'email': 'jembernstein@gmail.com', 'id': 'Bernstein-Jeremy-David', 'name': {'family': 'Bernstein', 'given': 'Jeremy David'}, 'orcid': '0000-0001-9110-7476', 'show_email': 'NO'}]}
Year: 2023
DOI: 10.7907/1jz8-5t85
<p>The goal of this thesis is to develop the optimisation and generalisation theoretic foundations of learning in artificial neural networks. The thesis tackles two central questions. Given training data and a network architecture:</p>
<ol>
<li style="text-align:left"><span style="padding-left:10px">Which weight setting will generalise best to unseen data, and why?</span></li>
<li style="text-align:left"><span style="padding-left:10px">What optimiser should be used to recover this weight setting?</span></li>
</ol>
<p>On optimisation, an essential feature of neural network training is that the network weights affect the loss function only indirectly through their appearance in the network architecture. This thesis proposes a three-step framework for deriving novel “architecture aware” optimisation algorithms. The first step—termed <em>functional majorisation</em>—is to majorise a series expansion of the loss function in terms of functional perturbations. The second step is to derive <em>architectural perturbation bounds</em> that relate the size of functional perturbations to the size of weight perturbations. The third step is to substitute these architectural perturbation bounds into the functional majorisation of the loss and to obtain an optimisation algorithm via minimisation. This constitutes an application of the <em>majorise-minimise meta-algorithm</em> to neural networks.</p>
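The classical Euclidean special case of this majorise-minimise view can be sketched in a few lines (our own illustration, assuming an L-smooth loss; the thesis's architecture-aware perturbation bounds replace the single smoothness constant L). Each gradient step is exactly the minimiser of a quadratically regularised linearisation of the loss.

```python
import numpy as np

def majorise_minimise_gd(grad, L, w0, steps=100):
    """Gradient descent as majorise-minimise: each iterate minimises
    the quadratic upper bound
        f(w0) + grad(w0).(w - w0) + (L/2)||w - w0||^2,
    whose unique minimiser is w0 - grad(w0)/L."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - grad(w) / L
    return w

# quadratic test problem f(w) = 0.5 w^T A w - b^T w,
# with smoothness constant L = largest eigenvalue of A
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
L = np.linalg.eigvalsh(A).max()
w_star = np.linalg.solve(A, b)
w = majorise_minimise_gd(lambda w: A @ w - b, L, [0.0, 0.0], steps=200)
print(np.linalg.norm(w - w_star))
```

Because each step minimises a valid upper bound on the loss, the iteration decreases the loss monotonically; the three-step framework above generalises this derivation by replacing the Euclidean quadratic with architecture-dependent perturbation bounds.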
<p>On generalisation, a promising recent line of work has applied PAC-Bayes theory to derive non-vacuous generalisation guarantees for neural networks. Since these guarantees control the average risk of ensembles of networks, they do not address which individual network should generalise best. To close this gap, the thesis rekindles an old idea from the kernels literature: the <em>Bayes point machine</em>. A Bayes point machine is a single classifier that approximates the aggregate prediction of an ensemble of classifiers. Since aggregation reduces the variance of ensemble predictions, Bayes point machines tend to generalise better than other ensemble members. The thesis shows that the space of neural networks consistent with a training set concentrates on a Bayes point machine if both the network width and normalised margin are sent to infinity. This motivates the practice of returning a wide network of large normalised margin.</p>
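A toy version of the Bayes point idea can be demonstrated with linear classifiers (our own illustration, not the thesis's neural-network setting): sample classifiers consistent with a separable training set, and compare the ensemble's centre of mass to a typical member.

```python
import numpy as np

rng = np.random.default_rng(1)
# a small linearly separable training set
X = np.array([[1.0, 1.0], [2.0, 0.5], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# version-space sampling: keep random unit vectors that classify
# every training point correctly
consistent = []
while len(consistent) < 300:
    w = rng.standard_normal(2)
    w /= np.linalg.norm(w)
    if np.all(y * (X @ w) > 0):
        consistent.append(w)
consistent = np.array(consistent)

# the "Bayes point": normalised centre of mass of the sampled version space
w_bp = consistent.mean(axis=0)
w_bp /= np.linalg.norm(w_bp)

margin = lambda w: np.min(y * (X @ w))     # minimum margin over the data
avg_member_margin = np.mean([margin(w) for w in consistent])
print(margin(w_bp), avg_member_margin)
```

Since the minimum margin is a concave function of the weights, the centre of mass is guaranteed a margin at least as large as the average member's, which is the variance-reduction intuition behind returning a single aggregate classifier.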
<p>Potential applications of these ideas include novel methods for uncertainty quantification, more efficient numerical representations for neural hardware, and optimisers that transfer hyperparameters across learning problems.</p>https://thesis.library.caltech.edu/id/eprint/15041
Low-Rank Matrix Recovery: Manifold Geometry and Global Convergence
https://resolver.caltech.edu/CaltechTHESIS:05302023-222447373
Authors: {'items': [{'email': 'zyzhang0907@gmail.com', 'id': 'Zhang-Ziyun', 'name': {'family': 'Zhang', 'given': 'Ziyun'}, 'orcid': '0000-0002-5794-2387', 'show_email': 'YES'}]}
Year: 2023
DOI: 10.7907/hd6q-g460
<p>Low-rank matrix recovery problems are prevalent in modern data science, machine learning, and artificial intelligence, and the low-rank property of matrices is widely exploited to extract the hidden low-complexity structure in massive datasets. Compared with Burer-Monteiro factorization in the Euclidean space, using the low-rank matrix manifold offers unique advantages, as it eliminates duplicated spurious points and reduces the polynomial order of the objective function. Yet a few fundamental questions remained unanswered until recently. We highlight two problems here in particular: the global geometry of the manifold and the global convergence guarantee.</p>
<p>As for the global geometry, we point out that there exist some spurious critical points on the boundary of the low-rank matrix manifold Mᵣ, which have rank smaller than r but can serve as limit points of iterative sequences in the manifold Mᵣ. For the least squares loss function, the spurious critical points are rank-deficient matrices that capture part of the eigenspaces of the ground truth. Unlike classical strict saddle points, their Riemannian gradient is singular and their Riemannian Hessian is unbounded.</p>
<p>We show that randomly initialized Riemannian gradient descent almost surely escapes some of the spurious critical points. To prove this result, we first establish the asymptotic escape of classical strict saddle sets consisting of non-isolated strict critical submanifolds on Riemannian manifolds. We then use a dynamical low-rank approximation to parameterize the manifold Mᵣ and map the spurious critical points to strict critical submanifolds in the classical sense in the parameterized domain, which leads to the desired result. Our result is the first to partially overcome the nonclosedness of the low-rank matrix manifold without altering the vanilla gradient descent algorithm. Numerical experiments are provided to support our theoretical findings.</p>
<p>As for the global convergence guarantee, we point out that earlier approaches to many of the low-rank recovery problems only imply a geometric convergence rate toward a second-order stationary point. This is in contrast to the numerical evidence, which suggests a nearly linear convergence rate starting from a global random initialization. To establish the nearly linear convergence guarantee, we propose a unified framework for a class of low-rank matrix recovery problems including matrix sensing, matrix completion, and phase retrieval. All of them can be considered as random sensing problems of low-rank matrices with a linear measurement operator from some random ensembles. These problems share similar population loss functions that are either least squares or its variant.</p>
<p>We show that, under some assumptions, for the population loss function, Riemannian gradient descent starting from a random initialization converges with high probability to the ground truth at a nearly linear rate, i.e., it takes O(log 1/ϵ + log n) iterations to reach an ϵ-accurate solution. The key to establishing a nearly optimal convergence guarantee is closely intertwined with the analysis of the spurious critical points S_# on Mᵣ. Outside local neighborhoods of the spurious critical points, we use the Łojasiewicz inequality, a fundamental convergence tool, to derive a linear convergence rate. Inside these neighborhoods, the Riemannian gradient becomes degenerate and the Łojasiewicz inequality can fail. By tracking the dynamics of the trajectory in three stages, we are able to show that, with high probability, Riemannian gradient descent escapes the spurious regions in a small number of steps.</p>
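A simplified sketch of this style of iteration (our own illustration: gradient descent with a truncated-SVD projection back onto the rank-r matrix set, on the population least-squares loss, rather than the thesis's exact Riemannian geometry and random measurement ensembles):

```python
import numpy as np

def svd_truncate(X, r):
    """Retract onto the rank-r matrix manifold via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

rng = np.random.default_rng(0)
n, r = 30, 3
G = rng.standard_normal((n, r))
M = G @ G.T                     # rank-r ground truth (population model)

# gradient descent on f(X) = 0.5 * ||X - M||_F^2 over rank-r matrices,
# with the truncated SVD serving as the retraction
eta = 0.5
X = svd_truncate(rng.standard_normal((n, n)), r)   # random rank-r init
errs = []
for _ in range(40):
    X = svd_truncate(X - eta * (X - M), r)         # step, then retract
    errs.append(np.linalg.norm(X - M))
print(errs[-1])
```

On this idealised least-squares loss the error shrinks geometrically from a random start, echoing the nearly linear convergence described above; the substance of the thesis lies in proving such behavior for genuinely random sensing operators and in handling the spurious regions where this simple picture breaks down.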
<p>After addressing the two problems of global geometry and global convergence guarantee, we use two applications to demonstrate the broad applicability of our analytical tools. The first is the robust principal component analysis problem on the manifold Mᵣ with the Riemannian subgradient method. The second application is the convergence rate analysis of the Sobolev gradient descent method for the nonlinear Gross-Pitaevskii eigenvalue problem on the infinite-dimensional sphere manifold. These two examples demonstrate that the analysis of manifold first-order algorithms can be extended beyond the previous framework, to nonsmooth functions and subgradient methods, and to infinite-dimensional Hilbert manifolds. This exemplifies that the insights gained and tools developed for the low-rank matrix manifold Mᵣ can be extended to broader scientific and technological fields.</p>https://thesis.library.caltech.edu/id/eprint/15236