CaltechAUTHORS: Article
https://feeds.library.caltech.edu/people/Chandrasekaran-V/article.rss
A Caltech Library Repository Feed
Fri, 04 Oct 2024 18:55:26 -0700

Learning Markov Structure by Maximum Entropy Relaxation
https://resolver.caltech.edu/CaltechAUTHORS:20121009-082035601
Year: 2007
We propose a new approach for learning a sparse graphical model approximation to a specified multivariate probability distribution (such as the empirical distribution of sample data). The selection of sparse graph structure arises naturally in our approach through solution of a convex optimization problem, which differentiates our method from standard combinatorial approaches. We seek the maximum entropy relaxation (MER) within an exponential family, which maximizes entropy subject to constraints that marginal distributions on small subsets of variables are close to the prescribed marginals in relative entropy. To solve MER, we present a modified primal-dual interior point method that exploits sparsity of the Fisher information matrix in models defined on chordal graphs. This leads to a tractable, scalable approach provided the level of relaxation in MER is sufficient to obtain a thin graph. The merits of our approach are investigated by recovering the structure of some simple graphical models from sample data.

Estimation in Gaussian Graphical Models Using Tractable Subgraphs: A Walk-Sum Analysis
https://resolver.caltech.edu/CaltechAUTHORS:20121005-083046659
Year: 2008
DOI: 10.1109/TSP.2007.912280
Graphical models provide a powerful formalism for statistical signal processing. Due to their sophisticated modeling capabilities, they have found applications in a variety of fields such as computer vision, image processing, and distributed sensor networks. In this paper, we present a general class of algorithms for estimation in Gaussian graphical models with arbitrary structure. These algorithms involve a sequence of inference problems on tractable subgraphs over subsets of variables. This framework includes parallel iterations such as embedded trees, serial iterations such as block Gauss-Seidel, and hybrid versions of these iterations. We also discuss a method that uses local memory at each node to overcome temporary communication failures that may arise in distributed sensor network applications. We analyze these algorithms based on the recently developed walk-sum interpretation of Gaussian inference. We describe the walks "computed" by the algorithms using walk-sum diagrams, and show that for iterations based on a very large and flexible set of sequences of subgraphs, convergence is guaranteed in walk-summable models. Consequently, we are free to choose spanning trees and subsets of variables adaptively at each iteration. This leads to efficient methods for optimizing the next iteration step to achieve maximum reduction in error. Simulation results demonstrate that these nonstationary algorithms provide a significant speedup in convergence over traditional one-tree and two-tree iterations.

Multiscale stochastic modeling for tractable inference and data assimilation
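A minimal sketch of one member of the algorithm class from the walk-sum entry above: a serial (Gauss-Seidel) sweep for computing the mean of a Gaussian model, where each update solves a single-node subproblem exactly. The 3-node information matrix J and potential vector h are invented for illustration; diagonal dominance ensures walk-summability, so the sweep converges to the exact mean J⁻¹h.

```python
import numpy as np

# Toy information matrix J and potential vector h for a 3-node Gaussian model;
# values are illustrative only. Diagonal dominance implies walk-summability,
# so the serial sweep below is guaranteed to converge.
J = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
h = np.array([1.0, 2.0, 3.0])

def gauss_seidel(J, h, sweeps=200):
    """Each update solves one single-node inference problem exactly,
    holding the current estimates of the other variables fixed."""
    x = np.zeros_like(h)
    for _ in range(sweeps):
        for i in range(len(h)):
            x[i] = (h[i] - J[i].dot(x) + J[i, i] * x[i]) / J[i, i]
    return x

mean = gauss_seidel(J, h)   # converges to the exact mean J^{-1} h
```

The paper's framework generalizes this to inference on arbitrary tractable subgraphs (e.g., embedded trees) chosen adaptively at each iteration; the scalar sweep above is the simplest special case.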
https://resolver.caltech.edu/CaltechAUTHORS:20121004-155614638
Year: 2008
DOI: 10.1016/j.cma.2007.12.021
We consider a class of multiscale Gaussian models on pyramidally structured graphs. While such models have been considered in the past, very recent advances in inference methods for graphical models not only yield additional motivation for this class of models but also bring techniques that lead to new and powerful algorithms. We provide a brief summary of these recent advances – including so-called walk-sum analysis, methods based on Lagrangian relaxation, and a new method for "low-rank," wavelet-based, unbiased estimation of error variances – and then adapt and apply them to problems of estimation for pyramidal models. We demonstrate that our models not only capture long-range dependencies but that they also have the property that conditioned on neighboring scales, the correlation behavior within a scale is dramatically compressed. This leads to algorithms resembling multipole methods for solving partial differential equations in which we alternate computations across-scale (using an embedded tree in the pyramidal graph) with local updates within each scale. Not only are these algorithms guaranteed to converge to the correct answers but they also lead to new, adaptive methods for choosing embedded trees and subgraphs to achieve rapid convergence. This approach also leads to a solution to the so-called re-estimation problem in which we seek to update an estimate rapidly after local changes are made to the prior model or to the available data. In addition, by using a consistent probabilistic model across as well as within scales, we are able both to exploit low-rank variance estimation methods and to develop efficient iterative algorithms for parameter estimation.

Representation and Compression of Multidimensional Piecewise Functions Using Surflets
https://resolver.caltech.edu/CaltechAUTHORS:20121004-161000948
Year: 2009
DOI: 10.1109/TIT.2008.2008153
We study the representation, approximation, and compression of functions in M dimensions that consist of constant or smooth regions separated by smooth (M-1)-dimensional discontinuities. Examples include images containing edges, video sequences of moving objects, and seismic data containing geological horizons. For both function classes, we derive the optimal asymptotic approximation and compression rates based on Kolmogorov metric entropy. For piecewise constant functions, we develop a multiresolution predictive coder that achieves the optimal rate-distortion performance; for piecewise smooth functions, our coder has near-optimal rate-distortion performance. Our coder for piecewise constant functions employs surflets, a new multiscale geometric tiling consisting of M-dimensional piecewise constant atoms containing polynomial discontinuities. Our coder for piecewise smooth functions uses surfprints, which wed surflets to wavelets for piecewise smooth approximation. Both of these schemes achieve the optimal asymptotic approximation performance. Key features of our algorithms are that they carefully control the potential growth in surflet parameters at higher smoothness and do not require explicit estimation of the discontinuity. We also extend our results to the corresponding discrete function spaces for sampled data. We provide asymptotic performance results for both discrete function spaces and relate this asymptotic performance to the sampling rate and smoothness orders of the underlying functions and discontinuities. For approximation of discrete data, we propose a new scale-adaptive dictionary that contains few elements at coarse and fine scales, but many elements at medium scales. Simulation results on synthetic signals provide a comparison between surflet-based coders and previously studied approximation schemes based on wedgelets and wavelets.

A global benchmark study using affinity-based biosensors
https://resolver.caltech.edu/CaltechAUTHORS:20090901-094826364
Year: 2009
DOI: 10.1016/j.ab.2008.11.021
PMCID: PMC3793259
To explore the variability in biosensor studies, 150 participants from 20 countries were given the same protein samples and asked to determine kinetic rate constants for the interaction. We chose a protein system that was amenable to analysis using different biosensor platforms as well as by users of different expertise levels. The two proteins (a 50-kDa Fab and a 60-kDa glutathione S-transferase [GST] antigen) form a relatively high-affinity complex, so participants needed to optimize several experimental parameters, including ligand immobilization and regeneration conditions as well as analyte concentrations and injection/dissociation times. Although most participants collected binding responses that could be fit to yield kinetic parameters, the quality of a few data sets could have been improved by optimizing the assay design. Once these outliers were removed, the average reported affinity across the remaining panel of participants was 620 pM with a standard deviation of 980 pM. These results demonstrate that when this biosensor assay was designed and executed appropriately, the reported rate constants were consistent, and independent of which protein was immobilized and which biosensor was used.

Gaussian Multiresolution Models: Exploiting Sparse Markov and Covariance Structure
https://resolver.caltech.edu/CaltechAUTHORS:20121008-094406124
Year: 2010
DOI: 10.1109/TSP.2009.2036042
In this paper, we consider the problem of learning Gaussian multiresolution (MR) models in which data are only available at the finest scale, and the coarser, hidden variables serve to capture long-distance dependencies. Tree-structured MR models have limited modeling capabilities, as variables at one scale are forced to be uncorrelated with each other conditioned on other scales. We propose a new class of Gaussian MR models in which variables at each scale have sparse conditional covariance structure conditioned on other scales. Our goal is to learn a tree-structured graphical model connecting variables across scales (which translates into sparsity in inverse covariance), while at the same time learning sparse structure for the conditional covariance (not its inverse) within each scale conditioned on other scales. This model leads to an efficient, new inference algorithm that is similar to multipole methods in computational physics. We demonstrate the modeling and inference advantages of our approach over methods that use MR tree models and single-scale approximation methods that do not use hidden variables.

Counting Independent Sets Using the Bethe Approximation
https://resolver.caltech.edu/CaltechAUTHORS:20121008-092632684
Year: 2011
DOI: 10.1137/090767145
We consider the #P-complete problem of counting the number of independent sets in a given graph. Our interest is in understanding the effectiveness of the popular belief propagation (BP) heuristic. BP is a simple iterative algorithm that is known to have at least one fixed point, where each fixed point corresponds to a stationary point of the Bethe free energy (introduced by Yedidia, Freeman, and Weiss [IEEE Trans. Inform. Theory, 51 (2004), pp. 2282–2312] in recognition of Bethe's earlier work in 1935). The evaluation of the Bethe free energy at such a stationary point (or BP fixed point) leads to the Bethe approximation for the number of independent sets of the given graph. BP is not known to converge in general, nor is an efficient, convergent procedure for finding stationary points of the Bethe free energy known. Furthermore, the effectiveness of the Bethe approximation is not well understood. As the first result of this paper we propose a BP-like algorithm that always converges to a stationary point of the Bethe free energy for any graph for the independent set problem. This procedure finds an ε-approximate stationary point in O(n^2d^42^dε^(-4)log^3(nε^(-1))) iterations for a graph of n nodes with max-degree d. We study the quality of the resulting Bethe approximation using the recently developed "loop series" framework of Chertkov and Chernyak [J. Stat. Mech. Theory Exp., 6 (2006), P06009]. As this characterization is applicable only for exact stationary points of the Bethe free energy, we provide a slightly modified characterization that holds for ε-approximate stationary points. We establish that for any graph on n nodes with max-degree d and girth larger than 8d log^2 n, the multiplicative error between the number of independent sets and the Bethe approximation decays as 1+O(n^(−γ)) for some γ>0. This provides a deterministic counting algorithm that leads to strictly different results compared to a recent result of Weitz [in Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing, ACM Press, New York, 2006, pp. 140–149]. Finally, as a consequence of our analysis we prove that the Bethe approximation is exceedingly good for a random 3-regular graph conditioned on the shortest cycle cover conjecture of Alon and Tarsi [SIAM J. Algebr. Discrete Methods, 6 (1985), pp. 345–350] being true.

Rank-Sparsity Incoherence for Matrix Decomposition
https://resolver.caltech.edu/CaltechAUTHORS:20121008-095909823
Year: 2011
DOI: 10.1137/090761793
Suppose we are given a matrix that is formed by adding an unknown sparse matrix to an unknown low-rank matrix. Our goal is to decompose the given matrix into its sparse and low-rank components. Such a problem arises in a number of applications in model and system identification and is intractable to solve in general. In this paper we consider a convex optimization formulation for splitting the specified matrix into its components by minimizing a linear combination of the ℓ_1 norm and the nuclear norm of the components. We develop a notion of rank-sparsity incoherence, expressed as an uncertainty principle between the sparsity pattern of a matrix and its row and column spaces, and we use it to characterize both fundamental identifiability as well as (deterministic) sufficient conditions for exact recovery. Our analysis is geometric in nature with the tangent spaces to the algebraic varieties of sparse and low-rank matrices playing a prominent role. When the sparse and low-rank matrices are drawn from certain natural random ensembles, we show that the sufficient conditions for exact recovery are satisfied with high probability. We conclude with simulation results on synthetic matrix decomposition problems.

Group Symmetry and Covariance Regularization
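An illustrative numerical sketch of the sparse-plus-low-rank splitting studied in "Rank-Sparsity Incoherence for Matrix Decomposition" above. This is not the paper's analysis or a certified solver; it is a simple alternating-proximal heuristic that applies the prox of the ℓ_1 norm (entrywise soft thresholding) and of the nuclear norm (singular-value thresholding) in turn, on synthetic data.

```python
import numpy as np

# Illustrative heuristic only (the paper analyzes a convex program, not this
# particular solver). Synthetic M = low-rank + sparse.
rng = np.random.default_rng(0)
u = rng.standard_normal(6)
L_part = np.outer(u, u)                        # rank-one component
S_part = np.zeros((6, 6)); S_part[1, 4] = 5.0  # sparse component
M = L_part + S_part

def soft(X, t):                  # prox of t * ||.||_1: entrywise shrinkage
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def svt(X, t):                   # prox of t * nuclear norm: shrink singular values
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

tau = 0.1
S, L = np.zeros_like(M), np.zeros_like(M)
for _ in range(100):
    L = svt(M - S, tau)
    S = soft(M - L, tau)
# By construction of soft thresholding, |M - L - S| <= tau entrywise.
```

The threshold tau trades off how much of M is attributed to each component; the paper's incoherence conditions say when the underlying convex program recovers the true split exactly.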
https://resolver.caltech.edu/CaltechAUTHORS:20121004-134807001
Year: 2012
DOI: 10.1214/12-EJS723
Statistical models that possess symmetry arise in diverse settings such as random fields associated to geophysical phenomena, exchangeable processes in Bayesian statistics, and cyclostationary processes in engineering. We formalize the notion of a symmetric model via group invariance. We propose projection onto a group fixed point subspace as a fundamental way of regularizing covariance matrices in the high-dimensional regime. In terms of parameters associated to the group we derive precise rates of convergence of the regularized covariance matrix and demonstrate that significant statistical gains may be expected in terms of the sample complexity. We further explore the consequences of symmetry on related model-selection problems such as the learning of sparse covariance and inverse covariance matrices. We also verify our results with simulations.

Latent Variable Graphical Model Selection via Convex Optimization
https://resolver.caltech.edu/CaltechAUTHORS:20130207-085454891
Year: 2012
DOI: 10.1214/11-AOS949
Suppose we have samples of a subset of a collection of random variables. No additional information is provided about the number of latent variables, nor of the relationship between the latent and observed variables. Is it possible to discover the number of hidden components, and to learn a statistical model over the entire collection of variables? We address this question in the setting in which the latent and observed variables are jointly Gaussian, with the conditional statistics of the observed variables conditioned on the latent variables being specified by a graphical model. As a first step we give natural conditions under which such latent-variable Gaussian graphical models are identifiable given marginal statistics of only the observed variables. Essentially these conditions require that the conditional graphical model among the observed variables is sparse, while the effect of the latent variables is "spread out" over most of the observed variables. Next we propose a tractable convex program based on regularized maximum-likelihood for model selection in this latent-variable setting; the regularizer uses both the ℓ_1 norm and the nuclear norm. Our modeling framework can be viewed as a combination of dimensionality reduction (to identify latent variables) and graphical modeling (to capture remaining statistical structure not attributable to the latent variables), and it consistently estimates both the number of hidden components and the conditional graphical model structure among the observed variables. These results are applicable in the high-dimensional setting in which the number of latent/observed variables grows with the number of samples of the observed variables. The geometric properties of the algebraic varieties of sparse matrices and of low-rank matrices play an important role in our analysis.

Feedback Message Passing for Inference in Gaussian Graphical Models
https://resolver.caltech.edu/CaltechAUTHORS:20120820-094221711
Year: 2012
DOI: 10.1109/TSP.2012.2195656
While loopy belief propagation (LBP) performs reasonably well for inference in some Gaussian graphical models with cycles, its performance is unsatisfactory for many others. In particular for some models LBP does not converge, and in general when it does converge, the computed variances are incorrect (except for cycle-free graphs for which belief propagation (BP) is non-iterative and exact). In this paper we propose feedback message passing (FMP), a message-passing algorithm that makes use of a special set of vertices (called a feedback vertex set or FVS) whose removal results in a cycle-free graph. In FMP, standard BP is employed several times on the cycle-free subgraph excluding the FVS while a special message-passing scheme is used for the nodes in the FVS. The computational complexity of exact inference is O(k^(2)n), where k is the number of feedback nodes and n is the total number of nodes. When the size of the FVS is very large, FMP is computationally costly. Hence we propose approximate FMP, where a pseudo-FVS is used instead of an FVS, and where inference in the non-cycle-free graph obtained by removing the pseudo-FVS is carried out approximately using LBP. We show that, when approximate FMP converges, it yields exact means and variances on the pseudo-FVS and exact means throughout the remainder of the graph. We also provide theoretical results on the convergence and accuracy of approximate FMP. In particular, we prove error bounds on variance computation. Based on these theoretical results, we design efficient algorithms to select a pseudo-FVS of bounded size. The choice of the pseudo-FVS allows us to explicitly trade off between efficiency and accuracy. Experimental results show that using a pseudo-FVS of size no larger than log(n), this procedure converges much more often, more quickly, and provides more accurate results than LBP on the entire graph.

Rejoinder: Latent variable graphical model selection via convex optimization
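A simplified sketch related to the feedback message passing entry above: selecting a (pseudo-)feedback vertex set by greedily removing the highest-degree node until the remaining graph is cycle-free. This is a generic heuristic for illustration, not the paper's selection criterion, which is based on its error-bound analysis.

```python
# Simplified sketch (the paper's pseudo-FVS selection is more refined):
# greedily remove the highest-degree node until no cycles remain.
def has_cycle(nodes, edges):
    """Undirected cycle check via DFS with parent tracking."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = set()
    for root in nodes:
        if root in seen:
            continue
        stack = [(root, None)]
        while stack:
            v, parent = stack.pop()
            if v in seen:
                return True        # reached an already-visited node: cycle
            seen.add(v)
            stack.extend((w, v) for w in adj[v] if w != parent)
    return False

def greedy_pseudo_fvs(nodes, edges):
    nodes, edges = list(nodes), list(edges)
    removed = []
    while has_cycle(nodes, edges):
        deg = {v: sum(v in e for e in edges) for v in nodes}
        top = max(nodes, key=deg.get)   # break cycles at a high-degree node
        removed.append(top)
        nodes.remove(top)
        edges = [e for e in edges if top not in e]
    return removed

# A 4-cycle: removing any single node leaves a cycle-free path.
fvs = greedy_pseudo_fvs([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)])
```

In FMP, BP would then be run on the cycle-free remainder while the removed nodes receive their special message-passing treatment.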
https://resolver.caltech.edu/CaltechAUTHORS:20130205-141004858
Year: 2012
DOI: 10.1214/12-AOS1020
We thank all the discussants for their careful reading of our paper, and for their insightful critiques. We would also like to thank the editors for organizing this discussion. Our paper contributes to the area of high-dimensional statistics which has received much attention over the past several years across the statistics, machine learning and signal processing communities. In this rejoinder we clarify and comment on some of the points raised in the discussions. Finally, we also remark on some interesting challenges that lie ahead in latent variable modeling.

Convex Graph Invariants
https://resolver.caltech.edu/CaltechAUTHORS:20130110-100237034
Year: 2012
DOI: 10.1137/100816900
The structural properties of graphs are usually characterized in terms of invariants, which are functions of graphs that do not depend on the labeling of the nodes. In this paper we study convex graph invariants, which are graph invariants that are convex functions of the adjacency matrix of a graph. Some examples include functions of a graph such as the maximum degree, the MAXCUT value (and its semidefinite relaxation), and spectral invariants such as the sum of the k largest eigenvalues. Such functions can be used to construct convex sets that impose various structural constraints on graphs and thus provide a unified framework for solving a number of interesting graph problems via convex optimization. We give a representation of all convex graph invariants in terms of certain elementary invariants, and we describe methods to compute or approximate convex graph invariants tractably. We discuss the interesting subclass of spectral invariants, and also compare convex and nonconvex invariants. Finally, we use convex graph invariants to provide efficient convex programming solutions to graph problems such as the deconvolution of the composition of two graphs into the individual components, hypothesis testing between graph families, and the generation of graphs with certain desired structural properties.

The Convex Geometry of Linear Inverse Problems
https://resolver.caltech.edu/CaltechAUTHORS:20121004-152325296
Year: 2012
DOI: 10.1007/s10208-012-9135-7
In applications throughout science and engineering one is often faced with the challenge of solving an ill-posed inverse problem, where the number of available measurements is smaller than the dimension of the model to be estimated. However in many practical situations of interest, models are constrained structurally so that they only have a few degrees of freedom relative to their ambient dimension. This paper provides a general framework to convert notions of simplicity into convex penalty functions, resulting in convex optimization solutions to linear, underdetermined inverse problems. The class of simple models considered includes those formed as the sum of a few atoms from some (possibly infinite) elementary atomic set; examples include well-studied cases from many technical fields such as sparse vectors (signal processing, statistics) and low-rank matrices (control, statistics), as well as several others including sums of a few permutation matrices (ranked elections, multiobject tracking), low-rank tensors (computer vision, neuroscience), orthogonal matrices (machine learning), and atomic measures (system identification). The convex programming formulation is based on minimizing the norm induced by the convex hull of the atomic set; this norm is referred to as the atomic norm. The facial structure of the atomic norm ball carries a number of favorable properties that are useful for recovering simple models, and an analysis of the underlying convex geometry provides sharp estimates of the number of generic measurements required for exact and robust recovery of models from partial information. These estimates are based on computing the Gaussian widths of tangent cones to the atomic norm ball. When the atomic set has algebraic structure the resulting optimization problems can be solved or approximated via semidefinite programming. The quality of these approximations affects the number of measurements required for recovery, and this tradeoff is characterized via some examples. Thus this work extends the catalog of simple models (beyond sparse vectors and low-rank matrices) that can be recovered from limited linear information via tractable convex programming.

Diagonal and Low-Rank Matrix Decompositions, Correlation Matrices, and Ellipsoid Fitting
https://resolver.caltech.edu/CaltechAUTHORS:20121004-133456681
Year: 2012
DOI: 10.1137/120872516
In this paper we establish links between, and new results for, three problems that are not usually considered together. The first is a matrix decomposition problem that arises in areas such as statistical modeling and signal processing: given a matrix X formed as the sum of an unknown diagonal matrix and an unknown low rank positive semidefinite matrix, decompose X into these constituents. The second problem we consider is to determine the facial structure of the set of correlation matrices, a convex set also known as the elliptope. This convex body, and particularly its facial structure, plays a role in applications from combinatorial optimization to mathematical finance. The third problem is a basic geometric question: given points v_1, v_2, …, v_n ∈ R^k (where n > k) determine whether there is a centered ellipsoid passing exactly through all of the points. We show that in a precise sense these three problems are equivalent. Furthermore we establish a simple sufficient condition on a subspace U that ensures any positive semidefinite matrix L with column space U can be recovered from D+L for any diagonal matrix D using a convex optimization-based heuristic known as minimum trace factor analysis. This result leads to a new understanding of the structure of rank-deficient correlation matrices and a simple condition on a set of points that ensures there is a centered ellipsoid passing through them.

Computational and statistical tradeoffs via convex relaxation
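A small worked instance of the ellipsoid-fitting question from the decomposition entry above, in the plane: a centered ellipsoid {v : vᵀQv = 1} through given points corresponds to a symmetric positive definite solution Q of a linear system in Q's entries. The three sample points below are invented for illustration; they lie on the unit circle, so the recovered Q is the identity.

```python
import numpy as np

# Three points on the unit circle; a centered ellipsoid through them exists
# iff the linear system below has a positive definite symmetric solution Q.
pts = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [np.sqrt(0.5), np.sqrt(0.5)]])

# Unknowns (q11, q12, q22): v' Q v = q11 x^2 + 2 q12 x y + q22 y^2 = 1.
A = np.array([[x * x, 2.0 * x * y, y * y] for x, y in pts])
q11, q12, q22 = np.linalg.solve(A, np.ones(3))
Q = np.array([[q11, q12], [q12, q22]])

exists = bool(np.all(np.linalg.eigvalsh(Q) > 0))   # positive definite?
```

With n > k points the system is generally overdetermined and the question becomes whether a consistent positive definite solution exists, which is where the paper's subspace condition enters.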
https://resolver.caltech.edu/CaltechAUTHORS:20130603-131602451
Year: 2013
DOI: 10.1073/pnas.1302293110
PMCID: PMC3612621
Modern massive datasets create a fundamental problem at the intersection of the computational and statistical sciences: how to provide guarantees on the quality of statistical inference given bounds on computational resources, such as time or space. Our approach to this problem is to define a notion of "algorithmic weakening," in which a hierarchy of algorithms is ordered by both computational efficiency and statistical efficiency, allowing the growing strength of the data at scale to be traded off against the need for sophisticated processing. We illustrate this approach in the setting of denoising problems, using convex relaxation as the core inferential tool. Hierarchies of convex relaxations have been widely used in theoretical computer science to yield tractable approximation algorithms to many computationally intractable tasks. In the current paper, we show how to endow such hierarchies with a statistical characterization and thereby obtain concrete tradeoffs relating algorithmic runtime to amount of data.

Resource Allocation for Statistical Estimation
https://resolver.caltech.edu/CaltechAUTHORS:20160121-082428259
Year: 2016
DOI: 10.1109/JPROC.2015.2494098
Statistical estimation in many contemporary settings involves the acquisition, analysis, and aggregation of data sets from multiple sources, which can have significant differences in character and in value. Due to these variations, the effectiveness of employing a given resource, e.g., a sensing device or computing power, for gathering or processing data from a particular source depends on the nature of that source. As a result, the appropriate division and assignment of a collection of resources to a set of data sources can substantially impact the overall performance of an inferential strategy. In this expository article, we adopt a general view of the notion of a resource and its effect on the quality of a data source, and we describe a framework for the allocation of a given set of resources to a collection of sources in order to optimize a specified metric of statistical efficiency. We discuss several stylized examples involving inferential tasks such as parameter estimation and hypothesis testing based on heterogeneous data sources, in which optimal allocations can be computed either in closed form or via efficient numerical procedures based on convex optimization. This work is an inferential analog of the literature in information theory on allocating power across communications channels of variable quality in order to optimize for total throughput.

Relative Entropy Relaxations for Signomial Optimization
https://resolver.caltech.edu/CaltechAUTHORS:20161017-134459797
Year: 2016
DOI: 10.1137/140988978
Signomial programs (SPs) are optimization problems specified in terms of signomials, which are weighted sums of exponentials composed with linear functionals of a decision variable. SPs are nonconvex optimization problems in general, and families of NP-hard problems can be reduced to SPs. In this paper we describe a hierarchy of convex relaxations to obtain successively tighter lower bounds of the optimal value of SPs. This sequence of lower bounds is computed by solving increasingly larger-sized relative entropy optimization problems, which are convex programs specified in terms of linear and relative entropy functions. Our approach relies crucially on the observation that the relative entropy function, by virtue of its joint convexity with respect to both arguments, provides a convex parametrization of certain sets of globally nonnegative signomials with efficiently computable nonnegativity certificates via the arithmetic-geometric-mean inequality. By appealing to representation theorems from real algebraic geometry, we show that our sequences of lower bounds converge to the global optima for broad classes of SPs. Finally, we also demonstrate the effectiveness of our methods via numerical experiments.

Primate TRIM5 proteins form hexagonal nets on HIV-1 capsids
https://resolver.caltech.edu/CaltechAUTHORS:20160613-141038489
Year: 2016
DOI: 10.7554/eLife.16269
PMCID: PMC4936896
TRIM5 proteins are restriction factors that block retroviral infections by binding viral capsids and preventing reverse transcription. Capsid recognition is mediated by C-terminal domains on TRIM5α (SPRY) or TRIMCyp (cyclophilin A), which interact weakly with capsids. Efficient capsid recognition also requires the conserved N-terminal tripartite motifs (TRIM), which mediate oligomerization and create avidity effects. To characterize how TRIM5 proteins recognize viral capsids, we developed methods for isolating native recombinant TRIM5 proteins and purifying stable HIV-1 capsids. Biochemical and EM analyses revealed that TRIM5 proteins assembled into hexagonal nets, both alone and on capsid surfaces. These nets comprised open hexameric rings, with the SPRY domains centered on the edges and the B-box and RING domains at the vertices. Thus, the principles of hexagonal TRIM5 assembly and capsid pattern recognition are conserved across primates, allowing TRIM5 assemblies to maintain the conformational plasticity necessary to recognize divergent and pleomorphic retroviral capsids.

Regularization for Design
https://resolver.caltech.edu/CaltechAUTHORS:20161208-120540508
Year: 2016
DOI: 10.1109/TAC.2016.2517570
When designing controllers for large-scale systems, the architectural aspects of the controller such as the placement of actuators, sensors, and the communication links between them can no longer be taken as given. The task of designing this architecture is now as important as the design of the control laws themselves. By interpreting controller synthesis (in a model matching setup) as the solution of a particular linear inverse problem, we view the challenge of obtaining a controller with a desired architecture as one of finding a structured solution to an inverse problem. Building on this conceptual connection, we formulate and analyze a framework called Regularization for Design (RFD), in which we augment the variational formulations of controller synthesis problems with convex penalty functions that induce a desired controller architecture. The resulting regularized formulations are convex optimization problems that can be solved efficiently; these convex programs provide a unified computationally tractable approach for the simultaneous co-design of a structured optimal controller and the actuation, sensing and communication architecture required to implement it. Further, these problems are natural control-theoretic analogs of prominent approaches such as the Lasso, the Group Lasso, the Elastic Net, and others that are employed in structured inference. In analogy to that literature, we show that our approach identifies optimally structured controllers under a suitable condition on a "signal-to-noise" type ratio.

Relative entropy optimization and its applications
https://resolver.caltech.edu/CaltechAUTHORS:20170216-104506601
Year: 2017
DOI: 10.1007/s10107-016-0998-2
In this expository article, we study optimization problems specified via linear and relative entropy inequalities. Such relative entropy programs (REPs) are convex optimization problems as the relative entropy function is jointly convex with respect to both its arguments. Prominent families of convex programs such as geometric programs (GPs), second-order cone programs, and entropy maximization problems are special cases of REPs, although REPs are more general than these classes of problems. We provide solutions based on REPs to a range of problems such as permanent maximization, robust optimization formulations of GPs, and hitting-time estimation in dynamical systems. We survey previous approaches to some of these problems and the limitations of those methods, and we highlight the more powerful generalizations afforded by REPs. We conclude with a discussion of quantum analogs of the relative entropy function, including a review of the similarities and distinctions with respect to the classical case. We also describe a stylized application of quantum relative entropy optimization that exploits the joint convexity of the quantum relative entropy function.
High-dimensional change-point estimation: Combining filtering with convex optimization
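The convexity of REPs hinges on the joint convexity of relative entropy in both arguments. A quick numerical sanity check of that convexity inequality on random positive vectors (illustrative only; no optimization solver is involved):

```python
import math
import random

def rel_entropy(x, y):
    # D(x, y) = sum_i x_i * log(x_i / y_i), jointly convex in (x, y)
    return sum(xi * math.log(xi / yi) for xi, yi in zip(x, y))

rng = random.Random(0)
for _ in range(100):
    x1, y1, x2, y2 = ([rng.uniform(0.1, 1.0) for _ in range(5)] for _ in range(4))
    t = rng.random()
    mix = lambda a, b: [t * ai + (1 - t) * bi for ai, bi in zip(a, b)]
    lhs = rel_entropy(mix(x1, x2), mix(y1, y2))
    rhs = t * rel_entropy(x1, y1) + (1 - t) * rel_entropy(x2, y2)
    assert lhs <= rhs + 1e-12  # convexity: value at the mixture lies below the chord
print("joint convexity held on 100 random instances")
```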
https://resolver.caltech.edu/CaltechAUTHORS:20170525-100137066
Year: 2017
DOI: 10.1016/j.acha.2015.11.003
We consider change-point estimation in a sequence of high-dimensional signals given noisy observations. Classical approaches to this problem such as the filtered derivative method are useful for sequences of scalar-valued signals, but they have undesirable scaling behavior in the high-dimensional setting. However, many high-dimensional signals encountered in practice frequently possess latent low-dimensional structure. Motivated by this observation, we propose a technique for high-dimensional change-point estimation that combines the filtered derivative approach from previous work with convex optimization methods based on atomic norm regularization, which are useful for exploiting structure in high-dimensional data. Our algorithm is applicable in online settings as it operates on small portions of the sequence of observations at a time, and it is well-suited to the high-dimensional setting both in terms of computational scalability and of statistical efficiency. The main result of this paper shows that our method performs change-point estimation reliably as long as the product of the smallest-sized change (the Euclidean-norm-squared of the difference between signals at a change-point) and the smallest distance between change-points (number of time instances) is larger than a Gaussian width parameter that characterizes the low-dimensional complexity of the underlying signal sequence.
A Statistical Graphical Model of the California Reservoir System
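For scalar sequences, the classical filtered-derivative statistic compares sample means over adjacent sliding windows and flags a change-point at the peak. A toy version of that baseline (this is the scalar building block only, not the paper's high-dimensional convex-optimization method):

```python
# Filtered-derivative statistic on a scalar sequence: at each time t,
# compare the mean over the window just after t with the mean over the
# window just before t; a change in level shows up as a peak.
def filtered_derivative(y, w):
    stats = []
    for t in range(w, len(y) - w + 1):
        left = sum(y[t - w:t]) / w
        right = sum(y[t:t + w]) / w
        stats.append((abs(right - left), t))
    return max(stats)  # (peak statistic, estimated change-point)

y = [0.0] * 50 + [3.0] * 50  # mean jumps from 0 to 3 at index 50
print(filtered_derivative(y, 10))  # -> (3.0, 50)
```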
https://resolver.caltech.edu/CaltechAUTHORS:20171020-154910322
Year: 2017
DOI: 10.1002/2017WR020412
The recent California drought has highlighted the potential vulnerability of the state's water management infrastructure to multiyear dry intervals. Due to the high complexity of the network, dynamic storage changes in California reservoirs on a state-wide scale have previously been difficult to model using either traditional statistical or physical approaches. Indeed, although there is a significant line of research on exploring models for single (or a small number of) reservoirs, these approaches are not amenable to a system-wide modeling of the California reservoir network due to the spatial and hydrological heterogeneities of the system. In this work, we develop a state-wide statistical graphical model to characterize the dependencies among a collection of 55 major California reservoirs across the state; this model is defined with respect to a graph in which the nodes index reservoirs and the edges specify the relationships or dependencies between reservoirs. We obtain and validate this model in a data-driven manner based on reservoir volumes over the period 2003–2016. A key feature of our framework is a quantification of the effects of external phenomena that influence the entire reservoir network. We further characterize the degree to which physical factors (e.g., state-wide Palmer Drought Severity Index (PDSI), average temperature, snow pack) and economic factors (e.g., consumer price index, number of agricultural workers) explain these external influences. As a consequence of this analysis, we obtain a system-wide health diagnosis of the reservoir network as a function of PDSI.
Interpreting Latent Variables in Factor Models via Convex Optimization
https://resolver.caltech.edu/CaltechAUTHORS:20170614-105229927
Year: 2018
DOI: 10.1007/s10107-017-1187-7
Latent or unobserved phenomena pose a significant difficulty in data analysis as they induce complicated and confounding dependencies among a collection of observed variables. Factor analysis is a prominent multivariate statistical modeling approach that addresses this challenge by identifying the effects of (a small number of) latent variables on a set of observed variables. However, the latent variables in a factor model are purely mathematical objects that are derived from the observed phenomena, and they do not have any interpretation associated to them. A natural approach for attributing semantic information to the latent variables in a factor model is to obtain measurements of some additional plausibly useful covariates that may be related to the original set of observed variables, and to associate these auxiliary covariates to the latent variables. In this paper, we describe a systematic approach for identifying such associations. Our method is based on solving computationally tractable convex optimization problems, and it can be viewed as a generalization of the minimum-trace factor analysis procedure for fitting factor models via convex optimization. We analyze the theoretical consistency of our approach in a high-dimensional setting as well as its utility in practice via experimental demonstrations with real data.
Finding Planted Subgraphs with Few Eigenvalues using the Schur-Horn Relaxation
https://resolver.caltech.edu/CaltechAUTHORS:20170614-124916669
Year: 2018
DOI: 10.1137/16M1075144
Extracting structured subgraphs inside large graphs---often known as the planted subgraph problem---is a fundamental question that arises in a range of application domains. This problem is NP-hard in general and, as a result, significant efforts have been directed towards the development of tractable procedures that succeed on specific families of problem instances. We propose a new computationally efficient convex relaxation for solving the planted subgraph problem; our approach is based on tractable semidefinite descriptions of majorization inequalities on the spectrum of a symmetric matrix. This procedure is effective at finding planted subgraphs that consist of few distinct eigenvalues, and it generalizes previous convex relaxation techniques for finding planted cliques. Our analysis relies prominently on the notion of spectrally comonotone matrices, which are pairs of symmetric matrices that can be transformed to diagonal matrices with sorted diagonal entries upon conjugation by the same orthogonal matrix.
Learning Semidefinite-Representable Regularizers
https://resolver.caltech.edu/CaltechAUTHORS:20170614-100357188
Year: 2019
DOI: 10.1007/s10208-018-9386-z
Regularization techniques are widely employed in optimization-based approaches for solving ill-posed inverse problems in data analysis and scientific computing. These methods are based on augmenting the objective with a penalty function, which is specified based on prior domain-specific expertise to induce a desired structure in the solution. We consider the problem of learning suitable regularization functions from data in settings in which precise domain knowledge is not directly available. Previous work under the title of 'dictionary learning' or 'sparse coding' may be viewed as learning a regularization function that can be computed via linear programming. We describe generalizations of these methods to learn regularizers that can be computed and optimized via semidefinite programming. Our framework for learning such semidefinite regularizers is based on obtaining structured factorizations of data matrices, and our algorithmic approach for computing these factorizations combines recent techniques for rank minimization problems along with an operator analog of Sinkhorn scaling. Under suitable conditions on the input data, our algorithm provides a locally linearly convergent method for identifying the correct regularizer that promotes the type of structure contained in the data. Our analysis is based on the stability properties of Operator Sinkhorn scaling and their relation to geometric aspects of determinantal varieties (in particular tangent spaces with respect to these varieties). The regularizers obtained using our framework can be employed effectively in semidefinite programming relaxations for solving inverse problems.
False Discovery and Its Control in Low Rank Estimation
https://resolver.caltech.edu/CaltechAUTHORS:20190626-161131951
Year: 2020
DOI: 10.1111/rssb.12387
Models specified by low rank matrices are ubiquitous in contemporary applications. In many of these problem domains, the row–column space structure of a low rank matrix carries information about some underlying phenomenon, and it is of interest in inferential settings to evaluate the extent to which the row–column spaces of an estimated low rank matrix signify discoveries about the phenomenon. However, in contrast with variable selection, we lack a formal framework to assess true or false discoveries in low rank estimation; in particular, the key source of difficulty is that the standard notion of a discovery is a discrete notion that is ill suited to the smooth structure underlying low rank matrices. We address this challenge via a geometric reformulation of the concept of a discovery, which then enables a natural definition in the low rank case. We describe and analyse a generalization of the stability selection method of Meinshausen and Bühlmann to control for false discoveries in low rank estimation, and we demonstrate its utility compared with previous approaches via numerical experiments.
Terracini convexity
https://resolver.caltech.edu/CaltechAUTHORS:20201016-144006753
Year: 2020
DOI: 10.1007/s10107-022-01774-y
We present a generalization of the notion of neighborliness to non-polyhedral convex cones. Although a definition of neighborliness is available in the non-polyhedral case in the literature, it is fairly restrictive as it requires all the low-dimensional faces to be polyhedral. Our approach is more flexible and includes, for example, the cone of positive-semidefinite matrices as a special case (this cone is not neighborly in general). We term our generalization Terracini convexity due to its conceptual similarity with the conclusion of Terracini's lemma from algebraic geometry. Polyhedral cones are Terracini convex if and only if they are neighborly. More broadly, we derive many families of non-polyhedral Terracini convex cones based on neighborly cones, linear images of cones of positive-semidefinite matrices, and derivative relaxations of Terracini convex hyperbolicity cones. As a demonstration of the utility of our framework in the non-polyhedral case, we give a characterization based on Terracini convexity of the tightness of semidefinite relaxations for certain inverse problems.
Convex graph invariant relaxations for graph edit distance
https://resolver.caltech.edu/CaltechAUTHORS:20190626-090040899
Year: 2022
DOI: 10.1007/s10107-020-01564-4
The edit distance between two graphs is a widely used measure of similarity that evaluates the smallest number of vertex and edge deletions/insertions required to transform one graph to another. It is NP-hard to compute in general, and a large number of heuristics have been proposed for approximating this quantity. With few exceptions, these methods generally provide upper bounds on the edit distance between two graphs. In this paper, we propose a new family of computationally tractable convex relaxations for obtaining lower bounds on graph edit distance. These relaxations can be tailored to the structural properties of the particular graphs via convex graph invariants. Specific examples that we highlight in this paper include constraints on the graph spectrum as well as (tractable approximations of) the stability number and the maximum-cut values of graphs. We prove under suitable conditions that our relaxations are tight (i.e., exactly compute the graph edit distance) when one of the graphs consists of few eigenvalues. We also validate the utility of our framework on synthetic problems as well as real applications involving molecular structure comparison problems in chemistry.
Model selection over partially ordered sets
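For intuition, the quantity being bounded can be computed exactly by brute force on very small graphs. A sketch under the simplification that only edge deletions/insertions are counted (the exhaustive search over vertex bijections is factorial in the number of vertices, which is precisely why tractable convex lower bounds are of interest):

```python
from itertools import permutations

# Brute-force edit distance between two small unlabeled graphs on n
# vertices: minimize, over all vertex bijections, the number of edge
# deletions/insertions needed to turn one edge set into the other.
def edit_distance(edges1, edges2, n):
    e1 = {frozenset(e) for e in edges1}
    best = None
    for perm in permutations(range(n)):
        e2 = {frozenset((perm[u], perm[v])) for u, v in edges2}
        cost = len(e1 ^ e2)  # symmetric difference = edits needed
        best = cost if best is None else min(best, cost)
    return best

# 4-vertex path vs. 4-cycle: one edge insertion suffices
print(edit_distance([(0, 1), (1, 2), (2, 3)],
                    [(0, 1), (1, 2), (2, 3), (3, 0)], 4))  # -> 1
```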
https://authors.library.caltech.edu/records/bq29r-5a373
Year: 2024
DOI: 10.1073/pnas.2314228121
PMCID: PMC10895251
In problems such as variable selection and graph estimation, models are characterized by Boolean logical structure such as the presence or absence of a variable or an edge. Consequently, false-positive error or false-negative error can be specified as the number of variables/edges that are incorrectly included or excluded in an estimated model. However, there are several other problems such as ranking, clustering, and causal inference in which the associated model classes do not admit transparent notions of false-positive and false-negative errors due to the lack of an underlying Boolean logical structure. In this paper, we present a generic approach to endow a collection of models with partial order structure, which leads to a hierarchical organization of model classes as well as natural analogs of false-positive and false-negative errors. We describe model selection procedures that provide false-positive error control in our general setting, and we illustrate their utility with numerical experiments.