CaltechAUTHORS: Monograph

Metric based up-scaling

Year: 2005 DOI: 10.48550/arXiv.0505223 We consider divergence form elliptic operators in dimension n ≥ 2 with L^∞ coefficients. Although solutions of these operators are only Hölder continuous, we show that they are differentiable (C^{1,α}) with respect to harmonic coordinates. It follows that numerical homogenization can be extended to situations where the medium has no ergodicity at small scales and is characterized by a continuum of scales. This is done by transferring a new metric, in addition to traditional averaged (homogenized) quantities, from subgrid scales into computational scales, and error bounds can be given. This numerical homogenization method can also be used as a compression tool for differential operators.

Stochastic Variational Partitioned Runge-Kutta Integrators for Constrained Systems

Year: 2007 DOI: 10.48550/arXiv.0709.2222 Stochastic variational integrators for constrained, stochastic mechanical systems are developed in this paper. The main results of the paper are twofold: an equivalence is established between a stochastic Hamilton-Pontryagin (HP) principle in generalized coordinates and constrained coordinates via Lagrange multipliers, and variational partitioned Runge-Kutta (VPRK) integrators are extended to this class of systems. Among these integrators are first and second-order strongly convergent RATTLE-type integrators. We prove strong order of accuracy of the methods provided. The paper also reviews the deterministic treatment of VPRK integrators from the HP viewpoint.

Ballistic Transport at Uniform Temperature

Year: 2007 DOI: 10.48550/arXiv.0710.1565 A paradigm for isothermal, mechanical rectification of stochastic fluctuations is introduced in this paper. The central idea is to transform energy injected by random perturbations into rigid-body rotational kinetic energy. The prototype considered in this paper is a mechanical system consisting of a set of rigid bodies in interaction through magnetic fields. The system is stochastically forced by white noise and dissipative through mechanical friction. The Gibbs-Boltzmann distribution at a specific temperature defines the unique invariant measure under the flow of this stochastic process and allows us to define "the temperature" of the system. This measure is also ergodic and strongly mixing. Although the system does not exhibit global directed motion, it is shown that global ballistic motion is possible (the mean-squared displacement grows like t^2). More precisely, although work cannot be extracted from thermal energy by the second law of thermodynamics, it is shown that ballistic transport from thermal energy is possible. In particular, the dynamics is characterized by a meta-stable state in which the system exhibits directed motion over random time scales. This phenomenon is caused by interaction of three attributes of the system: a non-flat (yet bounded) potential energy landscape, a rigid body effect (coupling translational momentum and angular momentum through friction) and the degeneracy of the noise/friction tensor on the momentums (the fact that noise is not applied to all degrees of freedom).

Flux Norm Approach to Homogenization Problems with non-separated Scales

Year: 2009 DOI: 10.7907/T5DC-SN48 We consider linear divergence-form scalar elliptic equations and vectorial equations for elasticity with rough (L^∞(Ω), Ω ⊂ ℝ^d) coefficients a(x) that, in particular, model media with non-separated scales and high contrast in material properties. While the homogenization of PDEs with periodic or ergodic coefficients and well separated scales is now well understood, we consider here the most general case of arbitrary bounded coefficients. For such problems we introduce explicit finite dimensional approximations of solutions with controlled error estimates, which we refer to as homogenization approximations. In particular, this approach allows one to analyze a given medium directly without introducing the mathematical concept of an ε-family of media as in classical periodic homogenization. We define the flux norm as the L^2 norm of the potential part of the fluxes of solutions, which is equivalent to the usual H^1-norm. We show that in the flux norm, the error associated with approximating, in a properly defined finite-dimensional space, the set of solutions of the aforementioned PDEs with rough coefficients is equal to the error associated with approximating the set of solutions of the same type of PDEs with smooth coefficients in a standard space (e.g., piecewise polynomial). We refer to this property as the transfer property. A simple application of this property is the construction of finite dimensional approximation spaces with errors independent of the regularity and contrast of the coefficients and with optimal and explicit convergence rates. This transfer property also provides an alternative to the global harmonic change of coordinates for the homogenization of elliptic operators that can be extended to elasticity equations. The proofs of these homogenization results are based on a new class of elliptic inequalities which play the same role in our approach as the div-curl lemma in classical homogenization.

Discrete Geometric Structures in Homogenization and Inverse Homogenization with Application to EIT

Year: 2009 DOI: 10.7907/XR8W-EA85 We introduce a new geometric approach for the homogenization and inverse homogenization of the divergence form elliptic operator with rough conductivity coefficients σ(x) in dimension two. We show that conductivity coefficients are in one-to-one correspondence with divergence-free matrices and convex functions s(x) over the domain Ω. Although homogenization is a non-linear and non-injective operator when applied directly to conductivity coefficients, homogenization becomes a linear interpolation operator over triangulations of Ω when re-expressed using convex functions, and is a volume averaging operator when re-expressed with divergence-free matrices. We explicitly give the transformations which map conductivity coefficients into divergence-free matrices and convex functions, as well as their respective inverses. Using optimal weighted Delaunay triangulations for linearly interpolating convex functions, we apply this geometric framework to obtain an optimally robust homogenization algorithm for arbitrary rough coefficients, extending the global optimality of Delaunay triangulations with respect to a discrete Dirichlet energy to weighted Delaunay triangulations. Next, we consider inverse homogenization, that is, the recovery of the microstructure from macroscopic information, a problem which is known to be both non-linear and severely ill-posed. We show how to decompose this reconstruction into a linear ill-posed problem and a well-posed non-linear problem. We apply this new geometric approach to Electrical Impedance Tomography (EIT) in dimension two. It is known that the EIT problem admits at most one isotropic solution. If an isotropic solution exists, we show how to compute it from any conductivity having the same boundary Dirichlet-to-Neumann map. This is of practical importance since the EIT problem always admits a unique solution in the space of divergence-free matrices and is stable with respect to G-convergence in that space (this property fails for isotropic matrices). As such, we suggest that the space of convex functions is the natural space to use to parameterize solutions of the EIT problem.

Non-intrusive and structure preserving multiscale integration of stiff ODEs, SDEs and Hamiltonian systems with hidden slow dynamics via flow averaging

Year: 2009 DOI: 10.7907/QZNP-SR14 We introduce a new class of integrators for stiff ODEs as well as SDEs. One subclass of systems that we treat consists of ODEs and SDEs that are sums of two terms, one of which has large coefficients. These integrators are (i) Multiscale: they are based on flow averaging and so do not resolve the fast variables but rather employ step-sizes determined by slow variables. (ii) Basis: the method is based on averaging the flow of the given dynamical system (which may have hidden slow and fast processes) instead of averaging the instantaneous drift of assumed separated slow and fast processes. This bypasses the need for identifying explicitly (or numerically) the slow or fast variables. (iii) Non intrusive: A pre-existing numerical scheme resolving the microscopic time scale can be used as a black box and turned into one of the integrators in this paper by simply turning the large coefficients on over a microscopic timescale and off during a mesoscopic timescale. (iv) Convergent over two scales: strongly over slow processes and in the sense of measures over fast ones. We introduce the related notion of two-scale flow convergence and analyze the convergence of these integrators under the induced topology. (v) Structure preserving: For stiff Hamiltonian systems (possibly on manifolds), they are symplectic, time-reversible, and symmetric (under the group action leaving the Hamiltonian invariant) in all variables. They are explicit and apply to arbitrary stiff potentials (that need not be quadratic). Their application to the Fermi-Pasta-Ulam problems shows accuracy and stability over 4 orders of magnitude of time scales. For stiff Langevin equations, they are symmetric (under a group action), time-reversible and Boltzmann-Gibbs reversible, quasi-symplectic on all variables and conformally symplectic with isotropic friction.

Structure preserving Stochastic Impulse Methods for stiff Langevin systems with a uniform global error of order 1 or 1/2 on position

Year: 2010 DOI: 10.48550/arXiv.1006.4657 Impulse methods are generalized to a family of integrators for Langevin systems with quadratic stiff potentials and arbitrary soft potentials. Uniform error bounds (independent of stiff parameters) are obtained on integrated positions, allowing for coarse integration steps. The resulting integrators are explicit and structure preserving (quasi-symplectic for Langevin systems).

Temperature and Friction Accelerated Sampling of Boltzmann-Gibbs Distribution

Year: 2010 DOI: 10.48550/arXiv.1007.0995 This paper is concerned with tuning friction and temperature in Langevin dynamics for fast sampling from the canonical ensemble. We show that near-optimal acceleration is achieved by choosing friction so that the local quadratic approximation of the Hamiltonian is a critically damped oscillator. The system is also over-heated and cooled down to its final temperature. The performances of different cooling schedules are analyzed as functions of total simulation time.
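The critical-damping criterion mentioned in this abstract can be illustrated on a one-dimensional quadratic potential. The sketch below is not taken from the paper: it assumes the convention m x'' + γ x' + k x = 0 for the noise-free part of the dynamics, where γ multiplies the velocity, and the function names are hypothetical.

```python
import math


def critical_friction(stiffness: float, mass: float = 1.0) -> float:
    """Friction gamma for which m*x'' + gamma*x' + k*x = 0 is critically damped.

    Assumed convention: gamma multiplies the velocity x'.
    """
    if stiffness <= 0.0 or mass <= 0.0:
        raise ValueError("stiffness and mass must be positive")
    return 2.0 * math.sqrt(stiffness * mass)


def damping_regime(gamma: float, stiffness: float, mass: float = 1.0,
                   tol: float = 1e-12) -> str:
    """Classify the oscillator from the discriminant of m*r^2 + gamma*r + k = 0."""
    disc = gamma * gamma - 4.0 * mass * stiffness
    if abs(disc) <= tol:
        return "critical"
    return "overdamped" if disc > 0.0 else "underdamped"


# Local curvature k = 4 with unit mass gives gamma = 2 * sqrt(4) = 4.
gamma_star = critical_friction(4.0)
```

With a non-quadratic Hamiltonian, `stiffness` would stand for the local curvature of the potential, which is the "local quadratic approximation" the abstract refers to.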

Optimal Uncertainty Quantification

Year: 2010 DOI: 10.7907/TTW6-QD19 We propose a rigorous framework for Uncertainty Quantification (UQ) in which the UQ objectives and the assumptions/information set are brought to the forefront. This framework, which we call Optimal Uncertainty Quantification (OUQ), is based on the observation that, given a set of assumptions and information about the problem, there exist optimal bounds on uncertainties: these are obtained as extreme values of well-defined optimization problems corresponding to extremizing probabilities of failure, or of deviations, subject to the constraints imposed by the scenarios compatible with the assumptions and information. In particular, this framework does not implicitly impose inappropriate assumptions, nor does it repudiate relevant information. Although OUQ optimization problems are extremely large, we show that under general conditions, they have finite-dimensional reductions. As an application, we develop Optimal Concentration Inequalities (OCI) of Hoeffding and McDiarmid type. Surprisingly, contrary to the classical sensitivity analysis paradigm, these results show that uncertainties in input parameters do not necessarily propagate to output uncertainties. In addition, a general algorithmic framework is developed for OUQ and is tested on the Caltech surrogate model for hypervelocity impact, suggesting the feasibility of the framework for important complex systems.
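A toy instance of the finite-dimensional reduction can be sketched as follows. The assumptions here are not from the abstract: X takes values in [0, 1], the only information is E[X] = mean, and the search is restricted to measures supported on at most two points (which is what a reduction of this kind suggests for a single moment constraint). The grid search and function name are hypothetical. In this simple case the optimal bound coincides with Markov's inequality, mean/threshold.

```python
def worst_case_probability(mean: float, threshold: float,
                           grid_size: int = 100) -> float:
    """Largest P[X >= threshold] over two-point measures on a grid of [0, 1]
    whose mean equals `mean` (brute-force search, illustration only)."""
    points = [i / grid_size for i in range(grid_size + 1)]
    best = 0.0
    for lo in points:
        for hi in points:
            if not (lo <= mean <= hi):
                continue  # the mean must lie between the two support points
            if hi == lo:
                w_hi = 1.0  # single Dirac mass located at the mean
            else:
                w_hi = (mean - lo) / (hi - lo)  # weight on hi fixed by the mean
            prob = (1.0 - w_hi) * (lo >= threshold) + w_hi * (hi >= threshold)
            best = max(best, prob)
    return best


# With E[X] = 0.2 and failure threshold 0.5 the extremal measure puts
# mass 0.4 at 0.5 and mass 0.6 at 0.
bound = worst_case_probability(0.2, 0.5)
```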

Equivalence of concentration inequalities for linear and non-linear functions

Year: 2010 DOI: 10.48550/arXiv.1009.4913 We consider a random variable X that takes values in a (possibly infinite-dimensional) topological vector space 𝒳. We show that, with respect to an appropriate "normal distance" on 𝒳, concentration inequalities for linear and non-linear functions of X are equivalent. This normal distance corresponds naturally to the concentration rate in classical concentration results such as Gaussian concentration and concentration on the Euclidean and Hamming cubes. Under suitable assumptions on the roundness of the sets of interest, the concentration inequalities so obtained are asymptotically optimal in the high-dimensional limit.

Localized bases for finite dimensional homogenization approximations with non-separated scales and high-contrast

Year: 2010 We construct finite-dimensional approximations of solution spaces of divergence form operators with L^∞-coefficients. Our method does not rely on concepts of ergodicity or scale-separation, but on the property that the solution space of these operators is compactly embedded in H^1 if source terms are in the unit ball of L^2 instead of the unit ball of H^(−1). Approximation spaces are generated by solving elliptic PDEs on localized sub-domains with source terms corresponding to approximation bases for H^2. The H^1-error estimates show that O(h^(−d))-dimensional spaces with basis elements localized to sub-domains of diameter O(h^α ln 1/h) (with α ∈ [1/2, 1)) result in an O(h^(2−2α)) accuracy for elliptic, parabolic and hyperbolic problems. For high-contrast media, the accuracy of the method is preserved provided that localized sub-domains contain buffer zones of width O(h^α ln 1/h) where the contrast of the medium remains bounded. The proposed method can naturally be generalized to vectorial equations (such as elasto-dynamics).

The optimal uncertainty algorithm in the mystic framework

Year: 2012 DOI: 10.48550/arXiv.1202.1055 We have recently proposed a rigorous framework for Uncertainty Quantification (UQ) in which UQ objectives and assumption/information set are brought into the forefront, providing a framework for the communication and comparison of UQ results. In particular, this framework does not implicitly impose inappropriate assumptions nor does it repudiate relevant information. This framework, which we call Optimal Uncertainty Quantification (OUQ), is based on the observation that given a set of assumptions and information, there exist bounds on uncertainties obtained as values of optimization problems and that these bounds are optimal. It provides a uniform environment for the optimal solution of the problems of validation, certification, experimental design, reduced order modeling, prediction, extrapolation, all under aleatoric and epistemic uncertainties. OUQ optimization problems are extremely large, and even though under general conditions they have finite-dimensional reductions, they must often be solved numerically. This general algorithmic framework for OUQ has been implemented in the mystic optimization framework. We describe this implementation, and demonstrate its use in the context of the Caltech surrogate model for hypervelocity impact.

Ergodicity of Langevin Processes with Degenerate Diffusion in Momentums

Year: 2013 DOI: 10.48550/arXiv.0710.4259 This paper introduces a geometric method for proving ergodicity of degenerate noise driven stochastic processes. The driving noise is assumed to be an arbitrary Lévy process with non-degenerate diffusion component (but that may be applied to a single degree of freedom of the system). The geometric conditions are the approximate controllability of the process and the fact that there exists a point in the phase space where the interior of the image of a point via a secondarily randomized version of the driving noise is non void. The paper applies the method to prove ergodicity of a sliding disk governed by Langevin-type equations (a simple stochastic rigid body system). The paper shows that a key feature of this Langevin process is that even though the diffusion and drift matrices associated to the momentums are degenerate, the system is still at uniform temperature.

Conditioning Gaussian measure on Hilbert space

Year: 2015 DOI: 10.48550/arXiv.1506.04208 For a Gaussian measure on a separable Hilbert space with covariance operator C, we show that the family of conditional measures associated with conditioning on a closed subspace S^⊥ are Gaussian with covariance operator the short S(C) of the operator C to S. We provide two proofs. The first uses the theory of Gaussian Hilbert spaces and a characterization of the shorted operator by Andersen and Trapp. The second uses recent developments by Corach, Maestripieri and Stojanoff on the relationship between the shorted operator and C-symmetric oblique projections onto S^⊥. To obtain the assertion when such projections do not exist, we develop an approximation result for the shorted operator by showing, for any positive operator A, how to construct a sequence of approximating operators A^n which possess A^n-symmetric oblique projections onto S^⊥ such that the sequence of shorted operators S(A^n) converges to S(A) in the weak operator topology. This result, combined with the martingale convergence of random variables associated with the corresponding approximations C^n, establishes the main assertion in general. Moreover, it in turn strengthens the approximation theorem for the shorted operator when the operator is trace class; then the sequence of shorted operators S(A^n) converges to S(A) in trace norm.
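In finite dimensions the shorted operator reduces to a Schur complement, which gives a quick sanity check of the statement. The sketch below is only the two-dimensional analogue, not the paper's Hilbert-space construction: it assumes a centered Gaussian vector (X, Y) with covariance [[a, b], [b, c]], takes S to be the first coordinate axis, and conditions on the second coordinate. The function names are hypothetical.

```python
def shorted_variance(a: float, b: float, c: float) -> float:
    """Nonzero entry of the short of C = [[a, b], [b, c]] to the first axis.

    For c > 0 this is the Schur complement a - b^2 / c, i.e. the variance
    of X conditioned on Y.
    """
    if c == 0.0:
        # Positive semi-definiteness forces b == 0 when c == 0,
        # so observing Y carries no information about X.
        return a
    return a - b * b / c


def conditional_mean(b: float, c: float, y: float) -> float:
    """Mean of X given Y = y for a centered Gaussian pair (X, Y)."""
    return 0.0 if c == 0.0 else (b / c) * y


# Covariance [[2, 1], [1, 4]]: conditioning on Y leaves variance 2 - 1/4.
var_given_y = shorted_variance(2.0, 1.0, 4.0)
```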

Brittleness of Bayesian inference and new Selberg formulas

Year: 2015 DOI: 10.48550/arXiv.1304.7046 The incorporation of priors in the Optimal Uncertainty Quantification (OUQ) framework reveals brittleness in Bayesian inference; a model may share an arbitrarily large number of finite-dimensional marginals with, or be arbitrarily close (in Prokhorov or total variation metrics) to, the data-generating distribution and still make the largest possible prediction error after conditioning on an arbitrarily large number of samples. The initial purpose of this paper is to unwrap this brittleness mechanism by providing (i) a quantitative version of the Brittleness Theorem of and (ii) a detailed and comprehensive analysis of its application to the revealing example of estimating the mean of a random variable on the unit interval [0, 1] using priors that exactly capture the distribution of an arbitrarily large number of Hausdorff moments. However, in doing so, we discovered that the free parameter associated with Markov and Kreĭn's canonical representations of truncated Hausdorff moments generates reproducing kernel identities corresponding to reproducing kernel Hilbert spaces of polynomials. Furthermore, these reproducing identities lead to biorthogonal systems of Selberg integral formulas. This process of discovery appears to be generic: whereas Karlin and Shapley used Selberg's integral formula to first compute the volume of the Hausdorff moment space (the polytope defined by the first n moments of a probability measure on the interval [0, 1]), we observe that the computation of that volume along with higher order moments of the uniform measure on the moment space, using different finite-dimensional representations of subsets of the infinite-dimensional set of probability measures on [0, 1] representing the first n moments, leads to families of equalities corresponding to classical and new Selberg identities.

On testing the simulation theory

Year: 2017 DOI: 10.48550/arXiv.1703.00058 Can the theory that reality is a simulation be tested? We investigate this question based on the assumption that if the system performing the simulation is finite (i.e. has limited resources), then to achieve low computational complexity, such a system would, as in a video game, render content (reality) only at the moment that information becomes available for observation by a player and not at the moment of detection by a machine (that would be part of the simulation and whose detection would also be part of the internal computation performed by the Virtual Reality server before rendering content to the player). Guided by this principle we describe conceptual wave/particle duality experiments aimed at testing the simulation theory.

Universal Scalable Robust Solvers from Computational Information Games and fast eigenspace adapted Multiresolution Analysis

Year: 2017 DOI: 10.48550/arXiv.1703.10761 We show how the discovery of robust scalable numerical solvers for arbitrary bounded linear operators can be automated as a Game Theory problem by reformulating the process of computing with partial information and limited resources as that of playing underlying hierarchies of adversarial information games. When the solution space is a Banach space B endowed with a quadratic norm ∥⋅∥, the optimal measure (mixed strategy) for such games (e.g. the adversarial recovery of u ∈ B, given partial measurements [ϕ_i,u] with ϕ_i ∈ B^∗, using relative error in ∥⋅∥-norm as a loss) is a centered Gaussian field ξ solely determined by the norm ∥⋅∥, whose conditioning (on measurements) produces optimal bets. When measurements are hierarchical, the process of conditioning this Gaussian field produces a hierarchy of elementary bets (gamblets). These gamblets generalize the notion of Wavelets and Wannier functions in the sense that they are adapted to the norm ∥⋅∥ and induce a multi-resolution decomposition of B that is adapted to the eigensubspaces of the operator defining the norm ∥⋅∥. When the operator is localized, we show that the resulting gamblets are localized both in space and frequency and introduce the Fast Gamblet Transform (FGT) with rigorous accuracy and (near-linear) complexity estimates. As the FFT can be used to solve and diagonalize arbitrary PDEs with constant coefficients, the FGT can be used to decompose a wide range of continuous linear operators (including arbitrary continuous linear bijections from H^s_0 to H^(−s) or to L^2) into a sequence of independent linear systems with uniformly bounded condition numbers and leads to O(N polylog N) solvers and eigenspace adapted Multiresolution Analysis (resulting in near linear complexity approximation of all eigensubspaces).

Kernel Mode Decomposition and programmable/interpretable regression networks

Year: 2019 DOI: 10.48550/arXiv.1907.08592 Mode decomposition is a prototypical pattern recognition problem that can be addressed from the (a priori distinct) perspectives of numerical approximation, statistical inference and deep learning. Could its analysis through these combined perspectives be used as a Rosetta stone for deciphering mechanisms at play in deep learning? Motivated by this question we introduce programmable and interpretable regression networks for pattern recognition and address mode decomposition as a prototypical problem. The programming of these networks is achieved by assembling elementary modules decomposing and recomposing kernels and data. These elementary steps are repeated across levels of abstraction and interpreted from the equivalent perspectives of optimal recovery, game theory and Gaussian process regression (GPR). The prototypical mode/kernel decomposition module produces an optimal approximation (w₁,w₂,⋯,w_m) of an element (v₁,v₂,…,v_m) of a product of Hilbert subspaces of a common Hilbert space from the observation of the sum v:=v₁+⋯+v_m. The prototypical mode/kernel recomposition module performs partial sums of the recovered modes w_i based on the alignment between each recovered mode w_i and the data v. We illustrate the proposed framework by programming regression networks approximating the modes v_i=a_i(t)y_i(θ_i(t)) of a (possibly noisy) signal ∑_iv_i when the amplitudes a_i, instantaneous phases θ_i and periodic waveforms y_i may all be unknown and show near machine precision recovery under regularity and separation assumptions on the instantaneous amplitudes a_i and frequencies θ_i. The structure of some of these networks share intriguing similarities with convolutional neural networks while being interpretable, programmable and amenable to theoretical analysis.

Competitive Mirror Descent

Year: 2020 DOI: 10.48550/arXiv.2006.10179 Constrained competitive optimization involves multiple agents trying to minimize conflicting objectives, subject to constraints. This is a highly expressive modeling language that subsumes most of modern machine learning. In this work we propose competitive mirror descent (CMD): a general method for solving such problems based on first order information that can be obtained by automatic differentiation. First, by adding Lagrange multipliers, we obtain a simplified constraint set with an associated Bregman potential. At each iteration, we then solve for the Nash equilibrium of a regularized bilinear approximation of the full problem to obtain a direction of movement of the agents. Finally, we obtain the next iterate by following this direction according to the dual geometry induced by the Bregman potential. By using the dual geometry we obtain feasible iterates despite only solving a linear system at each iteration, eliminating the need for projection steps while still accounting for the global nonlinear structure of the constraint set. As a special case we obtain a novel competitive multiplicative weights algorithm for problems on the positive cone.

Learning dynamical systems from data: a simple cross-validation perspective

Year: 2020 DOI: 10.48550/arXiv.2007.05074 Regressing the vector field of a dynamical system from a finite number of observed states is a natural way to learn surrogate models for such systems. We present variants of cross-validation (Kernel Flows [31] and its variants based on Maximum Mean Discrepancy and Lyapunov exponents) as simple approaches for learning the kernel used in these emulators.

Do ideas have shape? Plato's theory of forms as the continuous limit of artificial neural networks

Year: 2020 DOI: 10.48550/arXiv.2008.03920 We show that ResNets converge, in the infinite depth limit, to a generalization of image registration algorithms. In this generalization, images are replaced by abstractions (ideas) living in high dimensional RKHS spaces, and material points are replaced by data points. Whereas computational anatomy aligns images via deformations of the material space, this generalization aligns ideas via transformations of their RKHS. This identification of ResNets as idea registration algorithms has several remarkable consequences. The search for good architectures can be reduced to that of good kernels, and we show that the composition of idea registration blocks with reduced equivariant multi-channel kernels (introduced here) recovers and generalizes CNNs to arbitrary spaces and groups of transformations. Minimizers of L2 regularized ResNets satisfy a discrete least action principle implying the near preservation of the norm of weights and biases across layers. The parameters of trained ResNets can be identified as solutions of an autonomous Hamiltonian system defined by the activation function and the architecture of the ANN. Momenta variables provide a sparse representation of the parameters of a ResNet. The registration regularization strategy provides a provably robust alternative to Dropout for ANNs. Pointwise RKHS error estimates lead to deterministic error estimates for ANNs.

Data-driven geophysical forecasting: Simple, low-cost, and accurate baselines with kernel methods

Year: 2021 DOI: 10.48550/arXiv.2103.10935 Modeling geophysical processes as low-dimensional dynamical systems and regressing their vector field from data is a promising approach for learning emulators of such systems. We show that when the kernel of these emulators is also learned from data (using kernel flows, a variant of cross-validation), then the resulting data-driven models are not only faster than equation-based models but are easier to train than neural networks such as the long short-term memory neural network. In addition, they are also more accurate and predictive than the latter. When trained on geophysical observational data, for example, the weekly averaged global sea-surface temperature, considerable gains are also observed by the proposed technique in comparison to classical partial differential equation-based models in terms of forecast computational cost and accuracy. When trained on publicly available re-analysis data for the daily temperature of the North-American continent, we see significant improvements over classical baselines such as climatology and persistence-based forecast techniques. Although our experiments concern specific examples, the proposed approach is general, and our results support the viability of kernel methods (with learned kernels) for interpretable and computationally efficient geophysical forecasting for a large diversity of processes.

Decision Theoretic Bootstrapping

Year: 2021 DOI: 10.48550/arXiv.2103.09982 The design and testing of supervised machine learning models combine two fundamental distributions: (1) the training data distribution and (2) the testing data distribution. Although these two distributions are identical and identifiable when the data set is infinite, they are imperfectly known (and possibly distinct) when the data is finite (and possibly corrupted), and this uncertainty must be taken into account for robust Uncertainty Quantification (UQ). We present a general decision-theoretic bootstrapping solution to this problem: (1) partition the available data into a training subset and a UQ subset (2) take m subsampled subsets of the training set and train m models (3) partition the UQ set into n sorted subsets and take a random fraction of them to define n corresponding empirical distributions μ_j (4) consider the adversarial game where Player I selects a model i∈{1,…,m}, Player II selects the UQ distribution μ_j and Player I receives a loss defined by evaluating the model i against data points sampled from μ_j (5) identify optimal mixed strategies (probability distributions over models and UQ distributions) for both players. These randomized optimal mixed strategies provide optimal model mixtures and UQ estimates given the adversarial uncertainty of the training and testing distributions represented by the game. The proposed approach provides (1) some degree of robustness to distributional shift in both the distribution of training data and that of the testing data (2) conditional probability distributions on the output space forming aleatory representations of the uncertainty on the output as a function of the input variable.
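Step (5), identifying optimal mixed strategies, amounts to solving a zero-sum matrix game once the losses of the m models against the n UQ distributions are tabulated. The sketch below approximates the solution with multiplicative-weights (Hedge) self-play and averaged strategies. This solver is swapped in here for illustration and is not claimed to be the one used in the paper; the 2×2 loss table `LOSS` is made-up data, and losses are assumed to lie in [0, 1].

```python
import math


def _softmax(scores):
    """Normalized exponential weights, shifted for numerical stability."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def solve_zero_sum_game(loss, rounds=20000):
    """Approximate optimal mixed strategies by Hedge self-play.

    loss[i][j] in [0, 1] is the loss of Player I (model i) when Player II
    picks distribution j. Returns the time-averaged strategies (p, q).
    """
    m, n = len(loss), len(loss[0])
    eta_row = math.sqrt(8.0 * math.log(m) / rounds)
    eta_col = math.sqrt(8.0 * math.log(n) / rounds)
    cum_row = [0.0] * m  # cumulative expected loss of each model
    cum_col = [0.0] * n  # cumulative expected gain of each distribution
    avg_p, avg_q = [0.0] * m, [0.0] * n
    for _ in range(rounds):
        p = _softmax([-eta_row * c for c in cum_row])  # minimizer
        q = _softmax([eta_col * c for c in cum_col])   # maximizer
        for i in range(m):
            cum_row[i] += sum(loss[i][j] * q[j] for j in range(n))
            avg_p[i] += p[i] / rounds
        for j in range(n):
            cum_col[j] += sum(loss[i][j] * p[i] for i in range(m))
            avg_q[j] += q[j] / rounds
    return avg_p, avg_q


def worst_case_loss(loss, p):
    """Loss of the model mixture p against the least favorable distribution."""
    n = len(loss[0])
    return max(sum(p[i] * loss[i][j] for i in range(len(p))) for j in range(n))


# Hypothetical loss table: 2 models (rows) against 2 UQ distributions (columns).
LOSS = [[0.1, 0.9],
        [0.8, 0.2]]
p_star, q_star = solve_zero_sum_game(LOSS)
```

For this table the exact equilibrium mixes the two models with probabilities 3/7 and 4/7 and has game value 0.5, so `worst_case_loss(LOSS, p_star)` should be close to 0.5. For larger games a linear-programming solver would be the more standard choice.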

Uncertainty Quantification of the 4th kind; optimal posterior accuracy-uncertainty tradeoff with the minimum enclosing ball

Year: 2021 DOI: 10.48550/arXiv.2108.10517 There are essentially three kinds of approaches to Uncertainty Quantification (UQ): (A) robust optimization, (B) Bayesian, (C) decision theory. Although (A) is robust, it is unfavorable with respect to accuracy and data assimilation. (B) requires a prior, is generally brittle, and its posterior estimations can be slow. Although (C) leads to the identification of an optimal prior, its approximation suffers from the curse of dimensionality and the notion of risk is one that is averaged with respect to the distribution of the data. We introduce a 4th kind which is a hybrid between (A), (B), (C), and hypothesis testing. It can be summarized as, after observing a sample x, (1) defining a likelihood region through the relative likelihood and (2) playing a minmax game in that region to define optimal estimators and their risk. The resulting method has several desirable properties: (a) an optimal prior is identified after measuring the data, and the notion of risk is a posterior one, (b) the determination of the optimal estimate and its risk can be reduced to computing the minimum enclosing ball of the image of the likelihood region under the quantity of interest map (which is fast and not subject to the curse of dimensionality). The method is characterized by a parameter in [0,1] acting as an assumed lower bound on the rarity of the observed data (the relative likelihood). When that parameter is near 1, the method produces a posterior distribution concentrated around a maximum likelihood estimate with tight but low confidence UQ estimates. When that parameter is near 0, the method produces a maximal risk posterior distribution with high confidence UQ estimates. In addition to navigating the accuracy-uncertainty tradeoff, the proposed method addresses the brittleness of Bayesian inference by navigating the robustness-accuracy tradeoff associated with data assimilation.
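The two steps can be made concrete in a one-dimensional toy case. Every modeling choice below is an assumption for illustration and not taken from the paper: a single observation x from N(θ, σ²), the relative likelihood taken as p(x|θ) / sup_θ' p(x|θ') = exp(−(x−θ)²/(2σ²)), a scalar quantity of interest, and a sampled approximation of the likelihood region. In one dimension the minimum enclosing ball of a set is simply the interval midpoint and half-width. The function names are hypothetical.

```python
import math


def likelihood_region(x, sigma, alpha):
    """Interval of theta whose relative likelihood is at least alpha (0 < alpha <= 1)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    half_width = sigma * math.sqrt(2.0 * math.log(1.0 / alpha))
    return x - half_width, x + half_width


def min_enclosing_ball_1d(values):
    """Center and radius of the smallest interval containing all values."""
    lo, hi = min(values), max(values)
    return 0.5 * (lo + hi), 0.5 * (hi - lo)


def estimate(x, sigma, alpha, qoi=lambda theta: theta, samples=1001):
    """Estimate (ball center) and uncertainty radius of the quantity of
    interest, computed over a sampled likelihood region."""
    lo, hi = likelihood_region(x, sigma, alpha)
    thetas = [lo + (hi - lo) * k / (samples - 1) for k in range(samples)]
    return min_enclosing_ball_1d([qoi(t) for t in thetas])


# alpha = exp(-1/2) keeps every theta within one sigma of the observation.
center, radius = estimate(1.0, 2.0, math.exp(-0.5))
```

Setting `alpha = 1.0` collapses the region to the maximum likelihood estimate (radius 0), while letting `alpha` approach 0 widens it, which mirrors the accuracy-uncertainty tradeoff described in the abstract.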

Aggregation of Models, Choices, Beliefs, and Preferences

Year: 2021 DOI: 10.48550/arXiv.2111.11630 A natural notion of rationality/consistency for aggregating models is that, for all (possibly aggregated) models A and B, if the output of model A is f(A) and if the output of model B is f(B), then the output of the model obtained by aggregating A and B must be a weighted average of f(A) and f(B). Similarly, a natural notion of rationality for aggregating preferences of ensembles of experts is that, for all (possibly aggregated) experts A and B, and all possible choices x and y, if both A and B prefer x over y, then the expert obtained by aggregating A and B must also prefer x over y. Rational aggregation is an important element of uncertainty quantification, and it lies behind many seemingly different results in economic theory, spanning social choice, belief formation, and individual decision making. Three examples of rational aggregation rules are as follows. (1) Give each individual model (expert) a weight (a score) and use weighted averaging to aggregate individual or finite ensembles of models (experts). (2) Order/rank the individual models (experts) and let the aggregation of a finite ensemble of individual models (experts) be the highest-ranked individual model (expert) in that ensemble. (3) Give each individual model (expert) a weight, introduce a weak order/ranking over the set of models/experts, and aggregate A and B as the weighted average of the highest-ranked models (experts) in A or B. Note that (1) and (2) are particular cases of (3). In this paper, we show that all rational aggregation rules are of the form (3). This result unifies aggregation procedures across different economic environments. Following the main representation, we show applications and extensions of our representation in various separate topics in economics, such as belief formation, choice theory, and social welfare economics.
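
Rule (3) can be sketched in a few lines. The sketch below is illustrative only: the `Model` class, its field names, and the convention that a larger `rank` means "more preferred" are assumptions, and outputs are taken to be real numbers for simplicity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    output: float   # f(model)
    weight: float   # positive weight (score)
    rank: int       # weak order: larger is preferred, ties allowed

def aggregate(ensemble):
    """Weighted average of the outputs of the highest-ranked models in the ensemble."""
    top = max(m.rank for m in ensemble)
    best = [m for m in ensemble if m.rank == top]
    total = sum(m.weight for m in best)
    return sum(m.weight * m.output for m in best) / total

a = Model("a", output=1.0, weight=1.0, rank=1)
b = Model("b", output=3.0, weight=3.0, rank=1)
c = Model("c", output=10.0, weight=5.0, rank=0)

top_only = aggregate([a, b])        # a and b are tied, so this is a weighted average
with_lower = aggregate([a, b, c])   # c is ranked lower and does not contribute
```

Giving every model the same rank recovers rule (1), and giving every model a distinct rank recovers rule (2). In both cases the aggregate of a union stays a weighted average of the aggregates of its parts, possibly with weight zero on one side.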

Learning dynamical systems from data: A simple cross-validation perspective, part III: Irregularly-Sampled Time Series

Year: 2021 DOI: 10.48550/arXiv.2111.13037 A simple and interpretable way to learn a dynamical system from data is to interpolate its vector field with a kernel. In particular, this strategy is highly efficient (both in terms of accuracy and complexity) when the kernel is data-adapted using Kernel Flows (KF) [34] (which uses gradient-based optimization to learn a kernel based on the premise that a kernel is good if there is no significant loss in accuracy if half of the data is used for interpolation). Despite its previous successes, this strategy (based on interpolating the vector field driving the dynamical system) breaks down when the observed time series is not regularly sampled in time. In this work, we propose to address this problem by directly approximating the vector field of the dynamical system, incorporating the time differences between observations into the KF data-adapted kernels. We compare our approach with the classical one over different benchmark dynamical systems and show that it significantly improves the forecasting accuracy while remaining simple, fast, and robust.

Aggregation of Pareto optimal models

Year: 2021 DOI: 10.48550/arXiv.2112.04161 In statistical decision theory, a model is said to be Pareto optimal (or admissible) if no other model carries less risk for at least one state of nature while presenting no more risk for others. How can you rationally aggregate/combine a finite set of Pareto optimal models while preserving Pareto efficiency? This question is nontrivial because weighted model averaging does not, in general, preserve Pareto efficiency. This paper presents an answer in four logical steps: (1) A rational aggregation rule should preserve Pareto efficiency. (2) Due to the complete class theorem, Pareto optimal models must be Bayesian, i.e., they minimize a risk where the true state of nature is averaged with respect to some prior. Therefore each Pareto optimal model can be associated with a prior, and Pareto efficiency can be maintained by aggregating Pareto optimal models through their priors. (3) A prior can be interpreted as a preference ranking over models: prior π prefers model A over model B if the average risk of A is lower than the average risk of B. (4) A rational/consistent aggregation rule should preserve this preference ranking: if both priors π and π′ prefer model A over model B, then the prior obtained by aggregating π and π′ must also prefer A over B. Under these four steps, we show that all rational/consistent aggregation rules are as follows: give each individual Pareto optimal model a weight, introduce a weak order/ranking over the set of Pareto optimal models, and aggregate a finite set of models S as the model associated with the prior obtained as the weighted average of the priors of the highest-ranked models in S. This result shows that all rational/consistent aggregation rules must follow a generalization of hierarchical Bayesian modeling. Following our main result, we present applications to kernel smoothing, time-depreciating models, and voting mechanisms.
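
Steps (3) and (4) can be checked numerically on a toy problem. The sketch below assumes a finite set of states of nature, so that a prior is a probability vector and the average risk of a model is a dot product. The risk table, priors, weights, and ranks are made-up illustrative numbers, and the function names are hypothetical.

```python
import numpy as np

def aggregate_priors(priors, weights, ranks):
    """Weighted average of the priors attached to the highest-ranked models."""
    priors = np.asarray(priors, dtype=float)
    weights = np.asarray(weights, dtype=float)
    ranks = np.asarray(ranks)
    top = ranks == ranks.max()              # weak order: ties are kept together
    w = weights[top] / weights[top].sum()   # normalize weights within the top class
    return w @ priors[top]

def average_risk(risk, prior):
    """Risk of each model (row) averaged over states of nature (columns)."""
    return np.asarray(risk, dtype=float) @ prior

# Rows: models A and B. Columns: two states of nature. Neither model dominates the other.
risk = [[1.0, 4.0],
        [2.0, 3.5]]

# Priors attached to three Pareto optimal models; the third is ranked lower.
priors = [[0.9, 0.1], [0.6, 0.4], [0.0, 1.0]]
pi = aggregate_priors(priors, weights=[1.0, 3.0, 5.0], ranks=[1, 1, 0])

risk_A, risk_B = average_risk(risk, pi)
```

Both top-ranked priors prefer A over B, and because average risk is linear in the prior, their weighted average does too. The lower-ranked prior, which prefers B, does not enter the aggregate.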

Sparse Cholesky Factorization for Solving Nonlinear PDEs via Gaussian Processes

Year: 2024 DOI: 10.48550/arXiv.2304.01294

In recent years, there has been widespread adoption of machine learning-based approaches to automate the solving of partial differential equations (PDEs). Among these approaches, Gaussian processes (GPs) and kernel methods have garnered considerable interest due to their flexibility, robust theoretical guarantees, and close ties to traditional methods. They can transform the solving of general nonlinear PDEs into solving quadratic optimization problems with nonlinear, PDE-induced constraints. However, the complexity bottleneck lies in computing with dense kernel matrices obtained from pointwise evaluations of the covariance kernel, and its partial derivatives, a result of the PDE constraint and for which fast algorithms are scarce.

The primary goal of this paper is to provide a near-linear complexity algorithm for working with such kernel matrices. We present a sparse Cholesky factorization algorithm for these matrices based on the near-sparsity of the Cholesky factor under a novel ordering of pointwise and derivative measurements. The near-sparsity is rigorously justified by directly connecting the factor to GP regression and exponential decay of basis functions in numerical homogenization. We then employ the Vecchia approximation of GPs, which is optimal in the Kullback-Leibler divergence, to compute the approximate factor. This enables us to compute ϵ-approximate inverse Cholesky factors of the kernel matrices with complexity O(N log^d(N/ϵ)) in space and O(N log^{2d}(N/ϵ)) in time. We integrate sparse Cholesky factorizations into optimization algorithms to obtain fast solvers of the nonlinear PDE. We numerically illustrate our algorithm's near-linear space/time complexity for a broad class of nonlinear PDEs such as the nonlinear elliptic, Burgers, and Monge-Ampère equations.
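
The idea of a KL-optimal sparse inverse Cholesky factor can be illustrated on a small dense example. The sketch below is not the paper's algorithm: it uses only pointwise measurements in one dimension, a simple left-to-right ordering, and a banded sparsity pattern, all of which are simplifying assumptions, and it forms dense sub-blocks with NumPy rather than aiming for near-linear cost. It relies on the standard column-wise formula for the factor L that minimizes the Kullback-Leibler divergence between N(0, Θ) and N(0, (L Lᵀ)⁻¹) over a prescribed lower-triangular pattern: each column is obtained by solving with the sub-block of Θ restricted to that column's pattern. The names `kl_optimal_factor` and `banded_pattern` are hypothetical.

```python
import numpy as np

def kl_optimal_factor(theta, pattern):
    """Sparse lower-triangular L with L @ L.T approximating inv(theta).

    pattern[i] lists the row indices allowed in column i, starting with i
    itself and containing only indices >= i.
    """
    n = theta.shape[0]
    L = np.zeros((n, n))
    for i, s in enumerate(pattern):
        s = np.asarray(s)
        e1 = np.zeros(len(s))
        e1[0] = 1.0
        col = np.linalg.solve(theta[np.ix_(s, s)], e1)  # sub-block inverse times e1
        L[s, i] = col / np.sqrt(col[0])                 # normalize by sqrt(e1' inv e1)
    return L

def banded_pattern(n, bandwidth):
    """Column i may touch rows i, i+1, ..., i+bandwidth (clipped to n-1)."""
    return [list(range(i, min(i + bandwidth + 1, n))) for i in range(n)]

# Exponential kernel on irregularly spaced, sorted 1D points.
x = np.linspace(0.0, 1.0, 12) ** 2
theta = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.5)

L = kl_optimal_factor(theta, banded_pattern(len(x), 1))
error = np.abs(L @ L.T - np.linalg.inv(theta)).max()
```

For this particular kernel the process is Markov in one dimension, so conditioning each point on its single neighbor loses nothing and the bandwidth-1 factor reproduces the inverse up to rounding. For smoother kernels or higher dimensions the factor is only approximately sparse, which is where the ordering and sparsity pattern matter.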