(PHD, 2023)

Abstract:

This thesis focuses on numerical methods for scientific computing and scientific machine learning, specifically on solving partial differential equations and inverse problems. The design of numerical algorithms usually encompasses a spectrum that ranges from specialization to generality. Classical approaches, such as finite element methods, and contemporary scientific machine learning approaches, like neural nets, can be viewed as lying at relatively opposite ends of this spectrum. Throughout this thesis, we tackle mathematical challenges associated with both ends by advancing rigorous multiscale and statistical numerical methods.

Regarding the multiscale numerical methods, we present an exponentially convergent multiscale finite element method for solving the high-frequency Helmholtz equation with rough coefficients. To achieve this, we first identify the local low-complexity structure of the Helmholtz equation when the resolution is smaller than the wavelength. Then, we construct local basis functions by solving local spectral problems and couple them globally through non-overlapping domain decomposition and Galerkin’s method. This results in a numerical method that achieves a nearly exponential rate of convergence with respect to the number of local basis functions, even when the solution is highly non-smooth. We also analyze the role of a subsampled lengthscale in variational multiscale methods, characterizing the tradeoff between accuracy and efficiency in the numerical upscaling of heterogeneous PDEs and scattered data approximation.

As for the statistical numerical methods, we discuss using Gaussian processes and kernel methods to solve nonlinear PDEs and inverse problems. This framework incorporates the flavor of scientific machine learning automation and extends classical meshless solvers. It transforms general PDE problems into quadratic optimization with nonlinear constraints. We present the theoretical underpinning of the methodology. For the scalability of the method, we develop state-of-the-art algorithms to handle dense kernel matrices in both low- and high-dimensional scientific problems. For adaptivity, we analyze the convergence and consistency of hierarchical learning algorithms that adaptively select kernel functions. Additionally, we note that statistical numerical methods offer natural uncertainty quantification within the Bayesian framework. In this regard, our further work contributes new understanding of efficient statistical sampling techniques based on gradient flows.


(PHD, 2021)

Abstract:

In this thesis, we use statistical inference and competitive games to design algorithms for computational mathematics.

In the first part, comprising chapters two through six, we use ideas from Gaussian process statistics to obtain fast solvers for differential and integral equations. We begin by observing the equivalence of conditional (near-)independence of Gaussian processes and the (near-)sparsity of the Cholesky factors of their precision and covariance matrices. This implies the existence of a large class of *dense* matrices with almost *sparse* Cholesky factors, thereby greatly increasing the scope of application of sparse Cholesky factorization. Using an elimination ordering and sparsity pattern motivated by the *screening effect* in spatial statistics, we can compute approximate Cholesky factors of the covariance matrices of Gaussian processes admitting a screening effect in near-linear computational complexity. These include many popular smoothness priors such as the Matérn class of covariance functions.
In the special case of Green’s matrices of elliptic boundary value problems (with possibly unknown elliptic operators of arbitrarily high order, with possibly rough coefficients), we can use tools from numerical homogenization to prove the exponential accuracy of our method. This result improves the state-of-the-art for solving general elliptic integral equations and provides the first proof of an exponential screening effect. We also derive a fast solver for elliptic partial differential equations, with accuracy-vs-complexity guarantees that improve upon the state-of-the-art. Furthermore, the resulting solver is performant in practice, frequently beating established algebraic multigrid libraries such as AMGCL and Trilinos on a series of challenging problems in two and three dimensions.
Finally, for any given covariance matrix, we obtain a closed-form expression for its *optimal* (in terms of Kullback-Leibler divergence) approximate inverse-Cholesky factorization subject to a sparsity constraint, recovering Vecchia approximation and factorized sparse approximate inverses. Our method is highly robust, embarrassingly parallel, and further improves our asymptotic results on the solution of elliptic integral equations. We also provide a way to apply our techniques to sums of independent Gaussian processes, resolving a major limitation of existing methods based on the screening effect. As a result, we obtain fast algorithms for large-scale Gaussian process regression problems with possibly noisy measurements.
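
The closed-form, column-by-column construction mentioned above can be illustrated with a minimal sketch. The column formula used here is the one commonly associated with Vecchia-type approximations and is stated as an assumption, not quoted from the thesis; the function name `kl_optimal_inverse_cholesky`, the exponential kernel, the one-dimensional grid, and the two-entry sparsity pattern are all illustrative choices. Because the exponential kernel on sorted one-dimensional points defines a Markov process, the two-entry pattern should already reproduce the precision matrix up to round-off, which is an extreme case of the screening effect.

```python
import numpy as np

def kl_optimal_inverse_cholesky(theta, sparsity):
    """Sparse lower-triangular L with L @ L.T approximating inv(theta).

    sparsity[i] lists the row indices allowed to be nonzero in column i;
    it must start with i and contain only indices >= i.
    (Assumed formula: column i is inv(theta[s, s]) @ e_1, rescaled.)
    """
    n = theta.shape[0]
    L = np.zeros((n, n))
    for i, s in enumerate(sparsity):
        s = np.asarray(s)
        e1 = np.zeros(len(s))
        e1[0] = 1.0
        # Each column needs only a small dense solve, independent of the others.
        col = np.linalg.solve(theta[np.ix_(s, s)], e1)
        L[s, i] = col / np.sqrt(col[0])
    return L

# Illustrative covariance: exponential kernel on a sorted 1D grid.
x = np.linspace(0.0, 1.0, 30)
theta = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)

# Each column keeps only its diagonal entry and the next point.
banded = [[i, i + 1] if i + 1 < len(x) else [i] for i in range(len(x))]
L = kl_optimal_inverse_cholesky(theta, banded)
error = np.max(np.abs(L @ L.T - np.linalg.inv(theta)))
print("nonzeros:", np.count_nonzero(L), "max error:", error)
```

Every column is computed from its own small submatrix, which is why a construction of this kind parallelizes trivially. For kernels without an exact Markov property, a larger pattern would be needed and the result would be an approximation.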

In the second part of this thesis, comprising chapters seven through nine, we study continuous optimization through the lens of competitive games. In particular, we consider *competitive optimization*, where multiple agents attempt to minimize conflicting objectives. In the single-agent case, the updates of gradient descent are minimizers of quadratically regularized linearizations of the loss function. We propose to generalize this idea by using the Nash equilibria of quadratically regularized linearizations of the competitive game as updates (*linearize the game*). We provide fundamental reasons why the natural notion of linearization for competitive optimization problems is given by the *multilinear* (as opposed to linear) approximation of the agents’ loss functions. The resulting algorithm, which we call *competitive gradient descent* (CGD), thus provides a natural generalization of gradient descent to competitive optimization. By using ideas from information geometry, we extend CGD to competitive mirror descent (CMD), which can be applied to a vast range of constrained competitive optimization problems. CGD and CMD resolve the cycling problem of simultaneous gradient descent and show promising results on problems arising in constrained optimization, robust control theory, and generative adversarial networks. Finally, we point out the *GAN-dilemma* that refutes the common interpretation of GANs as approximate minimizers of a divergence obtained in the limit of a fully trained discriminator. Instead, we argue that GAN performance relies on the *implicit competitive regularization* (ICR) due to the simultaneous optimization of generator and discriminator and support this hypothesis with results on low-dimensional model problems and GANs on CIFAR10.
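
To make the cycling problem concrete, here is a minimal sketch on the scalar bilinear game f(x, y) = x·y, where x minimizes f and y minimizes −f. The test problem, the step size, and the function names are assumptions chosen for illustration. The competitive update is written as the Nash equilibrium of the regularized bilinear local game, specialized by hand to this zero-sum scalar case, so it should be read as a sketch of the idea and not as the thesis's implementation.

```python
import math

def simultaneous_gd_step(x, y, eta):
    # Each player follows its own gradient and ignores the opponent's move.
    # For f = x * y: df/dx = y, and player y descends on -f.
    return x - eta * y, y + eta * x

def competitive_gd_step(x, y, eta):
    # Nash equilibrium of the regularized bilinear approximation of the game
    # (assumed form for the zero-sum scalar case).
    grad_x, grad_y = y, x      # partial derivatives of f = x * y
    d_xy = 1.0                 # mixed second derivative of f
    denom = 1.0 + eta**2 * d_xy**2
    dx = -eta * (grad_x + eta * d_xy * grad_y) / denom
    dy = eta * (grad_y - eta * d_xy * grad_x) / denom
    return x + dx, y + dy

def distance_after(step, x0=1.0, y0=1.0, eta=0.2, n_steps=100):
    """Distance to the equilibrium (0, 0) after n_steps updates."""
    x, y = x0, y0
    for _ in range(n_steps):
        x, y = step(x, y, eta)
    return math.hypot(x, y)

print("simultaneous GD:", distance_after(simultaneous_gd_step))
print("competitive GD: ", distance_after(competitive_gd_step))
```

On this problem, each simultaneous step multiplies the squared distance to the equilibrium by 1 + η², so the iterates spiral outward, while each competitive step divides it by the same factor, so they spiral inward.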


(PHD, 2021)

Abstract:

A major technique in learning involves the identification of patterns and their use to make predictions. In this work, we examine the symbiotic relationship between patterns and Gaussian process regression (GPR), which is mathematically equivalent to kernel interpolation. We introduce techniques where GPR can be used to learn patterns in denoising and mode (signal) decomposition. Additionally, we present the kernel flow (KF) algorithm, which learns kernels from patterns in the data with a methodology inspired by cross-validation. We further show how the KF algorithm can be applied to artificial neural networks (ANNs) to improve the learning of patterns in images.

In our denoising and mode decomposition examples, we show how kernels can be constructed to estimate patterns that may be hidden due to data corruption. In other words, we demonstrate how to learn patterns with kernels. Donoho and Johnstone proposed a near-minimax method for reconstructing an unknown smooth function *u* from noisy data *u* + ζ by translating the empirical wavelet coefficients of *u* + ζ towards zero. We consider the situation where the prior information on the unknown function *u* may not be the regularity of *u*, but that of ℒ*u* where ℒ is a linear operator, such as a partial differential equation (PDE) or a graph Laplacian. We show that a near-minimax approximation of *u* can be obtained by truncating the ℒ-gamblet (operator-adapted wavelet) coefficients of *u* + ζ. The recovery of *u* can be seen to be precisely a Gaussian conditioning of *u* + ζ on measurement functions with length scale dependent on the signal-to-noise ratio.

We next introduce kernel mode decomposition (KMD), which has been designed to learn the modes *v*_{i} =

GPR and kernel interpolation require the selection of an appropriate kernel modeling the data. We present the KF algorithm, which is a numerical-approximation approach to this selection. The main principle the method utilizes is that a “good” kernel is able to make accurate predictions with small subsets of a training set. In this way, we learn a kernel from patterns. In image classification, we show that the learned kernels are able to classify accurately using only one training image per class and show signs of unsupervised learning. Furthermore, we introduce the combination of the KF algorithm with conventional neural-network training. This combination is able to train the intermediate-layer outputs of the network simultaneously with the final-layer output. We test the proposed method on Convolutional Neural Networks (CNNs) and Wide Residual Networks (WRNs) without alteration of their structure or their output classifier. We report reduced test errors, decreased generalization gaps, and increased robustness to distribution shift without a significant increase in computational complexity relative to standard CNN and WRN training (with Dropout and Batch Normalization).
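
The principle that a good kernel predicts well from a small subset can be turned into a number. The sketch below uses one such quantity, assumed here for illustration and not quoted from the thesis: ρ = 1 − ‖u_half‖² / ‖u_full‖², the relative loss in squared RKHS norm when the kernel interpolant is built from a random half of the data instead of all of it. The Gaussian kernel, the nugget term, the toy data, and the function names are likewise hypothetical choices.

```python
import numpy as np

def gaussian_kernel(a, b, lengthscale):
    d = a[:, None] - b[None, :]
    return np.exp(-d**2 / (2.0 * lengthscale**2))

def interpolant_norm_sq(x, y, lengthscale, nugget=1e-6):
    # Squared RKHS norm of the kernel interpolant of (x, y): y^T K^{-1} y.
    # The nugget is a small regularizer added for numerical stability.
    K = gaussian_kernel(x, x, lengthscale) + nugget * np.eye(len(x))
    return float(y @ np.linalg.solve(K, y))

def subsample_loss(x, y, lengthscale, rng):
    # rho lies in [0, 1]; a small value means that a random half of the
    # data already determines most of the full interpolant.
    half = rng.choice(len(x), size=len(x) // 2, replace=False)
    norm_half = interpolant_norm_sq(x[half], y[half], lengthscale)
    norm_full = interpolant_norm_sq(x, y, lengthscale)
    return 1.0 - norm_half / norm_full

data_rng = np.random.default_rng(0)
x = np.sort(data_rng.uniform(0.0, 1.0, 40))
y = np.sin(2.0 * np.pi * x)

for lengthscale in (0.01, 0.2):
    rho = subsample_loss(x, y, lengthscale, np.random.default_rng(1))
    print(f"lengthscale={lengthscale}: rho={rho:.3f}")
```

A kernel-learning loop in this spirit would repeatedly draw random batches and adjust the kernel parameters to decrease ρ; only the evaluation of the loss is shown here.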

As a whole, this work highlights the interplay of kernel techniques with pattern recognition and numerical approximation.


(PHD, 2011)

Abstract:

In order to accelerate computations and improve the long-time accuracy of numerical simulations, this thesis develops multiscale geometric integrators.

For general multiscale stiff ODEs, SDEs, and PDEs, FLow AVeraging integratORs (FLAVORs) have been proposed for coarse time-stepping without any identification of the slow or the fast variables. In the special case of deterministic and stochastic mechanical systems, symplectic, multisymplectic, and quasi-symplectic multiscale integrators are easily obtained using this strategy.

For highly oscillatory mechanical systems (with quasi-quadratic stiff potentials and possibly high-dimensional), a specialized symplectic method has been devised to provide improved efficiency and accuracy. This method is based on the introduction of two highly nontrivial matrix exponentiation algorithms, which are generic, efficient, and symplectic (if the exact exponential is symplectic).

For multiscale systems with Dirac-distributed fast processes, a family of symplectic, linearly-implicit and stable integrators has been designed for coarse step simulations. An application is the fast and accurate integration of constrained dynamics.

In addition, if one cares about statistical properties of an ensemble of trajectories, but not the numerical accuracy of a single trajectory, we suggest tuning friction and annealing temperature in a Langevin process to accelerate its convergence.

Other works include variational integration of circuits, efficient simulation of a nonlinear wave, and finding optimal transition pathways in stochastic dynamical systems (with a demonstration of mass effects in molecular dynamics).


(PHD, 2008)

Abstract: We show how to parameterise a homogenised conductivity in R² by a scalar function s(x), despite the fact that the conductivity parameter in the related up-scaled elliptic operator is typically tensor-valued. Ellipticity of the operator is equivalent to strict convexity of s(x), and with consideration to mesh connectivity, this equivalence extends to discrete parameterisations over triangulated domains. We apply the parameterisation in three contexts: (i) sampling s(x) produces a family of stiffness matrices representing the elliptic operator over a hierarchy of scales; (ii) the curvature of s(x) directs the construction of meshes well-adapted to the anisotropy of the operator, improving the conditioning of the stiffness matrix and interpolation properties of the mesh; and (iii) using electric impedance tomography to reconstruct s(x) recovers the up-scaled conductivity, which, while anisotropic, is unique. Extensions of the parameterisation to R³ are introduced.


(PHD, 2007)

Abstract:

Numerical upscaling of problems with multiple scale structures has attracted increasing attention in recent years. In particular, problems with non-separable scales pose a great challenge to mathematical analysis and simulation. Most existing methods are based either on the assumption of scale separation or on heuristic arguments.

In this thesis, we present rigorous results on homogenization of partial differential equations with L^{∞} coefficients which allow for a continuum of spatial and temporal scales. We propose a new type of compensation phenomenon for elliptic, parabolic, and hyperbolic equations. The main idea is the use of the so-called “harmonic coordinates” (“caloric coordinates” in the parabolic case). Under these coordinates, the solutions of these differential equations have one more degree of differentiability. It has been deduced from this compensation phenomenon that numerical homogenization methods formulated as oscillating finite elements can converge in the presence of a continuum of scales, if one uses global caloric coordinates to obtain the test functions instead of using solutions of a local cell problem.
