CaltechAUTHORS: Combined

https://feeds.library.caltech.edu/people/Smyth-P/combined.rss
A Caltech Library Repository Feed
Tue, 25 Feb 2025 19:30:11 -0800

Decision tree design from a communication theory standpoint
https://resolver.caltech.edu/CaltechAUTHORS:20190314-130609598
Year: 1988 DOI: 10.1109/18.21221
A communication theory approach to decision tree design based on a top-down mutual information algorithm is presented. It is shown that this algorithm is equivalent to a form of Shannon-Fano prefix coding, and several fundamental bounds relating decision-tree parameters are derived. The bounds are used in conjunction with a rate-distortion interpretation of tree design to explain several phenomena previously observed in practical decision-tree design. A termination rule for the algorithm called the delta-entropy rule is proposed that improves its robustness in the presence of noise. Simulation results are presented, showing that the tree classifiers derived by the algorithm compare favourably to the single nearest neighbour classifier.

An Information Theoretic Approach to Rule-Based Connectionist Expert Systems
https://resolver.caltech.edu/CaltechAUTHORS:20160107-155547718
Year: 1989
We discuss in this paper architectures for executing probabilistic rule-bases in a parallel manner, using as a theoretical basis recently introduced information-theoretic models. We will begin by describing our (non-neural) learning algorithm and theory of quantitative rule modelling, followed by a discussion on the exact nature of two particular models. Finally we work through an example of our approach, going from database to rules to inference network, and compare the network's performance with the theoretical limits for specific problems.

An Information Theoretic Approach to Modeling Neural Network Expert Systems
https://resolver.caltech.edu/CaltechAUTHORS:20170711-165746284
Year: 1989 DOI: 10.1109/ITW.1989.761436
In this paper we propose several novel techniques for mapping rule bases, such as are used in rule based expert systems, onto neural network architectures. Our objective in doing this is to achieve a system capable of incremental learning, and distributed probabilistic inference. Such a system would be capable of performing inference many orders of magnitude faster than current serial rule based expert systems, and hence be capable of true real time operation. In addition, the rule based formalism gives the system an explicit knowledge representation, unlike current neural models. We propose an information-theoretic approach to this problem, which really has two aspects: firstly learning the model and, secondly, performing inference using this model. We will show a clear pathway to implementing an expert system starting from raw data, via a learned rule-based model, to a neural network that performs distributed inference.

Objective Functions For Neural Network Classifier Design
https://resolver.caltech.edu/CaltechAUTHORS:20170620-163501947
Year: 1991 DOI: 10.1109/ISIT.1991.695143
Backpropagation was originally derived in the context of minimizing a mean-squared error (MSE) objective function. More recently there has been interest in objective functions that provide accurate class probability estimates. In this talk we derive necessary and sufficient conditions on the required form of an objective function to provide probability estimates. This leads to the definition of a general class of functions which includes MSE and cross entropy (CE) as two of the simplest cases.
We establish the equivalence of these functions to Maximum Likelihood estimation and the more general principle of Minimum Description Length models. Empirical results are used to demonstrate the tradeoffs associated with the choice of objective functions which minimize to a probability.

Objective functions for probability estimation
https://resolver.caltech.edu/CaltechAUTHORS:20190314-142000675
Year: 1991 DOI: 10.1109/ijcnn.1991.155295
Backpropagation was originally derived in the context of minimizing a mean-squared error (MSE) objective function. More recently there has been interest in objective functions that provide accurate class probability estimates. In this paper we derive necessary and sufficient conditions on the required form of an objective function to provide probability estimates. This leads to the definition of a general class of functions which includes MSE and cross entropy (CE) as two of the simplest cases.

An information theoretic approach to rule induction from databases
https://resolver.caltech.edu/CaltechAUTHORS:20190314-155127061
Year: 1992 DOI: 10.1109/69.149926
The knowledge acquisition bottleneck in obtaining rules directly from an expert is well known. Hence, the problem of automated rule acquisition from data is a well-motivated one, particularly for domains where a database of sample data exists. In this paper we introduce a novel algorithm for the induction of rules from examples. The algorithm is novel in the sense that it not only learns rules for a given concept (classification), but it simultaneously learns rules relating multiple concepts. This type of learning, known as generalized rule induction, is considerably more general than existing algorithms which tend to be classification oriented. Initially we focus on the problem of determining a quantitative, well-defined rule preference measure.
In particular, we propose a quantity called the J-measure as an information theoretic alternative to existing approaches. The J-measure quantifies the information content of a rule or a hypothesis. We will outline the information theoretic origins of this measure and examine its plausibility as a hypothesis preference measure. We then define the ITRULE algorithm which uses the newly proposed measure to learn a set of optimal rules from a set of data samples, and we conclude the paper with an analysis of experimental results on real-world data.

Rule-based neural networks for classification and probability estimation
https://resolver.caltech.edu/CaltechAUTHORS:GOOnc92
Year: 1992 DOI: 10.1162/neco.1992.4.6.781
In this paper we propose a network architecture that combines a rule-based approach with that of the neural network paradigm. Our primary motivation for this is to ensure that the knowledge embodied in the network is explicitly encoded in the form of understandable rules. This enables the network's decision to be understood, and provides an audit trail of how that decision was arrived at. We utilize an information theoretic approach to learning a model of the domain knowledge from examples. This model takes the form of a set of probabilistic conjunctive rules between discrete input evidence variables and output class variables. These rules are then mapped onto the weights and nodes of a feedforward neural network resulting in a directly specified architecture. The network acts as a parallel Bayesian classifier, but more importantly, can also output posterior probability estimates of the class variables. Empirical tests on a number of data sets show that the rule-based classifier performs comparably with standard neural network classifiers, while possessing unique advantages in terms of knowledge representation and probability estimation.

Self-clustering recurrent networks
https://resolver.caltech.edu/CaltechAUTHORS:20190314-155127316
Year: 1993 DOI: 10.1109/icnn.1993.298535
Recurrent neural networks have recently been shown to have the ability to learn finite state automata (FSA's) from examples. In this paper it is shown, based on empirical analyses, that second-order networks which are trained to learn FSA's tend to form discrete clusters as the state representation in the hidden unit activation space. This observation is used to define 'self-clustering' networks which automatically extract discrete state machines from the learned network. However, the problem of instability on long test strings is a factor in the generalization performance of recurrent networks - in essence, because of the analog nature of the state representation, the network gradually "forgets" where the individual state regions are. To address this problem a new network structure is introduced whereby the network uses quantization in the feedback path to force the learning of discrete states. Experimental results show that the new method learns FSA's just as well as existing methods in the literature but with the significant advantage of being stable on test strings of arbitrary length.

On loss functions which minimize to conditional expected values and posterior probabilities
https://resolver.caltech.edu/CaltechAUTHORS:20190314-155127224
Year: 1993 DOI: 10.1109/18.243457
A loss function, or objective function, is a function used to compare parameters when fitting a model to data. The loss function gives a distance between the model output and the desired output. Two common examples are the squared-error loss function and the cross entropy loss function.
Minimizing the mean-square error loss function is equivalent to minimizing the mean square difference between the model output and the expected value of the output given a particular input. This property of minimization to the expected value is formalized as P-admissibility. The necessary and sufficient conditions for P-admissibility, leading to a parametric description of all P-admissible loss functions, are found. In particular, it is shown that two of the simplest members of this class of functions are the squared error and the cross entropy loss functions. One application of this work is in the choice of a loss function for training neural networks to provide probability estimates.

Learning finite state machines with self-clustering recurrent networks
https://resolver.caltech.edu/CaltechAUTHORS:ZENnc93
Year: 1993 DOI: 10.1162/neco.1993.5.6.976
Recent work has shown that recurrent neural networks have the ability to learn finite state automata from examples. In particular, networks using second-order units have been successful at this task. In studying the performance and learning behavior of such networks we have found that the second-order network model attempts to form clusters in activation space as its internal representation of states. However, these learned states become unstable as longer and longer test input strings are presented to the network. In essence, the network "forgets" where the individual states are in activation space. In this paper we propose a new method to force such a network to learn stable states by introducing discretization into the network and using a pseudo-gradient learning rule to perform training. The essence of the learning rule is that in doing gradient descent, it makes use of the gradient of a sigmoid function as a heuristic hint in place of that of the hard-limiting function, while still using the discretized value in the feedback update path.
The new structure uses isolated points in activation space instead of vague clusters as its internal representation of states. It is shown to have similar capabilities in learning finite state automata as the original network, but without the instability problem. The proposed pseudo-gradient learning rule may also be used as a basis for training other types of networks that have hard-limiting threshold activation functions.

Discrete recurrent neural networks for grammatical inference
https://resolver.caltech.edu/CaltechAUTHORS:20190315-142359688
Year: 1994 DOI: 10.1109/72.279194
We describe a novel neural architecture for learning deterministic context-free grammars, or equivalently, deterministic pushdown automata. The unique feature of the proposed network is that it forms stable state representations during learning; previous work has shown that conventional analog recurrent networks can be inherently unstable in that they cannot retain their state memory for long input strings. We have recently introduced the discrete recurrent network architecture for learning finite-state automata. Here we extend this model to include a discrete external stack with discrete symbols. A composite error function is described to handle the different situations encountered in learning. The pseudo-gradient learning method (introduced in previous work) is in turn extended for the minimization of these error functions. Empirical trials validating the effectiveness of the pseudo-gradient learning method are presented, for networks both with and without an external stack. Experimental results show that the new networks are successful in learning some simple pushdown automata, though overfitting and non-convergent learning can also occur. Once learned, the internal representation of the network is provably stable; i.e., it classifies unseen strings of arbitrary length with 100% accuracy.

Automating the Hunt for Volcanoes on Venus
https://resolver.caltech.edu/CaltechAUTHORS:20120306-142457025
Year: 1994 DOI: 10.1109/CVPR.1994.323844
Our long-term goal is to develop a trainable tool for locating patterns of interest in large image databases. Toward this goal we have developed a prototype system, based on classical filtering and statistical pattern recognition techniques, for automatically locating volcanoes in the Magellan SAR database of Venus. Training for the specific volcano-detection task is obtained by synthesizing feature templates (via normalization and principal components analysis) from a small number of examples provided by experts. Candidate regions identified by a focus of attention (FOA) algorithm are classified based on correlations with the feature templates. Preliminary tests show performance comparable to trained human observers.

Automated analysis of radar imagery of Venus: handling lack of ground truth
https://resolver.caltech.edu/CaltechAUTHORS:20120306-150027112
Year: 1994 DOI: 10.1109/ICIP.1994.413852
Lack of verifiable ground truth is a common problem in remote sensing image analysis. For example, consider the synthetic aperture radar (SAR) image data of Venus obtained by the Magellan spacecraft. Planetary scientists are interested in automatically cataloging the locations of all the small volcanoes in this data set; however, the problem is very difficult and cannot be performed with perfect reliability even by human experts. Thus, training and evaluating the performance of an automatic algorithm on this data set must be handled carefully. We discuss the use of weighted free-response receiver-operating characteristics (wFROCs) for evaluating detection performance when the "ground truth" is subjective. In particular, we evaluate the relative detection performance of humans and automatic algorithms.
Our experimental results indicate that proper assessment of the uncertainty in "ground truth" is essential in applications of this nature.

Inferring Ground Truth from Subjective Labelling of Venus Images
https://resolver.caltech.edu/CaltechAUTHORS:20150305-153627706
Year: 1995
In remote sensing applications "ground-truth" data is often used as the basis for training pattern recognition algorithms to generate thematic maps or to detect objects of interest. In practical situations, experts may visually examine the images and provide a subjective noisy estimate of the truth. Calibrating the reliability and bias of expert labellers is a non-trivial problem. In this paper we discuss some of our recent work on this topic in the context of detecting small volcanoes in Magellan SAR images of Venus. Empirical results (using the Expectation-Maximization procedure) suggest that accounting for subjective noise can be quite significant in terms of quantifying both human and algorithm detection performance.

Automated analysis and exploration of image databases: Results, progress, and challenges
https://resolver.caltech.edu/CaltechAUTHORS:20190723-074142687
Year: 1995 DOI: 10.1007/bf00962819
In areas as diverse as earth remote sensing, astronomy, and medical imaging, image acquisition technology has undergone tremendous improvements in recent years. The vast amounts of scientific data are potential treasure-troves for scientific investigation and analysis. Unfortunately, advances in our ability to deal with this volume of data in an effective manner have not paralleled the hardware gains. While special-purpose tools for particular applications exist, there is a dearth of useful general-purpose software tools and algorithms which can assist a scientist in exploring large scientific image databases.
This paper presents our recent progress in developing interactive semi-automated image database exploration tools based on pattern recognition and machine learning technology. We first present a completed and successful application that illustrates the basic approach: the SKICAT system used for the reduction and analysis of a 3 terabyte astronomical data set. SKICAT integrates techniques from image processing, data classification, and database management. It represents a system in which machine learning played a powerful and enabling role, and solved a difficult, scientifically significant problem. We then proceed to discuss the general problem of automated image database exploration, the particular aspects of image databases which distinguish them from other databases, and how this impacts the application of off-the-shelf learning algorithms to problems of this nature. A second large image database is used to ground this discussion: Magellan's images of the surface of the planet Venus. The paper concludes with a discussion of current and future challenges.

Probabilistic independence networks for hidden Markov probability models
https://resolver.caltech.edu/CaltechAUTHORS:SMYnc97
Year: 1997 DOI: 10.1162/neco.1997.9.2.227
Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas, including statistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics. Formalisms for manipulating these models have been developed relatively independently in these research communities. In this paper we explore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independence networks (PINs). The paper presents a self-contained review of the basic principles of PINs.
It is shown that the well-known forward-backward (F-B) and Viterbi algorithms for HMMs are special cases of more general inference algorithms for arbitrary PINs. Furthermore, the existence of inference and estimation algorithms for more general graphical models provides a set of analysis tools for HMM practitioners who wish to explore a richer class of HMM structures. Examples of relatively complex models to handle sensor fusion and coarticulation in speech recognition are introduced and treated within the graphical model framework to illustrate the advantages of the general approach.

Learning to Recognize Volcanoes on Venus
https://resolver.caltech.edu/CaltechAUTHORS:20140730-101721831
Year: 1998 DOI: 10.1023/A:1007400206189
Dramatic improvements in sensor and image acquisition technology have created a demand for automated tools that can aid in the analysis of large image databases. We describe the development of JARtool, a trainable software system that learns to recognize volcanoes in a large data set of Venusian imagery. A machine learning approach is used because it is much easier for geologists to identify examples of volcanoes in the imagery than it is to specify domain knowledge as a set of pixel-level constraints. This approach can also provide portability to other domains without the need for explicit reprogramming; the user simply supplies the system with a new set of training examples. We show how the development of such a system requires a completely different set of skills than are required for applying machine learning to "toy world" domains. This paper discusses important aspects of the application process not commonly encountered in the "toy world" including obtaining labeled training data, the difficulties of working with pixel data, and the automatic extraction of higher-level features.

Gene Expression Clustering with Functional Mixture Models
https://resolver.caltech.edu/CaltechAUTHORS:20160309-105810912
Year: 2004
We propose a functional mixture model for simultaneous clustering and alignment of sets of curves measured on a discrete time grid. The model is specifically tailored to gene expression time course data. Each functional cluster center is a nonlinear combination of solutions of a simple linear differential equation that describes the change of individual mRNA levels when the synthesis and decay rates are constant. The mixture of continuous time parametric functional forms allows one to (a) account for the heterogeneity in the observed profiles, (b) align the profiles in time by estimating real-valued time shifts, (c) capture the synthesis and decay of mRNA in the course of an experiment, and (d) regularize noisy profiles by enforcing smoothness in the mean curves. We derive an EM algorithm for estimating the parameters of the model, and apply the proposed approach to the set of cycling genes in yeast. The experiments show consistent improvement in predictive power and within cluster variance compared to regular Gaussian mixtures.
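
The Gene Expression Clustering abstract above mentions a simple linear differential equation for mRNA levels under constant synthesis and decay rates, but the feed does not write it out. As a rough illustration only, assuming the equation takes the common form dx/dt = s - d*x (the names s, d, x0 and the function mrna_level are hypothetical, not taken from the paper), a single solution curve could be sketched as follows. The paper's cluster centers are described as nonlinear combinations of such solutions, which this sketch does not attempt.

```python
import math


def mrna_level(t, x0, s, d):
    """Closed-form solution of dx/dt = s - d*x with x(0) = x0.

    Assumed model form: s is a constant synthesis rate and d a constant
    decay rate. This is an illustrative guess, not the paper's equation.
    """
    if d <= 0:
        raise ValueError("decay rate d must be positive")
    steady_state = s / d  # level at which synthesis balances decay
    return steady_state + (x0 - steady_state) * math.exp(-d * t)


if __name__ == "__main__":
    # Starting from zero, the level rises toward the steady state s/d.
    for t in (0.0, 1.0, 5.0, 20.0):
        print(t, mrna_level(t, x0=0.0, s=1.0, d=0.5))
```

Under this assumed form the curve relaxes exponentially from x0 toward s/d, so a profile is fully described by three numbers, which is what makes a parametric cluster center smoother than a free-form mean.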