Combined Feed
https://feeds.library.caltech.edu/people/Smyth-P/combined.rss
A Caltech Library Repository Feed
Mon, 11 Dec 2023 13:18:30 +0000
Decision tree design from a communication theory standpoint
https://resolver.caltech.edu/CaltechAUTHORS:20190314-130609598
Authors: Goodman, Rodney M.; Smyth, Padhraic
Year: 1988
DOI: 10.1109/18.21221
A communication theory approach to decision tree design based on a top-down mutual information algorithm is presented. It is shown that this algorithm is equivalent to a form of Shannon-Fano prefix coding, and several fundamental bounds relating decision-tree parameters are derived. The bounds are used in conjunction with a rate-distortion interpretation of tree design to explain several phenomena previously observed in practical decision-tree design. A termination rule for the algorithm, called the delta-entropy rule, is proposed that improves its robustness in the presence of noise. Simulation results are presented, showing that the tree classifiers derived by the algorithm compare favourably to the single nearest neighbour classifier.
https://authors.library.caltech.edu/records/m6j47-r4a59
An Information Theoretic Approach to Rule-Based Connectionist Expert Systems
https://resolver.caltech.edu/CaltechAUTHORS:20160107-155547718
Authors: Goodman, Rodney M.; Miller, John W.; Smyth, Padhraic
Year: 1989
In this paper we discuss architectures for executing probabilistic rule bases in a parallel manner, using recently introduced information-theoretic models as a theoretical basis. We will begin by describing our (non-neural) learning algorithm and theory of quantitative rule modelling, followed by a discussion of the exact nature of two particular models. Finally, we work through an example of our approach, going from database to rules to inference network, and compare the network's performance with the theoretical limits for specific problems.
https://authors.library.caltech.edu/records/tz4df-tg596
An Information Theoretic Approach to Modeling Neural Network Expert Systems
https://resolver.caltech.edu/CaltechAUTHORS:20170711-165746284
Authors: Goodman, Rodney M.; Miller, John W.; Smyth, Padhraic
Year: 1989
DOI: 10.1109/ITW.1989.761436
In this paper we propose several novel techniques for mapping rule bases, such as are used in rule-based expert systems, onto neural network architectures. Our objective in doing this is to achieve a system capable of incremental learning and distributed probabilistic inference. Such a system would be capable of performing inference many orders of magnitude faster than current serial rule-based expert systems, and hence be capable of true real-time operation. In addition, the rule-based formalism gives the system an explicit knowledge representation, unlike current neural models. We propose an information-theoretic approach to this problem, which really has two aspects: firstly learning the model and, secondly, performing inference using this model. We will show a clear pathway to implementing an expert system starting from raw data, via a learned rule-based model, to a neural network that performs distributed inference.
https://authors.library.caltech.edu/records/t79vm-e8688
Objective Functions For Neural Network Classifier Design
https://resolver.caltech.edu/CaltechAUTHORS:20170620-163501947
Authors: Goodman, Rod; Miller, John W.; Smyth, Padhraic
Year: 1991
DOI: 10.1109/ISIT.1991.695143
Backpropagation was originally derived in the context of minimizing a mean-squared error (MSE) objective function. More recently there has been interest in objective functions that provide accurate class probability estimates. In this talk we derive necessary and sufficient conditions on the required form of an objective function to provide probability estimates. This leads to the definition of a general class of functions which includes MSE and cross entropy (CE) as two of the simplest cases. We establish the equivalence of these functions to Maximum Likelihood estimation and the more general principle of Minimum Description Length models. Empirical results are used to demonstrate the tradeoffs associated with the choice of objective functions which minimize to a probability.
https://authors.library.caltech.edu/records/q4xgq-96c04
Objective functions for probability estimation
https://resolver.caltech.edu/CaltechAUTHORS:20190314-142000675
Authors: Miller, John W.; Goodman, Rod; Smyth, Padhraic
Year: 1991
DOI: 10.1109/ijcnn.1991.155295
Backpropagation was originally derived in the context of minimizing a mean-squared error (MSE) objective function. More recently there has been interest in objective functions that provide accurate class probability estimates. In this paper we derive necessary and sufficient conditions on the required form of an objective function to provide probability estimates. This leads to the definition of a general class of functions which includes MSE and cross entropy (CE) as two of the simplest cases.
https://authors.library.caltech.edu/records/85qkn-3ga06
An information theoretic approach to rule induction from databases
https://resolver.caltech.edu/CaltechAUTHORS:20190314-155127061
Authors: Smyth, Padhraic; Goodman, Rodney M.
Year: 1992
DOI: 10.1109/69.149926
The knowledge acquisition bottleneck in obtaining
rules directly from an expert is well known. Hence, the problem
of automated rule acquisition from data is a well-motivated one,
particularly for domains where a database of sample data exists.
In this paper we introduce a novel algorithm for the induction
of rules from examples. The algorithm is novel in the sense
that it not only learns rules for a given concept (classification),
but it simultaneously learns rules relating multiple concepts.
This type of learning, known as generalized rule induction, is
considerably more general than existing algorithms which tend
to be classification oriented. Initially we focus on the problem of
determining a quantitative, well-defined rule preference measure.
In particular, we propose a quantity called the J-measure as
an information theoretic alternative to existing approaches. The
J-measure quantifies the information content of a rule or a
hypothesis. We will outline the information theoretic origins
of this measure and examine its plausibility as a hypothesis
preference measure. We then define the ITRULE algorithm which
uses the newly proposed measure to learn a set of optimal rules
from a set of data samples, and we conclude the paper with an
analysis of experimental results on real-world data.
https://authors.library.caltech.edu/records/a0g2j-sr926
Rule-based neural networks for classification and probability estimation
https://resolver.caltech.edu/CaltechAUTHORS:GOOnc92
Authors: Goodman, Rodney M.; Higgins, Charles M.; Miller, John W.; Smyth, Padhraic
Year: 1992
DOI: 10.1162/neco.1992.4.6.781
In this paper we propose a network architecture that combines a rule-based approach with that of the neural network paradigm. Our primary motivation for this is to ensure that the knowledge embodied in the network is explicitly encoded in the form of understandable rules. This enables the network's decision to be understood, and provides an audit trail of how that decision was arrived at. We utilize an information theoretic approach to learning a model of the domain knowledge from examples. This model takes the form of a set of probabilistic conjunctive rules between discrete input evidence variables and output class variables. These rules are then mapped onto the weights and nodes of a feedforward neural network resulting in a directly specified architecture. The network acts as a parallel Bayesian classifier, but more importantly, can also output posterior probability estimates of the class variables. Empirical tests on a number of data sets show that the rule-based classifier performs comparably with standard neural network classifiers, while possessing unique advantages in terms of knowledge representation and probability estimation.
https://authors.library.caltech.edu/records/w6w9f-eaa23
Self-clustering recurrent networks
https://resolver.caltech.edu/CaltechAUTHORS:20190314-155127316
Authors: Zeng, Zheng; Goodman, Rodney M.; Smyth, Padhraic
Year: 1993
DOI: 10.1109/icnn.1993.298535
Recurrent neural networks have recently been shown to have the ability to learn finite state automata (FSAs) from examples. In this paper it is shown, based on empirical analyses, that second-order networks which are trained to learn FSAs tend to form discrete clusters as the state representation in the hidden unit activation space. This observation is used to define 'self-clustering' networks which automatically extract discrete state machines from the learned network. However, the problem of instability on long test strings is a factor in the generalization performance of recurrent networks; in essence, because of the analog nature of the state representation, the network gradually "forgets" where the individual state regions are. To address this problem a new network structure is introduced whereby the network uses quantization in the feedback path to force the learning of discrete states. Experimental results show that the new method learns FSAs just as well as existing methods in the literature but with the significant advantage of being stable on test strings of arbitrary length.
https://authors.library.caltech.edu/records/2r5dm-sjr89
On loss functions which minimize to conditional expected values and posterior probabilities
https://resolver.caltech.edu/CaltechAUTHORS:20190314-155127224
Authors: Miller, John W.; Goodman, Rod; Smyth, Padhraic
Year: 1993
DOI: 10.1109/18.243457
A loss function, or objective function, is a function used to compare parameters when fitting a model to data. The loss function gives a distance between the model output and the desired output. Two common examples are the squared-error loss function and the cross entropy loss function. Minimizing the mean-square error loss function is equivalent to minimizing the mean square difference between the model output and the expected value of the output given a particular input. This property of minimization to the expected value is formalized as P-admissibility. The necessary and sufficient conditions for P-admissibility, leading to a parametric description of all P-admissible loss functions, are found. In particular, it is shown that two of the simplest members of this class of functions are the squared error and the cross entropy loss functions. One application of this work is in the choice of a loss function for training neural networks to provide probability estimates.
https://authors.library.caltech.edu/records/59wem-6hd53
Learning finite state machines with self-clustering recurrent networks
https://resolver.caltech.edu/CaltechAUTHORS:ZENnc93
Authors: Zeng, Zheng; Goodman, Rodney M.; Smyth, Padhraic
Year: 1993
DOI: 10.1162/neco.1993.5.6.976
Recent work has shown that recurrent neural networks have the ability to learn finite state automata from examples. In particular, networks using second-order units have been successful at this task. In studying the performance and learning behavior of such networks we have found that the second-order network model attempts to form clusters in activation space as its internal representation of states. However, these learned states become unstable as longer and longer test input strings are presented to the network. In essence, the network "forgets" where the individual states are in activation space. In this paper we propose a new method to force such a network to learn stable states by introducing discretization into the network and using a pseudo-gradient learning rule to perform training. The essence of the learning rule is that in doing gradient descent, it makes use of the gradient of a sigmoid function as a heuristic hint in place of that of the hard-limiting function, while still using the discretized value in the feedback update path. The new structure uses isolated points in activation space instead of vague clusters as its internal representation of states. It is shown to have similar capabilities in learning finite state automata as the original network, but without the instability problem. The proposed pseudo-gradient learning rule may also be used as a basis for training other types of networks that have hard-limiting threshold activation functions.
https://authors.library.caltech.edu/records/vkqth-zve91
Discrete recurrent neural networks for grammatical inference
https://resolver.caltech.edu/CaltechAUTHORS:20190315-142359688
Authors: Zeng, Zheng; Goodman, Rodney M.; Smyth, Padhraic
Year: 1994
DOI: 10.1109/72.279194
We describe a novel neural architecture for learning deterministic context-free grammars, or equivalently, deterministic pushdown automata. The unique feature of the proposed network is that it forms stable state representations during learning; previous work has shown that conventional analog recurrent networks can be inherently unstable in that they cannot retain their state memory for long input strings. We have recently introduced the discrete recurrent network architecture for learning finite-state automata. Here we extend this model to include a discrete external stack with discrete symbols. A composite error function is described to handle the different situations encountered in learning. The pseudo-gradient learning method (introduced in previous work) is in turn extended for the minimization of these error functions. Empirical trials validating the effectiveness of the pseudo-gradient learning method are presented, for networks both with and without an external stack. Experimental results show that the new networks are successful in learning some simple pushdown automata, though overfitting and non-convergent learning can also occur. Once learned, the internal representation of the network is provably stable; i.e., it classifies unseen strings of arbitrary length with 100% accuracy.
https://authors.library.caltech.edu/records/86tx1-2nb90
Automating the Hunt for Volcanoes on Venus
https://resolver.caltech.edu/CaltechAUTHORS:20120306-142457025
Authors: Burl, M. C.; Fayyad, U. M.; Perona, P.; Smyth, P.; Burl, M. P.
Year: 1994
DOI: 10.1109/CVPR.1994.323844
Our long-term goal is to develop a trainable tool for locating patterns of interest in large image databases. Toward this goal we have developed a prototype system, based on classical filtering and statistical pattern recognition techniques, for automatically locating volcanoes in the Magellan SAR database of Venus. Training for the specific volcano-detection task is obtained by synthesizing feature templates (via normalization and principal components analysis) from a small number of examples provided by experts. Candidate regions identified by a focus of attention (FOA) algorithm are classified based on correlations with the feature templates. Preliminary tests show performance comparable to trained human observers.
https://authors.library.caltech.edu/records/d52hq-zw836
Automated analysis of radar imagery of Venus: handling lack of ground truth
https://resolver.caltech.edu/CaltechAUTHORS:20120306-150027112
Authors: Burl, M. C.; Fayyad, Usama M.; Perona, Pietro; Smyth, Padhraic
Year: 1994
DOI: 10.1109/ICIP.1994.413852
Lack of verifiable ground truth is a common problem in remote sensing image analysis. For example, consider the synthetic aperture radar (SAR) image data of Venus obtained by the Magellan spacecraft. Planetary scientists are interested in automatically cataloging the locations of all the small volcanoes in this data set; however, the problem is very difficult and cannot be performed with perfect reliability even by human experts. Thus, training and evaluating the performance of an automatic algorithm on this data set must be handled carefully. We discuss the use of weighted free-response receiver-operating characteristics (wFROCs) for evaluating detection performance when the "ground truth" is subjective. In particular, we evaluate the relative detection performance of humans and automatic algorithms. Our experimental results indicate that proper assessment of the uncertainty in "ground truth" is essential in applications of this nature.
https://authors.library.caltech.edu/records/xa9d9-ych31
Inferring Ground Truth from Subjective Labelling of Venus Images
https://resolver.caltech.edu/CaltechAUTHORS:20150305-153627706
Authors: Smyth, Padhraic; Fayyad, Usama; Burl, Michael; Perona, Pietro; Baldi, Pierre
Year: 1995
In remote sensing applications "ground-truth" data is often used as the basis for training pattern recognition algorithms to generate thematic maps or to detect objects of interest. In practical situations, experts may visually examine the images and provide a subjective noisy estimate of the truth. Calibrating the reliability and bias of expert labellers is a non-trivial problem. In this paper we discuss some of our recent work on this topic in the context of detecting small volcanoes in Magellan SAR images of Venus. Empirical results (using the Expectation-Maximization procedure) suggest that accounting for subjective noise can be quite significant in terms of quantifying both human and algorithm detection performance.
https://authors.library.caltech.edu/records/39t16-xsg91
Automated analysis and exploration of image databases: Results, progress, and challenges
https://resolver.caltech.edu/CaltechAUTHORS:20190723-074142687
Authors: Fayyad, Usama M.; Smyth, Padhraic; Weir, Nicholas; Djorgovski, S.
Year: 1995
DOI: 10.1007/bf00962819
In areas as diverse as earth remote sensing, astronomy, and medical imaging, image acquisition technology has undergone tremendous improvements in recent years. The vast amounts of scientific data are potential treasure-troves for scientific investigation and analysis. Unfortunately, advances in our ability to deal with this volume of data in an effective manner have not paralleled the hardware gains. While special-purpose tools for particular applications exist, there is a dearth of useful general-purpose software tools and algorithms which can assist a scientist in exploring large scientific image databases. This paper presents our recent progress in developing interactive semi-automated image database exploration tools based on pattern recognition and machine learning technology. We first present a completed and successful application that illustrates the basic approach: the SKICAT system used for the reduction and analysis of a 3 terabyte astronomical data set. SKICAT integrates techniques from image processing, data classification, and database management. It represents a system in which machine learning played a powerful and enabling role, and solved a difficult, scientifically significant problem. We then proceed to discuss the general problem of automated image database exploration, the particular aspects of image databases which distinguish them from other databases, and how this impacts the application of off-the-shelf learning algorithms to problems of this nature. A second large image database is used to ground this discussion: Magellan's images of the surface of the planet Venus. The paper concludes with a discussion of current and future challenges.
https://authors.library.caltech.edu/records/41grz-rf118
Probabilistic independence networks for hidden Markov probability models
https://resolver.caltech.edu/CaltechAUTHORS:SMYnc97
Authors: Smyth, Padhraic; Heckerman, David; Jordan, Michael I.
Year: 1997
DOI: 10.1162/neco.1997.9.2.227
Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas, including statistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics. Formalisms for manipulating these models have been developed relatively independently in these research communities. In this paper we explore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independence networks (PINs). The paper presents a self-contained review of the basic principles of PINs. It is shown that the well-known forward-backward (F-B) and Viterbi algorithms for HMMs are special cases of more general inference algorithms for arbitrary PINs. Furthermore, the existence of inference and estimation algorithms for more general graphical models provides a set of analysis tools for HMM practitioners who wish to explore a richer class of HMM structures. Examples of relatively complex models to handle sensor fusion and coarticulation in speech recognition are introduced and treated within the graphical model framework to illustrate the advantages of the general approach.
https://authors.library.caltech.edu/records/925sy-akr15
Learning to Recognize Volcanoes on Venus
https://resolver.caltech.edu/CaltechAUTHORS:20140730-101721831
Authors: Burl, Michael C.; Asker, Lars; Smyth, Padhraic; Fayyad, Usama; Perona, Pietro; Crumpler, Larry; Aubele, Jayne
Year: 1998
DOI: 10.1023/A:1007400206189
Dramatic improvements in sensor and image acquisition technology have created a demand for automated tools that can aid in the analysis of large image databases. We describe the development of JARtool, a trainable software system that learns to recognize volcanoes in a large data set of Venusian imagery. A machine learning approach is used because it is much easier for geologists to identify examples of volcanoes in the imagery than it is to specify domain knowledge as a set of pixel-level constraints. This approach can also provide portability to other domains without the need for explicit reprogramming; the user simply supplies the system with a new set of training examples. We show how the development of such a system requires a completely different set of skills than are required for applying machine learning to "toy world" domains. This paper discusses important aspects of the application process not commonly encountered in the "toy world" including obtaining labeled training data, the difficulties of working with pixel data, and the automatic extraction of higher-level features.
https://authors.library.caltech.edu/records/d3qhn-wd456
Gene Expression Clustering with Functional Mixture Models
https://resolver.caltech.edu/CaltechAUTHORS:20160309-105810912
Authors: Chudova, Darya; Hart, Christopher; Mjolsness, Eric; Smyth, Padhraic
Year: 2004
We propose a functional mixture model for simultaneous clustering and alignment of sets of curves measured on a discrete time grid. The model is specifically tailored to gene expression time course data. Each functional cluster center is a nonlinear combination of solutions of a simple linear differential equation that describes the change of individual mRNA levels when the synthesis and decay rates are constant. The mixture of continuous time parametric functional forms allows one to (a) account for the heterogeneity in the observed profiles, (b) align the profiles in time by estimating real-valued time shifts, (c) capture the synthesis and decay of mRNA in the course of an experiment, and (d) regularize noisy profiles by enforcing smoothness in the mean curves. We derive an EM algorithm for estimating the parameters of the model, and apply the proposed approach to the set of cycling genes in yeast. The experiments show consistent improvement in predictive power and within cluster variance compared to regular Gaussian mixtures.
https://authors.library.caltech.edu/records/bcdmz-d7x31
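The gene expression abstract refers to "a simple linear differential equation that describes the change of individual mRNA levels when the synthesis and decay rates are constant" but does not write it out. As an assumption, the sketch below uses the standard first-order kinetics dm/dt = s - d*m, where m is the mRNA level, s a constant synthesis rate and d a constant decay rate. It evaluates that equation's closed-form solution on a discrete time grid with a real-valued time offset. `mrna_level` and `shifted_profile` are hypothetical helper names, and the sign convention for the shift is ours. The paper's actual nonlinear combination of solutions and its EM fitting procedure are not reproduced here.

```python
import math
from typing import List


def mrna_level(t: float, m0: float, s: float, d: float) -> float:
    """Closed-form solution of dm/dt = s - d*m with m(0) = m0.

    Assumed first-order kinetics: s is a constant synthesis rate and
    d > 0 a constant decay rate. The level relaxes exponentially from
    m0 toward the steady state s / d.
    """
    if d <= 0:
        raise ValueError("decay rate d must be positive")
    steady_state = s / d
    return steady_state + (m0 - steady_state) * math.exp(-d * t)


def shifted_profile(grid: List[float], shift: float,
                    m0: float, s: float, d: float) -> List[float]:
    """Evaluate the curve on a discrete time grid, offset by a real-valued shift.

    Hypothetical helper: the value reported at grid time t is m(t + shift),
    so a non-negative shift means the observed series starts `shift` time
    units into the underlying process (our convention, not the paper's).
    """
    return [mrna_level(t + shift, m0, s, d) for t in grid]


if __name__ == "__main__":
    grid = [0.0, 1.0, 2.0, 4.0, 8.0]
    # Synthesis switched on from an empty pool: the curve rises toward s/d = 4.
    for t, m in zip(grid, shifted_profile(grid, shift=0.5, m0=0.0, s=2.0, d=0.5)):
        print(f"t={t:4.1f}  m={m:.3f}")
```

In a fitting setting, s, d, m0 and a per-profile shift would be estimated from data (the abstract states an EM algorithm is used). This sketch only covers evaluating one such curve.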