Article records
https://feeds.library.caltech.edu/people/Smyth-P/article.rss
A Caltech Library Repository Feed
Generated: Thu, 30 Nov 2023 18:30:52 +0000

Decision tree design from a communication theory standpoint
https://resolver.caltech.edu/CaltechAUTHORS:20190314-130609598
Authors: Goodman, Rodney M.; Smyth, Padhraic
Year: 1988
DOI: 10.1109/18.21221
A communication theory approach to decision tree design based on a top-down mutual information algorithm is presented. It is shown that this algorithm is equivalent to a form of Shannon-Fano prefix coding, and several fundamental bounds relating decision-tree parameters are derived. The bounds are used in conjunction with a rate-distortion interpretation of tree design to explain several phenomena previously observed in practical decision-tree design. A termination rule for the algorithm, called the delta-entropy rule, is proposed that improves its robustness in the presence of noise. Simulation results are presented, showing that the tree classifiers derived by the algorithm compare favourably to the single nearest neighbour classifier.
https://authors.library.caltech.edu/records/m6j47-r4a59

An information theoretic approach to rule induction from databases
https://resolver.caltech.edu/CaltechAUTHORS:20190314-155127061
Authors: Smyth, Padhraic; Goodman, Rodney M.
Year: 1992
DOI: 10.1109/69.149926
The knowledge acquisition bottleneck in obtaining rules directly from an expert is well known. Hence, the problem of automated rule acquisition from data is a well-motivated one, particularly for domains where a database of sample data exists. In this paper we introduce a novel algorithm for the induction of rules from examples. The algorithm is novel in the sense that it not only learns rules for a given concept (classification), but it simultaneously learns rules relating multiple concepts. This type of learning, known as generalized rule induction, is considerably more general than existing algorithms, which tend to be classification oriented. Initially we focus on the problem of determining a quantitative, well-defined rule preference measure. In particular, we propose a quantity called the J-measure as an information theoretic alternative to existing approaches. The J-measure quantifies the information content of a rule or a hypothesis. We outline the information theoretic origins of this measure and examine its plausibility as a hypothesis preference measure. We then define the ITRULE algorithm, which uses the newly proposed measure to learn a set of optimal rules from a set of data samples, and we conclude the paper with an analysis of experimental results on real-world data.
https://authors.library.caltech.edu/records/a0g2j-sr926

Rule-based neural networks for classification and probability estimation
https://resolver.caltech.edu/CaltechAUTHORS:GOOnc92
Authors: Goodman, Rodney M.; Higgins, Charles M.; Miller, John W.; Smyth, Padhraic
Year: 1992
DOI: 10.1162/neco.1992.4.6.781
In this paper we propose a network architecture that combines a rule-based approach with that of the neural network paradigm. Our primary motivation for this is to ensure that the knowledge embodied in the network is explicitly encoded in the form of understandable rules. This enables the network's decision to be understood, and provides an audit trail of how that decision was arrived at. We utilize an information theoretic approach to learning a model of the domain knowledge from examples. This model takes the form of a set of probabilistic conjunctive rules between discrete input evidence variables and output class variables. These rules are then mapped onto the weights and nodes of a feedforward neural network, resulting in a directly specified architecture. The network acts as a parallel Bayesian classifier, but more importantly, can also output posterior probability estimates of the class variables. Empirical tests on a number of data sets show that the rule-based classifier performs comparably with standard neural network classifiers, while possessing unique advantages in terms of knowledge representation and probability estimation.
https://authors.library.caltech.edu/records/w6w9f-eaa23

On loss functions which minimize to conditional expected values and posterior probabilities
https://resolver.caltech.edu/CaltechAUTHORS:20190314-155127224
Authors: Miller, John W.; Goodman, Rod; Smyth, Padhraic
Year: 1993
DOI: 10.1109/18.243457
A loss function, or objective function, is a function used to compare parameters when fitting a model to data. The loss function gives a distance between the model output and the desired output. Two common examples are the squared-error loss function and the cross entropy loss function. Minimizing the mean-square error loss function is equivalent to minimizing the mean square difference between the model output and the expected value of the output given a particular input. This property of minimization to the expected value is formalized as P-admissibility. The necessary and sufficient conditions for P-admissibility, leading to a parametric description of all P-admissible loss functions, are found. In particular, it is shown that two of the simplest members of this class of functions are the squared error and the cross entropy loss functions. One application of this work is in the choice of a loss function for training neural networks to provide probability estimates.
https://authors.library.caltech.edu/records/59wem-6hd53

Learning finite state machines with self-clustering recurrent networks
https://resolver.caltech.edu/CaltechAUTHORS:ZENnc93
Authors: Zeng, Zheng; Goodman, Rodney M.; Smyth, Padhraic
Year: 1993
DOI: 10.1162/neco.1993.5.6.976
Recent work has shown that recurrent neural networks have the ability to learn finite state automata from examples. In particular, networks using second-order units have been successful at this task. In studying the performance and learning behavior of such networks we have found that the second-order network model attempts to form clusters in activation space as its internal representation of states. However, these learned states become unstable as longer and longer test input strings are presented to the network. In essence, the network "forgets" where the individual states are in activation space. In this paper we propose a new method to force such a network to learn stable states by introducing discretization into the network and using a pseudo-gradient learning rule to perform training. The essence of the learning rule is that in doing gradient descent, it makes use of the gradient of a sigmoid function as a heuristic hint in place of that of the hard-limiting function, while still using the discretized value in the feedback update path. The new structure uses isolated points in activation space instead of vague clusters as its internal representation of states. It is shown to have similar capabilities in learning finite state automata as the original network, but without the instability problem. The proposed pseudo-gradient learning rule may also be used as a basis for training other types of networks that have hard-limiting threshold activation functions.
https://authors.library.caltech.edu/records/vkqth-zve91

Discrete recurrent neural networks for grammatical inference
https://resolver.caltech.edu/CaltechAUTHORS:20190315-142359688
Authors: Zeng, Zheng; Goodman, Rodney M.; Smyth, Padhraic
Year: 1994
DOI: 10.1109/72.279194
We describe a novel neural architecture for learning deterministic context-free grammars, or equivalently, deterministic pushdown automata. The unique feature of the proposed network is that it forms stable state representations during learning; previous work has shown that conventional analog recurrent networks can be inherently unstable in that they cannot retain their state memory for long input strings. We have recently introduced the discrete recurrent network architecture for learning finite-state automata. Here we extend this model to include a discrete external stack with discrete symbols. A composite error function is described to handle the different situations encountered in learning. The pseudo-gradient learning method (introduced in previous work) is in turn extended for the minimization of these error functions. Empirical trials validating the effectiveness of the pseudo-gradient learning method are presented, for networks both with and without an external stack. Experimental results show that the new networks are successful in learning some simple pushdown automata, though overfitting and non-convergent learning can also occur. Once learned, the internal representation of the network is provably stable; i.e., it classifies unseen strings of arbitrary length with 100% accuracy.
https://authors.library.caltech.edu/records/86tx1-2nb90

Automated analysis and exploration of image databases: Results, progress, and challenges
https://resolver.caltech.edu/CaltechAUTHORS:20190723-074142687
Authors: Fayyad, Usama M.; Smyth, Padhraic; Weir, Nicholas; Djorgovski, S.
Year: 1995
DOI: 10.1007/bf00962819
In areas as diverse as earth remote sensing, astronomy, and medical imaging, image acquisition technology has undergone tremendous improvements in recent years. The vast amounts of scientific data are potential treasure-troves for scientific investigation and analysis. Unfortunately, advances in our ability to deal with this volume of data in an effective manner have not paralleled the hardware gains. While special-purpose tools for particular applications exist, there is a dearth of useful general-purpose software tools and algorithms which can assist a scientist in exploring large scientific image databases. This paper presents our recent progress in developing interactive semi-automated image database exploration tools based on pattern recognition and machine learning technology. We first present a completed and successful application that illustrates the basic approach: the SKICAT system used for the reduction and analysis of a 3 terabyte astronomical data set. SKICAT integrates techniques from image processing, data classification, and database management. It represents a system in which machine learning played a powerful and enabling role, and solved a difficult, scientifically significant problem. We then proceed to discuss the general problem of automated image database exploration, the particular aspects of image databases which distinguish them from other databases, and how this impacts the application of off-the-shelf learning algorithms to problems of this nature. A second large image database is used to ground this discussion: Magellan's images of the surface of the planet Venus. The paper concludes with a discussion of current and future challenges.
https://authors.library.caltech.edu/records/41grz-rf118

Probabilistic independence networks for hidden Markov probability models
https://resolver.caltech.edu/CaltechAUTHORS:SMYnc97
Authors: Smyth, Padhraic; Heckerman, David; Jordan, Michael I.
Year: 1997
DOI: 10.1162/neco.1997.9.2.227
Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas, including statistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics. Formalisms for manipulating these models have been developed relatively independently in these research communities. In this paper we explore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independence networks (PINs). The paper presents a self-contained review of the basic principles of PINs. It is shown that the well-known forward-backward (F-B) and Viterbi algorithms for HMMs are special cases of more general inference algorithms for arbitrary PINs. Furthermore, the existence of inference and estimation algorithms for more general graphical models provides a set of analysis tools for HMM practitioners who wish to explore a richer class of HMM structures. Examples of relatively complex models to handle sensor fusion and coarticulation in speech recognition are introduced and treated within the graphical model framework to illustrate the advantages of the general approach.
https://authors.library.caltech.edu/records/925sy-akr15

Learning to Recognize Volcanoes on Venus
https://resolver.caltech.edu/CaltechAUTHORS:20140730-101721831
Authors: Burl, Michael C.; Asker, Lars; Smyth, Padhraic; Fayyad, Usama; Perona, Pietro; Crumpler, Larry; Aubele, Jayne
Year: 1998
DOI: 10.1023/A:1007400206189
Dramatic improvements in sensor and image acquisition technology have created a demand for automated tools that can aid in the analysis of large image databases. We describe the development of JARtool, a trainable software system that learns to recognize volcanoes in a large data set of Venusian imagery. A machine learning approach is used because it is much easier for geologists to identify examples of volcanoes in the imagery than it is to specify domain knowledge as a set of pixel-level constraints. This approach can also provide portability to other domains without the need for explicit reprogramming; the user simply supplies the system with a new set of training examples. We show how the development of such a system requires a completely different set of skills from those required for applying machine learning to "toy world" domains. This paper discusses important aspects of the application process not commonly encountered in the "toy world", including obtaining labeled training data, the difficulties of working with pixel data, and the automatic extraction of higher-level features.
https://authors.library.caltech.edu/records/d3qhn-wd456
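The trainable, example-driven approach described in the JARtool record above can be illustrated with a minimal sketch. This is not the actual JARtool pipeline: the 5x5 patch size, the mean-template features, and the nearest-template classifier are illustrative assumptions standing in for the real system's feature extraction and detector. The point it demonstrates is the one the abstract makes: the "domain knowledge" is supplied entirely as labeled training examples, not as hand-coded pixel-level constraints.

```python
import numpy as np

def train_patch_classifier(patches, labels):
    """Learn one mean template per class from labeled image patches.

    patches: array of shape (n, h, w); labels: array of shape (n,).
    Returns a dict mapping each class label to its mean patch (flattened).
    """
    X = patches.reshape(len(patches), -1).astype(float)
    return {c: X[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify_patch(templates, patch):
    """Assign a patch to the class whose template is nearest (Euclidean)."""
    x = patch.reshape(-1).astype(float)
    return min(templates, key=lambda c: np.linalg.norm(x - templates[c]))

# Toy stand-in for geologist-labeled data: bright-centre "volcano" patches
# (class 1) versus flat noisy background patches (class 0).
rng = np.random.default_rng(0)
volcano = np.zeros((20, 5, 5))
volcano[:, 2, 2] = 1.0
volcano += 0.1 * rng.standard_normal(volcano.shape)
background = 0.1 * rng.standard_normal((20, 5, 5))

patches = np.concatenate([volcano, background])
labels = np.array([1] * 20 + [0] * 20)
templates = train_patch_classifier(patches, labels)

# An unseen bright-centre patch is matched to the "volcano" template.
test = np.zeros((5, 5))
test[2, 2] = 1.0
print(classify_patch(templates, test))  # prints 1
```

Retargeting the sketch to a new domain, as the abstract notes for JARtool, amounts to supplying a different set of labeled patches; nothing in the training or classification code changes.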