Results 1  10
of
522
Flexible smoothing with Bsplines and penalties
 Statistical Science
, 1996
"... Bsplines are attractive for nonparametric modelling, but choosing the optimal number and positions of knots is a complex task. Equidistant knots can be used, but their small and discrete number allows only limited control over smoothness and fit. We propose to use a relatively large number of knots ..."
Abstract

Cited by 236 (6 self)
 Add to MetaCart
Bsplines are attractive for nonparametric modelling, but choosing the optimal number and positions of knots is a complex task. Equidistant knots can be used, but their small and discrete number allows only limited control over smoothness and fit. We propose to use a relatively large number of knots and a difference penalty on coefficients of adjacent Bsplines. We show connections to the familiar spline penalty on the integral of the squared second derivative. A short overview of Bsplines, their construction, and penalized likelihood is presented. We discuss properties of penalized Bsplines and propose various criteria for the choice of an optimal penalty parameter. Nonparametric logistic regression, density estimation and scatterplot smoothing are used as examples. Some details of the computations are presented. Keywords: Generalized linear models, smoothing, nonparametric models, splines, density estimation. Address for correspondence: DCMR Milieudienst Rijnmond, 'sGravelandse...
Probabilistic independence networks for hidden Markov probability models
, 1996
"... Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics. Formalisms for manipulating these models have been develop ..."
Abstract

Cited by 173 (12 self)
 Add to MetaCart
Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics. Formalisms for manipulating these models have been developed relatively independently in these research communities. In this paper we explore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independence networks (PINs). The paper contains a selfcontained review of the basic principles of PINs. It is shown that the wellknown forwardbackward (FB) and Viterbi algorithms for HMMs are special cases of more general inference algorithms for arbitrary PINs. Furthermore, the existence of inference and estimation algorithms for more general graphical models provides a set of analysis tools for HMM practitioners who wish to explore a richer class of HMM structures. Examples of relatively complex models to handle sensor fusion and coarticulation in speech recognition are introduced and treated within the graphical model framework to illustrate the advantages of the general approach.
WordSense Disambiguation Using Decomposable Models
 In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics
, 1994
"... Most probabilistic classifiers used for wordsense disambiguation have either been based on only one contextual feature or have used a model that is simply assumed to characterize the interdependencies among multiple contextual features. In this paper, a different approach to formulating a probabili ..."
Abstract

Cited by 142 (19 self)
 Add to MetaCart
Most probabilistic classifiers used for wordsense disambiguation have either been based on only one contextual feature or have used a model that is simply assumed to characterize the interdependencies among multiple contextual features. In this paper, a different approach to formulating a probabilistic model is presented along with a case study of the performance of models produced in this manner for the disambiguafion of the noun interest. We describe a method for formulating probabilistic models that use multiple contextual features for wordsense disambiguafion, without requiring untested assumptions regarding the form of the model. Using this approach, the joint distribution of all variables is described by only the most systematic variable interactions, thereby limiting the number of parameters to be estimated, supporting computational efficiency, and providing an understanding of the data.
Pruning Adaptive Boosting
, 1997
"... The boosting algorithm AdaBoost, developed by Freund and Schapire, has exhibited outstanding performance on several benchmark problems when using C4.5 as the "weak" algorithm to be "boosted." Like other ensemble learning approaches, AdaBoost constructs a composite hypothesis by v ..."
Abstract

Cited by 114 (2 self)
 Add to MetaCart
The boosting algorithm AdaBoost, developed by Freund and Schapire, has exhibited outstanding performance on several benchmark problems when using C4.5 as the "weak" algorithm to be "boosted." Like other ensemble learning approaches, AdaBoost constructs a composite hypothesis by voting many individual hypotheses. In practice, the large amount of memory required to store these hypotheses can make ensemble methods hard to deploy in applications. This paper shows that by selecting a subset of the hypotheses, it is possible to obtain nearly the same levels of performance as the entire set. The results also provide some insight into the behavior of AdaBoost.
Detecting group differences: Mining contrast sets
 Data Mining and Knowledge Discovery
, 2001
"... A fundamental task in data analysis is understanding the differences between several contrasting groups. These groups can represent different classes of objects, such as male or female students, or the same group over time, e.g. freshman students in 1993 through 1998. We present the problem of mini ..."
Abstract

Cited by 84 (3 self)
 Add to MetaCart
A fundamental task in data analysis is understanding the differences between several contrasting groups. These groups can represent different classes of objects, such as male or female students, or the same group over time, e.g. freshman students in 1993 through 1998. We present the problem of mining contrast sets: conjunctions of attributes and values that differ meaningfully in their distribution across groups. We provide a search algorithm for mining contrast sets with pruning rules that drastically reduce the computational complexity. Once the contrast sets are found, we postprocess the results to present a subset that are surprising to the user given what we have already shown. We explicitly control the probability of Type I error (false positives) and guarantee a maximum error rate for the entire analysis by using Bonferroni corrections.
Discoverydriven Exploration of OLAP Data Cubes
 In Proc. Int. Conf. of Extending Database Technology (EDBT'98
, 1998
"... . Analysts predominantly use OLAP data cubes to identify regions of anomalies that may represent problem areas or new opportunities. The current OLAP systems support hypothesisdriven exploration of data cubes through operations such as drilldown, rollup, and selection. Using these operations, an ..."
Abstract

Cited by 84 (2 self)
 Add to MetaCart
. Analysts predominantly use OLAP data cubes to identify regions of anomalies that may represent problem areas or new opportunities. The current OLAP systems support hypothesisdriven exploration of data cubes through operations such as drilldown, rollup, and selection. Using these operations, an analyst navigates unaided through a huge search space looking at large number of values to spot exceptions. We propose a new discoverydriven exploration paradigm that mines the data for such exceptions and summarizes the exceptions at appropriate levels in advance. It then uses these exceptions to lead the analyst to interesting regions of the cube during navigation. We present the statistical foundation underlying our approach. We then discuss the computational issue of finding exceptions in data and making the process efficient on large multidimensional data bases. 1 Introduction OnLine Analytical Processing (OLAP) characterizes the operations of summarizing, consolidating, viewing, a...
Information Geometry on Hierarchy of Probability Distributions
, 2001
"... An exponential family or mixture family of probability distributions has a natural hierarchical structure. This paper gives an “orthogonal” decomposition of such a system based on information geometry. A typical example is the decomposition of stochastic dependency among a number of random variables ..."
Abstract

Cited by 73 (6 self)
 Add to MetaCart
An exponential family or mixture family of probability distributions has a natural hierarchical structure. This paper gives an “orthogonal” decomposition of such a system based on information geometry. A typical example is the decomposition of stochastic dependency among a number of random variables. In general, they have a complex structure of dependencies. Pairwise dependency is easily represented by correlation, but it is more difficult to measure effects of pure triplewise or higher order interactions (dependencies) among these variables. Stochastic dependency is decomposed quantitatively into an “orthogonal” sum of pairwise, triplewise, and further higher order dependencies. This gives a new invariant decomposition of joint entropy. This problem is important for extracting intrinsic interactions in firing patterns of an ensemble of neurons and for estimating its functional connections. The orthogonal decomposition is given in a wide class of hierarchical structures including both exponential and mixture families. As an example, we decompose the dependency in a higher order Markov chain into a sum of those in various lower order Markov chains.
Advanced Methods For Record Linkage
, 1994
"... s Service. The study showed that the fewest errors typically occur at the beginning of a string and the error rates by character position increase monotonically as the position moves to the right. The enhancement basically consisted of adjusting the string comparator value upward by a fixed amount i ..."
Abstract

Cited by 69 (15 self)
 Add to MetaCart
s Service. The study showed that the fewest errors typically occur at the beginning of a string and the error rates by character position increase monotonically as the position moves to the right. The enhancement basically consisted of adjusting the string comparator value upward by a fixed amount if the first four characters agreed; by lesser amounts if the first three, two, or one characters agreed. The string comparator examined by Budzkinsky (1991) consisted of the Jaro comparator with only the Winkler enhancement. The final enhancement due to Lynch and Winkler (1994) adjusts the string comparator value if the strings are longer than six characters and more than half the characters beyond the first four 4 agree. The final enhancement was based on detailed comparisons between versions of the comparator. The comparisons involved tens of thousands of pairs of last names, first names, and street names that did not agree on a characterbycharacter basis but were associated with truly...