A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-Three Old and New Classification Algorithms
2000
Abstract

Cited by 225 (8 self)
Twenty-two decision tree, nine statistical, and two neural network algorithms are compared on thirty-two datasets in terms of classification accuracy, training time, and (in the case of trees) number of leaves. Classification accuracy is measured by mean error rate and mean rank of error rate. Both criteria place a statistical, spline-based algorithm called Polyclass at the top, although it is not statistically significantly different from twenty other algorithms. Another statistical algorithm, logistic regression, is second with respect to the two accuracy criteria. The most accurate decision tree algorithm is Quest with linear splits, which ranks fourth and fifth on the two criteria, respectively. Although spline-based statistical algorithms tend to have good accuracy, they also require relatively long training times. Polyclass, for example, is third from last in terms of median training time. It often requires hours of training compared to seconds for other algorithms. The Quest and logistic regression algor...
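The two accuracy criteria differ in how they aggregate results: mean error rate averages raw error rates across datasets, while mean rank first ranks the algorithms within each dataset and then averages those ranks. A minimal sketch of both summaries follows; giving tied algorithms the average of their ranks, and the toy error rates for the hypothetical algorithms "A", "B", and "C", are assumptions for illustration, not values or rules taken from the study.

```python
def average_ranks(values):
    """Rank values ascending (1 = lowest error); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # sorted positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def summarize(error_rates):
    """error_rates maps algorithm name -> list of error rates, one per dataset, same dataset order.

    Returns (mean error rate per algorithm, mean rank of error rate per algorithm).
    """
    names = list(error_rates)
    n_datasets = len(error_rates[names[0]])
    mean_error = {a: sum(errs) / n_datasets for a, errs in error_rates.items()}
    rank_totals = {a: 0.0 for a in names}
    for d in range(n_datasets):
        ranks = average_ranks([error_rates[a][d] for a in names])
        for a, r in zip(names, ranks):
            rank_totals[a] += r
    mean_rank = {a: total / n_datasets for a, total in rank_totals.items()}
    return mean_error, mean_rank


# Hypothetical error rates for three algorithms on three datasets.
toy = {
    "A": [0.10, 0.20, 0.30],
    "B": [0.12, 0.18, 0.30],
    "C": [0.15, 0.25, 0.35],
}
mean_error, mean_rank = summarize(toy)
print(mean_rank)  # {'A': 1.5, 'B': 1.5, 'C': 3.0}
```

Mean rank ignores how large the error differences are, whereas mean error rate is dominated by the datasets with the highest error rates; reporting both guards against either quirk.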
A longitudinal study of engineering student performance and retention. I. Success and failure in the introductory course
J. Engr. Education, 1993
Abstract

Cited by 179 (11 self)
A cohort of chemical engineering students has been taught in an experimental sequence of five chemical engineering courses, beginning with the introductory course in the Fall 1990 semester. Differences in academic performance have been observed between students from rural and small-town backgrounds (“rural students,” N=55) and students from urban and suburban backgrounds (“urban students,” N=65), with the urban students doing better on almost every measure investigated. In the introductory course, 80% of the urban students and 55% of the rural students passed with a grade of C or better, with average grades of 2.63 for the urban students and 1.80 for the rural students (A=4.0). The urban group continued to earn higher grades in subsequent chemical engineering courses. After four years, 79% of the urban students and 64% of the rural students had graduated or were still enrolled in chemical engineering; the others had either transferred out of engineering or were no longer attending the university. This paper presents data on the students’ home and school backgrounds and speculates on possible causes of the observed performance differences between the two populations. * Journal of Engineering Education, 83(3), 209–217 (1994). Charts in the published version have been converted to
Clustering categorical data: An approach based on dynamical systems
1998
Abstract

Cited by 176 (1 self)
We describe a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical data. By “categorical data,” we mean tables with fields that cannot be naturally ordered by a metric, e.g., the names of producers of automobiles, or the names of products offered by a manufacturer. Our approach is based on an iterative method for assigning and propagating weights on the categorical values in a table; this facilitates a type of similarity measure arising from the co-occurrence of values in the dataset. Our techniques can be studied analytically in terms of certain types of nonlinear dynamical systems. We discuss experiments on a variety of tables of synthetic and real data; we find that our iterative methods converge quickly to prominently correlated values of various categorical fields.
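One way to picture weight propagation over categorical values is sketched below: a simplified variant in which each value's new weight is the sum, over the rows containing it, of the weights of the other values in that row, followed by rescaling to unit length. The additive combining rule, the Euclidean normalization, the fixed iteration count, and the toy table are assumptions of this sketch; the paper's own combining rules and its dynamical-systems analysis may differ in these details.

```python
import math
from collections import defaultdict


def propagate_weights(rows, iterations=20):
    """Iteratively propagate weights over the categorical values of a table.

    Each node is a (column index, value) pair, so equal strings in different columns stay distinct.
    Each round, a node's new weight is the sum, over all rows containing it, of the weights of the
    other nodes in that row (an additive combining rule); weights are then rescaled to unit
    Euclidean length. Assumes rows have at least two columns.
    """
    nodes = {(j, v) for row in rows for j, v in enumerate(row)}
    weights = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        new_weights = defaultdict(float)
        for row in rows:
            row_nodes = list(enumerate(row))
            for node in row_nodes:
                new_weights[node] += sum(weights[other] for other in row_nodes if other != node)
        norm = math.sqrt(sum(w * w for w in new_weights.values()))
        if norm == 0:
            break  # nothing co-occurs; keep the previous weights
        weights = {node: new_weights[node] / norm for node in nodes}
    return weights


# Hypothetical two-column table: "a" and "x" co-occur with more of the table than "b" and "y".
table = [("a", "x"), ("a", "y"), ("b", "x")]
for node, w in sorted(propagate_weights(table).items()):
    print(node, round(w, 3))
```

In this toy table, the values "a" and "x" end up with larger weights than "b" and "y", reflecting the co-occurrence structure that the similarity measure builds on.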
Word sense disambiguation using a second language monolingual corpus
Computational Linguistics, 1994
Abstract

Cited by 166 (1 self)
This paper presents a new approach for resolving lexical ambiguities in one language using statistical data from a monolingual corpus of another language. This approach exploits the differences between mappings of words to senses in different languages. The paper concentrates on the problem of target word selection in machine translation, for which the approach is directly applicable. The presented algorithm identifies syntactic relations between words, using a source language parser, and maps the alternative interpretations of these relations to the target language, using a bilingual lexicon. The preferred senses are then selected according to statistics on lexical relations in the target language. The selection is based on a statistical model and on a constraint propagation algorithm, which simultaneously handles all ambiguities in the sentence. The method was evaluated using three sets of Hebrew and German examples and was found to be very useful for disambiguation. The paper includes a detailed comparative analysis of statistical sense disambiguation methods.
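The core selection step can be illustrated by choosing, for one syntactic relation between two source words, the pair of target-language translations whose relation is most frequent in a target-language corpus. This is a deliberately simplified sketch: the paper combines a statistical model with a constraint propagation algorithm over all ambiguities in a sentence, whereas here raw counts decide a single relation. The function name, the words, and the counts are hypothetical.

```python
from itertools import product


def select_translations(source_a_options, source_b_options, relation_counts):
    """Pick the pair of target-language words whose syntactic relation is most frequent.

    source_a_options / source_b_options: candidate translations of the two related source words.
    relation_counts: maps (target_a, target_b) -> count of that relation in a target-language corpus.
    Returns None when no candidate pair is attested, i.e. the corpus gives no evidence.
    """
    best_pair, best_count = None, 0
    for pair in product(source_a_options, source_b_options):
        count = relation_counts.get(pair, 0)
        if count > best_count:
            best_pair, best_count = pair, count
    return best_pair


# Hypothetical verb-object counts from a target-language corpus.
counts = {("sign", "treaty"): 12, ("seal", "treaty"): 2, ("sign", "contract"): 5}
print(select_translations(["sign", "seal"], ["treaty", "contract"], counts))  # ('sign', 'treaty')
```

Returning None rather than an arbitrary pair when nothing is attested leaves room for a fallback, which a fuller model would handle with explicit confidence estimates.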
Mplus: Statistical Analysis with Latent Variables (Version 4.21) [Computer software]
2007
Abstract

Cited by 162 (0 self)
Chapter 3: Regression and path analysis
Chapter 4: Exploratory factor analysis
Unbiased recursive partitioning: a conditional inference framework
J. Comput. Graph. Statist., 2006
Abstract

Cited by 154 (12 self)
Recursive binary partitioning is a popular tool for regression analysis. Two fundamental problems of the exhaustive search procedures usually applied to fit such models have been known for a long time: overfitting and a selection bias towards covariates with many possible splits or missing values. While pruning procedures are able to solve the overfitting problem, the variable selection bias still seriously affects the interpretability of tree-structured regression models. For some special cases, unbiased procedures have been suggested, but they lack a common theoretical foundation. We propose a unified framework for recursive partitioning which embeds tree-structured regression models into a well-defined theory of conditional inference procedures. Stopping criteria based on multiple test procedures are implemented, and it is shown that the predictive performance of the resulting trees is as good as that of established exhaustive search procedures. It turns out that the partitions, and therefore the models, induced by both approaches are structurally different, indicating the need for unbiased variable selection. The methodology presented here is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored, and multivariate response variables and arbitrary measurement scales of the covariates. Data from studies on animal abundance, glaucoma classification, node-positive breast cancer, and mammography experience are reanalyzed.
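The variable-selection and stopping step can be sketched as follows: test each covariate's association with the response by a permutation test, adjust the p-values for multiple testing, split on the covariate with the smallest adjusted p-value, and stop when none falls below alpha. This sketch assumes numeric covariates and a numeric response and uses absolute Pearson correlation with a Bonferroni adjustment as stand-ins; the paper's framework is more general (the abstract lists nominal, ordinal, censored, and multivariate responses), and the function names here are hypothetical.

```python
import random


def abs_correlation(x, y):
    """Absolute Pearson correlation; 0.0 if either variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_pvalue(x, y, n_perm, rng):
    """Monte Carlo p-value for independence of x and y under random permutations of y."""
    observed = abs_correlation(x, y)
    shuffled = list(y)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs_correlation(x, shuffled) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_perm + 1)


def select_split_variable(covariates, y, alpha=0.05, n_perm=999, seed=0):
    """Choose the covariate most strongly associated with y, or None to stop splitting.

    covariates maps name -> list of numeric values aligned with y.
    Returns (chosen name or None, Bonferroni-adjusted p-value per covariate).
    """
    rng = random.Random(seed)
    k = len(covariates)
    adjusted = {}
    for name, x in covariates.items():
        adjusted[name] = min(1.0, k * permutation_pvalue(x, y, n_perm, rng))
    best = min(adjusted, key=adjusted.get)
    if adjusted[best] > alpha:
        return None, adjusted  # global null not rejected: this node becomes a leaf
    return best, adjusted


x1 = list(range(30))
y = [2 * v + 1 for v in x1]              # y depends on x1 only
x2 = [(7 * i) % 11 for i in range(30)]   # unrelated covariate
chosen, pvals = select_split_variable({"x1": x1, "x2": x2}, y, n_perm=199)
print(chosen)  # x1
```

Choosing the variable by p-value, rather than by the best split it can achieve, is what keeps covariates with many candidate split points from being favored; picking the split point within the chosen covariate is a separate step not shown here.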
Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data
2004
Abstract

Cited by 152 (16 self)
This chapter gives an overview of recent advances in latent variable analysis. Emphasis is placed on the strength of modeling obtained by using a flexible combination of continuous and categorical latent variables.
Integrating person-centered and variable-centered analyses: Growth mixture modeling
SECTION V/MODELS FOR LATENT VARIABLES with, 2000
Abstract

Cited by 136 (12 self)
Background: Many alcohol research questions require methods that take a person-centered approach because the interest is in finding heterogeneous groups of individuals, such as those who are susceptible to alcohol dependence and those who are not. A person-centered focus is also useful with longitudinal data to represent heterogeneity in developmental trajectories. In alcohol, drug, and mental health research, the recognition of heterogeneity has led to theories of multiple developmental pathways. Methods: This paper gives a brief overview of new methods that integrate variable- and person-centered analyses. Methods discussed include latent class analysis, latent transition analysis, latent class growth analysis, growth mixture modeling, and general growth mixture modeling. These methods are presented in a general latent variable modeling framework that expands traditional latent variable modeling by including not only continuous latent variables but also categorical latent variables. Results: Four examples that use the National Longitudinal Survey of Youth (NLSY) data are presented to illustrate latent class analysis, latent class growth analysis, growth mixture modeling, and general growth mixture modeling. Latent class analysis of antisocial behavior found four classes. Four heavy-drinking trajectory classes were found. The relationship between the latent classes and background variables and consequences was studied.
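Latent class analysis, the first method listed, can be illustrated with a small expectation-maximization fit for binary indicators. This is a minimal sketch under stated assumptions (binary items, items independent within a class, random starting values, a fixed iteration count, and hypothetical toy data); it is not the estimation procedure or software used in the paper, and it omits covariates, growth factors, and choosing the number of classes.

```python
import math
import random


def lca_em(data, n_classes, n_iter=300, seed=0, eps=1e-6):
    """Fit a latent class model with binary indicators by expectation-maximization.

    data: list of equal-length lists of 0/1 responses.
    Returns (class_probs, item_probs, log_likelihood), where item_probs[c][j] estimates
    P(item j = 1 | class c). The log-likelihood is the one computed in the final E-step.
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    class_probs = [1.0 / n_classes] * n_classes
    # Random starting values keep the classes from being identical.
    item_probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)] for _ in range(n_classes)]
    log_likelihood = 0.0
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each respondent,
        # assuming items are independent given the class (local independence).
        posteriors = []
        log_likelihood = 0.0
        for row in data:
            joint = []
            for c in range(n_classes):
                p = class_probs[c]
                for j, response in enumerate(row):
                    p *= item_probs[c][j] if response else 1.0 - item_probs[c][j]
                joint.append(p)
            total = sum(joint)
            log_likelihood += math.log(total)
            posteriors.append([p / total for p in joint])
        # M-step: update class sizes and class-specific item probabilities.
        for c in range(n_classes):
            weight = max(sum(post[c] for post in posteriors), eps)
            class_probs[c] = weight / len(data)
            for j in range(n_items):
                endorsed = sum(post[c] * row[j] for post, row in zip(posteriors, data))
                # Clamp away from 0 and 1 so no response pattern gets zero probability.
                item_probs[c][j] = min(max(endorsed / weight, eps), 1.0 - eps)
    return class_probs, item_probs, log_likelihood


# Hypothetical responses to four binary items from two clearly separated groups.
data = [[1, 1, 1, 1]] * 40 + [[1, 1, 1, 0]] * 10 + [[0, 0, 0, 0]] * 40 + [[0, 0, 0, 1]] * 10
class_probs, item_probs, ll = lca_em(data, n_classes=2)
print([round(p, 2) for p in class_probs])
```

In practice, EM for mixture models is usually run from several random starts, keeping the solution with the highest log-likelihood, because a single start can stop at a local maximum.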
Ontology Matching: A Machine Learning Approach
Handbook on Ontologies in Information Systems, 2003
Abstract

Cited by 128 (2 self)
Finally, we describe a set of experiments on several real-world domains, and show that GLUE proposes highly accurate semantic mappings.
1 A Motivating Example: the Semantic Web
The current World-Wide Web has well over 1.5 billion pages [2], but the vast majority of them are in human-readable format only (e.g., HTML). [Footnote: Work done while the author was at the University of Washington, Seattle.] [Running head: AnHai Doan et al.] As a consequence, software agents (softbots) cannot understand and process this information, and much of the potential of the Web has so far remained untapped. In response, researchers have created the vision of the Semantic Web [5], where data has structure and ontologies describe the semantics of the data. When data is marked up using ontologies, softbots can better understand the semantics and therefore more intelligently locate and integrate data for a wide variety of tasks. The following example illustrates the vision of the Semantic Web.
Example 1. Suppose you want to fi