Results 1 - 10 of 2,308
A Program for Aligning Sentences in Bilingual Corpora, 1993
"... This paper will describe a method and a program (align) for aligning sentences based on a simple statistical model of character lengths. The program uses the fact that longer sentences in one language tend to be translated into longer sentences in the other language, and that shorter sentences tend ..."
Abstract - Cited by 529 (5 self)
This paper will describe a method and a program (align) for aligning sentences based on a simple statistical model of character lengths. The program uses the fact that longer sentences in one language tend to be translated into longer sentences in the other language, and that shorter sentences tend to be translated into shorter sentences. A probabilistic score is assigned to each proposed correspondence of sentences, based on the scaled difference of lengths of the two sentences (in characters) and the variance of this difference. This probabilistic score is used in a dynamic programming framework to find the maximum likelihood alignment of sentences. It is remarkable that such a simple approach works as well as it does. An evaluation was performed based on a trilingual corpus of economic reports issued by the Union Bank of Switzerland (UBS) in English, French, and German. The method correctly aligned all but 4% of the sentences. Moreover, it is possible to extract a large subcorpus that has a much smaller error rate. By selecting the best-scoring 80% of the alignments, the error rate is reduced from 4% to 0.7%. There were more errors on the English-French subcorpus than on the English-German subcorpus, showing that error rates will depend on the corpus considered; however, both were small enough to hope that the method will be useful for many language pairs. To further research on bilingual corpora, a much larger sample of Canadian Hansards (approximately 90 million words, half in English and half in French) has been aligned with the align program and will be available through the Data Collection Initiative of the Association for Computational Linguistics (ACL/DCI). In addition, in order to facilitate replication of the align program, an appendix is provided with ...
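To make the length-based scoring concrete, here is a rough sketch (mine, not the published align program): each candidate sentence pairing is costed by the squared, scaled difference of character lengths, and dynamic programming picks the lowest-cost monotone alignment. The parameter values c and s2, the fixed skip penalty, and the restriction to 1-1, 1-0 and 0-1 moves are simplifying assumptions for illustration only.

    import math

    def match_cost(l1, l2, c=1.0, s2=6.8):
        """Cost of pairing two sentences, from the scaled length difference.

        c and s2 are illustrative parameters (expected length ratio and
        variance per character); the squared z-score stands in for a
        -log probability score.
        """
        if l1 == 0 and l2 == 0:
            return 0.0
        delta = (l2 - c * l1) / math.sqrt(s2 * max(l1, 1))
        return 0.5 * delta * delta

    def align(src_lens, tgt_lens, skip_cost=8.0):
        """Minimum-cost monotone alignment over 1-1, 1-0 and 0-1 moves."""
        n, m = len(src_lens), len(tgt_lens)
        INF = float("inf")
        cost = [[INF] * (m + 1) for _ in range(n + 1)]
        back = [[None] * (m + 1) for _ in range(n + 1)]
        cost[0][0] = 0.0
        for i in range(n + 1):
            for j in range(m + 1):
                if cost[i][j] == INF:
                    continue
                moves = []
                if i < n and j < m:   # 1-1: pair next source and target sentence
                    moves.append((i + 1, j + 1, match_cost(src_lens[i], tgt_lens[j])))
                if i < n:             # 1-0: source sentence left unmatched
                    moves.append((i + 1, j, skip_cost))
                if j < m:             # 0-1: target sentence left unmatched
                    moves.append((i, j + 1, skip_cost))
                for ni, nj, step in moves:
                    if cost[i][j] + step < cost[ni][nj]:
                        cost[ni][nj] = cost[i][j] + step
                        back[ni][nj] = (i, j)
        # Trace back the chosen path; each step is one of the three moves above.
        path, ij = [], (n, m)
        while ij != (0, 0):
            path.append(ij)
            ij = back[ij[0]][ij[1]]
        return list(reversed(path)), cost[n][m]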
A review of methods for the assessment of prediction errors in conservation presence/absence models. Environmental Conservation, 1997
"... Summary Predicting the distribution of endangered species from habitat data is frequently perceived to be a useful technique. Models that predict the presence or absence of a species are normally judged by the number of prediction errors. These may be of two types: false positives and false negativ ..."
Abstract - Cited by 463 (1 self)
Summary Predicting the distribution of endangered species from habitat data is frequently perceived to be a useful technique. Models that predict the presence or absence of a species are normally judged by the number of prediction errors. These may be of two types: false positives and false negatives. Many of the prediction errors can be traced to ecological processes such as unsaturated habitat and species interactions. Consequently, if prediction errors are not placed in an ecological context the results of the model may be misleading. The simplest, and most widely used, measure of prediction accuracy is the number of correctly classified cases. There are other measures of prediction success that may be more appropriate. Strategies for assessing the causes and costs of these errors are discussed. A range of techniques for measuring error in presence/absence models, including some that are seldom used by ecologists (e.g. ROC plots and cost matrices), are described. A new approach to estimating prediction error, which is based on the spatial characteristics of the errors, is proposed. Thirteen recommendations are made to enable the objective selection of an error assessment technique for ecological presence/absence models.
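As an illustration of these measures (a sketch under my own assumptions, not code from the paper), the snippet below derives the correct classification rate, sensitivity and specificity from the false positive/negative counts, an expected misclassification cost under placeholder per-error costs, and a rank-based ROC AUC for a presence/absence model.

    import numpy as np

    def assess(y_true, y_prob, threshold=0.5, cost_fp=1.0, cost_fn=5.0):
        """Basic error measures for a presence/absence model.

        y_true: 0/1 observed absence/presence; y_prob: predicted probability
        of presence. cost_fp and cost_fn are illustrative per-error costs.
        """
        y_true = np.asarray(y_true)
        y_prob = np.asarray(y_prob)
        y_pred = (y_prob >= threshold).astype(int)
        tp = np.sum((y_pred == 1) & (y_true == 1))
        tn = np.sum((y_pred == 0) & (y_true == 0))
        fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
        fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives
        n = len(y_true)
        accuracy = (tp + tn) / n                     # correctly classified cases
        sensitivity = tp / (tp + fn) if tp + fn else float("nan")
        specificity = tn / (tn + fp) if tn + fp else float("nan")
        expected_cost = (cost_fp * fp + cost_fn * fn) / n
        # Rank-based AUC: probability a random presence outranks a random absence
        pos, neg = y_prob[y_true == 1], y_prob[y_true == 0]
        auc = np.mean(pos[:, None] > neg[None, :]) + 0.5 * np.mean(pos[:, None] == neg[None, :])
        return dict(accuracy=accuracy, sensitivity=sensitivity,
                    specificity=specificity, expected_cost=expected_cost, auc=auc)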
Decision Combination in Multiple Classifier Systems. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 16, NO. 1, JANUARY 1994
"... A multiple classifier system is a powerful solution to difficult pattern recognition problems involving large class sets and noisy input because it allows simultaneous use of arbitrary feature descriptors and classification procedures. Decisions by the classifiers can be represented as rankings of ..."
Abstract - Cited by 377 (5 self)
A multiple classifier system is a powerful solution to difficult pattern recognition problems involving large class sets and noisy input because it allows simultaneous use of arbitrary feature descriptors and classification procedures. Decisions by the classifiers can be represented as rankings of classes so that they are comparable across different types of classifiers and different instances of a problem. The rankings can be combined by methods that either reduce or rerank a given set of classes. An intersection method and a union method are proposed for class set reduction. Three methods based on the highest rank, the Borda count, and logistic regression are proposed for class set reranking. These methods have been tested in applications on degraded machine-printed characters and words from large lexicons, resulting in substantial improvement in overall correctness.
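A minimal sketch of one of the proposed reranking rules, the Borda count (my code, not the authors'): each classifier's ranking gives a class one point per class ranked below it, and the summed points determine the combined ranking. The example class labels are placeholders.

    from collections import defaultdict

    def borda_combine(rankings):
        """Combine class rankings from several classifiers by Borda count.

        rankings: list of lists, each ordered best-first over the same class set.
        Returns the combined ranking, best-first.
        """
        scores = defaultdict(float)
        for ranking in rankings:
            n = len(ranking)
            for position, cls in enumerate(ranking):
                scores[cls] += n - 1 - position   # points = classes ranked below
        return sorted(scores, key=scores.get, reverse=True)

    # Example: three classifiers ranking four candidate classes (hypothetical)
    print(borda_combine([["cat", "car", "cap", "can"],
                         ["car", "cat", "can", "cap"],
                         ["cat", "can", "car", "cap"]]))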
Meta-analysis of theory-of-mind development: The truth about false belief. Child Development, 2001
"... Research on theory of mind increasingly encompasses apparently contradictory findings. In particular, in initial studies, older preschoolers consistently passed false-belief tasks-a so-called "definitive" test of mentalstate understanding-whereas younger children systematically erred. Mor ..."
Abstract - Cited by 280 (7 self)
Research on theory of mind increasingly encompasses apparently contradictory findings. In particular, in initial studies, older preschoolers consistently passed false-belief tasks, a so-called "definitive" test of mental-state understanding, whereas younger children systematically erred. More recent studies, however, have found evidence of false-belief understanding in 3-year-olds or have demonstrated conditions that improve children's performance. A meta-analysis was conducted (N = 178 separate studies) to address the empirical inconsistencies and theoretical controversies. When organized into a systematic set of factors that vary across studies, false-belief results cluster systematically with the exception of only a few outliers. A combined model that included age, country of origin, and four task factors (e.g., whether the task objects were transformed in order to deceive the protagonist or not) yielded a multiple R of .74 and an R^2 of .55; thus, the model accounts for 55% of the variance in false-belief performance. Moreover, false-belief performance showed a consistent developmental pattern, even across various countries and various task manipulations: preschoolers went from below-chance performance to above-chance performance. The findings are inconsistent with early competence proposals that claim that developmental changes are due to task artifacts, and thus disappear in simpler, revised false-belief tasks; and are, instead, consistent with theoretical accounts that propose that understanding of belief, and, relatedly, understanding of mind, exhibit genuine conceptual change in the preschool years.
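A one-line check of the variance-explained figure (my arithmetic, not the paper's analysis): squaring the reported multiple correlation gives the proportion of variance accounted for.

    R = 0.74                 # reported multiple correlation
    print(round(R ** 2, 2))  # 0.55: share of variance in false-belief performance explained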
Maximum Entropy Models for Natural Language Ambiguity Resolution, 1998
"... The best aspect of a research environment, in my opinion, is the abundance of bright people with whom you argue, discuss, and nurture your ideas. I thank all of the people at Penn and elsewhere who have given me the feedback that has helped me to separate the good ideas from the bad ideas. I hope th ..."
Abstract - Cited by 234 (1 self)
The best aspect of a research environment, in my opinion, is the abundance of bright people with whom you argue, discuss, and nurture your ideas. I thank all of the people at Penn and elsewhere who have given me the feedback that has helped me to separate the good ideas from the bad ideas. I hope that I have kept the good ideas in this thesis, and left the bad ideas out! I would like to acknowledge the following people for their contribution to my education: I thank my advisor Mitch Marcus, who gave me the intellectual freedom to pursue what I believed to be the best way to approach natural language processing, and also gave me direction when necessary. I also thank Mitch for many fascinating conversations, both personal and professional, over the last four years at Penn. I thank all of my thesis committee members: John Lafferty from Carnegie Mellon University, Aravind Joshi, Lyle Ungar, and Mark Liberman, for their extremely valuable suggestions and comments about my thesis research. I thank Mike Collins, Jason Eisner, and Dan Melamed, with whom I've had many stimulating and impromptu discussions in the LINC lab. I owe them much gratitude for their valuable feedback on numerous rough drafts of papers and thesis chapters.
Exploring the Relationships between Design Measures and Software Quality in Object-Oriented Systems, 1998
"... The first goal of this paper is to empirically explore the relationships between existing object-oriented coupling, cohesion, and inheritance measures and the probability of fault detection in system classes during testing. In other words, we wish to better understand the relationship between exi ..."
Abstract - Cited by 188 (10 self)
The first goal of this paper is to empirically explore the relationships between existing object-oriented coupling, cohesion, and inheritance measures and the probability of fault detection in system classes during testing. In other words, we wish to better understand the relationship between existing design measurement in OO systems and the quality of the software developed. The second goal is to propose an investigation and analysis strategy to make these kinds of studies more repeatable and comparable, a problem which is pervasive in the literature on quality measurement. Results show that many of the measures capture similar dimensions in the data set, thus reflecting the fact that many of them are based on similar principles and hypotheses. However, it is shown that by using a subset of measures, accurate models can be built to predict which classes contain most of the existing faults. When predicting fault-prone classes, the best model shows a percentage of correct clas...
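As a schematic of the kind of prediction model evaluated here (a sketch under assumed data, not the authors' analysis), the snippet fits a logistic regression on per-class design measures, with stand-in coupling, cohesion, and inheritance-depth columns, and ranks classes by predicted fault-proneness; the toy values and the scikit-learn usage are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Stand-in design measures per class: coupling, cohesion, inheritance depth
    # (columns are hypothetical; real studies use measures such as CBO, LCOM, DIT)
    X = np.array([[4, 0.2, 1], [12, 0.7, 3], [2, 0.1, 1], [9, 0.9, 4], [15, 0.8, 2]])
    y = np.array([0, 1, 0, 1, 1])   # 1 = at least one fault found during testing

    model = LogisticRegression().fit(X, y)     # fit and score on same data, for illustration only
    fault_prob = model.predict_proba(X)[:, 1]  # predicted fault-proneness
    ranking = np.argsort(-fault_prob)          # classes to inspect first
    print(ranking, np.round(fault_prob, 2))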
A New Approach to Measuring Financial Contagion, 2002
"... contagion captures the co-incidence of extreme return shocks across countries within a region the extent of contagion, its economic significance, and its determinants using a multinomial 1990s, we find that contagion, when measured by the co-incidence within and across regions of changes, and condit ..."
Abstract - Cited by 185 (4 self)
... contagion captures the co-incidence of extreme return shocks across countries within a region ... the extent of contagion, its economic significance, and its determinants using a multinomial ... 1990s, we find that contagion, when measured by the co-incidence within and across regions of ... changes, and conditional stock return volatility. Evidence that contagion is stronger for extreme ... September 2001

Associate Professor, College of Business Administration, Korea University; ** Professor of Finance and Dean's Distinguished Research Professor, Fisher College of Business, Ohio State University; *** Professor of Finance and Reese Chair of Banking and Monetary Economics, Fisher College of Business, Ohio State University, and Research Associate, National Bureau of Economic Research. The authors are grateful to the Dice Center for Research on Financial Economics for support. We thank Tom Santner, Mark Berliner, Bob Leone, and Stan Lemeshow for useful discussions on methodology, and Steve Cecchetti, Peter Christoffersen, Craig Doidge, Barry Eichengreen, Vihang Errunza, David Hirshleifer, Roberto Rigobon, Richard Roll, Karen Wruck and, especially, an anonymous referee and the editor, Cam Harvey, for comments. Comments from seminar participants at Hong Kong University of Science and Technology, Korea University, McGill University, Yale University, Michigan State University, Universiteit Maastricht, Ohio State University, Rice University, the Monte Verita Risk Management Conference (Ascona, Switzerland), the Federal Reserve Bank of Chicago Annual Conference on Bank Structure and Competition, the Global Investment Conference on International Investing (Whistler), and the NYSE Conference on Global Equity Markets in Transition (Hawaii) improved the paper.
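To illustrate the co-incidence idea in the fragment above (a sketch under my own assumptions, not the authors' specification), the snippet counts, for each day, how many countries in a region have returns in their bottom 5% tail; the distribution of such co-exceedance counts is the kind of outcome a multinomial model could then be fit to. The 5% cutoff and the simulated returns are placeholders.

    import numpy as np

    def coexceedance_counts(returns, tail=0.05):
        """Count, per day, how many countries jointly hit their lower tail.

        returns: array of shape (days, countries); tail=0.05 is an assumed cutoff.
        """
        thresholds = np.quantile(returns, tail, axis=0)   # per-country 5% cutoff
        extreme = returns <= thresholds                    # bottom-tail indicator
        return extreme.sum(axis=1)                         # co-exceedances per day

    rng = np.random.default_rng(0)
    daily = rng.standard_normal((500, 6))                  # toy returns, 6 countries
    counts = coexceedance_counts(daily)
    print(np.bincount(counts, minlength=7))                # days with 0, 1, ..., 6 joint extremes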
Identification of regulatory regions which confer muscle-specific gene expression. J. Mol. Biol., 1998
"... For many newly sequenced genes, sequence analysis of the putative protein yields no clue on function. It would be bene®cial to be able to identify in the genome the regulatory regions that confer temporal and spatial expression patterns for the uncharacterized genes. Additionally, it would be advant ..."
Abstract - Cited by 181 (13 self)
For many newly sequenced genes, sequence analysis of the putative protein yields no clue on function. It would be beneficial to be able to identify in the genome the regulatory regions that confer temporal and spatial expression patterns for the uncharacterized genes. Additionally, it would be advantageous to identify regulatory regions within genes of known expression pattern without performing the costly and time consuming laboratory studies now required. To achieve these goals, the wealth of case studies performed over the past 15 years will have to be collected into predictive models of expression. Extensive studies of genes expressed in skeletal muscle have identified specific transcription factors which bind to regulatory elements to control gene expression. However, potential binding sites for these factors occur with sufficient frequency that it is rare for a gene to be found without one. Analysis of experimentally determined muscle regulatory sequences indicates that muscle expression requires multiple elements in close proximity. A model is generated with predictive capability for identifying these muscle-specific regulatory modules. Phylogenetic footprinting, the identification of sequences conserved between distantly related species, complements the statistical predictions. Through the use of logistic regression analysis, the model promises to be easily modified to take advantage of the elucidation of additional factors, cooperation rules, and spacing constraints.
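As a schematic of a logistic-regression module model (my sketch; the motif patterns, window, and coefficients are placeholders, not the paper's fitted model), the snippet counts candidate muscle factor binding motifs, an E-box and a CArG box, in a sequence window and passes the counts through a logistic function to score the window.

    import re
    import math

    # Candidate muscle regulatory motifs (illustrative patterns, not the paper's set):
    # E-box (MyoD family) and CArG box (SRF); their counts feed a logistic model.
    MOTIFS = {"ebox": r"CA..TG", "carg": r"CC[AT]{6}GG"}

    # Placeholder coefficients standing in for a fitted logistic regression
    INTERCEPT = -3.0
    WEIGHTS = {"ebox": 0.8, "carg": 1.5}

    def module_score(window):
        """Score a sequence window as a putative muscle regulatory module."""
        counts = {name: len(re.findall(pat, window.upper()))
                  for name, pat in MOTIFS.items()}
        z = INTERCEPT + sum(WEIGHTS[name] * counts[name] for name in MOTIFS)
        return 1.0 / (1.0 + math.exp(-z)), counts

    prob, counts = module_score("ttCAGCTGaaCCAATTTAGGttCACCTGcc")
    print(counts, round(prob, 3))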
A tutorial on MM algorithms. Amer. Statist., 2004
"... Most problems in frequentist statistics involve optimization of a function such as a likelihood or a sum of squares. EM algorithms are among the most effective algorithms for maximum likelihood estimation because they consistently drive the likelihood uphill by maximizing a simple surrogate function ..."
Abstract - Cited by 154 (6 self)
Most problems in frequentist statistics involve optimization of a function such as a likelihood or a sum of squares. EM algorithms are among the most effective algorithms for maximum likelihood estimation because they consistently drive the likelihood uphill by maximizing a simple surrogate function for the log-likelihood. Iterative optimization of a surrogate function as exemplified by an EM algorithm does not necessarily require missing data. Indeed, every EM algorithm is a special case of the more general class of MM optimization algorithms, which typically exploit convexity rather than missing data in majorizing or minorizing an objective function. In our opinion, MM algorithms deserve to be part of the standard toolkit of professional statisticians. The current article explains the principle behind MM algorithms, suggests some methods for constructing them, and discusses some of their attractive features. We include numerous examples throughout the article to illustrate the concepts described. In addition to surveying previous work on MM algorithms, this article introduces some new material on constrained optimization and standard error estimation. Key words and phrases: constrained optimization, EM algorithm, majorization, minorization, Newton-Raphson
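A small concrete instance of the MM principle (my example, not taken from the article): for least-absolute-deviations regression, the absolute residual |r| is majorized at the current iterate r_k by the quadratic r^2/(2|r_k|) + |r_k|/2, so each MM step reduces to a weighted least-squares solve and drives the LAD objective downhill monotonically.

    import numpy as np

    def lad_mm(X, y, iters=50, eps=1e-8):
        """Least-absolute-deviations fit by an MM algorithm.

        Each iteration majorizes |r_i| by r_i**2 / (2*|r_i_old|) + |r_i_old|/2,
        so the update is a weighted least-squares solve; the LAD objective
        decreases monotonically, in the spirit of an EM step without missing data.
        """
        beta = np.linalg.lstsq(X, y, rcond=None)[0]   # ordinary least-squares start
        for _ in range(iters):
            r = y - X @ beta
            w = 1.0 / np.maximum(np.abs(r), eps)      # majorizer weights
            WX = X * w[:, None]
            beta = np.linalg.solve(X.T @ WX, WX.T @ y)
        return beta

    rng = np.random.default_rng(1)
    X = np.column_stack([np.ones(200), rng.standard_normal(200)])
    y = 2.0 + 3.0 * X[:, 1] + rng.standard_t(df=2, size=200)   # heavy-tailed noise
    print(np.round(lad_mm(X, y), 2))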