A Systematic Comparison of Various Statistical Alignment Models
- Computational Linguistics, 2003
Cited by 805 (22 self)
this article the problem of finding the word alignment of a bilingual sentence-aligned corpus by using language-independent statistical methods. There is a vast literature on this topic, and many different systems have been suggested to solve this problem. Our work follows and extends the methods introduced by Brown, Della Pietra, Della Pietra, and Mercer (1993) by using refined statistical models for the translation process. The basic idea of this approach is to develop a model of the translation process with the word alignment as a hidden variable of this process, to apply statistical estimation theory to compute the "optimal" model parameters, and to perform alignment search to compute the best word alignment
An interactive clustering-based approach to integrating source query interfaces on the deep web
- In SIGMOD, 2004
Cited by 73 (14 self)
An increasing number of data sources are now becoming available on the Web, but often their contents are only accessible through query interfaces. For a domain of interest, there often exist many such sources with varied coverage or querying capabilities. As an important step to the integration of these sources, we consider the integration of their query interfaces. More specifically, we focus on the crucial step of the integration: accurately matching the interfaces. While the integration of query interfaces has received more attention recently, current approaches are not sufficiently general: (a) they all model interfaces with flat schemas; (b) most of them only consider 1:1 mappings of fields over the interfaces; (c) they all perform the integration in a blackbox-like fashion and the whole process has to be restarted from scratch if anything goes wrong; and (d) they often require laborious parameter tuning. In this paper, we propose an interactive, clustering-based approach to matching query interfaces. The hierarchical nature of interfaces is captured with ordered trees. Varied types of complex mappings of fields are examined and several approaches are proposed to effectively identify these mappings. We put the human integrator back in the loop and propose several novel approaches to the interactive learning of parameters and the resolution of uncertain mappings. Extensive experiments are conducted and results show that our approach is highly effective.
A discriminative matching approach to word alignment
- In Proceedings of HLT-EMNLP, 2005
Cited by 64 (5 self)
We present a discriminative, large-margin approach to feature-based matching for word alignment. In this framework, pairs of word tokens receive a matching score, which is based on features of that pair, including measures of association between the words, distortion between their positions, similarity of the orthographic form, and so on. Even with only 100 labeled training examples and simple features which incorporate counts from a large unlabeled corpus, we achieve AER performance close to IBM Model 4, in much less time. Including Model 4 predictions as features, we achieve a relative AER reduction of 22% over intersected Model 4 alignments.
Magnetic resonance image tissue classification using a partial volume model
- NEUROIMAGE, 2001
Cited by 55 (2 self)
We describe a sequence of low-level operations to isolate and classify brain tissue within T1-weighted magnetic resonance images (MRI). Our method first removes nonbrain tissue using a combination of anisotropic diffusion filtering, edge detection, and mathematical morphology. We compensate for image nonuniformities due to magnetic field inhomogeneities by fitting a tricubic B-spline gain field to local estimates of the image nonuniformity spaced throughout the MRI volume. The local estimates are computed by fitting a partial volume tissue measurement model to histograms of neighborhoods about each estimate point. The measurement model uses mean tissue intensity and noise variance values computed from the global image and a multiplicative bias parameter that is estimated for each region during the histogram fit. Voxels in the intensity-normalized image are then classified into six tissue types using a maximum a posteriori classifier. This classifier combines the partial volume tissue measurement model with a Gibbs prior that models the spatial properties of the brain. We validate each stage of our algorithm on real and phantom data. Using data from the 20 normal MRI brain data sets of the Internet Brain Segmentation Repository, our method achieved average κ indices of κ = 0.746 ± 0.114 for gray matter (GM) and κ = 0.798 ± 0.089 for white matter (WM) compared to expert labeled data. Our method achieved average κ indices of κ = 0.893 ± 0.041 for GM and κ = 0.928 ± 0.039 for WM compared to the ground truth labeling on 12 volumes from the Montreal Neurological Institute’s BrainWeb phantom.
Simultaneous Truth and Performance Level Estimation (STAPLE): An Algorithm for the Validation of Image Segmentation
- IEEE TRANS. MED. IMAG, 2004
Cited by 54 (4 self)
Characterizing the performance of image segmentation approaches has been a persistent challenge. Performance analysis is important since segmentation algorithms often have limited accuracy and precision. Interactive drawing of the desired segmentation by human raters has often been the only acceptable approach, and yet suffers from intra-rater and inter-rater variability. Automated algorithms have been sought in order to remove the variability introduced by raters, but such algorithms must be assessed to ensure they are suitable for the task. The performance of raters...
Inter-Coder Agreement for Computational Linguistics
- COMPUTATIONAL LINGUISTICS, 2008
Cited by 54 (1 self)
This article is a survey of methods for measuring agreement among corpus annotators. It exposes the mathematics and underlying assumptions of agreement coefficients, covering Krippendorff’s alpha as well as Scott’s pi and Cohen’s kappa; discusses the use of coefficients in several annotation tasks; and argues that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in Computational Linguistics, may be more appropriate for many corpus annotation tasks – but that their use makes the interpretation of the value of the coefficient even harder.
Brainsuite: An automated cortical surface identification tool
- Med. Image Anal, 2002
Cited by 40 (3 self)
We describe a new magnetic resonance (MR) image analysis tool that produces cortical surface representations with spherical topology from MR images of the human brain. The tool provides a sequence of low-level operations in a single package that can produce accurate brain segmentations in clinical time. The tools include skull and scalp removal, image nonuniformity compensation, voxel-based tissue classification, topological correction, rendering, and editing functions. The collection of tools is designed to require minimal user interaction to produce cortical representations. In this paper we describe the theory of each stage of the cortical surface identification process. We then present classification validation results using real and phantom data. We also present a study of interoperator variability.
Manual annotation of translational equivalence: The Blinker project
- 1998
Cited by 37 (1 self)
Bilingual annotators were paid to link roughly sixteen thousand corresponding words between on-line versions of the Bible in modern French and modern English. These annotations are freely available to the research community from
The pyramid method: incorporating human content selection variation in summarization evaluation
- ACM Transactions on Speech and Language Processing, 2007
Cited by 35 (3 self)
Human variation in content selection in summarization has given rise to some fundamental research questions: How can one incorporate the observed variation in suitable evaluation measures? How can such measures reflect the fact that summaries conveying different content can be equally good and informative? In this paper we address these very questions by proposing a method for analysis of multiple human abstracts into semantic content units. Such analysis allows us not only to quantify human variation in content selection, but also to assign empirical importance weight to different content units. It serves as the basis for an evaluation method, the Pyramid Method, that incorporates the observed variation and is predictive of different equally informative summaries. We discuss the reliability of content unit annotation, the properties of Pyramid scores, and their correlation with other evaluation methods.

