## Knowledge lean word-sense disambiguation (1998)


Venue: Proceedings of the Fifteenth National Conference on Artificial Intelligence

Citations: 36 (5 self)

### BibTeX

@INPROCEEDINGS{Pedersen98knowledgelean,
  author    = {Ted Pedersen},
  title     = {Knowledge lean word-sense disambiguation},
  booktitle = {Proceedings of the Fifteenth National Conference on Artificial Intelligence},
  year      = {1998},
  pages     = {800--805},
  publisher = {AAAI Press}
}


### Abstract

We present a corpus-based approach to word-sense disambiguation that requires only information that can be automatically extracted from untagged text. We use unsupervised techniques to estimate the parameters of a model describing the conditional distribution of the sense group given the known contextual features. Both the EM algorithm and Gibbs Sampling are evaluated to determine which is most appropriate for our data. We compare their disambiguation accuracy in an experiment with thirteen different words and three feature sets. Gibbs Sampling results in a small but consistent improvement in disambiguation accuracy over the EM algorithm.
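The Naive Bayes model used throughout the paper assigns an ambiguous word the sense group that maximizes the product of the sense prior and the per-feature conditionals. A minimal sketch of that decision rule (function and parameter names are illustrative, not taken from the paper):

```python
import math

def assign_sense(features, p_sense, p_feat_given_sense):
    """Naive Bayes decision rule: argmax_s p(s) * prod_i p(f_i | s).
    Log probabilities avoid underflow; unseen feature values get a
    small floor probability."""
    best_sense, best_logp = None, float("-inf")
    for sense, prior in p_sense.items():
        logp = math.log(prior)
        for i, value in enumerate(features):
            logp += math.log(p_feat_given_sense[sense][i].get(value, 1e-6))
        if logp > best_logp:
            best_sense, best_logp = sense, logp
    return best_sense
```

In the unsupervised setting of the paper, the tables `p_sense` and `p_feat_given_sense` are not read off sense-tagged data but estimated by EM or Gibbs Sampling from untagged text.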

### Citations

9193 | Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird, et al. - 1977 |

4100 | Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images - Geman, Geman - 1984
Citation Context: ...loped, could be easily applied to other parametric forms in the class of decomposable models. We employ the Expectation Maximization (EM) algorithm (Dempster, Laird, & Rubin 1977) and Gibbs Sampling (Geman & Geman 1984) to estimate model parameters from untagged data. Both are well known and widely used iterative algorithms for estimating model parameters in the presence of missing data; in our case, the missing da...

355 | Evaluating the accuracy of sampling-based approaches to calculating posterior moments - Geweke - 1992
Citation Context: ...early iterations be discarded. This process is commonly known as a "burn-in". We use a 500 iteration burn-in and monitor the following 1000 iterations for convergence using the measure proposed in (Geweke 1992). If the chains have not converged, then additional iterations are performed until they do. Below we show the general procedure for Gibbs Sampling with the Naive Bayes model. burn in represents the n...
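The burn-in procedure quoted here can be sketched as a collapsed Gibbs sampler over the missing sense tags. This is a simplified illustration with add-one smoothing; the Geweke convergence diagnostic and the paper's exact parameterization are not reproduced:

```python
import random

def gibbs_senses(data, n_senses, burn_in=500, monitor=1000, seed=0):
    """Collapsed Gibbs sampling for missing sense tags under a Naive
    Bayes model. Each instance's sense is resampled from its full
    conditional given all other assignments; draws before `burn_in`
    are discarded and the next `monitor` iterations are tallied.
    Smoothing constants are illustrative."""
    rng = random.Random(seed)
    n_feats = len(data[0])
    senses = [rng.randrange(n_senses) for _ in data]
    tally = [[0] * n_senses for _ in data]
    for it in range(burn_in + monitor):
        for j, x in enumerate(data):
            weights = []
            for s in range(n_senses):
                others = [k for k in range(len(data))
                          if k != j and senses[k] == s]
                n_s = len(others)
                w = n_s + 1.0  # smoothed sense count
                for i in range(n_feats):
                    n_sv = sum(1 for k in others if data[k][i] == x[i])
                    w *= (n_sv + 1.0) / (n_s + 2.0)  # smoothed conditional
                weights.append(w)
            # draw the new sense proportionally to the weights
            r, acc = rng.random() * sum(weights), 0.0
            for s, w in enumerate(weights):
                acc += w
                if r <= acc:
                    senses[j] = s
                    break
            if it >= burn_in:
                tally[j][senses[j]] += 1
    # assign each instance its most frequently sampled sense
    return [counts.index(max(counts)) for counts in tally]
```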

256 | Integrating multiple knowledge sources to disambiguate word sense: An exemplar-based approach - Ng, Lee - 1996
Citation Context: ...xt) and are selected to produce low dimensional event spaces. Local-context features have been used successfully in a variety of supervised approaches to disambiguation (e.g., (Bruce & Wiebe 1994), (Ng & Lee 1996)). Feature Sets A, B and C The 3 feature sets used in these experiments are designated A, B and C and are formulated as shown below. The particular feature combinations chosen were found to yield re...

243 | A method for disambiguating word senses in a large corpus. Computers and the Humanities - Gale, Church, et al. - 1992 |

232 | The EM algorithm for graphical association models with missing data - Lauritzen - 1995
Citation Context: ...uced in (Dempster, Laird, & Rubin 1977). The Naive Bayes model is a decomposable model which is a member of the exponential family with special properties that simplify the formulation of the E-step (Lauritzen 1995). The EM algorithm for Naive Bayes proceeds as follows: 1. randomly initialize p(F_i|S), set k = 1 2. E-step: count(F_i, S) = p(S|F_i) × count(F_i) 3. M-step: re-estimate p(F_i|S) = coun...
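The three EM steps quoted in this context can be fleshed out as an instance-level sketch. The smoothing constant and the per-instance (rather than count-based) formulation are assumptions for illustration, not the paper's exact procedure:

```python
import random

def em_naive_bayes(data, n_senses, n_iter=50, seed=0):
    """Unsupervised EM for a Naive Bayes model where the sense is the
    missing variable: randomly initialize p(S) and p(F_i|S), then
    alternate E-steps (responsibilities) and M-steps (re-estimation)."""
    rng = random.Random(seed)
    n_feats = len(data[0])
    values = [sorted({x[i] for x in data}) for i in range(n_feats)]
    prior = [1.0 / n_senses] * n_senses
    # random initialization of the conditional tables p(F_i = v | S)
    cond = []
    for s in range(n_senses):
        tables = []
        for i in range(n_feats):
            w = [rng.random() + 0.1 for _ in values[i]]
            z = sum(w)
            tables.append({v: wv / z for v, wv in zip(values[i], w)})
        cond.append(tables)
    for _ in range(n_iter):
        # E-step: responsibilities p(S | f_1..f_n) for each instance
        resp = []
        for x in data:
            joint = []
            for s in range(n_senses):
                p = prior[s]
                for i in range(n_feats):
                    p *= cond[s][i][x[i]]
                joint.append(p)
            z = sum(joint) or 1.0
            resp.append([p / z for p in joint])
        # M-step: re-estimate p(S) and p(F_i | S) from expected counts
        for s in range(n_senses):
            total = sum(r[s] for r in resp)
            prior[s] = total / len(data)
            for i in range(n_feats):
                for v in values[i]:
                    num = sum(r[s] for r, x in zip(resp, data) if x[i] == v)
                    cond[s][i][v] = (num + 0.01) / (total + 0.01 * len(values[i]))
    return prior, cond
```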

143 | Word sense disambiguation using decomposable models - Bruce, Wiebe - 1994
Citation Context: ...l window (local-context) and are selected to produce low dimensional event spaces. Local-context features have been used successfully in a variety of supervised approaches to disambiguation (e.g., (Bruce & Wiebe 1994), (Ng & Lee 1996)). Feature Sets A, B and C The 3 feature sets used in these experiments are designated A, B and C and are formulated as shown below. The particular feature combinations chosen were ...

114 | Comparative experiments on disambiguating word senses: An illustration of the role of bias in machine learning - Mooney - 1996
Citation Context: ...The advantage of this approach is two-fold: (1) there is a large body of evidence recommending the use of the Naive Bayes model in word-sense disambiguation (e.g., (Leacock, Towell, & Voorhees 1993), (Mooney 1996), (Ng 1997)) and (2) unsupervised techniques for parameter estimation, once developed, could be easily applied to other parametric forms in the class of decomposable models. We employ the Expectation...

79 | Corpus-based statistical sense resolution - Leacock, Towell, et al. - 1993 |

76 | Distinguishing word senses in untagged text - Pedersen, Bruce - 1997
Citation Context: ...ns as seeds in an iterative bootstrapping approach. A comparison of the EM algorithm and two agglomerative clustering algorithms as applied to unsupervised word-sense disambiguation is discussed in (Pedersen & Bruce 1997). Using the same data used in this study, (Pedersen & Bruce 1997) found that McQuitty's agglomerative algorithm is significantly more accurate for adjectives and verbs while the EM algorithm is signi...

50 | Exemplar-based word sense disambiguation: Some recent improvements - Ng - 1997
Citation Context: ...this approach is two-fold: (1) there is a large body of evidence recommending the use of the Naive Bayes model in word-sense disambiguation (e.g., (Leacock, Towell, & Voorhees 1993), (Mooney 1996), (Ng 1997)) and (2) unsupervised techniques for parameter estimation, once developed, could be easily applied to other parametric forms in the class of decomposable models. We employ the Expectation Maximizati...

26 | G.: Lambda-Calculus Models and - Hindley, Longo - 1980 |

26 | Discrimination Decisions for 100,000-Dimensional Spaces - Gale, Yarowsky - 1992 |

26 | A new supervised learning algorithm for word sense disambiguation - Pedersen, Bruce - 1997
Citation Context: ... & Wiebe 1997). The Naive Mix, a new supervised learning algorithm that builds an averaged probabilistic model, is introduced and shown to be competitive with well-known machine learning algorithms (Pedersen & Bruce 1997). In the absence of sense-tagged text, the sense of an ambiguous word is treated as a feature with a missing value. The observable features are those that can be automatically identified such as par...

10 | Automatic Word Sense Discrimination. Computational Linguistics - Schütze - 1998 |

4 | The EM algorithm - an old folk-song sung to a new fast tune (with discussion) - Meng, Dyk - 1997 |
