
## An Intrinsic Information Content Metric for Semantic Similarity in WordNet (2004)

Venue: In Proceedings of the 16th European Conference on Artificial Intelligence (ECAI-04)

Citations: 119 (6 self)

### Citations

10912 | A Mathematical Theory of Communication - Shannon - 1948 |

1976 | Introduction to WordNet: An on-line lexical database - Miller, Beckwith, et al. - 1990 |

1243 | An information-theoretic definition of similarity - Lin - 1998 |
Citation Context: …external resources from which statistical data is gathered. Experimentation will show that this new metric delivers better results when we substitute our IC values for the corpus-derived ones in previously established formulations of SS. These formulations, which make use of IC values, are generally known as Information-Theoretic formulas, so our main focus throughout the paper shall be on these. Nevertheless, when analyzing our results we consider alternative approaches in order to evaluate our metric exhaustively. 2. Information Theoretic Approaches: Previous information-theoretic approaches ([4], [10] and [6]) obtain the needed IC values by statistically analyzing corpora. They associate probabilities with each concept in the taxonomy based on word occurrences in a given corpus. The IC value is then obtained by taking the negative log likelihood: ic_res(c) = −log p(c) (1), where c is some concept in WordNet and p(c) is the probability of encountering c in a given corpus. Philip Resnik [10] was the first to consider the use of this formula for the purpose of SS judgments. … [Footnote 1: Department of Computer Science, University College Dublin, Ireland; email: {nuno.seco, tony.veale, jer.hayes}@ucd.ie] |
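The negative-log-likelihood formula in equation (1) is straightforward to sketch. The concept frequencies below are hypothetical, purely to illustrate the shape of the computation; a real implementation would derive counts from corpus occurrences of each WordNet concept and its subordinates:

```python
import math
from collections import Counter

def corpus_ic(concept_counts, total):
    """Corpus-based information content: ic(c) = -log p(c), where p(c)
    is the relative frequency of concept c in the corpus (equation 1)."""
    return {c: -math.log(n / total) for c, n in concept_counts.items()}

# Hypothetical counts: in a Resnik-style scheme an occurrence of "dog"
# would also count toward its subsumers "animal" and "entity".
counts = Counter({"entity": 1000, "animal": 200, "dog": 50})
ic = corpus_ic(counts, total=1000)

# The root concept, which subsumes everything, carries no information,
# while rarer (more specific) concepts carry more.
assert ic["entity"] == 0.0
assert ic["dog"] > ic["animal"] > ic["entity"]
```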

1097 | Using information content to evaluate semantic similarity in a taxonomy - Resnik - 1995 |

1047 | Introduction to latent semantic analysis - Landauer, Foltz, et al. - 1998 |

873 | Semantic similarity based on corpus statistics and lexical taxonomy - Jiang, Conrath - 1997 |

609 | Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language - Resnik - 1999 |

551 | Verb semantics and lexical selection - Wu, Palmer - 1994 |
Citation Context: …on a scale from 0 (no similarity) to 4 (perfect synonymy). The average rating for each pair represents a good estimate of how similar the two words are. In order to make fair comparisons we decided to use an independent software package that would calculate similarity values using previously established strategies while allowing the use of WordNet 2.0. One freely available package is that of Siddharth Patwardhan and Ted Pedersen, which implements the semantic relatedness measures described by Leacock and Chodorow [5], Jiang and Conrath [4], Resnik [10], Lin [6], Hirst and St. Onge [3], and Wu and Palmer [12], as well as the adapted gloss-overlap measure by Banerjee and Pedersen [1]. In addition to these we also used Latent Semantic Analysis (LSA) to perform similarity judgments by means of a web interface available at the LSA website. Table 1 presents the similarity obtained using the chosen algorithms and their correlation coefficient (γ) with the human judgments. The first column states the algorithm used in obtaining similarity scores and the second the correlation between the algorithm and human ratings. The last three rows correspond to algorithms using our IC values. It should be noted that for the s… |
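The correlation coefficient reported against human judgments can be computed in a few lines. A minimal sketch, with hypothetical scores standing in for the real word-pair data (human ratings on the 0-4 Miller-Charles scale versus an algorithm's similarity output):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: four word pairs, human ratings (0-4) vs. an
# algorithm's similarity scores (0-1). Agreement in ranking and
# relative spacing drives the coefficient toward 1.
human = [3.92, 3.05, 0.84, 0.55]
algo = [0.95, 0.71, 0.30, 0.12]
r = pearson(human, algo)  # high positive correlation for these values
```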

480 | Semantic memory - Quillian - 1968 |
Citation Context: …a wholly intrinsic measure of IC that relies on hierarchical structure alone. We report that this measure is consequently easier to calculate, yet when used as the basis of a similarity mechanism it yields judgments that correlate more closely with human assessments than other, extrinsic measures of IC that additionally employ corpus analysis. 1. Introduction: Semantic similarity (SS) has long been a subject of intense scholarship in the fields of Artificial Intelligence, Psychology and Cognitive Science. Computational models trying to imitate this human ability date back to Quillian [9] and the spreading-activation algorithm. Nowadays, these computational models of similarity are being included in many software applications with the intent of making them seem more intelligent or even creative (see [2]). The use of SS has also found its way into the bioinformatics domain. Recently, Lord [7] studied the effect of using SS strategies when querying DNA and protein sequence databases. Hence, we present a novel metric of IC that is completely derived from WordNet without the need for external resources from which statistical data is gathered. … |

426 | Combining local context and WordNet similarity for word sense identification - Leacock, Chodorow - 1998 |

381 | Lexical chains as representations of context for the detection and correction of malapropisms - Hirst, St-Onge - 1998 |

357 | Contextual correlates of semantic similarity - Miller, Charles - 1991 |
Citation Context: …that IC values are in [0, 1]. The above formulation guarantees that the information content decreases monotonically. Moreover, the imaginary top node of WordNet would yield an information content value of 0. 4. Empirical Studies: In order to evaluate our IC metric we decided to use the three formulations of SS presented in section 2 and substituted Resnik's IC metric with the one presented in equation 5. In accordance with previous research, we evaluated the results by correlating our similarity scores with the human judgments provided by Miller and Charles [8]. In their study, 38 undergraduate subjects were given 30 pairs of nouns and were asked to rate the similarity of meaning of each pair on a scale from 0 (no similarity) to 4 (perfect synonymy). The average rating for each pair represents a good estimate of how similar the two words are. … |
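Equation 5 itself is not reproduced in this excerpt, but the properties stated here (values in [0, 1], monotonic decrease as a concept subsumes more hyponyms, and 0 at the imaginary top node) are satisfied by an intrinsic formulation of the following shape; the function name and the WordNet size constant are illustrative assumptions, not taken from the paper:

```python
import math

def intrinsic_ic(num_hyponyms, max_nodes):
    """Taxonomy-only information content: the more hyponyms a concept
    subsumes, the less informative it is. A leaf (no hyponyms) scores 1;
    the imaginary top node, subsuming all other concepts, scores 0."""
    return 1.0 - math.log(num_hyponyms + 1) / math.log(max_nodes)

MAX_WN = 80000  # assumed order of magnitude for the noun taxonomy size

leaf = intrinsic_ic(0, MAX_WN)           # 1.0: maximally informative
root = intrinsic_ic(MAX_WN - 1, MAX_WN)  # 0.0: subsumes everything
mid = intrinsic_ic(500, MAX_WN)          # strictly between 0 and 1
```

No corpus is consulted at any point, which is what makes the measure intrinsic: the hyponym count is read directly off the WordNet hierarchy.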

310 | Sweetening ontologies with DOLCE - Gangemi, Guarino, et al. - 2002 |

264 | Extended gloss overlaps as a measure of semantic relatedness - Banerjee, Pedersen - 2003 |

182 | Foundations of Statistical Natural Language Processing - Manning, Schütze - 2001 |

95 | Semantic similarity measures as tools for exploring the gene ontology - Lord, Stevens, et al. - 2003 |

82 | A second-order hidden Markov model for part-of-speech tagging - Thede, Harper - 1999 |

12 | The importance of retrieval in creative design analogies - Gomes, Seco, et al. - 2003 |

7 | The Analogical Thesaurus: An Emerging Application at the Juncture of Lexical Metaphor and Information Retrieval - Veale |

5 | Bayesian nets in syntactic categorization of novel words - Peshkin, Pfeffer, et al. - 2003 |
