Results 1 - 10 of 93
CRYSTAL: Inducing a Conceptual Dictionary
, 1995
Cited by 136 (11 self)
One of the central knowledge sources of an information extraction (IE) system is a dictionary of linguistic patterns that can be used to identify references to relevant information in a text. Automatic creation of conceptual dictionaries is important for portability and scalability of an IE system. This paper describes CRYSTAL, a system which automatically induces a dictionary of "concept-node definitions" sufficient to identify relevant information from a training corpus. Each of these concept-node definitions is generalized as far as possible without producing errors, so that a minimum number of dictionary entries cover the positive training instances. Because it tests the accuracy of each proposed definition, CRYSTAL can often surpass human intuitions in creating reliable extraction rules. 1 Information Extraction An information extraction (IE) system analyzes unrestricted natural language text and produces a representation of the information from the text which is cons...
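The "generalize as far as possible without producing errors" idea in the abstract above can be illustrated with a small covering loop. This is only a sketch under stated assumptions, not CRYSTAL's actual algorithm: a definition is modeled as a set of (feature, value) constraints, generalizing means intersecting it with the most similar positive instance, and a relaxation is rejected as soon as it would cover a negative instance. The feature names, the example instances, and the function names are all hypothetical.

```python
from typing import FrozenSet, List, Tuple

Constraint = Tuple[str, str]          # hypothetical (feature, value) pair
Instance = FrozenSet[Constraint]      # a definition uses the same representation


def covers(definition: Instance, instance: Instance) -> bool:
    """A definition covers an instance when all of its constraints are met."""
    return definition <= instance


def induce_dictionary(positives: List[Instance],
                      negatives: List[Instance]) -> List[Instance]:
    """Greedy bottom-up induction: relax each seed until the next step errs."""
    dictionary: List[Instance] = []
    uncovered = list(positives)
    while uncovered:
        definition = uncovered[0]     # start from the most specific form
        while True:
            # Try the most similar positives first (largest constraint overlap).
            candidates = [p for p in positives if not covers(definition, p)]
            candidates.sort(key=lambda p: len(definition & p), reverse=True)
            for p in candidates:
                relaxed = definition & p
                # Accept only if the relaxed definition makes no errors.
                if relaxed and not any(covers(relaxed, n) for n in negatives):
                    definition = relaxed
                    break
            else:
                break                 # no error-free relaxation is left
        dictionary.append(definition)
        uncovered = [p for p in uncovered if not covers(definition, p)]
    return dictionary


# Hypothetical training instances (assumed features, not from the paper).
POSITIVES = [
    frozenset({("verb", "appointed"), ("subject", "Person"), ("voice", "passive")}),
    frozenset({("verb", "appointed"), ("subject", "Person"), ("voice", "active")}),
    frozenset({("verb", "resigned"), ("subject", "Person"), ("voice", "active")}),
]
NEGATIVES = [
    frozenset({("verb", "met"), ("subject", "Person"), ("voice", "active")}),
]
DICTIONARY = induce_dictionary(POSITIVES, NEGATIVES)
```

In this toy data the two "appointed" instances collapse into one definition that drops the voice constraint, while the "resigned" instance stays fully specific because every relaxation would also match the negative instance.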
Mining the Biomedical Literature in the Genomic Era: An Overview
- JOURNAL OF COMPUTATIONAL BIOLOGY
, 2003
Cited by 72 (2 self)
The past decade has seen a tremendous growth in the amount of experimental and computational biomedical data, specifically in the areas of Genomics and Proteomics. This growth is accompanied by an accelerated increase in the number of biomedical publications discussing the findings. In the last few years there has been a lot of interest within the scientific community in literature-mining tools to help sort through this abundance of literature, and find the nuggets of information most relevant and useful for specific analysis tasks. This paper
An algorithmic approach to concept exploration in a large knowledge network (automatic thesaurus consultation): symbolic branch-and-bound search vs. connectionist Hopfield net activation
- Journal of the American Society for Information Science
, 1995
Cited by 61 (18 self)
This paper presents a framework for knowledge discovery and concept exploration. In order to enhance the concept exploration capability of knowledge-based systems and to alleviate the limitations of the manual browsing approach, we have developed two spreading activation-based algorithms for concept exploration in large, heterogeneous networks of concepts (e.g., multiple thesauri). One algorithm, which is based on the symbolic AI paradigm, performs a conventional branch-and-bound search on a semantic net representation to identify other highly relevant concepts (a serial, optimal search process). The second algorithm, which is based on the neural network approach, executes the Hopfield net parallel relaxation and convergence process to identify “convergent” concepts for some initial queries (a parallel, heuristic search process). Both algorithms can be adopted for automatic, multiple-thesauri consultation. We tested these two algorithms on a large text-based knowledge network of about 13,000 nodes (terms) and 80,000 directed links in the area of computing technologies. This knowledge network was created from two external thesauri and one automatically generated thesaurus. We conducted experiments to compare the behaviors and performances of the two algorithms with the hypertext-like browsing process. Our experiment revealed that manual browsing achieved higher-term recall but lower-term precision in comparison to the algorithmic systems. However, it was also a much more laborious and cognitively demanding process. In document retrieval, there were no statistically significant differences in document recall and precision between the algorithms and the manual browsing process. In light of the effort required by the manual browsing process, our proposed algorithmic approach presents a viable option for efficiently traversing large-scale, multiple thesauri (knowledge network).
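The contrast drawn in the abstract above, a serial best-first search versus a parallel relaxation, can be made concrete on a toy thesaurus. This is a minimal sketch and not the paper's implementation. It assumes link weights in (0, 1], a path score equal to the product of link weights, a sigmoid transfer function, and a clamped query node. The graph, the parameter names `theta` and `steepness`, and both function names are hypothetical.

```python
import heapq
import math
from typing import Dict, List

# Hypothetical knowledge network: source term -> {target term: link weight}.
Graph = Dict[str, Dict[str, float]]


def branch_and_bound(graph: Graph, query: str, top_k: int) -> List[str]:
    """Serial best-first search; a path scores the product of its weights."""
    best: Dict[str, float] = {}
    frontier = [(-1.0, query)]        # max-heap emulated with negated scores
    while frontier and len(best) < top_k + 1:
        neg_score, term = heapq.heappop(frontier)
        if term in best:
            continue                  # already reached by a better path
        best[term] = -neg_score
        for neighbour, weight in graph.get(term, {}).items():
            if neighbour not in best:
                heapq.heappush(frontier, (neg_score * weight, neighbour))
    best.pop(query)
    return sorted(best, key=best.get, reverse=True)


def hopfield_activation(graph: Graph, query: str, theta: float = 0.5,
                        steepness: float = 0.1, epsilon: float = 1e-3,
                        max_iter: int = 50) -> Dict[str, float]:
    """Parallel relaxation: update every node each round until changes die out."""
    terms = set(graph) | {t for links in graph.values() for t in links}
    activation = {t: 1.0 if t == query else 0.0 for t in terms}
    for _ in range(max_iter):
        updated = {}
        for t in terms:
            if t == query:
                updated[t] = 1.0      # assumption: the query node stays clamped
                continue
            net = sum(activation[s] * links[t]
                      for s, links in graph.items() if t in links)
            updated[t] = 1.0 / (1.0 + math.exp((theta - net) / steepness))
        change = sum(abs(updated[t] - activation[t]) for t in terms)
        activation = updated
        if change < epsilon:
            break                     # network has converged
    return activation


GRAPH: Graph = {
    "information retrieval": {"indexing": 0.9, "thesaurus": 0.6},
    "indexing": {"thesaurus": 0.5, "hashing": 0.2},
    "thesaurus": {"indexing": 0.4},
}
RANKED = branch_and_bound(GRAPH, "information retrieval", top_k=2)
ACTIVATION = hopfield_activation(GRAPH, "information retrieval")
```

The search variant expands one term at a time and stops once `top_k` neighbours are settled, whereas the relaxation variant touches every node on every round and lets mutually reinforcing terms pull each other above the threshold.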
Large-Scale Repositories of Highly Expressive Reusable Knowledge
, 1999
Cited by 47 (1 self)
We describe an ongoing project to develop technology that will support collaborative construction and effective use of distributed large-scale repositories of highly expressive reusable ontologies. We are focusing on developing a distributed server architecture for ontology construction and use, representation formalisms that remove key barriers to expressing essential knowledge in and about ontologies, ontology construction tools, and tools for obtaining domain models for use in applications from large-scale ontology repositories. We are building on the results of the DARPA Knowledge Sharing Effort, specifically by using the Knowledge Interchange Format (KIF) as a core representation language and the Ontolingua system as a core ontology development environment. In order to enable distributed ontology repositories and services, we are developing a distributed server architecture for ontology construction and use based on ontology servers which provide access via a network API to the ...
Learning Text Analysis Rules For Domain-Specific Natural Language Processing
, 1997
Cited by 32 (5 self)
An enormous amount of knowledge is needed to infer the meaning of unrestricted natural language. The problem can be reduced to a manageable size by restricting attention to a specific domain, which is a corpus of texts together with a predefined set of concepts that are of interest to that domain. Two widely different domains are used to illustrate this domain-specific approach. One domain is a collection of Wall Street Journal articles in which the target concept is management succession events: identifying persons moving into corporate management positions or moving out. A second domain is a collection of hospital discharge summaries in which the target concepts are various classes of diagnosis or symptom.
A Terminology Server For Medical Language and Medical Information Systems
, 1994
Cited by 29 (6 self)
GALEN is developing a Terminology Server to support the development and integration of clinical systems through a range of key terminological services, built around a language-independent, reusable, shared system of concepts - the CORE model. The focus is on supporting applications for medical records, clinical user interfaces and clinical information systems, but also includes systems for natural language understanding, clinical decision support, management of coding and classification schemes, and bibliographic retrieval. The Terminology Server integrates three modules: the Concept Module which implements the GRAIL formalism and manages the internal representation of concept entities, the Multilingual Module which manages the mapping of concept entities to natural language, and the Code Conversion Module which manages the mapping of concept entities to and from existing coding and classification schemes. The Terminology Server also provides external referencing to concept entities, c...
Using concepts in literature-based discovery: Simulating Swanson’s Raynaud-fish oil and migraine-magnesium discoveries
- J. Am. Soc. Inf. Sci. Tech
, 2001
Cited by 25 (1 self)
Literature-based discovery has resulted in new knowledge.
HealthDoc: Customizing patient information and health education by medical condition and personal characteristics
- In First International Workshop on Artificial Intelligence in Patient Education
, 1995
Cited by 25 (4 self)
The HealthDoc project aims to provide a comprehensive approach to the customization of patient-information and health-education materials through the development of sophisticated natural language generation systems. We adopt a model of patient education that takes into account patient information ranging from simple medical data to complex cultural beliefs, so that our work provides both an impetus and testbed for research in multicultural health communication. We propose a model of language generation, `generation by selection and repair', that relies on a `master-document' representation that pre-determines the basic form and content of a text, yet is amenable to editing and revision for customization. The implementation of this model has so far led to the design of a sentence planner that integrates multiple complex planning tasks and a rich set of ontological and linguistic knowledge sources. 1 Customizing patient-education material Present-day health-education and patient-infor...
Automatic template creation for information extraction
, 1998
Cited by 21 (0 self)
Information Extraction (IE) approaches currently assume that a template exists which sufficiently defines the requirements of the task. Substantial human effort is required to generate these basic templates and to provide a development corpus. In the two principal IE competitions, the Message Understanding Conference (MUC) and Tipster, the templates were constructed directly from the experience of analysts. This manual approach cannot always be assumed. This proposal concerns the automatic construction of MUC-style templates, substantially reducing the human effort required. The approach will carry out a corpus-based analysis of task-relevant documents, identifying and analysing the interaction between the fundamental elements. A resource which defines semantic relationships will be necessary to identify and categorise these fundamental elements. This application is of particular interest to researchers in the field of IE and automatic abstracting.
Ontology and the Lexicon
- In Handbook on Ontologies in Information Systems
, 2003
Cited by 21 (0 self)
ly have a separate entry for each category; for example, flap would have one entry as a noun and another as a verb. Separate entries are usually also appropriate for each of the senses of a homonym---a word that has more than one unrelated sense even within a single syntactic category; for example, the noun pen would have distinct entries for the senses writing instrument, animal enclosure, and swan. Polysemy---related or overlapping senses---is a more-complex situation; sometimes the senses may be discrete enough that we can treat them as distinct: for example, window as both opening in wall and glass pane in opening in wall (fall through the window; break the window). But this is not always so; the word open, for example, has many overlapping senses concerning unfolding, expanding, revealing, moving to an open position, making openings in, and so on, and separating them into discrete senses, as the writers of dictionary definitions try to do, is not possible (see also sections 1.2.3 a

