Data Mining on Symbolic Knowledge Extracted from the Web
2000
Cited by 35 (2 self)
Information extractors and classifiers operating on unrestricted, unstructured texts are an errorful source of large amounts of potentially useful information, especially when combined with a crawler which automatically augments the knowledge base from the World Wide Web. At the same time, there is much structured information on the World Wide Web. Wrapping the websites which provide this kind of information provides us with a second source of information, possibly less up to date but reliable as facts. We give a case study of combining information from these two kinds of sources in the context of learning facts about companies. We provide results of association rules, propositional and relational learning, which demonstrate that data mining can help us improve our extractors, and that using information from two kinds of sources improves the reliability of data-mined rules.
Applying Data Mining Techniques for Descriptive Phrase Extraction in Digital Document Collections
In Proceedings of the IEEE Forum on Research and Technology Advances in Digital Libraries, 1998
Cited by 11 (0 self)
Traditionally, texts have been analysed using various information-retrieval-related methods, such as full-text analysis, and natural language processing. However, only a few examples of data mining in text, particularly in full text, are available. In this paper we show that general data mining methods are applicable to text analysis tasks such as descriptive phrase extraction. Moreover, we present a general framework for text mining. The framework follows the general knowledge discovery process, thus containing steps from preprocessing to the utilization of the results. The data mining method that we apply is based on generalized episodes and episode rules. We give concrete examples of how to preprocess texts based on the intended use of the discovered results, and we introduce a weighting scheme that helps in pruning out redundant or non-descriptive phrases. We also present results from real-life data experiments.
Knowledge Discovery for Automatic Query Expansion on the World Wide Web
In Workshop on the World-Wide Web and Conceptual Modeling (WWWCM'99), LNCS 1727, 1999
Cited by 6 (3 self)
The World-Wide Web is an enormous, distributed, and heterogeneous information space. Currently, with the growth of available data, finding interesting information is difficult. Search engines like AltaVista are useful, but their results are not always satisfactory. In this paper, we present a method called Knowledge Discovery on the Web for extracting connections between terms. The knowledge in these connections is used for query expansion. We present experiments performed with our system, which is based on the SMART retrieval system. We used the comparative precision method for evaluating our system against three well-known Web search engines on a collection of 60,000 Web pages.
Mining Knowledge from Text Collections Using Automatically Generated Metadata
Proceedings of the Fourth International Conference on Practical Aspects of Knowledge Management (PAKM-2002), 2002
Cited by 4 (0 self)
Data mining is typically applied to large databases of highly structured information in order to discover new knowledge. In businesses and institutions, the amount of information existing in repositories of text documents usually rivals or surpasses the amount found in relational databases. Though the amount of potentially valuable knowledge contained in document collections can be great, they are often difficult to analyze. Therefore, it is important to develop methods to efficiently discover knowledge embedded in these document repositories. In this paper we describe an approach for mining knowledge from text collections by applying data mining techniques to metadata records generated via automated text categorization. By controlling the set of metadata fields as well as the set of assigned categories we can customize the knowledge discovery task to address specific questions. As an example, we apply the approach to a large collection of product reviews and evaluate the performance of the knowledge discovery.
Introducing Text Mining
With the growing importance of electronic content and electronic media for storing and exchanging text documents, there is also a growing interest in tools that can help find and sort the information contained in text documents. Text mining, also known as document mining, is an emerging technology for analyzing large collections of unstructured documents for the purpose of extracting interesting and non-trivial patterns or knowledge. It can be envisaged as a leap from data mining or knowledge discovery from (structured) databases.
Text Mining: Intermediate Forms for Knowledge Representation
EUSFLAT-LFA 2005
In this paper we review the main intermediate forms proposed in text mining, and we briefly study some fuzzy counterparts. The concept of intermediate form applies to any knowledge representation employed to represent in a structured way the semantic content of a text corpus. Intermediate forms play a central role in the text mining process, since it is necessary to transform plain text into such a form in order to apply mining techniques. Since the semantics of text tend to be imprecise, the use of fuzzy intermediate forms seems to be a natural solution in many cases. We discuss fuzzy intermediate forms and the corresponding fuzzy text mining techniques that may be applicable to them.

