Internet Browsing and Searching: User Evaluations of Category Map and Concept Space Techniques
- Journal of the American Society for Information Science, 1998
A Scalable Self-organizing Map Algorithm for Textual Classification: A Neural Network Approach to Thesaurus Generation
- Communication Cognition and Artificial Intelligence, Spring 1998
Cited by 23 (5 self)
The rapid proliferation of textual and multimedia online databases, digital libraries, Internet servers, and intranet services has turned researchers' and practitioners' dream of creating an information-rich society into a nightmare of info-gluts. Many researchers believe that turning an info-glut into a useful digital library requires automated techniques for organizing and categorizing large-scale information. This paper presents research in which we sought to develop a scalable textual classification and categorization system based on Kohonen's self-organizing feature map (SOM) algorithm. In our paper, we show how self-organization can be used for automatic thesaurus generation. Our proposed data structure and algorithm took advantage of the sparsity of coordinates in the document input vectors and reduced the SOM computational complexity by several orders of magnitude. The proposed Scalable SOM (SSOM) algorithm makes large-scale textual categorization tasks a possibility. A...
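The abstract does not spell out the SSOM data structure, so the following is only a minimal sketch of one common way to exploit sparse input vectors when searching for the best-matching SOM node. It assumes the identity ||x - w||^2 = ||w||^2 - 2 x·w + ||x||^2, with ||w||^2 cached per node so that each distance touches only the nonzero coordinates of x. The names `SparseVec`, `sparse_sq_distance`, and `best_matching_unit` are hypothetical, and the weight-update step is left out.

```python
from typing import Dict, List

# Hypothetical representation: coordinate index -> nonzero value.
SparseVec = Dict[int, float]


def sq_norm(weights: List[float]) -> float:
    """Squared Euclidean norm of a dense weight vector (computed once per node)."""
    return sum(w * w for w in weights)


def sparse_sq_distance(x: SparseVec, weights: List[float], w_sq_norm: float) -> float:
    """Squared distance between sparse x and dense weights.

    Uses ||x - w||^2 = ||w||^2 - 2 * (x . w) + ||x||^2, so the loop runs
    only over the nonzero coordinates of x, not over every dimension.
    """
    dot = sum(value * weights[i] for i, value in x.items())
    x_sq = sum(value * value for value in x.values())
    return w_sq_norm - 2.0 * dot + x_sq


def best_matching_unit(x: SparseVec, nodes: List[List[float]], norms: List[float]) -> int:
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(nodes)), key=lambda k: sparse_sq_distance(x, nodes[k], norms[k]))


# Toy map with three nodes in a 3-dimensional term space (illustrative values).
nodes = [[0.0, 0.0, 0.0], [0.0, 0.9, 0.0], [1.0, 0.0, 0.0]]
norms = [sq_norm(w) for w in nodes]
doc = {1: 1.0}  # a document whose only nonzero term is coordinate 1
winner = best_matching_unit(doc, nodes, norms)
```

In a full implementation the cached norms would have to be refreshed whenever a node's weights are updated; how the paper handles that step is not stated in the abstract.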
WEBSOM for Textual Data Mining
- Artificial Intelligence Review, 1999
Cited by 22 (4 self)
New methods that are user-friendly and efficient are needed for guidance among the masses of textual information available in the Internet and the World Wide Web. We have developed a method and a tool called the WEBSOM which utilizes the self-organizing map algorithm (SOM) for organizing large collections of text documents onto visual document maps. The approach to processing text is statistically oriented, computationally feasible, and scalable: over a million text documents have been ordered on a single map. In the article we consider different kinds of information needs and tasks regarding organizing, visualizing, searching, categorizing and filtering textual data. Furthermore, we discuss and illustrate with examples how document maps can aid in these situations. An example is presented where a document map is utilized as a tool for visualizing and filtering a stream of incoming electronic mail messages.
PEBL: Web Page Classification without Negative Examples
- IEEE Transactions on Knowledge and Data Engineering, 2004
Cited by 18 (0 self)
Web page classification is one of the essential techniques for Web mining because classifying Web pages of an interesting class is often the first step of mining the Web. However, constructing a classifier for an interesting class requires laborious preprocessing such as collecting positive and negative training examples. For instance, in order to construct a "homepage" classifier, one needs to collect a sample of homepages (positive examples) and a sample of nonhomepages (negative examples). In particular, collecting negative training examples requires arduous work and caution to avoid bias. This paper presents a framework, called Positive Example Based Learning (PEBL), for Web page classification which eliminates the need for manually collecting negative training examples in preprocessing. The PEBL framework applies an algorithm, called Mapping-Convergence (M-C), to achieve classification accuracy (with positive and unlabeled data) as high as that of a traditional SVM (with positive and negative data). M-C runs in two stages: the mapping stage and the convergence stage. In the mapping stage, the algorithm uses a weak classifier that draws an initial approximation of "strong" negative data. Based on the initial approximation, the convergence stage iteratively runs an internal classifier (e.g., SVM) which maximizes margins to progressively improve the approximation of negative data. Thus, the class boundary eventually converges to the true boundary of the positive class in the feature space. We present the M-C algorithm with supporting theoretical and experimental justifications. Our experiments show that, given the same set of positive examples, the M-C algorithm outperforms one-class SVMs, and it is almost as accurate as the traditional SVMs.
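The two-stage control flow described in this abstract can be sketched roughly as follows. This is an illustration under loud assumptions, not the paper's implementation: points are one-dimensional floats, the weak classifier of the mapping stage is a simple distance threshold (the hypothetical `factor` parameter), and a nearest-centroid rule stands in for the margin-maximizing SVM, so the boundary it reaches will differ from what an SVM would produce.

```python
from typing import List, Tuple


def centroid(points: List[float]) -> float:
    """Mean of a non-empty list of 1-D points."""
    return sum(points) / len(points)


def mapping_stage(positives: List[float], unlabeled: List[float],
                  factor: float = 10.0) -> Tuple[List[float], List[float]]:
    """Assumed weak classifier: unlabeled points far outside the positive
    spread are taken as "strong" negatives; the rest stay unlabeled."""
    c = centroid(positives)
    radius = max(abs(p - c) for p in positives)
    strong = [u for u in unlabeled if abs(u - c) > factor * radius]
    rest = [u for u in unlabeled if abs(u - c) <= factor * radius]
    return strong, rest


def convergence_stage(positives: List[float], negatives: List[float],
                      rest: List[float]) -> Tuple[List[float], List[float]]:
    """Repeatedly retrain the internal classifier (nearest centroid here,
    standing in for an SVM) and move newly rejected points into the
    negative set, until a pass finds no new negatives."""
    pos_c = centroid(positives)
    negatives, rest = list(negatives), list(rest)
    while negatives and rest:
        neg_c = centroid(negatives)
        newly = [u for u in rest if abs(u - neg_c) < abs(u - pos_c)]
        if not newly:
            break
        negatives.extend(newly)
        rest = [u for u in rest if u not in newly]
    return negatives, rest


# Illustrative data: no negative examples are supplied by hand.
positives = [0.0, 0.5, 1.0]
unlabeled = [0.2, 0.8, 2.0, 3.0, 3.2, 4.0, 6.0, 7.0]
strong, rest = mapping_stage(positives, unlabeled)
negatives, predicted_positive = convergence_stage(positives, strong, rest)
```

On this toy data the negative set grows by one point per pass, which mirrors the "progressively improve the approximation of negative data" behavior the abstract describes.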
Semantic Indexing for a Complete Subject Discipline
- In 4th Int ACM Conf on Digital Libraries, 1999
Cited by 16 (4 self)
As part of the Illinois Digital Library Initiative (DLI) project we developed "scalable semantics" technologies. These statistical techniques enabled us to index large collections for deeper search than word matching. Through the auspices of the DARPA Information Management program, we are developing an integrated analysis environment, the Interspace Prototype, that uses "semantic indexing" as the foundation for supporting concept navigation. These semantic indexes record the contextual correlation of noun phrases, and are computed generically, independent of subject domain. Using this technology, we were able to compute semantic indexes for a subject discipline. In particular, in the summer of 1998, we computed concept spaces for 9.3M MEDLINE bibliographic records from the National Library of Medicine (NLM) which extensively covered the biomedical literature for the period from 1966 to 1997. In this experiment, we first partitioned the collection into smaller collections (repositorie...
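The abstract says only that semantic indexes "record the contextual correlation of noun phrases" and gives no formula. As a rough illustration of the general idea, the sketch below counts how often two phrases appear in the same record and ranks the phrases related to a query phrase by that raw count. The function names and the sample phrases are hypothetical, and any weighting the actual concept-space computation applies is omitted.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def cooccurrence_counts(docs: List[List[str]]) -> Dict[Tuple[str, str], int]:
    """Count, for each unordered phrase pair, the number of documents
    in which both phrases occur."""
    counts: Counter = Counter()
    for phrases in docs:
        # sorted(set(...)) gives each pair one canonical (a, b) ordering.
        for a, b in combinations(sorted(set(phrases)), 2):
            counts[(a, b)] += 1
    return counts


def related_phrases(counts: Dict[Tuple[str, str], int], phrase: str) -> List[str]:
    """Phrases that co-occur with `phrase`, most frequent first."""
    scores: Counter = Counter()
    for (a, b), n in counts.items():
        if a == phrase:
            scores[b] += n
        elif b == phrase:
            scores[a] += n
    return [p for p, _ in scores.most_common()]


# Each record is represented by its already-extracted noun phrases (made-up data).
records = [
    ["gene therapy", "tumor"],
    ["gene therapy", "tumor", "clinical trial"],
    ["clinical trial", "placebo"],
]
counts = cooccurrence_counts(records)
suggestions = related_phrases(counts, "gene therapy")
```

Because the computation uses only phrase co-occurrence within records, it is independent of subject domain, which is the property the abstract emphasizes.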
Creating a Large-Scale Content-Based Airphoto Image Digital Library
- IEEE Transactions on Image Processing, 2000
Cited by 10 (1 self)
This paper describes a content-based image retrieval digital library that supports geographical image retrieval over a testbed of 800 aerial photographs, each 25 megabytes in size. In addition, this paper introduces a methodology to evaluate the performance of the algorithms in the prototype system. This paper makes two major contributions. 1) We suggest an approach that incorporates various image processing techniques, including Gabor filters, image enhancement, and image compression, as well as information analysis techniques such as the self-organizing map (SOM), into an effective large-scale geographical image retrieval system. 2) We present two experiments that evaluate the performance of the Gabor-filter-extracted features, along with the corresponding similarity measure, against that of human perception, addressing the lack of studies assessing the consistency between an image representation algorithm or an image categorization method and the human mental model.
Web Mining: Machine Learning for Web Applications
- Annual Review of Information Science and Technology, 2004
Cited by 9 (7 self)
With more than two billion pages created by millions of Web page authors and organizations, the World Wide Web is a tremendously rich
A Collection of Visual Thesauri for Browsing Large Collections of Geographic Images
- 1997
Cited by 8 (2 self)
Digital libraries of geo-spatial multimedia content are currently deficient in providing fuzzy, concept-based retrieval mechanisms to users. The main challenge is that indexing and thesaurus creation are extremely labor-intensive processes for text documents and especially for images. Recently, 800,000 declassified satellite photographs were made available by the United States Geological Survey. Additionally, millions of satellite and aerial photographs are archived in national and local map libraries. Such enormous collections make human indexing and thesaurus generation methods impossible to utilize. In this paper we propose a scalable method to automatically generate visual thesauri of large collections of geo-spatial media using fuzzy, unsupervised machine learning techniques.
Longitudinal Patent Analysis for Nanoscale Science and Engineering: Country, Institution and Technology Field
- Journal of Nanoparticle Research, 2003
Cited by 8 (6 self)
Nanoscale science and engineering (NSE) and related areas have seen rapid growth in recent years. The speed and scope of development in the field have made it essential for researchers to be informed on the progress across different laboratories, companies, industries and countries. In this project, we experimented with several analysis and visualization techniques on NSE-related United States patent documents to support various knowledge tasks. This paper presents results on the basic analysis of nanotechnology patents between 1976 and 2002, content map analysis and citation network analysis. The data have been obtained on individual countries, institutions and technology fields. The top 10 countries with the largest number of nanotechnology patents are the United States, Japan, France, the United Kingdom, Taiwan, Korea, the Netherlands, Switzerland, Italy and Australia. The fastest growth in the last 5 years has been in chemical and pharmaceutical fields, followed by semiconductor devices. The results demonstrate the potential of information-based discovery and visualization technologies to capture knowledge regarding nanotechnology performance, transfer of knowledge and trends of development through analyzing the patent documents.
Medical Data Mining on the Internet: Research on a Cancer Information System
- 1999
Cited by 6 (1 self)
This paper discusses several data mining algorithms and techniques that we have developed at the University of Arizona Artificial Intelligence Lab. We have implemented these algorithms and techniques into several prototypes, one of which focuses on medical information developed in cooperation with the National Cancer Institute (NCI) and the University of Illinois at Urbana-Champaign. We propose an architecture for medical knowledge information systems that will permit data mining across several medical information sources and discuss a suite of data mining tools that we are developing to assist NCI in improving public access to and use of their existing vast cancer information collections.

