Results 1 - 8 of 8
WEBSOM - Self-Organizing Maps of Document Collections
- Neurocomputing, 1997
Cited by 121 (14 self)
Searching for relevant text documents has traditionally been based on keywords and Boolean expressions of them. Often the search results show high recall and low precision, or vice versa. Considerable efforts have been made to develop alternative methods, but their practical applicability has been low. Powerful methods are needed for the exploration of miscellaneous document collections. The WEBSOM method organizes a document collection on a map display that provides an overview of the collection and facilitates interactive browsing. Interesting documents can be retrieved by a content addressable search of interesting map locations. The interesting locations could also be marked as filters for collecting interesting new documents.
Clustering in Massive Data Sets
- Handbook of massive data sets, 1999
Cited by 8 (0 self)
We review the time and storage costs of search and clustering algorithms. We exemplify these, based on case studies in astronomy, information retrieval, visual user interfaces, chemical databases, and other areas. Sections 2 to 6 relate to nearest neighbor searching, an elemental form of clustering, and a basis for clustering algorithms to follow. Sections 7 to 11 review a number of families of clustering algorithms. Sections 12 to 14 relate to visual or image representations of data sets, from which a number of interesting algorithmic developments arise.
Overcoming the Curse of Dimensionality in Clustering by means of the Wavelet Transform
- The Computer Journal, 2000
Cited by 7 (2 self)
We use a redundant wavelet transform analysis to detect clusters in high-dimensional data spaces. We overcome Bellman's "curse of dimensionality" in such problems by (i) using some canonical ordering of observation and variable (document and term) dimensions in our data, (ii) applying a wavelet transform to such canonically ordered data, (iii) modeling the noise in wavelet space, (iv) defining significant component parts of the data as opposed to insignificant or noisy component parts, and (v) reading off the resultant clusters. The overall complexity of this innovative approach is linear in the data dimensionality. We describe a number of examples and test cases, including the clustering of high-dimensional hypertext data. 1 Introduction Bellman's (1961) [1] "curse of dimensionality" refers to the exponential growth of hypervolume as a function of dimensionality. All problems become tougher as the dimensionality increases. Nowhere is this more evident than in problems related to ...
Maps of Information Spaces: Assessments from Astronomy
- 1999
Cited by 6 (3 self)
We discuss the implementation of a cartographic user interface to bibliographic and other information subspaces in astronomy. This includes a front-end to two of the five premier scholarly journals in astronomy. We present a range of comparative assessments, in operational frameworks, of this approach to accessing and retrieving astronomical information. Finally we discuss the particular role that such cartographic user interfaces can play in Web-based information seeking, and contrast this with widely-used currently available search technologies. Keywords: Concept spaces, Distributed information retrieval, Information seeking, Internet, Kohonen self-organizing feature maps, Maps, Neural networks, Resource discovery, User interfaces. Author for correspondence. Email f.murtagh@qub.ac.uk 1 Introduction Information retrieval by means of "semantic road maps" was first detailed by Doyle (1961). The spatial metaphor is a very powerful one in human information processing. The sp...
Computational Astronomy: Current Directions And Future Perspectives
Cited by 1 (0 self)
We review data analysis, pursuing the following lines of enquiry: traditional, numeric data analysis, based on graphical means; "active" data analysis, where the results provide new graphical user interfaces, or where the results are used to facilitate navigation in information spaces; and newly developed tools and techniques for the processing of image and other signal objects. 1. Data Analysis for Visualization Frequently the analyst must interact with the data. This means that one type of display is made, followed by a different visualization of some subset of the data. The term "exploratory data analysis" is most closely associated with the name of Tukey (Princeton). Interactive statistics is another term used, and this activity may be supported by computer software. A prime example is the S language (or software environment) originating in AT&T Bell Labs, and enhanced as the S-Plus package by MathSoft Inc. (formerly StatSci Inc.). Figures 1, 2 and 3 illustrate complementary view...
Input Data Coding in Multivariate Data Analysis: Techniques and Practice in Correspondence Analysis
Cited by 1 (1 self)
this article, including doubling, complete disjunctive form, and fuzzy coding, all lead to equally-weighted observations. At the expense of more storage space, we see therefore that the data become considerably more well-behaved and interpretable.
Information integration and retrieval: the CDS hub
The Centre de Données astronomiques de Strasbourg (CDS) develops a set of value-added services, widely used for information retrieval, observation preparation, data interpretation,... SIMBAD, VizieR, Aladin, and the ‘Dictionary of Nomenclature’, integrate heterogeneous, selected information from observatory archives, sky surveys and publications. Each service organizes information in a different way (astronomical objects, tables, images with overlays, nomenclature), and the CDS hub allows versatile information retrieval, e.g. looking for known information in a given region of the sky, including observations of ground- and space-based instruments, or searching by criteria in large data sets. Links among the CDS services, and with other reference on-line information systems, such as observatory and survey archives and publications, permit comprehensive searches in a wide variety of resources. Shared exchange standards and generic tools such as the GLU are essential for the building of links. XML is a tool for further information integration, and Aladin is a precursor of an integration tool, relying on FITS and XML. New functionalities will be developed at CDS in the context of the Virtual Observatory, e.g. for data mining and management of very large catalogues. A prototype set of interoperable archives will be implemented in the frame of the European Astronomical Virtual Observatory project.

