Results 1 - 10 of 58,639

Fastmap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets

by Christos Faloutsos, King-Ip (David) Lin, 1995
"... A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in k-d space, using k feature-extraction functions, provided by a domain expert [Jag91]. Thus, we can subsequently use highly fine-tuned spatial access methods (SAMs), to answer several ..."
Cited by 502 (22 self)

RCV1: A new benchmark collection for text categorization research

by David D. Lewis, Yiming Yang, Tony G. Rose, Fan Li - Journal of Machine Learning Research, 2004
"... Reuters Corpus Volume I (RCV1) is an archive of over 800,000 manually categorized newswire stories recently made available by Reuters, Ltd. for research purposes. Use of this data for research on text categorization requires a detailed understanding of the real world constraints under which the data ..."
Cited by 663 (11 self)
... errorful data. We refer to the original data as RCV1-v1, and the corrected data as RCV1-v2. We benchmark several widely used supervised learning methods on RCV1-v2, illustrating the collection’s properties, suggesting new directions for research, and providing baseline results for future studies. We make ...

From Data Mining to Knowledge Discovery in Databases.

by Usama Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth - AI Magazine, 1996
"... Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in database ..."
Cited by 538 (0 self)
... in KDD systems. Why Do We Need KDD? The traditional method of turning data into knowledge relies on manual analysis and interpretation. For example, in the health-care industry, it is common for specialists to periodically analyze current trends and changes in health-care data, say, on a quarterly basis ...

Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics

by Geir Evensen - J. Geophys. Res., 1994
"... A new sequential data assimilation method is discussed. It is based on forecasting the error statistics using Monte Carlo methods, a better alternative than solving the traditional and computationally extremely demanding approximate error covariance equation used in the extended Kalman filter. The ..."
Cited by 800 (23 self)

Estimating Wealth Effects without Expenditure Data— or Tears

by Deon Filmer, Lant Pritchett - Policy Research Working Paper 1980, The World, 1998
"... Abstract: We use the National Family Health Survey (NFHS) data collected in Indian states in 1992 and 1993 to estimate the relationship between household wealth and the probability a child (aged 6 to 14) is enrolled in school. A methodological difficulty to overcome is that the NFHS, modeled closely ..."
Cited by 871 (16 self)

Data Preparation for Mining World Wide Web Browsing Patterns

by Robert Cooley, Bamshad Mobasher, Jaideep Srivastava - Knowledge and Information Systems, 1999
"... The World Wide Web (WWW) continues to grow at an astounding rate in both the sheer volume of traffic and the size and complexity of Web sites. The complexity of tasks such as Web site design, Web server design, and of simply navigating through a Web site have increased along with this growth. An i ..."
Cited by 567 (43 self)
... is the application of data mining techniques to usage logs of large Web data repositories in order to produce results that can be used in the design tasks mentioned above. However, there are several preprocessing tasks that must be performed prior to applying data mining algorithms to the data collected from ...

M-tree: An Efficient Access Method for Similarity Search in Metric Spaces

by Paolo Ciaccia, Marco Patella, Pavel Zezula, 1997
"... A new access method, called M-tree, is proposed to organize and search large data sets from a generic "metric space", i.e. where object proximity is only defined by a distance function satisfying the positivity, symmetry, and triangle inequality postulates. We detail algorithms for insertion o ..."
Cited by 663 (38 self)
... are reported, considering as the performance criteria the number of page I/O's and the number of distance computations. The results demonstrate that the M-tree indeed extends the domain of applicability beyond the traditional vector spaces, performs reasonably well in high-dimensional data spaces, and scales well ...

Estimation of probabilities from sparse data for the language model component of a speech recognizer

by Slava M. Katz - IEEE Transactions on Acoustics, Speech and Signal Processing , 1987
"... Abstract: The description of a novel type of m-gram language model is given. The model offers, via a nonlinear recursive procedure, a computation and space efficient solution to the problem of estimating probabilities from sparse data. This solution compares favorably to other proposed methods. Wh ..."
Cited by 799 (2 self)
... and it is a problem that one always encounters while collecting frequency statistics on words and word sequences (m-grams) from a text of finite size. This means that even for a very large data collection, the maximum likelihood estimation method does not allow Turing’s estimate PT for a probability of a ...

Fast subsequence matching in time-series databases

by Christos Faloutsos, M. Ranganathan, Yannis Manolopoulos - Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, 1994
"... We present an efficient indexing method to locate 1-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance. The idea is to map each data sequence into a small set of multidimensional rectangles in feature space ..."
Cited by 533 (24 self)

Analysis of Recommendation Algorithms for E-Commerce

by Badrul Sarwar, George Karypis, Joseph Konstan, John Riedl, 2000
"... Recommender systems apply statistical and knowledge discovery techniques to the problem of making product recommendations during a live customer interaction and they are achieving widespread success in E-Commerce nowadays. In this paper, we investigate several techniques for analyzing large-scale pu ..."
Cited by 523 (22 self)
... large-scale purchase and preference data for the purpose of producing useful recommendations to customers. In particular, we apply a collection of algorithms such as traditional data mining, nearest-neighbor collaborative filtering, and dimensionality reduction on two different data sets. The first data set was derived from ...