CiteSeerX Advanced Search

Results 1 - 10 of 249

iSAX: Indexing and Mining Terabyte Sized Time Series

by Jin Shieh, Eamonn Keogh - In Proc. SIGKDD , 2008
Abstract - Cited by 54 (5 self)
Current research in indexing and mining time series data has produced many interesting algorithms and representations. However, it has not led to algorithms that can scale to the increasingly massive datasets encountered in science, engineering, and business domains. In this work, we show how a novel multiresolution symbolic representation can be used to index datasets which are several orders of magnitude larger than anything else considered in the literature. Our approach allows both fast exact search and ultra fast approximate search. We show how to exploit the combination of both types of search as sub-routines in data mining algorithms, allowing for the exact mining of truly massive real world datasets, containing millions of time series.
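The iSAX representation builds on the SAX symbolic representation: a z-normalized series is reduced by Piecewise Aggregate Approximation (PAA) and each segment mean is mapped to a symbol using breakpoints that divide the standard normal distribution into equiprobable regions. A minimal sketch of that discretization step (plain SAX only, not the paper's multiresolution iSAX words; alphabet size 4 and a series length divisible by the segment count are assumed):

```python
import numpy as np

# Breakpoints cutting N(0,1) into 4 equiprobable regions (standard SAX
# values for an alphabet of size 4); symbols are the integers 0..3.
BREAKPOINTS = [-0.6745, 0.0, 0.6745]

def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-width segment.
    Assumes len(series) is divisible by n_segments."""
    x = np.asarray(series, dtype=float)
    return np.mean(x.reshape(n_segments, -1), axis=1)

def sax_word(series, n_segments=4):
    """Z-normalize, reduce with PAA, then map each segment mean to a symbol."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()
    return tuple(np.searchsorted(BREAKPOINTS, paa(x, n_segments)))
```

A steadily rising series of length 8, for example, discretizes to the word (0, 1, 2, 3); iSAX extends this idea by letting each symbol carry a variable number of bits, which is what makes the representation multiresolution and indexable.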

Disk Aware Discord Discovery: Finding Unusual Time Series in Terabyte Sized Datasets

by Dragomir Yankov, Eamonn Keogh
Abstract - Cited by 20 (6 self)
The problem of finding unusual time series has recently attracted much attention, and several promising methods are now in the literature. However, virtually all proposed methods assume that the data reside in main memory. For many real-world problems this is not the case. For example, in astronomy, multi-terabyte time series datasets are the norm. Most current algorithms faced with data which cannot fit in main memory resort to multiple scans of the disk/tape and are thus intractable. In this work we show how one particular definition of unusual time series, the time series discord, can ...

cgmOLAP: Efficient Parallel Generation and Querying of Terabyte Size ROLAP Data Cubes

by Y. Chen, A. Rau-Chaplin, F. Dehne, T. Eavis, D. Green, E. Sithirasenan - In Proc. 22nd International Conference on Data Engineering (ICDE) , 2006
Abstract - Cited by 7 (3 self)
We present the cgmOLAP server, the first fully functional parallel OLAP system able to build data cubes at a rate of more than 1 Terabyte per hour. cgmOLAP incorporates a variety of novel approaches for the parallel computation of full cubes, partial cubes, and iceberg cubes as well as new parallel ...

Físréal: A low cost terabyte search engine

by Paul Ferguson, Cathal Gurrin, Peter Wilkins, Alan F. Smeaton - In Proceeding of European Conference in IR , 2005
Abstract - Cited by 7 (6 self)
In this poster we describe the development of a distributed search engine, referred to as Físréal, which utilises inexpensive workstations, yet attains fast retrieval performance for Terabyte-sized collections. We also discuss the process of leveraging additional meaning from the structure ...

The TREC Terabyte retrieval track

by Charles Clarke, Nick Craswell, Ian Soboroff - SIGIR Forum , 2005
Abstract - Cited by 7 (0 self)
The Terabyte Retrieval Track of the Text REtrieval Conference (TREC) provides an opportunity to test retrieval techniques and evaluation methodologies in the context of a terabyte-scale corpus. Given the size of the corpus, the track also provides a vehicle for participants to investigate query and ...

Effective smoothing for a terabyte of text

by Jaap Kamps - In The Fourteenth Text REtrieval Conference (TREC 2005). National Institute of Standards and Technology. NIST Special Publication , 2006
Abstract - Cited by 14 (12 self)
... track, we conducted a range of experiments investigating the effects of larger collections. Our main findings can be summarized as follows. First, we tested whether our retrieval system scales up to terabyte-scale collections. We found that our retrieval system can handle 25 million documents, although ...
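The snippet does not say which smoothing estimator was used; as a hypothetical illustration of the kind of smoothing evaluated in language-model retrieval at this scale, here is standard Dirichlet-prior smoothing, where document term counts are interpolated with the collection model (all names and parameters below are illustrative, not taken from the paper):

```python
import math
from collections import Counter

def dirichlet_score(query, doc, collection, mu=2000.0):
    """Query log-likelihood under a Dirichlet-smoothed document language model:
    p(w | d) = (tf(w, d) + mu * p(w | C)) / (|d| + mu).
    Sketch only: assumes every query term occurs somewhere in the collection,
    so p(w | C) > 0 and the log is defined."""
    tf_d, tf_c = Counter(doc), Counter(collection)
    return sum(
        math.log((tf_d[w] + mu * tf_c[w] / len(collection)) / (len(doc) + mu))
        for w in query
    )
```

The prior mu controls how strongly short documents are pulled toward collection statistics, which is exactly the knob that needs re-tuning when the collection grows to terabyte scale.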

Hive- A Warehousing Solution Over a Map-Reduce Framework

by Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff, Raghotham Murthy - In VLDB '09: Proceedings of the VLDB Endowment , 2009
Abstract - Cited by 265 (1 self)
The size of data sets being collected and analyzed in the industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. Hadoop [3] is a popular open-source map-reduce implementation which is being used as an alternative to store and pr ...
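Hive compiles SQL-like queries into map-reduce jobs executed by Hadoop. The map-reduce model itself can be sketched in a few lines of plain Python (a stand-in for illustration only, not Hive or Hadoop code), using the classic word-count job:

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    # Map: emit (key, value) pairs; here, one ("word", 1) per word in a line.
    return [(word, 1) for word in record.split()]

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: collapse each key's value list to a single result.
    return key, sum(values)

def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
```

In a real deployment the map and reduce functions run in parallel across machines and the shuffle moves data over the network; a system like Hive generates this structure automatically from a declarative query.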

The Feasibility of Moving Terabyte Files between Campus and Cloud

by Adam H. Villa, Elizabeth Varki
Abstract
Cloud/Grid computing is envisioned to be a predominant computing model of the future. The movement of files between cloud and client is intrinsic to this model. With the creation of ever expanding data sets, the sizes of files have increased dramatically. Consequently, terabyte file transfers are ex ...

Searching a Terabyte of Text Using Partial Replication

by Zhihong Lu, Kathryn S. McKinley , 1999
Abstract - Cited by 1 (1 self)
The explosion of content in distributed information retrieval (IR) systems requires new mechanisms in order to attain timely and accurate retrieval of unstructured text. In this paper, we investigate using partial replication to search a terabyte of text in our distributed IR system. We use a rep ... loads with partial replication on a terabyte text database. We further investigate query locality with respect to time, replica size, and replica updating costs using real logs from THOMAS and Excite, and discuss the sensitivity of our results to these sample points.

Fermilab Terabyte IDE RAID-5 Disk Arrays

by D. A. Sanders, L. M. Cremaldi, V. Eschenburg, R. Godang, C. N. Lawrence, C. Riley, D. J. Summers, D. L. Petravick
Abstract
High energy physics experiments are currently recording large amounts of data and in a few years will be recording prodigious quantities of data. New methods must be developed to handle this data and make analysis at universities possible. We examine some techniques that exploit recent developments in commodity hardware. We report on tests of redundant arrays of integrated drive electronics (IDE) disk drives for use in offline high energy physics data analysis. IDE redundant array of inexpensive disks (RAID) prices now are less than the cost per terabyte of million-dollar tape robots ...
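A RAID-5 array stripes data blocks across drives and stores a rotating XOR parity block, so the contents of any single failed drive can be rebuilt from the survivors. A toy sketch of the parity arithmetic (illustrative only, not the Fermilab array configuration):

```python
def xor_parity(blocks):
    """Parity block for a stripe: byte-wise XOR of its data blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def reconstruct(surviving_blocks):
    """Rebuild the one missing block of a stripe (data or parity).
    XOR of all survivors works because b0 ^ b1 ^ ... ^ parity == 0."""
    return xor_parity(surviving_blocks)
```

Because XOR is its own inverse, reconstruction is the same operation as parity generation, which is what keeps single-drive recovery cheap on commodity hardware.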
