Results 1 - 10 of 249
iSAX: Indexing and Mining Terabyte Sized Time Series
- In Proc. SIGKDD
, 2008
Abstract - Cited by 54 (5 self)
Current research in indexing and mining time series data has produced many interesting algorithms and representations. However, it has not led to algorithms that can scale to the increasingly massive datasets encountered in science, engineering, and business domains. In this work, we show how a novel multiresolution symbolic representation can be used to index datasets which are several orders of magnitude larger than anything else considered in the literature. Our approach allows both fast exact search and ultra fast approximate search. We show how to exploit the combination of both types of search as sub-routines in data mining algorithms, allowing for the exact mining of truly massive real world datasets, containing millions of time series.
Disk Aware Discord Discovery: Finding Unusual Time Series in Terabyte Sized Datasets
Abstract - Cited by 20 (6 self)
The problem of finding unusual time series has recently attracted much attention, and several promising methods are now in the literature. However, virtually all proposed methods assume that the data reside in main memory. For many real-world problems this is not the case. For example, in astronomy, multi-terabyte time series datasets are the norm. Most current algorithms faced with data which cannot fit in main memory resort to multiple scans of the disk/tape and are thus intractable. In this work we show how one particular definition of unusual time series, the time series discord, can ...
cgmOLAP: Efficient Parallel Generation and Querying of Terabyte Size ROLAP Data Cubes
- In Proc. 22nd International Conference on Data Engineering (ICDE)
, 2006
Abstract - Cited by 7 (3 self)
We present the cgmOLAP server, the first fully functional parallel OLAP system able to build data cubes at a rate of more than 1 Terabyte per hour. cgmOLAP incorporates a variety of novel approaches for the parallel computation of full cubes, partial cubes, and iceberg cubes as well as new parallel
Físréal: A low cost terabyte search engine
- In Proceedings of the European Conference on IR
, 2005
"... Abstract. In this poster we describe the development of a distributed search engine, referred to as Físréal, which utilises inexpensive workstations, yet attains fast retrieval performance for Terabyte-sized collections. We also discuss the process of leveraging additional meaning from the structure ..."
Abstract - Cited by 7 (6 self)
The TREC Terabyte retrieval track
- SIGIR Forum
, 2005
"... The Terabyte Retrieval Track of the Text REtrieval Conference (TREC) provides an opportunity to test retrieval techniques and evaluation methodologies in the context of a terabyte-scale corpus. Given the size of the corpus, the track also provides a vehicle for participants to investigate query and ..."
Abstract - Cited by 7 (0 self)
Effective smoothing for a terabyte of text
- In The Fourteenth Text REtrieval Conference (TREC 2005). National Institute of Standards and Technology. NIST Special Publication
, 2006
"... track, we conducted a range of experiments investigating the effects of larger collections. Our main findings can be summarized as follows. First, we tested whether our retrieval system scales up to terabyte-scale collections. We found that our retrieval system can handle 25 million documents, altho ..."
Abstract - Cited by 14 (12 self)
Hive - A Warehousing Solution Over a Map-Reduce Framework
- In VLDB '09: Proceedings of the VLDB Endowment
, 2009
"... The size of data sets being collected and analyzed in the industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. Hadoop [3] is a popular open-source map-reduce implementation which is being used as an alternative to store and pr ..."
Abstract - Cited by 265 (1 self)
The Feasibility of Moving Terabyte Files Between Campus and Cloud
"... Cloud/Grid computing is envisioned to be a predominant computing model of the future. The movement of files between cloud and client is intrinsic to this model. With the creation of ever expanding data sets, the sizes of files have increased dramatically. Consequently, terabyte file transfers are ex ..."
Abstract
Searching a Terabyte of Text Using Partial Replication
, 1999
"... The explosion of content in distributed information retrieval (IR) systems requires new mechanisms in order to attain timely and accurate retrieval of unstructured text. In this paper, we investigate using partial replication to search a terabyte of text in our distributed IR system. We use a rep ..."
Abstract - Cited by 1 (1 self)
loads with partial replication on a terabyte text database. We further investigate query locality with respect to time, replica size, and replica updating costs using real logs from THOMAS and Excite, and discuss the sensitivity of our results to these sample points.
Fermilab Terabyte IDE RAID-5 Disk Arrays
Abstract
High energy physics experiments are currently recording large amounts of data and in a few years will be recording prodigious quantities of data. New methods must be developed to handle this data and make analysis at universities possible. We examine some techniques that exploit recent developments in commodity hardware. We report on tests of redundant arrays of integrated drive electronics (IDE) disk drives for use in offline high energy physics data analysis. IDE redundant array of inexpensive disks (RAID) prices now are less than the cost per terabyte of million-dollar tape robots ...