Ranked Similarity Search of Scientific Datasets (2014)

by V M Megler

Results 1 - 2 of 2

Data Like This: Ranked Search of Genomic Data Vision Paper

by V. M. Megler, David Maier, Daniel Bottomly, Libbey White, Shannon McWeeney, Beth - in Proceedings of the Second International Workshop on Databases and the Web, 2015
Abstract - Cited by 1 (0 self)
High-throughput genetic sequencing produces the ultimate “big data”: a human genome sequence contains more than 3B base pairs, and more and more characteristics, or annotations, are being recorded at the base-pair level. Locating areas of interest within the genome is a challenge for researchers, limiting their investigations. We describe our vision of adapting “big data” ranked search to the problem of searching the genome. Our goal is to make searching for data as easy for scientists as searching the Internet.

Citation Context

...racteristics. In prior research, we successfully applied this approach to searching a large archive of numeric data stored in diverse formats in an ocean observatory (“Data Near Here” (DNH), at CMOP) [9, 10]. Figure 1 shows our high-level architecture, adapted from Internet search architectures. The Asynchronous Indexing component performs a one-time scan of each dataset in an archive to construct a summa...

Challenges for Dataset Search

by David Maier, V. M. Megler, Kristin Tufte
Abstract
Ranked search of datasets has emerged as a need as shared scientific archives grow in size and variety. Our own investigations have shown that IR-style, feature-based relevance scoring can be an effective tool for data discovery in scientific archives. However, maintaining interactive response times as archives scale will be a challenge. We report here on our exploration of performance techniques for Data Near Here, a dataset search service. We present a sample of results evaluating filter-restart techniques in our system, including two variations, adaptive relaxation and contraction. We then outline further directions for research in this domain.
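The abstract names filter-restart but this listing does not describe how it works. As a rough illustration only, the general idea of a filter-restart top-k search can be sketched as follows: apply a cheap filter to shrink the candidate set, and if fewer than k candidates survive, loosen the filter and restart. Everything in the sketch is an assumption made for illustration and not the authors' method: the `DatasetSummary` type, the one-dimensional range distance, and the `step` and `growth` parameters are hypothetical, and the adaptive relaxation and contraction variations mentioned above are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class DatasetSummary:
    """Hypothetical per-dataset summary: the value range of one variable."""
    name: str
    lo: float
    hi: float


def distance(s: DatasetSummary, q_lo: float, q_hi: float) -> float:
    """Gap between the dataset's range and the query range (0 if they overlap)."""
    if s.hi < q_lo:
        return q_lo - s.hi
    if s.lo > q_hi:
        return s.lo - q_hi
    return 0.0


def filter_restart(summaries, q_lo, q_hi, k, step=1.0, growth=2.0, max_restarts=20):
    """Return the k closest summaries and the number of restarts needed.

    The filter keeps only summaries within `margin` of the query range.
    If fewer than k pass, the margin is widened and the filter is rerun.
    """
    k = min(k, len(summaries))
    margin = 0.0
    restarts = 0
    while True:
        candidates = [s for s in summaries if distance(s, q_lo, q_hi) <= margin]
        if len(candidates) >= k or restarts >= max_restarts:
            break
        # Relax the filter: assumed geometric widening of the margin.
        margin = step if margin == 0.0 else margin * growth
        restarts += 1
    if len(candidates) < k:
        # Give up on filtering and fall back to scoring everything.
        candidates = list(summaries)
    # Every rejected summary is farther away than every accepted one,
    # so ranking the survivors yields the same top k as a full scan.
    ranked = sorted(candidates, key=lambda s: (distance(s, q_lo, q_hi), s.name))
    return ranked[:k], restarts


archive = [
    DatasetSummary("A", 0, 10),
    DatasetSummary("B", 12, 20),
    DatasetSummary("C", 30, 40),
    DatasetSummary("D", 100, 110),
]
top, restarts = filter_restart(archive, q_lo=8, q_hi=11, k=3)
print([s.name for s in top], restarts)
```

The trade-off this sketch exposes is the one the abstract hints at: a tight initial filter makes each pass cheap but may force several restarts, while a loose one avoids restarts at the cost of scoring more candidates.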

Citation Context

..., and lastly into an entire cruise dataset. The most relevant portion to Lynda is shown shaded on the left in the hierarchy, while the most relevant portion to Joel is shown shaded on the right. From [13].
3 Performance Study
3.1 Filter-Restart
While most searches of the current CMOP archive take a few seconds, we are interested in understanding the effects of further growth on interactive response ti...
