Sampling-based estimation of the number of distinct values of an attribute
 In Proc. of VLDB
, 1995
Random Sampling for Histogram Construction: How much is enough?
, 1998
Cited by 115 (11 self)
Random sampling is a standard technique for constructing (approximate) histograms for query optimization. However, any real implementation in commercial products requires solving the hard problem of determining "How much sampling is enough?" We address this critical question in the context of equi-height histograms used in many commercial products, including Microsoft SQL Server. We introduce a conservative error metric capturing the intuition that for an approximate histogram to have low error, the error must be small in all regions of the histogram. We then present a result establishing an optimal bound on the amount of sampling required for prespecified error bounds. We also describe an adaptive page sampling algorithm which achieves greater efficiency by using all values in a sampled page but adjusts the amount of sampling depending on clustering of values in pages. Next, we establish that the problem of estimating the number of distinct values is provably difficult, but propose ...
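The idea of building an equi-height histogram from a random sample can be sketched as follows. This is a minimal illustration that assumes uniform record-level sampling; it is not the adaptive page sampling algorithm the abstract describes, and the function name and parameters are hypothetical.

```python
import random

def equi_height_boundaries(values, num_buckets, sample_size, seed=0):
    """Approximate equi-height bucket separators from a uniform random sample."""
    rng = random.Random(seed)
    k = min(sample_size, len(values))
    sample = sorted(rng.sample(values, k))
    # Separator i is the sample value at rank i*k/num_buckets, so each
    # bucket holds roughly k/num_buckets of the sampled values.
    return [sample[(i * k) // num_buckets] for i in range(1, num_buckets)]

column = list(range(1000))
print(equi_height_boundaries(column, num_buckets=4, sample_size=200))
```

A larger sample_size moves the separators closer to the true quantiles of the column, which is the trade-off the question "How much sampling is enough?" refers to.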
Does size matter? The relationship between pond area and biodiversity
 Biol. Cons
, 2002
Cited by 10 (1 self)
Larger areas support more species. To test the application of this biogeographic principle to ponds, we consider the relationship between size and diversity for 80 ponds in Switzerland, using richness (number of species) and conservation value (score for all species present, according to their degree of rarity) of aquatic plants, molluscs (Gastropoda, Sphaeriidae), Coleoptera, Odonata (adults) and Amphibia. Pond size was found to be important only for Odonata and explained 31% of the variability of their species richness. Pond size showed only a feeble relationship with the species richness of all other groups, particularly the Coleoptera and Amphibia. The weakness of this relationship was also indicated by the low z-values obtained (< 0.13). The SLOSS analyses showed that a set of ponds of small size has more species and has a higher conservation value than a single large pond of the same total area. But we also show that large ponds harbour species missing in the smaller ponds. Finally, we conclude that in a global con
Robust estimation of population size in closed animal populations from capture-recapture experiments
, 1983
Cited by 9 (0 self)
This paper considers the problem of finding robust estimators of population size in closed K-sample capture-recapture experiments. Particular attention is paid to models where heterogeneity of capture probabilities is allowed. First a general estimation procedure is given which does not depend on assuming anything about the form of the distribution of capture probabilities. This is followed by a detailed discussion of the usefulness of the generalized jackknife technique to reduce bias. Numerical comparisons of the bias and variance of various estimators are given. Finally a general discussion is given with several recommendations on estimators to be used in practice. Key words: Capture-recapture sampling; Population size estimation; Heterogeneity;
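As a rough illustration of jackknife-style bias reduction in this setting, the sketch below computes the first-order jackknife estimate S + ((K - 1) / K) * f1, where S is the number of distinct animals caught over K occasions and f1 is the number caught exactly once. This particular form is an assumption chosen for illustration; the abstract does not state which estimators the paper recommends, and the function and variable names are hypothetical.

```python
def jackknife_population_size(histories):
    """First-order jackknife estimate from 0/1 capture histories.

    histories maps an animal id to a list with one 0/1 entry per
    sampling occasion (1 = captured on that occasion).
    """
    occasions = len(next(iter(histories.values())))
    captures = [sum(h) for h in histories.values()]
    distinct = sum(1 for c in captures if c > 0)     # S
    singletons = sum(1 for c in captures if c == 1)  # f1
    return distinct + (occasions - 1) / occasions * singletons

example = {"a": [1, 0, 0], "b": [1, 1, 0], "c": [0, 0, 1]}
print(jackknife_population_size(example))
```

The correction term grows with the number of animals seen only once, reflecting the intuition that many single captures suggest many animals were never caught at all.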
Estimating size and composition of biological communities by modeling the occurrence of species
 Journal of the American Statistical Association
, 2005
Cited by 8 (1 self)
We develop a model that uses repeated observations of a biological community to estimate the number and composition of species in the community. Estimators of community-level attributes are constructed from model-based estimators of occurrence of individual species that incorporate imperfect detection of individuals. Data from the North American Breeding Bird Survey are analyzed to illustrate the variety of ecologically important quantities that are easily constructed and estimated using our model-based estimators of species occurrence. In particular, we compute site-specific estimates of species richness that honor classical notions of species-area relationships. We suggest extensions of our model to estimate maps of occurrence of individual species and to compute inferences related to the temporal and spatial dynamics of biological communities.
Population estimation with sparse data: the role of estimators versus indices revisited
Cited by 5 (1 self)
Abstract: The use of indices to evaluate small-mammal populations has been heavily criticized, yet a review of small-mammal studies published from 1996 through 2000 indicated that indices are still the primary methods employed for measuring populations. The literature review also found that 98% of the samples collected in these studies were too small for reliable selection among population-estimation models. Researchers therefore generally have a choice between
Natural diversity of Frankia strains in actinorhizal root nodules from promiscuous hosts in the family Myricaceae
 Appl. Environ
, 1999
Cited by 5 (0 self)
Actinorhizal plants invade nitrogen-poor soils because of their ability to form root nodule symbioses with N2-fixing actinomycetes known as Frankia. Frankia strains are difficult to isolate, so the diversity of strains inhabiting nodules in nature is not known. To address this problem, we have used the variability in bacterial 16S rRNA gene sequences amplified from root nodules as a means to estimate molecular diversity. Nodules were collected from 96 sites primarily in northeastern North America; each site contained one of three species of the family Myricaceae. Plants in this family are considered to be promiscuous hosts because several species are effectively nodulated by most isolated strains of Frankia in the greenhouse. We found that strain evenness varies greatly between the plant species so that estimating total strain richness of Frankia within myricaceous nodules with the sample size used was problematical. Nevertheless, Myrica pensylvanica, the common bayberry, was found to have sufficient diversity to serve as a reservoir host for Frankia strains that infect plants from other actinorhizal families. Myrica gale, sweet gale, yielded a few dominant sequences, indicating either symbiont specialization or niche selection of particular ecotypes. Strains in Comptonia peregrina nodules had an intermediate level of diversity and were all from a single major group of Frankia. Actinorhizal plants are defined by their ability to form N2-fixing root nodule symbioses with actinomycetes from the ge
A Bayesian Capture-Recapture Population Model With Simultaneous Estimation of Heterogeneity
 Journal of the American Statistical Association
, 2008
Cited by 2 (0 self)
We develop a Bayesian capture–recapture model that provides estimates of abundance as well as time-varying and heterogeneous survival and capture probability distributions. The model uses a state-space approach by incorporating an underlying population model and an observation model, and here is applied to photo-identification data to estimate trends in the abundance and survival of a population of bottlenose dolphins (Tursiops truncatus) in northeast Scotland. Novel features of the model include simultaneous estimation of time-varying survival and capture probability distributions, estimation of heterogeneity effects for survival and capture, use of separate data to inflate the number of identified animals to the total abundance, and integration of separate observations of the same animals from right and left side photographs. A Bayesian approach using Markov chain Monte Carlo methods allows for uncertainty in measurement and parameters, and simulations confirm the model’s validity.
Assessing the efficacy of single-pass backpack electrofishing to characterize fish assemblage structure
 Transactions of the American Fisheries Society
, 2003
Cited by 2 (1 self)
Abstract: Two-pass backpack electrofishing data collected as part of the U.S. Geological Survey’s National Water-Quality Assessment Program were analyzed to assess the efficacy of single-pass backpack electrofishing. A two-capture removal model was used to estimate, within 10 river basins across the United States, proportional fish species richness from one-pass electrofishing and probabilities of detection for individual fish species. Mean estimated species richness from first-pass sampling (p̂s1) ranged from 80.7% to 100% of estimated total species richness for each river basin, based on at least seven samples per basin. However, p̂s1 values for individual sites ranged from 40% to 100% of estimated total species richness. Additional species unique to the second pass were collected in 50.3% of the samples. Of these, cyprinids and centrarchids were collected most frequently. Proportional fish species richness estimated for the first pass increased significantly with decreasing stream width for 1 of the 10 river basins. When used to calculate probabilities of detection of individual fish species, the removal model failed 48% of the time because the number of individuals of a species was greater in the second pass than in the first pass. Single-pass backpack electrofishing data alone may make it difficult to determine whether characterized fish community structure data are real or spurious. The two-pass removal model can
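The failure mode mentioned in this abstract can be seen in a small sketch of a two-pass removal estimator. It assumes the common textbook form N = c1^2 / (c1 - c2) and p = (c1 - c2) / c1, where c1 and c2 are the catches on the first and second pass; the abstract does not give the exact formula the authors used, so this form and the function name are assumptions.

```python
def two_pass_removal(first_pass, second_pass):
    """Two-pass removal estimate of abundance and capture probability.

    Returns None when the second-pass catch is not smaller than the
    first-pass catch, because the estimator is undefined in that case.
    """
    if second_pass >= first_pass:
        return None
    depletion = first_pass - second_pass
    abundance = first_pass ** 2 / depletion
    capture_prob = depletion / first_pass
    return abundance, capture_prob

print(two_pass_removal(30, 10))  # catch declines between passes
print(two_pass_removal(5, 7))    # second pass larger: no estimate
```

Under this form, any species caught more often on the second pass than on the first yields no estimate, which matches the kind of failure the abstract reports.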