Results 1 - 8 of 8
Beyond Market Baskets: Generalizing Association Rules To Dependence Rules
, 1998
Cited by 489 (7 self)
One of the more well-studied problems in data mining is the search for association rules in market basket data. Association rules are intended to identify patterns of the type: “A customer purchasing item A often also purchases item B.” Motivated partly by the goal of generalizing beyond market basket data and partly by the goal of ironing out some problems in the definition of association rules, we develop the notion of dependence rules that identify statistical dependence in both the presence and absence of items in itemsets. We propose measuring significance of dependence via the chi-squared test for independence from classical statistics. This leads to a measure that is upward-closed in the itemset lattice, enabling us to reduce the mining problem to the search for a border between dependent and independent itemsets in the lattice. We develop pruning strategies based on the closure property and thereby devise an efficient algorithm for discovering dependence rules. We demonstrate our algorithm’s effectiveness by testing it on census data, text data (wherein we seek term dependence), and synthetic data.
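The chi-squared test for independence mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' algorithm (it does no lattice search or pruning); it only computes the statistic for one itemset, counting both presence and absence of each item. The function name `chi_squared` and the toy baskets are assumptions made for illustration.

```python
from itertools import product


def chi_squared(baskets, items):
    """Chi-squared statistic for independence of `items` over `baskets`.

    Each basket is a set of items. Each cell of the contingency table is
    one presence/absence pattern over `items`; the expected count of a cell
    assumes the items occur independently with their observed marginals.
    """
    n = len(baskets)
    # Marginal probability that each item is present in a basket.
    marginals = [sum(item in b for b in baskets) / n for item in items]
    stat = 0.0
    for pattern in product([True, False], repeat=len(items)):
        # Observed number of baskets matching this presence/absence pattern.
        observed = sum(
            all((item in b) == present for item, present in zip(items, pattern))
            for b in baskets
        )
        # Expected count under independence.
        expected = float(n)
        for prob, present in zip(marginals, pattern):
            expected *= prob if present else (1.0 - prob)
        if expected > 0:
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical toy data: "a" and "b" always co-occur, or occur independently.
dependent = [{"a", "b"}, {"a", "b"}, set(), set()]
independent = [{"a", "b"}, {"a"}, {"b"}, set()]
```

On the toy data, `chi_squared(dependent, ["a", "b"])` is 4.0 and `chi_squared(independent, ["a", "b"])` is 0.0. For a 2x2 table (one degree of freedom) the usual 95% cut-off is about 3.84, so only the first pair would be flagged as dependent at that level.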
Analysis of a very large AltaVista query log
, 1998
Cited by 164 (2 self)
In this paper we present an analysis of a 280 GB AltaVista Search Engine query log consisting of approximately 1 billion entries for search requests over a period of six weeks. This represents approximately 285 million user sessions, each an attempt to fill a single information need. We present an analysis of individual queries, query duplication, and query sessions. Furthermore we present results of a correlation analysis of the log entries, studying the interaction of terms within queries. Our data supports the conjecture that web users differ significantly from the user assumed in the standard information retrieval literature. Specifically, we show that web users type in short queries, mostly look at the first 10 results only, and seldom modify the query. This suggests that traditional information retrieval techniques might not work well for answering web search requests. The correlation analysis showed that the most highly correlated items are constituents of phrases. This result indicates it may be useful for search engines to consider search terms as parts of phrases even if the user did not explicitly specify them as such.
Milestones in the history of thematic cartography, statistical graphics, and data visualization
 13TH INTERNATIONAL CONFERENCE ON DATABASE AND EXPERT SYSTEMS APPLICATIONS (DEXA 2002), AIX EN PROVENCE
, 1995
and
, 2008
In this paper, we consider identifying codes in binary Hamming spaces $\mathbb{F}^n$, i.e., in binary hypercubes. The concept of identifying codes was introduced by Karpovsky, Chakrabarty and Levitin in 1998. Currently, the subject forms a topic of its own with several possible applications, for example, to sensor networks. Let $C \subseteq \mathbb{F}^n$. For any $X \subseteq \mathbb{F}^n$, denote by $I_r(X) = I_r(C; X)$ the set of elements of $C$ within distance $r$ from at least one $x \in X$. Now $C \subseteq \mathbb{F}^n$ is called an $(r, \le \ell)$-identifying code if the sets $I_r(X)$ are distinct for all $X \subseteq \mathbb{F}^n$ of size at most $\ell$. Let us denote by $M_r^{(\le \ell)}(n)$ the smallest possible cardinality of an $(r, \le \ell)$-identifying code. In [14], it is shown for $\ell = 1$ that $\lim_{n \to \infty} \frac{1}{n} \log_2 M_r^{(\le \ell)}(n) = 1 - h(\rho)$, where $r = \lfloor \rho n \rfloor$, $\rho \in [0, 1)$ and $h(x)$ is the binary entropy function. In this paper, we prove that this result holds for any fixed $\ell \ge 1$ when $\rho \in [0, 1/2)$. We also show that $M_r^{(\le \ell)}(n) = O(n^{3/2})$ for every fixed $\ell$ and $r$ slightly less than $n/2$, and give an explicit construction of small $(r, \le 2)$-identifying codes for $r = \lfloor n/2 \rfloor - 1$. Research supported by the Academy of Finland under grant 111940.
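The definition of an (r, ≤ ℓ)-identifying code given in this abstract can be checked by brute force on very small hypercubes. The sketch below is only an illustration of the definition, not the construction from the paper; the helper names `hypercube`, `identifying_set` and `is_identifying` are hypothetical. It reads "all X of size at most ℓ" literally, so the empty set is included and every non-empty X must have a non-empty identifying set.

```python
from itertools import combinations, product


def hypercube(n):
    """All binary words of length n, as tuples of 0/1."""
    return list(product((0, 1), repeat=n))


def hamming(u, v):
    """Hamming distance between two words of equal length."""
    return sum(a != b for a, b in zip(u, v))


def identifying_set(code, X, r):
    """I_r(C; X): codewords within distance r of at least one x in X."""
    return frozenset(c for c in code if any(hamming(c, x) <= r for x in X))


def is_identifying(code, n, r, ell):
    """Brute-force test that `code` is an (r, <= ell)-identifying code.

    Enumerates every subset X of the hypercube with |X| <= ell and checks
    that the sets I_r(C; X) are pairwise distinct. Exponential in n, so
    only suitable for tiny examples.
    """
    space = hypercube(n)
    seen = set()
    for size in range(ell + 1):
        for X in combinations(space, size):
            s = identifying_set(code, X, r)
            if s in seen:
                return False  # two different sets X share the same I_r(X)
            seen.add(s)
    return True
```

For example, with n = 2 and r = 1 the whole space is a (1, ≤ 1)-identifying code, because the four radius-1 balls are distinct and non-empty, whereas the single codeword `(0, 0)` is not, since it cannot separate the words `(0, 0)` and `(0, 1)`.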
Maty’s Biography of Abraham De Moivre, Translated, Annotated and Augmented
, 708
November 27, 2004, marked the 250th anniversary of the death of Abraham De Moivre, best known in statistical circles for his famous large-sample approximation to the binomial distribution, whose generalization is now referred to as the Central Limit Theorem. De Moivre was one of the great pioneers of classical probability theory. He also made seminal contributions in analytic geometry, complex analysis and the theory of annuities. The first biography of De Moivre, on which almost all subsequent ones have since relied, was written in French by Matthew Maty. It was published in 1755 in the Journal britannique. The authors provide here, for the first time, a complete translation into English of Maty’s biography of De Moivre. New material, much of it taken from modern sources, is given in footnotes, along with numerous annotations designed to provide additional clarity to Maty’s biography for contemporary readers.