Extracting Relations from Large Plain-Text Collections
, 2000
Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use for answering precise queries or for running data mining tasks. We explore a technique for extracting such tables from document collections that requires only a handful of training examples from users. These examples are used to generate extraction patterns, which in turn result in new tuples being extracted from the document collection. We build on this idea and present our Snowball system. Snowball introduces novel strategies for generating patterns and extracting tuples from plain-text documents. At each iteration of the extraction process, Snowball evaluates the quality of these patterns and tuples without human intervention, and keeps only the most reliable ones for the next iteration. In this paper we also develop a scalable evaluation methodology and metrics for our task, and present a t...
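The iterate-and-filter loop described in this abstract can be sketched roughly as follows. This is a toy illustration only, not the Snowball implementation: the function name bootstrap, the capitalized-word ENTITY regex (a stand-in for a named-entity tagger), the exact-string patterns, the min_conf threshold, and the sample sentences are all assumptions made for this example. Only patterns are scored here; the abstract says the system evaluates tuples as well.

```python
import re

ENTITY = r"([A-Z]\w+)"  # assumption: a capitalized word stands in for a tagged entity


def bootstrap(sentences, seeds, iterations=2, min_conf=0.6):
    """Grow a set of (organization, location) tuples from a few seed tuples."""
    tuples = set(seeds)
    for _ in range(iterations):
        known = dict(tuples)  # organization -> location accepted so far
        # Step 1: the text between a known pair becomes a candidate pattern.
        patterns = set()
        for sent in sentences:
            for org, loc in tuples:
                m = re.search(re.escape(org) + r"(.+?)" + re.escape(loc), sent)
                if m:
                    patterns.add(m.group(1))
        # Step 2: score each pattern against the tuples already accepted.
        for pat in patterns:
            regex = re.compile(ENTITY + re.escape(pat) + ENTITY)
            matches = [m.groups() for s in sentences for m in regex.finditer(s)]
            pos = sum(1 for org, loc in matches if known.get(org) == loc)
            neg = sum(1 for org, loc in matches
                      if org in known and known[org] != loc)
            conf = pos / (pos + neg) if pos + neg else 0.0
            # Step 3: only reliable patterns contribute tuples for new organizations.
            if conf >= min_conf:
                tuples.update(t for t in matches if t[0] not in known)
    return tuples


# Made-up sentences for illustration.
sentences = [
    "Microsoft is based in Redmond.",
    "Exxon is based in Irving.",
    "Microsoft met analysts in Redmond.",
    "Microsoft met analysts in Boston.",
    "Apple met analysts in Chicago.",
    "Exxon has headquarters in Irving.",
    "Nike has headquarters in Beaverton.",
]
result = bootstrap(sentences, {("Microsoft", "Redmond")})
```

In this toy run the pattern " met analysts in " matches one known-correct and one contradicting tuple, so its confidence of 0.5 falls below the threshold and it is dropped. The tuple for Exxon learned in the first iteration yields the pattern " has headquarters in " in the second, which is the snowballing effect.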
Graphical models, exponential families, and variational inference
, 2008
The formalism of probabilistic graphical models provides a unifying framework for capturing complex dependencies among random variables, and building large-scale multivariate statistical models. Graphical models have become a focus of research in many statistical, computational and mathematical fields ...
On the Construction of Energy-Efficient Broadcast and Multicast Trees in Wireless Networks
, 2000
Predicting How People Play Games: Reinforcement Learning . . .
 AMERICAN ECONOMIC REVIEW
, 1998
Logical foundations of object-oriented and frame-based languages
 JOURNAL OF THE ACM
, 1995
We propose a novel formalism, called Frame Logic (abbr., F-logic), that accounts in a clean and declarative fashion for most of the structural aspects of object-oriented and frame-based languages. These features include object identity, complex objects, inheritance, polymorphic types, query methods, encapsulation, and others. In a sense, F-logic stands in the same relationship to the object-oriented paradigm as classical predicate calculus stands to relational programming. F-logic has a model-theoretic semantics and a sound and complete resolution-based proof theory. A small number of fundamental concepts that come from object-oriented programming have direct representation in F-logic; other, secondary aspects of this paradigm are easily modeled as well. The paper also discusses semantic issues pertaining to programming with a deductive object-oriented language based on a subset of F-logic.
ROCK: A Robust Clustering Algorithm for Categorical Attributes
 In Proc. of the 15th Int. Conf. on Data Engineering
, 2000
Clustering, in data mining, is useful to discover distribution patterns in the underlying data. Clustering algorithms usually employ a distance-metric-based (e.g., Euclidean) similarity measure in order to partition the database such that data points in the same partition are more similar than points ...
Non-projective dependency parsing using spanning tree algorithms
 In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing
, 2005
We formalize weighted dependency parsing as searching for maximum spanning trees (MSTs) in directed graphs. Using this representation, the parsing algorithm of Eisner (1996) is sufficient for searching over all projective trees in O(n^3) time. More surprisingly, the representation is extended naturally to non-projective parsing using the Chu-Liu-Edmonds (Chu and Liu, 1965; Edmonds, 1967) MST algorithm, yielding an O(n^2) parsing algorithm. We evaluate these methods on the Prague Dependency Treebank using online large-margin learning techniques (Crammer et al., 2003; McDonald et al., 2005) and show ...
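To make the reduction in this abstract concrete, here is a minimal recursive sketch of Chu-Liu-Edmonds over a table of arc scores. It is a straightforward contraction-based version, not the O(n^2) implementation the abstract refers to and not the authors' code. The names find_cycle and max_arborescence, the (head, dependent) score dictionary, and the example scores are assumptions for illustration. Node 0 plays the artificial root, and every other node is assumed to have at least one incoming arc.

```python
def find_cycle(heads):
    """Return the nodes of one cycle in a dependent -> head map, or None."""
    for start in heads:
        path, node = [], start
        while node in heads and node not in path:
            path.append(node)
            node = heads[node]
        if node in path:
            return path[path.index(node):]
    return None


def max_arborescence(nodes, scores, root=0):
    """scores maps (head, dependent) -> weight; returns dependent -> head."""
    # Greedy step: every non-root node picks its best-scoring incoming arc.
    heads = {}
    for d in nodes:
        if d != root:
            heads[d] = max((w, h) for (h, dd), w in scores.items()
                           if dd == d and h != d)[1]
    cycle = find_cycle(heads)
    if cycle is None:
        return heads
    # Contract the cycle into a fresh node c and rescore arcs that touch it.
    cyc, c = set(cycle), max(nodes) + 1
    new_scores, origin = {}, {}
    for (h, d), w in scores.items():
        if h in cyc and d in cyc:
            continue
        if d in cyc:
            # Entering the cycle at d replaces the cycle arc into d.
            key, val = (h, c), w - scores[heads[d], d]
        elif h in cyc:
            key, val = (c, d), w
        else:
            key, val = (h, d), w
        if key not in new_scores or val > new_scores[key]:
            new_scores[key], origin[key] = val, (h, d)
    rest = [n for n in nodes if n not in cyc] + [c]
    sub = max_arborescence(rest, new_scores, root)
    # Expand: map contracted arcs back, then keep all but one cycle arc.
    result = {}
    for d, h in sub.items():
        orig_head, orig_dep = origin[(h, d)]
        result[orig_dep] = orig_head
    for v in cycle:
        result.setdefault(v, heads[v])
    return result


# Made-up scores for a three-word sentence: 0 = root, 1 = John, 2 = saw, 3 = Mary.
example_scores = {
    (0, 1): 9, (0, 2): 10, (0, 3): 9,
    (1, 2): 20, (2, 1): 30, (2, 3): 30,
    (3, 2): 0, (1, 3): 3, (3, 1): 11,
}
tree = max_arborescence([0, 1, 2, 3], example_scores)
# tree == {1: 2, 2: 0, 3: 2}
```

With these scores the greedy step first creates the cycle John <-> saw. Contracting it and rescoring the entering arcs lets the root attach to "saw", giving a tree with total score 70.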
Bullet: High Bandwidth Data Dissemination Using an Overlay Mesh
, 2003
In recent years, overlay networks have become an effective alternative to IP multicast for efficient point-to-multipoint communication across the Internet. Typically, nodes self-organize with the goal of forming an efficient overlay tree, one that meets performance targets without placing undue burden on the underlying network. In this paper, we target high-bandwidth data distribution from a single source to a large number of receivers. Applications include large-file transfers and real-time multimedia streaming. For these applications, we argue that an overlay mesh, rather than a tree, can ...
Polynomial time approximation schemes for Euclidean TSP and other geometric problems
 In Proceedings of the 37th IEEE Symposium on Foundations of Computer Science (FOCS’96)
, 1996
We present a polynomial time approximation scheme for Euclidean TSP in fixed dimensions. For every fixed c > 1 and given any n nodes in R^2, a randomized version of the scheme finds a (1 + 1/c)-approximation to the optimum traveling salesman tour in O(n (log n)^O(c)) time. When the nodes a ... to Christofides) achieves a 3/2-approximation in polynomial time. We also give similar approximation schemes for some other NP-hard Euclidean problems: Minimum Steiner Tree, k-TSP, and k-MST. (The running times of the algorithm for k-TSP and k-MST involve an additional multiplicative factor k.) The previous best ...
