IDENTIFICATION OF GENETIC NETWORKS FROM A SMALL NUMBER OF GENE EXPRESSION PATTERNS UNDER THE BOOLEAN NETWORK MODEL
 PACIFIC SYMPOSIUM ON BIOCOMPUTING 4:17-28 (1999)
, 1999
"for inferring genetic network architectures from state transition tables which correspond to time series of gene expression patterns, using the Boolean network model. Their results of computational experiments suggested that a small number of state transition (INPUT/OUTPUT) pairs are sufficient"
Abstract

Cited by 181 (16 self)
... for inferring genetic network architectures from state transition tables which correspond to time series of gene expression patterns, using the Boolean network model. Their results of computational experiments suggested that a small number of state transition (INPUT/OUTPUT) pairs are sufficient to infer the original Boolean network correctly. This paper gives a mathematical proof for their observation. Precisely, this paper devises a much simpler algorithm for the same problem and proves that, if the indegree of each node (i.e., the number of input nodes to each node) is bounded by a constant, only O(log n) state transition pairs (from 2^n pairs) are necessary and sufficient to identify the original Boolean network of n nodes correctly with high probability. We conducted computational experiments to expose the constant factor hidden in the O(log n) notation. The computational results show that a Boolean network of size 100,000 can be identified by our algorithm from about 100 INPUT/OUTPUT pairs if the maximum indegree is bounded by 2. A further merit of our algorithm is that it is conceptually so simple that it can be extended to more realistic network models.
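The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading: for a node with indegree at most k, try every set of k candidate input nodes and every Boolean function on them, and keep those that agree with all observed INPUT/OUTPUT pairs. The names `consistent_rules`, `index` and `step`, and the 4-node toy network, are assumptions made for illustration and do not come from the paper.

```python
import itertools


def index(state, inputs):
    """Read the chosen input bits as a binary number (first input is the high bit)."""
    i = 0
    for node in inputs:
        i = 2 * i + state[node]
    return i


def consistent_rules(pairs, target, k=2):
    """List every (input_nodes, truth_table) that reproduces node `target`
    in all observed (state, next_state) pairs."""
    n = len(pairs[0][0])
    rules = []
    for inputs in itertools.combinations(range(n), k):
        for table in itertools.product((0, 1), repeat=2 ** k):
            if all(table[index(s, inputs)] == nxt[target] for s, nxt in pairs):
                rules.append((inputs, table))
    return rules


# Hypothetical toy network: node 0 <- x1 AND x2; the other nodes copy themselves.
def step(s):
    return (s[1] & s[2], s[1], s[2], s[3])


pairs = [(s, step(s)) for s in itertools.product((0, 1), repeat=4)]
print(consistent_rules(pairs, target=0))  # -> [((1, 2), (0, 0, 0, 1))]
```

Here all 16 transitions are supplied, so exactly one rule survives for node 0. With fewer pairs, several rules may remain consistent; the abstract's claim concerns how few random pairs are needed before only the true rule is left.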
Learning to resolve natural language ambiguities: A unified approach
 In Proceedings of the National Conference on Artificial Intelligence, 806-813
, 1998
"distinct semanticonceptsuch as interest rate and has interest in Math are conflated in ordinary text. We analyze a few of the commonly used statistics based The surrounding context word associations and syntactic patterns in this case are sufficient"
Abstract

Cited by 169 (84 self)
We analyze a few of the commonly used statistics based and machine learning algorithms for natural language disambiguation tasks and observe that they can be recast as learning linear separators in the feature space. Each of the methods makes a priori assumptions, which it employs, given the data, when searching for its hypothesis. Nevertheless, as we show, it searches a space that is as rich as the space of all linear separators. We use this to build an argument for a data driven approach which merely searches for a good linear separator in the feature space, without further assumptions ...
... distinct semantic concepts such as interest rate and has interest in Math are conflated in ordinary text. The surrounding context word associations and syntactic patterns in this case are sufficient to identify the correct form. Many of these are important standalone problems but even more important is their role in many applications including speech recognition, machine translation, information extraction and intelligent human-machine interaction. Most of the ambiguity resolution problems are at the lower level of the natural language inferences chain; a wide range and a large number of ambigui
Learning with Labeled and Unlabeled Data
, 2001
"In this paper, on the one hand, we aim to give a review on literature dealing with the problem of supervised learning aided by additional unlabeled data. On the other hand, being a part of the author's first year PhD report, the paper serves as a frame to bundle related work by the author as well as"
Abstract

Cited by 165 (3 self)
In this paper, on the one hand, we aim to give a review on literature dealing with the problem of supervised learning aided by additional unlabeled data. On the other hand, being a part of the author's first year PhD report, the paper serves as a frame to bundle related work by the author as well as numerous suggestions for potential future work. Therefore, this work contains more speculative and partly subjective material than the reader might expect from a literature review. We give a rigorous definition of the problem and relate it to supervised and unsupervised learning. The crucial role of prior knowledge is put forward, and we discuss the important notion of input-dependent regularization. We postulate a number of baseline methods, being algorithms or algorithmic schemes which can more or less straightforwardly be applied to the problem, without the need for genuinely new concepts. However, some of them might serve as a basis for a genuine method. In the literature revi...
Mapreduce for machine learning on multicore
 In Proceedings of NIPS
, 2007
"We are at the beginning of the multicore era. Computers will have increasingly many cores (processors), but there is still no good programming framework for these architectures, and thus no simple and unified way for machine learning to take advantage of the potential speed up. In this paper, we dev"
Abstract

Cited by 138 (7 self)
We are at the beginning of the multicore era. Computers will have increasingly many cores (processors), but there is still no good programming framework for these architectures, and thus no simple and unified way for machine learning to take advantage of the potential speed up. In this paper, we develop a broadly applicable parallel programming method, one that is easily applied to many different learning algorithms. Our work is in distinct contrast to the tradition in machine learning of designing (often ingenious) ways to speed up a single algorithm at a time. Specifically, we show that algorithms that fit the Statistical Query model [15] can be written in a certain “summation form,” which allows them to be easily parallelized on multicore computers. We adapt Google’s mapreduce [7] paradigm to demonstrate this parallel speed up technique on a variety of learning algorithms including locally weighted linear regression (LWLR), k-means, logistic regression
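As a rough illustration of what a "summation form" can look like, the sketch below fits a one-dimensional least-squares line from per-shard partial sums: each shard is mapped to its sufficient statistics and the results are reduced by addition. This is not the paper's code; the function names (`map_shard`, `combine`, `fit_line`) are hypothetical, and the map step runs sequentially here, although each shard could in principle be handled by a separate core.

```python
from functools import reduce


def map_shard(shard):
    """Map step: partial sums (n, sum x, sum y, sum x*x, sum x*y) for one shard."""
    n = len(shard)
    sx = sum(x for x, _ in shard)
    sy = sum(y for _, y in shard)
    sxx = sum(x * x for x, _ in shard)
    sxy = sum(x * y for x, y in shard)
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reduce step: partial sums from two shards simply add."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(shards):
    """Least-squares slope and intercept computed from the combined sums only."""
    n, sx, sy, sxx, sxy = reduce(combine, map(map_shard, shards))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical data on the line y = 2x + 1, split into three shards.
points = [(x, 2 * x + 1) for x in range(10)]
shards = [points[0:3], points[3:7], points[7:10]]
print(fit_line(shards))  # -> (2.0, 1.0)
```

Because the fit depends on the data only through sums, the result does not depend on how the points are split across shards.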
FloatBoost Learning and Statistical Face Detection
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2004
"A novel learning procedure, called FloatBoost, is proposed for learning a boosted classifier for achieving the minimum error rate. FloatBoost learning uses a backtrack mechanism after each iteration of AdaBoost learning to minimize the error rate directly, rather than minimizing an exponential fun"
Abstract

Cited by 125 (4 self)
A novel learning procedure, called FloatBoost, is proposed for learning a boosted classifier for achieving the minimum error rate. FloatBoost learning uses a backtrack mechanism after each iteration of AdaBoost learning to minimize the error rate directly, rather than minimizing an exponential function of the margin as in the traditional AdaBoost algorithms. A second contribution of the paper is a novel statistical model for learning best weak classifiers using a stagewise approximation of the posterior probability. These novel techniques lead to a classifier which requires fewer weak classifiers than AdaBoost yet achieves lower error rates in both training and testing, as demonstrated by extensive experiments. Applied to face detection, the FloatBoost learning method, together with a proposed detector pyramid architecture, leads to the first real-time multi-view face detection system reported.
An introduction to boosting and leveraging
 Advanced Lectures on Machine Learning, LNCS
, 2003
Exploiting Task Relatedness for Multiple Task Learning
, 2003
"The approach of learning of multiple "related" tasks simultaneously has proven quite successful in practice; however, theoretical justification for this success has remained elusive. The starting point of previous work on multiple task learning has been that the tasks to be learnt jointly are someho"
Abstract

Cited by 88 (1 self)
The approach of learning multiple "related" tasks simultaneously has proven quite successful in practice; however, theoretical justification for this success has remained elusive. The starting point of previous work on multiple task learning has been that the tasks to be learnt jointly are somehow "algorithmically related", in the sense that the results of applying a specific learning algorithm to these tasks are assumed to be similar. We take a logical step backwards and offer a data generating mechanism through which our notion of task-relatedness is defined.
Ontological Semantics
, 2004
"This book introduces ontological semantics, a comprehensive approach to the treatment of text meaning by computer. Ontological semantics is an integrated complex of theories, methodologies, descriptions and implementations. In ontological semantics, a theory is viewed as a set of statements determin"
Abstract

Cited by 85 (27 self)
This book introduces ontological semantics, a comprehensive approach to the treatment of text meaning by computer. Ontological semantics is an integrated complex of theories, methodologies, descriptions and implementations. In ontological semantics, a theory is viewed as a set of statements determining the format of descriptions of the phenomena with which the theory deals. A theory is associated with a methodology used to obtain the descriptions. Implementations are computer systems that use the descriptions to solve specific problems in text processing. Implementations of ontological semantics are combined with other processing systems to produce applications, such as information extraction or machine translation. The theory of ontological semantics is built as a society of microtheories covering such diverse ground as specific language phenomena, world knowledge organization, processing heuristics and issues relating to knowledge representation and implementation system architecture. The theory briefly sketched above is a top-level microtheory, the ontological semantics theory per se. Descriptions in ontological semantics include text meaning representations, lexical entries, ontological concepts and instances as well as procedures for manipulating texts and their meanings. Methodologies in ontological semantics are sets of techniques and instructions for acquiring and
Self Organization in Vision: Stochastic Clustering for Image Segmentation, Perceptual Grouping, and Image Database Organization
, 2001
"We present a stochastic clustering algorithm which uses pairwise similarity of elements, and show how it can be used to address various problems in computer vision, including the lowlevel image segmentation, midlevel perceptual grouping, and highlevel image database organization. The clustering p"
Abstract

Cited by 76 (4 self)
We present a stochastic clustering algorithm which uses pairwise similarity of elements, and show how it can be used to address various problems in computer vision, including the low-level image segmentation, mid-level perceptual grouping, and high-level image database organization. The clustering problem is viewed as a graph partitioning problem, where nodes represent data elements and the weights of the edges represent pairwise similarities. We generate samples of cuts in this graph, by using Karger's contraction algorithm, and compute an "average" cut which provides the basis for our solution to the clustering problem. The stochastic nature of our method makes it robust against noise, including accidental edges and small spurious clusters. The complexity of our algorithm is very low: O(E log² N) for N objects, E similarity relations and a fixed accuracy level. In addition, and without additional computational cost, our algorithm provides a hierarchy of nested partitions. We demonstrate the superiority of our method for image segmentation on a few synthetic and real images, B&W and color. Our other examples include the concatenation of edges in a cluttered scene (perceptual grouping), and the organization of an image database for the purpose of multi-view 3D object recognition.
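The contraction step mentioned in the abstract can be sketched as follows. This is only a single randomized contraction run on a weighted similarity graph, not the paper's full method (which averages over many sampled cuts); the function name `contract`, the union-find bookkeeping, and the toy graph are assumptions made for illustration.

```python
import random


def contract(n, edges, k, rng):
    """Contract random edges (chosen with probability proportional to weight)
    until k components remain; return one component label per node."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    live = list(edges)
    while components > k:
        # Keep only edges that still join two different components.
        live = [(u, v, w) for u, v, w in live if find(u) != find(v)]
        if not live:
            break  # graph already split into more than k disconnected parts
        u, v, _ = rng.choices(live, weights=[w for _, _, w in live])[0]
        parent[find(u)] = find(v)
        components -= 1
    return [find(x) for x in range(n)]


# Hypothetical graph: two tight triangles joined by one weak edge (2, 3).
edges = [(0, 1, 10), (1, 2, 10), (0, 2, 10),
         (3, 4, 10), (4, 5, 10), (3, 5, 10),
         (2, 3, 0.1)]
labels = contract(6, edges, k=2, rng=random.Random(0))
print(labels)
```

A single run may occasionally contract the weak edge; sampling many runs and counting how often each pair of nodes ends up together is one way to approach the "average" cut the abstract describes.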
Analysis of a greedy active learning strategy
, 2005
"We abstract out the core search problem of active learning schemes, to better understand the extent to which adaptive labeling can improve sample complexity. We give various upper and lower bounds on the number of labels which need to be queried, and we prove that a popular greedy active learning r"
Abstract

Cited by 76 (3 self)
We abstract out the core search problem of active learning schemes, to better understand the extent to which adaptive labeling can improve sample complexity. We give various upper and lower bounds on the number of labels which need to be queried, and we prove that a popular greedy active learning rule is approximately as good as any other strategy for minimizing this number of labels.
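To make the idea of a greedy query rule concrete, here is a minimal sketch under strong assumptions: a finite hypothesis class of one-dimensional thresholds, a uniform prior, and a noiseless labeling oracle. The rule shown (query the point that splits the surviving hypotheses most evenly) is one common form of greedy strategy and may differ in detail from the rule analyzed in the paper; all names are hypothetical.

```python
def predict(threshold, x):
    """Threshold hypothesis: label x positive iff x >= threshold."""
    return x >= threshold


def greedy_query(version_space, pool):
    """Pick the pool point whose label splits the surviving hypotheses most evenly."""
    def imbalance(x):
        positives = sum(1 for t in version_space if predict(t, x))
        return abs(2 * positives - len(version_space))
    return min(pool, key=imbalance)


def active_learn(hypotheses, pool, oracle):
    """Query labels greedily until one hypothesis (or no unlabeled point) remains."""
    version_space = list(hypotheses)
    remaining = list(pool)
    queries = 0
    while len(version_space) > 1 and remaining:
        x = greedy_query(version_space, remaining)
        remaining.remove(x)
        y = oracle(x)
        version_space = [t for t in version_space if predict(t, x) == y]
        queries += 1
    return version_space, queries


# Hypothetical setup: 9 candidate thresholds, 8 unlabeled points, true threshold 5.
survivors, used = active_learn(range(9), range(8), oracle=lambda x: x >= 5)
print(survivors, used)
```

On this toy class the greedy rule behaves like binary search, so it needs far fewer labels than querying every point in the pool.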