Results 1–10 of 161
From frequency to meaning: Vector space models of semantics
Journal of Artificial Intelligence Research, 2010
"... Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are begi ..."
Abstract

Cited by 116 (2 self)
 Add to MetaCart
Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term–document, word–context, and pair–pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
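As a toy illustration of the first matrix class this survey names, a term–document matrix with cosine similarity between document columns can be built in a few lines. The corpus, vocabulary, and variable names below are invented for illustration; this is a minimal sketch, not the survey's reference implementation.

```python
import numpy as np

# Hypothetical three-document corpus (illustrative only).
docs = [
    "cats chase mice",
    "dogs chase cats",
    "stocks rise fast",
]
vocab = sorted({w for d in docs for w in d.split()})

# Term-document matrix: X[i, j] = count of term i in document j.
X = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        X[vocab.index(w), j] += 1

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Document similarity = cosine between column vectors.
sim_01 = cosine(X[:, 0], X[:, 1])  # two animal sentences share terms
sim_02 = cosine(X[:, 0], X[:, 2])  # animal vs. finance: no shared terms
```

On this toy corpus the two animal documents score higher than the animal/finance pair, which is the basic effect term–document VSMs exploit.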
Conventional Wisdom on Measurement: A Structural Equation Perspective
Psychological Bulletin, 1991
"... The applicability of 5 conventional guidelines for construct measurement is critically examined: (a) Construct indicators should be internally consistent for valid measures, (b) there are optimal magnitudes of correlations between items, (c) the validity of measures depends on the adequacy with whic ..."
Abstract

Cited by 73 (0 self)
 Add to MetaCart
The applicability of 5 conventional guidelines for construct measurement is critically examined: (a) Construct indicators should be internally consistent for valid measures, (b) there are optimal magnitudes of correlations between items, (c) the validity of measures depends on the adequacy with which a specified domain is sampled, (d) within-construct correlations must be greater than between-construct correlations, and (e) linear composites of indicators can replace latent variables. A structural equation perspective is used, showing that without an explicit measurement model relating indicators to latent variables and measurement errors, none of these conventional beliefs hold without qualifications. Moreover, a “causal” indicator model is presented that sometimes better corresponds to the relation of indicators to a construct than does the classical test theory “effect” indicator model. Factor analysis (Spearman, 1904) and classical test theory (Lord & Novick, 1968; Spearman, 1910) have influenced perspectives on measurement not only in psychology but in most of the social sciences. These traditions have given rise to criteria to select “good” measures and to a number of beliefs about the …
Combining Content and Link for Classification using Matrix Factorization
2007
"... The world wide web contains rich textual contents that are interconnected via complex hyperlinks. This huge database violates the assumption held by most of conventional statistical methods that each web page is considered as an independent and identical sample. It is thus difficult to apply traditi ..."
Abstract

Cited by 42 (8 self)
 Add to MetaCart
The world wide web contains rich textual content interconnected via complex hyperlinks. This huge database violates the assumption, held by most conventional statistical methods, that each web page is an independent and identically distributed sample. It is thus difficult to apply traditional mining or learning methods to web mining problems, e.g., web page classification, while exploiting both the content and the link structure. Research in this direction has recently received considerable attention but is still at an early stage. Of the few methods that exploit both the link structure and the content information, some combine only the authority information with the content, while others first decompose the link structure into hub and authority features and then apply these as additional document features. This paper aims to design an algorithm, practically attractive for its great simplicity, that exploits both the content and linkage information by carrying out a joint factorization of the linkage adjacency matrix and the document-term matrix, deriving a new representation for web pages in a low-dimensional factor space without explicitly separating content, hub, or authority factors. Further analysis can be performed on this compact representation of web pages. In the experiments, the proposed method is compared with state-of-the-art methods and demonstrates excellent accuracy in hypertext classification on the WebKB and Cora benchmarks.
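One way to read "joint factorization of the adjacency matrix and the document-term matrix" is as minimizing ||A − UUᵀ||² + α||C − UVᵀ||² over a shared page-factor matrix U. The sketch below uses that reading with plain gradient descent; the sizes, loss weights, and optimizer are my assumptions, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny web graph: A is a symmetric link adjacency matrix,
# C a document-term matrix. Sizes and data are illustrative only.
n, m, k = 6, 8, 2
A = rng.random((n, n)); A = (A + A.T) / 2
C = rng.random((n, m))
alpha = 1.0                       # assumed weight on the content term

U = 0.1 * rng.random((n, k))      # shared low-dimensional page factors
V = 0.1 * rng.random((m, k))      # term factors

def loss(U, V):
    return (np.linalg.norm(A - U @ U.T) ** 2
            + alpha * np.linalg.norm(C - U @ V.T) ** 2)

losses = [loss(U, V)]
lr = 0.01
for _ in range(200):
    RA = A - U @ U.T              # link residual (symmetric)
    RC = C - U @ V.T              # content residual
    U += lr * (4 * RA @ U + 2 * alpha * RC @ V)   # negative-gradient step
    V += lr * (2 * alpha * RC.T @ U)
    losses.append(loss(U, V))
```

After the loop, the rows of `U` serve as the compact web-page representation on which classification or other analysis could be run.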
The Economics and Psychology of Personality Traits
"... Any opinions expressed here are those of the author(s) and not those of IZA. Research published in this series may include views on policy, but the institute itself takes no institutional policy positions. The Institute for the Study of Labor (IZA) in Bonn is a local and virtual international resear ..."
Abstract

Cited by 37 (10 self)
 Add to MetaCart
Any opinions expressed here are those of the author(s) and not those of IZA. Research published in this series may include views on policy, but the institute itself takes no institutional policy positions. The Institute for the Study of Labor (IZA) in Bonn is a local and virtual international research center and a place of communication between science, politics and business. IZA is an independent nonprofit organization supported by Deutsche Post World Net. The center is associated with the University of Bonn and offers a stimulating research environment through its international network, workshops and conferences, data service, project support, research visits and doctoral program. IZA engages in (i) original and internationally competitive research in all fields of labor economics, (ii) development of policy concepts, and (iii) dissemination of research results and concepts to the interested public. IZA Discussion Papers often represent preliminary work and are circulated to encourage discussion. Citation of such a paper should account for its provisional character. A revised version may be available directly from the author. IZA Discussion Paper No. 3333
Weak and Strong Cross Section Dependence and Estimation of Large Panels
2009
"... This paper introduces the concepts of timespecific weak and strong cross section dependence. A doubleindexed process is said to be cross sectionally weakly dependent at a given point in time, t, if its weighted average along the cross section dimension (N) converges to its expectation in quadratic ..."
Abstract

Cited by 36 (18 self)
 Add to MetaCart
This paper introduces the concepts of time-specific weak and strong cross section dependence. A double-indexed process is said to be cross sectionally weakly dependent at a given point in time, t, if its weighted average along the cross section dimension (N) converges to its expectation in quadratic mean as N increases without bound, for all weights that satisfy certain ‘granularity’ conditions. The relationship with the notions of weak and strong common factors is investigated, and an application to the estimation of panel data models with an infinite number of weak factors and a finite number of strong factors is also considered. The paper concludes with a set of Monte Carlo experiments in which the small sample properties of estimators based on principal components and CCE estimators are investigated and compared under various assumptions on the nature of the unobserved common effects.
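The weak-dependence condition described in this abstract can be written out explicitly. The notation below ($z_{it}$ for the double-indexed process, weights $w_i$) is mine, and the granularity conditions are stated as I understand them from the related panel literature, so treat this as a sketch rather than the paper's exact statement.

```latex
% Weighted cross-section average at time t:
\bar{z}_{wt} \;=\; \sum_{i=1}^{N} w_i \, z_{it}.
% Weak cross-section dependence at t: for every weight vector satisfying
% the granularity conditions below,
\lim_{N \to \infty}
\operatorname{E}\!\left[\bigl(\bar{z}_{wt} - \operatorname{E}[\bar{z}_{wt}]\bigr)^{2}\right] \;=\; 0,
% with granularity conditions (no single unit dominates):
\lVert w \rVert = O\!\bigl(N^{-1/2}\bigr), \qquad
\frac{w_i}{\lVert w \rVert} = O\!\bigl(N^{-1/2}\bigr) \ \text{for all } i.
```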
Beyond the Turing Test
 J. Logic, Language & Information
"... Abstract. We define the main factor of intelligence as the ability to comprehend, formalising this ability with the help of new constructs based on descriptional complexity. The result is a comprehension test, or Ctest, exclusively defined in terms of universal descriptional machines (e.g universal ..."
Abstract

Cited by 33 (18 self)
 Add to MetaCart
We define the main factor of intelligence as the ability to comprehend, formalising this ability with the help of new constructs based on descriptional complexity. The result is a comprehension test, or C-test, exclusively defined in terms of universal descriptional machines (e.g. universal Turing machines). Despite the absolute and non-anthropomorphic character of the test, it is equally applicable to both humans and machines. Moreover, it correlates with classical psychometric tests, thus establishing the first firm connection between information-theoretic notions and traditional IQ tests. The Turing Test is compared with the C-test and their combination is discussed. As a result, the idea of the Turing Test as a practical test of intelligence should be left behind and substituted by computational and factorial tests of different cognitive abilities, a much more useful approach for artificial intelligence progress and for many other intriguing questions presented beyond the Turing Test.
Geometric Methods for Feature Extraction and Dimensional Reduction
In L. Rokach and O. Maimon (Eds.), Data, 2005
"... Abstract We give a tutorial overview of several geometric methods for feature extraction and dimensional reduction. We divide the methods into projective methods and methods that model the manifold on which the data lies. For projective methods, we review projection pursuit, principal component anal ..."
Abstract

Cited by 31 (1 self)
 Add to MetaCart
We give a tutorial overview of several geometric methods for feature extraction and dimensional reduction. We divide the methods into projective methods and methods that model the manifold on which the data lies. For projective methods, we review projection pursuit, principal component analysis (PCA), kernel PCA, probabilistic PCA, and oriented PCA; for the manifold methods, we review multidimensional scaling (MDS), landmark MDS, Isomap, locally linear embedding, Laplacian eigenmaps, and spectral clustering. The Nyström method, which links several of the algorithms, is also reviewed. The goal is to provide a self-contained review of the concepts and mathematics underlying these algorithms.
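The simplest projective method in this list, PCA, can be sketched in a few lines via the SVD of the centered data matrix. The synthetic data, dimension choices, and variable names below are illustrative assumptions, not drawn from the chapter.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 100 points in R^5 lying near a 2-D subspace.
Z = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 5))
X = Z + 0.01 * rng.normal(size=(100, 5))   # small isotropic noise

# PCA: center the data, then project onto the top-d right singular
# vectors (the principal directions).
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
d = 2
proj = Xc @ Vt[:d].T        # d-dimensional embedding of each point
recon = proj @ Vt[:d]       # back-projection into R^5

# Fraction of total variance captured by the top d components.
explained = (s[:d] ** 2).sum() / (s ** 2).sum()
```

Because the data were generated near a 2-D subspace, the top two components recover nearly all of the variance, which is the situation in which a projective method suffices and manifold modelling is unnecessary.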
A Formal Definition of Intelligence Based on an Intensional Variant of Algorithmic Complexity
In Proceedings of the International Symposium of Engineering of Intelligent Systems (EIS'98), 1998
"... Machine Due to the current technology of the computers we can use, we have chosen an extremely abridged emulation of the machine that will effectively run the programs, instead of more proper languages, like lcalculus (or LISP). We have adapted the "toy RISC" machine of [Hernndez & Hernndez 1993] ..."
Abstract

Cited by 30 (17 self)
 Add to MetaCart
Due to the current technology of the computers we can use, we have chosen an extremely abridged emulation of the machine that will effectively run the programs, instead of more proper languages like λ-calculus (or LISP). We have adapted the "toy RISC" machine of [Hernández & Hernández 1993], with two remarkable features inherited from its object-oriented coding in C++: it is easily tunable to our needs, and it is efficient. We have made it even more reduced, removing any operand from the instruction set, even for the loop operations. There are only three registers: AX (the accumulator), BX, and CX. The operations Q_b used for our experiment are listed in Table 1; for example, LOOPTOP decrements CX and, if it is not equal to the first element, jumps to the top of the program.
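A minimal interpreter for an operand-free three-register machine in the spirit of the one described can be sketched as follows. Only LOOPTOP's behaviour (decrement CX, conditionally jump to the program top) comes from the text; the other instruction names and the reading of the jump condition as "CX not yet zero" are my assumptions.

```python
def run(program, cx, max_steps=1000):
    """Run a list of operand-free instructions on registers AX, BX, CX."""
    regs = {"AX": 0, "BX": 0, "CX": cx}
    pc = steps = 0
    while pc < len(program) and steps < max_steps:
        op = program[pc]
        if op == "INC_AX":          # invented: AX += 1
            regs["AX"] += 1
        elif op == "ADD_BX":        # invented operand-free add: AX += BX
            regs["AX"] += regs["BX"]
        elif op == "LOOPTOP":       # from the text: decrement CX and jump
            regs["CX"] -= 1         # to the program top (here: unless CX
            if regs["CX"] != 0:     # has reached zero -- an assumption)
                pc = -1             # advanced back to 0 below
        pc += 1
        steps += 1
    return regs

# CX acts as a loop counter: AX is incremented once per pass.
result = run(["INC_AX", "LOOPTOP"], cx=5)
```

With `cx=5` the body executes five times before LOOPTOP falls through, leaving AX at 5 and CX at 0; the `max_steps` guard is there because random programs on such a machine can loop forever.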
Algebraic factor analysis: tetrads, pentads and beyond
"... Factor analysis refers to a statistical model in which observed variables are conditionally independent given fewer hidden variables, known as factors, and all the random variables follow a multivariate normal distribution. The parameter space of a factor analysis model is a subset of the cone of po ..."
Abstract

Cited by 28 (12 self)
 Add to MetaCart
Factor analysis refers to a statistical model in which observed variables are conditionally independent given fewer hidden variables, known as factors, and all the random variables follow a multivariate normal distribution. The parameter space of a factor analysis model is a subset of the cone of positive definite matrices. This parameter space is studied from the perspective of computational algebraic geometry. Gröbner bases and resultants are applied to compute the ideal of all polynomial functions that vanish on the parameter space. These polynomials, known as model invariants, arise from rank conditions on a symmetric matrix under elimination of the diagonal entries of the matrix. Besides revealing the geometry of the factor analysis model, the model invariants also furnish useful statistics for testing goodness-of-fit.
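The classical example of such a model invariant, alluded to in the title, is the tetrad. The derivation below assumes a one-factor model with uncorrelated unique errors, which is the standard setting for Spearman's tetrad differences; it is included as context, not as a result of this particular paper.

```latex
% One-factor model with loadings \lambda_i and factor f: off-diagonal
% covariances factor as
\sigma_{ij} \;=\; \lambda_i \lambda_j \operatorname{Var}(f), \qquad i \neq j,
% so any four indicators satisfy the tetrad constraints, e.g.
\sigma_{12}\sigma_{34} - \sigma_{13}\sigma_{24} \;=\; 0,
\qquad
\sigma_{12}\sigma_{34} - \sigma_{14}\sigma_{23} \;=\; 0,
% since each product equals \lambda_1\lambda_2\lambda_3\lambda_4
% \operatorname{Var}(f)^2. These are rank-one conditions on the
% off-diagonal part of the covariance matrix.
```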
Investigating the querying and browsing behavior of advanced search engine users
In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2007
"... One way to help all users of commercial Web search engines be more successful in their searches is to better understand what those users with greater search expertise are doing, and use this knowledge to benefit everyone. In this paper we study the interaction logs of advanced search engine users (a ..."
Abstract

Cited by 28 (5 self)
 Add to MetaCart
One way to help all users of commercial Web search engines be more successful in their searches is to better understand what those users with greater search expertise are doing, and use this knowledge to benefit everyone. In this paper we study the interaction logs of advanced search engine users (and those not so advanced) to better understand how these user groups search. The results show that there are marked differences in the queries, result clicks, post-query browsing, and search success of users we classify as advanced (based on their use of query operators), relative to those classified as non-advanced. Our findings have implications for how advanced users should be supported during their searches, and how their interactions could be used to help searchers of all experience levels find more relevant information and learn improved searching strategies. Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: query formulation, search process, relevance feedback.