Updatable PAT-Tree approach to Chinese key phrase extraction using mutual information: a linguistic foundation for knowledge management (1999)

by T Ong, H Chen
Venue: Proceedings of the Second Asian Digital Library Conference

Results 1 - 10 of 10

A graph-based recommender system for digital library

by Zan Huang, Wingyan Chung, Thian-huat Ong, Hsinchun Chen - In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002
Cited by 18 (4 self)
Research shows that recommendations comprise a valuable service for users of a digital library [11]. While most existing recommender systems rely either on a content-based approach or a collaborative approach to make recommendations, there is potential to improve recommendation quality by using a combination of both approaches (a hybrid approach). In this paper, we report how we tested the idea of using a graph-based recommender system that naturally combines the content-based and collaborative approaches. Due to the similarity between our problem and a concept retrieval task, a Hopfield net algorithm was used to exploit high-degree book-book, user-user, and book-user associations. Sample hold-out testing and preliminary subject testing were conducted to evaluate the system. The results showed that combining the content-based and collaborative approaches improved both precision and recall. However, no significant improvement was observed from exploiting high-degree associations.

A Graph Model for E-Commerce Recommender Systems

by Zan Huang, Wingyan Chung, Hsinchun Chen - Journal of the American Society for Information Science and Technology, 2004
Cited by 17 (5 self)
In this article, we review previous research in recommender systems to identify frequently used approaches and representations. Four recommendation approaches were examined: knowledge engineering, collaborative filtering, a content-based approach, and a hybrid approach. Different recommendation approaches can be implemented using different analytical methods. Commonly used methods are neighborhood formation, association rule mining, machine learning techniques, etc.

Web Mining: Machine Learning for Web Applications

by Hsinchun Chen, Michael Chau - Annual Review of Information Science and Technology, 2004
Cited by 9 (7 self)
With more than two billion pages created by millions of Web page authors and organizations, the World Wide Web is a tremendously rich

Internet Searching and Browsing in a Multilingual World: An Experiment on the Chinese Business Intelligence Portal (CBizPort)

by Wingyan Chung, Yiwen Zhang, Zan Huang, Gang Wang, Thian-Huat Ong, Hsinchun Chen, 2004
Cited by 7 (3 self)
In this paper, we propose a generic and integrated approach to searching and browsing the Internet in a multilingual world. Based on this approach, we have developed the Chinese Business Intelligence Portal (CBizPort), a meta-search engine that searches for business information of mainland China, Taiwan, and Hong Kong. Additional functions provided by CBizPort include encoding conversion (between Simplified Chinese and Traditional Chinese), summarization, and categorization. Experimental results of our user evaluation study show that the searching and browsing performance of CBizPort was comparable to that of regional Chinese search engines, and that CBizPort could significantly augment these search engines. Subjects' verbal comments indicate that CBizPort performed best in terms of analysis functions, cross-regional searching, and user-friendliness, whereas regional search engines were more efficient and more popular. Subjects especially liked CBizPort's summarizer and categorizer, which helped in understanding search results. These encouraging results suggest a promising future for our approach to Internet searching and browsing in a multilingual world.

Improving Entropy Estimation and the Inference of Genetic Regulatory Networks

by Jean Hausser (Dépt Biosciences, Bâtiment Louis Pasteur, Avenue Jean Capelle, F- Villeurbanne Cedex), 2006
Cited by 2 (0 self)
This paper explores how entropy and other information-theoretic quantities may be used to reverse-engineer genetic regulatory networks from repeated microarray data. The problem of differentiating genes that undergo direct coregulation from genes whose expression is similar because they belong to the same regulatory pathway is studied from a graphical modeling viewpoint. This leads to the criterion of conditional independence, which can be evaluated by computing the conditional mutual information. The latter is completely characterized by the sum of the entropies of joint variables, underlining the need for an entropy estimator that is accurate even in low sampling conditions. We introduce a new plug-in entropy estimator obtained from shrinking maximum likelihood multinomial proportion estimates to the maximum entropy target. We derive the closely related ZIPshrink and ZINBshrink entropy estimators, which enhance the shrinkage estimator by first adjusting the shrinkage target depending on the fraction of structural zeros in the multinomial model. The fraction of structural zeros is estimated using a Zero-Inflated Poisson or Zero-Inflated Negative Binomial distribution to model the histogram of bin counts. We compare these three new estimators to state-of-the-art estimators. We show that they give acceptable

Supporting Multilingual Information Retrieval in Web Applications: An English-Chinese Web Portal Experiment

by Jialun Qin, Yilu Zhou, Michael Chau, Hsinchun Chen - In Proceedings of the International Conference on Asian Digital Libraries (ICADL 2003), Kuala Lumpur, 2003
Cited by 1 (1 self)
Cross-language information retrieval (CLIR) and multilingual information retrieval (MLIR) techniques have been widely studied, but they are not often applied to and evaluated for Web applications. In this paper, we present our research in developing and evaluating a multilingual English-Chinese Web portal in the business domain. A dictionary-based approach has been adopted that combines phrasal translation, co-occurrence analysis, and pre- and post-translation query expansion. The approach was evaluated by domain experts and the results showed that co-occurrence-based phrasal translation achieved a 74.6% improvement in precision when compared with simple word-by-word translation.

Center for Language

by Luo Zhiyong
Unknown word recognition is an important problem in Chinese word segmentation systems. In this paper, we propose an integrated method for Chinese unknown word extraction for offline corpus processing, in which both context entropy (on each side) and frequency ratio against a background corpus are introduced to evaluate the candidate words. Both measures are computed efficiently on a suffix array with much less space overhead. Our method can also be reinforced when combined with a basic segmenter through boundary verification, and it can extract words of arbitrary n-gram length. We test our method on the Chinese novel Xiao Ao Jiang Hu and obtain satisfactory results compared to traditional criteria such as the likelihood ratio.

Chinese Word Segmentation for Terrorism-Related Contents

by Daniel Zeng, Donghua Wei, Michael Chau, Feiyue Wang
Abstract not found

Domain-specific Chinese word segmentation using suffix tree and mutual information (DOI 10.1007/s10796-010-9278-5)

by Daniel Zeng, Donghua Wei, Michael Chau, Feiyue Wang, 2010
As the amount of online Chinese content grows, there is a critical need for effective Chinese word segmentation approaches to facilitate Web computing applications in a range of domains including terrorism informatics. Most existing Chinese word segmentation approaches are either statistics-based or dictionary-based. The pure statistical method has lower precision, while the pure dictionary-based method cannot deal with new words beyond the dictionary. In this paper, we propose a hybrid method that is able to avoid the limitations of both types of approaches. Through the use of a suffix tree and mutual information (MI) with the dictionary, our segmenter, called IASeg, achieves high accuracy in word segmentation when domain training is available. It can also identify new words through MI-based token merging and dictionary updating. In addition, with the proposed Improved Bigram method, IASeg can process N-grams. To evaluate the

SpidersRUs: Creating specialized search engines

by Michael Chau, Jialun Qin, Yilu Zhou, Chunju Tseng, Hsinchun Chen, 2007
in multiple languages