Discovering evolutionary theme patterns from text: an exploration of temporal text mining
- In Proceedings of KDD ’05, 2005
- Cited by 65 (4 self)
Temporal Text Mining (TTM) is concerned with discovering temporal patterns in text information collected over time. Since most text information bears some time stamps, TTM has many applications in multiple domains, such as summarizing events in news articles and revealing research trends in scientific literature. In this paper, we study a particular TTM task – discovering and summarizing the evolutionary patterns of themes in a text stream. We define this new text mining problem and present general probabilistic methods for solving this problem through (1) discovering latent themes from text; (2) constructing an evolution graph of themes; and (3) analyzing life cycles of themes. Evaluation of the proposed methods on two different domains (i.e., news articles and literature) shows that the proposed methods can discover interesting evolutionary theme patterns effectively.
CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature
- Journal of the American Society for Information Science and Technology, 2006
- Cited by 53 (14 self)
This article describes the latest development of a generic approach to detecting and visualizing emerging trends and transient patterns in scientific literature. The work makes substantial theoretical and methodological contributions to progressive knowledge domain visualization. A specialty is conceptualized and visualized as a time-variant duality between two fundamental concepts in information science – research fronts and intellectual bases. A research front is defined as an emergent and transient grouping of concepts and underlying research issues. The intellectual base of a research front is its citation and co-citation footprint in scientific literature – an evolving network of scientific publications cited by research front concepts. Kleinberg’s burst detection algorithm is adapted to identify emergent research front concepts. Freeman’s betweenness centrality metric is used to highlight potential pivotal points of paradigm shift over time. Two complementary visualization views are designed and implemented: cluster views and time-zone views. The contributions of the approach are: 1) the nature of an intellectual base is algorithmically and temporally identified by emergent research-front terms, 2) the value of a co-citation cluster is explicitly interpreted in terms of research front concepts and 3) visually prominent and algorithmically detected pivotal points substantially reduce the complexity of a visualized network. The modeling and visualization process is implemented in CiteSpace II, a Java application, and applied to the analysis of two research fields: mass extinction (1981-2004) and terrorism (1990-2003). Prominent trends and pivotal points in visualized networks were verified in collaboration with domain experts, who are the authors of pivotal-point articles. Practical implications of the work are discussed. A number of challenges and opportunities for future studies are identified.
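
As a rough illustration of the betweenness centrality metric named in this abstract, the sketch below computes unnormalized betweenness on a small undirected graph using Brandes' algorithm. It is not CiteSpace II's implementation (which is a Java application); the function name, the toy graph, and the node labels are all hypothetical.

```python
from collections import deque

def betweenness_centrality(graph):
    """Unnormalized betweenness for an undirected, unweighted graph.

    `graph` maps each node to a list of its neighbours (hypothetical input format).
    """
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search from `source`, counting shortest paths (sigma)
        # and recording each node's predecessors on those paths.
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward the source.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}

# Toy co-citation network: two triangles joined by the edge c-d.
network = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
scores = betweenness_centrality(network)
top = max(scores.values())
pivotal = [v for v, s in scores.items() if s == top]
print(pivotal)  # ['c', 'd']
```

In this toy graph the two nodes that bridge the clusters receive the highest scores, which is the intuition behind using the metric to flag potential pivotal points.
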
A Software Infrastructure for Research in Textual Data Mining
- The International Journal on Artificial Intelligence Tools, 2004
- Cited by 10 (7 self)
Few tools exist that address the challenges facing researchers in the Textual Data Mining (TDM) field. Some are too specific to their application, or are prototypes not suitable for general use. More general tools often are not capable of processing large volumes of data.
Multimedia for computer science: from CS0 to grades 7-12
- 2003
- Cited by 3 (3 self)
The pipeline for women and minorities entering CS/IT is shrinking. Using a combination of multimedia e-learning and mentoring, we seek to widen the pipeline in both first year college courses and grades 7-12. We are developing multimedia that complements a new first semester Computer Science (CS0) textbook. For grades 7-12, we plan to establish outreach teams consisting of undergraduate and graduate student Teaching Fellows, teachers and administrators, faculty members, and industry professionals. This year, two outreach teams will adapt multimedia designed for CS0 for use in middle schools. Preliminary results show that the multimedia promotes learning of Java programming “objects first,” for both undergraduates and high school students. One outreach team will adapt these Java materials for use in a high school. Another team will adapt multimedia introducing the field of CS for use in a middle school, seeking to clear up common misconceptions about Computer Science.
Bootstrapping ontology learning for information retrieval using formal concept analysis and information anchors
- 14th International Conference on Conceptual Structures
- Cited by 3 (0 self)
We present an innovative approach to information retrieval for domain-specific digital library collections. We use a combination of Formal Concept Analysis (FCA) and a notion of information anchors to facilitate information delivery to the end user. This approach (1) uses ranked objects in attribute concepts to facilitate topical queries for experts and expertise profiles; (2) formulates (keyword by keyword) context for concept lattice construction via a set of heuristics, including those based on information anchors for selecting descriptive phrases; (3) bootstraps the learning of domain-specific concept hierarchies using FCA; and (4) incorporates the learnt concept hierarchies and WordNet for content-based document classification. To demonstrate the feasibility and utility of this approach, we implemented a prototype online information retrieval system, memsworldonline.case.edu (MWOL), for the emerging engineering discipline of MEMS (microelectromechanical systems) incorporating these ideas. MWOL has been actively used by a non-trivial group of MEMS practitioners; all user queries are processed in a fraction of a second as a result of an inverse indexing strategy using Berkeley DB. Voluntary user feedback using online forms has been encouraging. However, no other systems with similar features are available for a comparative study at this point.
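
The indexing strategy credited here for fast query processing is presumably an inverted index, which maps each term to the documents containing it. The following is a minimal in-memory sketch under that assumption; it uses a plain Python dictionary in place of Berkeley DB, and the function names and sample documents are hypothetical rather than taken from MWOL.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the first term's postings and intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample collection.
docs = {
    1: "MEMS pressure sensor design",
    2: "MEMS accelerometer fabrication",
    3: "optical sensor calibration",
}
index = build_inverted_index(docs)
print(search(index, "mems sensor"))  # {1}
```

Because the term-to-document mapping is built once, a query only needs a few set lookups and intersections rather than a scan of every document.
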
An Anthological Review of Research Utilizing MontyLingua, a Python-Based End-to-End Text Processor
MontyLingua, an integral part of ConceptNet, which is currently the largest commonsense knowledge base, is an English text processor developed in the Python programming language at the MIT Media Lab. The main feature of MontyLingua is its coverage of all aspects of English text processing, from raw input text to semantic meanings and summary generation, yet its components are loosely coupled at the architectural and code level, which enables individual components to be used independently or substituted. However, there has been no review exploring the role of MontyLingua in recent research work utilizing it. This paper aims to review the use of and roles played by MontyLingua and its components in research work published in 19 articles between October 2004 and August 2006. We observed a diversified use of MontyLingua in many different areas, both generic and domain-specific. Although use of the text summarizing component was not observed, we are optimistic that it will have a crucial role in managing the current trend of information overload in future research.
A Framework for Exploration of News Corpora by Actor Evolution and Interaction
- 2007
... IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). Copies may be requested from IBM T.J. Watson Research Center,
Exploring Evolutionary Technical Trends From Academic Research Papers
Automatic Term Recognition (ATR) is concerned with discovering terminology in large volumes of text corpora. Technical terms are vital elements for understanding the techniques used in academic research papers, and in this paper, we use focused technical terms to explore technical trends in the research literature. The major purpose of this work is to understand the relationship between techniques and research topics to better explore technical trends. We define this new text mining issue and apply machine learning algorithms to solve it by (1) recognizing focused technical terms from research papers; (2) classifying these terms into predefined technology categories; and (3) analyzing the evolution of technical trends. The dataset consists of 656 papers collected from well-known ACM conferences. The experimental results indicate that our proposed methods can effectively explore interesting evolutionary technical trends in various research topics.
Semi-Automatic Trend Detection in Scholarly Repository Using Semantic Approach
Currently, the WWW is the first place scholars turn to find information. However, analyzing and interpreting this volume of information overloads researchers as they pursue their research. Trend detection in scientific publication retrieval systems helps scholars find relevant, new, and popular special areas by visualizing the trend of an input topic. However, there is little research on trend detection in scientific corpora, and the models proposed so far do not appear to be suitable. Previous work lacks an appropriate representation scheme for research topics. This paper describes a method that combines the Semantic Web and ontologies to support advanced search functions such as trend detection in the context of a scholarly Semantic Web system (SSWeb).

