Results 1 - 10 of 19
SemTag and Seeker: Bootstrapping the semantic web via automated semantic annotation
- Proceedings of the 12th International Conference on World Wide Web (WWW’03), 2003
Cited by 120 (4 self)
This paper describes Seeker, a platform for large-scale text analytics, and SemTag, an application written on the platform to perform automated semantic tagging of large corpora. We apply SemTag to a collection of approximately 264 million web pages, and generate approximately 434 million automatically disambiguated semantic tags, published to the web as a label bureau providing metadata regarding the 434 million annotations. The final version of this paper will reflect new data labeling one billion pages, rather than the 264 million pages reported on herein. To our knowledge, this is the largest scale semantic tagging effort to date. We describe the Seeker platform, discuss the architecture of the SemTag application, describe a new disambiguation algorithm specialized to support ontological disambiguation of large-scale data, evaluate the algorithm, and present our final results with information about acquiring and making use of the semantic tags. We argue that automated large-scale semantic tagging of ambiguous content can bootstrap and accelerate the creation of the semantic web.
Automatic Ontology-based Knowledge Extraction from Web Documents
- Intelligent Systems, 2003
Cited by 77 (12 self)
This paper presents recent developments in the Artequakt project which seeks to automatically extract knowledge about artists from the Web, populate a knowledge base, and use it to generate personalized narrative biographies. An overview of the system architecture is presented and the three key components of that architecture are explained in detail, namely knowledge extraction, information management and biography construction. An example experiment is detailed and further challenges are outlined.
SEmantic portAL - The SEAL approach
- Spinning the Semantic Web, 2001
Cited by 36 (2 self)
The core idea of the Semantic Web is to make information accessible to human and software agents on a semantic basis. Hence, web sites may feed directly from the Semantic Web, exploiting the underlying structures for human and machine access. We have developed a generic approach for developing semantic portals, viz. SEAL (SEmantic portAL), that exploits semantics for providing and accessing information at a portal as well as constructing and maintaining the portal. In this paper, we discuss the role that semantic structures play in establishing communication between different agents in general. We elaborate on a number of intelligent means that make semantic web sites accessible from the outside, viz. semantics-based browsing, semantic querying and querying with semantic similarity, semantic personalization, and machine access to semantic information at a semantic portal. As a case study we refer to the AIFB web site, a place that is increasingly driven by Semantic Web tec...
Artequakt: Generating Tailored Biographies with Automatically Annotated Fragments from the Web
- Presented at the Semantic Authoring, Annotation and Knowledge Markup (SAAKM) 2002 Workshop at the 15th European Conference on Artificial Intelligence (ECAI 2002)
Cited by 29 (9 self)
The Artequakt project is working towards automatically generating narrative biographies of artists from knowledge that has been extracted from the Web and maintained in a knowledge base. An overview of the system architecture is presented here and the three key components of that architecture are explained in detail, namely knowledge extraction, information management and biography construction. Conclusions are drawn from the initial experiences of the project and future plans are described.
SEAL -- a framework for developing SEmantic web portALs
- Proceedings of the 18th British National Conference on Databases, volume 2097 of LNCS, 2001
Cited by 14 (0 self)
The core idea of the Semantic Web is to make information accessible to human and software agents on a semantic basis. Hence, web sites may feed directly from the Semantic Web, exploiting the underlying structures for human and machine access. We have developed a generic approach for developing semantic portals, viz. SEAL (SEmantic portAL), that exploits semantics for providing and accessing information at a portal as well as constructing and maintaining the portal. In this paper, we discuss the role that semantic structures play in establishing communication between different agents in general. We elaborate on a number of intelligent means that make semantic web sites accessible from the outside, viz. semantics-based browsing, semantic querying and querying with semantic similarity, and machine access to semantic information at a semantic portal. As a case study we refer to the AIFB web site, a place that is increasingly driven by Semantic Web technologies.
Linguistic annotation for the semantic web
- Annotation for the Semantic Web. IOS, 2003
Cited by 6 (0 self)
Establishing the semantic web on a large scale implies the widespread annotation
Automatic Extraction of Knowledge from Web Documents
- Workshop on Human Language Technology for the Semantic Web and Web Services, 2nd Int. Semantic Web Conf., Sanibel Island, 2003
Cited by 5 (3 self)
A large amount of digital information available is written as text documents in the form of web pages, reports, papers, emails, etc. Extracting the knowledge of interest from such documents from multiple sources in a timely fashion is therefore crucial. This paper provides an update on the Artequakt system which uses natural language tools to automatically extract knowledge about artists from multiple documents based on a predefined ontology. The ontology represents the type and form of knowledge to extract. This knowledge is then used to generate tailored biographies. The information extraction process of Artequakt is detailed and evaluated in this paper.
Web based Knowledge Extraction and Consolidation for Automatic Ontology Instantiation
- 2nd Int. Conf. Knowledge Capture (KCap'03), Workshop on Knowledge Markup and Semantic Annotation, 2003
Cited by 4 (3 self)
The Web is probably the largest and richest information repository available today. Search engines are the common access routes to this valuable source. However, the role of these search engines is often limited to the retrieval of lists of potentially relevant documents. The burden of analysing the returned documents and identifying the knowledge of interest is therefore left to the user. The Artequakt system aims to deploy natural language tools to automatically extract and consolidate knowledge from web documents and instantiate a given ontology, which dictates the type and form of knowledge to extract. Artequakt focuses on the domain of artists, and uses the harvested knowledge to generate tailored biographies. This paper describes the latest developments of the system and discusses the problem of knowledge consolidation.
Using a web-based categorization approach to generate thematic metadata from texts
- ACM Transactions on Asian Language Information Processing, 2004
Cited by 3 (0 self)
Conventional tools for automatic metadata creation mostly extract named entities or text segments from texts and annotate them with information about persons, locations, dates, and so on. However, this kind of entity type information is often insufficient for machines to understand the facts contained in the texts, thus precluding the possibility of implementing more advanced, intelligent applications, such as concept-based search. In this work, we try to create more refined thematic metadata inherent in texts. Based on Web resource mining, our approach acquires training corpora necessary to describe both the thematic categories and the metadata extracted from the texts. The approach then finds the corresponding relationships among them by means of categorization and thus generates thematic metadata for the textual data. Experimental results confirm the potential and wide adaptability of our approach.
Keyword Extraction from the Web for Personal Metadata Annotation
- ISWC Workshop Notes VIII 115 (W8)–4th International Workshop on Knowledge Markup and Semantic Annotation (Semannot2004) (in conjunction with 3rd Int’l Semantic Web Conference (ISWC2004)), pp. 51–60, 2004
Cited by 2 (0 self)
With the currently growing interest in the Semantic Web and Social Networking, personal metadata is coming to play an important role in the Web. This paper proposes a novel keyword extraction method to extract personal metadata from the Web. The proposed method is based on co-occurrence information of words. Our method extracts relevant keywords depending on the context of a person. Our experimental results show that extracted keywords are useful for personal metadata creation. We also discuss the annotation of personal metadata and application to the Semantic Web.

