Natural Language Processing of Mathematical Texts in mArachna

Abstract: mArachna is a technical framework designed to extract mathematical knowledge from natural language texts. mArachna avoids problems typically encountered in approaches based on automated reasoning by using natural language processing techniques that take advantage of the strictly formalized language characteristic of mathematical texts. Mathematical texts possess a strict internal structure and can be separated into text elements (entities) such as definitions, theorems, etc. These entities are the principal carriers of mathematical information. In addition, entities show a characteristic coupling between the information they present and their internal linguistic structure, which makes them well suited for natural language processing techniques. Taking advantage of this structure, mArachna extracts mathematical relations from texts and integrates them into a knowledge base. Identifying sub
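To illustrate the idea that mathematical texts separate into entities, the following Python sketch splits a text at lines that open with a keyword such as "Definition" or "Theorem". This is a minimal sketch under stated assumptions, not mArachna's segmentation method. The keyword list, the assumed header shape (keyword, optional number, then "." or ":"), and the names `split_entities`, `Entity`, and `HEADER` are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical keyword list: the abstract names definitions and theorems "etc.".
ENTITY_KEYWORDS = ("Definition", "Theorem", "Lemma", "Corollary", "Proof")

# Assumed header shape: keyword, optional number such as "2.1", then "." or ":".
HEADER = re.compile(
    r"^(?P<kind>" + "|".join(ENTITY_KEYWORDS) + r")\s*"
    r"(?P<num>\d+(?:\.\d+)*)?\s*[.:]\s*(?P<rest>.*)$"
)


@dataclass
class Entity:
    kind: str              # e.g. "Definition"
    number: Optional[str]  # e.g. "1.1", or None if unnumbered
    text: str              # body text of the entity


def split_entities(document: str) -> List[Entity]:
    """Split a text into entities at keyword header lines (hypothetical heuristic).

    Lines before the first header are ignored.
    """
    entities: List[Entity] = []
    current: Optional[Entity] = None
    for raw_line in document.splitlines():
        line = raw_line.strip()
        match = HEADER.match(line)
        if match:
            # A new header closes the entity that is currently open.
            if current is not None:
                entities.append(current)
            current = Entity(match["kind"], match["num"], match["rest"])
        elif current is not None and line:
            # Continuation line: append it to the open entity's text.
            current.text = f"{current.text} {line}".strip()
    if current is not None:
        entities.append(current)
    return entities


sample = """Definition 1.1. A group is a set G with an associative operation
that has an identity element and inverses.
Theorem 1.2: Every group has a unique identity element.
Proof. Suppose e and f are both identity elements."""

for entity in split_entities(sample):
    print(entity.kind, entity.number, "|", entity.text)
```

A keyword heuristic like this only recovers the coarse entity boundaries; the linguistic analysis of each entity's body is a separate step.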

### Managing mathematical texts with OWL and their graphical representation

Mathematical knowledge contained in scientific digital publications poses a challenge for intelligent retrieval mechanisms. Many current approaches use statistical (e.g., Google) or natural language processing methods to find correlations in texts and to annotate them semantically. However, both kinds of approaches face the problem of extracting and processing knowledge from mathematical equations. The presented system is based on natural language processing techniques and benefits from the characteristic linguistic structures of the language used in mathematical texts. It accumulates information snippets extracted from texts, symbols, and equations in knowledge bases, which provide the foundation for information retrieval. This article describes the concepts and the prototype implementation.
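One way to picture the accumulation of extracted snippets is as a store of subject-predicate-object triples that retrieval then queries. The Python sketch below assumes that data model; the abstract does not specify one, and the sketch uses no OWL library even though the section title mentions OWL. The class `KnowledgeBase`, its methods, and the relation names `is_a` and `has_property` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class KnowledgeBase:
    """Hypothetical in-memory triple store; not mArachna's actual storage layer."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()
        self._by_subject: Dict[str, Set[Triple]] = defaultdict(set)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Integrate one extracted snippet; duplicates are ignored by the set."""
        triple = (subject, predicate, obj)
        self._triples.add(triple)
        self._by_subject[subject].add(triple)

    def query(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None) -> List[Triple]:
        """Return matching triples, sorted; None acts as a wildcard."""
        pool = self._by_subject.get(subject, set()) if subject is not None else self._triples
        return sorted(
            t for t in pool
            if (predicate is None or t[1] == predicate) and (obj is None or t[2] == obj)
        )

    def ancestors(self, concept: str) -> List[str]:
        """Follow hypothetical 'is_a' links transitively (depth-first)."""
        found: List[str] = []
        stack = [concept]
        while stack:
            current = stack.pop()
            for _, _, parent in self.query(current, "is_a"):
                if parent not in found:
                    found.append(parent)
                    stack.append(parent)
        return found


kb = KnowledgeBase()
kb.add("abelian group", "is_a", "group")
kb.add("group", "is_a", "monoid")
kb.add("monoid", "is_a", "semigroup")
kb.add("group", "has_property", "associativity")
print(kb.ancestors("abelian group"))
print(kb.query("group"))
```

The index by subject is a design choice for queries that start from a single concept; a store serving other access patterns would likely index predicates and objects as well.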

### Knowledge Bases in mArachna

Automated extraction of knowledge from natural language texts is a major technical challenge that remains largely unsolved. Scientific texts in general, and mathematical texts in particular, are characterised by complex language constructs intended to transfer knowledge. To a large extent, mathematical texts possess a strict internal structure and can be separated into text elements such as definitions, theorems, etc. These text elements are the principal carriers of mathematical information. In addition, they show a characteristic linguistic structure well suited for natural language processing techniques. In this paper we present MARACHNA, a system for extracting mathematical relations from texts and integrating them into a knowledge base. In response to user queries, parts of the knowledge base are visualised using XML Topic Maps. In particular, MARACHNA aims to provide an overview of individual fields of mathematics and to show intra-field relations between mathematical objects and concepts.
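As a rough illustration of visualising part of a knowledge base with XML Topic Maps, the Python sketch below serialises a list of relation triples as a topic map using the standard `xml.etree.ElementTree` module. The element names follow the XTM 2.0 syntax; the abstract does not say which XTM version MARACHNA emits. The functions `to_xtm` and `topic_id`, the id slug scheme, and the role types `subject` and `object` are hypothetical choices for this sketch.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

XTM_NS = "http://www.topicmaps.org/xtm/"  # XTM 2.0 namespace


def topic_id(label: str) -> str:
    """Hypothetical slug scheme: XML ids may not contain spaces."""
    return label.strip().lower().replace(" ", "-")


def _add_topic_ref(parent: ET.Element, label: str) -> None:
    ET.SubElement(parent, "topicRef", href="#" + topic_id(label))


def to_xtm(triples: Iterable[Tuple[str, str, str]]) -> str:
    """Serialise (subject, predicate, object) triples as an XTM 2.0 topic map."""
    items = list(triples)
    root = ET.Element("topicMap", {"xmlns": XTM_NS, "version": "2.0"})

    # One topic per concept, relation type, and (hypothetical) role type.
    labels = {"subject", "object"}
    for s, p, o in items:
        labels.update((s, p, o))
    for label in sorted(labels):
        topic = ET.SubElement(root, "topic", id=topic_id(label))
        name = ET.SubElement(topic, "name")
        ET.SubElement(name, "value").text = label

    # One association per triple: the predicate is the association type,
    # and subject and object are the two role players.
    for s, p, o in items:
        assoc = ET.SubElement(root, "association")
        _add_topic_ref(ET.SubElement(assoc, "type"), p)
        for role_label, player in (("subject", s), ("object", o)):
            role = ET.SubElement(assoc, "role")
            _add_topic_ref(ET.SubElement(role, "type"), role_label)
            _add_topic_ref(role, player)

    return ET.tostring(root, encoding="unicode")


print(to_xtm([("abelian group", "is_a", "group"), ("group", "is_a", "monoid")]))
```

Declaring the role types as topics keeps every `topicRef` resolvable inside the map, so a viewer could render topics as nodes and associations as labelled edges.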

### MARACHNA: Automated Creation of Knowledge Representations for Mathematics

