## A search engine for mathematical formulae (2006)

Venue: | Proc. of Artificial Intelligence and Symbolic Computation, number 4120 in LNAI |

Citations: | 18 - 1 self |

### BibTeX

@INPROCEEDINGS{Kohlhase06asearch,

author = {Michael Kohlhase and Ioan A. {\c{S}}ucan},

title = {A search engine for mathematical formulae},

booktitle = {Proc. of Artificial Intelligence and Symbolic Computation, number 4120 in LNAI},

year = {2006},

pages = {241--253},

publisher = {Springer}

}

### Abstract

We present a search engine for mathematical formulae. The MathWebSearch system harvests the web for content representations (currently MathML and OpenMath) of formulae and indexes them with substitution tree indexing, a technique originally developed for accessing intermediate results in automated theorem provers. For querying, we present a generic language extension approach that allows constructing queries by minimally annotating existing representations. First experiments show that this architecture results in a scalable application.
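To make the retrieval task concrete, the following is a minimal sketch of what a formula index has to answer: given a query term containing query variables, find every stored term that is an instance of it. It assumes a hypothetical encoding (a term is a symbol string or a tuple `(head, arg1, ..., argN)`, and strings starting with `?` stand in for annotated query variables) and uses a naive linear scan; a substitution tree exists precisely to avoid scanning every term.

```python
from typing import Dict, Optional, Tuple, Union

# A term is either a symbol (str) or an application tuple (head, arg1, ..., argN).
# Strings starting with "?" play the role of query variables (hypothetical convention).
Term = Union[str, Tuple]
Subst = Dict[str, Term]


def is_qvar(t: Term) -> bool:
    return isinstance(t, str) and t.startswith("?")


def match(query: Term, term: Term, subst: Optional[Subst] = None) -> Optional[Subst]:
    """Return a substitution for the query variables that turns query into term, or None."""
    subst = dict(subst or {})
    if is_qvar(query):
        # A repeated query variable must be bound to the same subterm every time.
        if query in subst:
            return subst if subst[query] == term else None
        subst[query] = term
        return subst
    if isinstance(query, str) or isinstance(term, str):
        return subst if query == term else None
    if len(query) != len(term):
        return None
    for q, t in zip(query, term):
        result = match(q, t, subst)
        if result is None:
            return None
        subst = result
    return subst


def search(query: Term, index: Dict[str, Term]) -> Dict[str, Subst]:
    """Naive linear scan: return every indexed formula that is an instance of query."""
    hits = {}
    for name, term in index.items():
        s = match(query, term)
        if s is not None:
            hits[name] = s
    return hits


index = {
    "f1": ("plus", "a", "a"),
    "f2": ("plus", "a", "b"),
    "f3": ("sin", "x"),
}
print(search(("plus", "?x", "?x"), index))  # {'f1': {'?x': 'a'}}
```

The repeated variable `?x` is what distinguishes a formula search from a text search: `plus(a, b)` is rejected because the two arguments differ.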

### Citations

237 | The Mathematica Book - Wolfram - 1999 |

Citation Context ...e for the functional structure of mathematical formulae. There are various other formats that are proprietary or based on specific mathematical software packages like Wolfram Research’s Mathematica [Wol02]. We currently support them if there is a converter to OpenMath or MathML. ... In Content MathML, the formula $\int_0^a \sin(x)\,dx$ would be represented as the following expression: Listing 1.1. Content Represen...
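Listing 1.1 itself is not reproduced on this page. As an illustration only, the sketch below builds the standard Content MathML 2 encoding of $\int_0^a \sin(x)\,dx$ (an `apply` of `int` with `bvar`, `lowlimit` and `uplimit` children) using Python's `xml.etree.ElementTree`; it is not claimed to match the paper's listing verbatim, and the helper `el` is a hypothetical convenience function.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def el(tag: str, *children: ET.Element, text: Optional[str] = None) -> ET.Element:
    """Create an element with optional text content and child elements."""
    e = ET.Element(tag)
    if text is not None:
        e.text = text
    e.extend(children)
    return e


def integral_of_sin() -> ET.Element:
    """Content MathML for the definite integral of sin(x) from 0 to a."""
    root = el(
        "math",
        el(
            "apply",
            el("int"),                                   # the operator
            el("bvar", el("ci", text="x")),              # bound variable x
            el("lowlimit", el("cn", text="0")),          # lower limit 0
            el("uplimit", el("ci", text="a")),           # upper limit a
            el("apply", el("sin"), el("ci", text="x")),  # integrand sin(x)
        ),
    )
    root.set("xmlns", MATHML_NS)
    return root


print(ET.tostring(integral_of_sin(), encoding="unicode"))
```

Because the tree records the operator, the bound variable and the limits as separate elements, it can be read directly as a first-order term, which is what makes structural indexing possible.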

130 | OMDoc – An open markup format for mathematical documents [Version 1.2] - Kohlhase - 2006 |

Citation Context ...matical formulae oriented search method. The second approach is taken by the MBase system [KF01], which applies the pattern matching of the underlying programming language to search for OMDoc-encoded [Koh06] mathematical documents in the knowledge base. The search engine for the Helm project indexes structural meta-data gleaned from Content MathML representations for efficient retrieval [AS04]. The idea ...

42 | MBase: Representing Knowledge and Context for the Integration of ... - Kohlhase, Franke - 2001 |

Citation Context ... the important advantage that they rely on already existing technologies, but they do not fully provide a mathematical-formulae-oriented search method. The second approach is taken by the MBase system [KF01], which applies the pattern matching of the underlying programming language to search for OMDoc-encoded [Koh06] mathematical documents in the knowledge base. The search engine for the Helm project ind...

27 | The Open Math standard, version 2.0 - Buswell, Caprotti, et al. |

21 | Technical aspects of the digital library of mathematical functions - Miller, Youssef - 2003 |

Citation Context ...conventional information retrieval methods, and the other leverages the structure inherent in content representations. The first approach is utilized for the Digital Library of Mathematical Functions [MY03] and ActiveMath system [LM06]: mathematical formulae are converted to text and indexed. The search string is similar to LaTeX commands and is converted to string before performing the search. This al...
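The text-based approach can be illustrated with a minimal sketch: serialize each formula into a flat token stream and look for the query tokens as a contiguous run. The serialization format below is hypothetical (it is not the format used by the DLMF or ActiveMath), and terms are encoded as a symbol string or a tuple `(head, arg1, ..., argN)`.

```python
from typing import Dict, List, Tuple, Union

# A term is a symbol (str) or an application tuple (head, arg1, ..., argN).
Term = Union[str, Tuple]


def to_tokens(term: Term) -> List[str]:
    """Serialize a term into a flat, LaTeX-like token sequence (hypothetical format)."""
    if isinstance(term, str):
        return [term]
    head, *args = term
    out = [head, "("]
    for i, arg in enumerate(args):
        if i:
            out.append(",")
        out.extend(to_tokens(arg))
    out.append(")")
    return out


def text_search(query: List[str], formulas: Dict[str, Term]) -> List[str]:
    """Return the names of formulas whose token stream contains query contiguously."""
    n = len(query)
    hits = []
    for name, term in formulas.items():
        toks = to_tokens(term)
        if any(toks[i:i + n] == query for i in range(len(toks) - n + 1)):
            hits.append(name)
    return hits


formulas = {
    "f1": ("plus", ("sin", "x"), "1"),
    "f2": ("times", ("sin", "y"), ("sin", "y")),
}
print(text_search(["sin", "(", "x", ")"], formulas))  # ['f1']
```

Once flattened, the token stream has no way to say "the same subterm occurs twice", so a query such as "any product of two equal sines" cannot be expressed; this is the structural information the second approach keeps.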

11 | Activation Framework - Grosso - 2001 |

Citation Context ... data is attached — an identifier that relates the term to its exact location. The identifier, location and other relevant data are stored in a database external to the search engine. We use XPointer [GMMW03] references to specify term locations (see Subsection 4.3 for more details). Unfortunately, substitution tree indexing does not support subterm search in an elegant fashion, so when adding a term to t...
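The snippet states only that each indexed term carries an identifier whose location is kept, as an XPointer reference, in an external database. A minimal sketch of producing such references, assuming (1) the XPointer `element()` scheme with child sequences such as `element(/1/3)`, (2) that every `apply` element counts as an indexable term, (3) no XML namespace on the tags, and (4) a plain dictionary standing in for the database. The document URL is a placeholder.

```python
import xml.etree.ElementTree as ET
from itertools import count
from typing import Dict, Iterator, Tuple


def child_sequences(elem: ET.Element, path: str = "/1") -> Iterator[Tuple[ET.Element, str]]:
    """Yield (element, child sequence) pairs in document order.

    The child sequence follows the XPointer element() scheme: "/1" is the
    document element, "/1/3" its third child element, and so on.
    """
    yield elem, path
    for i, child in enumerate(elem, start=1):
        yield from child_sequences(child, f"{path}/{i}")


def register_terms(doc_url: str, root: ET.Element, db: Dict[int, str], ids: Iterator[int]) -> None:
    """Store an XPointer reference for every <apply> element (assumed to be an indexable term).

    Tags are compared without a namespace; a namespaced MathML document would
    need "{http://www.w3.org/1998/Math/MathML}apply" instead.
    """
    for elem, seq in child_sequences(root):
        if elem.tag == "apply":
            db[next(ids)] = f"{doc_url}#element({seq})"


doc = ET.fromstring(
    "<math><apply><plus/><ci>a</ci><apply><sin/><ci>x</ci></apply></apply></math>"
)
db: Dict[int, str] = {}  # stands in for the external database
register_terms("http://example.org/doc.xml", doc, db, count())
print(db)
```

Keeping only a small integer identifier in the index and the (potentially long) location string outside it keeps the in-memory index compact.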

10 | Communities of Practice in MKM: An Extensional Model - Kohlhase, Kohlhase |

Citation Context ...in a variety of notations depending on the context: $\binom{n}{k}$, ${}_nC_k$, $C^n_k$, and $C^k_n$ all mean the same thing. The third notation is the French standard, whereas the last one is the Russian one (see [KK06] for a discussion of social context in mathematics). This poses a very difficult problem for searching, since these two look the same, but mean different things. ... $n!$. In a formula search we would like...

8 | Efficient retrieval of mathematical statements - Asperti, Selmi - 2004 |

Citation Context ...c-encoded [Koh06] mathematical documents in the knowledge base. The search engine for the Helm project indexes structural meta-data gleaned from Content MathML representations for efficient retrieval [AS04]. The idea is that this metadata approximates the formula structure and can serve as a filter for very large term data bases. However, since the full structure of the formulae is lost, semantic equiva...
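The metadata-as-filter idea can be sketched with the simplest possible metadata, the set of symbols occurring in a formula. This is a deliberate simplification and not the metadata schema of the Helm project; terms use a hypothetical tuple encoding and `?`-prefixed query variables. The example shows both the filtering and the loss of structure the snippet mentions.

```python
from typing import Dict, List, Set, Tuple, Union

# A term is a symbol (str) or an application tuple (head, arg1, ..., argN);
# strings starting with "?" are query variables (hypothetical convention).
Term = Union[str, Tuple]


def symbols(term: Term) -> Set[str]:
    """Metadata for a term: the set of symbols it contains (query variables ignored)."""
    if isinstance(term, str):
        return set() if term.startswith("?") else {term}
    out: Set[str] = set()
    for part in term:
        out |= symbols(part)
    return out


def prefilter(query: Term, index: Dict[str, Term]) -> List[str]:
    """Keep only formulas whose metadata covers every symbol the query mentions."""
    needed = symbols(query)
    return sorted(name for name, term in index.items() if needed <= symbols(term))


index = {
    "t1": ("f", ("g", "c")),
    "t2": ("g", ("f", "c")),  # same symbols as t1, different structure
    "t3": ("f", "c"),
}
print(prefilter(("f", ("g", "?x")), index))  # ['t1', 't2']
```

`t2` survives the filter although it does not match the query: the symbol set cannot tell `f(g(c))` from `g(f(c))`, so a full structural check is still needed afterwards.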

8 | Term Indexing. Number 1053 - Graf - 1996 |

Citation Context ...ndexing mathematical formulae on the web, we will interpret them as first-order terms (see Subsection 4.1 for details). This allows us to use a technique from automated reasoning called term indexing [Gra96]. This is the process by which a set of terms is stored in a special purpose data structure (the index, normally stored in memory) where common parts of the terms are potentially shared, so as to mini...
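Sharing common parts of terms can be illustrated with a discrimination tree, a simpler term-indexing structure than the substitution trees MathWebSearch uses; it is chosen here only because it fits in a short sketch. Each term is flattened into `(symbol, arity)` pairs in preorder and inserted into a trie, so terms with a common prefix share nodes. The tuple term encoding and `?`-prefixed query variables are hypothetical, and in this sketch query variables may appear only in argument positions.

```python
from typing import Dict, List, Tuple, Union

# A term is a symbol (str) or an application tuple (head, arg1, ..., argN);
# strings starting with "?" are query variables (hypothetical convention).
Term = Union[str, Tuple]
Key = Tuple[str, int]  # (symbol, arity)


def preorder(term: Term) -> List[Key]:
    """Flatten a term into (symbol, arity) pairs in preorder."""
    if isinstance(term, str):
        return [(term, 0)]
    head, *args = term
    out = [(head, len(args))]
    for arg in args:
        out.extend(preorder(arg))
    return out


class DiscriminationTree:
    """A trie over preorder symbol sequences; terms with a common prefix share nodes."""

    def __init__(self) -> None:
        self.root: Dict = {}
        self.nodes = 1  # counts trie nodes, including the root

    def insert(self, term: Term, label: str) -> None:
        node = self.root
        for key in preorder(term):
            if key not in node:
                node[key] = {}
                self.nodes += 1
            node = node[key]
        node.setdefault(None, []).append(label)  # the None key holds leaf labels

    def instances(self, query: Term) -> List[str]:
        """Labels of stored terms that may be instances of query.

        A query variable skips one whole subterm. Repeated variables are not
        checked for consistency, so hits still need a final match.
        """
        results: List[str] = []

        def skip(node: Dict, pending: int, k) -> None:
            # Consume `pending` complete subterms, then continue with k.
            if pending == 0:
                k(node)
                return
            for key, child in node.items():
                if key is not None:
                    skip(child, pending - 1 + key[1], k)

        def walk(node: Dict, rest: List[Key]) -> None:
            if not rest:
                results.extend(node.get(None, []))
                return
            key, tail = rest[0], rest[1:]
            if key[0].startswith("?"):
                skip(node, 1, lambda n: walk(n, tail))
            elif key in node:
                walk(node[key], tail)

        walk(self.root, preorder(query))
        return sorted(results)


tree = DiscriminationTree()
tree.insert(("plus", "a", "a"), "t1")
tree.insert(("plus", "a", "b"), "t2")
tree.insert(("sin", "x"), "t3")
print(tree.nodes)                           # 7 nodes for 8 stored symbols
print(tree.instances(("plus", "a", "?y")))  # ['t1', 't2']
```

Only the prefix `plus, a` is shared here, but in a large collection of formulae the savings from shared prefixes grow, and a single walk of the trie replaces a scan over all stored terms.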

5 | Knowledge representation and management in ActiveMath - Melis, Büdenbender, et al. - 2003 |

4 | Content-Faithful Transformations for MathML - Huerter, Rodionov, et al. |

Citation Context ...am.com for a widely known web-site that uses parallel markup). Modern presentation mechanisms will generate parallel markup, since that e.g. allows copy-and-paste into mathematical software systems [HRW02]. ... 1.4 A Running Example: The Power of a Signal. A standard use case for MathWebSearch is that of an engineer trying to solve a mathematical problem such as finding the power of a given signal s(t). O...

2 | GNU General Public License. Software license available at http://www.gnu.org/copyleft/gpl.html - Free Software Foundation (FSF) - 1991 |

Citation Context ...ar to a standard search engine like Google, except that it can retrieve content representations of mathematical formulae, not just raw text. The system is released under the GNU General Public License [FSF91] (see [Mat06] for details). A running prototype is available for testing at http://search.mathweb.org. ... 1.2 State of the Art in Math Search. There seem to be two general approaches to searching mathemat...

2 | Methods for Access and Retrieval of Mathematical Content in ActiveMath - Libbrecht, Melis - 2006 |

Citation Context ...ieval methods, and the other leverages the structure inherent in content representations. The first approach is utilized for the Digital Library of Mathematical Functions [MY03] and ActiveMath system [LM06]: mathematical formulae are converted to text and indexed. The search string is similar to LaTeX commands and is converted to string before performing the search. This allows searching for normal tex...

2 | Enhanced theorem reuse by partial theory inclusions - Normann |

Citation Context ...er applications include the retrieval of equations that allow one to transform a formula, of lemmata to simplify a proof goal, or of mathematical theories that can be reused in a given context (see [Nor06a] for a discussion of the latter). ... The main advantage of substitution tree indexing is that we only store substitutions, not the actual terms, and this leads to a small memory footprint. Figure 2 shows...
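In a substitution tree, each node holds a substitution, and a stored term is recovered by composing the substitutions on its path; inserting a term involves finding a common generalization of the new term and an existing node. The sketch below shows only that building block, Plotkin-style anti-unification (least general generalization) of two terms, under a hypothetical tuple term encoding. It is not a full substitution tree, and the index-variable names `*1`, `*2`, ... are assumed not to clash with real symbols.

```python
from typing import Dict, Tuple, Union

# A term is a symbol (str) or an application tuple (head, arg1, ..., argN).
Term = Union[str, Tuple]
Pairs = Dict[Tuple[Term, Term], str]


def lgg(s: Term, t: Term, pairs: Pairs) -> Term:
    """Least general generalization of s and t (Plotkin-style anti-unification)."""
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and len(s) == len(t) and s[0] == t[0]):
        # Same head and arity: generalize argument by argument.
        return (s[0],) + tuple(lgg(a, b, pairs) for a, b in zip(s[1:], t[1:]))
    # Mismatch: introduce an index variable, reusing it for a repeated pair.
    if (s, t) not in pairs:
        pairs[(s, t)] = f"*{len(pairs) + 1}"
    return pairs[(s, t)]


def generalize(s: Term, t: Term):
    """Return (g, sigma_s, sigma_t): applying sigma_s to g gives s, sigma_t gives t."""
    pairs: Pairs = {}
    g = lgg(s, t, pairs)
    sigma_s = {var: a for (a, _), var in pairs.items()}
    sigma_t = {var: b for (_, b), var in pairs.items()}
    return g, sigma_s, sigma_t


def apply_subst(term: Term, sigma: Dict[str, Term]) -> Term:
    """Replace index variables in term by their bindings in sigma."""
    if isinstance(term, str):
        return sigma.get(term, term)
    return tuple(apply_subst(x, sigma) for x in term)


s = ("plus", ("sin", "x"), "a")
t = ("plus", ("cos", "x"), "a")
g, sigma_s, sigma_t = generalize(s, t)
print(g)        # ('plus', '*1', 'a')
print(sigma_s)  # {'*1': ('sin', 'x')}
print(sigma_t)  # {'*1': ('cos', 'x')}
```

The shared part `plus(*1, a)` is stored once, and each original term is represented only by its small substitution, which is where the memory savings described above come from.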

2 | Interfacing to computer algebra via term indexing - Theiß, Sorge, et al. - 2006 |

Citation Context ...bases. However, since the full structure of the formulae is lost, semantic equivalences like α-equivalence cannot be taken into account. Another system that takes this second approach is described in [TSP06]. It uses term indexing for interfacing with Computer Algebra Systems while determining applicable algorithms in an automatically carried-out proof. This is closely related to what we present in this pape...

1 | Web page at http://creativecommons.org - Creative Commons |

Citation Context ... In these cases, the content representations have to be harvested from the repositories themselves. For instance, we harvest the Connexions corpus, which is available under a Creative Commons License [Cre], for MathWebSearch. As we will see, this poses some problems in associating presentation (for the human reader) with the content representation. Other repositories include the ActiveMath repository [M...

1 | Web page at http://kwarc.eecs.iu-bremen.de/projects/mws/, seen ... - MathWebSearch - 2006 |

Citation Context ...ard search engine like Google, except that it can retrieve content representations of mathematical formulae, not just raw text. The system is released under the GNU General Public License [FSF91] (see [Mat06] for details). A running prototype is available for testing at http://search.mathweb.org. ... 1.2 State of the Art in Math Search. There seem to be two general approaches to searching mathematical formulae...

1 | Extended normalization for E-retrieval of formulae (to appear) - Normann - 2006 |

Citation Context ...r instance, our search in Listing 1.4 might be used to find a useful identity for $\int_0^\infty f(x)\cdot g(x)\,dx$, if we know that $s(x)\cdot s(x) = s^2(x)$. MathWebSearch can be extended to an E-Retrieval engine (see [Nor06b]) without compromising efficiency by simply E-standardizing index and query terms. We plan to index more content, particularly more OpenMath. In the long run, it would be interesting to interface MathW...
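E-standardizing can be illustrated for one very small equational theory, commutativity. In the sketch below, arguments of operators assumed to be commutative are recursively normalized and then sorted into a fixed order, so that $\int_0^\infty f(x)\cdot g(x)\,dx$ and $\int_0^\infty g(x)\cdot f(x)\,dx$ receive the same index term. This is a hypothetical stand-in and not the extended normalization of [Nor06b]. It works directly for ground terms; for queries containing variables under a commutative operator, sorting may put a variable in a different position from the subterm it should match, so those queries need extra care.

```python
from typing import Tuple, Union

# A term is a symbol (str) or an application tuple (head, arg1, ..., argN).
Term = Union[str, Tuple]

COMMUTATIVE = {"plus", "times"}  # hypothetical: symbols treated as commutative


def normalize(term: Term) -> Term:
    """Bring a ground term into a canonical form modulo commutativity.

    Arguments of commutative symbols are normalized recursively and then
    sorted by their repr, which gives a fixed total order on terms.
    """
    if isinstance(term, str):
        return term
    head, *args = term
    norm_args = [normalize(a) for a in args]
    if head in COMMUTATIVE:
        norm_args.sort(key=repr)
    return (head, *norm_args)


a = ("int", "0", "inf", ("times", ("g", "x"), ("f", "x")))
b = ("int", "0", "inf", ("times", ("f", "x"), ("g", "x")))
print(normalize(a) == normalize(b))  # True
```

Because the same normalization is applied to index terms and query terms, the index itself stays an ordinary syntactic index, which is why the approach need not compromise retrieval efficiency.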