Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems

André Greiner-Petter; Moritz Schubotz; Howard S Cohl; Bela Gipp

doi:10.1108/AJIM-08-2018-0185

Author manuscript; available in PMC: 2021 Oct 2.

Published in final edited form as: ASLIB J Inf Manag. 2019;71(3):10.1108/AJIM-08-2018-0185. doi: 10.1108/AJIM-08-2018-0185


PMCID: PMC8486947 NIHMSID: NIHMS1685988 PMID: 34603731

Abstract

Purpose –

Modern mathematicians and scientists of math-related disciplines often use Document Preparation Systems (DPS) to write and Computer Algebra Systems (CAS) to calculate mathematical expressions. Usually, they translate the expressions manually between DPS and CAS. This process is time-consuming and error-prone. The purpose of this paper is to automate this translation. This paper uses Maple and Mathematica as the CAS, and LaTeX as the DPS.

Design/methodology/approach –

Bruce Miller at the National Institute of Standards and Technology (NIST) developed a collection of special LaTeX macros that create links from mathematical symbols to their definitions in the NIST Digital Library of Mathematical Functions (DLMF). The authors use these macros to perform rule-based translations between the formulae in the DLMF and CAS. Moreover, the authors develop software to ease the creation of new rules and to discover inconsistencies.

Findings –

The authors created 396 mappings and translated 58.8 percent of DLMF formulae (2,405 expressions) successfully between Maple and DLMF. For a significant percentage, the special function definitions in Maple and the DLMF were different. An atomic symbol in one system maps to a composite expression in the other system. The translator was also successfully used for automatic verification of mathematical online compendia and CAS. The evaluation techniques discovered two errors in the DLMF and one defect in Maple.

Originality/value –

This paper introduces the first translation tool for special functions between LaTeX and CAS. The approach improves error-prone manual translations and can be used to verify mathematical online compendia and CAS.

Keywords: Translation, Computer Algebra System (CAS), Document Preparation System (DPS), LaTeX, Presentation to Computation (P2C), Special functions

1. Introduction

A typical workflow of a scientist who writes a scientific publication is to use Document Preparation Systems (DPS) to write the paper and one or more Computer Algebra Systems (CAS) for verification, analysis and visualization. Especially in the Science, Technology, Engineering and Mathematics literature, LaTeX has become the de facto standard for writing scientific publications over the past 30 years (Knuth, 1997, 1998, p. 559; Alex, 2007). LaTeX typesets mathematical formulae in a layout similar to handwritten mathematics. For example, consider the specific Jacobi polynomial (DLMF, 2019, Table 18.3.1):

P_n^{(\alpha, \beta)}(\cos(a\Theta)),

(1)

where n is a nonnegative integer, α, β > −1, and $a, \Theta \in \mathbb{R}$. This mathematical expression can be written in LaTeX as:

P_n^{(\alpha, \beta)}(\cos(a\Theta)).

While LaTeX focuses on displaying mathematics, a CAS concentrates on computation and a user-friendly syntax. In particular, a CAS must embed unambiguous semantic information in its input. Each system uses different representations and syntax, so a writer must continually translate mathematical expressions from one representation to another and back again. Table I shows four different representations of Expression (1).

Table I.

Different representations for (1)

Systems	Representations
Generic LaTeX	P_n^{(\alpha, \beta)}(\cos(a\Theta))
Semantic LaTeX	\JacobiP{\alpha}{\beta}{n}@{\cos@{a\Theta}}
Maple	JacobiP(n, alpha, beta, cos(a*Theta))
Mathematica	JacobiP[n, \[Alpha], \[Beta], Cos[a \[CapitalTheta]]]


Notes: Generic LaTeX is the default LaTeX expression; semantic LaTeX uses special semantic macros to embed semantic information; each CAS representation uses the syntax specific to that system

Translations from generic LaTeX to CAS are difficult to realize since the full semantic information cannot easily be reconstructed from the input. Bruce Miller at the National Institute of Standards and Technology (NIST) has created a set of semantic LaTeX macros (Miller and Youssef, 2003). Most macros tie specific character sequences to well-defined mathematical objects and are linked with corresponding definitions in the Digital Library of Mathematical Functions (DLMF). The Digital Repository of Mathematical Formulae (DRMF) is an outgrowth of the DLMF with the goal of facilitating interaction among a community of mathematicians and scientists (Cohl et al., 2014, 2015). The DRMF extends the set of semantic macros. These macros embed necessary semantic information into LaTeX expressions. The macros may also contain @ symbols preceding the variables of the function. The number of @ symbols is used to switch between different notation styles, e.g., cos(x) and cos x. One example of such a macro is given in Table I for the semantic LaTeX representation of the Jacobi polynomial. The macros provide isolated access to important parts of the mathematical function, such as the arguments.

Even with embedded semantic information, a translation between systems can be difficult. Multivalued functions are a typical source of such difficulties (Davenport, 2010). A CAS usually defines branch cuts to compute principal values of multivalued functions (England et al., 2014), which turns a theoretically continuous function into a discontinuous implementation of it. The positions of branch cuts generally follow conventions, but in many cases they can be chosen arbitrarily. Communicating and explaining the chosen branch cuts is a critical issue for CAS, and the choices vary between systems (Corless et al., 2000). Figure 1 illustrates two different branch cut positions for the inverse trigonometric arccotangent function. While Maple [1] (square brackets refer to notes which appear at the end of the manuscript, just above the bibliography) defines the branch cut at [−i∞, −i], [i, i∞] (Figure 1(a)), Mathematica defines the branch cut at [−i, i] (Figure 1(b)).

Figure 1. — Two plots of the real part for the arccotangent function with a branch cut at [−i∞, −i], [i, i∞] in (a) and at [−i, i] in (b), respectively

Note: Plotted with Maple 2016

A CAS user needs to fully understand the properties and special definitions (such as the positions of branch cuts) in the CAS to avoid mistakes during a translation (England et al., 2014). A manual translation process is not only laborious but also prone to errors. Note that this general problem has been named automatic Presentation To Computation (P2C) conversion (Youssef, 2017).

This paper presents a new approach for automatic P2C conversion and for conversion in the reverse direction. Translations from presentational to computational systems are called forward translations, and translations from computational to presentational systems are called backward translations. A forward translation is denoted by an arrow with the target system language above it. For example:

t \overset{\text{Maple}}{\mapsto} c,

where t is an expression in the LaTeX language and c is an expression in the Maple language. As we will see later in this paper, we need to compare mathematical concepts between systems, and strict mathematical equivalence between them is impossible. Consider the transcendental mathematical constant e, known as Euler’s number. Because of computational and implementation limitations, the theoretical construct behind this symbol cannot be mathematically equivalent to the value exp(1) in Maple.

In order to clarify the notion of equivalence (or lack thereof) in the context of translations, we introduce the terms appropriate and inappropriate translations. We consider a translation to be appropriate when a numerical evaluation returns the same values in both systems, up to a numerical precision $|\varepsilon| \ll 1$, for all points in the specified domains of the functions. A translation is inappropriate when it is not appropriate.

For example, a translation such as:

\cos@{z} \overset{\text{Maple}}{\mapsto} cos(z),

(2)

is appropriate, while a translation such as:

\cos@{z} \overset{\text{Maple}}{\mapsto} sin(z),

(3)

is inappropriate. Note that it is not always as easy as in this example to decide if a translation is appropriate or not. This paper also presents several validation techniques to automatically verify if a translation is appropriate or inappropriate.

In addition, we also introduce the notion of direct translations. Most mathematical objects in one system have a direct counterpart in other systems. Later in the paper, we will explain that a translation from one specific mathematical object to its counterpart in the other system is not always appropriate. Also, not every mathematical object has a counterpart in other systems. We call a translation to its counterpart direct. For example, the translation (2) is direct, while a translation to the definition of the cosine function:

\cos@{z} \overset{\text{Maple}}{\mapsto} (exp(I*z) + exp(-I*z))/2,

is not a direct translation even though it is appropriate. Note that partial results of this paper have been published in Cohl et al. (2017).
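The definition of an appropriate translation can be approximated numerically. The following minimal Python sketch is our own illustration, not part of the translator: the helper `is_appropriate`, the sample grid `GRID` and the relative tolerance are assumptions. It compares the direct translation (2), the definition-based translation of the cosine and the inappropriate translation (3) on a finite set of complex points. A finite sample can expose an inappropriate translation, but it cannot prove appropriateness for every point of a domain.

```python
import cmath

def is_appropriate(f, g, points, eps=1e-12):
    """Return True if f and g agree (relative tolerance eps) at every sample point.

    A finite sample only approximates "for all points in the domain"."""
    for z in points:
        a, b = f(z), g(z)
        if abs(a - b) > eps * max(1.0, abs(a)):
            return False
    return True

# Illustrative sample grid in the complex plane (an assumption, not from the paper).
GRID = [complex(x, y) for x in (-2.0, -0.5, 0.3, 1.7) for y in (-1.0, 0.0, 0.8)]

def cos_by_definition(z):
    """Counterpart of the Maple expression (exp(I*z) + exp(-I*z))/2."""
    return (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2

print(is_appropriate(cmath.cos, cos_by_definition, GRID))  # True
print(is_appropriate(cmath.cos, cmath.sin, GRID))          # False
```

Using a relative tolerance with a floor of 1 keeps the check meaningful both where the function values are large and where they are close to zero.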

2. Related work

Since LaTeX became the de facto standard for writing papers in mathematics, most CAS provide simple functions to import and export mathematical LaTeX expressions [2]. These tools have two essential problems. First, they are only able to import simple mathematical expressions whose semantics are unique. For example, the internal LaTeX macro \frac always indicates a fraction. For more complex expressions, e.g., the Jacobi polynomial in Table I, the import functions fail. The second problem concerns the export tools. Mathematical expressions in a CAS are fully semantic; otherwise, the CAS would not be able to compute or evaluate them. During export, this semantic information is lost, because generic LaTeX cannot carry sufficient semantic information. Because of these problems, an exported expression cannot, in most cases, be imported into the same system again (except for simple expressions such as those described above). Our tool attempts to solve these problems and to provide round-trip translations between LaTeX and CAS.

The semantics must be known before an expression can be translated. There are two main approaches to this problem: first, the writer can specify the semantic information during the writing process (pre-defined semantics); second, the translator can determine the correct semantic information of general mathematical expressions before it translates them. So-called interactive documents [3], such as the Computable Document Format (CDF) [4] by Wolfram Research, or worksheets by Maple, follow the first approach and allow one to embed semantic information into the input. These complex document formats require specialized tools to display and work with the documents (Wolfram CDF Player, or Maple for the worksheets). The JOBAD architecture (Giceva et al., 2009) can create web-based interactive documents and uses Open Mathematical Documents (OMDoc) (Kohlhase, 2006) to carry semantics. The documents can be viewed and edited in the browser. JOBAD documents also allow one to perform computations via CAS, so that mathematical expressions can be calculated, computed and changed directly in the document. The translation is performed in the background, invisible to the user. Similar to the JOBAD architecture, other interactive web documents exist, such as MathDox (Cuypers et al., 2008) and The Planetary System (Kohlhase et al., 2011).

Another approach avoids translation problems by allowing computations directly via the LaTeX compiler, e.g., LaTeXCalc (Churchill and Boyd, 2010). Such packages are limited to the abilities of the compiler and are therefore not as powerful as a CAS. A workaround is sagetex (Drake, 2009), a LaTeX package interface for the open source CAS sage [5]. This package allows sage commands in TeX files and uses sage in the background to compute them. In this scenario, a writer still needs to manually translate expressions into sage syntax, but CAS expressions can be integrated directly into TeX documents.

There are two approaches for semantically marking up mathematical TeX/LaTeX documents with TeX macros: sTeX (Kohlhase, 2008), developed by Kohlhase, and the DLMF/DRMF LaTeX macros developed by Miller (Miller and Youssef, 2003). This paper shows that it is possible to develop a context-free translation tool using the semantic macros introduced by Miller. The goal of sTeX is to mark up the functional structure of mathematical documents so that they can be exported to the OMDoc format. The macros developed by Miller are dedicated to special functions, orthogonal polynomials and mathematical constants. Each of these macros ties a specific character sequence to a well-defined mathematical object and is linked with the corresponding definition in the DLMF or DRMF. We call these semantic macros DLMF/DRMF LaTeX macros. They are used internally in the DLMF and the DRMF. We chose the DLMF/DRMF LaTeX macro set for developing the translation engine because it provides DLMF definitions for a comprehensive number of functions. In contrast, sTeX does not focus on the semantics of functions, is often complex to use, and defines diverse macros for symbols and concepts that CAS usually do not support.

Miller also developed LaTeXML, a tool for converting LaTeX expressions to MathML (Miller, 2004). LaTeXML is used to generate the DLMF and is able to parse the DLMF/DRMF LaTeX macros to generate content MathML. Even though many CAS are able to import and export MathML, they fail for special functions. Schubotz and collaborators recently performed benchmarks on several LaTeX to MathML conversion tools, including LaTeXML, in Schubotz et al. (2018).

3. Translation problems

There are several potential problems for performing translations between systems that embed semantic information in the input. These problems vary from simple cases, e.g., a function is not defined in the system, to complex cases, e.g., different positioning of branch cuts for multivalued functions. This section will discuss some problems and our workarounds.

3.1. Different sets of defined functions

If a function is defined in one system but not in the other, we can sometimes translate the definition of the mathematical function instead. For example, the Gudermannian function gd(x) (DLMF (4.23.10)) is defined by:

\operatorname{gd}(x) := \arctan(\sinh x), \quad x \in \mathbb{R},

(4)

and linked to the semantic macro \Gudermannian in the DLMF but does not exist in Maple. We can perform a translation for the definition (4) instead of the macro itself:

\Gudermannian{x} \overset{\text{Maple}}{\mapsto} arctan(sinh(x)).

(5)

Since translations such as this one are not intuitive, the translation process needs to explain them. A dedicated logging function stores each translation and provides details after a successful translation process. Section 5 explains this task further.

Providing detailed information also solves the problem for multiple alternative translations. In some cases, a semantic macro has two alternative representations in the CAS or vice versa. In such cases, the translator picks one of the alternatives and informs the user about the decision.
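A minimal Python sketch of such a logging translator follows. The library layout (`LIBRARY`, a list of `(pattern, explanation)` alternatives per macro) and the function names are hypothetical and only illustrate the idea: the first alternative is returned, and its explanation as well as any remaining alternatives are logged for the user.

```python
import logging
import re

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("translator")

# Hypothetical translation library: macro -> alternatives as (Maple pattern, explanation).
# The first alternative is used; the others are only reported to the user.
LIBRARY = {
    r"\cos": [
        ("cos($0)", "direct translation"),
        ("(exp(I*($0)) + exp(-I*($0)))/2", "translation via the definition"),
    ],
    r"\Gudermannian": [
        ("arctan(sinh($0))", "no Maple counterpart; translated via DLMF (4.23.10)"),
    ],
}

def fill(pattern: str, args: list[str]) -> str:
    """Replace each $i placeholder by the i-th argument."""
    return re.sub(r"\$(\d+)", lambda m: args[int(m.group(1))], pattern)

def translate_macro(macro: str, args: list[str]) -> str:
    """Translate one macro, log why, and mention the alternatives that were not chosen."""
    alternatives = LIBRARY[macro]
    pattern, explanation = alternatives[0]
    result = fill(pattern, args)
    log.info("%s -> %s: %s", macro, result, explanation)
    for other, why in alternatives[1:]:
        log.info("  alternative: %s (%s)", fill(other, args), why)
    return result

translate_macro(r"\Gudermannian", ["x"])  # logs the definition-based translation
translate_macro(r"\cos", ["z"])           # logs the chosen and the alternative translation
```

Parenthesizing `$0` inside the definition-based pattern keeps compound arguments such as a+b intact after substitution.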

3.2. Positions of branch cuts

Alternative translations can also resolve differences between defined branch cuts. Consider the aforementioned arccotangent function (Corless et al., 2000), whose branch cuts are positioned differently in Maple than in the DLMF or Mathematica. As suggested by Corless et al. (2000), we can translate an alternative definition of the arccotangent function to avoid the branch cut issues. Using Equations (23) and (25) of Corless et al. (2000), we can define three translations:

\acot@{z} \overset{\text{Maple}}{\mapsto} arccot(z),

(6)

\acot@{z} \overset{\text{Maple}}{\mapsto} arctan(1/z),

(7)

\acot@{z} \overset{\text{Maple}}{\mapsto} I/2*ln((z - I)/(z + I)).

(8)

The direct translation (6) changes the position of the branch cut of the arccotangent function, which may lead to incorrect calculations later on. The alternative translations (7) and (8) use other functions instead of the arccotangent function. The arctangent function in (7) and the natural logarithm in (8) have the same branch cut positions in the DLMF and in Maple. Translation (7) solves this issue as long as the user does not evaluate the function at z = 0, while translation (8) solves it except at z = −i. Note that none of the translations (6)–(8) are appropriate.
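The following Python sketch is our own illustration using the standard `cmath` module, whose `atan` and `log` return principal values. It evaluates translations (7) and (8) and, for comparison, an arccotangent with its cuts on [−i∞, −i] and [i, i∞]; modeling that convention as π/2 − arctan(z) is an assumption made here for illustration. At a generic point the two alternative translations agree, whereas on the negative real axis they differ from the outer-cut convention by π.

```python
import cmath
import math

def arccot_via_arctan(z: complex) -> complex:
    """Translation (7): arctan(1/z); not defined at z = 0."""
    return cmath.atan(1 / z)

def arccot_via_log(z: complex) -> complex:
    """Translation (8): (i/2) ln((z - i)/(z + i)); not defined at z = -i."""
    return 0.5j * cmath.log((z - 1j) / (z + 1j))

def arccot_outer_cuts(z: complex) -> complex:
    """Assumed model of an arccot with cuts on [-i*inf, -i] and [i, i*inf]:
    pi/2 - arctan(z), which is continuous along the whole real axis."""
    return math.pi / 2 - cmath.atan(z)

for z in (2 + 1j, 1.0, -1.0):
    print(z, arccot_via_arctan(z), arccot_via_log(z), arccot_outer_cuts(z))
```

At z = −1, translations (7) and (8) both return −π/4, while the outer-cut model returns 3π/4.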

3.3. Insufficient semantic information

Other problematic cases for translations are the DLMF/DRMF LaTeX macros themselves. In some cases, they do not provide sufficient semantic information to perform translations. One example is the Wronskian determinant. For two differentiable functions w₁, w₂, the Wronskian is defined as (DLMF (1.13.4)):

\mathscr{W}\{w_1(z), w_2(z)\} = w_1(z)\,w_2'(z) - w_2(z)\,w_1'(z).

In semantic LaTeX, it is currently implemented using:

\Wronskian@{w_1(z), w_2(z)}.

(9)

This translation is infeasible because the macro does not explicitly define the variable of differentiation for the functions w₁, w₂. For a correct translation, the CAS needs to know the variable of differentiation z. We solved this issue by creating a new macro \Wron, e.g.:

\Wron{z}@{w_1(z)}{w_2(z)}.

(10)

This example demonstrates that the DLMF/DRMF LaTeX macros are still a work in progress and further updates are sometimes necessary in order to further encapsulate critical semantic information.
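With the variable of differentiation available, a rule for \Wron can expand DLMF (1.13.4) directly. The Python sketch below is hypothetical (the function `translate_wron` is not part of the translator) and assumes that both functions have already been translated to Maple syntax; the generated string relies only on Maple's `diff(f, x)`.

```python
def translate_wron(var: str, w1: str, w2: str) -> str:
    """Hypothetical Maple rule for \\Wron{var}@{w1}{w2}.

    Expands DLMF (1.13.4), W{w1, w2} = w1*w2' - w2*w1', with every derivative
    taken with respect to the explicitly given variable `var`."""
    if not var:
        # This is exactly the information that \Wronskian@{...} does not provide.
        raise ValueError("the variable of differentiation is required")
    return f"({w1})*diff({w2}, {var}) - ({w2})*diff({w1}, {var})"

print(translate_wron("z", "w1(z)", "w2(z)"))
# (w1(z))*diff(w2(z), z) - (w2(z))*diff(w1(z), z)
```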

3.4. Potentially ambiguous expressions

Since the DLMF/DRMF LaTeX macros aim to cover an extensive set of special functions, orthogonal polynomials and mathematical constants, they do not contain specific macros for other mathematical objects. However, mathematical expressions without functions, polynomials and mathematical constants can also be ambiguous. For example, multiplications are rarely marked explicitly in LaTeX expressions: scientists use whitespace to indicate multiplication rather than \cdot or similar symbols. But whitespace is also used merely to improve readability, without representing a multiplication.

For such problems, we introduced a new macro \idot for an invisible multiplication symbol (this macro is not rendered). Since this macro was only recently introduced by contributors of the DRMF team, and automatic conversion of existing equations is difficult, none of the equations in the DLMF use it. The translator has some simple rules for translating multiplications that are not explicitly marked with \idot.

The DLMF/DRMF LaTeX macros do not guarantee entirely disambiguated expressions. In Table II there are four examples of potentially ambiguous expressions. These expressions are unambiguous for the LaTeX compiler since it only considers the very next token for superscripts and subscripts. Our translator follows the same rules to solve these issues.

Table II.

Potentially ambiguous LaTeX expressions and how LaTeX displays them

Potentially ambiguous input	LaTeX output
n^m!	n^m!
a^bc^d	a^bc^d
x^y^z	Double superscript error
x_y_z	Double subscript error

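The “very next token” rule can be sketched as follows. This is a simplified Python illustration, not the translator's implementation: tokens are single characters (braced groups and multi-letter macros are ignored), and a scripted atom is represented as a dictionary with the keys "base", "^" and "_".

```python
def attach_scripts(tokens: list[str]) -> list:
    """Attach ^ and _ to the preceding atom using only the very next token, as TeX does.

    Returns a list of atoms; a scripted atom is a dict with keys "base", "^", "_"."""
    out: list = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("^", "_"):
            if not out or i + 1 >= len(tokens):
                raise ValueError(f"missing base or argument for {tok!r}")
            atom = out[-1]
            if not isinstance(atom, dict):
                atom = {"base": atom}
                out[-1] = atom
            if tok in atom:
                # Same behaviour as the LaTeX compiler in Table II.
                raise ValueError("double superscript" if tok == "^" else "double subscript")
            atom[tok] = tokens[i + 1]  # only the next token is taken
            i += 2
        else:
            out.append(tok)
            i += 1
    return out

print(attach_scripts(list("a^bc^d")))  # [{'base': 'a', '^': 'b'}, {'base': 'c', '^': 'd'}]
print(attach_scripts(list("n^m!")))    # [{'base': 'n', '^': 'm'}, '!']
```

Calling `attach_scripts(list("x^y^z"))` raises a "double superscript" error, mirroring the third row of Table II.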

A more debatable translation decision concerns alphanumerical expressions. As shown in Table VI, the Part-of-Math (PoM)-tagger handles strings of letters and numbers differently depending on the order of the symbols. The reason is that an expression such as “4b” is usually considered to be a multiplication of 4 and “b”, while “b4” gives the impression that 4 indexes “b”. The first example therefore produces two nodes, namely 4 and “b”, whereas the second example “b4” produces a single alphanumerical node in the PoM-Parsed Tree (PPT). The translator interprets alphanumerical expressions as multiplications for two reasons: first, we assume that the inputs “4b” and “b4” are mathematically equivalent; and second, single-letter variable names are more common in mathematics (Cajori, 1994). We use rules such as the following:

4b \overset{\text{Maple}}{\mapsto} 4*b,

b4 \overset{\text{Maple}}{\mapsto} b*4,

energy \overset{\text{Maple}}{\mapsto} e*n*e*r*g*y.

In general, the translator is designed to find a workaround for disambiguating expressions. If no defined rule resolves a potential ambiguity, the translation process stops.
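A minimal Python sketch of the alphanumerical rule follows. The function name is hypothetical, and keeping consecutive digits together (e.g., “b42” becomes b*42) is our assumption, since the examples above only contain single digits; any character not covered by the rule stops the translation.

```python
import re

def split_alphanumeric(token: str) -> str:
    """Translate one alphanumerical PPT leaf into an explicit Maple product.

    Assumption: every letter is its own factor and consecutive digits form one number."""
    parts = re.findall(r"[A-Za-z]|\d+", token)
    if "".join(parts) != token:
        # No rule covers this input, so the translation process stops.
        raise ValueError(f"cannot disambiguate {token!r}")
    return "*".join(parts)

print(split_alphanumeric("b4"))      # b*4
print(split_alphanumeric("energy"))  # e*n*e*r*g*y
```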

Table VI.

A table of all kinds of nodes in a PoM syntax tree

Category of r	Node type	Explanation	Example
r has children	Sequence	Contains a list of expressions	a+b is a sequence with three children (a, + and b)
	Balanced expression	Similar to a sequence, but the sequence is wrapped in \left and \right delimiters. Note that plain parentheses do not create balanced expressions	\left(a+b \right) is a balanced expression with three children (a, + and b)
	Fraction	All kinds of fractions, such as \frac, \ifrac, etc.	\ifrac{a}{b} is a fraction with two children (a and b)
	Binomial	Binomials	\binom{a}{b} has two children (a and b)
	Square root	The square root with one child	\sqrt{a} has one child (a)
	Radical with a specified index	nth root with two children	\sqrt[a]{b} has two children (a and b)
	Underscore	The underscore “_” for subscripts	The sequence a_b has two children (a and “_”). The underscore “_” itself has one child (b)
	Caret	The caret “^” for superscripts or exponents; similar to the underscore	The sequence a^b has two children (a and “^”). The caret “^” itself has one child (b)
r is a leaf	DLMF/DRMF LaTeX macro	A semantic LaTeX macro	\JacobiP, etc.
	Generic LaTeX macro	All kinds of LaTeX macros	\Rightarrow, \alpha, etc.
	Alphanumerical expressions	Letters, numbers and general strings	Depends on the order of the symbols: ab3 is one alphanumerical node, while 4b yields two nodes (4 and b)
	Symbols	All kinds of symbols	“@”, “*”, “+”, “!”, etc.


Note: This table groups some types together for a better overview

Source: For a complete list and a more detailed version see Youssef (2017)

4. The translator

The translator analyzes a parse tree to perform translations. To generate a parse tree of a LaTeX expression, the translator uses the PoM-Tagger (Youssef, 2017) [6]. Each CAS defines its own syntax parser; for Maple, we were able to use its internal data structure to obtain a parse tree of the input. Sections 5 and 6 explain the parsing and translation process in detail.

All translations are defined in a library of Comma-Separated Values (CSV) and JavaScript Object Notation (JSON) files that specify translation patterns for each function and symbol. The patterns use $i as a placeholder for the position of the arguments: $0 stands for the first argument of the source expression, $1 for the second, and so on. For example, the translation patterns for the Jacobi polynomial are shown in Table III.

Table III.

Forward and backward translation patterns for the Jacobi polynomial example (1) in this manuscript

Forward Translation
Maple	JacobiP($2, $0, $1, $3)
Mathematica	JacobiP[$2, $0, $1, $3]
Backward Translation from Maple/Mathematica
Semantic LaTeX	\JacobiP{$1}{$2}{$0}@{$3}


Notes: The pattern for the backward translation is the same for Maple and Mathematica
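The pattern substitution can be sketched in a few lines of Python with a regular expression. The helper `fill` is hypothetical; it assumes that arguments are numbered from 0 in the order in which they appear in the source expression, which is consistent with the patterns in Table III.

```python
import re

PLACEHOLDER = re.compile(r"\$(\d+)")

def fill(pattern: str, args: list[str]) -> str:
    """Substitute $i by the i-th argument (0-based, in order of appearance in the source)."""
    return PLACEHOLDER.sub(lambda m: args[int(m.group(1))], pattern)

# Forward: \JacobiP{\alpha}{\beta}{n}@{\cos@{a\Theta}} has the arguments
# $0 = alpha, $1 = beta, $2 = n and $3 = the (already translated) variable.
print(fill("JacobiP($2, $0, $1, $3)", ["alpha", "beta", "n", "cos(a*Theta)"]))
# JacobiP(n, alpha, beta, cos(a*Theta))

# Backward from Maple: JacobiP(n, alpha, beta, x) gives $0 = n, $1 = alpha, $2 = beta, $3 = x.
print(fill(r"\JacobiP{$1}{$2}{$0}@{$3}", ["n", "\\alpha", "\\beta", "\\cos@{a\\Theta}"]))
# \JacobiP{\alpha}{\beta}{n}@{\cos@{a\Theta}}
```

Passing a function as the replacement to `re.sub` inserts the argument text literally, so backslashes in LaTeX arguments are not interpreted as escape sequences.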

The DLMF/DRMF LaTeX macros also allow one to specify optional arguments to distinguish between the standard and an alternative version of a function. The Legendre and associated Legendre functions of the first kind are examples of such cases. The library that defines translations for each macro uses the macro name as the primary key to identify the translations. However, the Legendre and associated Legendre functions of the first kind both use the same macro \LegendreP. To distinguish such cases, we use a special syntax, shown in Table IV.

Table IV.

Example entries of the Legendre and associated Legendre function in the translation library

Semantic macro entry	Maple entry
\LegendreP{\nu}@{x}	LegendreP($0, $1)
X1:\LegendrePX (for \LegendreP[\mu]{\nu}@{x})	LegendreP($1, $0, $2)


Notes: The prefix notation X<d>:<name>X defines the translation for <name> with <d> optional arguments
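The lookup by the number of optional arguments can be sketched as follows. The in-memory dictionary mirrors the two entries of Table IV, while `library_key` and `translate` are hypothetical helpers; we assume that optional arguments are numbered before the mandatory ones, which matches the pattern LegendreP($1, $0, $2).

```python
import re

# Hypothetical in-memory version of the two library entries in Table IV.
LIBRARY = {
    r"\LegendreP": "LegendreP($0, $1)",          # \LegendreP{\nu}@{x}
    r"X1:\LegendrePX": "LegendreP($1, $0, $2)",  # \LegendreP[\mu]{\nu}@{x}
}

def library_key(macro: str, n_optional: int) -> str:
    """Build the lookup key: the plain name, or X<d>:<name>X for d optional arguments."""
    return macro if n_optional == 0 else f"X{n_optional}:{macro}X"

def translate(macro: str, optional: list[str], args: list[str]) -> str:
    """Pick the entry by optional-argument count and fill its $i placeholders."""
    pattern = LIBRARY[library_key(macro, len(optional))]
    values = optional + args  # optional arguments first, as in the source expression
    return re.sub(r"\$(\d+)", lambda m: values[int(m.group(1))], pattern)

print(translate(r"\LegendreP", [], ["nu", "x"]))      # LegendreP(nu, x)
print(translate(r"\LegendreP", ["mu"], ["nu", "x"]))  # LegendreP(nu, mu, x)
```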

4.1. Escape the placeholder symbol

The placeholder symbol causes trouble when the CAS itself uses the symbol $ for other purposes, e.g., differentiation in Maple is implemented as:

diff(f, [x $ n]),

where f is an algebraic expression or an equation, x is the name of the differentiation variable and n is an integer giving the order of differentiation [7]. The translation of d²(x²)/dx² should therefore be:

\deriv[2]{x^2}{x} \overset{\text{Maple}}{\mapsto} diff(x^2, [x $ 2]),

but would end up as:

\deriv[2]{x^2}{x} \overset{\text{Maple}}{\mapsto} diff(x^2, [xx]).

We can solve this issue by using parentheses in such cases, e.g., diff($1, [$2$ ($0)]).
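One plausible way a placeholder clash can produce [xx] is sequential string replacement, sketched below in Python. This is an illustration, not necessarily the authors' implementation: in a pattern such as diff($1, [$2$$0]), replacing $0 by 2 turns the literal $ and the inserted 2 into a new placeholder $2. Parentheses, as proposed above, prevent this, and so does a single-pass substitution that only treats $ followed by digits as a placeholder.

```python
import re

def naive_fill(pattern: str, args: list[str]) -> str:
    """Replace "$0", "$1", ... one after another with plain string replacement."""
    for i, arg in enumerate(args):
        pattern = pattern.replace(f"${i}", arg)
    return pattern

def regex_fill(pattern: str, args: list[str]) -> str:
    """Replace all placeholders in a single pass; a '$' not followed by digits is kept."""
    return re.sub(r"\$(\d+)", lambda m: args[int(m.group(1))], pattern)

# \deriv[2]{x^2}{x}: $0 = 2 (order), $1 = x^2 (expression), $2 = x (variable)
args = ["2", "x^2", "x"]
print(naive_fill("diff($1, [$2$$0])", args))     # diff(x^2, [xx])
print(naive_fill("diff($1, [$2$ ($0)])", args))  # diff(x^2, [x$ (2)])
print(regex_fill("diff($1, [$2$$0])", args))     # diff(x^2, [x$2])
```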

5. Forward translations

As a pre-processing step, we use the PoM-Tagger (Youssef, 2017) [8] to parse semantic LaTeX expressions. The PoM-Tagger is defined by a context-free grammar in Backus-Naur Form (BNF) and is an LL-parser, i.e., it parses the input from Left to right and applies the Leftmost (first applicable) derivation rule defined by the grammar to an expression. In other words, the PoM-Tagger scans the input for terms and groups them into subexpressions where suitable, where terms are non-terminal symbols in the context of BNF. A node in the generated parse tree is tagged with meta information if the node matches defined symbols. The meta information is stored in lexicon files, which were manually curated for the PoM-Tagger.

We also integrated the translation patterns from our library into these lexicon files, so that the tagger additionally tags each node in the parse tree with its translation patterns. Table V gives an example of an entry of the lexicon file.

Table V.

The entry of the trigonometric sine function in the lexicon file

Symbol: \sin

Feature Set: dlmf-macro

DLMF: \sin@@{z}

DLMF-Link: dlmf.nist.gov/4.14.E1

Meanings: Sine

Number of Parameters: 0

Number of Variables: 1

Number of Ats: 2

Maple: sin($θ)

Maple-Link: www.maplesoft.com/support/help/maple/view.aspx?path=sin

Mathematica: Sin[$θ]

Mathematica-Link: reference.wolfram.com/language/ref/Sin.html


The parsed tree generated by the PoM-Tagger is not a mathematical expression tree. The PoM project aims to disambiguate mathematical LaTeX expressions and to generate expression trees. In its current state, however, many expressions still cannot be disambiguated. The PoM-Tagger generates a raw parsed tree in which each token of the LaTeX expression is a node. We call this parsed tree the PPT.

Figure 2 explains the overall forward translation process. All translation patterns and related information are stored in the DLMF/DRMF tables. The lexicon-creator converts these tables into the DLMF-macros-lexicon lexicon file. The PoM-Tagger uses this file together with the global-lexicon file to create the PPT. The latex-converter takes a string representation of a semantic LaTeX expression and uses the PoM engine as well as our translator to create a proper string representation for a specified CAS.

5.1. Analyzing the PoM-parsed tree

Since the BNF does not define rules for semantic macros, each argument and each @ symbol of a semantic macro appears as a following sibling of the macro node. This is why we store the number of parameters, variables and @ symbols in the lexicon files; otherwise, the translator could not find the end of a semantic macro in the PPT.
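The following Python sketch shows how the stored counts let the translator collect a macro's arguments from its following siblings. The layout (parameters, then @ symbols, then variables) follows the form of the macros shown so far; treating “Number of Ats” as an upper bound, so that both \sin@{z} and \sin@@{z} are accepted, is our assumption, and all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class MacroInfo:
    """Counts taken from a lexicon entry (cf. Table V)."""
    parameters: int
    max_ats: int
    variables: int

def consume_macro(info: MacroInfo, siblings: list[str]):
    """Collect a semantic macro's arguments from the siblings that follow its node.

    Returns (parameters, number of @ symbols, variables, remaining siblings)."""
    pos = info.parameters
    params = siblings[:pos]
    ats = 0
    # Assumption: between 0 and max_ats consecutive '@' tokens select the notation style.
    while ats < info.max_ats and pos < len(siblings) and siblings[pos] == "@":
        ats += 1
        pos += 1
    variables = siblings[pos:pos + info.variables]
    if len(params) < info.parameters or len(variables) < info.variables:
        raise ValueError("semantic macro is missing arguments")
    return params, ats, variables, siblings[pos + info.variables:]

# \sin@@{z} + 1 with the counts from Table V: 0 parameters, 2 @ symbols, 1 variable.
sin_info = MacroInfo(parameters=0, max_ats=2, variables=1)
print(consume_macro(sin_info, ["@", "@", "{z}", "+", "1"]))  # ([], 2, ['{z}'], ['+', '1'])
```

The remaining siblings (here + and 1) are handed back so that the caller can continue with the next node after the macro.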

Figure 3 visualizes the PPT of the Jacobi polynomial example from Table I. Because PPTs differ from expression trees, it can be difficult to generate a string representation after a successful translation process. In particular, it is difficult to decide which parentheses are necessary when analyzing the PPT. We therefore create the Translated Expression Object (TEO), a list containing the already translated subexpressions.

With these tools, we can translate a LaTeX expression by translating the PPT node by node, performing grouping or reordering operations for some special cases. The algorithm has a simple recursive structure. Whenever the algorithm finds a leaf, it translates this single term. If the node is not a leaf, it translates all children of the node recursively. This idea appears to be a practical and elegant solution, but it has a significant drawback: it cannot translate functions. Since the arguments of functions are following siblings in the PPT, the algorithm needs to look ahead when a leaf is a known function, e.g., a semantic macro with arguments (see Figure 3). Algorithm 1 is an improved version with lookahead functionality:

Algorithm 1. Abstract translation algorithm to translate a PPT.
Input: Root r of a PoM-parsed tree T; a list following_siblings with the following siblings of r (the list can be empty).
1: procedure ABSTRACT_TRANSLATOR(r, following_siblings)
2:     if r is leaf then
3:         TRANSLATE_LEAF(r, following_siblings);
4:     else
5:         children = r.getChildren();    ⊳ children is a list of nodes
6:         ABSTRACT_TRANSLATOR(children.removeFirst(), children);
7:     end if
8:     if following_siblings is not empty then
9:         r = following_siblings.removeFirst();
10:        ABSTRACT_TRANSLATOR(r, following_siblings);
11:    end if
12: end procedure

If the root r is a leaf, it is translated as a leaf, but some of its following siblings may be needed to translate r (e.g., the arguments of a function). In that case, the list following_siblings in Line 3 may be reduced to avoid translating a node more than once. If r is not a leaf, it contains one or more children, and we call ABSTRACT_TRANSLATOR recursively for the children. Once r has been translated, we proceed to the next node: Line 8 checks whether any following siblings are left and, if so, calls ABSTRACT_TRANSLATOR recursively. Translated expressions are stored in the TEO. Algorithm 1 is a simplified version of the translation process; Lines 3 and 6 process the translations for each node. Table VI gives an overview of all the node types the root r can have. A more detailed explanation of the types can be found in Youssef (2017).
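Algorithm 1 can be turned into a small runnable Python sketch. Everything except the control flow is hypothetical: the `Node` class, the two-entry lexicon `MACROS`, the leaf table `LEAVES` and the helper `translate_leaf`. The comments refer to the line numbers of Algorithm 1. Because `translate_leaf` removes the siblings it consumes from the shared list, the check in Line 8 only sees the siblings that remain.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Node:
    """A PPT node: a token plus (possibly empty) children."""
    token: str
    children: list["Node"] = field(default_factory=list)

# Hypothetical lexicon: macro -> (number of '@' siblings, number of argument siblings, Maple pattern)
MACROS = {r"\cos": (1, 1, "cos($0)"), r"\sin": (1, 1, "sin($0)")}
# Hypothetical direct translations of single leaves; unknown leaves are copied unchanged.
LEAVES = {r"\alpha": "alpha", "+": " + "}

def abstract_translator(r: Node, following: list[Node], teo: list[str]) -> None:
    """Algorithm 1: translate r, then the remaining following siblings."""
    if not r.children:                                          # line 2
        translate_leaf(r, following, teo)                       # line 3 (may consume siblings)
    else:
        children = list(r.children)                             # line 5
        abstract_translator(children.pop(0), children, teo)     # line 6
    if following:                                               # line 8
        abstract_translator(following.pop(0), following, teo)   # lines 9-10

def translate_leaf(r: Node, following: list[Node], teo: list[str]) -> None:
    """Translate a leaf; for a known macro, look ahead and consume its siblings."""
    if r.token not in MACROS:
        teo.append(LEAVES.get(r.token, r.token))
        return
    ats, n_args, pattern = MACROS[r.token]
    for _ in range(ats):                                        # lookahead: the '@' siblings
        if not following or following.pop(0).token != "@":
            raise ValueError(f"expected '@' after {r.token}")
    args = []
    for _ in range(n_args):                                     # lookahead: the arguments
        if not following:
            raise ValueError(f"missing argument for {r.token}")
        sub: list[str] = []
        abstract_translator(following.pop(0), [], sub)
        args.append("".join(sub))
    teo.append(re.sub(r"\$(\d+)", lambda m: args[int(m.group(1))], pattern))

def translate(root: Node) -> str:
    teo: list[str] = []                                         # Translated Expression Object
    abstract_translator(root, [], teo)
    return "".join(teo)

# PPT of \sin@{\cos@{z}} + \alpha: the arguments are following siblings of the macro nodes.
ppt = Node("seq", [
    Node(r"\sin"), Node("@"),
    Node("{}", [Node(r"\cos"), Node("@"), Node("{}", [Node("z")])]),
    Node("+"), Node(r"\alpha"),
])
print(translate(ppt))  # sin(cos(z)) + alpha
```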

The BNF grammar defines some basic grammatical rules for generic LaTeX macros, such as for \frac, \sqrt. There is a hierarchical structure for those symbols similar to the structure in expression trees. As already mentioned, some of these types can be translated directly, such as Greek letters, while others are more complex, such as semantic LaTeX macros. The translator delegates the translation to specialized sub-translators. This delegation process is implemented in Lines 3 and 6 of Algorithm 1. Subsection 5.3 discusses these classes in more detail.

5.2. Problems with the lookahead approach

The lookahead functionality appears to solve the problems for functions. However, there is a problem with the lookahead functionality that Section 3 did not address. In some cases, the arguments of a function do not follow but precede the function node.

If we examine mathematical notations closely, we discover many different ways to write the same formula. Table VII shows the expression (a+b)·x in four different notations. The Normal Polish Notation [9] (hereafter called prefix notation) places the operator to the left of (before) its operands. The Reverse Polish Notation [10] (hereafter called postfix notation) does the opposite and places the operator to the right of (after) its operands. The infix notation, commonly used in arithmetic, places the operator between its operands, which only makes sense for binary operators.
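The four notations in Table VII are four serializations of one and the same expression tree. To make this concrete, the following Python sketch prints a tree for (a+b)·x in each notation. The Op node type and the precedence table PREC are assumptions made for this illustration; the PoM itself does not build such an expression tree.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class Op:
    """A binary operator node of an expression tree."""
    symbol: str
    left: Expr
    right: Expr


Expr = Union[str, Op]    # a leaf (variable name) or an operator node
PREC = {"+": 1, "·": 2}  # binding strength, needed only for infix output


def infix(e: Expr) -> str:
    if isinstance(e, str):
        return e

    def wrap(child: Expr) -> str:
        s = infix(child)
        # Parentheses are needed only when a child binds more loosely.
        if isinstance(child, Op) and PREC[child.symbol] < PREC[e.symbol]:
            return f"({s})"
        return s

    return f"{wrap(e.left)} {e.symbol} {wrap(e.right)}"


def prefix(e: Expr) -> str:
    return e if isinstance(e, str) else f"{e.symbol} {prefix(e.left)} {prefix(e.right)}"


def postfix(e: Expr) -> str:
    return e if isinstance(e, str) else f"{postfix(e.left)} {postfix(e.right)} {e.symbol}"


def functional(e: Expr) -> str:
    return e if isinstance(e, str) else f"{e.symbol}({functional(e.left)}, {functional(e.right)})"


expr = Op("·", Op("+", "a", "b"), "x")  # (a+b)·x
for show in (infix, prefix, postfix, functional):
    print(f"{show.__name__:>10}: {show(expr)}")
```

Only the infix printer needs precedence information; prefix, postfix and functional output are unambiguous without parentheses because every operator here is binary.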

Table VII.

The mathematical expression “(a+b) · x” in infix, prefix, postfix and functional notation

Notation	Expression
Infix	(a + b) · x
Prefix	·+ a b x
Postfix	a b + x ·
Functional	·(+ (a, b), x)


In mathematical expressions, notations are mostly mixed, depending on the kind of operator and the number of operands. For example, infix notation is common for binary operators (+, −, ·, mod, etc.), functional notation is conveniently used for all kinds of functions (sin, cos, etc.), and unary operators are written in postfix notation (e.g., 2!) or in prefix notation (e.g., −2). Sometimes the same symbol is used in different notations to distinguish different meanings. For example, “−” as a unary operator is used in prefix notation to indicate the negative value of its operand, such as in “−2”. Of course, “−” can also be the binary operator for subtraction, which is commonly written in infix notation.

Since it is easier to parse expressions written in a uniform notation, most programming languages (and CAS as well) internally use prefix or postfix notation and do not mix notations within one expression. In science, however, mixed notations are common practice. Since the PoM implements only a few mathematical grammar rules, it takes the input as it is and does not build an expression tree. It parses the four examples from Table VII into four different PPTs rather than into one unique expression tree. In general, this is not a problem for our translation process, since most CAS understand the most common notations. The translator does not need to know that a and b are the operands of the binary operator “+” in a + b. It can simply translate the symbols in a + b in the order in which they appear, and the CAS will understand the result.

However, there are two new problems with this approach:

The translated expression is only syntactically correct if the input expression was syntactically correct.
We cannot translate expressions to CAS that use non-standard notations.

Problem 1 should be obvious. Since we are developing a translation tool rather than a verification tool for mathematical LaTeX expressions, we can assume syntactically correct input expressions and produce errors otherwise. Problem 2 is more complex. If a user wants to support a CAS that uses prefix or postfix notation by default, the translator would fail in its current state. Supporting CAS with other notations is left for future work.

Nonetheless, in some situations, adopting different notations could also resolve potential ambiguities. Consider the two potentially ambiguous examples in Table VIII. While a scientist would probably just ask for the intended interpretation of the first example, Maple automatically computes the first interpretation. LaTeX, on the other hand, disambiguates the first example by taking only the very next element (a single symbol or a group in curly braces) as the superscript, and therefore displays the second interpretation. The second example should not be misinterpreted, since n!! is the standard notation for the double factorial in science. We wrote the second interpretation with parentheses to point out that we mean the double factorial in this case. Surprisingly, however, Maple again computes the first interpretation (the factorial of the factorial of n) rather than the standard interpretation.
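The two readings in Table VIII are not merely cosmetic; they yield different numbers. The following Python sketch evaluates both interpretations of each example. It uses the standard math.factorial; the helper double_factorial is written here for illustration and is not part of any CAS.

```python
from math import factorial


def double_factorial(n: int) -> int:
    """n!! = n·(n-2)·(n-4)···, stopping at 1 or 2; by convention 0!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


# Example 1 of Table VIII, read two ways
print(4 ** factorial(2))        # 4^(2!) -> 16
print(factorial(4 ** 2))        # (4^2)! -> 20922789888000
# Example 2 with n = 3
print(factorial(factorial(3)))  # (3!)!  -> 720
print(double_factorial(3))      # 3!!    -> 3
```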

Table VIII.

Potentially ambiguous examples using the factorial and double factorial symbols

	Text format expression	First interpretation	Second interpretation
1:	4^2!	4^{2!}	4²!
2:	n!!	(n!)!	(n)!!


Note: One expression in a text format can potentially be interpreted in different ways

In most cases, parentheses can be used to disambiguate expressions. We used them in Table VIII to clarify the different interpretations in Example 2. Note that the use of parentheses will not always resolve a mistaken computation. For example, there is no way to add parentheses to force Maple to compute “n!!” as the double factorial function. Even “(n)!!” will be interpreted as “(n!)!”. Rather than using the exclamation mark in Maple, one could also use the functional notation. For example, the interpretations “(2!)!” and “(2)!!” can be distinguished in Maple by using factorial(factorial(2)) and doublefactorial(2), respectively. We define the translations as follows:

n! \overset{\mathrm{Maple}}{\mapsto} factorial(n),

n!! \overset{\mathrm{Maple}}{\mapsto} doublefactorial(n).

In its current state, Algorithm 1 cannot perform this translation, because it has no access to previously translated nodes. This problem is solved by the TEO, which stores and groups translated objects as lists. This allows the translator to access the most recently translated expression and use it as the argument of the factorial function. Table IX shows three examples of how the TEO list groups tokens.
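The idea of reusing the latest translated group can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the translator's TEO implementation: the input is assumed to be already grouped into tokens as in Table IX (e.g., "(n)" is one group), "!!" is assumed to arrive as a single token, and the name translate_tokens is hypothetical.

```python
from __future__ import annotations

# Hypothetical postfix operators and their Maple function names.
POSTFIX = {"!": "factorial", "!!": "doublefactorial"}


def translate_tokens(tokens: list[str]) -> list[str]:
    """Translate a token stream into a TEO-like list of translated groups."""
    teo: list[str] = []
    for tok in tokens:
        if tok in POSTFIX:
            if not teo:
                raise ValueError(f"postfix operator {tok!r} has no operand")
            # The most recently translated group becomes the argument.
            operand = teo.pop()
            teo.append(f"{POSTFIX[tok]}({operand})")
        else:
            teo.append(tok)
    return teo


print(translate_tokens(["n", "!!"]))           # ['doublefactorial(n)']
print(translate_tokens(["(n)", "!", "!"]))     # ['factorial(factorial((n)))']
print(translate_tokens(["a", "+", "b", "!"]))  # ['a', '+', 'factorial(b)']
```

The last call shows why grouping matters: the factorial only applies to the latest group b, not to the whole sum a + b.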

Table IX.

How the TEO list groups subexpressions

Input Expression	TEO List
a+b	[a,+,b]
(a+b)	[(a+b)]
(a/b)−2	[(a)/(b), −, 2]


5.3. Sub-translators

The SequenceTranslator translates sequences and balanced expressions in the PPT. If a node n is a leaf and represents an opening bracket (parenthesis, square bracket and so on), the following nodes are also taken as a sequence. Combined with the recursive translation approach, the SequenceTranslator also checks the balancing of brackets in expressions. An expression such as “(a]” produces a mismatched parentheses error. However, this is a problem for real interval expressions such as “[a,b)”. In the current version, the program cannot distinguish between mismatched parentheses and half-open intervals. Whether an expression is an interval or something else is difficult to decide and can depend on the context. Alternatively, the parentheses checker could simply be deactivated to allow mismatched parentheses in an expression. Another option is to use interval macros, e.g., \intcc@{a}{b} = [a,b].
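The balancing check described above is typically implemented with a stack. The following Python sketch is an assumption-based illustration, not the SequenceTranslator itself; the name brackets_balanced is hypothetical. It also reproduces the limitation discussed above: the half-open interval “[a,b)” is rejected exactly like the mismatched “(a]”.

```python
OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(OPEN_TO_CLOSE.values())


def brackets_balanced(expr: str) -> bool:
    """Stack-based check that each closing bracket matches the latest opener."""
    expected = []  # closing brackets we still expect, innermost last
    for ch in expr:
        if ch in OPEN_TO_CLOSE:
            expected.append(OPEN_TO_CLOSE[ch])
        elif ch in CLOSERS:
            if not expected or expected.pop() != ch:
                return False
    return not expected


for s in ["(a+b)", "[a,b]", "(a]", "[a,b)"]:
    print(s, brackets_balanced(s))
# (a+b) True, [a,b] True, (a] False, [a,b) False
```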

The SequenceTranslator also decides where to place multiplication symbols. As explained previously, the most common notation for multiplication is still juxtaposition, i.e., white space (or no space) between the tokens. Consider the simple expression “2nπ”. The PPT generates a sequence node with three children, namely 2, n and π. This sequence should be interpreted as a multiplication of the three elements. The SequenceTranslator checks the types of the current and next nodes in the tree to decide whether to add a multiplication symbol. For example, if the current or next node is an operator, a relation symbol or an ellipsis, no multiplication symbol is added. However, this approach has an important consequence: the translator interprets every sequence of nodes as a multiplication unless it is defined otherwise. This can produce unexpected results. Consider an expression such as “f(x)”. Translating it to Maple gives f*(x). We do not consider this translation to be wrong, because there is a semantic macro to represent functions. In this case, the user should write \f{f}@{x} instead of f(x) to distinguish between f as a function call and f as a symbol.
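The multiplication rule can be sketched as follows. In this minimal Python sketch, the tokens are assumed to be already translated (π as Maple's Pi), and the set NO_MULT is a hypothetical stand-in for the PPT node types (operators, relation symbols, ellipses) that the SequenceTranslator actually inspects.

```python
from __future__ import annotations

# Hypothetical token classes next to which no "*" is inserted:
# operators, relation symbols, commas and ellipses.
NO_MULT = {"+", "-", "*", "/", "=", "<", ">", ",", "...", r"\ldots", r"\cdots"}


def join_sequence(tokens: list[str]) -> str:
    """Join translated tokens, treating juxtaposition as multiplication."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if nxt is not None and tok not in NO_MULT and nxt not in NO_MULT:
            out.append("*")
    return "".join(out)


print(join_sequence(["2", "n", "Pi"]))  # 2*n*Pi
print(join_sequence(["a", "+", "b"]))   # a+b
print(join_sequence(["f", "(x)"]))      # f*(x): f is treated as a symbol
```

The last line reproduces the f(x) effect discussed above: without a semantic macro, nothing marks f as a function.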

The translation process for the DLMF/DRMF LaTeX macros is complex, so a special class, the MacroTranslator, handles those nodes in the PPT. Algorithm 2 explains the MacroTranslator without error handling. In Line 2, it extracts the necessary information from the PPT, such as the number of arguments of the function. It also processes the following siblings to translate the arguments. The MacroTranslator is called in Line 3 of Algorithm 1, since the macro is a leaf node in the PPT. The following siblings of a semantic macro node can be:

an exponent, such as for “^2” right after the macro node (Line 5);
an optional parameter in square brackets right after the macro node or after an exponent (Line 9);
a parameter in curly brackets (a sequence node in the PPT) if none of the above and no “@” symbols were passed (Line 14);
“@” symbols (Line 15); and
a variable in curly brackets (a sequence node) after the “@” symbols were passed (Line 16).

All cases before the “@” symbols are optional. The MacroTranslator removes as many following siblings as there are expected parameters and variables. Parameter and variable nodes are translated separately. The macro itself is translated by inserting all translated parameters and variables into the translation pattern (Line 18). If an exponent was registered right after the semantic macro node, it is shifted to the end (Line 19).
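Filling a translation pattern and shifting a leading exponent can be sketched in Python. The placeholder syntax $0, $1, … and the function names fill_pattern and translate_macro are assumptions for this illustration, as is the example pattern; it relies on the DLMF argument order \JacobiP{\alpha}{\beta}{n}@{x} and Maple's JacobiP(n, a, b, x), and shows how a pattern can reorder arguments.

```python
from __future__ import annotations

import re


def fill_pattern(pattern: str, args: list[str]) -> str:
    """Substitute placeholders $0, $1, ... with translated arguments."""
    return re.sub(r"\$(\d+)", lambda m: args[int(m.group(1))], pattern)


def translate_macro(pattern: str, args: list[str], exponent: str | None = None) -> str:
    result = fill_pattern(pattern, args)
    if exponent is not None:
        # An exponent written right after the macro (e.g. \cos^n@{x}) is moved
        # behind the translated call and applied to the whole function value.
        result = f"({result})^({exponent})"
    return result


# Parameters alpha, beta, n and the variable x, reordered by the pattern.
print(translate_macro("JacobiP($2, $0, $1, $3)", ["alpha", "beta", "n", "x"]))
# JacobiP(n, alpha, beta, x)

# Table X: exponent n before the argument, then exponent m after the call.
inner = translate_macro("cos($0)", ["x"], exponent="n")
print(f"({inner})^(m)")  # ((cos(x))^(n))^(m)
```

Using a regular expression with a callback avoids the pitfall of plain string replacement, where replacing "$1" would also corrupt a placeholder such as "$10".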

Any following siblings that remain after the macro has been translated with all its arguments do not belong to the semantic macro. If the next node is an exponent, the translated macro is its base. Table X shows an example of the translation of the trigonometric cosine function with multiple exponents.

Table X.

A trigonometric cosine function example with exponents before and after the argument

	Semantic LaTeX		Maple
Text representation	\cos^n@{x}^m	$\overset{\mathrm{Maple}}{\mapsto}$	((cos(x))^(n))^m
Displayed as	cosⁿ(x)ᵐ		(cos(x)ⁿ)ᵐ


6. Maple to semantic LaTeX translator

In this section, we will discuss several techniques to access the parse tree of Maple’s input. The translation process from this parse tree then follows the same principle as for the forward translations. Instead of writing a custom Maple syntax parser, we use Maple’s internal data structure to obtain the syntax tree of the input [11]. Maple allows several different input styles. The 1D input is mainly used for programming purposes and is also used to perform our translations. Internally, Maple uses a Directed Acyclic Graph (DAG) for syntax trees.

Each node in the DAG stores its children and has a header which defines the type and the length of the node. Consider the polynomial x²+x. Figure 4 illustrates the internal DAG representation with headers and arguments.

One can access the internal data structure of expressions via the ToInert command, which returns the InertForm. The InertForm format is a nested list [12] of the internal DAG for the given expression. Some of the important types for the nodes are specified in Table XI. The translator uses the OpenMaple (Bernardin et al., 2016, §14.3) Application Programming Interface for interacting with Maple’s kernel implementation.

Table XI.

A subset of important internal Maple data types

Type	Explanation
SUM	Sums. Internally stored with factors for each summand, i.e., “x+y” would be stored as “x·1+y·1”
PROD	Products
EXPSEQ	Expression sequence is a kind of list. The arguments of functions are stored in such sequences
INTPOS	Positive integers
INTNEG	Negative integers
COMPLEX	Complex numbers with real and imaginary part
FLOAT	Float numbers are stored in scientific notation with integer values for the exponent n and the significand m in m·10ⁿ
RATIONAL	Rational numbers are fractions stored in integer values for the numerator and positive integers for the denominator
POWER	Exponentiation with expressions as base and exponent
FUNCTION	Function invocation with the name, arguments and attributes of the function


Source: See Bernardin et al. (2016) for a complete list

6.1. Automatic changes of inputs in Maple

Maple evaluates inputs automatically and changes the input into an internal representation, which may differ from the input. One example has already been given in Figure 4, where each summand of a sum is stored with a factor. The following list shows all internal changes that occur for inputs:

Maple evaluates input expressions immediately.
There is no data type to represent square roots such as $\sqrt{x}$ (or n-th roots). Maple stores roots as an exponentiation with a fractional exponent. For example, $\sqrt{x}$ is stored as $x^{1 / 2}$ .
There is no data type for subtractions, only for sums. Negative terms are changed to absolute values times “−1”. For example, x−y is stored as x+y·(−1).
Floating point numbers are stored in scientific notation with a mantissa and a base-10 exponent. For example, 3.1 is internally represented as 31 · 10⁻¹.
There is only a data type for rational numbers (fractions with an integer numerator and a positive integer denominator), but not for general fractions, such as (x + y)/(z). This will be automatically changed to (x+y)·z⁻¹.

Maple provides unevaluation quotes to prevent the evaluation of input expressions. Table XII gives an example of how unevaluation quotes work.

Table XII.

Example of unevaluation quotes for 1D Maple input expressions

	Without unevaluation quotes	With unevaluation quotes
Input expression:	sin(Pi)+2−1	'sin(Pi)+2−1'
Stored expression:	1	sin(Pi)+1


Since we want a translated expression to remain similar to the input expression, we implemented some cosmetic rules for backward translations that undo or reduce the effects of the changes listed above:

We use unevaluation quotes to suppress evaluations of the input.
We perform a reordering of factors and summands so that negative factors appear in front of the summand. This gives us the opportunity to translate x−y to x−y instead of x+y·(−1).
We introduced new internal data types MYFLOAT and DIVIDE to translate floats and fractions in more convenient notations.

The translation process then follows the same principle as for the forward translations. Since the syntax tree of Maple is an expression tree, we do not need to implement special reordering or grouping algorithms to perform backward translations. Translations for functions are also realized via patterns and placeholders. Figure 5 illustrates the backward translation process for the Jacobi polynomial example from Table I.
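The cosmetic rules from the list above can be illustrated with a small backward translator. The following Python sketch walks a hypothetical nested-tuple tree whose node names follow Table XI and applies three rules: printing x+y·(−1) as x − y, printing a product with a factor z⁻¹ as a fraction (in the spirit of the DIVIDE type), and printing a mantissa/exponent pair as a decimal number (in the spirit of MYFLOAT). The tuple format and the name to_latex are assumptions; this is neither the exact output of Maple's ToInert command nor the authors' implementation.

```python
from decimal import Decimal

# Hypothetical nested-tuple format loosely modeled on the types in Table XI;
# it is NOT the exact output of Maple's ToInert command.
MINUS_ONE = ("INTNEG", 1)


def is_reciprocal(f) -> bool:
    return f[0] == "POWER" and f[2] == MINUS_ONE


def to_latex(node) -> str:
    kind = node[0]
    if kind == "NAME":
        return node[1]
    if kind == "INTPOS":
        return str(node[1])
    if kind == "INTNEG":
        return f"-{node[1]}"
    if kind == "FLOAT":
        # MYFLOAT-style rule: mantissa m and exponent n printed as a decimal.
        return str(Decimal(node[1]).scaleb(node[2]))
    if kind == "POWER":
        return f"{{{to_latex(node[1])}}}^{{{to_latex(node[2])}}}"
    if kind == "FUNCTION":
        args = ", ".join(to_latex(a) for a in node[2:])
        return f"\\{node[1]}@{{{args}}}"
    if kind == "PROD":
        num = [f for f in node[1:] if not is_reciprocal(f)]
        den = [f[1] for f in node[1:] if is_reciprocal(f)]
        num_s = " ".join(to_latex(f) for f in num) or "1"
        if not den:
            return num_s
        # DIVIDE-style rule: x * y^(-1) is printed as \frac{x}{y}.
        den_s = " ".join(to_latex(f) for f in den)
        return f"\\frac{{{num_s}}}{{{den_s}}}"
    if kind == "SUM":
        out = to_latex(node[1])
        for term in node[2:]:
            if term[0] == "PROD" and MINUS_ONE in term[1:]:
                # Reordering rule: x + y*(-1) is printed as x - y.
                rest = ("PROD",) + tuple(f for f in term[1:] if f != MINUS_ONE)
                out += " - " + to_latex(rest)
            else:
                out += " + " + to_latex(term)
        return out
    raise ValueError(f"unsupported node type: {kind}")


x_minus_y = ("SUM", ("NAME", "x"), ("PROD", ("NAME", "y"), MINUS_ONE))
print(to_latex(x_minus_y))  # x - y
frac = ("PROD", ("SUM", ("NAME", "x"), ("NAME", "y")), ("POWER", ("NAME", "z"), MINUS_ONE))
print(to_latex(frac))       # \frac{x + y}{z}
print(to_latex(("FLOAT", 31, -1)))  # 3.1
```

Because the input is already an expression tree, each node type maps to one rendering rule, and no reordering or grouping pass is needed apart from these cosmetic rules.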

Figure 5. A scheme of the backward translation process from Maple for the Jacobi polynomial expression $P_{n}^{(\alpha,\beta)}(\cos(a\Theta))$. **Notes:** The input string is converted by the Maple kernel into the nested list representation. This list is translated by subtranslators (blue and red arrows). A function translation (bold blue arrows) is again realized using translation patterns to define the position of the arguments (red arrows)

7. Evaluation

We implemented three approaches to evaluate whether a translation was appropriate or inappropriate:

Round trip tests: translate expressions back and forth and analyze the changes.
Function relation tests (symbolic): translate relations between mathematically proven equivalent expressions from one system to a CAS and check via symbolic equivalence tests whether the relation remains valid.
Numerical tests: take the same equations as Approach 2 but evaluate them at specific numerical values to test whether the translation was appropriate.

7.1. Round trip tests

A round trip test always starts with a valid expression, either in semantic LaTeX or in Maple. A translation from one system to the other is called a step, and a complete round trip translation (two steps) is called a cycle. A fixed point representation (or fixed point, for short) in a round trip translation process is a string representation that is identical to all string representations of the same system in the following cycles. Table XIII illustrates a round trip test that reaches a fixed point for the mathematical expression:

\frac{\cos (a Θ)}{2} .

Step 4 is identical to step 2, and since the translator is a deterministic algorithm, it can be easily shown that steps 2 and 3 are fixed point representations for semantic LaTeX and Maple.

Table XIII.

A round trip test reaching a fixed point

Steps	semantic LaTeX/ Maple representations
0	\frac{\cos@{a\Theta}}{2}
1	(cos(a*Theta))/(2)
2	\frac{1}{2}\idot\cos@{a\idot\Theta}
3	(1)/(2)*cos(a*Theta)
4	\frac{1}{2}\idot\cos@{a\idot\Theta}


Currently, only one exception is known where a round trip test does not reach a fixed point representation: Legendre’s incomplete elliptic integrals (DLMF (19.2.4–7)). The DLMF defines them with the amplitude ϕ as the first argument, while Maple takes the sine of the amplitude as the first argument. The forward and backward translations are defined as:

\EllIntF@{\phi}{k} \overset{\mathrm{Maple}}{\mapsto} EllipticF(sin(phi), k),

(12)

\EllIntF@{\asin@{\phi}}{k} \overset{\mathrm{Maple}}{\mapsto} EllipticF(phi, k),

(13)

and the round trip translations produce infinite chains of sine and inverse sine calls because there are no evaluations involved.
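Both outcomes, reaching a fixed point and growing without bound, can be reproduced with a generic round trip loop. In the following Python sketch, round_trip is a hypothetical helper; the FORWARD and BACKWARD lookup tables merely replay the strings of Table XIII and stand in for the real translators, and ell_forward and ell_backward are toy string versions of Equations (12) and (13).

```python
def round_trip(start, forward, backward, max_cycles=5):
    """Alternate forward and backward steps starting from a LaTeX string.

    Returns (steps, k), where k is the first step whose string equals the
    string two steps earlier (a fixed point), or (steps, None) if no fixed
    point appears within max_cycles cycles.
    """
    steps = [start]
    for _ in range(2 * max_cycles):
        translate = forward if len(steps) % 2 == 1 else backward
        steps.append(translate(steps[-1]))
        if len(steps) >= 3 and steps[-1] == steps[-3]:
            return steps, len(steps) - 1
    return steps, None


# Toy lookup tables replaying Table XIII (not the real translators).
FORWARD = {
    r"\frac{\cos@{a\Theta}}{2}": "(cos(a*Theta))/(2)",
    r"\frac{1}{2}\idot\cos@{a\idot\Theta}": "(1)/(2)*cos(a*Theta)",
}
BACKWARD = {
    "(cos(a*Theta))/(2)": r"\frac{1}{2}\idot\cos@{a\idot\Theta}",
    "(1)/(2)*cos(a*Theta)": r"\frac{1}{2}\idot\cos@{a\idot\Theta}",
}
steps, k = round_trip(r"\frac{\cos@{a\Theta}}{2}", FORWARD.__getitem__, BACKWARD.__getitem__)
print(k)  # 4: step 4 repeats step 2


# Toy versions of Equations (12) and (13) without any evaluation.
def ell_forward(latex: str) -> str:
    arg = latex[len(r"\EllIntF@{"):-len("}{k}")]
    return f"EllipticF(sin({arg}), k)"


def ell_backward(maple: str) -> str:
    arg = maple[len("EllipticF("):-len(", k)")]
    return r"\EllIntF@{\asin@{" + arg + "}}{k}"


steps, k = round_trip(r"\EllIntF@{\phi}{k}", ell_forward, ell_backward, max_cycles=3)
print(k)          # None: no fixed point within three cycles
print(steps[-1])  # the sine/arcsine chain keeps growing
```

Since the translators are deterministic, comparing a step with the step two positions earlier suffices: once a string repeats, all later steps repeat as well.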

The round trip tests are very successful, but they only detect errors in string representations. However, owing to the simplifications involved in reaching fixed points, we are able to at least detect logical errors in one system, namely Maple. On the other hand, these tests cannot detect logical errors in the translations between the two systems. Suppose we mistakenly defined an inappropriate forward and backward translation for the sine function:

\sin@{\phi} \overset{\mathrm{Maple}}{\mapsto} cos(phi),

(14)

\cos@{\phi} \overset{\mathrm{Maple}}{\mapsto} sin(phi).

(15)

In that case, the round trip test would not detect any error and would reach a fixed point representation.

7.2. Function relation tests

The DLMF is a compendium for special functions and orthogonal polynomials and lists many relations between the functions and polynomials. The idea of this evaluation approach is to translate an entire relation and test whether the relation remains valid after performing the translations.

With this idea, we can detect inappropriate translations such as in Equations (14) and (15). Consider the DLMF equation for the sine and cosine function (DLMF (4.21.2)):

\sin (u + v) = \sin u \cos v + \cos u \sin v .

(16)

Assume that the translator forward translates the expression using Equations (14) and (15). Then:

\sin@{u + v} \overset{\mathrm{Maple}}{\mapsto} cos(u + v),

(17)

\sin@@{u} \cos@@{v} \overset{\mathrm{Maple}}{\mapsto} cos(u)*sin(v),

(18)

\cos@@{u} \sin@@{v} \overset{\mathrm{Maple}}{\mapsto} sin(u)*cos(v).

(19)

This produces the equation in Maple

\cos (u + v) = \cos u \sin v + \sin u \cos v,

(20)

which is wrong. Since the expression is correct before the translation, we conclude that there was an error during the translation process and our defined translations were inappropriate.
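This detection argument can be reproduced in a few lines of Python. As a hedged, pure-Python stand-in for the symbolic check in Maple (which this sketch does not call), the differences of the two sides of Equations (16) and (20) are evaluated with the standard cmath module at three arbitrarily chosen sample points, anticipating the numerical tests of Section 7.3. The sample points, the tolerance and the function names are assumptions.

```python
import cmath


def correct(u, v):
    """Left minus right side of Eq. (16), i.e. after a correct translation."""
    return cmath.sin(u + v) - (cmath.sin(u) * cmath.cos(v) + cmath.cos(u) * cmath.sin(v))


def swapped(u, v):
    """Left minus right side of Eq. (20), i.e. after translations (14) and (15)."""
    return cmath.cos(u + v) - (cmath.cos(u) * cmath.sin(v) + cmath.sin(u) * cmath.cos(v))


SAMPLES = [(0.3, 1.1), (0.5 + 0.2j, -0.7), (2.0, 0.25 - 1.0j)]


def relation_holds(diff, samples=SAMPLES, tol=1e-9):
    """True if the difference of both sides vanishes (up to tol) at every sample."""
    return all(abs(diff(u, v)) < tol for u, v in samples)


print(relation_holds(correct))  # True
print(relation_holds(swapped))  # False: the translation was inappropriate
```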

There are two essential problems with this approach. First, testing whether two expressions are appropriate representations of each other is a challenging task for CAS, which often have difficulties verifying even simple equations symbolically. For example, consider (DLMF (4.35.34)):

\sinh (x + i y) = \sinh x \cos y + i \cosh x \sin y,

where the difference of the left- and right-hand sides cannot be simplified to zero by default. Second, this approach only checks forward translations, because there is no way to automatically check whether two LaTeX expressions are appropriate or inappropriate representations of each other (again, this could become feasible with our translator). We use Maple’s simplify function to check whether the difference of the left-hand side and the right-hand side of the equation equals zero. In addition, we use simplify to check whether the quotient of the right-hand side and the left-hand side is a numerical value. The simplify function is Maple’s most powerful tool for checking whether two expressions are appropriate representations of each other. However, there are several cases where simplification fails. Owing to implementation details, some techniques help Maple to find possible simplifications. For example, we can force Maple to convert the formula:

\sinh x + \sin x,

(21)

to an equivalent form in terms of exponential functions, namely:

\frac{1}{2} e^{x} - \frac{1}{2} e^{- x} - \frac{1}{2} i (e^{i x} - e^{- i x}) .

(22)

With such pre-conversions, we are able to improve the simplification process in Maple. However, the limitations of the simplify function remain the weakest part of this verification approach. Consider the more involved example (DLMF (12.7.10)):

U (0, z) = \sqrt{\frac{z}{2 π}} K_{1 / 4} (\frac{1}{4} z^{2}),

(23)

where U(0, z) is the parabolic cylinder function and K_ν(z) is the modified Bessel function of the second kind. Both functions are well-defined in both systems, and we can define a direct translation for Equation (23). In Maple and in the DLMF, the modified Bessel function of the second kind has its branch cut along the negative real axis. However, the argument of K contains z². If $|\mathrm{ph}\,z| \in (\pi/2, \pi)$, then 2 ph z lies outside (−π, π], so the value of the right-hand side of Equation (23) is no longer on the principal branch. Maple still computes the principal values regardless of the value of z, and the translation:

\BesselK{\frac{1}{4}}@{\frac{1}{4}z^2} \overset{\mathrm{Maple}}{\mapsto} BesselK(1/4, (1/4)*z^2),

(24)

is inappropriate if $|\mathrm{ph}\,z| \in (\pi/2, \pi)$. One should instead use the analytic continuation of the right-hand side of Equation (23). For such cases, the previous checks for appropriate representations in CAS are insufficient, so we implemented numerical tests as an additional step.

7.3. Numerical tests

Consider the difference of the left- and right-hand sides of Equation (23), namely:

D(z) := U(0, z) - \sqrt{\frac{z}{2\pi}} K_{1/4}\left(\frac{1}{4} z^{2}\right).

(25)

Table XIV presents four numerical evaluations for D(z), one value for each quadrant in the complex plane.

Table XIV.

Four numerical evaluations of D(z) in Maple

z	D(z)
1+i	2·10⁻¹⁰ − 2·10⁻¹⁰ i
−1+i	2.222121916 − 1.116719816 i
−1−i	2.222121913 + 1.116719816 i
1−i	2·10⁻¹⁰ + 2·10⁻¹⁰ i


Given machine accuracy and Maple’s default precision of ten significant digits, we can regard the first and last values as zero differences. While this evaluation is very powerful, it has a significant limitation: even when all tested values return 0, this does not prove that Equation (23) was appropriately translated. Values different from 0, on the other hand, indicate that there might be an error arising from one of four cases (Cohl et al., 2018):

the numerical engine tests invalid combinations of values;
the translation was inappropriately defined;
there may be an error in the DLMF source; and
there may be an error in Maple.
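Python's standard library provides neither parabolic cylinder nor modified Bessel functions, so the following sketch uses a simpler toy relation with the same kind of branch problem: sqrt(z²) = z, evaluated with the principal square root cmath.sqrt. The relation and the function name D are assumptions chosen for illustration. Sampling one point per quadrant, as in Table XIV, gives the same pattern: the difference vanishes for 1+i and 1−i but equals 2−2i and 2+2i for −1+i and −1−i, because in the left half-plane the principal branch maps z² back to −z. Here, the nonzero values arise because the relation itself only holds for Re z > 0, i.e., an instance of the first case above.

```python
import cmath


def D(z: complex) -> complex:
    """Toy relation sqrt(z^2) = z, tested via the difference of both sides."""
    return cmath.sqrt(z * z) - z


# One sample point per quadrant, as in Table XIV.
for z in (1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j):
    print(z, D(z))
```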

7.4. Results

There are currently 685 DLMF/DRMF LaTeX macros [13] in total, and 665 of them were implemented in the translator engine. We defined forward translations to Maple for 201 of the macros and backward translations from Maple for 195 functions.

The DLMF provides a data set of LaTeX expressions with semantic macros. We extracted 4,087 equations from the DLMF and applied our round trip and relation tests to them. The translator was able to translate 2,405 [14] (58.8 percent) of the extracted equations without errors. Maple’s simplification techniques successfully verified 660 (27.4 percent) of the translated expressions. We applied additional numerical tests to the remaining 1,745 equations; for 418 (24 percent) of them, the numerical tests were successful. More detailed results of the numerical and symbolic tests were presented in Cohl et al. (2018).
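The percentages in this paragraph refer to three different bases, which is easy to overlook. The following Python lines recompute them from the counts stated above; they add no new data, and the helper pct is introduced only for this check.

```python
extracted = 4087    # equations extracted from the DLMF
translated = 2405   # translated without errors
symbolic_ok = 660   # verified by Maple's simplification
numeric_ok = 418    # numerical tests successful

remaining = translated - symbolic_ok  # equations left for numerical tests


def pct(part: int, whole: int) -> float:
    return round(100 * part / whole, 1)


print(remaining)                     # 1745
print(pct(translated, extracted))    # 58.8 (base: extracted equations)
print(pct(symbolic_ok, translated))  # 27.4 (base: translated equations)
print(pct(numeric_ok, remaining))    # 24.0 (base: the remaining 1,745)
```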

The evaluation techniques have proven to be very powerful for evaluating CAS and online mathematical compendia such as the DLMF. During the evaluations, we were able to detect several errors in the translation and evaluation engine, and also discovered two errors in the DLMF and one error in Maple’s simplify function.

The numerical test engine discovered a sign error in DLMF (14.5.14) [15]:

Q_{\nu}^{-1/2}(\cos\theta) = -\left(\frac{\pi}{2\sin\theta}\right)^{1/2} \frac{\cos\left(\left(\nu + \frac{1}{2}\right)\theta\right)}{\nu + \frac{1}{2}}.

(26)

The error can be found in Olver et al. (2010, p. 359) and has been fixed in DLMF version 1.0.16. The same engine also identified a missing comma in the constraint of DLMF (10.16.7). The original constraint was given by 2ν ≠ −1, −2 −3, …, with a missing comma after the −2.

We have also noticed that our testing procedure is able to identify errors in CAS procedures, namely the Maple simplify procedure. The left-hand side of (DLMF (7.18.4)) is given by:

\frac{d^{n}}{dz^{n}}\left(e^{z^{2}} \operatorname{erfc} z\right), \quad n = 0, 1, 2, \dots,

where e is the base of the natural logarithm, and erfc is the complementary error function. Our translation correctly produces:

diff((exp((z)^(2))*erfc(z)), [z$(n)]).

However, the Maple 2016 simplify function falsely returns 0 for the translated left-hand side. Maplesoft has confirmed in a private communication that this is indeed a defect in Maple 2016. Furthermore, although the nature of the defect has changed, the defect still persists in Maple 2018 as of the publication of this manuscript.

8. Conclusion and future work

During this project, we uncovered several problems that need to be solved before mathematical expressions can be translated between two systems. The translator concept has proven itself by discovering errors in the online DLMF compendium, and the test cases have also shown how difficult it is to validate translated expressions. Our validation techniques also assume the correctness of the simplification and computation algorithms in CAS. However, combining those techniques and automatically running translation checks can discover not only errors in mathematical compendia but also errors in the simplifications or computations of the CAS.

The tasks for future work are diverse. The main task is to improve the translator by implementing more functions and features. For example, currently only translations to Maple’s standard function library are implemented. Maple allows one to load extra packages dynamically and therefore supports an enhanced set of functions. This feature would drastically increase the number of possible translations. With such improvements, further work on evaluation techniques becomes worthwhile for evaluating the DLMF and CAS. Increasing the number of translatable formulae in the DLMF and improving the verification techniques are also part of ongoing projects.

The translator was designed to be easily extendable, which allows one to implement translations for other CAS without much effort. However, most LaTeX sources, such as those in arXiv, are written in generic LaTeX. Semantic LaTeX, which is a prerequisite for our translator, is currently prevalent only in the DLMF and DRMF projects. Without explicitly provided semantic information, the translator is not able to translate functions. We are currently working on mathematical information retrieval techniques that will allow for an extension of the translator to generic LaTeX inputs.

Further improvements for numerical tests could be to perform tests at specific (critical) values with respect to the involved functions (Beaumont et al., 2007). Beaumont and collaborators tested identities for multivalued elementary functions by choosing sample points from regions defined with respect to the branch cuts of the functions. Choosing sample points from such regions could significantly improve the success rate of the numerical tests.

Acknowledgments

This work was supported by the German Research Foundation (DFG Grant GI-1259–1).

Notes

^1.

The mention of specific products, trademarks, or brand names is for purposes of identification only. Such mention is not to be interpreted in any way as an endorsement or certification of such products or brands by the National Institute of Standards and Technology, nor does it imply that the products so identified are necessarily the best available for the purpose. All trademarks mentioned herein belong to their respective owners.

^2.

The selected CAS Maple, Mathematica, Matlab and SageMath provide import and/or export functions for LaTeX: Maple, http://www.maplesoft.com/support/help/Maple/view.aspx?path=latex (accessed June 2017); Mathematica, https://reference.wolfram.com/language/tutorial/GeneratingAndImportingTeX.html (accessed June 2017); Matlab, http://www.mathworks.com/help/symbolic/latex.html (accessed June 2017); SageMath, http://doc.sagemath.org/html/en/tutorial/latex.html (accessed June 2017).

^3.

There is no adequate definition for what interactive documents are. However, this name is widely used to describe electronic document formats that allow for interactivity to change the content in real time.

^4.

Wolfram Research; Computable Document Format (CDF); http://www.wolfram.com/cdf/, July 2011.

^5.

An abbreviation for SageMath.

^6.

Named according to the Part-of-Speech-Taggers in Natural Language Processing (NLP).

^7.

http://www.maplesoft.com/support/help/maple/view.aspx?path=diff (accessed July 2018).

^8.

Named according to the Part-of-Speech-Taggers in NLP.

^9.

Also known as prefix notation, Warsaw Notation or Łukasiewicz notation. It was invented by J. Łukasiewicz in 1924 to create a parenthesis-free notation (Hamblin, 1962). Note that this notation is indeed parenthesis-free as long as all operators have the same arity.

^10.

Also known as postfix notation. Also invented by J. Łukasiewicz. Like the Normal Polish Notation, it does not need parentheses as long as all operators have the same arity.

^11.

A Maple license is required to perform backward translations. Our translator uses Maple 2016.

^12.

The nested list is a tree representation of a DAG that splits nodes with multiple parents into multiple nodes so that each node has only one parent node.

^13.

The DLMF/DRMF semantic macros are still a work in progress, and the total number is constantly changing.

^14.

All percentages are approximately calculated.

^15.

The equation had originally been stated as shown in Equation (26). The error was reported on April 10, 2017.

Contributor Information

André Greiner-Petter, School of Electrical, Information and Media Engineering, University of Wuppertal, Wuppertal, Germany.

Moritz Schubotz, School of Electrical, Information and Media Engineering, University of Wuppertal, Wuppertal, Germany.

Howard S. Cohl, Applied and Computational Mathematics Division, National Institute of Standards and Technology, Mission Viejo, California, USA

Bela Gipp, School of Electrical, Information and Media Engineering, University of Wuppertal, Wuppertal, Germany.

References

Alex G (2007), “Do open source developers respond to competition? The (La)TeX case study”, Review of Network Economics, Vol. 6 No. 2, pp. 1–25.
Beaumont JC, Bradford RJ, Davenport JH and Phisanbut N (2007), “Testing elementary function identities using CAD”, Applicable Algebra in Engineering, Communication and Computing, Vol. 18 No. 6, pp. 513–543.
Bernardin L, Chin P, DeMarco P, Geddes KO, Hare DEG, Heal KM, Labahn G, May JP, McCarron J, Monagan MB, Ohashi D and Vorkoetter SM (2016), Maple 2016 Programming Guide, Maplesoft, A Division of Waterloo Maple, Waterloo.
Cajori F (1994), A History of Mathematical Notations, Dover Publications, Mineola, New York, NY, p. 848.
Churchill B and Boyd S (2010), “LaTeX calc”, available at: https://sourceforge.net/projects/latexcalc/ (accessed April 1, 2019).
Cohl HS, Greiner-Petter A and Schubotz M (2018), “Automated symbolic and numerical testing of DLMF formulae using computer algebra systems”, Proceedings of the 11th Conference on Intelligent Computer Mathematics, CICM 2018, RISC, Hagenberg, August 13–17.
Cohl HS, McClain MA, Saunders BV, Schubotz M and Williams JC (2014), “Digital repository of mathematical formulae”, in Watt SM, Davenport JH, Sexton AP, Sojka P and Urban J (Eds), Intelligent Computer Mathematics – International Conference, CICM 2014, Vol. 8543, Lecture Notes in Computer Science, Springer, Coimbra, July 7–11, pp. 419–422.
Cohl HS, Schubotz M, McClain MA, Saunders BV, Zou CY, Mohammed AS and Danoff AA (2015), “Growing the digital repository of mathematical formulae with generic LaTeX sources”, in Kerber M, Carette J, Kaliszyk C, Rabe F and Sorge V (Eds), Intelligent Computer Mathematics – International Conference, CICM 2015, Vol. 9150, Lecture Notes in Computer Science, Springer, Washington, DC, July 13–17, pp. 280–287.
Cohl HS, Schubotz M, Youssef A, Greiner-Petter A, Gerhard J, Saunders BV, McClain MA, Bang J and Chen K (2017), “Semantic preserving bijective mappings of mathematical formulae between document preparation systems and computer algebra systems”, in Geuvers H, England M, Hasan O, Rabe F and Teschke O (Eds), Intelligent Computer Mathematics – 10th International Conference, CICM 2017, Vol. 10383, Lecture Notes in Computer Science, Springer, Edinburgh, July 17–21, pp. 115–131.
Corless RM, Jeffrey DJ, Watt SM and Davenport JH (2000), “‘According to Abramowitz and Stegun’ or arccoth needn’t be uncouth”, ACM SIGSAM Bulletin, Vol. 34 No. 2, pp. 58–65.
Cuypers H, Cohen AM, Knopper JW, Verrijzer R and Spanbroek M (2008), “MathDox, a system for interactive mathematics”, in Luca J and Weippl ER (Eds), Proceedings of EdMedia: World Conference on Educational Media and Technology 2008, Association for the Advancement of Computing in Education (AACE), Vienna, pp. 5177–5182.
Davenport JH (2010), “The challenges of multivalued ‘functions’”, in Autexier S, Calmet J, Delahaye D, Ion PDF, Rideau L, Rioboo R and Sexton AP (Eds), Intelligent Computer Mathematics, 10th International Conference, AISC 2010, 17th Symposium, Calculemus 2010, and 9th International Conference, MKM 2010, Vol. 6167, Lecture Notes in Computer Science, Springer, Paris, July 5–10, pp. 1–12.
DLMF (2019), NIST Digital Library of Mathematical Functions. Release 1.0.22 of 2019-03-15. Olver FWJ, Olde Daalhuis AB, Lozier DW, Schneider BI, Boisvert RF, Clark CW, Miller BR and Saunders BV (Eds), available at: http://dlmf.nist.gov/
Drake D (2009), Sagetex, available at: https://ctan.org/tex-archive/macros/latex/contrib/sagetex/ (accessed April 1, 2019).
England M, Cheb-Terrab ES, Bradford RJ, Davenport JH and Wilson DJ (2014), “Branch cuts in Maple 17”, ACM Communications in Computer Algebra, Vol. 48 Nos 1/2, pp. 24–27.
Giceva J, Lange C and Rabe F (2009), “Integrating web services into active mathematical documents”, in Carette J, Dixon L, Coen CS and Watt SM (Eds), Intelligent Computer Mathematics, 16th Symposium, Calculemus 2009, 8th International Conference, MKM 2009, Held as Part of CICM 2009, Vol. 5625, Lecture Notes in Computer Science, Springer, Grand Bend, July 6–12, pp. 279–293.
Hamblin CL (1962), “Translation to and from Polish Notation”, The Computer Journal, Vol. 5 No. 3, pp. 210–213.
Knuth DE (1997), The Art of Computer Programming, Volume I: Fundamental Algorithms, 3rd ed., Addison-Wesley, Boston, MA.
Knuth DE (1998), Digital Typography, Reissue, Lecture Notes (Book 78), Center for the Study of Language and Information (CSLI), Stanford, CA, p. 685.
Kohlhase M (2006), OMDoc – An Open Markup Format for Mathematical Documents [Version 1.2], Lecture Notes in Computer Science, Springer, Heidelberg.
Kohlhase M (2008), “Using LaTeX as a semantic markup format”, Mathematics in Computer Science, Vol. 2 No. 2, pp. 279–304.
Kohlhase M, Corneli J, David C, Ginev D, Jucovschi C, Kohlhase A, Lange C, Matican B, Mirea S and Zholudev V (2011), “The planetary system: web 3.0 & active documents for STEM”, in Sato M, Matsuoka S, Sloot PMA, van Albada GD and Dongarra J (Eds), Proceedings of the International Conference on Computational Science, ICCS 2011, Nanyang Technological University, Singapore, Vol. 4, Procedia Computer Science, Elsevier, Singapore, June 1–3, pp. 598–607.
Miller BR (2004), “LaTeXML: A LaTeX to XML/HTML/MathML converter”, available at: http://dlmf.nist.gov/LaTeXML/ (accessed April 1, 2019).
Miller BR and Youssef A (2003), “Technical aspects of the digital library of mathematical functions”, Annals of Mathematics and Artificial Intelligence, Vol. 38 Nos 1–3, pp. 121–136.
Olver FW, Lozier DW, Boisvert RF and Clark CW (2010), NIST Handbook of Mathematical Functions, 1st ed., Cambridge University Press, New York, NY.
Schubotz M, Greiner-Petter A, Scharpf P, Meuschke N, Cohl HS and Gipp B (2018), “Improving the representation and conversion of mathematical formulae by considering their textual context”, in Chen J, Goncalves MA, Allen JM, Fox EA, Kan M and Petras V (Eds), Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries, JCDL 2018, ACM, Fort Worth, TX, June 3–7, pp. 233–242.
Youssef A (2017), “Part-of-math tagging and applications”, in Geuvers H, England M, Hasan O, Rabe F and Teschke O (Eds), Intelligent Computer Mathematics – 10th International Conference, CICM 2017, Vol. 10383, Lecture Notes in Computer Science, Springer, Edinburgh, July 17–21, pp. 356–374.
