**Excursion to Recursion**

The
goal of the article is to introduce several procedures performing parsing and
evaluation of complex arithmetic expressions with variables, parentheses and
functions – a powerful addition to a rather limited procedure *Val*
in Borland Pascal/Delphi. As an example, an advanced calculator component is
presented, allowing a user to deal with sets of formulas and series’.

**1. Introduction **

** **

** **Since the first versions of Borland Pascal and now in Delphi, it was
only a procedure *Val* which performed parsing and evaluation of a string
– a simple numeric value. We are going to consider procedures capable to
parse and evaluate a formula, or a chain of formulas represented either with a
string or a string list. Thus, the users of your application will be able to
input various formulas and compute them within your application. This computing
may be performed either directly in interpretive way (and rather slow), or by
means of “compiling” the formulas into the so called postfix
representation, and then using it for massive computing with the speed
compatible to that as if these formulas were hard-coded at design time and
compiled. Naturally it gives us an opportunity to analyze the usefulness and
elegance of the recursion techniques for parsing arithmetic expressions and
evaluating their postfix representations. Finally a new component –
advanced calculator – will be introduced so that it will allow the users
to do some math while running your application with a tool much more
appropriate than the calculator usually coming with Windows. ** **

** **

**2. Interpretive Evaluation**

** **

A
procedure or function is *recursive,* if it calls itself within the body
of its own declaration either directly or through other procedures – a
feature allowed in Pascal and several other high level languages. A trivial
example found everywhere is function *n!* :

**if** *n=1* **then** *Factorial := 1 ***else***
Factorial* := *n*Factorial(n-1)*; but usually it is recommended to
implement this function via an iterative loop as a more efficient solution. The
recursive form here is good only in that it exactly matches the definition of
Factorial, but in practice each function call requires a certain overhead, and
implementation of multiple function calls works less efficiently than a simple
iterative loop in this case. Nevertheless, there are problems whose
mathematical models are recursive, and recursive algorithms for them are not
only the most natural, but also the preferred method of solution. One of them
is an evaluation of an arbitrary arithmetic expression with parentheses, functions
and variables in the standard mathematical notation (+, -, *, / and ^ for
power). Actually we are going to deal with lists of such expressions, where
each line may be either any arithmetic expression, or an equation in the form

*Variable = Expression*

where the *Expression*
is an arithmetic expression containing either numbers or variables defined
earlier in the lines above. This is the so called *Linear Elaboration of
Declarations* (LED) in ADA terminology – the chain of formulas, where each
variable first has to appear in the left hand part of an equation, before it is
used in the right hand parts below. For example, the following lines

*13.1 + 12.3*

*vol = 300*

*s = 13*

*height = vol/s*

*a = 10*(height - 2)*

*r = sqrt((a - 2)^2 + a^2)*

*arcsin(a/r)*

*e=2.718281828*

*x=1*

*y=2*

*e^(-x^2-y^2)*

are such a list. It may be
represented by a variable of type *TStrings*, say via a property *Lines*
of *TMemo*. We are going to define a pair of overloaded functions

**function** Evaluate(**const** ValStrings : TStrings) : **boolean;
overload**;

**function** Evaluate(**const** str : string; **out** res :
extended;

**const** ValStrings
: TStrings = **nil**) : **boolean; overload;** *{optional list of declarations}*

* *

to cover the following three situations:

1. The input is purely in TStrings type and it represents any list of expressions obeying to the principle of LED – the first version of the function;

2. The input is in
a simple string *str* (with only a numeric expression in this case), and
result is in a variable *res *(default parameter ValStrings is omitted)
– the second version of the function;

3. Combination of
the cases 1 and 2 (parameter *ValStrings *contains variables and
equations) – also the second version.

In all three cases a special
dynamic array *ResultArr* :** array of** *extended *is associated
with the lines of *ValStrings* and represents the results of evaluation
(if successful) – unit ParsList.pas, folder AdvCalc\Source.

The
kernel problem solved by these functions is the parsing and evaluating of an
arithmetic expression built according to the syntax of Pascal (and also a caret
to denote raising to power, for example *2^2^n, *meaning *2^(2^n)*,
or *e^(-x^2-y^2)* ). Syntax of expressions in programming languages
usually is presented by Backus-Naur Forms (BNF):

<TermOp> ::= **+** |
**-**

<Expression> ::= <Term> | <TermOp><Term> | <Expression><TermOp><Term>

<FactOp> ::= ***** |
**/**

<Term> ::= <Factor> | <Term><FactOp><Factor>

<Factor > ::= <PrimaryExpr>
| < Factor >**^**<PrimaryExpr>

<PrimaryExpr> ::=
<unsigned number> | <variable> | <function call> | **(**<Expression>**)**

** **

The recursive definitions are
present in all three components – *Expression, Term* and *Factor*,
but for the former two it may be easily eliminated. The BNF for *Expression*
just encodes the fact that it is a sequence of *Terms* delimited by the *Term
Operation* signs (and possibly it can start with these signs), and similarly
a *Term* is a sequence of *Factors* delimited by the *Factor
Operation* signs. Thus, the key procedure *Expression* (Fig. 1) uses
iterative loops for parsing *Terms* and *Factors*, being recursive
only because of implementing the definition of the* Factor* and *Primary
Expression* (the procedure *Expression* is a modification of that by
Jensen and Wirth (1978)). The procedure *NextToken* scans the *WorkStr*
and returns the next token (in a string variable *Tkn*), which is either
an alpha-numeric string (possibly a number or a variable), or a character
(expected to be an operation sign +,-,*,/,^ or parenthesis), or empty.

Any **procedure**
call requires allocation of the stack memory for the parameters, local
variables (if any) and for the result (in case of **function**). To improve
the performance, one has to minimize this overhead, thus the functions *Expression,
Term* and *Factor* are designed without parameters and with minimal
number of local variables.

Different
exceptions may occur and be raised while parsing. The user can either handle
them in his/her **try…except** clause, or just analyze the Boolean
result of the *Evaluation* after acknowledging the warning message in the *ShowMessage*
box.

The recursion here
will never go infinite: it ends as soon as the scanning reaches the end of the *WorkString*
and the string *Tkn* becomes empty.

This procedure works pretty well, but how fast is it really? As a benchmark, I used a sample where repetitive calculation of equations

*x
:= 1*

*y
:= 2
*(1)

*z
:= 3*

*f
:= sqrt(x*x + y*y + z*z)*

was run first hard-coded and
compiled, and second – via the parsing procedure *Evaluate*
described above. Easily predictable, the parsing works much slower than the
hard-coded compiled code: at a 100MHz Pentium it was 1870 ms vs. 2.6 ms, or ~720 times slower. Fortunately, this speed does
not really matter as far as it serves only user driven dialogs like that in the
Advanced Calculator, where something near 2 ms per formula is still fast
enough, but this is not always the case.

Suppose, that the evaluation of the formulas inputted by the user should be performed massively for different values of the arguments rather than just once. For example, this is necessary for drawing a complex 2D or 3D image defined by the given equations. In such a situation the decrease in speed thousand times less than that of the compiled code is absolutely unacceptable. Thus, in the next chapter we will consider an alternative approach.

**3. Compiled Evaluation **

** **

Processing the user
defined equations involves parsing, searching variables and function names in
the corresponding lists, and evaluation of ASCII-coded numbers. It is that
which causes such a dramatic decrease in speed. Is it possible to do this time
consuming processing just once and to encode all operations, their sequence and
intermediate binary values into a certain structure so that later an efficient
algorithm could evaluate this structure for different arguments with a much
greater speed? One of such structures is known as the Postfix notation sequence
(Polish notation) – a representation without parenthesis where the
operation codes follow their corresponding operands. For example, the postfix
notation for *a + b*c* is *a b c * + ; *for *e ^{-(x^2 +y^2)}*
it is

*x 2 ^ y 2 ^ + - exp; *for *n!/(m!*(n-m)!) *it is *n ! m ! n m - ! *
/. *Our set of operation now is (+, -, *, /, ^ and ! for factorial).

* *

Thus, the idea is that first we compile the source list of formulas into such structure, and if successful, we will apply an efficient algorithm evaluating these postfix notations. Now we are going to define the corresponding structures and procedures.

The
arithmetic expressions are built from operations requiring either two
arguments, or just one (like the elementary functions and unary minus). For the
sake of simplicity we accept the convention that *any* operation requires
two arguments: if there is an excess argument, it should be always 0. Then, in
the case of unary ‘minus’ (or ‘plus’) it works correct (
*-x = 0 - x, *or in postfix notation: *0 x -* ), while for the
predefined elementary functions the second argument is just ignored if
it’s not necessary.

Our
input data – a list of formulas – comes as *TStrings*, while
the output data types representing the postfix notation are given in the Fig.
2: they are dynamic arrays of type **object**. For each formula (a string
with index *i* of *TStrings*) there is exactly one real number in
dynamic array *Parameters[i]* and one postfix notation line *PostfLines[i]*
in **object** *TPostfixList* corresponding to this formula. Each
element of *PostfLines* includes a dynamic array of objects *TPostfixItems*.
Each postfix item represents either an operation code *op* , or an *Operand*.
The latter is represented either by a real value *ActualVal* (which
contains numeric constants from the formulas), or an integer index *ParIndex *referring
to the global dynamic array *Parameters* (containing values of the already
evaluated formulas in the previous lines). The users usually deal only with
these two methods of the object *TPostfixList*: first it is a

**function** TPostfixList.Compile(**const** ValStrings : TStrings;

* {optional}* PfixLookList : TStrings = **nil**) : boolean; *{just to show postfix notation}*

* *

– to compile the source list of formulas into the list of postfix sequences; and then a

**function** TPostfixList.Evaluate(**const** GivenParams : **array
of** extended) : extended;

– for massive
calculations with this list for different sets of the parameters. For example,
suppose that given earlier formulas (1) and default values of *x, y, z*
are in Memo1.Lines. Then with declaration **var** *MyPList : TPostfixList*,
if *MyPList.Compile(Memo1.Lines)* is successful, we can call *MyPList.Evaluate([5,6,7]),
MyPList.Evaluate([8,9]),* or to do that in some loop. In other words, we can
now override dynamically a certain number of the beginning lines in a list of
linearly elaborated declarations. The Project1 in folder Postfix allows the users
to enter different formulas, to see the corresponding postfix notations and to
evaluate them. The parsing is performed in unit Pars.pas by a procedure (Fig.
3) similar to that of Jensen and Wirth (1978), and the postfix processing code
is in the unit Postfix.pas – both are in the folder Postfix.

The
key method here is the function *Evaluate* of object *TPostfix*,
containing an elegant and essentially recursive function *GetResult*. In
general, there are many ways to evaluate a postfix notation sequence: select
any triplet of consecutive items

(*Operand Operand Operation*),
substitute it with their result as a new *Operand*, and so on, until the
process ends with just one *Operand* – the result of the evaluation.
But we need such an algorithm, that:

- it does not change the source postfix sequence, leaving it intact for multiple use;

- it does not create new copies of the source sequence (which would be costly).

With
these in mind, the recursive function *GetResult* in *TPostfix.Evaluate*
was designed.

**function** TPostfix.Evaluate :
extended;

**var** Pos : cardinal; *{must point to position next to the desired processing
start}*

**function**
GetResult : extended;

**begin**

Dec(Pos);
*{Pos < 0 may happen only in case of mistakes in
postfix sequence}*

**with**
Items[Pos] **do**

**if** op = nop **then** result := Operand.Value *{Items[Pos] is an operand}*

**else** result := MyFunctions(GetResult, GetResult)*{Items[Pos] is an operation}*

**end;**
*{Left-to-Right parameter evaluation is
essential !}*

**begin** Pos := Count;

result := GetResult

**end**; *{TPostfix.Evaluate}*

* *

A
non-local variable *Pos* there is assumed pointing to the position next to
one where we want the processing of the postfix sequence actually to start. Every
call to *GetResult* decrements *Pos* (as a side effect). Depending on
what type the *Items[Pos]* is, *GetResult* returns either the value
of the *Operand* and immediately ends, or it calls a universal function *MyFunction*
computing the required operation over the two actual parameters –
recursive calls to *GetResult*. The *Left-to-Right
parameter evaluation* here is crucial: otherwise the procedure would not
work. (The *Left-to-Right evaluation *order corresponds to the parameter
passing conventions **register **and **pascal**, and the default** **convention
in Delphi is** register**).

First
we have to call it with *Pos* pointing to the position next after the end
of a postfix sequence (thus *Pos := Count,* and the end of a correct
postfix sequence must be an operation code). The further actions evolve
depending on what items precede the current position in the postfix sequence,
building a sophisticated tree of recursive calls. But it is easy to prove that
it works correct by the method of mathematical induction.

The
shortest (trivial) sequence consists of just one *Operand*. The next by
length is a triplet *Operand Operand Operation. *It is easy to see that in
both cases the *GetResult* returns a correct result with *Pos*
pointing to the last processed item. This is the basis of the induction. Now,
assuming that the *GetResult *works correct for sequences of length *n*
and less, we have to prove that it works for sequences of the length *n+1*.

Then,
the last item *Items[n+1]* must be an *Operation*, therefore at the
beginning the *GetResult* goes to the branch result := MyFunctions(GetResult,
GetResult) with *Pos* pointing to *n*.
There are two possible cases for the item at the position *n*, and they
are presented in this table:

* *

Pos |
1 |
… |
n-1 |
n |
n+1 |

Case 1 |
… |
… |
Operation |
Operand |
Operation |

Case 2 |
… |
… |
… |
Operation |
Operation |

In
the Case 1 the first-parameter call immediately returns the value of the *Operand*.
The second-parameter call, starting with position *n-1,* must return the
correct result by the induction assumption, ‘consuming’ all
remaining items.

In
the Case 2 the first-parameter call must return the correct result by the
induction assumption also, stopping at some position *m, 0 < m < n*
: otherwise the postfix sequence would be wrong. The remaining items from *0*
to *m-1,* again by the induction assumption, must present a correct
postfix sequence and be processed successfully by the second-parameter call.
This concludes the proof.

To test the
efficiency of the compiled evaluation, I used as a benchmark the same formulas
(1), and it took 11.6 ms per formula. If the same, but hard-coded formula *sqrt(x*x
+ y*y + z*z) *in the format

*sqrt(Parameters[0]*Parameters[0]+Parameters[1]*Parameters[1]+Parameters[2]*Parameters[2])*

* *

is substituted into the
method *TPostfList.Evaluate* (instead of the *PostfLines[i].Evaluate*),
it takes 2.6 ms. So 2.6 vs. 11.6: still our postfix evaluation
requires 4.45 times more than the hard-coded version of this formula, but it
seems quite reasonable. The recursive function *GetResult*, although
itself without parameters, calls *MyFunctions* (which has two parameters)
and therefore initiates two more recursive calls and corresponding records in
the stack. Will it grow exponentially (like in the case of the notorious
Ackerman function)? Fortunately, each call decrements the counter of postfix
items, and the total number of calls cannot exceed the initial number of items
in the postfix sequence – at least for a __correct__ postfix sequence.
To enforce this correctness, the method *TPostfList.Evaluate* prevents
from attempts to do that if the field *IsCorrect = false* in the compiled
object.

**4. The Features of the Advanced Calculator**

Unlike some other software imitations of the old-days (scientific) calculators with only a one-line display and a hidden register, this one (Fig. 4) takes full advantages of the computer screen, implementing the standard mathematical notation for lists of formulas.

* *

While
editing the lines in the Input box, each time when you press **Enter** key
or click **Evaluation** in the menu, the computer performs evaluation of the
whole list of lines, displaying the results in the Result box. Conversely, if
you have changed the Input, but not pressed **Enter** or clicked **Evaluation**,
the Results box is cleared.

The advanced calculator is an object of type TForm. You can use it either as a part of your application, adding the required forms and units to your project (the entire content of the folder AdvCalc\Source), or as a separate application (the folder AdvCalc\Exec). In the former case when adding the forms and the units, the forms HelpForm and AboutBox should not be auto-created (the form CalcForm creates, opens and releases them).

**5. Operations over selected items in the calculator**

** **

There are three main menu titles (Items, Formula, Series) which apply to a selected set of items in the Input box (usually shown on the blue background in Windows, and in the blue font in this document).

** **

** **

**5.1. Calculations with multiple items **

** **

To perform calculations
listed in the title** Items **of the menu, first you have to input
expressions or equations (one in a line) of a sequence to be processed in the
Input box. Then you have to select (by your mouse or shift-arrow keys) the
lines and to choose the necessary function in the **Items** drop down menu.
For example, if you select (in blue) this

*123 + 456 {not selected
lines}*

*100 **{selected lines}*

*11^2*

*12^2*

*k = 13^2*

*14^2*

and click **Items/Sum** in
the menu, it allows to obtain *100 + 11^2 + 12^2 + 13^2 + 14^2 = 730*

**5.2. Calculations using a Formula **

** **

To
perform calculations using the same formula for a series of values of a certain
variable, say *var* , first enter the list of the values followed by the
formula, select them all together and click **Formula** in the menu. For
example, if you select this

*123 + 456 123
+ 456*

*var = 1 var
= 1*

*2* or *2*

*3 3*

*4 4 *

*f = var^3 var^3*

and click **Formula** in
the menu, it allows to obtain function* var^3* for *var = 1, 2, 3, 4*
.

** **

**5.3. Calculations of (almost) infinite sums and
products **

** **

In
opposite to the described above, where operations were made over just as many
items as were selected, the main menu option **Series **allows to compute
sums or products of an arbitrary number of terms – expressions depending
on an index number, say *n*, while the selected lines are just three.
These lines specify the boundaries of summing (multiplying) and the expression
involved:

*n = LowerBoundary
*

*HighBoundary*

*Expression** {to be summed, depending on n and probably other
variables}*

For example, if you select this (shown in blue)

*a = 0.123 a
= 0.123*

*a + 456 a
+ 456*

*n = 1 *or* n = 1*

*10000* *10000*

*f = a*n^3 a*n^3*

and click **Series/Sum**
in the menu, it allows to obtain *S**a*n3* for *n =
1..10000.*

* *

**6. Conclusion**

We considered the
problem of parsing and evaluating of arithmetic expressions for lists of linear
elaborated declarations, and found out that recursive procedures here provide
solutions both elegant and efficient. Also we introduced the structures –
a hierarchy of objects – for representing and processing postfix notation
sequences. Incidentally, these structures are of type **object** (rather
than **class**), proving that the “old style” objects still may
deliver the simplest solution for certain problems. (In this particular case
only encapsulation feature of the OOP was used, because inheritance and
polymorphism were not required here). Finally, the attached samples of useful
software may sweeten these a little bit dry theoretical considerations.

**References**

K. Jensen, N. Wirth (1978). PASCAL. User Manual and Report

Alexander Gofen

** **