Excursion to Recursion

The goal of the article is to introduce several procedures performing parsing and evaluation of complex arithmetic expressions with variables, parentheses and functions – a powerful addition to a rather limited procedure Val in Borland Pascal/Delphi. As an example, an advanced calculator component is presented, allowing a user to deal with sets of formulas and series’.

1. Introduction

Since the first versions of Borland Pascal and now in Delphi, it was only a procedure Val which performed parsing and evaluation of a string – a simple numeric value. We are going to consider procedures capable to parse and evaluate a formula, or a chain of formulas represented either with a string or a string list. Thus, the users of your application will be able to input various formulas and compute them within your application. This computing may be performed either directly in interpretive way (and rather slow), or by means of “compiling” the formulas into the so called postfix representation, and then using it for massive computing with the speed compatible to that as if these formulas were hard-coded at design time and compiled. Naturally it gives us an opportunity to analyze the usefulness and elegance of the recursion techniques for parsing arithmetic expressions and evaluating their postfix representations. Finally a new component – advanced calculator – will be introduced so that it will allow the users to do some math while running your application with a tool much more appropriate than the calculator usually coming with Windows.

2. Interpretive Evaluation

A procedure or function is recursive, if it calls itself within the body of its own declaration either directly or through other procedures – a feature allowed in Pascal and several other high level languages. A trivial example found everywhere is function n! :

if n=1 then Factorial := 1 else Factorial := n*Factorial(n-1); but usually it is recommended to implement this function via an iterative loop as a more efficient solution. The recursive form here is good only in that it exactly matches the definition of Factorial, but in practice each function call requires a certain overhead, and implementation of multiple function calls works less efficiently than a simple iterative loop in this case. Nevertheless,  there are problems whose mathematical models are recursive, and recursive algorithms for them are not only the most natural, but also the preferred method of solution. One of them is an evaluation of an arbitrary arithmetic expression with parentheses, functions and variables in the standard mathematical notation (+, -, *, / and ^ for power). Actually we are going to deal with lists of such expressions, where each line may be either any arithmetic expression, or an equation in the form

Variable = Expression

where the Expression is an arithmetic expression containing either numbers or variables defined earlier in the lines above. This is the so called Linear Elaboration of Declarations (LED) in ADA terminology – the chain of formulas, where each variable first has to appear in the left hand part of an equation, before it is used in the right hand parts below. For example, the following lines

13.1 + 12.3

vol = 300

s = 13

height = vol/s

a = 10*(height - 2)

r = sqrt((a - 2)^2 + a^2)

arcsin(a/r)

e=2.718281828

x=1

y=2

e^(-x^2-y^2)

are such a list. It may be represented by a variable of type TStrings, say via a property Lines of TMemo. We are going to define a pair of overloaded functions

function Evaluate(const ValStrings : TStrings) : boolean; overload;

function Evaluate(const str : string; out res : extended;

const ValStrings : TStrings = nil) : boolean; overload; {optional list of declarations}

to cover the following three situations:

1. The input is purely in TStrings type and it represents any list of expressions obeying to the principle of LED – the first version of the function;

2. The input is in a simple string str (with only a numeric expression in this case), and result is in a variable res (default parameter ValStrings is omitted) – the second version of the function;

3. Combination of the cases 1 and 2 (parameter ValStrings contains variables and equations) – also the second version.

In all three cases a special dynamic array ResultArr : array of extended  is associated with the lines of ValStrings and represents the results of evaluation (if successful) – unit ParsList.pas, folder AdvCalc\Source.

The kernel problem solved by these functions is the parsing and evaluating of an arithmetic expression built according to the syntax of Pascal (and also a caret to denote raising to power, for example  2^2^n, meaning 2^(2^n), or  e^(-x^2-y^2) ). Syntax of expressions in programming languages usually is presented by Backus-Naur Forms (BNF):

<TermOp> ::= + | -

<Expression> ::= <Term> | <TermOp><Term> | <Expression><TermOp><Term>

<FactOp> ::= * | /

<Term> ::= <Factor> | <Term><FactOp><Factor>

<Factor > ::= <PrimaryExpr> | < Factor >^<PrimaryExpr>

<PrimaryExpr> ::= <unsigned number> | <variable> | <function call> | (<Expression>)

The recursive definitions are present in all three components – Expression, Term and Factor, but for the former two it may be easily eliminated. The BNF for Expression just encodes the fact that it is a sequence of Terms delimited by the Term Operation signs (and possibly it can start with these signs), and similarly a Term is a sequence of Factors delimited by the Factor Operation signs. Thus, the key procedure Expression (Fig. 1) uses iterative loops for parsing Terms and Factors, being recursive only because of implementing the definition of the Factor and Primary Expression (the procedure Expression is a modification of that by Jensen and Wirth (1978)). The procedure NextToken scans the WorkStr and returns the next token (in a string variable Tkn), which is either an alpha-numeric string (possibly a number or a variable), or a character (expected to be an operation sign +,-,*,/,^ or parenthesis), or empty.

Any procedure call requires allocation of the stack memory for the parameters, local variables (if any) and for the result (in case of function). To improve the performance, one has to minimize this overhead, thus the functions Expression, Term and Factor are designed without parameters and with minimal number of local variables.

Different exceptions may occur and be raised while parsing. The user can either handle them in his/her try…except clause, or just analyze the Boolean result of the Evaluation after acknowledging the warning message in the ShowMessage box.

The recursion here will never go infinite: it ends as soon as the scanning reaches the end of the WorkString and the string Tkn becomes empty.

This procedure works pretty well, but how fast is it really? As a benchmark, I used a sample where repetitive calculation of equations

x := 1

y := 2                                                                                       (1)

z := 3

f := sqrt(x*x + y*y + z*z)

was run first hard-coded and compiled, and second – via the parsing procedure Evaluate described above. Easily predictable, the parsing works much slower than the hard-coded compiled code: at a 100MHz Pentium it was 1870 ms vs. 2.6 ms, or ~720 times slower. Fortunately, this speed does not really matter as far as it serves only user driven dialogs like that in the Advanced Calculator, where something near 2 ms per formula is still fast enough, but this is not always the case.

Suppose, that the evaluation of the formulas inputted by the user should be performed massively for different values of the arguments rather than just once. For example, this is necessary for drawing a complex 2D or 3D image defined by the given equations. In such a situation the decrease in speed thousand times less than that of the compiled code is absolutely unacceptable. Thus, in the next chapter we will consider an alternative approach.

3. Compiled Evaluation

Processing the user defined equations involves parsing, searching variables and function names in the corresponding lists, and evaluation of ASCII-coded numbers. It is that which causes such a dramatic decrease in speed. Is it possible to do this time consuming processing just once and to encode all operations, their sequence and intermediate binary values into a certain structure so that later an efficient algorithm could evaluate this structure for different arguments with a much greater speed? One of such structures is known as the Postfix notation sequence (Polish notation) – a representation without parenthesis where the operation codes follow their corresponding operands. For example, the postfix notation for a + b*c  is  a b c * + ; for  e-(x^2 +y^2) it is

x 2 ^ y 2 ^ + - exp; for  n!/(m!*(n-m)!) it is n ! m ! n m - ! * /.  Our set of operation now is (+, -, *, /, ^ and ! for factorial).

Thus, the idea is that first we compile the source list of formulas into such structure, and if successful, we will apply an efficient algorithm evaluating these postfix notations. Now we are going to define the corresponding structures and procedures.

The arithmetic expressions are built from operations requiring either two arguments, or just one (like the elementary functions and unary minus). For the sake of simplicity we accept the convention that any operation requires two arguments: if there is an excess argument, it should be always 0. Then, in the case of unary ‘minus’ (or ‘plus’) it works correct ( -x = 0 - x, or in postfix notation: 0 x - ), while for the predefined elementary functions the second argument is just ignored if it’s not necessary.

Our input data – a list of formulas – comes as TStrings, while the output data types representing the postfix notation are given in the Fig. 2: they are dynamic arrays of type object. For each formula (a string with index i of TStrings) there is exactly one real number in dynamic array Parameters[i] and one postfix notation line PostfLines[i]  in object TPostfixList corresponding to this formula. Each element of PostfLines includes a dynamic array of objects TPostfixItems. Each postfix item represents either an operation code op , or an Operand. The latter is represented either by a real value ActualVal (which contains numeric constants from the formulas), or an integer index ParIndex referring to the global dynamic array Parameters (containing values of the already evaluated formulas in the previous lines). The users usually deal only with these two methods of the object TPostfixList: first it is a

function TPostfixList.Compile(const ValStrings : TStrings;

{optional}  PfixLookList : TStrings = nil) : boolean; {just to show postfix notation}

– to compile the source list of formulas into the list of postfix sequences; and then a

function TPostfixList.Evaluate(const GivenParams : array of extended) : extended;

– for massive calculations with this list for different sets of the parameters. For example, suppose that given earlier formulas (1) and default values of x, y, z are in Memo1.Lines. Then with declaration var MyPList : TPostfixList, if MyPList.Compile(Memo1.Lines) is successful, we can call MyPList.Evaluate([5,6,7]), MyPList.Evaluate([8,9]), or to do that in some loop. In other words, we can now override dynamically a certain number of the beginning lines in a list of linearly elaborated declarations. The Project1 in folder Postfix allows the users to enter different formulas, to see the corresponding postfix notations and to evaluate them. The parsing is performed in unit Pars.pas by a procedure (Fig. 3) similar to that of Jensen and Wirth (1978), and the postfix processing code is in the unit Postfix.pas – both are in the folder Postfix.

The key method here is the function Evaluate of object TPostfix, containing an elegant and essentially recursive function GetResult. In general, there are many ways to evaluate a postfix notation sequence: select any triplet of consecutive items

(Operand Operand Operation), substitute it with their result as a new Operand, and so on, until the process ends with just one Operand – the result of the evaluation. But we need such an algorithm, that:

- it does not change the source postfix sequence, leaving it intact for multiple use;

- it does not create new copies of the source sequence (which would be costly).

With these in mind, the recursive function GetResult in TPostfix.Evaluate was designed.

function TPostfix.Evaluate : extended;

var Pos : cardinal; {must point to position next to the desired processing start}

function GetResult : extended;

begin

Dec(Pos); {Pos < 0 may happen only in case of mistakes in postfix sequence}

with Items[Pos] do

if op = nop then result := Operand.Value        {Items[Pos] is an operand}

else result := MyFunctions(GetResult, GetResult){Items[Pos] is an operation}

end;        {Left-to-Right parameter evaluation is essential !}

begin Pos := Count;

result := GetResult

end; {TPostfix.Evaluate}

A non-local variable Pos there is assumed pointing to the position next to one where we want the processing of the postfix sequence actually to start. Every call to GetResult decrements Pos (as a side effect). Depending on what type the Items[Pos] is, GetResult returns either the value of the Operand and immediately ends, or it calls a universal function MyFunction computing the required operation over the two actual parameters – recursive calls to GetResult. The Left-to-Right parameter evaluation here is crucial: otherwise the procedure would not work. (The Left-to-Right evaluation order corresponds to the parameter passing conventions register and pascal, and the default convention in Delphi is register).

First we have to call it with Pos pointing to the position next after the end of a postfix sequence (thus Pos := Count, and the end of a correct postfix sequence must be an operation code). The further actions evolve depending on what items precede the current position in the postfix sequence, building a sophisticated tree of recursive calls. But it is easy to prove that it works correct by the method of mathematical induction.

The shortest (trivial) sequence consists of just one Operand. The next by length is a triplet Operand Operand Operation. It is easy to see that in both cases the GetResult returns a correct result with Pos pointing to the last processed item. This is the basis of the induction. Now, assuming that the GetResult works correct for sequences of length n and less, we have to prove that it works for sequences of the length n+1.

Then, the last item Items[n+1] must be an Operation, therefore at the beginning the GetResult goes to the branch  result := MyFunctions(GetResult, GetResult) with Pos pointing to n. There are two possible cases for the item at the position n, and they are presented in this table:

 Pos 1 … n-1 n n+1 Case 1 … … Operation Operand Operation Case 2 … … … Operation Operation

In the Case 1 the first-parameter call immediately returns the value of the Operand. The second-parameter call, starting with position n-1, must return the correct result by the induction assumption, ‘consuming’ all remaining items.

In the Case 2 the first-parameter call must return the correct result by the induction assumption also, stopping at some position m, 0 < m < n : otherwise the postfix sequence would be wrong. The remaining items from 0 to m-1, again by the induction assumption, must present a correct postfix sequence and be processed successfully by the second-parameter call. This concludes the proof.

To test the efficiency of the compiled evaluation, I used as a benchmark the same formulas (1), and it took 11.6 ms per formula. If the same, but hard-coded formula sqrt(x*x + y*y + z*z) in the format

sqrt(Parameters*Parameters+Parameters*Parameters+Parameters*Parameters)

is substituted into the method TPostfList.Evaluate (instead of the PostfLines[i].Evaluate), it takes 2.6 ms.  So 2.6 vs. 11.6: still our postfix evaluation requires 4.45 times more than the hard-coded version of this formula, but it seems quite reasonable. The recursive function GetResult, although itself without parameters, calls MyFunctions (which has two parameters) and therefore initiates two more recursive calls and corresponding records in the stack. Will it grow exponentially (like in the case of the notorious Ackerman function)? Fortunately, each call decrements the counter of postfix items, and the total number of calls cannot exceed the initial number of items in the postfix sequence – at least for a correct postfix sequence. To enforce this correctness, the method TPostfList.Evaluate prevents from attempts to do that if the field IsCorrect = false in the compiled object.

4. The Features of the Advanced Calculator

Unlike some other software imitations of the old-days (scientific) calculators with only a one-line display and a hidden register, this one (Fig. 4) takes full advantages of the computer screen, implementing the standard mathematical notation for lists of formulas.

While editing the lines in the Input box, each time when you press Enter key or click Evaluation in the menu, the computer performs evaluation of the whole list of lines, displaying the results in the Result box. Conversely, if you have changed the Input, but not pressed Enter or clicked Evaluation, the Results box is cleared.

The advanced calculator is an object of type TForm. You can use it either as a part of your application, adding the required forms and units to your project (the entire content of the folder AdvCalc\Source), or as a separate application (the folder AdvCalc\Exec).  In the former case when adding the forms and the units, the forms HelpForm and AboutBox should not be auto-created (the form CalcForm creates, opens and releases them).

5. Operations over selected items in the calculator

There are three main menu titles (Items, Formula, Series) which apply to a selected set of items in the Input box (usually shown on the blue background in Windows, and in the blue font in this document).

5.1. Calculations with multiple items

To perform calculations listed in the title Items of the menu, first you have to input expressions or equations (one in a line) of a sequence to be processed in the Input box. Then you have to select (by your mouse or shift-arrow keys) the lines and to choose the necessary function in the Items drop down menu. For example, if you select (in blue) this

123 + 456  {not selected lines}

100            {selected lines}

11^2

12^2

k = 13^2

14^2

and click Items/Sum in the menu, it allows to obtain 100 + 11^2 + 12^2 + 13^2 + 14^2 = 730

5.2. Calculations using a Formula

To perform calculations using the same formula for a series of values of a certain variable, say var , first enter the list of the values followed by the formula, select them all together and click Formula in the menu. For example, if you select this

123 + 456                                 123 + 456

var = 1                                     var = 1

2                           or                2

3                                              3

4                                              4

f = var^3                                  var^3

and click Formula in the menu, it allows to obtain function var^3  for  var = 1, 2, 3, 4 .

5.3. Calculations of (almost) infinite sums and products

In opposite to the described above, where operations were made over just as many items as were selected, the main menu option Series allows to compute sums or products of an arbitrary number of terms – expressions depending on an index number, say n, while the selected lines are just three. These lines specify the boundaries of summing (multiplying) and the expression involved:

n = LowerBoundary

HighBoundary

Expression {to be summed, depending on n and probably other variables}

For example, if you select this (shown in blue)

a = 0.123                                  a = 0.123

a + 456                                    a + 456

n = 1                     or                n = 1

10000                                       10000

f = a*n^3                                  a*n^3

and click Series/Sum in the menu, it allows to obtain Sa*n3  for  n = 1..10000.

6. Conclusion

We considered the problem of parsing and evaluating of arithmetic expressions for lists of linear elaborated declarations, and found out that recursive procedures here provide solutions both elegant and efficient. Also we introduced the structures – a hierarchy of objects – for representing and processing postfix notation sequences. Incidentally, these structures are of type object (rather than class), proving that the “old style” objects still may deliver the simplest solution for certain problems. (In this particular case only encapsulation feature of the OOP was used, because inheritance and polymorphism were not required here). Finally, the attached samples of useful software may sweeten these a little bit dry theoretical considerations.

References

K. Jensen, N. Wirth (1978). PASCAL. User Manual and Report

Alexander Gofen