In this example we use the idea of a recursive chain to parse an arithmetic expression by what is called 'recursive descent'. This gives us another way to evaluate arithmetic expressions, different from the infix-to-postfix conversion that we studied earlier. To start us off, we need three definitions, of expression, term, and factor:

An EXPRESSION is a term followed by a plus sign followed by an expression, or a term alone.
A TERM is a factor followed by an asterisk followed by a term, or a factor alone.
A FACTOR is either a letter or an expression enclosed in parentheses.

Note that none of the above three items is defined directly in terms (no pun intended) of itself. However, each is defined in terms of itself indirectly. An expression is defined in terms (sic!) of a term, a term in terms of a factor, and a factor in terms of an expression. Similarly, a factor is defined in terms of an expression, which is defined in terms of a term, which is defined in terms of a factor. Thus the entire set of definitions forms a recursive chain. (A rose is a rose is a rose!) However, the alternative that a factor can be a single letter is what saves us, because it stops the recursion. (This is like the basis step in an induction proof.) It is the 'deus ex machina'.

Let us now give some examples. The simplest form of a factor is a letter. Thus A, B, C, Q, Z, and M are all factors. They are also terms, since a term may be a factor alone. They are also expressions, since an expression may be a term alone. Since A is an expression, (A) is a factor and therefore a term as well as an expression. A+B is an example of an expression which is neither a term nor a factor. (A+B), however, is all three. A*B is a term and therefore an expression, but it is not a factor. A*B+C is an expression which is neither a term nor a factor. A*(B+C) is a term and an expression but not a factor. Each of the above examples is a valid expression. This can be shown by applying the definition of an expression to each of them.
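The classifications above can be checked mechanically. The following is a minimal sketch, not the program developed in these notes: the class name GrammarCheck and the method classify are hypothetical names chosen for illustration, and each helper returns the index just past the construct it recognizes (or -1 if there is none) instead of keeping a shared position.

```java
class GrammarCheck {
    // Each method returns the index just past the construct that starts
    // at position i, or -1 if no such construct starts there.

    // expression: term '+' expression | term
    static int expression(String s, int i) {
        int end = term(s, i);
        if (end < 0) return -1;
        if (end < s.length() && s.charAt(end) == '+') return expression(s, end + 1);
        return end;
    }

    // term: factor '*' term | factor
    static int term(String s, int i) {
        int end = factor(s, i);
        if (end < 0) return -1;
        if (end < s.length() && s.charAt(end) == '*') return term(s, end + 1);
        return end;
    }

    // factor: letter | '(' expression ')'
    static int factor(String s, int i) {
        if (i >= s.length()) return -1;
        char c = s.charAt(i);
        if (c >= 'A' && c <= 'Z') return i + 1;
        if (c != '(') return -1;
        int end = expression(s, i + 1);
        if (end < 0 || end >= s.length() || s.charAt(end) != ')') return -1;
        return end + 1;
    }

    // Lists which of the three definitions describe the WHOLE string s.
    static String classify(String s) {
        StringBuilder kinds = new StringBuilder();
        if (factor(s, 0) == s.length()) kinds.append("factor ");
        if (term(s, 0) == s.length()) kinds.append("term ");
        if (expression(s, 0) == s.length()) kinds.append("expression ");
        return kinds.length() == 0 ? "none" : kinds.toString().trim();
    }

    public static void main(String[] args) {
        String[] samples = {"A", "(A)", "A+B", "(A+B)", "A*B", "A*B+C", "A*(B+C)"};
        for (String s : samples) {
            System.out.println(s + " : " + classify(s));
        }
    }
}
```

A string counts as a factor, term, or expression here only if the corresponding helper consumes all of it, which is why A+B is reported as an expression only while (A+B) is reported as all three.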
Consider, however, the string A+*B. It is not an expression, a term, or a factor. It would be instructive for you to attempt to apply the definitions of expression, term, and factor to see that none of them describes the string A+*B. Similarly, (A+B*)C is not a valid expression according to the above definitions. However, A+B+C is a valid expression. This is because we said an expression is a term plus an expression. If we had stated that an expression is a term plus a term, then A+B+C would not be a valid expression. Why?

If we give the definitions of expression, term, and factor in Backus-Naur Form (BNF), we get:

    <expression> ::= <term> + <expression> | <term>
    <term>       ::= <factor> * <term> | <factor>
    <factor>     ::= <letter> | ( <expression> )
    <letter>     ::= A | B | C | D | E | F | G | H ... | X | Y | Z

Let us now write a program which reads in a character string and then prints VALID if it is a valid expression and INVALID if it is not. We will use three methods, expr, term, and factor, to recognize expressions, terms, and factors, respectively. We will use an auxiliary method 'getsym' which operates on the variable 'string', the input character String. 'getsym' returns the next character. If we are at the end of the string, it returns the null character, which is written in Java as '\0'. We also have an auxiliary method 'trim' which removes all blanks from the given expression.

    import java.util.Scanner;

    public class Evaluate {
        private String string;  /* infix string to be checked */
        private int i;          /* index of the next unread character */

        public static void main(String[] args) {
            Scanner scan = new Scanner(System.in);
            Evaluate example = new Evaluate();
            String expression;
            System.out.println("Give an arithmetic expression whose operands are only one letter,");
            System.out.println("and whose operators are only '+' or '*', e.g., A+B*C+(D+C) :");
            do {
                System.out.print("Expression : ");
                expression = scan.nextLine();
                if (example.evaluate(expression))
                    System.out.println("the string is VALID.");
                else
                    System.out.println("the string is INVALID.");
                System.out.print("Do you want to continue? (Y/N) ");
                expression = scan.nextLine();
                /* an empty answer counts as "no" (charAt(0) would fail on it) */
            } while (!expression.isEmpty()
                     && (expression.charAt(0) == 'Y' || expression.charAt(0) == 'y'));
        }

        public boolean evaluate(String next) {
            string = trim(next).toUpperCase();  // remove spaces, accept lower case
            i = 0;
            /* valid only if an expression was found AND it used up the whole string */
            return expr() && string.length() == i;
        }

        private boolean expr() {
            if (!term())            /* no expression exists */
                return false;
            char c = getsym();      /* look at the next symbol */
            if (c != '+') {
                /* We have found the longest expression (a single term).
                   Reposition i so that it refers to the position
                   immediately after the expression. */
                if (c != '\0')      /* nothing to undo at end of string */
                    i--;
                return true;
            }
            /* At this point, we have found a term and a plus sign.
               We must look for another expression. */
            return expr();
        }

        private boolean term() {
            if (!factor())
                return false;
            char c = getsym();
            if (c != '*') {
                /* We have found the longest term (a single factor).
                   Reposition i so that it refers to the position
                   immediately after the term. */
                if (c != '\0')
                    i--;
                return true;
            }
            return term();
        }

        private boolean factor() {
            String alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
            char c = getsym();
            if (c != '(') {
                /* Check for a letter. indexOf returns -1 if c does not occur
                   in alpha, else the index of its first occurrence. */
                return alpha.indexOf(c) != -1;
            }
            /* The factor is a parenthesized expression. */
            if (!expr())
                return false;
            return getsym() == ')';
        }

        private char getsym() {
            if (i != string.length()) {
                i++;
                return string.charAt(i - 1);
            }
            return '\0';            // reached end of input string
        }

        private String trim(String next) {
            String x = "";
            for (int k = 0; k < next.length(); ++k)
                if (next.charAt(k) != ' ')
                    x += next.charAt(k);
            return x;
        }
    } // end of the class definition of Evaluate

The code above determines whether a given arithmetic expression is valid or not. Note that we can easily extend it to subtraction (-) and division (/) by adding these symbols to the tests in expr and term above. E.g., in term, replace the condition (c != '*') by (c != '*' && c != '/'). Question: How would you handle exponentiation?

However, just knowing whether an arithmetic expression is valid or not does not help us write a program for the machine to evaluate it. I.e., not only do we want to know if the expression is valid, but we also want a means for the program to evaluate it and return a numerical value. This can be done rather easily by a slight modification to the above program. To show this, we must decide how a machine will evaluate an arithmetic expression.

To simplify life, we will assume that we have a machine with one general-purpose register, the accumulator. This was the design used by von Neumann in his IAS machine, built in the late 1940's at Princeton (the first "von Neumann machine"). Today's machines, of course, have many general-purpose registers, although on the Intel x86 machines the A register still stands for the accumulator. So assume that we have a one-address machine. That is to say, a machine instruction has at most one operand.
So typical instructions would be

    ADD X
    STO Y
    LOD Z

where ADD X means add the contents of memory address X to the accumulator and leave the result in the accumulator. Similarly, STO Y means store the contents of the accumulator into memory address Y, and LOD Z means load the contents of memory address Z into the accumulator. So if we write the contents of memory address X as Mem[X] (consider memory as a one-dimensional array), we have:

    ADD X    accumulator <-- accumulator + Mem[X]
    STO Y    Mem[Y] <-- accumulator
    LOD Z    accumulator <-- Mem[Z]

The code below also emits MUL X, which we take to multiply in the same way that ADD X adds: accumulator <-- accumulator * Mem[X]. With these instructions, the following modification to the above expr, term, and factor generates machine code:

    public class Compile {
        private String string;  /* infix string to be compiled */
        private int i, j;       /* i: next unread character; j: next temporary number */

        public boolean evaluate(String next) {
            string = trim(next).toUpperCase();
            i = j = 0;
            String ok = expr();
            return !ok.equals("false") && string.length() == i;
        }

        /* Each of expr, term, and factor returns the name of the operand
           that holds its value, or the String "false" on failure. */
        private String expr() {
            String firstFactor = term();
            if (firstFactor.equals("false"))    /* no expression exists */
                return "false";
            char c = getsym();                  /* look at the next symbol */
            if (c != '+') {
                /* We have found the longest expression (a single term).
                   Reposition i so that it refers to the position
                   immediately after the expression. */
                if (c != '\0')
                    i--;
                return firstFactor;
            }
            /* At this point, we have found a term and a plus sign.
               We must look for another expression. */
            String ok = expr();
            if (ok.equals("false"))
                return "false";
            System.out.println("LOD " + firstFactor);
            System.out.println("ADD " + ok);
            System.out.println("STO T" + j);
            j = j + 1;
            return "T" + (j - 1);
        }

        private String term() {
            String firstFactor = factor();
            if (firstFactor.equals("false"))
                return "false";
            char c = getsym();
            if (c != '*') {
                /* We have found the longest term (a single factor).
                   Reposition i so that it refers to the position
                   immediately after the term. */
                if (c != '\0')
                    i--;
                return firstFactor;
            }
            String ok = term();
            if (ok.equals("false"))
                return "false";
            System.out.println("LOD " + firstFactor);
            System.out.println("MUL " + ok);
            System.out.println("STO T" + j);
            j = j + 1;
            return "T" + (j - 1);
        }

        private String factor() {
            String alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
            char c = getsym();
            if (c != '(') {
                /* Check for a letter. indexOf returns -1 if c does not occur
                   in alpha, else the index of its first occurrence. */
                if (alpha.indexOf(c) == -1)
                    return "false";
                return c + "";
            }
            /* The factor is a parenthesized expression. */
            String ok = expr();
            if (ok.equals("false"))
                return "false";
            if (getsym() != ')')
                return "false";
            /* Return the operand produced by the inner expression. Returning
               "T"+(j-1) here would be wrong for an input such as (A), where
               no temporary has been generated and the operand is simply A. */
            return ok;
        }

        private char getsym() {
            if (i != string.length()) {
                i++;
                return string.charAt(i - 1);
            }
            return '\0';            // reached end of input string
        }

        private String trim(String next) {
            String x = "";
            for (int k = 0; k < next.length(); ++k)
                if (next.charAt(k) != ' ')
                    x += next.charAt(k);
            return x;
        }
    }

What we have done is change the three methods (expr, term, and factor) so that each returns a String instead of a boolean. Rather than returning the boolean true if everything is satisfactory, each method returns the name of the operand that holds its value, and it returns the String "false" if something is wrong. (Note that the operands A, B, etc., would really be replaced by the compiler with their memory addresses.) Since the CPU has only one place to hold a value (the accumulator), we must generate temporary storage addresses for storing intermediate values in the calculation. We call them T0, T1, etc. We do not try to optimize the generated code, so it is not very efficient machine code. We just want to show how it can be done.
For example, the input A + B generates

    LOD A
    ADD B
    STO T0

but the input A + B + C generates:

    LOD B
    ADD C
    STO T0
    LOD A
    ADD T0
    STO T1

while a more efficient machine code would be:

    LOD B
    ADD C
    ADD A
    STO T0

Even worse is A+B*C+D:

    LOD B
    MUL C
    STO T0
    LOD T0
    ADD D
    STO T1
    LOD A
    ADD T1
    STO T2

However, it does generate correct machine code to evaluate the expression.
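One way to convince ourselves that the generated code is correct is to run it on a simulated one-address machine. The following is a minimal sketch under stated assumptions: the class OneAddressMachine and its run method are hypothetical names, memory is modelled as a map from operand names to integers, and MUL is assumed to multiply the accumulator by the named memory cell.

```java
import java.util.HashMap;
import java.util.Map;

class OneAddressMachine {
    // Executes a program given as one instruction per line and returns the
    // final contents of the accumulator. Memory cells are looked up by name;
    // reading a cell that was never stored raises a NullPointerException.
    static int run(String program, Map<String, Integer> mem) {
        int acc = 0;
        for (String line : program.trim().split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            String op = parts[0];
            String addr = parts[1];
            switch (op) {
                case "LOD": acc = mem.get(addr); break;   // accumulator <-- Mem[addr]
                case "ADD": acc += mem.get(addr); break;  // accumulator <-- accumulator + Mem[addr]
                case "MUL": acc *= mem.get(addr); break;  // assumed: accumulator <-- accumulator * Mem[addr]
                case "STO": mem.put(addr, acc); break;    // Mem[addr] <-- accumulator
                default: throw new IllegalArgumentException("unknown instruction: " + op);
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        Map<String, Integer> mem = new HashMap<>();
        mem.put("A", 2);
        mem.put("B", 3);
        mem.put("C", 4);
        mem.put("D", 5);
        // The code generated above for A+B*C+D.
        String program = String.join("\n",
            "LOD B", "MUL C", "STO T0",
            "LOD T0", "ADD D", "STO T1",
            "LOD A", "ADD T1", "STO T2");
        System.out.println(run(program, mem));  // 2 + 3*4 + 5 = 19
    }
}
```

With A=2, B=3, C=4, D=5 the program leaves 19 in the accumulator and in T2, which agrees with 2 + 3*4 + 5. Running the six-instruction and the four-instruction versions of A + B + C on the same memory gives the same value, so the shorter sequence saves only time and temporaries.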