Reliable Software Logo

C++ In Action: Language

Parser

Private methods.


The Parser requires a major makeover. Let’s start with the header file, parse.h. It contains a lot of forward declarations

class Node;
class Scanner;
class Store;
class FunctionTable;
class SymbolTable;
                    

All these classes are mentioned in parse.h either through pointers or references—there is no need to include the header files that define them.

The class Parser has a constructor that takes references to all the major objects it will need in order to parse the stream of tokens from the Scanner. It needs the Store to create VarNodes, FunctionTable to create FunNodes and the SymbolTable to recognize variables and function names. Once the Parser is created, we can only ask it to evaluate the expression passed to it in the Scanner. The Eval method has the side effect of printing the result on the screen (if you're cringing now, I promise you'll be relieved when you read the next chapter of this book).

Parser has a number of private methods that it uses to parse expressions, terms and factors; and a method to execute the evaluation of the expression tree. Private methods are a useful device for code structuring. They cannot be called from outside of the class—they are just helper functions used in the implementations of other, public or private, methods.

class Parser
{
public:
    Parser (Scanner & scanner,
        Store & store,
        FunctionTable & funTab,
        SymbolTable & symTab);
    ~Parser ();
    Status Eval ();
private:
    void   Parse();
    Node * Expr();
    Node * Term();
    Node * Factor();
    void   Execute ();

    Scanner        & _scanner;
    Node           * _pTree;
    Status           _status;
    Store          & _store;
    FunctionTable  & _funTab;
    SymbolTable    & _symTab;
};
                    

The constructor (we are already looking at the implementation file, parse.cpp) initializes all the private variables.

Parser::Parser (Scanner & scanner, 
        Store & store,
        FunctionTable & funTab,
        SymbolTable & symTab )
: _scanner (scanner), 
  _pTree (0), 
  _status (stOk), 
  _funTab (funTab),
  _store (store),
  _symTab (symTab)
{
}
                    

The destructor deletes the parse tree:

Parser::~Parser ()
{
    delete _pTree;
}
                    

The Eval method calls private methods to parse the input and execute the calculation:

Status Parser::Eval ()
{
    Parse ();
    if (_status == stOk)
        Execute ();
    else
        _status = stQuit;
    return _status;
}
                    

Execute calls the Calc method of the top node of the parse tree and prints the result of the (recursive) calculation

void Parser::Execute ()
{
    if (_pTree)
    {
        double result = _pTree->Calc ();
        cout << "  " << result << endl;
    }
}
                    

The parsing starts with the assumption that whatever it is, it’s an expression.

void Parser::Parse ()
{
    _pTree = Expr ();
}
                    

The expression starts with a term. The term can be followed by one of the operators that bind terms: the plus sign, the minus sign, the assignment sign; or nothing at all. These correspond to the four productions:

Expression is Term + Expression
or Term - Expression
or Term = Expression  // the Term must be an lvalue
or just Term
                    

That’s why, after parsing a term we ask the Scanner for the next token and test it against tPlus, tMinus and tAssign. Immediately after a token is recognized we tell the Scanner to accept it.

Node * Parser::Expr ()
{
    Node * pNode = Term ();
    EToken token = _scanner.Token ();
    if (token == tPlus)
    {
        _scanner.Accept ();
        Node * pRight = Expr ();
        pNode = new AddNode (pNode, pRight);
    }
    else if (token == tMinus)
    {
        _scanner.Accept ();
        Node * pRight = Expr ();
        pNode = new SubNode (pNode, pRight);
    }
    else if (token == tAssign)
    {
        _scanner.Accept ();
        Node * pRight = Expr ();
        if (pNode->IsLvalue ())
        {
            pNode = new AssignNode (pNode, pRight);
        }
        else
        {
            _status = stError;
            delete pNode;
            pNode = Expr ();
        }
    }
    return pNode;
}
                    

We proceed in a similar fashion with the Term, following the productions

Term is Factor * Term
or Factor / Term
or just Factor
                    
Node * Parser::Term ()
{
    Node * pNode = Factor ();
    if (_scanner.Token () == tMult)
    {
        _scanner.Accept ();
        Node * pRight = Term ();
        pNode = new MultNode (pNode, pRight);
    }
    else if (_scanner.Token () == tDivide)
    {
        _scanner.Accept ();
        Node * pRight = Term ();
        pNode = new DivideNode (pNode, pRight);
    }
    return pNode;
}
                    

A Factor, in turn, can be one of these

Factor is ( Expression ) // parenthesized expression
or Number // literal floating point number
or Identifier ( Expression ) // function call
or Identifier // symbolic variable
or - Factor  // unary minus
Node * Parser::Factor ()
{
    Node * pNode;
    EToken token = _scanner.Token ();

    if (token == tLParen)
    {
        _scanner.Accept (); // accept '('
        pNode = Expr ();
        if (_scanner.Token() != tRParen)
            _status = stError;
        _scanner.Accept (); // accept ')'
    }
    else if (token == tNumber)
    {
        pNode = new NumNode (_scanner.Number ());
        _scanner.Accept ();
    }
    else if (token == tIdent)
    {
        char strSymbol [maxSymLen + 1];
        int lenSym = maxSymLen;
        // copy the symbol into strSymbol
        _scanner.SymbolName (strSymbol, lenSym);
        int id = _symTab.Find (strSymbol, lenSym);
        _scanner.Accept ();
        if (_scanner.Token() == tLParen) // function call
        {
            _scanner.Accept (); // accept '('
            pNode = Expr ();
            if (_scanner.Token () == tRParen)
                _scanner.Accept (); // accept ')'
            else
                _status = stError;
            if (id != idNotFound && id < _funTab.Size ())
            {
                pNode = new FunNode (
                    _funTab.GetFun (id), pNode);
            }
            else
            {
                cout << "Unknown function \"";
                cout << strSymbol << "\"\n";
            }
        }
        else
        {
            if (id == idNotFound)
                id = _symTab.ForceAdd (strSymbol, lenSym);
            pNode = new VarNode (id, _store);
        }
    }
    else if (token == tMinus) // unary minus
    {
        _scanner.Accept (); // accept minus
        pNode = new UMinusNode (Factor ());
    }
    else
    {
        _scanner.Accept ();
        _status = stError;
        pNode = 0;
    }
    return pNode;
}

To see how the parser works, let’s single step through a simple example. Suppose that the user typed

x = 1
                    

First, a scanner containing the input string is created. It scans the first token and decides that it's an identifier, tIdent. Next, the parser is called to Eval the expression. It starts parsing it by assuming that it is, indeed, an expression

_pTree = Expr ();
                    

Expression must start with a term, so Expr calls Term

Node * pNode = Term ();
                    

Term, on the other hand, expects to see a Factor

Node * pNode = Factor ();
                    

Finally Factor looks at the first token

EToken token = _scanner.Token ();
                    

and compares it to tLParen, tNumber, tIdent and tMinus--in our case the first token is an identifier. It then ask the scanner for the name of the symbol (it is "x") and searches for it in the symbol table. It's done with the first token so it tells the scanner to accept it.

// copy the symbol into strSymbol
_scanner.SymbolName (strSymbol, lenSym);
int id = _symTab.Find (strSymbol, lenSym);
_scanner.Accept ();
                    

The scanner scans for the next token, which is the assignment operator tAssign. The parser looks at this token to see if it a left parenthesis--that would signify a function call. Since it isn’t, it creates a VarNode using the id of the symbol found in the symbol table. If this is a new symbol for which there wasn’t any id, it just adds it to the symbol table and creates a new id.

if (id == idNotFound)
    id = _symTab.ForceAdd (strSymbol, lenSym);
pNode = new VarNode (id, _store);
                    

Factor is now done and it returns a pointer to the node it has just created. Back to Term. It has a node, so now it looks at the next token to see if it is one of tMult or tDivide. Since it’s not, it returns the same node. Back to Expr. Again, a look at the token: is it tPlus, tMinus or tAssign? It is tAssign! It accepts the token (the scanner positions itself at tNumber whose value is 1). So far, the parser has seen a node followed by an equal sign. An equal sign can be followed by an arbitrary expression, so Expr now calls itself to parse the rest of the input which is "1".

_scanner.Accept ();
Node* pRight = Expr ();
                    

Expr again goes through Term and Factor, which creates a NumNode

pNode = new NumNode (_scanner.Number ());
_scanner.Accept ();
                    

Back to Term and Expr with the new node pRight. The parser has now seen a node, pNode, followed by the equal sign and another node, pRight. It looks very much like an assignment statement, except that we don’t know if pNode represents an lvalue. In our case, the node being a VarNode, it does. An assignment node is created:

if (pNode->IsLvalue ())
{
    pNode = new AssignNode (pNode, pRight);
}
                    

Expr returns this AssignNode to Parse. The parse tree now looks like this (Figure 2-7)

Example

Figure 2-7


Since the parsing was successful, we call Execute, which calls _pTree->Calc () . The virtual Calc method of AssignNode calculates the value of the right node—NumNode returns 1—and call the Assign method of VarNode with this value. VarNode has access to the Store and stores the value of 1 in the slot corresponding to "x".


NextNext: Main