Reliable Software Logo

C++ In Action: Language

Stubbed Implementation

I will follow the top-down implementation strategy. It means that I will run the program first and implement it later. (Actually, I will run it as soon as I create stubs for all the top level classes.)

Let’s start with the Scanner. It is constructed out of a buffer of text (a line of text, to be precise). It keeps a pointer to that buffer and, later, it will be able to scan it left to right and convert it to tokens. For now the constructor of the Scanner stub announces its existence to the world and prints the contents of the buffer.

class Scanner
{
public:
    Scanner (char const * buf);
private:
    char const * const _buf;
};

Scanner::Scanner (char const * buf)
    : _buf (buf)
{
    cout << "Scanner with \"" << buf << "\"" << endl;
}

The SymbolTable stub will be really trivial for now. We only assume that it has a constructor.

class SymbolTable
{
public:
    SymbolTable () {}
};

                      

The Parser will need access to the scanner and to the symbol table. It will parse the tokens retrieved from the Scanner and evaluate the resulting tree. The method Eval is supposed to do that. It should return a status code that would depend on the result of parsing. We combine the three possible statuses into an enumeration. An enum is an integral type that can only take a few predefined values. These values are given symbolic names and are either initialized to concrete values by the programmer or by the compiler. In our case we don’t really care what values correspond to the various statuses, so we leave it to the compiler. Using an enum rather than an int for the return type of Eval has the advantage of stricter type checking. It also prevents us from returning anything other than one of the three values defined by the enum.

enum Status
{
    stOk,
    stQuit,
    stError
};

class Parser
{
public:
    Parser (Scanner & scanner, SymbolTable & symTab);
    ~Parser ();
    Status Eval ();
private:
    Scanner &        _scanner;
    SymbolTable &    _symTab;
};

Parser::Parser (Scanner & scanner, SymbolTable & symTab)
    : _scanner (scanner), _symTab (symTab)
{
    cout << "Parser created\n";
}

Parser::~Parser ()
{
    cout << "Destroying parser\n";
}

Status Parser::Eval ()
{
    cout << "Parser eval\n";
    return stQuit;
}

                      

Finally, the main procedure. Here you can see the top level design of the program in action. The lifetime of the symbol table has to be equal to that of the whole program, since it has to remember the names of all the variables introduced by the user during one session. The scanner and the parser, though, can be created every time a line of text is entered. The parser doesn’t have any state that has to be preserved from one line of text to another. If it encounters a new variable name, it will store it in the symbol table that has a longer life span.

In the main loop of our program a line of text is retrieved from the standard input using the getline method of cin, a scanner is constructed from this line, and a parser is created from this scanner. The Eval method of the parser is then called to parse and evaluate the expression. As long as the status returned by Eval is different from stQuit, the whole process is repeated.

const int maxBuf = 100;

int main ()
{
    char buf [maxBuf];
    Status status;
    SymbolTable symTab;
    do
    {
        cout << "> ";  // prompt
        cin.getline (buf, maxBuf);
        Scanner scanner (buf);
        Parser  parser (scanner, symTab);
        status = parser.Eval ();
    } while (status != stQuit);
}

                      

This program compiles and runs thus proving the validity of the concept.

Expanding Stubs

The first stub to be expanded into a full implementation will be that of the Scanner. The scanner converts the input string into a series of tokens. It works like an iterator. Whenever the Parser needs a token it asks the Scanner for the current token. When this token is parsed, the Parser accepts it by calling Scanner::Accept(). The Accept method scans the string further trying to recognize the next token.

There is a finite, well-defined set of tokens. It is convenient to put them into the enumeration EToken.

enum EToken
{
    tEnd,
    tError,
    tNumber,
    tPlus,
    tMult
};

                      

We would also like the Scanner to be able to convert the part of the input string recognized as a number to a floating-point value. This is done by the Number() method that may be called only when the current token is tNumber (see the assertion there). Notice the use of the type char const * const—a const pointer to a const string. The pointer is initialized in the constructor and never changes again. Its contents is read-only, too.

class Scanner
{
public:
    Scanner (char const * buf);
    EToken  Token () const { return _token; }
    EToken  Accept ();
    double Number ()
    {
        assert (_token == tNumber);
        return _number;
    }
private:
    void EatWhite ();

    char const * const   _buf;
    int                  _iLook;
    EToken               _token;
    double               _number;
};

                      

The constructor of the Scanner, besides initializing all member variables, calls the method Accept. Accept recognizes the first available token and positions the index _iLook past the recognized part of the buffer.

Scanner::Scanner (char const * buf)
    : _buf (buf), _iLook(0)
{
    cout << "Scanner with \"" << buf << "\"" << endl;
    Accept ();
}

EatWhite is the helper function that skips whitespace characters in the input.

void Scanner::EatWhite ()
{
    while (isspace (_buf [_iLook]))
        ++_iLook;
}

Accept is just one big switch statement. Depending on the value of the current character in the buffer, _buf[_iLook], the control passes to the appropriate case label within the switch. For now I have implemented the addition and the multiplication operators. When they are recognized, the _token variable is initialized to the appropriate enumerated value and the variable _iLook incremented by one.

The recognition of digits is done using the fall-through property of the case statements. In a switch statement, unless there is an explicit break statement, the control passes to the next case. All digit cases fall through, to reach the code after case '9'. The string starting with the digit is converted to an integer (later we will convert this part of the program to recognize floating-point numbers) and _iLook is positioned after the last digit.

The default case is executed when no other case matches the character in the switch.

EToken Scanner::Accept ()
{
    EatWhite ();
    switch (_buf[_iLook])
    {
    case '+':
        _token = tPlus;
        ++_iLook;
        break;
    case '*':
        _token = tMult;
        ++_iLook;
        break;
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
        _token = tNumber;
        _number = atoi (&_buf [_iLook]);
        while (isdigit (_buf [_iLook]))
            ++_iLook;
        break;
    case '\0': // end of input
        _token = tEnd;
        break;
    default:
        _token = tError;
        break;
    }
    return Token ();
}

                      

You might ask: Can this switch statement be replaced by some clever use of polymorphism? I really don’t know how one could do it. The trouble is that the incoming data—characters from the buffer—is amorphic. At the stage where we are trying to determine the meaning of the input, to give it form, the best we can do is to go through a multi-way conditional, in this case a switch statement. Dealing with amorphic input is virtually the only case when the use of a switch statement is fully legitimate in C++. Parsing user input, data stored in a file or external events, all require such treatment.

The dummy parser is implemented as a for loop retrieving one token after another and printing the appropriate message.

Status Parser::Parse ()
{
    for (EToken token = _scanner.Token ();
        token != tEnd;
        token = _scanner.Accept ())
    {
        switch (token)
        {
        case tMult:
            cout << "Times\n";
            break;
        case tPlus:
            cout << "Plus\n";
            break;
        case tNumber:
            cout << "Number: " << _scanner.Number () << "\n";
            break;
        case tError:
            cout << "Error\n";
            return stQuit;
        default:
            cout << "Error: bad token\n";
            return stQuit;
        }
    }
    return stOk;
}

                      

Once more, this partially implemented program is compiled and tested. We can see the flow of control and the correct recognition of a few tokens.

NextNext: Final Implementation