Reliable Software Logo

C++ In Action: Language

Symbol Table

Explicit, function overloading.

With a few minor modifications we are going to reuse the hash table and the string buffer code to implement the symbol table. But first we'll make the size of the hash table settable during construction (not that it matters much).

explicit HTable (int size): _size (size)
    _aList = new List [size];

~HTable ()
    delete []_aList;

Notice the keyword explicit in front of the constructor. I put it there to tell the compiler not to use this constructor to do implicit conversions. Normally, whenever you define a constructor that takes a single argument of type T, the compiler can use it to quietly convert an object of type T to an object of the class whose constructor it is. For instance, since the constructor of the hash table takes a single argument of type int, the compiler would accept a statement like this:

HTable table (13);
table = 5;
It would just "convert" an integer, 5, to a hash table as easily as it converts an integer to a double. Such implicit conversions are sometimes extremely useful--at other times they are a source of unexpected problems. The code above makes little sense and the error would be easy to spot. Imagine however that somebody, by mistake, passed an int to a function expecting a const reference to a hash table. The compiler would actually create a temporary hash table of the size equal to the value of the integer and pass it to the function. To protect yourself from this kind of errors you should make it a habit to declare single-argument constructors explicit--that is, unless you really want to define an implicit conversion.

Another modification we'd like to make to our hash table is to allow the method Find (and also ForceAdd ) to take the length of the string as a parameter. It so happens that when the parser calls these methods, it has the length handy, so it's a waste of time to calculate it again inside these methods.

This is easily done, but what if we have to find a string in a hash table and we don't have its length handy? Of course, we could always call strlen and pass the result to the appropriate method. There's a simpler solution, though. We can define an alternative version of Find which does not take the length of the string, but calculates it internally and calls the other version. Like this:

List const & Find (char const * str) const
    return Find (str, strlen (str));

Notice that we didn't have to invent different names for these two version of Find. It's enough if they differ by the number or type of arguments--the compiler will figure out which one to call. This mechanism of defining more than one method with the same name is called overloading.

In C++ you can overload free functions, methods, as well as constructors. For instance, for clients who don't have a preference for the size of the hash table, or who want to create arrays of hash tables, we could add a second constructor HTable () with no arguments. It would be implemented exactly like the first constructor, only it would arbitrarily set the size to, say, 127 (remember our first implementation of the hash table?). There is, however, a simpler way to accomplish the same thing. Instead of defining two constructors, you could specify the default value for its argument. Such declaration would look like this:

HTable (int size = 127);
You can specify default values for arguments not only in constructors, but also in free functions and methods. The rule is that, in the declaration, the arguments with default values have to follow the ones without defaults. When you call such a function with fewer than the full set of arguments, the missing trailing arguments will be set to their defaults. Here's a simple example.
void fun (int x, double y = 0.0, char * str = "");
// different ways of calling fun
fun (10); // y = 0.0, str = ""
fun (5, 1.2) // str = ""
fun (1, 0.2, "hello!");

The symbol table uses a hash table to map strings into short lists of string id’s. New strings are stored in the string buffer at the first available offset, _curStrOff, and given the first available id, _curId. This offset is then stored in the array _offStr under the index equal to the string’s id.

const int idNotFound = -1;

class SymbolTable
    explicit SymbolTable (int size);
    ~SymbolTable ();
    int ForceAdd (char const * str, unsigned int len);
    int Find (char const * str, unsigned int len) const;
    char const * GetString (int id) const;
    HTable    _htab;
    int     * _offStr; // offsets of strings in buffer
    int       _size;
    int       _curId;
    char    * _strBuf;
    unsigned int _bufSize;
    unsigned int _curStrOff;

(We use unsigned int instead of int to avoid compiler warnings, which would be generated when storing the result of strlen, which is unsigned, in a variable that is signed--int is signed.)

Here are the constructor and the destructor of the symbol table. Other methods are just copied from our earlier implementation of the string buffer.

SymbolTable::SymbolTable (int size)
    : _size (size), _curId (0), _curStrOff (0), _htab (size + 1)
    _offStr = new int [size];
    _bufSize = size * 10;
    _strBuf = new char [_bufSize];

SymbolTable::~SymbolTable ()
    delete []_offStr;
    delete []_strBuf;

NextNext: Store