Reliable Software Logo

C++ In Action: Language

Function Table

Meta-programs. Pointers to functions. Explicit array initialization. Explicit class initialization.


For a good scientific calculator, built-in functions are a must. We have to be able to calculate square roots, logarithms, trigonometric functions, etc. We are quite lucky because the standard C library implements most of the basic math functions. All we need is for the parser to recognize the function call and then call the appropriate library function. The only tricky part is to make the connection between the function name—the string recognized by the parser—and the call to the appropriate function.

One way would be to create a multi-way conditional that compares the string to a list of predefined function names, and, when successful, calls the corresponding function.

if (strcmp (string, "sin") == 0)
{
    result = sin (arg);
}
else if ...
...
                  

As long as the number of built-in functions is reasonably small, this solution is good enough. Let’s pretend though that, instead of a toy calculator, we are writing an industrial strength program that will have to handle hundreds, if not thousands, of built in functions. The problem than becomes: Given a string, match it against hundreds of pre-defined strings. Clearly, doing hundreds of string comparisons every time is not acceptable.

But wait! We already have a string-matching object--the symbol table. Since it’s implemented as a hash table, it can perform a match in constant time, independent of the size of the table. The symbol table converts a string into an integer. We can pre-fill the table with built in function names (just like we did with built-in constants) and dispatch the function calls based on integers rather than strings. We could, for instance, insert the string "sin" in the zeroth slot of the symbol table, "cos" in the first slot, etc., and then dispatch calls using a switch statement:

case 0:
    result = sin (arg);
    break;
case 1:
    result = cos (arg);
    break;
...

                  

A switch statement that uses a set of consecutive labels, 0, 1, 2, etc., is implemented by the compiler as a jump table with a constant switching time independent of the number of labels. Seems like a perfect, constant-time solution.

But how do we make sure that sin always corresponds to 0, cos to 1, etc.? Well, we can always initialize the symbol table with these strings in this particular order. After all, we know that the symbol table assigns consecutive indexes starting with zero. Is it okay though? These are implementation secrets of the symbol table. What if the next guy to maintain our program rewrites the symbol table to use an even better algorithm? One that does not assign consecutive indexes starting with zero?

And how about expandability of such code? Suppose that at some point in the future we will want to add one more built-in function. What changes will we have to make to this implementation? We’ll have to:

  • Add one more case to the switch statement,
  • Add one more string to the symbol table, and
  • Make sure that the string is inserted at the offset that is the case label of the corresponding function.

Notice what we have just done. We have written a meta-program: a set of instructions to be followed in order to add a new built-in function to our calculator. These are not machine instructions, these are instructions for the programmer. In fact, they don’t appear anywhere in the program, even as comments—they are implicit. This kind of invisible meta-code adds to the hidden complexity of a program. What does meta-code describe? It describes the steps to be taken to implement the most likely extension to the program, as well as the assertions that have to be preserved when making such extensions.

When comparing various implementations, take into account not only the complexity of code but also the complexity of meta-code.

Let’s see if we can find a better implementation for built-in functions. Optimally, we would like to have some kind of a table listing all the functions. (It is almost always a good idea to shift the complexity from code to data structures.) Adding a new function would be equivalent to adding a new entry to this table. The meta-program for such an implementation would consist of a single statement:

  • Add a new entry to the function array

In order to make connection between data and executable code we need a new device: pointers to functions. A pointer to function is just like a regular pointer, only that instead of pointing to data it points to executable code. By using a pointer to data, you can read or modify data; by using a pointer to function you can call functions. Just like with any other type of pointer, we’ll have to be able to

  • declare a pointer to function,
  • initialize a pointer to point to a particular function,
  • make a function call using the pointer.

For instance, we may declare a pointer to function taking a double (as an argument) and returning a double. There are many such functions and we can initialize the pointer to point to any one of these. In particular we may initialize the pointer to point to the function double sin (double x). After that, if we make a function call through our pointer, we will be calling sin. Had we initialized the pointer to point to double cos (double x), we would have called cos instead. The beauty of a pointer to function is that the same pointer may point to different functions during the execution of the program. The part of the program that makes the function call doesn’t have to know what function it is calling.

Let’s see some actual code to see what the syntax looks like. First let’s declare a pointer pFun to a function taking a double and returning a double:

double (* pFun) (double x);

Compare it with the declaration of a function sin taking a double and returning a double:

double sin (double x);
                  

Why did we have to enclose the asterisk and the pointer name within parentheses? We had to do it in order to distinguish the declaration of a pointer to function from a declaration of a function returning a pointer. Without the parentheses

double * pFun (double x);
declares a function taking a double and returning a pointer to double. Quite a difference!

You can declare a pointer to any type of a function by taking the declaration of any function of this type and replacing the function name with (* pFun). Of course, you can use an arbitrary name instead of pFun.

Remember the function strtod? Its declaration was:

double strtod (char const * str, char ** ppEnd);

A pointer to a function of this type would be declared as:

double (* pStrtod) (char const * str, char ** ppEnd);

It's that simple!

Now for the second step: the initialization. In order to make the pointer point to a function, we have to take the address of the function and assign it to the pointer. It so happens that the name of the function is the address of the function. We can therefore initialize our pointer like this:

double (* pFun) (double x) = sin;
or do the assignment at a later point:
pFun = sin;

Finally, in order to invoke a function through a pointer, we de-reference the pointer and pass the appropriate argument(s):

double x = 3.1415;
double y = (* pFun) (x);

Pointers to functions can be used as building blocks in more complex data structures such as: arrays, classes, pointers to pointers, etc. To simplify the declarations of such data structures, type definitions are often used.

We start building our implementation of the built-in function table with a type definition:

typedef double (*PtrFun) (double);

All our built-in functions take one argument of the type double and return a result which is also double. The connection between a function and its string name is established in a class FunctionEntry.

class FunctionEntry
{
public:
    PtrFun pFun;
    char* strFun;
};

You might be wondering why we have made all the data members of this class public and provided no constructor to initialize them. That’s because we want to be able to initialize them explicitly, like this:

FunctionEntry funEntry = { sin, "sin" };

This kind of initialization, where you assign values directly to the object's data members, is possible only if the object has no private data and no constructor. Still, I haven't explained why anyone would want to use this kind of direct initialization instead of simply providing an appropriate constructor? Here's why:

const int maxIdFun = 16;

FunctionEntry funArr[maxIdFun] =
{ 
    log,    "log",
    log10,  "log10",
    exp,    "exp",
    sqrt,   "sqrt",
    sin,    "sin",
    cos,    "cos",
    tan,    "tan",
    CoTan,  "cotan",
    sinh,   "sinh",
    cosh,   "cosh",
    tanh,   "tanh",
    asin,   "asin",
    acos,   "acos",
    atan,   "atan",
    0,      ""
};

In one big swoop we've been able to initialize the whole array of FunctionEntries. That's exactly what we wanted. Notice how easy it will be now to add a new built-in function: just add one more line with function pointer and string name between any two lines in this list and it'll just work (you'll need to increment maxIdFun as well, but that's something the compiler will remind you of, if you forget). If you want to know more about explicit initialization of aggregates (such as class objects and arrays), I will talk more about it at the end of this chapter.

Now that we are through with the preliminaries, let's get back to our original problem: the initialization of the symbol table with built-in function names. This should be done in the construction stages of the program—the calculator is not ready to be used unless the symbol table is pre-filled. We will therefore define an object called the FunctionTable with the dual purpose of initializing the symbol table with the names and translating symbol id's to function pointers for all the built-in functions.

class SymbolTable;

class FunctionTable
{
public:
    FunctionTable (SymbolTable & symTab, FunctionEntry funArr []);
    int Size () const { return _size; }
    PtrFun GetFun (int id) { return _pFun [id]; }
private:
    PtrFun  _pFun [maxIdFun];
    int   _size;
};

The constructor of the FunctionTable takes a reference to the symbol table and fills it with strings from our array of FunctionEntries. At the same time it copies corresponding function pointers into its own array (strictly speaking the copying could be avoided if we let the function table use the function array directly—that's an optimization that is best left as an exercise to the reader).

FunctionTable::FunctionTable(SymbolTable &symTab,FunctionEntry funArr [])
    : _size (0)
{
    for (int i = 0; i < maxIdFun; ++i)
    {
        int len =  strlen (funArr [i].strFun);
        if (len == 0)
            break;
        _pFun [i] = funArr [i].pFun;
        cout << funArr[i].strFun << endl;
        int j = symTab.ForceAdd (funArr [i].strFun, len);
        assert (i == j);
        ++_size;
    }
}

Notice the very important assertion in the loop. We are making sure that the assumption that the symbol table assigns consecutive indexes to our functions is indeed true. We want the string "log" to correspond to id 0, because we store the pointer to the function log at offset 0, etc.

All of the built-in functions so far have been library functions (defined in the standard C library) except for one, the cotangent. The cotangent is just the inverse of the tangent, so it is easy to write our own implementation of it. The only tricky part is dealing with the inverse of zero—division by zero causes an exception, which would terminate the program. That's why we test for zero and return HUGE_VAL as its inverse. That's not entirely correct (the result of division by zero is undefined), but it'll do for now.

double CoTan (double x) 
{
    double y = tan (x);
    if (y == 0)
    {
        cout << "cotan of " << x << " undefined\n";
        return HUGE_VAL;
    }
    return 1.0 / y;
}

Note that the appropriate line in the initialization of funArr

CoTan, "cotan",

contains the pointer to our own implementation of cotangent, namely CoTan.


To summarize: We went through quite a bit of trouble in order to produce an implementation that avoids some of the hidden complexities of something we called meta-code. In this particular case, it was well worth the effort because we knew that the table of built-in functions would be the first thing to be modified in the future. We made sure such modifications would be easy and virtually fool-proof. We have localized them in one place (the initialization of funArr, made trivial mistakes compiler detectable (forgetting to update maxIdFun ) and positioned an assertion in a strategic place (inside the FunctionTableconstructor).


NextNext: Nodes