Reliable Software Logo

C++ In Action: Language

C Digression

How would the same problem be solved in C, where there are no virtual functions? A good way would actually be to try to implement a virtual table by hand, and make calls through function pointers. You'd be surprised how often it is done. It is called object oriented programming in C. The more traditional approach would be to define struct Node that has enough fields to accommodate the needs of binary operators as well as numeric nodes. The identity of the node would be stored in a separate field.

#define NUM_NODE 1
#define ADD_NODE 2
#define MULT_NODE 3

struct Node
{
    /* Node type */
    int type;
    /* Used only in numeric nodes */
    double val;
    /* Used only in binary nodes */
    Node * pLeft;
    Node * pRight;
};

Virtual functions in C++ would be turned into multi-way conditionals, like this switch statement below.

double Calc (Node * pNode)
{
    switch (pNode->type)
    {
    case NUM_NODE:
        return pNode->val;
    case ADD_NODE:
        return Calc (pNode->pLeft) + Calc (pNode->pRight);
    case MULT_NODE:
        return Calc (pNode->pLeft) * Calc (pNode->pRight);
    }
    return 0;
}

Obviously, this solution would be too expensive when applied to trees with hundreds of nodes. This is when a C programmer would fetch his or her powerful secret weapon. When the going gets tough, type safety is the first to go—use casts (I am being sarcastic here). Instead of packing everything into a single struct, the programmer creates a variety of structs:

struct Node
{
    int type;
};

struct NumNode
{
    int type;
    double val;
};

struct BinNode
{
    int type;
    Node * pLeft;
    Node * pRight;
};

Some would even use a byte to store type—trading speed for size (on many processors fetching a word aligned value is faster than fetching a byte aligned value). The function Calc takes a pointer to Node and, based on its type field, casts it to the appropriate pointer. (In fact, one could even get rid of the last pretense of type safety and pass Calc a void pointer.)

double Calc (Node * pNode)
{
    switch (pNode->type)
    {
        case NUM_NODE:
            return ((NumNode *)pNode)->val;
        case ADD_NODE:
        {
            BinNode * pN = (BinNode *) pNode;
            return Calc (pN->pLeft) + Calc (pN->pRight);
        }
        case MULT_NODE:
        {
            BinNode * pN = (BinNode *) pNode;
            return Calc (pN->pLeft) * Calc (pN->pRight);
        }
        default:
            printf ("Bad node type\n");
    }
    return 0;
}

Notice that even the "constructor"—CreateNumNode—has to use casts. The destructor would use both casts and conditionals.

Node * CreateNumNode (double value)
{
    NumNode * pNode = (NumNode *) malloc (sizeof (NumNode));
    pNode->type = NUM_NODE;
    pNode->val = value;
    return (Node *) pNode;
}

How do these C solutions compare with the C++ polymorphic approach? As far as the size of the data structures is concerned, C doesn't offer much in the way of savings. In fact the first C solution is substantially more memory consuming.

As far as the speed of execution is concerned, we have to compare a multi-way conditional plus a direct call against a doubly indirect call through vtable. The result of the comparison depends on how well the compiler optimizes conditionals. In our case, since the case labels are consecutive, the compiler will probably create a jump table indexed by the value of type. Before doing an indirect jump, it will have to check whether the index is within bounds (if it's not, it will jump to the default label). So in fact we have a conditional, an indirect jump, and a function call. In C++ we have a doubly indirect function call. The C++ compiler usually optimizes the passing of the this pointer by putting it in a register, so the calling sequence may be simpler in C++. The bottom line is that, without actually timing both versions, there is no way of telling which one will be faster.

As far as correctness and ease of maintenance is concerned, just consider what it would be like to add a new type of node to each of the implementations. Notice also that, even though we've been writing progressively more complex programs in C++, we haven't felt the need for either casts or switch statements. Even C programmers will admit that using casts should be a last resort. C++ programmers feel the same about switch statements.