virtual functions
To cause late binding to occur for a
particular function, C++ requires that you use the virtual
keyword when declaring the
function in the base class. Late binding occurs only with virtual
functions, and only when you call them through an address (a pointer or a
reference) of a base class in which those virtual functions exist, although
they may also be defined in an earlier base class.
To create a member function as
virtual, you simply precede the declaration of
the function with the keyword virtual. Only the declaration needs the
virtual keyword, not the definition. If a function is declared as
virtual in the base class, it is virtual in all the derived
classes. The redefinition of a virtual function in a derived class is
usually called
overriding.
Notice
that you are only required to declare a function virtual in the base
class. All derived-class functions that match the signature of the base-class
declaration will be called using the virtual mechanism. You can use the
virtual keyword in the derived-class declarations
(it does
no harm to do so), but it is redundant and can be confusing.
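As a minimal sketch of this rule, the following shows a function staying virtual down the hierarchy even though the derived classes omit the keyword. The class names Base, Middle, and Leaf and the two helper functions are hypothetical and are not part of the Instrument examples.

```cpp
#include <string>

class Base {
public:
  virtual std::string name() const { return "Base"; }
  virtual ~Base() {}
};

// No virtual keyword here, yet name() is still virtual
// because it matches the signature declared in Base:
class Middle : public Base {
public:
  std::string name() const { return "Middle"; }
};

// Still virtual two levels down:
class Leaf : public Middle {
public:
  std::string name() const { return "Leaf"; }
};

// Knows only about the base-class interface:
std::string call_through_base(const Base& b) {
  return b.name();
}

// Goes through Middle, which never said "virtual":
std::string call_through_middle(const Middle& m) {
  return m.name();
}
```

Passing a Leaf object to either helper selects Leaf::name( ), which shows that the virtual mechanism was inherited by Middle without being restated there.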
To get the desired behavior from
Instrument2.cpp, simply add the virtual keyword in the base class
before play( ):
//: C15:Instrument3.cpp
// Late binding with the virtual keyword
#include <iostream>
using namespace std;
enum note { middleC, Csharp, Cflat }; // Etc.
class Instrument {
public:
virtual void play(note) const {
cout << "Instrument::play" << endl;
}
};
// Wind objects are Instruments
// because they have the same interface:
class Wind : public Instrument {
public:
// Override interface function:
void play(note) const {
cout << "Wind::play" << endl;
}
};
void tune(Instrument& i) {
// ...
i.play(middleC);
}
int main() {
Wind flute;
tune(flute); // Upcasting
} ///:~
This file is identical to
Instrument2.cpp except for the addition of the virtual keyword,
and yet the behavior is significantly different: Now the output is
Wind::play.
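To see the two behaviors side by side, here is a small sketch that puts a non-virtual and a virtual function in the same class. The function names playEarly( ), playLate( ), tuneEarly( ), and tuneLate( ) are invented for this illustration, and the functions return strings instead of printing so the result is easy to check.

```cpp
#include <string>

class Instrument {
public:
  // Non-virtual: the call is bound at compile time
  std::string playEarly() const { return "Instrument"; }
  // Virtual: the call is bound at runtime
  virtual std::string playLate() const { return "Instrument"; }
  virtual ~Instrument() {}
};

class Wind : public Instrument {
public:
  // Hides the base-class function, but does not override it:
  std::string playEarly() const { return "Wind"; }
  // Overrides the base-class virtual function:
  std::string playLate() const { return "Wind"; }
};

// Both functions see only the base-class interface:
std::string tuneEarly(const Instrument& i) { return i.playEarly(); }
std::string tuneLate(const Instrument& i) { return i.playLate(); }
```

With a Wind object, tuneEarly( ) produces "Instrument" and tuneLate( ) produces "Wind"; the only difference between the two paths is the virtual keyword.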
Extensibility
With play( ) defined as
virtual in the base class, you can add as many new types as you want
without changing the tune( ) function. In a well-designed OOP
program, most or all of your functions will follow the model of
tune( ) and communicate only with the base-class
interface. Such a program is
extensible because you can add new functionality
by inheriting new data types from the common base class. The functions that
manipulate the base-class interface will not need to be changed at all to
accommodate the new classes.
Here’s the instrument example with
more virtual functions and a number of new classes, all of which work correctly
with the old, unchanged tune( ) function:
//: C15:Instrument4.cpp
// Extensibility in OOP
#include <iostream>
using namespace std;
enum note { middleC, Csharp, Cflat }; // Etc.
class Instrument {
public:
virtual void play(note) const {
cout << "Instrument::play" << endl;
}
virtual const char* what() const {
return "Instrument";
}
// Assume this will modify the object:
virtual void adjust(int) {}
};
class Wind : public Instrument {
public:
void play(note) const {
cout << "Wind::play" << endl;
}
const char* what() const { return "Wind"; }
void adjust(int) {}
};
class Percussion : public Instrument {
public:
void play(note) const {
cout << "Percussion::play" << endl;
}
const char* what() const { return "Percussion"; }
void adjust(int) {}
};
class Stringed : public Instrument {
public:
void play(note) const {
cout << "Stringed::play" << endl;
}
const char* what() const { return "Stringed"; }
void adjust(int) {}
};
class Brass : public Wind {
public:
void play(note) const {
cout << "Brass::play" << endl;
}
const char* what() const { return "Brass"; }
};
class Woodwind : public Wind {
public:
void play(note) const {
cout << "Woodwind::play" << endl;
}
const char* what() const { return "Woodwind"; }
};
// Identical function from before:
void tune(Instrument& i) {
// ...
i.play(middleC);
}
// New function:
void f(Instrument& i) { i.adjust(1); }
// Upcasting during array initialization:
Instrument* A[] = {
new Wind,
new Percussion,
new Stringed,
new Brass,
};
int main() {
Wind flute;
Percussion drum;
Stringed violin;
Brass flugelhorn;
Woodwind recorder;
tune(flute);
tune(drum);
tune(violin);
tune(flugelhorn);
tune(recorder);
f(flugelhorn);
} ///:~
You can see that another inheritance
level has been added beneath Wind, but the virtual mechanism works
correctly no matter how many levels there are. The adjust( )
function is not overridden for Brass and Woodwind. When
this happens, the “closest” definition in the inheritance hierarchy
is automatically used – the compiler guarantees there’s always
some definition for a virtual function, so you’ll never end up with
a call that doesn’t bind to a function body. (That would be
disastrous.)
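The "closest definition" rule can be sketched with a reduced hierarchy. This is a variation for illustration only: here adjust( ) takes no arguments and returns a string naming the body that ran, and adjustVia( ) is a hypothetical helper.

```cpp
#include <string>

class Instrument {
public:
  virtual std::string adjust() const { return "Instrument::adjust"; }
  virtual ~Instrument() {}
};

class Wind : public Instrument {
public:
  std::string adjust() const { return "Wind::adjust"; }
};

// Brass does not override adjust(), so the closest
// definition above it is the one in Wind:
class Brass : public Wind {};

// Percussion does not override it either; the closest
// definition above it is the one in Instrument:
class Percussion : public Instrument {};

// Calls adjust() through the base-class interface:
std::string adjustVia(const Instrument& i) { return i.adjust(); }
```

Calling adjustVia( ) with a Brass object runs the Wind version, and calling it with a Percussion object runs the Instrument version, so every call binds to some function body.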
The array A[ ] contains pointers
to the base class Instrument, so upcasting occurs during the process of
array initialization. This array and the function f( ) will be used
in later discussions.
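As one possible use of such an array, the sketch below walks an array of base-class pointers and collects the result of what( ) for each element. The helper describeAll( ) is hypothetical, the hierarchy is trimmed to three classes, and the caller is assumed to own the objects (for example, as local variables) so nothing here needs to be deleted.

```cpp
#include <cstddef>
#include <string>

class Instrument {
public:
  virtual const char* what() const { return "Instrument"; }
  virtual ~Instrument() {}
};

class Wind : public Instrument {
public:
  const char* what() const { return "Wind"; }
};

class Percussion : public Instrument {
public:
  const char* what() const { return "Percussion"; }
};

class Brass : public Wind {
public:
  const char* what() const { return "Brass"; }
};

// Visits every element through its base-class pointer and
// joins the names returned by what() with single spaces.
std::string describeAll(Instrument* const items[], std::size_t count) {
  std::string result;
  for (std::size_t i = 0; i < count; ++i) {
    if (i != 0) result += ' ';
    result += items[i]->what();
  }
  return result;
}
```

Given an array holding the addresses of a Wind, a Percussion, and a Brass object, describeAll( ) reports each object's own name even though it sees only Instrument pointers.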
In the call to tune( ),
upcasting is performed on each different type of object,
yet the desired behavior always takes place. This can be described as
“sending a message to an
object and letting the object worry about what to do with it.” The
virtual function is the lens to use when you’re trying to analyze a
project: Where should the base classes occur, and how might you want to extend
the program? However, even if you don’t discover the proper base class
interfaces and virtual functions at the initial creation of the program,
you’ll often discover them later, even much later, when you set out to
extend or otherwise maintain the program. This is not an analysis or design
error; it simply means you didn’t or couldn’t know all the
information the first time. Because of the tight class modularization in C++, it
isn’t a large problem when this occurs because changes you make in one
part of a system tend not to propagate to other parts of the system as they do
in
C.
How C++ implements late binding
How can late
binding happen? All the work is done behind the scenes
by the compiler, which installs the necessary late-binding mechanism when you
ask it to (you ask by creating virtual functions). Because programmers often
benefit from understanding the mechanism of virtual functions in C++, this
section will elaborate on the way the compiler implements this
mechanism.
The keyword
virtual tells the
compiler it should not perform early binding. Instead, it should automatically
install all the mechanisms necessary to perform late binding. This means that if
you call play( ) for a Brass object through an address for
the base-class Instrument, you’ll get the proper
function.
To accomplish this, the typical
compiler[54]
creates a single table (called the VTABLE) for each
class that contains virtual functions. The compiler places the addresses
of the virtual functions for that particular class in the VTABLE. In each class
with virtual functions, it secretly places a pointer, called the vpointer
(abbreviated as VPTR), which
points to the VTABLE for that object. When you make a virtual function call
through a base-class pointer (that is, when you make a polymorphic
call), the compiler quietly
inserts code to fetch the VPTR and look up the function address in the VTABLE,
thus calling the correct function and causing late binding to take
place.
All of this – setting up the VTABLE
for each class, initializing the VPTR, inserting the code for the virtual
function call – happens automatically, so you don’t have to worry
about it. With virtual functions, the proper function gets called for an object,
even if the compiler cannot know the specific type of the
object.
The following sections go into this
process in more
detail.
Storing type information
You can see that there is no explicit
type information stored in any of the classes. But the previous examples, and
simple logic, tell you that there must be some sort of type information stored
in the objects; otherwise the type could not be established at runtime. This is
true, but the type information is hidden. To see it, here’s an example to
examine the sizes of classes that use virtual functions compared with those that
don’t:
//: C15:Sizes.cpp
// Object sizes with/without virtual functions
#include <iostream>
using namespace std;
class NoVirtual {
int a;
public:
void x() const {}
int i() const { return 1; }
};
class OneVirtual {
int a;
public:
virtual void x() const {}
int i() const { return 1; }
};
class TwoVirtuals {
int a;
public:
virtual void x() const {}
virtual int i() const { return 1; }
};
int main() {
cout << "int: " << sizeof(int) << endl;
cout << "NoVirtual: "
<< sizeof(NoVirtual) << endl;
cout << "void* : " << sizeof(void*) << endl;
cout << "OneVirtual: "
<< sizeof(OneVirtual) << endl;
cout << "TwoVirtuals: "
<< sizeof(TwoVirtuals) << endl;
} ///:~
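With no virtual functions, the size of
the object is exactly what you’d expect: the size of a
single[55]
int. With a single virtual function in OneVirtual, the size of the
object is the size of NoVirtual plus the size of a void pointer
(plus any alignment padding the platform adds; on many 64-bit systems the
total is rounded up to a multiple of the pointer size).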
It turns out that the compiler inserts a single pointer (the VPTR) into the
structure if you have one or more virtual functions. There is no size
difference between OneVirtual and TwoVirtuals. That’s
because the VPTR points to a table of function addresses. You need only one
table because all the virtual function addresses are contained in that single
table.
This example required at least one data
member. If there had been no data members, the C++ compiler would have forced
the objects to be a nonzero size
because each object must have a
distinct address. If you imagine indexing into an array of zero-sized objects,
you’ll understand. A “dummy” member is inserted into objects
that would otherwise be zero-sized. When the type information is inserted
because of the virtual keyword, this takes the place of the
“dummy” member. Try commenting out the int a in all the
classes in the example above to see
this.
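Here is a sketch of that experiment with the data members already removed. The class names and the three size-reporting functions are invented for this illustration. Exact sizes are implementation-defined, so the code returns them rather than assuming particular numbers.

```cpp
#include <cstddef>

class Empty {};  // No data members, no virtual functions

class OnlyVirtual {  // No data members, one virtual function
public:
  virtual void x() const {}
};

class TwoVirtualsNoData {  // No data members, two virtual functions
public:
  virtual void x() const {}
  virtual int i() const { return 1; }
};

// Sizes are implementation-defined, so report them instead of hard-coding:
std::size_t emptySize() { return sizeof(Empty); }
std::size_t oneVirtualSize() { return sizeof(OnlyVirtual); }
std::size_t twoVirtualsSize() { return sizeof(TwoVirtualsNoData); }
```

On a typical implementation emptySize( ) reports a small nonzero value, while the other two functions report the same value as each other, at least the size of a pointer, because the VPTR is the only thing stored in those objects.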
Picturing virtual functions
To understand exactly what’s going
on when you use a virtual function, it’s helpful to visualize the
activities going on behind the curtain. Here’s a drawing of the array of
pointers A[ ] in Instrument4.cpp:
The Instrument pointers in the array
carry no specific type information; each one points to an object of type
Instrument. Wind, Percussion, Stringed, and
Brass all fit into this category because they are derived from
Instrument (and thus have the same interface as Instrument, and
can respond to the same messages), so their addresses can also be placed into
the array. However, the compiler doesn’t know that they are anything more
than Instrument objects, so left to its own devices it would normally
call the base-class versions of all the functions. But in this case, all those
functions have been declared with the virtual keyword, so something
different happens.
Each time you create a class that
contains virtual functions, or you derive from a class that contains virtual
functions, the compiler creates a unique VTABLE for that
class, seen on the right of the diagram. In that table it places the addresses
of all the functions that are declared virtual in this class or in the base
class. If you don’t override a function that was declared virtual in the
base class, the compiler uses the address of the base-class version in the
derived class. (You can see this in the adjust entry in the Brass
VTABLE.) Then it places the VPTR (discovered in
Sizes.cpp) into the class. There is only one VPTR for each object when
using simple inheritance like this. The VPTR must be initialized to point to the
starting address of the appropriate VTABLE. (This happens in the constructor,
which you’ll see later in more detail.)
Once the VPTR is initialized to the
proper VTABLE, the object in effect “knows” what type it is. But
this self-knowledge is worthless unless it is used at the point a virtual
function is called.
When you call a virtual function through
a base class address (the situation when the compiler doesn’t have all the
information necessary to perform early binding), something special happens.
Instead of performing a typical function call, which is simply an
assembly-language CALL to a particular address, the compiler generates
different code to perform the function call. Here’s what a call to
adjust( ) for a Brass object looks like, if made through an
Instrument pointer (An Instrument reference produces the same
result):

The compiler begins with the
Instrument pointer, which points to the starting address of the object.
All Instrument objects or objects derived from Instrument have
their VPTR in the same place (often at the beginning of the object), so the
compiler can pick the VPTR out of the object. The VPTR points to the starting
address of the VTABLE. All the VTABLE function addresses are laid out in the
same order, regardless of the specific type of the object. play( )
is first, what( ) is second, and adjust( ) is third. The
compiler knows that regardless of the specific object type, the
adjust( ) function is at the location VPTR+2. Thus, instead of
saying, “Call the function at the absolute location
Instrument::adjust” (early
binding;
the wrong action), it generates code that says, in effect, “Call the
function at VPTR+2.” Because the fetching of the VPTR and the
determination of the actual function address occur at runtime, you get the
desired late binding. You send a message to the object, and the object figures
out what to do with
it.
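The lookup just described can be imitated by hand with ordinary function pointers. This is only an emulation under simplifying assumptions, not what any particular compiler emits: FakeInstrument, Slot, the table names, and callAdjust( ) are all hypothetical, every function returns a string naming itself, and real VTABLE layouts may contain additional entries.

```cpp
#include <string>

struct FakeInstrument;                                // Stands in for an object
typedef std::string (*Slot)(const FakeInstrument*);  // One VTABLE entry

struct FakeInstrument {
  const Slot* vptr;  // The "VPTR": points at the first slot of a table
};

// Function bodies; each receives the object address, like "this":
std::string instrumentPlay(const FakeInstrument*) { return "Instrument::play"; }
std::string instrumentWhat(const FakeInstrument*) { return "Instrument"; }
std::string instrumentAdjust(const FakeInstrument*) { return "Instrument::adjust"; }
std::string windPlay(const FakeInstrument*) { return "Wind::play"; }
std::string windWhat(const FakeInstrument*) { return "Wind"; }
std::string windAdjust(const FakeInstrument*) { return "Wind::adjust"; }
std::string brassPlay(const FakeInstrument*) { return "Brass::play"; }
std::string brassWhat(const FakeInstrument*) { return "Brass"; }

// Same slot order in every table: play, what, adjust.
const Slot instrumentVtable[] = { instrumentPlay, instrumentWhat, instrumentAdjust };
const Slot windVtable[] = { windPlay, windWhat, windAdjust };
// Brass does not override adjust, so slot 2 reuses Wind's function:
const Slot brassVtable[] = { brassPlay, brassWhat, windAdjust };

// "Call the function at VPTR+2", whatever the object's real type is:
std::string callAdjust(const FakeInstrument* obj) {
  return obj->vptr[2](obj);
}
```

An object whose vptr refers to brassVtable answers callAdjust( ) with "Wind::adjust", while one whose vptr refers to instrumentVtable answers with "Instrument::adjust", even though callAdjust( ) itself never changes. The constant index 2 plays the role of VPTR+2 in the text.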
Under the hood
It can be helpful to see the
assembly-language code generated by a virtual function
call,
so you can see that late-binding is indeed taking place. Here’s the output
from one compiler for the call
i.adjust(1);
inside the function f(Instrument&
i):
push 1
push si
mov bx, word ptr [si]
call word ptr [bx+4]
add sp, 4
With the calling convention this compiler uses, the arguments of a C++
function call, like those of a C function call, are pushed on the stack from
right to left (this order makes it straightforward to support C’s variable
argument lists), so the argument
1 is pushed on the stack first. At this point in the function, the
register si (part of the Intel X86 processor architecture) contains the
address of i. This is also pushed on the stack because it is the starting
address of the object of interest. Remember that the starting address
corresponds to the value of this, and this
is quietly pushed on the stack as an argument before every member function call,
so the member function knows which particular object it is working on. So
you’ll always see one more than the number of arguments pushed on the
stack before a member function call (except for static member functions,
which have no this).
Now the actual virtual function call must
be performed. First, the VPTR must be produced, so the
VTABLE can be found. For this compiler the VPTR is
inserted at the beginning of the object, so the contents of this
correspond to the VPTR. The line
mov bx, word ptr [si]
fetches the word that si (that is,
this) points to, which is the VPTR. It places the VPTR into the
register bx.
The VPTR contained in bx points to
the starting address of the VTABLE, but the function pointer to call isn’t
at location zero of the VTABLE, but instead at location two (because it’s
the third function in the list). For this memory model each function pointer is
two bytes long, so the compiler adds four to the VPTR to calculate where the
address of the proper function is. Note that this is a constant value,
established at compile time, so the only thing that matters is that the function
pointer at location number two is the one for adjust( ).
Fortunately, the compiler takes care of all the bookkeeping for you and ensures
that all the function pointers in all the VTABLEs of a particular class
hierarchy occur in the same order, regardless of the order that you may override
them in derived classes.
Once the address of the proper function
pointer in the VTABLE is calculated, that function is called. So the address is
fetched and called all at once in the statement
call word ptr [bx+4]
Finally, the stack pointer is moved back
up to clean off the arguments that were pushed before the call. In C and C++
assembly code you’ll often see the caller clean off the arguments but this
may vary depending on processors and compiler
implementations.
Installing the vpointer
Because the VPTR determines the virtual
function behavior of the object, you can see how it’s critical that the
VPTR always be pointing to the proper VTABLE. You don’t ever want to be
able to make a call to a virtual function before the VPTR is properly
initialized. Of course, the place where initialization can be guaranteed is in
the constructor, but none of the Instrument examples has a
constructor.
This is where creation of the default
constructor is essential. In the Instrument examples, the compiler
creates a default constructor that does nothing except initialize the VPTR. This
constructor, of course, is automatically called for all Instrument
objects before you can do anything with them, so you know that it’s always
safe to call virtual functions.
The implications of the automatic
initialization of the VPTR inside the constructor are discussed in a later
section.
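As a small preview, and only a sketch, the following records which version of a virtual function is reached from inside a base-class constructor. The classes Base and Derived and the member seenInCtor are hypothetical. The language guarantees that the call inside the Base constructor reaches Base::name( ); on typical implementations this is because each constructor sets the VPTR to the VTABLE of its own class.

```cpp
#include <string>

class Base {
  std::string seenInCtor;  // What name() returned during construction
public:
  // While this constructor runs, the object is still only a Base,
  // so the virtual call below reaches Base::name():
  Base() { seenInCtor = name(); }
  virtual std::string name() const { return "Base"; }
  virtual ~Base() {}
  std::string duringConstruction() const { return seenInCtor; }
};

class Derived : public Base {
public:
  // The compiler-generated Derived constructor runs after Base's
  // and leaves the object dispatching to the Derived functions.
  std::string name() const { return "Derived"; }
};
```

For a Derived object, duringConstruction( ) reports "Base", while calling name( ) on the finished object, directly or through a Base reference, reports "Derived".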
Objects are different
It’s important to realize that
upcasting deals only with addresses. If the compiler has
an object, it knows the exact type and therefore (in C++) will not use late
binding for any function calls – or at least, the compiler doesn’t
need to use late binding. For efficiency’s sake, most compilers
will perform early binding when
they are making a call to a virtual function for an object because they know the
exact type. Here’s an example:
//: C15:Early.cpp
// Early binding & virtual functions
#include <iostream>
#include <string>
using namespace std;
class Pet {
public:
virtual string speak() const { return ""; }
};
class Dog : public Pet {
public:
string speak() const { return "Bark!"; }
};
int main() {
Dog ralph;
Pet* p1 = &ralph;
Pet& p2 = ralph;
Pet p3;
// Late binding for both:
cout << "p1->speak() = " << p1->speak() <<endl;
cout << "p2.speak() = " << p2.speak() << endl;
// Early binding (probably):
cout << "p3.speak() = " << p3.speak() << endl;
} ///:~
In p1->speak( ) and
p2.speak( ), addresses are used, which means the information is
incomplete: p1 and p2 can represent the address of a Pet
or something derived from Pet, so the virtual mechanism must be
used. When calling p3.speak( ) there’s no ambiguity. The
compiler knows the exact type and that it’s an object, so it can’t
possibly be an object derived from Pet – it’s exactly
a Pet. Thus, early binding is probably used. However, if the compiler
doesn’t want to work so hard, it can still use late binding and the same
behavior will
occur.
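The difference between an address and an object also shows up in argument passing. In this sketch the helper functions speakByReference( ) and speakByValue( ) are hypothetical, and Pet::speak( ) returns "Pet" rather than an empty string so the result is visible.

```cpp
#include <string>

class Pet {
public:
  virtual std::string speak() const { return "Pet"; }
  virtual ~Pet() {}
};

class Dog : public Pet {
public:
  std::string speak() const { return "Bark!"; }
};

// Receives an address, so the virtual mechanism selects the function:
std::string speakByReference(const Pet& p) { return p.speak(); }

// Receives a new Pet object copied from the argument, so the
// object inside the function is exactly a Pet:
std::string speakByValue(Pet p) { return p.speak(); }
```

Passing the same Dog object to both functions gives "Bark!" from speakByReference( ) and "Pet" from speakByValue( ), because only the first one is working with the address of the original object.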