virtual functions

To cause late binding to occur for a particular function, C++ requires that you use the virtual keyword when declaring the function in the base class. Late binding occurs only with virtual functions, and only when you’re using an address of the base class where those virtual functions exist, although they may also be defined in an earlier base class.

To make a member function virtual, you simply precede the declaration of the function with the keyword virtual. Only the declaration needs the virtual keyword, not the definition. If a function is declared as virtual in the base class, it is virtual in all the derived classes. The redefinition of a virtual function in a derived class is usually called overriding.

Notice that you are only required to declare a function virtual in the base class. All derived-class functions that match the signature of the base-class declaration will be called using the virtual mechanism. You can use the virtual keyword in the derived-class declarations (it does no harm to do so), but it is redundant and can be confusing.

To get the desired behavior from Instrument2.cpp, simply add the virtual keyword in the base class before play( ):

//: C15:Instrument3.cpp
// Late binding with the virtual keyword
#include <iostream>
using namespace std;
enum note { middleC, Csharp, Cflat }; // Etc.

class Instrument {
public:
  virtual void play(note) const {
    cout << "Instrument::play" << endl;
  }
};

// Wind objects are Instruments
// because they have the same interface:
class Wind : public Instrument {
public:
  // Override interface function:
  void play(note) const {
    cout << "Wind::play" << endl;
  }
};

void tune(Instrument& i) {
  // ...
  i.play(middleC);
}

int main() {
  Wind flute;
  tune(flute); // Upcasting
} ///:~

This file is identical to Instrument2.cpp except for the addition of the virtual keyword, and yet the behavior is significantly different: Now the output is Wind::play.

Extensibility

With play( ) defined as virtual in the base class, you can add as many new types as you want without changing the tune( ) function. In a well-designed OOP program, most or all of your functions will follow the model of tune( ) and communicate only with the base-class interface. Such a program is extensible because you can add new functionality by inheriting new data types from the common base class. The functions that manipulate the base-class interface will not need to be changed at all to accommodate the new classes.

Here’s the instrument example with more virtual functions and a number of new classes, all of which work correctly with the old, unchanged tune( ) function:

//: C15:Instrument4.cpp
// Extensibility in OOP
#include <iostream>
using namespace std;
enum note { middleC, Csharp, Cflat }; // Etc.

class Instrument {
public:
  virtual void play(note) const {
    cout << "Instrument::play" << endl;
  }
  // String literals are const, so return const char*:
  virtual const char* what() const {
    return "Instrument";
  }
  // Assume this will modify the object:
  virtual void adjust(int) {}
};

class Wind : public Instrument {
public:
  void play(note) const {
    cout << "Wind::play" << endl;
  }
  const char* what() const { return "Wind"; }
  void adjust(int) {}
};

class Percussion : public Instrument {
public:
  void play(note) const {
    cout << "Percussion::play" << endl;
  }
  const char* what() const { return "Percussion"; }
  void adjust(int) {}
};

class Stringed : public Instrument {
public:
  void play(note) const {
    cout << "Stringed::play" << endl;
  }
  const char* what() const { return "Stringed"; }
  void adjust(int) {}
};

class Brass : public Wind {
public:
  void play(note) const {
    cout << "Brass::play" << endl;
  }
  const char* what() const { return "Brass"; }
};

class Woodwind : public Wind {
public:
  void play(note) const {
    cout << "Woodwind::play" << endl;
  }
  const char* what() const { return "Woodwind"; }
};

// Identical function from before:
void tune(Instrument& i) {
  // ...
  i.play(middleC);
}

// New function:
void f(Instrument& i) { i.adjust(1); }

// Upcasting during array initialization:
Instrument* A[] = {
  new Wind,
  new Percussion,
  new Stringed,
  new Brass,
};

int main() {
  Wind flute;
  Percussion drum;
  Stringed violin;
  Brass flugelhorn;
  Woodwind recorder;
  tune(flute);
  tune(drum);
  tune(violin);
  tune(flugelhorn);
  tune(recorder);
  f(flugelhorn);
} ///:~

You can see that another inheritance level has been added beneath Wind, but the virtual mechanism works correctly no matter how many levels there are. The adjust( ) function is not overridden for Brass and Woodwind. When this happens, the “closest” definition in the inheritance hierarchy is automatically used – the compiler guarantees there’s always some definition for a virtual function, so you’ll never end up with a call that doesn’t bind to a function body. (That would be disastrous.)

The array A[ ] contains pointers to the base class Instrument, so upcasting occurs during the process of array initialization. This array and the function f( ) will be used in later discussions.

In the call to tune( ), upcasting is performed on each different type of object, yet the desired behavior always takes place. This can be described as “sending a message to an object and letting the object worry about what to do with it.” The virtual function is the lens to use when you’re trying to analyze a project: Where should the base classes occur, and how might you want to extend the program? However, even if you don’t discover the proper base class interfaces and virtual functions at the initial creation of the program, you’ll often discover them later, even much later, when you set out to extend or otherwise maintain the program. This is not an analysis or design error; it simply means you didn’t or couldn’t know all the information the first time. Because of the tight class modularization in C++, it isn’t a large problem when this occurs because changes you make in one part of a system tend not to propagate to other parts of the system as they do in C.

How C++ implements late binding

How can late binding happen? All the work goes on behind the scenes by the compiler, which installs the necessary late-binding mechanism when you ask it to (you ask by creating virtual functions). Because programmers often benefit from understanding the mechanism of virtual functions in C++, this section will elaborate on the way the compiler implements this mechanism.

The keyword virtual tells the compiler it should not perform early binding. Instead, it should automatically install all the mechanisms necessary to perform late binding. This means that if you call play( ) for a Brass object through an address for the base-class Instrument, you’ll get the proper function.

To accomplish this, the typical compiler creates a single table (called the VTABLE) for each class that contains virtual functions. The compiler places the addresses of the virtual functions for that particular class in the VTABLE. In each object of a class with virtual functions, it secretly places a pointer, called the vpointer (abbreviated as VPTR), which points to the VTABLE for that object’s class. When you make a virtual function call through a base-class pointer (that is, when you make a polymorphic call), the compiler quietly inserts code to fetch the VPTR and look up the function address in the VTABLE, thus calling the correct function and causing late binding to take place.

All of this – setting up the VTABLE for each class, initializing the VPTR, inserting the code for the virtual function call – happens automatically, so you don’t have to worry about it. With virtual functions, the proper function gets called for an object, even if the compiler cannot know the specific type of the object.

The following sections go into this process in more detail.

Storing type information

You can see that there is no explicit type information stored in any of the classes. But the previous examples, and simple logic, tell you that there must be some sort of type information stored in the objects; otherwise the type could not be established at runtime. This is true, but the type information is hidden. To see it, here’s an example to examine the sizes of classes that use virtual functions compared with those that don’t:

//: C15:Sizes.cpp
// Object sizes with/without virtual functions
#include <iostream>
using namespace std;

class NoVirtual {
  int a;
public:
  void x() const {}
  int i() const { return 1; }
};

class OneVirtual {
  int a;
public:
  virtual void x() const {}
  int i() const { return 1; }
};

class TwoVirtuals {
  int a;
public:
  virtual void x() const {}
  virtual int i() const { return 1; }
};

int main() {
  cout << "int: " << sizeof(int) << endl;
  cout << "NoVirtual: "
       << sizeof(NoVirtual) << endl;
  cout << "void* : " << sizeof(void*) << endl;
  cout << "OneVirtual: "
       << sizeof(OneVirtual) << endl;
  cout << "TwoVirtuals: "
       << sizeof(TwoVirtuals) << endl;
} ///:~

With no virtual functions, the size of the object is exactly what you’d expect: the size of a single int. With a single virtual function in OneVirtual, the size of the object is the size of NoVirtual plus the size of a void pointer, possibly rounded up for alignment (on typical 64-bit platforms the result is 16 rather than 12). It turns out that the compiler inserts a single pointer (the VPTR) into the structure if you have one or more virtual functions. There is no size difference between OneVirtual and TwoVirtuals. That’s because the VPTR points to a table of function addresses. You need only one table because all the virtual function addresses are contained in that single table.

This example required at least one data member. If there had been no data members, the C++ compiler would have forced the objects to be a nonzero size because each object must have a distinct address. If you imagine indexing into an array of zero-sized objects, you’ll understand. A “dummy” member is inserted into objects that would otherwise be zero-sized. When the type information is inserted because of the virtual keyword, this takes the place of the “dummy” member. Try commenting out the int a in all the classes in the example above to see this.

Picturing virtual functions

To understand exactly what’s going on when you use a virtual function, it’s helpful to visualize the activities going on behind the curtain. Here’s a drawing of the array of pointers A[ ] in Instrument4.cpp:

The array of Instrument pointers has no specific type information; they each point to an object of type Instrument. Wind, Percussion, Stringed, and Brass all fit into this category because they are derived from Instrument (and thus have the same interface as Instrument, and can respond to the same messages), so their addresses can also be placed into the array. However, the compiler doesn’t know that they are anything more than Instrument objects, so left to its own devices it would normally call the base-class versions of all the functions. But in this case, all those functions have been declared with the virtual keyword, so something different happens.

Each time you create a class that contains virtual functions, or you derive from a class that contains virtual functions, the compiler creates a unique VTABLE for that class, seen on the right of the diagram. In that table it places the addresses of all the functions that are declared virtual in this class or in the base class. If you don’t override a function that was declared virtual in the base class, the compiler uses the address of the base-class version in the derived class. (You can see this in the adjust entry in the Brass VTABLE.) Then it places the VPTR (discovered in Sizes.cpp) into the class. There is only one VPTR for each object when using simple inheritance like this. The VPTR must be initialized to point to the starting address of the appropriate VTABLE. (This happens in the constructor, which you’ll see later in more detail.)

Once the VPTR is initialized to the proper VTABLE, the object in effect “knows” what type it is. But this self-knowledge is worthless unless it is used at the point a virtual function is called.

When you call a virtual function through a base-class address (the situation when the compiler doesn’t have all the information necessary to perform early binding), something special happens. Instead of performing a typical function call, which is simply an assembly-language CALL to a particular address, the compiler generates different code to perform the function call. Here’s what a call to adjust( ) for a Brass object looks like, if made through an Instrument pointer (an Instrument reference produces the same result):

The compiler begins with the Instrument pointer, which points to the starting address of the object. All Instrument objects or objects derived from Instrument have their VPTR in the same place (often at the beginning of the object), so the compiler can pick the VPTR out of the object. The VPTR points to the starting address of the VTABLE. All the VTABLE function addresses are laid out in the same order, regardless of the specific type of the object. play( ) is first, what( ) is second, and adjust( ) is third. The compiler knows that regardless of the specific object type, the adjust( ) function is at the location VPTR+2. Thus, instead of saying, “Call the function at the absolute location Instrument::adjust” (early binding; the wrong action), it generates code that says, in effect, “Call the function at VPTR+2.” Because the fetching of the VPTR and the determination of the actual function address occur at runtime, you get the desired late binding. You send a message to the object, and the object figures out what to do with it.

Under the hood

It can be helpful to see the assembly-language code generated by a virtual function call, so you can see that late-binding is indeed taking place. Here’s the output from one compiler for the call

i.adjust(1);

inside the function f(Instrument& i):

push  1
push  si
mov   bx, word ptr [si]
call  word ptr [bx+4]
add   sp, 4

The arguments of a C++ function call, like a C function call, are pushed on the stack from right to left (this order is required to support C’s variable argument lists), so the argument 1 is pushed on the stack first. At this point in the function, the register si (part of the Intel X86 processor architecture) contains the address of i. This is also pushed on the stack because it is the starting address of the object of interest. Remember that the starting address corresponds to the value of this, and this is quietly pushed on the stack as an argument before every member function call, so the member function knows which particular object it is working on. So you’ll always see one more than the number of arguments pushed on the stack before a member function call (except for static member functions, which have no this).

Now the actual virtual function call must be performed. First, the VPTR must be produced, so the VTABLE can be found. For this compiler the VPTR is inserted at the beginning of the object, so the contents of this correspond to the VPTR. The line

mov bx, word ptr [si]

fetches the word that si (that is, this) points to, which is the VPTR. It places the VPTR into the register bx.

The VPTR contained in bx points to the starting address of the VTABLE, but the function pointer to call isn’t at location zero of the VTABLE, but instead at location two (because it’s the third function in the list). For this memory model each function pointer is two bytes long, so the compiler adds four to the VPTR to calculate where the address of the proper function is. Note that this is a constant value, established at compile time, so the only thing that matters is that the function pointer at location number two is the one for adjust( ). Fortunately, the compiler takes care of all the bookkeeping for you and ensures that all the function pointers in all the VTABLEs of a particular class hierarchy occur in the same order, regardless of the order that you may override them in derived classes.

Once the address of the proper function pointer in the VTABLE is calculated, that function is called. So the address is fetched and called all at once in the statement

call word ptr [bx+4]

Finally, the stack pointer is moved back up to clean off the arguments that were pushed before the call. In C and C++ assembly code you’ll often see the caller clean off the arguments but this may vary depending on processors and compiler implementations.

Installing the vpointer

Because the VPTR determines the virtual function behavior of the object, you can see how it’s critical that the VPTR always be pointing to the proper VTABLE. You don’t ever want to be able to make a call to a virtual function before the VPTR is properly initialized. Of course, the place where initialization can be guaranteed is in the constructor, but none of the Instrument examples has a constructor.

This is where creation of the default constructor is essential. In the Instrument examples, the compiler creates a default constructor that does nothing except initialize the VPTR. This constructor, of course, is automatically called for all Instrument objects before you can do anything with them, so you know that it’s always safe to call virtual functions.

The implications of the automatic initialization of the VPTR inside the constructor are discussed in a later section.

Objects are different

It’s important to realize that upcasting deals only with addresses. If the compiler has an object, it knows the exact type and therefore (in C++) will not use late binding for any function calls – or at least, the compiler doesn’t need to use late binding. For efficiency’s sake, most compilers will perform early binding when they are making a call to a virtual function for an object because they know the exact type. Here’s an example:

//: C15:Early.cpp
// Early binding & virtual functions
#include <iostream>
#include <string>
using namespace std;

class Pet {
public:
  virtual string speak() const { return ""; }
};

class Dog : public Pet {
public:
  string speak() const { return "Bark!"; }
};

int main() {
  Dog ralph;
  Pet* p1 = &ralph;
  Pet& p2 = ralph;
  Pet p3;
  // Late binding for both:
  cout << "p1->speak() = " << p1->speak() << endl;
  cout << "p2.speak() = " << p2.speak() << endl;
  // Early binding (probably):
  cout << "p3.speak() = " << p3.speak() << endl;
} ///:~

In p1->speak( ) and p2.speak( ), addresses are used, which means the information is incomplete: p1 and p2 can represent the address of a Pet or something derived from Pet, so the virtual mechanism must be used. When calling p3.speak( ) there’s no ambiguity. The compiler knows the exact type and that it’s an object, so it can’t possibly be an object derived from Pet; it’s exactly a Pet. Thus, early binding is probably used. However, if the compiler doesn’t want to work so hard, it can still use late binding and the same behavior will occur.
