Am avut 371697 vizite de la lansarea siteului.




Inapoi        Inainte       Cuprins

6: Initialization & Cleanup

Chapter 4 made a significant improvement in library
use by taking all the scattered components of a typical
C library and encapsulating them into a structure (an abstract data type, called a class from now on).

This not only provides a single unified point of entry into a library component, but it also hides the names of the functions within the class name. In Chapter 5, access control (implementation hiding) was introduced. This gives the class designer a way to establish clear boundaries for determining what the client programmer is allowed to manipulate and what is off limits. It means the internal mechanisms of a data type’s operation are under the control and discretion of the class designer, and it’s clear to client programmers what members they can and should pay attention to.

Together, encapsulation and access control make a significant step in improving the ease of library use. The concept of “new data type” they provide is better in some ways than the existing built-in data types from C. The C++ compiler can now provide type-checking guarantees for that data type and thus ensure a level of safety when that data type is being used.

When it comes to safety, however, there’s a lot more the compiler can do for us than C provides. In this and future chapters, you’ll see additional features that have been engineered into C++ that make the bugs in your program almost leap out and grab you, sometimes before you even compile the program, but usually in the form of compiler warnings and errors. For this reason, you will soon get used to the unlikely-sounding scenario that a C++ program that compiles often runs right the first time.

Two of these safety issues are initialization and cleanup. A large segment of C bugs occur when the programmer forgets to initialize or clean up a variable. This is especially true with C libraries, when client programmers don’t know how to initialize a struct, or even that they must. (Libraries often do not include an initialization function, so the client programmer is forced to initialize the struct by hand.) Cleanup is a special problem because C programmers are comfortable with forgetting about variables once they are finished, so any cleaning up that may be necessary for a library’s struct is often missed.

In C++, the concept of initialization and cleanup is essential for easy library use and to eliminate the many subtle bugs that occur when the client programmer forgets to perform these activities. This chapter examines the features in C++ that help guarantee proper initialization and cleanup.

Guaranteed initialization with the constructor

Both the Stash and Stack classes defined previously have a function called initialize( ), which hints by its name that it should be called before using the object in any other way. Unfortunately, this means the client programmer must ensure proper initialization. Client programmers are prone to miss details like initialization in their headlong rush to make your amazing library solve their problem. In C++, initialization is too important to leave to the client programmer. The class designer can guarantee initialization of every object by providing a special function called the constructor. If a class has a constructor, the compiler automatically calls that constructor at the point an object is created, before client programmers can get their hands on the object. The constructor call isn’t even an option for the client programmer; it is performed by the compiler at the point the object is defined.

The next challenge is what to name this function. There are two issues. The first is that any name you use is something that can potentially clash with a name you might like to use as a member in the class. The second is that because the compiler is responsible for calling the constructor, it must always know which function to call. The solution Stroustrup chose seems the easiest and most logical: the name of the constructor is the same as the name of the class. It makes sense that such a function will be called automatically on initialization.

Here’s a simple class with a constructor:

class X {
  int i;
public:
  X();  // Constructor
}; 

Now, when an object is defined,

void f() {
  X a;
  // ...
} 

the same thing happens as if a were an int: storage is allocated for the object. But when the program reaches the sequence point (point of execution) where a is defined, the constructor is called automatically. That is, the compiler quietly inserts the call to X::X( ) for the object a at the point of definition. Like any member function, the first (secret) argument to the constructor is the this pointer – the address of the object for which it is being called. In the case of the constructor, however, this is pointing to an un-initialized block of memory, and it’s the job of the constructor to initialize this memory properly.

Like any function, the constructor can have arguments to allow you to specify how an object is created, give it initialization values, and so on. Constructor arguments provide you with a way to guarantee that all parts of your object are initialized to appropriate values. For example, if a class Tree has a constructor that takes a single integer argument denoting the height of the tree, then you must create a tree object like this:

Tree t(12);  // 12-foot tree

If Tree(int) is your only constructor, the compiler won’t let you create an object any other way. (We’ll look at multiple constructors and different ways to call constructors in the next chapter.)

That’s really all there is to a constructor; it’s a specially named function that is called automatically by the compiler for every object at the point of that object’s creation. Despite it’s simplicity, it is exceptionally valuable because it eliminates a large class of problems and makes the code easier to write and read. In the preceding code fragment, for example, you don’t see an explicit function call to some initialize( ) function that is conceptually separate from definition. In C++, definition and initialization are unified concepts – you can’t have one without the other.

Both the constructor and destructor are very unusual types of functions: they have no return value. This is distinctly different from a void return value, in which the function returns nothing but you still have the option to make it something else. Constructors and destructors return nothing and you don’t have an option. The acts of bringing an object into and out of the program are special, like birth and death, and the compiler always makes the function calls itself, to make sure they happen. If there were a return value, and if you could select your own, the compiler would somehow have to know what to do with the return value, or the client programmer would have to explicitly call constructors and destructors, which would eliminate their safety.

Guaranteed cleanup with the destructor

As a C programmer, you often think about the importance of initialization, but it’s rarer to think about cleanup. After all, what do you need to do to clean up an int? Just forget about it. However, with libraries, just “letting go” of an object once you’re done with it is not so safe. What if it modifies some piece of hardware, or puts something on the screen, or allocates storage on the heap? If you just forget about it, your object never achieves closure upon its exit from this world. In C++, cleanup is as important as initialization and is therefore guaranteed with the destructor.

The syntax for the destructor is similar to that for the constructor: the class name is used for the name of the function. However, the destructor is distinguished from the constructor by a leading tilde (~). In addition, the destructor never has any arguments because destruction never needs any options. Here’s the declaration for a destructor:

class Y {
public:
  ~Y();
}; 

The destructor is called automatically by the compiler when the object goes out of scope. You can see where the constructor gets called by the point of definition of the object, but the only evidence for a destructor call is the closing brace of the scope that surrounds the object. Yet the destructor is still called, even when you use goto to jump out of a scope. (goto still exists in C++ for backward compatibility with C and for the times when it comes in handy.) You should note that a nonlocal goto, implemented by the Standard C library functions setjmp( ) and longjmp( ), doesn’t cause destructors to be called. (This is the specification, even if your compiler doesn’t implement it that way. Relying on a feature that isn’t in the specification means your code is nonportable.)

Here’s an example demonstrating the features of constructors and destructors you’ve seen so far:

//: C06:Constructor1.cpp
// Constructors & destructors
#include <iostream>
using namespace std;

class Tree {
  int height;
public:
  Tree(int initialHeight);  // Constructor
  ~Tree();  // Destructor
  void grow(int years);
  void printsize();
};

Tree::Tree(int initialHeight) {
  height = initialHeight;
}

Tree::~Tree() {
  cout << "inside Tree destructor" << endl;
  printsize();
}

void Tree::grow(int years) {
  height += years;
}

void Tree::printsize() {
  cout << "Tree height is " << height << endl;
}

int main() {
  cout << "before opening brace" << endl;
  {
    Tree t(12);
    cout << "after Tree creation" << endl;
    t.printsize();
    t.grow(4);
    cout << "before closing brace" << endl;
  }
  cout << "after closing brace" << endl;
} ///:~

Here’s the output of the above program:

before opening brace
after Tree creation
Tree height is 12
before closing brace
inside Tree destructor
Tree height is 16
after closing brace

You can see that the destructor is automatically called at the closing brace of the scope that encloses it.

Elimination of the definition block

In C, you must always define all the variables at the beginning of a block, after the opening brace. This is not an uncommon requirement in programming languages, and the reason given has often been that it’s “good programming style.” On this point, I have my suspicions. It has always seemed inconvenient to me, as a programmer, to pop back to the beginning of a block every time I need a new variable. I also find code more readable when the variable definition is close to its point of use.

Perhaps these arguments are stylistic. In C++, however, there’s a significant problem in being forced to define all objects at the beginning of a scope. If a constructor exists, it must be called when the object is created. However, if the constructor takes one or more initialization arguments, how do you know you will have that initialization information at the beginning of a scope? In the general programming situation, you won’t. Because C has no concept of private, this separation of definition and initialization is no problem. However, C++ guarantees that when an object is created, it is simultaneously initialized. This ensures that you will have no uninitialized objects running around in your system. C doesn’t care; in fact, C encourages this practice by requiring you to define variables at the beginning of a block before you necessarily have the initialization information[38].

In general, C++ will not allow you to create an object before you have the initialization information for the constructor. Because of this, the language wouldn’t be feasible if you had to define variables at the beginning of a scope. In fact, the style of the language seems to encourage the definition of an object as close to its point of use as possible. In C++, any rule that applies to an “object” automatically refers to an object of a built-in type as well. This means that any class object or variable of a built-in type can also be defined at any point in a scope. It also means that you can wait until you have the information for a variable before defining it, so you can always define and initialize at the same time:

//: C06:DefineInitialize.cpp
// Defining variables anywhere
#include "../require.h"
#include <iostream>
#include <string>
using namespace std;

class G {
  int i;
public:
  G(int ii);
};

G::G(int ii) { i = ii; }

int main() {
  cout << "initialization value? ";
  int retval = 0;
  cin >> retval;
  require(retval != 0);
  int y = retval + 3;
  G g(y);
} ///:~

You can see that some code is executed, then retval is defined, initialized, and used to capture user input, and then y and g are defined. C, on the other hand, does not allow a variable to be defined anywhere except at the beginning of the scope.

In general, you should define variables as close to their point of use as possible, and always initialize them when they are defined. (This is a stylistic suggestion for built-in types, where initialization is optional.) This is a safety issue. By reducing the duration of the variable’s availability within the scope, you are reducing the chance it will be misused in some other part of the scope. In addition, readability is improved because the reader doesn’t have to jump back and forth to the beginning of the scope to know the type of a variable.

for loops

In C++, you will often see a for loop counter defined right inside the for expression:

for(int j = 0; j < 100; j++) {
    cout << "j = " << j << endl;
}
for(int i = 0; i < 100; i++)
 cout << "i = " << i << endl;

The statements above are important special cases, which cause confusion to new C++ programmers.

The variables i and j are defined directly inside the for expression (which you cannot do in C). They are then available for use in the for loop. It’s a very convenient syntax because the context removes all question about the purpose of i and j, so you don’t need to use such ungainly names as i_loop_counter for clarity.

However, some confusion may result if you expect the lifetimes of the variables i and j to extend beyond the scope of the for loop – they do not[39].

Chapter 3 points out that while and switch statements also allow the definition of objects in their control expressions, although this usage seems far less important than with the for loop.

Watch out for local variables that hide variables from the enclosing scope. In general, using the same name for a nested variable and a variable that is global to that scope is confusing and error prone[40].

I find small scopes an indicator of good design. If you have several pages for a single function, perhaps you’re trying to do too much with that function. More granular functions are not only more useful, but it’s also easier to find bugs.

Storage allocation

A variable can now be defined at any point in a scope, so it might seem that the storage for a variable may not be defined until its point of definition. It’s actually more likely that the compiler will follow the practice in C of allocating all the storage for a scope at the opening brace of that scope. It doesn’t matter because, as a programmer, you can’t access the storage (a.k.a. the object) until it has been defined[41]. Although the storage is allocated at the beginning of the block, the constructor call doesn’t happen until the sequence point where the object is defined because the identifier isn’t available until then. The compiler even checks to make sure that you don’t put the object definition (and thus the constructor call) where the sequence point only conditionally passes through it, such as in a switch statement or somewhere a goto can jump past it. Uncommenting the statements in the following code will generate a warning or an error:

//: C06:Nojump.cpp
// Can't jump past constructors

class X {
public:
  X();
};

X::X() {}

void f(int i) {
  if(i < 10) {
   //! goto jump1; // Error: goto bypasses init
  }
  X x1;  // Constructor called here
 jump1:
  switch(i) {
    case 1 :
      X x2;  // Constructor called here
      break;
  //! case 2 : // Error: case bypasses init
      X x3;  // Constructor called here
      break;
  }
} 

int main() {
  f(9);
  f(11);
}///:~ 

In the code above, both the goto and the switch can potentially jump past the sequence point where a constructor is called. That object will then be in scope even if the constructor hasn’t been called, so the compiler gives an error message. This once again guarantees that an object cannot be created unless it is also initialized.

All the storage allocation discussed here happens, of course, on the stack. The storage is allocated by the compiler by moving the stack pointer “down” (a relative term, which may indicate an increase or decrease of the actual stack pointer value, depending on your machine). Objects can also be allocated on the heap using new, which is something we’ll explore further in Chapter 13.

Stash with constructors and destructors

The examples from previous chapters have obvious functions that map to constructors and destructors: initialize( ) and cleanup( ). Here’s the Stash header using constructors and destructors:

//: C06:Stash2.h
// With constructors & destructors
#ifndef STASH2_H
#define STASH2_H

class Stash {
  int size;      // Size of each space
  int quantity;  // Number of storage spaces
  int next;      // Next empty space
  // Dynamically allocated array of bytes:
  unsigned char* storage;
  void inflate(int increase);
public:
  Stash(int size);
  ~Stash();
  int add(void* element);
  void* fetch(int index);
  int count();
};
#endif // STASH2_H ///:~

The only member function definitions that are changed are initialize( ) and cleanup( ), which have been replaced with a constructor and destructor:

//: C06:Stash2.cpp {O}
// Constructors & destructors
#include "Stash2.h"
#include "../require.h"
#include <iostream>
#include <cassert>
using namespace std;
const int increment = 100;

Stash::Stash(int sz) {
  size = sz;
  quantity = 0;
  storage = 0;
  next = 0;
}

int Stash::add(void* element) {
  if(next >= quantity) // Enough space left?
    inflate(increment);
  // Copy element into storage,
  // starting at next empty space:
  int startBytes = next * size;
  unsigned char* e = (unsigned char*)element;
  for(int i = 0; i < size; i++)
    storage[startBytes + i] = e[i];
  next++;
  return(next - 1); // Index number
}

void* Stash::fetch(int index) {
  require(0 <= index, "Stash::fetch (-)index");
  if(index >= next)
    return 0; // To indicate the end
  // Produce pointer to desired element:
  return &(storage[index * size]);
}

int Stash::count() {
  return next; // Number of elements in CStash
}

void Stash::inflate(int increase) {
  require(increase > 0, 
    "Stash::inflate zero or negative increase");
  int newQuantity = quantity + increase;
  int newBytes = newQuantity * size;
  int oldBytes = quantity * size;
  unsigned char* b = new unsigned char[newBytes];
  for(int i = 0; i < oldBytes; i++)
    b[i] = storage[i]; // Copy old to new
  delete [](storage); // Old storage
  storage = b; // Point to new memory
  quantity = newQuantity;
}

Stash::~Stash() {
  if(storage != 0) {
   cout << "freeing storage" << endl;
   delete []storage;
  }
} ///:~

You can see that the require.h functions are being used to watch for programmer errors, instead of assert( ). The output of a failed assert( ) is not as useful as that of the require.h functions (which will be shown later in the book).

Because inflate( ) is private, the only way a require( ) could fail is if one of the other member functions accidentally passed an incorrect value to inflate( ). If you are certain this can’t happen, you could consider removing the require( ), but you might keep in mind that until the class is stable, there’s always the possibility that new code might be added to the class that could cause errors. The cost of the require( ) is low (and could be automatically removed using the preprocessor) and the value of code robustness is high.

Notice in the following test program how the definitions for Stash objects appear right before they are needed, and how the initialization appears as part of the definition, in the constructor argument list:

//: C06:Stash2Test.cpp
//{L} Stash2
// Constructors & destructors
#include "Stash2.h"
#include "../require.h"
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main() {
  Stash intStash(sizeof(int));
  for(int i = 0; i < 100; i++)
    intStash.add(&i);
  for(int j = 0; j < intStash.count(); j++)
    cout << "intStash.fetch(" << j << ") = "
         << *(int*)intStash.fetch(j)
         << endl;
  const int bufsize = 80;
  Stash stringStash(sizeof(char) * bufsize);
  ifstream in("Stash2Test.cpp");
  assure(in, " Stash2Test.cpp");
  string line;
  while(getline(in, line))
    stringStash.add((char*)line.c_str());
  int k = 0;
  char* cp;
  while((cp = (char*)stringStash.fetch(k++))!=0)
    cout << "stringStash.fetch(" << k << ") = "
         << cp << endl;
} ///:~

Also notice how the cleanup( ) calls have been eliminated, but the destructors are still automatically called when intStash and stringStash go out of scope.

One thing to be aware of in the Stash examples: I’m being very careful to use only built-in types; that is, those without destructors. If you were to try to copy class objects into the Stash, you’d run into all kinds of problems and it wouldn’t work right. The Standard C++ Library can actually make correct copies of objects into its containers, but this is a rather messy and complicated process. In the following Stack example, you’ll see that pointers are used to sidestep this issue, and in a later chapter the Stash will be converted so that it uses pointers.

Home   |   Web Faq   |   Radio Online   |   About   |   Products   |   Webmaster Login

The quality software developer.™
© 2003-2004 ruben|labs corp. All Rights Reserved.
Timp de generare a paginii: 17583 secunde
Versiune site: 1.8 SP3 (build 2305-rtm.88542-10.2004)