.:: Welcome to software development web corner ::.

Am avut 372444 vizite de la lansarea siteului.

Inapoi Inainte Cuprins

Composite type creation

The fundamental data types and their variations are essential, but rather primitive. C and C++ provide tools that allow you to compose more sophisticated data types from the fundamental data types. As you’ll see, the most important of these is struct, which is the foundation for class in C++. However, the simplest way to create more sophisticated types is simply to alias a name to another name via typedef.

Aliasing names with typedef

This keyword promises more than it delivers: typedef suggests “type definition” when “alias” would probably have been a more accurate description, since that’s what it really does. The syntax is:

typedef existing-type-description alias-name

People often use typedef when data types get slightly complicated, just to prevent extra keystrokes. Here is a commonly-used typedef:

typedef unsigned long ulong;

Now if you say ulong the compiler knows that you mean unsigned long. You might think that this could as easily be accomplished using preprocessor substitution, but there are key situations in which the compiler must be aware that you’re treating a name as if it were a type, so typedef is essential.

One place where typedef comes in handy is for pointer types. As previously mentioned, if you say:

int* x, y;

This actually produces an int* which is x and an int (not an int*) which is y. That is, the ‘*’ binds to the right, not the left. However, if you use a typedef:

typedef int* IntPtr;
IntPtr x, y;

Then both x and y are of type int*.

You can argue that it’s more explicit and therefore more readable to avoid typedefs for primitive types, and indeed programs rapidly become difficult to read when many typedefs are used. However, typedefs become especially important in C when used with struct.

Combining variables with struct

A struct is a way to collect a group of variables into a structure. Once you create a struct, then you can make many instances of this “new” type of variable you’ve invented. For example:

//: C03:SimpleStruct.cpp
struct Structure1 {
  char c;
  int i;
  float f;
  double d;
};

int main() {
  struct Structure1 s1, s2;
  s1.c = 'a'; // Select an element using a '.'
  s1.i = 1;
  s1.f = 3.14;
  s1.d = 0.00093;
  s2.c = 'a';
  s2.i = 1;
  s2.f = 3.14;
  s2.d = 0.00093;
} ///:~

The struct declaration must end with a semicolon. In main( ), two instances of Structure1 are created: s1 and s2. Each of these has their own separate versions of c, i, f, and d. So s1 and s2 represent clumps of completely independent variables. To select one of the elements within s1 or s2, you use a ‘.’, syntax you’ve seen in the previous chapter when using C++ class objects – since classes evolved from structs, this is where that syntax arose from.

One thing you’ll notice is the awkwardness of the use of Structure1 (as it turns out, this is only required by C, not C++). In C, you can’t just say Structure1 when you’re defining variables, you must say struct Structure1. This is where typedef becomes especially handy in C:

//: C03:SimpleStruct2.cpp
// Using typedef with struct
typedef struct {
  char c;
  int i;
  float f;
  double d;
} Structure2;

int main() {
  Structure2 s1, s2;
  s1.c = 'a';
  s1.i = 1;
  s1.f = 3.14;
  s1.d = 0.00093;
  s2.c = 'a';
  s2.i = 1;
  s2.f = 3.14;
  s2.d = 0.00093;
} ///:~

By using typedef in this way, you can pretend (in C; try removing the typedef for C++) that Structure2 is a built-in type, like int or float, when you define s1 and s2 (but notice it only has data – characteristics – and does not include behavior, which is what we get with real objects in C++). You’ll notice that the struct identifier has been left off at the beginning, because the goal is to create the typedef. However, there are times when you might need to refer to the struct during its definition. In those cases, you can actually repeat the name of the struct as the struct name and as the typedef:

//: C03:SelfReferential.cpp
// Allowing a struct to refer to itself

typedef struct SelfReferential {
  int i;
  SelfReferential* sr; // Head spinning yet?
} SelfReferential;

int main() {
  SelfReferential sr1, sr2;
  sr1.sr = &sr2;
  sr2.sr = &sr1;
  sr1.i = 47;
  sr2.i = 1024;
} ///:~

If you look at this for awhile, you’ll see that sr1 and sr2 point to each other, as well as each holding a piece of data.

Actually, the struct name does not have to be the same as the typedef name, but it is usually done this way as it tends to keep things simpler.

Pointers and structs

In the examples above, all the structs are manipulated as objects. However, like any piece of storage, you can take the address of a struct object (as seen in SelfReferential.cpp above). To select the elements of a particular struct object, you use a ‘.’, as seen above. However, if you have a pointer to a struct object, you must select an element of that object using a different operator: the ‘->’. Here’s an example:

//: C03:SimpleStruct3.cpp
// Using pointers to structs
typedef struct Structure3 {
  char c;
  int i;
  float f;
  double d;
} Structure3;

int main() {
  Structure3 s1, s2;
  Structure3* sp = &s1;
  sp->c = 'a';
  sp->i = 1;
  sp->f = 3.14;
  sp->d = 0.00093;
  sp = &s2; // Point to a different struct object
  sp->c = 'a';
  sp->i = 1;
  sp->f = 3.14;
  sp->d = 0.00093;
} ///:~

In main( ), the struct pointer sp is initially pointing to s1, and the members of s1 are initialized by selecting them with the ‘->’ (and you use this same operator in order to read those members). But then sp is pointed to s2, and those variables are initialized the same way. So you can see that another benefit of pointers is that they can be dynamically redirected to point to different objects; this provides more flexibility in your programming, as you will learn.

For now, that’s all you need to know about structs, but you’ll become much more comfortable with them (and especially their more potent successors, classes) as the book progresses.

Clarifying programs with enum

An enumerated data type is a way of attaching names to numbers, thereby giving more meaning to anyone reading the code. The enum keyword (from C) automatically enumerates any list of identifiers you give it by assigning them values of 0, 1, 2, etc. You can declare enum variables (which are always represented as integral values). The declaration of an enum looks similar to a struct declaration.

An enumerated data type is useful when you want to keep track of some sort of feature:

//: C03:Enum.cpp
// Keeping track of shapes

enum ShapeType {
  circle,
  square,
  rectangle
};  // Must end with a semicolon like a struct

int main() {
  ShapeType shape = circle;
  // Activities here....
  // Now do something based on what the shape is:
  switch(shape) {
    case circle:  /* circle stuff */ break;
    case square:  /* square stuff */ break;
    case rectangle:  /* rectangle stuff */ break;
  }
} ///:~

shape is a variable of the ShapeType enumerated data type, and its value is compared with the value in the enumeration. Since shape is really just an int, however, it can be any value an int can hold (including a negative number). You can also compare an int variable with a value in the enumeration.

You should be aware that the example above of switching on type turns out to be a problematic way to program. C++ has a much better way to code this sort of thing, the explanation of which must be delayed until much later in the book.

If you don’t like the way the compiler assigns values, you can do it yourself, like this:

enum ShapeType { 
  circle = 10, square = 20, rectangle = 50
};

If you give values to some names and not to others, the compiler will use the next integral value. For example,

enum snap { crackle = 25, pop };

The compiler gives pop the value 26.

You can see how much more readable the code is when you use enumerated data types. However, to some degree this is still an attempt (in C) to accomplish the things that we can do with a class in C++, so you’ll see enum used less in C++.

Type checking for enumerations

C’s enumerations are fairly primitive, simply associating integral values with names, but they provide no type checking. In C++, as you may have come to expect by now, the concept of type is fundamental, and this is true with enumerations. When you create a named enumeration, you effectively create a new type just as you do with a class: The name of your enumeration becomes a reserved word for the duration of that translation unit.

In addition, there’s stricter type checking for enumerations in C++ than in C. You’ll notice this in particular if you have an instance of an enumeration color called a. In C you can say a++, but in C++ you can’t. This is because incrementing an enumeration is performing two type conversions, one of them legal in C++ and one of them illegal. First, the value of the enumeration is implicitly cast from a color to an int, then the value is incremented, then the int is cast back into a color. In C++ this isn’t allowed, because color is a distinct type and not equivalent to an int. This makes sense, because how do you know the increment of blue will even be in the list of colors? If you want to increment a color, then it should be a class (with an increment operation) and not an enum, because the class can be made to be much safer. Any time you write code that assumes an implicit conversion to an enum type, the compiler will flag this inherently dangerous activity.

Unions (described next) have similar additional type checking in C++.

Saving memory with union

Sometimes a program will handle different types of data using the same variable. In this situation, you have two choices: you can create a struct containing all the possible different types you might need to store, or you can use a union. A union piles all the data into a single space; it figures out the amount of space necessary for the largest item you’ve put in the union, and makes that the size of the union. Use a union to save memory.

Anytime you place a value in a union, the value always starts in the same place at the beginning of the union, but only uses as much space as is necessary. Thus, you create a “super-variable” capable of holding any of the union variables. All the addresses of the union variables are the same (in a class or struct, the addresses are different).

Here’s a simple use of a union. Try removing various elements and see what effect it has on the size of the union. Notice that it makes no sense to declare more than one instance of a single data type in a union (unless you’re just doing it to use a different name).

//: C03:Union.cpp
// The size and simple use of a union
#include <iostream>
using namespace std;

union Packed { // Declaration similar to a class
  char i;
  short j;
  int k;
  long l;
  float f;
  double d;  
  // The union will be the size of a 
  // double, since that's the largest element
};  // Semicolon ends a union, like a struct

int main() {
  cout << "sizeof(Packed) = " 
       << sizeof(Packed) << endl;
  Packed x;
  x.i = 'c';
  cout << x.i << endl;
  x.d = 3.14159;
  cout << x.d << endl;
} ///:~

The compiler performs the proper assignment according to the union member you select.

Once you perform an assignment, the compiler doesn’t care what you do with the union. In the example above, you could assign a floating-point value to x:

x.f = 2.222;

and then send it to the output as if it were an int:

cout << x.i;

This would produce garbage.

Arrays

Arrays are a kind of composite type because they allow you to clump a lot of variables together, one right after the other, under a single identifier name. If you say:

int a[10];

You create storage for 10 int variables stacked on top of each other, but without unique identifier names for each variable. Instead, they are all lumped under the name a.

To access one of these array elements, you use the same square-bracket syntax that you use to define an array:

a[5] = 47;

However, you must remember that even though the size of a is 10, you select array elements starting at zero (this is sometimes called zero indexing), so you can select only the array elements 0-9, like this:

//: C03:Arrays.cpp
#include <iostream>
using namespace std;

int main() {
  int a[10];
  for(int i = 0; i < 10; i++) {
    a[i] = i * 10;
    cout << "a[" << i << "] = " << a[i] << endl;
  }
} ///:~

Array access is extremely fast. However, if you index past the end of the array, there is no safety net – you’ll step on other variables. The other drawback is that you must define the size of the array at compile time; if you want to change the size at runtime you can’t do it with the syntax above (C does have a way to create an array dynamically, but it’s significantly messier). The C++ vector, introduced in the previous chapter, provides an array-like object that automatically resizes itself, so it is usually a much better solution if your array size cannot be known at compile time.

You can make an array of any type, even of structs:

//: C03:StructArray.cpp
// An array of struct

typedef struct {
  int i, j, k;
} ThreeDpoint;

int main() {
  ThreeDpoint p[10];
  for(int i = 0; i < 10; i++) {
    p[i].i = i + 1;
    p[i].j = i + 2;
    p[i].k = i + 3;
  }
} ///:~

Notice how the struct identifier i is independent of the for loop’s i.

To see that each element of an array is contiguous with the next, you can print out the addresses like this:

//: C03:ArrayAddresses.cpp
#include <iostream>
using namespace std;

int main() {
  int a[10];
  cout << "sizeof(int) = "<< sizeof(int) << endl;
  for(int i = 0; i < 10; i++)
    cout << "&a[" << i << "] = " 
         << (long)&a[i] << endl;
} ///:~

When you run this program, you’ll see that each element is one int size away from the previous one. That is, they are stacked one on top of the other.

Pointers and arrays

The identifier of an array is unlike the identifiers for ordinary variables. For one thing, an array identifier is not an lvalue; you cannot assign to it. It’s really just a hook into the square-bracket syntax, and when you give the name of an array, without square brackets, what you get is the starting address of the array:

//: C03:ArrayIdentifier.cpp
#include <iostream>
using namespace std;

int main() {
  int a[10];
  cout << "a = " << a << endl;
  cout << "&a[0] =" << &a[0] << endl;
} ///:~

When you run this program you’ll see that the two addresses (which will be printed in hexadecimal, since there is no cast to long) are the same.

So one way to look at the array identifier is as a read-only pointer to the beginning of an array. And although we can’t change the array identifier to point somewhere else, we can create another pointer and use that to move around in the array. In fact, the square-bracket syntax works with regular pointers as well:

//: C03:PointersAndBrackets.cpp
int main() {
  int a[10];
  int* ip = a;
  for(int i = 0; i < 10; i++)
    ip[i] = i * 10;
} ///:~

The fact that naming an array produces its starting address turns out to be quite important when you want to pass an array to a function. If you declare an array as a function argument, what you’re really declaring is a pointer. So in the following example, func1( ) and func2( ) effectively have the same argument lists:

//: C03:ArrayArguments.cpp
#include <iostream>
#include <string>
using namespace std;

void func1(int a[], int size) {
  for(int i = 0; i < size; i++)
    a[i] = i * i - i;
}

void func2(int* a, int size) {
  for(int i = 0; i < size; i++)
    a[i] = i * i + i;
}

void print(int a[], string name, int size) {
  for(int i = 0; i < size; i++)
    cout << name << "[" << i << "] = " 
         << a[i] << endl;
}

int main() {
  int a[5], b[5];
  // Probably garbage values:
  print(a, "a", 5);
  print(b, "b", 5);
  // Initialize the arrays:
  func1(a, 5);
  func1(b, 5);
  print(a, "a", 5);
  print(b, "b", 5);
  // Notice the arrays are always modified:
  func2(a, 5);
  func2(b, 5);
  print(a, "a", 5);
  print(b, "b", 5);
} ///:~

Even though func1( ) and func2( ) declare their arguments differently, the usage is the same inside the function. There are some other issues that this example reveals: arrays cannot be passed by value[32], that is, you never automatically get a local copy of the array that you pass into a function. Thus, when you modify an array, you’re always modifying the outside object. This can be a bit confusing at first, if you’re expecting the pass-by-value provided with ordinary arguments.

You’ll notice that print( ) uses the square-bracket syntax for array arguments. Even though the pointer syntax and the square-bracket syntax are effectively the same when passing arrays as arguments, the square-bracket syntax makes it clearer to the reader that you mean for this argument to be an array.

Also note that the size argument is passed in each case. Just passing the address of an array isn’t enough information; you must always be able to know how big the array is inside your function, so you don’t run off the end of that array.

Arrays can be of any type, including arrays of pointers. In fact, when you want to pass command-line arguments into your program, C and C++ have a special argument list for main( ), which looks like this:

int main(int argc, char* argv[]) { // ...

The first argument is the number of elements in the array, which is the second argument. The second argument is always an array of char*, because the arguments are passed from the command line as character arrays (and remember, an array can be passed only as a pointer). Each whitespace-delimited cluster of characters on the command line is turned into a separate array argument. The following program prints out all its command-line arguments by stepping through the array:

//: C03:CommandLineArgs.cpp
#include <iostream>
using namespace std;

int main(int argc, char* argv[]) {
  cout << "argc = " << argc << endl;
  for(int i = 0; i < argc; i++)
    cout << "argv[" << i << "] = " 
         << argv[i] << endl;
} ///:~

You’ll notice that argv[0] is the path and name of the program itself. This allows the program to discover information about itself. It also adds one more to the array of program arguments, so a common error when fetching command-line arguments is to grab argv[0] when you want argv[1].

You are not forced to use argc and argv as identifiers in main( ); those identifiers are only conventions (but it will confuse people if you don’t use them). Also, there is an alternate way to declare argv:

int main(int argc, char** argv) { // ...

Both forms are equivalent, but I find the version used in this book to be the most intuitive when reading the code, since it says, directly, “This is an array of character pointers.”

All you get from the command-line is character arrays; if you want to treat an argument as some other type, you are responsible for converting it inside your program. To facilitate the conversion to numbers, there are some helper functions in the Standard C library, declared in <cstdlib>. The simplest ones to use are atoi( ), atol( ), and atof( ) to convert an ASCII character array to an int, long, and double floating-point value, respectively. Here’s an example using atoi( ) (the other two functions are called the same way):

//: C03:ArgsToInts.cpp
// Converting command-line arguments to ints
#include <iostream>
#include <cstdlib>
using namespace std;

int main(int argc, char* argv[]) {
  for(int i = 1; i < argc; i++)
    cout << atoi(argv[i]) << endl;
} ///:~

In this program, you can put any number of arguments on the command line. You’ll notice that the for loop starts at the value 1 to skip over the program name at argv[0]. Also, if you put a floating-point number containing a decimal point on the command line, atoi( ) takes only the digits up to the decimal point. If you put non-numbers on the command line, these come back from atoi( ) as zero.

Exploring floating-point format

The printBinary( ) function introduced earlier in this chapter is handy for delving into the internal structure of various data types. The most interesting of these is the floating-point format that allows C and C++ to store numbers representing very large and very small values in a limited amount of space. Although the details can’t be completely exposed here, the bits inside of floats and doubles are divided into three regions: the exponent, the mantissa, and the sign bit; thus it stores the values using scientific notation. The following program allows you to play around by printing out the binary patterns of various floating point numbers so you can deduce for yourself the scheme used in your compiler’s floating-point format (usually this is the IEEE standard for floating point numbers, but your compiler may not follow that):

//: C03:FloatingAsBinary.cpp
//{L} printBinary
//{T} 3.14159
#include "printBinary.h"
#include <cstdlib>
#include <iostream>
using namespace std;

int main(int argc, char* argv[]) {
  if(argc != 2) {
    cout << "Must provide a number" << endl;
    exit(1);
  }
  double d = atof(argv[1]);
  unsigned char* cp = 
    reinterpret_cast<unsigned char*>(&d);
  for(int i = sizeof(double); i > 0 ; i -= 2) {
    printBinary(cp[i-1]);
    printBinary(cp[i]);
  }
} ///:~

First, the program guarantees that you’ve given it an argument by checking the value of argc, which is two if there’s a single argument (it’s one if there are no arguments, since the program name is always the first element of argv). If this fails, a message is printed and the Standard C Library function exit( ) is called to terminate the program.

The program grabs the argument from the command line and converts the characters to a double using atof( ). Then the double is treated as an array of bytes by taking the address and casting it to an unsigned char*. Each of these bytes is passed to printBinary( ) for display.

This example has been set up to print the bytes in an order such that the sign bit appears first – on my machine. Yours may be different, so you might want to re-arrange the way things are printed. You should also be aware that floating-point formats are not trivial to understand; for example, the exponent and mantissa are not generally arranged on byte boundaries, but instead a number of bits is reserved for each one and they are packed into the memory as tightly as possible. To truly see what’s going on, you’d need to find out the size of each part of the number (sign bits are always one bit, but exponents and mantissas are of differing sizes) and print out the bits in each part separately.

Pointer arithmetic

If all you could do with a pointer that points at an array is treat it as if it were an alias for that array, pointers into arrays wouldn’t be very interesting. However, pointers are more flexible than this, since they can be modified to point somewhere else (but remember, the array identifier cannot be modified to point somewhere else).

Pointer arithmetic refers to the application of some of the arithmetic operators to pointers. The reason pointer arithmetic is a separate subject from ordinary arithmetic is that pointers must conform to special constraints in order to make them behave properly. For example, a common operator to use with pointers is ++, which “adds one to the pointer.” What this actually means is that the pointer is changed to move to “the next value,” whatever that means. Here’s an example:

//: C03:PointerIncrement.cpp
#include <iostream>
using namespace std;

int main() {
  int i[10];
  double d[10];
  int* ip = i;
  double* dp = d;
  cout << "ip = " << (long)ip << endl;
  ip++;
  cout << "ip = " << (long)ip << endl;
  cout << "dp = " << (long)dp << endl;
  dp++;
  cout << "dp = " << (long)dp << endl;
} ///:~

For one run on my machine, the output is:

ip = 6684124
ip = 6684128
dp = 6684044
dp = 6684052

What’s interesting here is that even though the operation ++ appears to be the same operation for both the int* and the double*, you can see that the pointer has been changed only 4 bytes for the int* but 8 bytes for the double*. Not coincidentally, these are the sizes of int and double on my machine. And that’s the trick of pointer arithmetic: the compiler figures out the right amount to change the pointer so that it’s pointing to the next element in the array (pointer arithmetic is only meaningful within arrays). This even works with arrays of structs:

//: C03:PointerIncrement2.cpp
#include <iostream>
using namespace std;

typedef struct {
  char c;
  short s;
  int i;
  long l;
  float f;
  double d;
  long double ld;
} Primitives;

int main() {
  Primitives p[10];
  Primitives* pp = p;
  cout << "sizeof(Primitives) = " 
       << sizeof(Primitives) << endl;
  cout << "pp = " << (long)pp << endl;
  pp++;
  cout << "pp = " << (long)pp << endl;
} ///:~

The output for one run on my machine was:

sizeof(Primitives) = 40
pp = 6683764
pp = 6683804

So you can see the compiler also does the right thing for pointers to structs (and classes and unions).

Pointer arithmetic also works with the operators --, +, and -, but the latter two operators are limited: you cannot add two pointers, and if you subtract pointers the result is the number of elements between the two pointers. However, you can add or subtract an integral value and a pointer. Here’s an example demonstrating the use of pointer arithmetic:

//: C03:PointerArithmetic.cpp
#include <iostream>
using namespace std;

#define P(EX) cout << #EX << ": " << EX << endl;

int main() {
  int a[10];
  for(int i = 0; i < 10; i++)
    a[i] = i; // Give it index values
  int* ip = a;
  P(*ip);
  P(*++ip);
  P(*(ip + 5));
  int* ip2 = ip + 5;
  P(*ip2);
  P(*(ip2 - 4));
  P(*--ip2);
  P(ip2 - ip); // Yields number of elements
} ///:~

It begins with another macro, but this one uses a preprocessor feature called stringizing (implemented with the ‘#’ sign before an expression) that takes any expression and turns it into a character array. This is quite convenient, since it allows the expression to be printed, followed by a colon, followed by the value of the expression. In main( ) you can see the useful shorthand that is produced.

Although pre- and postfix versions of ++ and -- are valid with pointers, only the prefix versions are used in this example because they are applied before the pointers are dereferenced in the expressions above, so they allow us to see the effects of the operations. Note that only integral values are being added and subtracted; if two pointers were combined this way the compiler would not allow it.

Here is the output of the program above:

*ip: 0
*++ip: 1
*(ip + 5): 6
*ip2: 6
*(ip2 - 4): 2
*--ip2: 5

In all cases, the pointer arithmetic results in the pointer being adjusted to point to the “right place,” based on the size of the elements being pointed to.

If pointer arithmetic seems a bit overwhelming at first, don’t worry. Most of the time you’ll only need to create arrays and index into them with [ ], and the most sophisticated pointer arithmetic you’ll usually need is ++ and --. Pointer arithmetic is generally reserved for more clever and complex programs, and many of the containers in the Standard C++ library hide most of these clever details so you don’t have to worry about them.

	The quality software developer.™ © 2003-2004 ruben\|labs corp. All Rights Reserved. Timp de generare a paginii: 17626 secunde Versiune site: 1.8 SP3 (build 2305-rtm.88542-10.2004)