Composite type creation
The fundamental data types and their variations are essential, but rather primitive. C and C++ provide tools that allow you to compose more sophisticated data types from the fundamental data types. As you’ll see, the most important of these is struct, which is the foundation for class in C++. However, the simplest way to create more sophisticated types is to alias a name to another name via typedef.
Aliasing names with typedef
This keyword promises more than it delivers: typedef suggests “type definition” when “alias” would probably have been a more accurate description, since that’s what it really does. The syntax is:
typedef existing-type-description alias-name
People often use typedef when data types get slightly complicated, just to save keystrokes. Here is a commonly used typedef:
typedef unsigned long ulong;
Now if you say ulong the compiler
knows that you mean unsigned long. You might think that this could as
easily be accomplished using preprocessor substitution, but there are key
situations in which the compiler must be aware that you’re treating a name
as if it were a type, so typedef is essential.
One place where typedef comes in
handy is for pointer types. As previously mentioned, if you
say:
int* x, y;
This actually makes x an int* and y an int (not an int*). That is, the ‘*’ binds to the right, not the left.
However, if you use a typedef:
typedef int* IntPtr;
IntPtr x, y;
Then both x and y are of
type int*.
You can argue that it’s more
explicit and therefore more readable to avoid typedefs for primitive
types, and indeed programs rapidly become difficult to read when many
typedefs are used. However, typedefs become especially important
in C when used with
struct.
Combining variables with struct
A struct is a way to collect a group of variables into a structure. Once you create a struct, you can make many instances of this “new” type of variable you’ve invented. For example:
//: C03:SimpleStruct.cpp
struct Structure1 {
char c;
int i;
float f;
double d;
};
int main() {
struct Structure1 s1, s2;
s1.c = 'a'; // Select an element using a '.'
s1.i = 1;
s1.f = 3.14;
s1.d = 0.00093;
s2.c = 'a';
s2.i = 1;
s2.f = 3.14;
s2.d = 0.00093;
} ///:~
The struct declaration must end with a semicolon. In main( ), two instances of Structure1 are created: s1 and s2. Each of these has its own separate versions of c, i, f, and d, so s1 and s2 represent clumps of completely independent variables. To select one of the elements within s1 or s2, you use a ‘.’, the syntax you saw in the previous chapter when using C++ class objects. Since classes evolved from structs, this is where that syntax came from.
One thing you’ll notice is the awkward way Structure1 must be used (as it turns out, this is only required by C, not C++). In C, you can’t just say Structure1 when you’re defining variables; you must say struct Structure1. This is where typedef becomes especially handy in C:
//: C03:SimpleStruct2.cpp
// Using typedef with struct
typedef struct {
char c;
int i;
float f;
double d;
} Structure2;
int main() {
Structure2 s1, s2;
s1.c = 'a';
s1.i = 1;
s1.f = 3.14;
s1.d = 0.00093;
s2.c = 'a';
s2.i = 1;
s2.f = 3.14;
s2.d = 0.00093;
} ///:~
By using typedef in this way, you can pretend (in C) that Structure2 is a built-in type, like int or float, when you define s1 and s2. In C++ the typedef is unnecessary: if you write struct Structure2 { ... }; you can use the name Structure2 directly as a type. Notice, though, that Structure2 only has data (characteristics) and does not include behavior, which is what we get with real objects in C++. You’ll notice that the struct identifier has been left off at the beginning, because the goal is to create the typedef. However, there are times when you might need to refer to the struct during its definition. In those cases, you can repeat the name of the struct as both the struct name and the typedef:
//: C03:SelfReferential.cpp
// Allowing a struct to refer to itself
typedef struct SelfReferential {
int i;
struct SelfReferential* sr; // Head spinning yet? ("struct" is required here in C)
} SelfReferential;
int main() {
SelfReferential sr1, sr2;
sr1.sr = &sr2;
sr2.sr = &sr1;
sr1.i = 47;
sr2.i = 1024;
} ///:~
If you look at this for a while, you’ll see that sr1 and sr2 point to each other, as well as each holding a piece of data.
Actually, the struct name does not
have to be the same as the typedef name, but it is usually done this way
as it tends to keep things simpler.
Pointers and structs
In the examples above, all the
structs are manipulated as objects. However, like any piece of storage,
you can take the address of a struct object (as
seen in SelfReferential.cpp above). To select the elements of a
particular struct object, you use a ‘.’, as seen
above. However, if you have a pointer to a struct object, you must select
an element of that object using a different operator: the
‘->’. Here’s an example:
//: C03:SimpleStruct3.cpp
// Using pointers to structs
typedef struct Structure3 {
char c;
int i;
float f;
double d;
} Structure3;
int main() {
Structure3 s1, s2;
Structure3* sp = &s1;
sp->c = 'a';
sp->i = 1;
sp->f = 3.14;
sp->d = 0.00093;
sp = &s2; // Point to a different struct object
sp->c = 'a';
sp->i = 1;
sp->f = 3.14;
sp->d = 0.00093;
} ///:~
In main( ), the struct pointer sp initially points to s1, and the members of s1 are given values by selecting them with the ‘->’ (you use the same operator to read those members). Then sp is pointed to s2, and those variables are assigned the same way. So you can see that another benefit of pointers is that they can be dynamically redirected to point to different objects; this provides more flexibility in your programming, as you will learn.
For now, that’s all you need to
know about structs, but you’ll become much more comfortable
with them (and especially their more potent successors, classes) as the
book
progresses.
Clarifying programs with enum
An enumerated data type is a way of
attaching names to numbers, thereby giving more meaning to anyone reading the
code. The enum keyword
(from C) automatically enumerates any list of identifiers you give it by
assigning them values of 0, 1, 2, etc. You can declare enum variables
(which are always represented as integral values). The declaration of an
enum looks similar to a struct declaration.
An enumerated data type is useful when
you want to keep track of some sort of feature:
//: C03:Enum.cpp
// Keeping track of shapes
enum ShapeType {
circle,
square,
rectangle
}; // Must end with a semicolon like a struct
int main() {
ShapeType shape = circle;
// Activities here....
// Now do something based on what the shape is:
switch(shape) {
case circle: /* circle stuff */ break;
case square: /* square stuff */ break;
case rectangle: /* rectangle stuff */ break;
}
} ///:~
shape is a variable of the ShapeType enumerated data type, and its value is compared with the values in the enumeration. In C, shape is really just an int, so it can hold any value an int can hold (including a negative number). You can also compare an int variable with a value in the enumeration, because an enumerator converts to an int automatically.
You should be aware that the example
above of switching on type turns out to be a problematic way to program. C++ has
a much better way to code this sort of thing, the explanation of which must be
delayed until much later in the book.
If you don’t like the way the
compiler assigns values, you can do it yourself, like this:
enum ShapeType {
circle = 10, square = 20, rectangle = 50
};
If you give values to some names and not to others, the compiler gives each unassigned name the value one greater than the name before it. For example,
enum snap { crackle = 25, pop };
The compiler gives pop the value 26.
You can see how much more readable the
code is when you use enumerated data types. However, to some degree this is
still an attempt (in C) to accomplish the things that we can do with a
class in C++, so you’ll see enum used less in
C++.
Type checking for enumerations
C’s enumerations are fairly primitive, simply associating integral values with names, but they provide no type checking. In C++, as you may have come to expect by now, the concept of type is fundamental, and this is true with enumerations. When you create a named enumeration, you effectively create a new type just as you do with a class: the name of your enumeration becomes a type name in the scope where it is declared.
In addition, there’s stricter type checking for enumerations in C++ than in C. You’ll notice this in particular if you have an instance of an enumeration color called a. In C you can say a++, but in C++ you can’t. This is because incrementing an enumeration requires two type conversions, one of them legal in C++ and one of them illegal. First, the value of the enumeration is implicitly converted from a color to an int, then the value is incremented, then the int is converted back into a color. In C++ this last step isn’t allowed implicitly, because color is a distinct type and not equivalent to an int. This makes sense, because how do you know the increment of blue will even be in the list of colors? If you want to increment a color, then it should be a class (with an increment operation) and not an enum, because the class can be made much safer. Any time you write code that assumes an implicit conversion to an enum type, the compiler will flag this inherently dangerous activity.
Unions (described next)
have similar additional type
checking in
C++.
Saving memory with union
Sometimes a program will handle different types of data using the same variable. In this situation, you have two choices: you can create a struct containing all the possible different types you might need to store, or you can use a union. A union piles all the data into a single space; the compiler figures out the amount of space necessary for the largest member you’ve declared in the union and makes that the size of the union (possibly rounded up for alignment). Use a union to save memory.
Anytime you place a value in a
union, the value always starts in the same place at the beginning of the
union, but only uses as much space as is necessary. Thus, you create a
“super-variable” capable of holding any of the union
variables. All the addresses of the union variables are the same (in a
class or struct, the addresses are different).
Here’s a simple use of a
union. Try removing various elements and see what effect it has on the
size of the union. Notice that it makes no sense to declare more than one
instance of a single data type in a union (unless you’re just doing
it to use a different name).
//: C03:Union.cpp
// The size and simple use of a union
#include <iostream>
using namespace std;
union Packed { // Declaration similar to a class
char i;
short j;
int k;
long l;
float f;
double d;
// The union will be the size of a
// double, since that's the largest element
}; // Semicolon ends a union, like a struct
int main() {
cout << "sizeof(Packed) = "
<< sizeof(Packed) << endl;
Packed x;
x.i = 'c';
cout << x.i << endl;
x.d = 3.14159;
cout << x.d << endl;
} ///:~
The compiler performs the proper assignment according to the union member you select.
Once you perform an assignment, the compiler doesn’t care what you do with the union. In the example above, you could assign a floating-point value to x:
x.f = 2.222;
and then send it to the output as if it were an int:
cout << x.k;
This would produce garbage. (In C++, reading a member other than the one most recently assigned is formally undefined behavior.)
Arrays
Arrays are a kind of
composite type because they allow you to clump a lot of
variables together, one right after the other, under a single identifier name.
If you say:
int a[10];
You create storage for 10 int
variables stacked on top of each other, but without unique identifier names for
each variable. Instead, they are all lumped under the name
a.
To access one of these array
elements, you use the same square-bracket syntax that you use to define an
array:
a[5] = 47;
However, you must remember that even
though the size of a is 10, you select array elements starting at
zero (this is sometimes called
zero indexing), so you can
select only the array elements 0-9, like this:
//: C03:Arrays.cpp
#include <iostream>
using namespace std;
int main() {
int a[10];
for(int i = 0; i < 10; i++) {
a[i] = i * 10;
cout << "a[" << i << "] = " << a[i] << endl;
}
} ///:~
Array access is extremely fast. However, if you index past the end of the array, there is no safety net: you’ll step on other variables. The other drawback is that you must define the size of the array at compile time; if you want to change the size at runtime, you can’t do it with the syntax above (C does have a way to create an array dynamically, but it’s significantly messier). The C++ vector, introduced in the previous chapter, provides an array-like object that automatically resizes itself, so it is usually a much better solution if your array size cannot be known at compile time.
You can make an array of any type, even
of structs:
//: C03:StructArray.cpp
// An array of struct
typedef struct {
int i, j, k;
} ThreeDpoint;
int main() {
ThreeDpoint p[10];
for(int i = 0; i < 10; i++) {
p[i].i = i + 1;
p[i].j = i + 2;
p[i].k = i + 3;
}
} ///:~
Notice how the struct identifier
i is independent of the for loop’s
i.
To see that each element of an array is
contiguous with the next, you can print out the addresses like
this:
//: C03:ArrayAddresses.cpp
#include <cstdint>
#include <iostream>
using namespace std;
int main() {
int a[10];
cout << "sizeof(int) = "<< sizeof(int) << endl;
for(int i = 0; i < 10; i++)
cout << "&a[" << i << "] = "
<< reinterpret_cast<uintptr_t>(&a[i]) << endl; // uintptr_t holds any pointer
} ///:~
When you run this program, you’ll
see that each element is one int size away from the previous one. That
is, they are stacked one on top of the other.
Pointers and arrays
The identifier of an array is unlike the identifiers for ordinary variables. For one thing, an array identifier is not a modifiable lvalue; you cannot assign to it. It’s really just a hook into the square-bracket syntax, and when you give the name of an array without square brackets, what you get is the starting address of the array:
//: C03:ArrayIdentifier.cpp
#include <iostream>
using namespace std;
int main() {
int a[10];
cout << "a = " << a << endl;
cout << "&a[0] =" << &a[0] << endl;
} ///:~
When you run this program you’ll see that the two addresses (which will be printed in hexadecimal, since there is no cast to an integer type) are the same.
So one way to look at the array
identifier is as a read-only pointer to the beginning of an array. And although
we can’t change the array identifier to point somewhere else, we
can create another pointer and use that to move around in the array. In
fact, the square-bracket syntax works with regular
pointers as well:
//: C03:PointersAndBrackets.cpp
int main() {
int a[10];
int* ip = a;
for(int i = 0; i < 10; i++)
ip[i] = i * 10;
} ///:~
The fact that naming an array produces
its starting address turns out to be quite important when you want to pass an
array to a function. If you declare an array as a function argument, what
you’re really declaring is a pointer. So in the following example,
func1( ) and func2( ) effectively have the same argument
lists:
//: C03:ArrayArguments.cpp
#include <iostream>
#include <string>
using namespace std;
void func1(int a[], int size) {
for(int i = 0; i < size; i++)
a[i] = i * i - i;
}
void func2(int* a, int size) {
for(int i = 0; i < size; i++)
a[i] = i * i + i;
}
void print(int a[], string name, int size) {
for(int i = 0; i < size; i++)
cout << name << "[" << i << "] = "
<< a[i] << endl;
}
int main() {
int a[5], b[5];
// Probably garbage values:
print(a, "a", 5);
print(b, "b", 5);
// Initialize the arrays:
func1(a, 5);
func1(b, 5);
print(a, "a", 5);
print(b, "b", 5);
// Notice the arrays are always modified:
func2(a, 5);
func2(b, 5);
print(a, "a", 5);
print(b, "b", 5);
} ///:~
Even though func1( ) and func2( ) declare their arguments differently, the usage is the same inside the function. This example also reveals that arrays cannot be passed by value[32]; that is, you never automatically get a local copy of the array that you pass into a function. Thus, when you modify an array, you’re always modifying the outside object. This can be a bit confusing at first if you’re expecting the pass-by-value behavior provided with ordinary arguments.
You’ll notice that
print( ) uses the square-bracket syntax for array arguments. Even
though the pointer syntax and the square-bracket syntax are effectively the same
when passing arrays as arguments, the square-bracket syntax makes it clearer to
the reader that you mean for this argument to be an array.
Also note that the size argument
is passed in each case. Just passing the address of an array isn’t enough
information; you must always be able to know how big the array is inside your
function, so you don’t run off the end of that array.
Arrays can be of any type, including
arrays of pointers. In fact, when
you want to pass command-line arguments into your program, C and C++ have a
special argument list for main( ), which looks like
this:
int main(int argc, char* argv[]) { // ...
The first argument is the number of
elements in the array, which is the second argument. The second argument is
always an array of char*, because the arguments are passed from the
command line as character arrays (and remember, an array can be passed only as a
pointer). Each whitespace-delimited cluster of characters on the command line is
turned into a separate array argument. The following program prints out all its
command-line arguments by stepping through the array:
//: C03:CommandLineArgs.cpp
#include <iostream>
using namespace std;
int main(int argc, char* argv[]) {
cout << "argc = " << argc << endl;
for(int i = 0; i < argc; i++)
cout << "argv[" << i << "] = "
<< argv[i] << endl;
} ///:~
You’ll notice that argv[0] is usually the path and name of the program itself. This allows the program to discover information about itself. It also adds one more element to the array of program arguments, so a common error when fetching command-line arguments is to grab argv[0] when you want argv[1].
You are not forced to use
argc and argv as
identifiers in main( ); those identifiers are only conventions (but
it will confuse people if you don’t use them). Also, there is an alternate
way to declare argv:
int main(int argc, char** argv) { // ...
Both forms are equivalent, but I find the
version used in this book to be the most intuitive when reading the code, since
it says, directly, “This is an array of character
pointers.”
All you get from the command-line is
character arrays; if you want to treat an argument as some other type, you are
responsible for converting it inside your program. To facilitate the
conversion to numbers, there are
some helper functions in the Standard C library, declared in
<cstdlib>. The
simplest ones to use are atoi( ),
atol( ), and
atof( ) to convert an ASCII character array
to an int, long, and double floating-point value,
respectively. Here’s an example using atoi( ) (the other two
functions are called the same way):
//: C03:ArgsToInts.cpp
// Converting command-line arguments to ints
#include <iostream>
#include <cstdlib>
using namespace std;
int main(int argc, char* argv[]) {
for(int i = 1; i < argc; i++)
cout << atoi(argv[i]) << endl;
} ///:~
In this program, you can put any number
of arguments on the command line. You’ll notice that the for loop
starts at the value 1 to skip over the program name at argv[0].
Also, if you put a floating-point number containing a decimal point on the
command line, atoi( ) takes only the digits up to the decimal point.
If you put non-numbers on the command line, these come back from
atoi( ) as zero.
Exploring floating-point format
The printBinary( ) function introduced earlier in this chapter is handy for delving into the internal structure of various data types. The most interesting of these is the floating-point format that allows C and C++ to store numbers representing very large and very small values in a limited amount of space. Although the details can’t be completely exposed here, the bits inside floats and doubles are divided into three regions: the exponent, the mantissa, and the sign bit; thus the value is stored in a binary form of scientific notation. The following program allows you to play around by printing out the binary patterns of various floating-point numbers so you can deduce for yourself the scheme used in your compiler’s floating-point format (usually this is the IEEE standard for floating-point numbers, but your compiler may not follow that):
//: C03:FloatingAsBinary.cpp
//{L} printBinary
//{T} 3.14159
#include "printBinary.h"
#include <cstdlib>
#include <iostream>
using namespace std;
int main(int argc, char* argv[]) {
if(argc != 2) {
cout << "Must provide a number" << endl;
exit(1);
}
double d = atof(argv[1]);
unsigned char* cp =
reinterpret_cast<unsigned char*>(&d);
// Print from the highest-addressed byte down to cp[0]
// (the original loop read cp[sizeof(double)], one past the end)
for(int i = sizeof(double) - 1; i >= 0; i--)
printBinary(cp[i]);
} ///:~
First, the program guarantees that
you’ve given it an argument by checking the value of argc, which is
two if there’s a single argument (it’s one if there are no
arguments, since the program name is always the first element of argv).
If this fails, a message is printed and the Standard C Library function
exit( ) is called to terminate the program.
The program grabs the argument from the
command line and converts the characters to a double using
atof( ). Then the double is treated as an
array of bytes by taking the address and casting it to an unsigned char*.
Each of these bytes is passed to printBinary( ) for
display.
This example has been set up to print the bytes in an order such that the sign bit appears first, at least on my machine. Yours may be different, so you might want to rearrange the way things are printed. You should also be aware that floating-point formats are not trivial to understand; for example, the exponent and mantissa are not generally arranged on byte boundaries, but instead a number of bits is reserved for each one and they are packed into the memory as tightly as possible. To truly see what’s going on, you’d need to find out the size of each part of the number (sign bits are always one bit, but exponents and mantissas are of differing sizes) and print out the bits in each part separately.
Pointer arithmetic
If all you could do with a pointer that
points at an array is treat it as if it were an alias for that array, pointers
into arrays wouldn’t be very interesting. However, pointers are more
flexible than this, since they can be modified to point somewhere else (but
remember, the array identifier cannot be modified to point somewhere
else).
Pointer arithmetic refers to the
application of some of the arithmetic operators to pointers. The reason pointer
arithmetic is a separate subject from ordinary arithmetic is that pointers must
conform to special constraints in order to make them behave properly. For
example, a common operator to use with pointers is
++, which “adds one to the pointer.”
What this actually means is that the pointer is changed to move to “the
next value,” whatever that means. Here’s an
example:
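//: C03:PointerIncrement.cpp
#include <cstdint>
#include <iostream>
using namespace std;
int main() {
int i[10];
double d[10];
int* ip = i;
double* dp = d;
// uintptr_t is wide enough to hold any pointer value
cout << "ip = " << reinterpret_cast<uintptr_t>(ip) << endl;
ip++;
cout << "ip = " << reinterpret_cast<uintptr_t>(ip) << endl;
cout << "dp = " << reinterpret_cast<uintptr_t>(dp) << endl;
dp++;
cout << "dp = " << reinterpret_cast<uintptr_t>(dp) << endl;
} ///:~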
For one run on my machine, the output
is:
ip = 6684124
ip = 6684128
dp = 6684044
dp = 6684052
What’s interesting here is that
even though the operation ++ appears to be the same operation for both
the int* and the double*, you can see that the pointer has been
changed only 4 bytes for the int* but 8 bytes for the double*. Not
coincidentally, these are the sizes of int and double on my
machine. And that’s the trick of pointer arithmetic: the compiler figures
out the right amount to change the pointer so that it’s pointing to the
next element in the array (pointer arithmetic is only meaningful within arrays).
This even works with arrays of structs:
//: C03:PointerIncrement2.cpp
#include <cstdint>
#include <iostream>
using namespace std;
typedef struct {
char c;
short s;
int i;
long l;
float f;
double d;
long double ld;
} Primitives;
int main() {
Primitives p[10];
Primitives* pp = p;
cout << "sizeof(Primitives) = "
<< sizeof(Primitives) << endl;
cout << "pp = " << reinterpret_cast<uintptr_t>(pp) << endl;
pp++;
cout << "pp = " << reinterpret_cast<uintptr_t>(pp) << endl;
} ///:~
The output for one run on my machine
was:
sizeof(Primitives) = 40
pp = 6683764
pp = 6683804
So you can see the compiler also does the
right thing for pointers to structs (and classes and
unions).
Pointer arithmetic also works with the operators --, +, and -, but the latter two operators are limited: you cannot add two pointers, and if you subtract two pointers (which must point into the same array) the result is the number of elements between them. However, you can add or subtract an integral value and a pointer. Here’s an example demonstrating the use of pointer arithmetic:
//: C03:PointerArithmetic.cpp
#include <iostream>
using namespace std;
#define P(EX) cout << #EX << ": " << EX << endl;
int main() {
int a[10];
for(int i = 0; i < 10; i++)
a[i] = i; // Give it index values
int* ip = a;
P(*ip);
P(*++ip);
P(*(ip + 5));
int* ip2 = ip + 5;
P(*ip2);
P(*(ip2 - 4));
P(*--ip2);
P(ip2 - ip); // Yields number of elements
} ///:~
It begins with another
macro, but this one uses a
preprocessor feature called
stringizing (implemented with the
‘#’ sign before an expression) that takes any expression and
turns it into a character array. This is quite convenient, since it allows the
expression to be printed, followed by a colon, followed by the value of the
expression. In main( ) you can see the useful shorthand that is
produced.
Although pre- and postfix versions of
++ and -- are valid with pointers, only the prefix versions are
used in this example because they are applied before the pointers are
dereferenced in the expressions above, so they allow us to see the effects of
the operations. Note that only integral values are being added and subtracted;
if two pointers were combined this way the compiler would not allow it.
Here is the output of the program
above:
*ip: 0
*++ip: 1
*(ip + 5): 6
*ip2: 6
*(ip2 - 4): 2
*--ip2: 5
ip2 - ip: 4
In all cases, the pointer arithmetic
results in the pointer being adjusted to point to the “right place,”
based on the size of the elements being pointed to.
If pointer arithmetic seems a bit
overwhelming at first, don’t worry. Most of the time you’ll only
need to create arrays and index into them with [ ], and the most
sophisticated pointer arithmetic you’ll usually need is ++ and
--. Pointer arithmetic is generally reserved for more clever and complex
programs, and many of the containers in the Standard C++ library hide most of
these clever details so you don’t have to worry about
them.