Separate compilation is particularly
important when building large projects. In C and C++, a
program can be created in small, manageable, independently tested pieces. The
most fundamental tool for breaking a program up into pieces is the ability to
create named subroutines or subprograms. In C and C++, a subprogram is called a
function, and functions are the pieces of code
that can be placed in different files, enabling separate compilation. Put
another way, the function is the atomic unit of code, since you cannot have part
of a function in one file and another part in a different file; the entire
function must be placed in a single file (although files can and do contain more
than one function).
When you call a function, you typically
pass it some arguments, which are values you’d like the function to
work with during its execution. When the function is finished, you typically get
back a return value, a
value that the function hands back to you as a result. It’s also possible
to write functions that take no arguments and return no
values.
To create a program with multiple files,
functions in one file must access functions and data in other files. When
compiling a file, the C or C++ compiler must know about the functions and data
in the other files, in particular their names and proper usage. The compiler
ensures that functions and data are used correctly. This process of
“telling the compiler” the names of external functions and data and
what they should look like is called declaration.
Once you declare a function or variable, the compiler knows how to check to make
sure it is used
properly.
Declarations vs. definitions
It’s important to understand the
difference between declarations and
definitions because these terms will be used
precisely throughout the book. Essentially all C and C++ programs require
declarations. Before you can write your first program, you need to understand
the proper way to write a declaration.
A declaration introduces a name
– an identifier – to the compiler. It tells the compiler “This
function or this variable exists somewhere, and here is what it should look
like.” A definition, on the other hand, says: “Make this
variable here” or “Make this function here.” It allocates
storage for the name. This meaning works whether you’re talking about a
variable or a function; in either case, at the point of definition the compiler
allocates storage. For a variable, the compiler determines how big that variable
is and causes space to be generated in memory to hold the data for that
variable. For a function, the compiler generates code, which ends up occupying
storage in memory.
You can declare a variable or a function
in many different places, but there must be only one definition in C and C++
(this is sometimes called the ODR: one-definition
rule). When the linker is uniting all the object modules, it will usually
complain if it finds more than one definition for the same function or
variable.
A definition can also be a declaration.
If the compiler hasn’t seen the name x before and you define int
x;, the compiler sees the name as a declaration and allocates storage for it
all at once.
Function declaration syntax
A function declaration in C and C++ gives
the function name, the argument types passed to the function, and the return
value of the function. For example, here is a declaration for a function called
func1( ) that takes two integer arguments (integers are denoted in
C/C++ with the keyword int) and returns an integer:
int func1(int,int);
The first keyword you see is the return
value all by itself: int. The arguments are enclosed in parentheses after
the function name in the order they are used. The semicolon indicates the end of
a statement; in this case, it tells the compiler “that’s all –
there is no function definition here!”
C and C++ declarations attempt to mimic
the form of the item’s use. For example, if a is another integer,
the above function might be used this way:
a = func1(2,3);
Since func1( ) returns an
integer, the C or C++ compiler will check the use of func1( ) to
make sure that a can accept the return value and that the arguments are
appropriate.
Arguments in
function declarations may have names. The compiler ignores the names, but they
can be helpful as mnemonic devices for the user. For example, we can declare
func1( ) in a different fashion that has the same
meaning:
int func1(int length, int width);
A gotcha
There is a significant difference between
C and C++ for functions with empty argument lists. In C, the
declaration:
int func2();
means “a function with any number
and type of argument.” This prevents type-checking,
so in C++ it means “a function with no arguments.”
Function definitions
Function definitions look like function
declarations except that they have bodies. A body is a
collection of statements enclosed in braces. Braces denote the beginning and
ending of a block of code. To give func1( ) a definition that is an
empty body (a body containing no code), write:
int func1(int length, int width) { }
Notice that in the function definition,
the braces replace the semicolon. Since braces surround a statement or group of
statements, you don’t need a semicolon. Notice also that the arguments in
the function definition must have names if you want to use the arguments in the
function body (since they are never used here, they are
optional).
Variable declaration syntax
The meaning attributed to the phrase
“variable declaration” has historically been confusing and
contradictory, and it’s important that you understand the correct
definition so you can read code properly. A variable declaration tells the
compiler what a variable looks like. It says, “I know you haven’t
seen this name before, but I promise it exists someplace, and it’s a
variable of X type.”
In a function declaration, you give a
type (the return value), the function name, the argument list, and a semicolon.
That’s enough for the compiler to figure out that it’s a declaration
and what the function should look like. By inference, a variable declaration
might be a type followed by a name. For example:
int a;
could declare the variable a as an
integer, using the logic above. Here’s the conflict: there is enough
information in the code above for the compiler to create space for an integer
called a, and that’s what happens. To resolve this dilemma, a
keyword was necessary for C and C++ to say “This is only a declaration;
it’s defined elsewhere.” The keyword is
extern. It can mean the
definition is external to the file, or that the definition occurs later
in the file.
Declaring a variable without defining it
means using the extern keyword before a description of the variable, like
this:
extern int a;
extern can also apply to function
declarations. For func1( ), it looks like this:
extern int func1(int length, int width);
This statement is equivalent to the
previous func1( ) declarations. Since there is no function body, the
compiler must treat it as a function declaration rather than a function
definition. The extern keyword is thus superfluous and optional for
function declarations. It is probably unfortunate that the designers of C did
not require the use of extern for function declarations; it would have
been more consistent and less confusing (but would have required more typing,
which probably explains the decision).
Here are some more examples of
declarations:
//: C02:Declare.cpp
// Declaration & definition examples
extern int i; // Declaration without definition
extern float f(float); // Function declaration
float b; // Declaration & definition
float f(float a) { // Definition
return a + 1.0;
}
int i; // Definition
int h(int x) { // Declaration & definition
return x + 1;
}
int main() {
b = 1.0;
i = 2;
f(b);
h(i);
} ///:~
In the function declarations, the
argument identifiers are optional. In the definitions, they are required (the
identifiers are required only in C, not C++).
Including headers
Most libraries contain significant
numbers of functions and variables. To save work and ensure consistency when
making the external declarations for these items, C and C++ use a device called
the header file. A header file is a file
containing the external declarations for a library; it conventionally has a file
name extension of ‘h’, such as headerfile.h. (You may also
see some older code using different extensions, such as .hxx or
.hpp, but this is becoming rare.)
The programmer who creates the library
provides the header file. To declare the functions and external variables in the
library, the user simply includes the header file. To include a header file, use
the #include
preprocessor
directive. This tells the preprocessor to open the named header file and insert
its contents where the #include statement appears. A #include may
name a file in two ways: in angle brackets (< >) or in double
quotes.
File names in angle brackets, such
as:
#include <header>
cause the preprocessor to search for the
file in a way that is particular to your implementation, but typically
there’s some kind of “include search path” that you specify in
your environment or on the compiler command line. The mechanism for setting the
search path varies between machines, operating systems, and C++ implementations,
and may require some investigation on your part.
File names in double quotes, such
as:
#include "local.h"
tell the preprocessor to search for the
file in (according to the specification) an “implementation-defined
way.” What this typically means is to search for the file relative to the
current directory. If the file is not found, then the include directive is
reprocessed as if it had angle brackets instead of quotes.
To include the iostream header file, you
write:
#include <iostream>
The preprocessor will find the iostream
header file (often in a subdirectory called “include”) and insert
it.
Standard C++ include format
As C++ evolved, different compiler
vendors chose different extensions for file names. In addition, various
operating systems have different restrictions on file names, in particular on
name length. These issues caused source code portability problems. To smooth
over these rough edges, the standard uses a format that allows file names longer
than the notorious eight characters and eliminates the extension. For example,
instead of the old style of including iostream.h, which looks like
this:
#include <iostream.h>
you can now write:
#include <iostream>
The translator can implement the include
statements in a way that suits the needs of that particular compiler and
operating system, if necessary truncating the name and adding an extension. Of
course, you can also copy the headers supplied by your compiler vendor to ones
without extensions if you want to use this style before a vendor has provided
support for it.
The libraries that have been inherited
from C are still available with the traditional ‘.h’
extension. However, you can also use them with the more modern C++ include style
by prepending a “c” before the name. Thus:
#include <stdio.h>
#include <stdlib.h>
become:
#include <cstdio>
#include <cstdlib>
And so on, for all the Standard C
headers. This provides a nice distinction to the reader indicating when
you’re using C versus C++ libraries.
The effect of the new include format is
not identical to the old: using the .h gives you the older, non-template
version, and omitting the .h gives you the new templatized version.
You’ll usually have problems if you try to intermix the two forms in a
single
program.
Linking
The linker collects object modules (which
often use file name extensions like .o or .obj), generated by the
compiler, into an executable program the operating system can load and run. It
is the last phase of the compilation process.
Linker characteristics vary from system
to system. In general, you just tell the linker the names of the object modules
and libraries you want linked together, and the name of the executable, and it
goes to work. Some systems require you to invoke the linker yourself. With most
C++ packages you invoke the linker through the C++ compiler. In many situations,
the linker is invoked for you invisibly.
Some older linkers
won’t search object files
and libraries more than once, and they search through the list you give them
from left to right. This means that the order of object files and libraries can
be important. If you have a mysterious problem that doesn’t show up until
link time, one possibility is the order in which the files are given to the
linker.
Using libraries
Now that you know the basic terminology,
you can understand how to use a library. To use a library:
- Include the library’s header file.
- Use the functions and variables in the library.
- Link the library into the executable program.
These steps also
apply when the object modules aren’t combined into a library. Including a
header file and linking the object modules are the basic steps for separate
compilation in both C and C++.
How the linker searches a library
When you make an external reference to a
function or variable in C or C++, the linker, upon encountering this reference,
can do one of two things. If it has not already encountered the definition for
the function or variable, it adds the identifier to its list of
“unresolved
references.” If the linker
has already encountered the definition, the reference is
resolved.
If the linker cannot find the definition
in the list of object modules, it searches the libraries.
Libraries have some sort of indexing so the linker doesn’t need to look
through all the object modules in the library – it just looks in the
index. When the linker finds a definition in a library, the entire object
module, not just the function definition, is linked into the executable program.
Note that the whole library isn’t linked, just the object module in the
library that contains the definition you want (otherwise programs would be
unnecessarily large). If you want to minimize executable program size, you might
consider putting a single function in each source code file when you build your
own libraries. This requires more
editing[27],
but it can be helpful to the user.
Because the linker searches files in the
order you give them, you can pre-empt the use of a library function
by inserting a file with your own function, using the
same function name, into the list before the library name appears. Since the
linker will resolve any references to this function by using your function
before it searches the library, your function is used instead of the library
function. Note that this can also be a bug, and the kind of thing C++ namespaces
prevent.
Secret additions
When a C or C++ executable program is
created, certain items are secretly linked in. One of these is the startup
module, which contains initialization routines that must
be run any time a C or C++ program begins to execute. These routines set up the
stack and initialize certain variables in the program.
The linker always searches the standard
library for the compiled versions of any
“standard” functions called in the program. Because the standard
library is always searched, you can use anything in that library by simply
including the appropriate header file in your program; you don’t have to
tell it to search the standard library. The iostream functions, for example, are
in the Standard C++ library. To use them, you just include the
<iostream> header file.
If you are using an add-on library, you
must explicitly add the library name to the list of files handed to the
linker.
Using plain C libraries
Just because you are writing code in C++,
you are not prevented from using C library functions. In fact, the entire C
library is included by default into Standard C++. There has been a tremendous
amount of work done for you in these functions, so they can save you a lot of
time.
This book will use Standard C++ (and thus
also Standard C) library functions when convenient, but only standard
library functions will be used, to ensure the portability of programs. In the
few cases in which library functions must be used that are not in the C++
standard, all attempts will be made to use POSIX-compliant functions. POSIX is a
standard based on a Unix standardization effort that includes functions that go
beyond the scope of the C++ library. You can generally expect to find POSIX
functions on Unix (in particular, Linux) platforms, and often under DOS/Windows.
For example, if you’re using multithreading you are better off using the
POSIX thread library because your code will then be easier to understand, port
and maintain (and the POSIX thread library will usually just use the underlying
thread facilities of the operating system, if these are
provided).