C++ Tokens
A token is the smallest element of a C++ program that is meaningful to the compiler.
The C++ parser recognizes these kinds of tokens: identifiers, keywords, literals,
operators, punctuators, and other separators. A stream of these tokens makes up a
translation unit.
Tokens are usually separated by "white space." White space can be one or
more of the following:
- Blanks
- Horizontal or vertical tabs
- New lines
- Formfeeds
- Comments
Syntax
token :
keyword
identifier
constant
operator
punctuator
preprocessing-token :
header-name
identifier
pp-number
character-constant
string-literal
operator
punctuator
each nonwhite-space character that cannot be one of the above
The parser separates tokens from the input stream by creating the longest token
possible using the input characters in a left-to-right scan. Consider this code fragment:
a = i+++j;
The programmer who wrote the code might have intended either of these two
statements:
a = i + (++j)
a = (i++) + j
Because the parser creates the longest token possible from the input stream, it chooses
the second interpretation, making the tokens i++, +, and j.
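The longest-token rule can be checked by observing the side effects of the expression. The following is a minimal sketch; the function names are hypothetical and chosen only for illustration.

```cpp
#include <cassert>

// Hypothetical helper: returns the value of the expression from the text.
int sum_with_longest_token(int& i, int& j)
{
    // Tokenized as i ++ + j, that is, (i++) + j.
    return i+++j;
}

void longest_token_demo()
{
    int i = 1, j = 10;
    int a = sum_with_longest_token(i, j);

    // (i++) + j yields 1 + 10 and leaves i == 2 and j == 10.
    // i + (++j) would have yielded 1 + 11 and left i == 1 and j == 11.
    assert(a == 11);
    assert(i == 2 && j == 10);
}
```

Writing the expression with explicit spaces or parentheses avoids relying on this rule.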
C++ Comments
A comment is text that the compiler ignores but that is useful for programmers.
Comments are normally used to annotate code for future reference. The compiler treats them
as white space. You can use comments in testing to make certain lines of code inactive;
however, #if/#endif preprocessor directives work better for this because you
can surround code that contains comments but you cannot nest comments.
A C++ comment is written in one of the following ways:
- The /* (slash, asterisk) characters, followed by any sequence of characters
(including new lines), followed by the */ characters. This syntax is the same
as ANSI C.
- The // (two slashes) characters, followed by any sequence of characters. A
new line not immediately preceded by a backslash terminates this form of comment.
Therefore, it is commonly called a "single-line comment."
The comment characters (/*, */, and //) have no
special meaning within a character constant, string literal, or comment. Comments using
the first syntax, therefore, cannot be nested. Consider this example:
/* Intent: Comment out this block of code.
Problem: Nested comments on each line of code are illegal.
Name-of-the-file = String( "helloWorld.dat" ); /* Initialize file string */
cout << "File: " << Name-of-the-file << "\n"; /* Print status message */
*/
The preceding code will not compile because the compiler scans the input stream from
the first /* to the first */ and considers it a comment. In this
case, the first */ occurs at the end of the Initialize file string
comment. The last */, then, is no longer paired with an opening /*.
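Because the preprocessor skips everything between #if 0 and #endif, that pair can disable a block that already contains comments. The following is a minimal sketch; the function and variable names are hypothetical.

```cpp
#include <cassert>

// Hypothetical function used only to show #if 0 disabling code.
int active_value()
{
    int value = 1;    /* Initialize the value */
#if 0
    /* Everything up to #endif is skipped by the preprocessor,
       including the comments on each line. */
    value = 100;      /* Disabled assignment */
    value += 1;       /* Also disabled */
#endif
    return value;
}

void disabled_block_demo()
{
    // The assignments inside #if 0 never run.
    assert(active_value() == 1);
}
```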
Note that the single-line form (//) of a comment followed by the
line-continuation token (\) can have surprising effects. Consider this code:
#include <stdio.h>
void main()
{
printf( "This is a number %d", // \
5 );
}
Because lines are spliced before comments are removed, the backslash joins the line
containing 5 ); to the comment. After preprocessing, the preceding code contains
errors and appears as follows:
#include <stdio.h>
void main()
{
printf( "This is a number %d",
}
C++ Identifiers
An identifier is a sequence of characters used to denote one of the following:
- Object or variable name
- Class, structure, or union name
- Enumerated type name
- Member of a class, structure, union, or enumeration
- Function or class-member function
- typedef name
- Label name
- Macro name
- Macro parameter
Syntax
identifier :
nondigit
identifier nondigit
identifier digit
nondigit : one of
_ a through z A through Z
digit : one of
0 through 9
Microsoft Specific
Only the first 247 characters of Microsoft C++ identifiers are significant. This
restriction is complicated by the fact that names for user-defined types are
"decorated" by the compiler to preserve type information. The resultant name,
including the type information, cannot be longer than 247 characters.
Factors that can influence the length of a decorated identifier are:
- Whether the identifier denotes an object of user-defined type or a type derived from a
user-defined type.
- Whether the identifier denotes a function or a type derived from a function.
- The number of arguments to a function.
END Microsoft Specific
The first character of an identifier must be an alphabetic character, either uppercase
or lowercase, or an underscore
( _ ). Because C++ identifiers are case sensitive, fileName is
different from FileName.
Identifiers cannot be exactly the same spelling and case as keywords. Identifiers that
contain keywords are legal. For example, Pint is a legal identifier, even
though it contains int, which is a keyword.
Use of two sequential underscore characters ( __ ) at the beginning of an
identifier, or a single leading underscore followed by a capital letter, is reserved for
C++ implementations in all scopes. You should avoid using one leading underscore followed
by a lowercase letter for names with file scope because of possible conflicts with current
or future reserved identifiers.
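The rules above can be illustrated with a few declarations. This is a minimal sketch; every name in it is made up for the example.

```cpp
#include <cassert>

// Hypothetical names chosen only to illustrate the identifier rules.
int identifier_rules_demo()
{
    int fileName = 1;   // Differs from FileName only by case.
    int FileName = 2;   // A distinct identifier.
    int Pint = 3;       // Legal: contains the keyword int but is not equal to it.
    int count_2 = 4;    // Digits and underscores are allowed after the first character.

    assert(fileName != FileName);
    return fileName + FileName + Pint + count_2;
}
```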
C++ Keywords
Keywords are predefined reserved identifiers that have special meanings. They cannot be
used as identifiers in your program. The following keywords are reserved for C++:
Syntax
keyword: one of
| asm [1] | auto | bad_cast | bad_typeid |
| bool | break | case | catch |
| char | class | const | const_cast |
| continue | default | delete | do |
| double | dynamic_cast | else | enum |
| except | explicit | extern | false |
| finally | float | for | friend |
| goto | if | inline | int |
| long | mutable | namespace | new |
| operator | private | protected | public |
| register | reinterpret_cast | return | short |
| signed | sizeof | static | static_cast |
| struct | switch | template | this |
| throw | true | try | type_info |
| typedef | typeid | typename | union |
| unsigned | using | virtual | void |
| volatile | while | | |
[1] Reserved for compatibility with other C++ implementations, but not implemented. Use
__asm.
Microsoft Specific
In Microsoft C++, identifiers with two leading underscores are reserved for compiler
implementations. Therefore, the Microsoft convention is to precede Microsoft-specific
keywords with double underscores. These words cannot be used as identifier names.
| allocate [3] | __inline | property [3] |
| __asm [1] | __int8 | selectany [3] |
| __based [2] | __int16 | __single_inheritance |
| __cdecl | __int32 | __stdcall |
| __declspec | __int64 | thread [3] |
| dllexport [3] | __leave | __try |
| dllimport [3] | __multiple_inheritance | uuid [3] |
| __except | naked [3] | __uuidof |
| __fastcall | nothrow [3] | __virtual_inheritance |
| __finally | | |
[1] Replaces C++ asm syntax.
[2] The __based keyword has limited uses for 32-bit target compilations.
[3] These are special identifiers when used with __declspec; their use in other
contexts is not restricted.
Microsoft extensions are enabled by default. To ensure that your programs are fully
portable, you can disable Microsoft extensions by specifying the ANSI-compatible /Za
command-line option (compile for ANSI compatibility) during compilation. When you do this,
Microsoft-specific keywords are disabled.
When Microsoft extensions are enabled, you can use the previously-listed keywords in
your programs. For ANSI compliance, these keywords are prefaced by a double underscore.
For backward compatibility, single-underscore versions of all the keywords except __except,
__finally, __leave, and __try are supported. In addition, __cdecl
is available with no leading underscore.
C++ Punctuators
Punctuators in C++ have syntactic and semantic meaning to the compiler but do not, of
themselves, specify an operation that yields a value. Some punctuators, either alone or in
combination, can also be C++ operators or be significant to the preprocessor.
Syntax
punctuator : one of
! % ^ & * ( ) - +
= { } | ~
[ ] \ ; ' : " < >
? , . / #
The punctuators [ ], ( ), and { } must appear in pairs after
translation phase 4.
Phases of Translation
C and C++ programs consist of one or more source files, each of which contains some of
the text of the program. A source file, together with its include files (files that are
included using the #include preprocessor directive) but not including sections of
code removed by conditional-compilation directives such as #if, is called a
"translation unit."
Source files can be translated at different times; in fact, it is common to
translate only out-of-date files. The translated translation units can be kept either in
separate object files or in object-code libraries. These separate translation units are
then linked to form an executable program or a dynamic-link library (DLL).
Translation units can communicate using:
- Calls to functions that have external linkage.
- Calls to class member functions that have external linkage.
- Direct modification of objects that have external linkage.
- Direct modification of files.
- Interprocess communication (for Microsoft Windows-based applications only).
The following list describes the phases in which the compiler translates files:
Character mapping
Characters in the source file are mapped to the internal source representation.
Trigraph sequences are converted to single-character internal representation in this
phase.
Line splicing
All lines ending in a backslash (\) and immediately followed by a newline
character are joined with the next line in the source file, forming logical lines from the
physical lines. Unless it is empty, a source file must end in a newline character that is
not preceded by a backslash.
Tokenization
The source file is broken into preprocessing tokens and white-space characters.
Comments in the source file are replaced with one space character each. Newline characters
are retained.
Preprocessing
Preprocessing directives are executed and macros are expanded into the source file. The
#include directive causes the preceding three translation phases to be applied to any
included text.
Character-set mapping
All source-character-set members and escape sequences are converted to their
equivalents in the execution-character set. For Microsoft C and C++, both the source and
the execution character sets are ASCII.
String concatenation
All adjacent string and wide-string literals are concatenated. For example, "String
" "concatenation" becomes "String concatenation".
Translation
All tokens are analyzed syntactically and semantically; these tokens are converted into
object code.
Linkage
All external references are resolved to create an executable program or a dynamic-link
library.
The compiler issues warnings or errors during phases of translation in which it
encounters syntax errors.
The linker resolves all external references and creates an executable program or DLL by
combining one or more separately processed translation units along with standard
libraries.
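Two of these phases, line splicing and string concatenation, are easy to observe from source code. The following is a minimal sketch; the macro and variable names are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Line splicing: each backslash-newline pair joins the physical lines
// into one logical line before the macro definition is processed.
#define SUM_OF_THREE(a, b, c) \
    ((a) + \
     (b) + \
     (c))

// String concatenation: adjacent literals become a single literal.
const char* const joined_text = "String " "concatenation";

void translation_phases_demo()
{
    assert(SUM_OF_THREE(1, 2, 3) == 6);
    assert(std::strcmp(joined_text, "String concatenation") == 0);
}
```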
C++ Operators
Operators specify an evaluation to be performed on one of the following:
- One operand (unary operator)
- Two operands (binary operator)
- Three operands (ternary operator)
The C++ language includes all C operators and adds several new operators. Table 1.1
lists the operators available in Microsoft C++.
Operators follow a strict precedence which defines the evaluation order of expressions
containing these operators. Operators associate with either the expression on their left
or the expression on their right; this is called "associativity." Operators in
the same group have equal precedence and are evaluated left to right in an expression
unless explicitly forced by a pair of parentheses, ( ). Table 1.1 shows the precedence and
associativity of C++ operators (from highest to lowest precedence).
C++ Operator Precedence and Associativity
| Operator | Name or Meaning | Associativity |
| :: | Scope resolution | None |
| :: | Global | None |
| [ ] | Array subscript | Left to right |
| ( ) | Function call | Left to right |
| ( ) | Conversion | None |
| . | Member selection (object) | Left to right |
| -> | Member selection (pointer) | Left to right |
| ++ | Postfix increment | None |
| -- | Postfix decrement | None |
| new | Allocate object | None |
| delete | Deallocate object | None |
| delete[ ] | Deallocate object | None |
| ++ | Prefix increment | None |
| -- | Prefix decrement | None |
| * | Dereference | None |
| & | Address-of | None |
| + | Unary plus | None |
| - | Arithmetic negation (unary) | None |
| ! | Logical NOT | None |
| ~ | Bitwise complement | None |
| sizeof | Size of object | None |
| sizeof ( ) | Size of type | None |
| typeid( ) | type name | None |
| (type) | Type cast (conversion) | Right to left |
| const_cast | Type cast (conversion) | None |
| dynamic_cast | Type cast (conversion) | None |
| reinterpret_cast | Type cast (conversion) | None |
| static_cast | Type cast (conversion) | None |
| .* | Apply pointer to class member (objects) | Left to right |
| ->* | Dereference pointer to class member | Left to right |
| * | Multiplication | Left to right |
| / | Division | Left to right |
| % | Remainder (modulus) | Left to right |
| + | Addition | Left to right |
| - | Subtraction | Left to right |
| << | Left shift | Left to right |
| >> | Right shift | Left to right |
| < | Less than | Left to right |
| > | Greater than | Left to right |
| <= | Less than or equal to | Left to right |
| >= | Greater than or equal to | Left to right |
| == | Equality | Left to right |
| != | Inequality | Left to right |
| & | Bitwise AND | Left to right |
| ^ | Bitwise exclusive OR | Left to right |
| | | Bitwise OR | Left to right |
| && | Logical AND | Left to right |
| || | Logical OR | Left to right |
| e1?e2:e3 | Conditional | Right to left |
| = | Assignment | Right to left |
| *= | Multiplication assignment | Right to left |
| /= | Division assignment | Right to left |
| %= | Modulus assignment | Right to left |
| += | Addition assignment | Right to left |
| -= | Subtraction assignment | Right to left |
| <<= | Left-shift assignment | Right to left |
| >>= | Right-shift assignment | Right to left |
| &= | Bitwise AND assignment | Right to left |
| |= | Bitwise inclusive OR assignment | Right to left |
| ^= | Bitwise exclusive OR assignment | Right to left |
| , | Comma | Left to right |
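A few expressions show how precedence and associativity determine grouping. The following is a minimal sketch; the function name is hypothetical.

```cpp
#include <cassert>

void precedence_demo()
{
    // Multiplication has higher precedence than addition.
    assert(2 + 3 * 4 == 14);
    assert((2 + 3) * 4 == 20);

    // Subtraction associates left to right: (10 - 4) - 3.
    assert(10 - 4 - 3 == 3);

    // Assignment associates right to left: a = (b = 7).
    int a = 0, b = 0;
    a = b = 7;
    assert(a == 7 && b == 7);

    // The conditional operator associates right to left:
    // false ? 1 : (true ? 2 : 3).
    assert((false ? 1 : true ? 2 : 3) == 2);
}
```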
Literals
Invariant program elements are called "literals" or "constants."
The terms "literal" and "constant" are used interchangeably here.
Literals fall into four major categories: integer, character, floating-point, and string
literals.
Syntax
literal :
integer-constant
character-constant
floating-constant
string-literal
C++ Integer Constants
Integer constants are constant data elements that have no fractional parts or
exponents. They always begin with a digit. You can specify integer constants in decimal,
octal, or hexadecimal form. They can specify signed or unsigned types and long or short
types.
Syntax
integer-constant :
decimal-constant integer-suffix opt
octal-constant integer-suffix opt
hexadecimal-constant integer-suffix opt
'c-char-sequence'
decimal-constant :
nonzero-digit
decimal-constant digit
octal-constant :
0
octal-constant octal-digit
hexadecimal-constant :
0x hexadecimal-digit
0X hexadecimal-digit
hexadecimal-constant hexadecimal-digit
nonzero-digit : one of
1 2 3 4 5 6 7 8 9
octal-digit : one of
0 1 2 3 4 5 6 7
hexadecimal-digit : one of
0 1 2 3 4 5 6 7 8 9
a b c d e f
A B C D E F
integer-suffix :
unsigned-suffix long-suffix opt
long-suffix unsigned-suffix opt
unsigned-suffix : one of
u U
long-suffix : one of
l L
64-bit integer-suffix :
i64
To specify integer constants using octal or hexadecimal notation, use a prefix that
denotes the base. To specify an integer constant of a given integral type, use a suffix
that denotes the type.
To specify a decimal constant, begin the specification with a nonzero digit. For
example:
int i = 157;   // Decimal constant
int j = 0198;  // Not a decimal number; erroneous octal constant
int k = 0365;  // Leading zero specifies octal constant, not decimal
To specify an octal constant, begin the specification with 0, followed by a sequence of
digits in the range 0 through 7. The digits 8 and 9 are errors in specifying an octal
constant. For example:
int i = 0377;  // Octal constant
int j = 0397;  // Error: 9 is not an octal digit
To specify a hexadecimal constant, begin the specification with 0x or 0X
(the case of the "x" does not matter), followed by a sequence of digits in the
range 0 through 9 and a (or A) through
f (or F). Hexadecimal digits a (or A)
through f (or F) represent values in the range 10 through 15.
For example:
int i = 0x3fff;  // Hexadecimal constant
int j = 0X3FFF;  // Equal to i
To specify an unsigned type, use either the u or U suffix. To specify a
long type, use either the l or L suffix. For example:
unsigned uVal = 328u;             // Unsigned value
long lVal = 0x7FFFFFL;            // Long value specified as hex constant
unsigned long ulVal = 0776745ul;  // Unsigned long value
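The same value can be written in any of the three bases, and a suffix changes only the type of the constant. The following is a minimal sketch that checks both points; the function name is hypothetical.

```cpp
#include <cassert>

void integer_constant_demo()
{
    int dec = 255;    // Decimal: starts with a nonzero digit.
    int oct = 0377;   // Octal: leading 0, digits 0 through 7.
    int hex = 0xff;   // Hexadecimal: 0x or 0X prefix.
    assert(dec == oct && oct == hex);

    // The case of the prefix and of the digits does not matter.
    assert(0x3fff == 0X3FFF);

    // Suffixes select the type of the constant.
    assert(sizeof(328u) == sizeof(unsigned int));
    assert(sizeof(0x7FFFFFL) == sizeof(long));
    assert(sizeof(0776745ul) == sizeof(unsigned long));
}
```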
C++ Character Constants
Character constants are one or more members of the "source character set,"
the character set in which a program is written, surrounded by single quotation marks (').
They are used to represent characters in the "execution character set," the
character set on the machine where the program executes.
Microsoft Specific
For Microsoft C++, the source and execution character sets are both ASCII.
END Microsoft Specific
There are three kinds of character constants:
- Normal character constants
- Multicharacter constants
- Wide-character constants
Note: Use wide-character constants in place of multicharacter constants to ensure
portability.
Character constants are specified as one or more characters enclosed in single
quotation marks. For example:
char ch = 'x'; // Specify normal character constant.
int mbch = 'ab'; // Specify system-dependent
// multicharacter constant.
wchar_t wcch = L'ab'; // Specify wide-character constant.
Note that mbch is of type int. If it were declared as type char,
the second byte would not be retained. A multicharacter constant has four meaningful
characters; specifying more than four generates an error message.
Syntax
character-constant :
'c-char-sequence'
L'c-char-sequence'
c-char-sequence :
c-char
c-char-sequence c-char
c-char :
any member of the source character set except the single quotation mark ('),
backslash (\), or newline character
escape-sequence
escape-sequence :
simple-escape-sequence
octal-escape-sequence
hexadecimal-escape-sequence
simple-escape-sequence : one of
\' \" \? \\
\a \b \f \n \r \t \v
octal-escape-sequence :
\octal-digit
\octal-digit octal-digit
\octal-digit octal-digit octal-digit
hexadecimal-escape-sequence :
\xhexadecimal-digit
hexadecimal-escape-sequence hexadecimal-digit
Microsoft C++ supports normal, multicharacter, and wide-character constants. Use
wide-character constants to specify members of the extended execution character set (for
example, to support an international application). Normal character constants have type char,
multicharacter constants have type int, and wide-character constants have type wchar_t.
(The type wchar_t is defined in the standard include files STDDEF.H, STDLIB.H, and
STRING.H. The wide-character functions, however, are prototyped only in STDLIB.H.)
The only difference in specification between normal and wide-character constants is
that wide-character constants are preceded by the letter L. For example:
char schar = 'x'; // Normal character constant
wchar_t wchar = L'\x81\x19'; // Wide-character constant
Table 1.2 shows reserved or nongraphic characters that are system dependent or not
allowed within character constants. These characters should be represented with escape
sequences.
C++ Reserved or Nongraphic Characters
| Character | ASCII Representation | ASCII Value | Escape Sequence |
| Newline | NL (LF) | 10 or 0x0a | \n |
| Horizontal tab | HT | 9 | \t |
| Vertical tab | VT | 11 or 0x0b | \v |
| Backspace | BS | 8 | \b |
| Carriage return | CR | 13 or 0x0d | \r |
| Formfeed | FF | 12 or 0x0c | \f |
| Alert | BEL | 7 | \a |
| Backslash | \ | 92 or 0x5c | \\ |
| Question mark | ? | 63 or 0x3f | \? |
| Single quotation mark | ' | 39 or 0x27 | \' |
| Double quotation mark | " | 34 or 0x22 | \" |
| Octal number | ooo | | \ooo |
| Hexadecimal number | hhh | | \xhhh |
| Null character | NUL | 0 | \0 |
If the character following the backslash does not specify a legal escape sequence, the
result is implementation defined. In Microsoft C++, the character following the backslash
is taken literally, as though the escape were not present, and a level 1 warning
("unrecognized character escape sequence") is issued.
Octal escape sequences, specified in the form \ooo, consist of a backslash and
one, two, or three octal characters. Hexadecimal escape sequences, specified in the form
\xhhh, consist of the characters \x followed by a sequence of
hexadecimal digits. Unlike octal escape constants, there is no limit on the number of
hexadecimal digits in an escape sequence.
Octal escape sequences are terminated by the first character that is not an octal
digit, or when three characters are seen.
For example:
wchar_t och = L'\076a'; // Sequence terminates at a
char ch = '\233'; // Sequence terminates after 3 characters
Similarly, hexadecimal escape sequences terminate at the first character that is not a
hexadecimal digit. Because hexadecimal digits include the letters a through f
(and A through F), make sure the escape sequence terminates at
the intended digit.
Because the single quotation mark (') encloses character constants, use
the escape sequence \' to represent enclosed single quotation marks. The
double quotation mark (") can be represented without an escape sequence.
The backslash character (\) is a line-continuation character when placed at the end of a
line. If you want a backslash character to appear within a character constant, you must
type two backslashes in a row (\\).
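The escape sequences in the table can be compared with their numeric values. The following is a minimal sketch that assumes an ASCII execution character set, as stated above for Microsoft C++; the function name is hypothetical.

```cpp
#include <cassert>

void escape_sequence_demo()
{
    assert('\n' == 10);     // Newline
    assert('\t' == 9);      // Horizontal tab
    assert('\\' == 92);     // Backslash
    assert('\'' == 39);     // Single quotation mark
    assert('"' == '\"');    // The double quotation mark needs no escape here.

    assert('\076' == '>');  // Octal 76 is 62, the ASCII code for >.
    assert('\x41' == 'A');  // Hexadecimal 41 is 65, the ASCII code for A.
    assert('\0' == 0);      // Null character
}
```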
C++ Floating-Point Constants
Floating-point constants specify values that must have a fractional part. These values
contain decimal points (.) and can contain exponents.
Syntax
floating-constant :
fractional-constant exponent-part opt floating-suffix opt
digit-sequence exponent-part floating-suffix opt
fractional-constant :
digit-sequence opt . digit-sequence
digit-sequence .
exponent-part :
e sign opt digit-sequence
E sign opt digit-sequence
sign : one of
+ -
digit-sequence :
digit
digit-sequence digit
floating-suffix : one of
f l F L
Floating-point constants have a "mantissa," which specifies the value of the
number, an "exponent," which specifies the magnitude of the number, and an
optional suffix that specifies the constant's type. The mantissa is specified as a
sequence of digits followed by a period, followed by an optional sequence of digits
representing the fractional part of the number. For example:
18.46
38.
The exponent, if present, specifies the magnitude of the number as a power of 10, as
shown in the following example:
18.46e0 // 18.46
18.46e1 // 184.6
If an exponent is present, the trailing decimal point is unnecessary in whole numbers
such as 18E0.
Floating-point constants default to type double. By using the suffixes f
or l (or F or L; the suffix is not case sensitive), the
constant can be specified as float or long double, respectively.
Although long double and double have the same representation, they are
not the same type. For example, you can have overloaded functions like
void func( double );
and
void func( long double );
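Overload resolution makes the type of a floating-point constant visible. The following is a minimal sketch; the overload set named kind is hypothetical and exists only to report which type was selected.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical overload set that reports which type was selected.
const char* kind(float)       { return "float"; }
const char* kind(double)      { return "double"; }
const char* kind(long double) { return "long double"; }

void floating_constant_demo()
{
    assert(std::strcmp(kind(18.46),  "double") == 0);       // No suffix
    assert(std::strcmp(kind(18.46f), "float") == 0);        // f or F
    assert(std::strcmp(kind(18.46L), "long double") == 0);  // l or L

    // With an exponent, the decimal point can be omitted.
    assert(18E0 == 18.0);
    // The exponent scales the mantissa by a power of 10.
    assert(18.46e1 == 184.6);
}
```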
C++ String Literals
A string literal consists of zero or more characters from the source character set
surrounded by double quotation marks ("). A string literal represents a
sequence of characters that, taken together, form a null-terminated string.
Syntax
string-literal :
"s-char-sequence opt"
L"s-char-sequence opt"
s-char-sequence :
s-char
s-char-sequence s-char
s-char :
any member of the source character set except the double quotation mark ("),
backslash (\), or newline character
escape-sequence
C++ strings have these types:
- Array of char[n], where n is the length of the string (in
characters) plus 1 for the terminating '\0' that marks the end of the string
- Array of wchar_t, for wide-character strings
The result of modifying a string constant is undefined. For example:
char *szStr = "1234";
szStr[2] = 'A'; // Results undefined
Microsoft Specific
In some cases, identical string literals can be "pooled" to save space in the
executable file. In string-literal pooling, the compiler causes all references to a
particular string literal to point to the same location in memory, instead of having each
reference point to a separate instance of the string literal. The /Gf compiler option
enables string pooling.
END Microsoft Specific
When specifying string literals, adjacent strings are concatenated. Therefore, this
declaration:
char szStr[] = "12" "34";
is identical to this declaration:
char szStr[] = "1234";
This concatenation of adjacent strings makes it easy to specify long strings
across multiple lines:
cout << "Four score and seven years "
"ago, our forefathers brought forth "
"upon this continent a new nation.";
In the preceding example, the entire string Four score and seven years ago, our
forefathers brought forth upon this continent a new nation. is spliced together.
This string can also be specified using line splicing as follows:
cout << "Four score and seven years \
ago, our forefathers brought forth \
upon this continent a new nation.";
After all adjacent strings in the constant have been concatenated, the null
character, '\0', is appended to provide an end-of-string marker for C
string-handling functions.
When the first string contains an escape character, string concatenation can yield
surprising results. Consider the following two declarations:
char szStr1[] = "\01" "23";
char szStr2[] = "\0123";
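It is natural to assume that szStr1 and szStr2 hold the same characters, but they do not. In szStr1 the escape \01 is complete before the strings are concatenated, so the array holds the value 1 followed by the characters 2 and 3. In szStr2 the octal escape absorbs up to three digits, so \012 is followed by the character 3. The following is a minimal sketch that checks this; the function name is hypothetical.

```cpp
#include <cassert>

void concatenation_escape_demo()
{
    char szStr1[] = "\01" "23";   // \01, '2', '3', '\0'
    char szStr2[] = "\0123";      // \012, '3', '\0'

    assert(sizeof(szStr1) == 4);
    assert(szStr1[0] == 1 && szStr1[1] == '2' && szStr1[2] == '3');

    // Octal 12 is decimal 10.
    assert(sizeof(szStr2) == 3);
    assert(szStr2[0] == 10 && szStr2[1] == '3');
}
```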
Microsoft Specific
The maximum length of a string literal is approximately 2,048 bytes. This limit applies
to strings of type char[] and wchar_t[]. If a string literal consists of
parts enclosed in double quotation marks, the preprocessor concatenates the parts into a
single string, and for each line concatenated, it adds an extra byte to the total number
of bytes.
For example, suppose a string consists of 40 lines with 50 characters per line (2,000
characters), and one line with 7 characters, and each line is surrounded by double
quotation marks. This adds up to 2,007 bytes plus one byte for the terminating null
character, for a total of 2,008 bytes. On concatenation, an extra character is added to
the total number of bytes for each of the first 40 lines. This makes a total of 2,048
bytes. (The extra characters are not actually written to the string.) Note, however, that
if line continuations (\) are used instead of double quotation marks, the preprocessor
does not add an extra character for each line.
END Microsoft Specific
Determine the size of string objects by counting the number of characters and adding 1
for the terminating '\0' or 2 for type wchar_t.
Because the double quotation mark (") encloses strings, use the
escape sequence (\") to represent enclosed double quotation marks. The
single quotation mark (') can be represented without an escape sequence. The
backslash character (\) is a line-continuation character when placed at the
end of a line. If you want a backslash character to appear within a string, you must type
two backslashes (\\).
To specify a string of type wide-character (wchar_t[]), precede the opening
double quotation mark with the character L. For example:
wchar_t wszStr[] = L"1a1g";
All normal escape codes listed in Character Constants are valid in string constants.
For example:
cout << "First line\nSecond line";
cout << "Error! Take corrective action\a";
Because the escape code terminates at the first character that is not a hexadecimal
digit, specification of string constants with embedded hexadecimal escape codes can cause
unexpected results. The following example is intended to create a string literal
containing ASCII 5, followed by the characters five:
"\x05five"
The actual result is a hexadecimal 5F, which is the ASCII code for an underscore,
followed by the characters ive. The following example produces the desired
results:
"\005five" // Use octal constant.
"\x05" "five" // Use string splicing.
Terms
| Term | Meaning |
| Declaration | A declaration introduces names and their types into a program without necessarily defining an associated object or function. However, many declarations serve as definitions. |
| Definition | A definition provides information that allows the compiler to allocate memory for objects or generate code for functions. |
| Lifetime | The lifetime of an object is the period during which an object exists, including its creation and destruction. |
| Linkage | Names can have external linkage, internal linkage, or no linkage. Within a program (a set of translation units), only names with external linkage denote the same object or function. Within a translation unit, names with either internal or external linkage denote the same object or function (except when functions are overloaded). (For more information on translation units, see Phases of Translation in the Preprocessor Reference.) Names with no linkage denote unique objects or functions. |
| Name | A name denotes an object, function, set of overloaded functions, enumerator, type, class member, template, value, or label. C++ programs use names to refer to their associated language element. Names can be type names or identifiers. |
| Object | An object is an instance (a data item) of a user-defined type (a class type). The difference between an object and a variable is that variables retain state information, whereas objects can also have behavior. This manual draws a distinction between objects and variables: "object" means instance of a user-defined type, whereas "variable" means instance of a fundamental type. In cases where either object or variable is applicable, the term "object" is used as the inclusive term, meaning "object or variable." |
| Scope | Names can be used only within specific regions of program text. These regions are called the scope of the name. |
| Storage class | The storage class of a named object determines its lifetime, initialization, and, in certain cases, its linkage. |
| Type | Names have associated types that determine the meaning of the value or values stored in an object or returned by a function. |
| Variable | A variable is a data item of a fundamental type (for example, int, float, or double). Variables store state information but define no behavior for how that information is handled. See the preceding list item "Object" for information about how the terms "variable" and "object" are used in this documentation. |
Program Startup: the main Function
A special function called main is the entry point to all C++ programs. This function is
not predefined by the compiler; rather, it must be supplied in the program text. If you
are writing code that adheres to the Unicode programming model, you can use the
wide-character version of main, wmain. The declaration syntax for main is:
int main( );
or, optionally:
int main( int argc[ , char *argv[ ] [, char *envp[ ] ] ] );
The declaration syntax for wmain is as follows:
int wmain( );
or, optionally:
int wmain( int argc[ , wchar_t *argv[ ] [, wchar_t *envp[ ] ] ] );
Alternatively, in Microsoft C++ the main and wmain functions can be declared as returning
void (no return value). If you declare main or wmain as returning void, you cannot return
an exit code to the parent process or operating system using a return statement; to return
an exit code when main or wmain is declared as void, you must use the exit function.
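The arguments passed to main can be forwarded to an ordinary function for processing. The following is a minimal sketch; report_arguments is a hypothetical helper, and the exit-code convention it uses (0 when at least one argument follows the program name) is an assumption made for the example.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical helper that a main function could delegate to.
// It prints each argument and returns the value main would return.
int report_arguments(int argc, char* argv[])
{
    assert(argc == 0 || argv != 0);

    for (int i = 0; i < argc; ++i)
        std::cout << "argv[" << i << "] = " << argv[i] << "\n";

    // Assumed convention: 0 signals success to the parent process.
    return argc > 1 ? 0 : 1;
}
```

A program could define main as int main( int argc, char *argv[] ) and return the result of report_arguments( argc, argv ).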
Constructors
A member function with the same name as its class is a constructor function.
Constructors cannot return values, even if they have return statements. Specifying
a constructor with a return type is an error, as is taking the address of a constructor.
If a class has a constructor, each object of that type is initialized with the
constructor prior to use in a program. Constructors are called at the point an object is
created. Objects are created as:
- Global (file-scoped or externally linked) objects.
- Local objects, within a function or smaller enclosing block.
- Dynamic objects, using the new operator. The new operator allocates an
object on the program heap or "free store."
- Temporary objects, created either by explicitly calling a constructor or implicitly
by the compiler.
- Data members of another class. Creating objects of class type, where the class type is
composed of other class-type variables, causes each object in the class to be created.
- Base class subobject of a class. Creating objects of derived class type causes the base
class components to be created.
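The creation points listed above can be observed with a small sketch. All names here (creation_log, Tracer, Base, Derived, create_objects) are hypothetical and serve only to record the order in which constructors run.

```cpp
#include <string>
#include <vector>

// Hypothetical log for this illustration; a function-local static is
// used so the log exists before any global object is constructed.
std::vector<std::string>& creation_log() {
    static std::vector<std::string> log;
    return log;
}

// Records its name each time the constructor runs.
struct Tracer {
    explicit Tracer(const std::string& name) { creation_log().push_back(name); }
};

struct Base {
    Base() { creation_log().push_back("Base"); }
};

struct Derived : Base {
    Tracer member;  // data member of class type
    Derived() : member("member") { creation_log().push_back("Derived"); }
};

// Global object: constructed during program startup.
Tracer global_object("global");

void create_objects() {
    Tracer local("local");                    // local object
    Tracer* dynamic = new Tracer("dynamic");  // object on the free store
    delete dynamic;
    Tracer("temporary");                      // explicit temporary
    Derived d;  // constructs the Base subobject, then member, then Derived
}
```

After create_objects runs once, the log holds the names in the order the constructors were invoked, with the base class and data member recorded before the body of the Derived constructor.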
Initializing Aggregates
An aggregate type is an array, class, or structure type that:
- Has no constructors
- Has no nonpublic members
- Has no base classes
- Has no virtual functions
Initializers for aggregates can be specified as a comma-separated list of values
enclosed in curly braces. For example, this code declares an array of 10 int elements
and initializes it:
int rgiArray[10] = { 9, 8, 4, 6, 5, 6, 3, 5, 6, 11 };
The initializers are stored in the array elements in increasing subscript order.
Therefore, rgiArray[0] is 9, rgiArray[1] is 8, and so on, until rgiArray[9],
which is 11. To initialize a structure, use code such as:
struct RCPrompt
{
short nRow;
short nCol;
const char *szPrompt;
};
RCPrompt rcContinueYN = { 24, 0, "Continue (Y/N?)" };
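As a further sketch of brace initialization, the following code initializes a structure and an array and reads the values back. The names MenuItem, make_quit_item, and third_element are hypothetical; the label member is declared as const char* so that it can legally point at a string literal.

```cpp
#include <cstring>

// Hypothetical aggregate: no constructors, no nonpublic members,
// no base classes, and no virtual functions.
struct MenuItem {
    short row;
    short col;
    const char* label;
};

// Initializers are assigned to members in declaration order:
// row = 24, col = 0, label = "Quit".
MenuItem make_quit_item() {
    MenuItem item = { 24, 0, "Quit" };
    return item;
}

// Initializers are stored in increasing subscript order,
// so values[2] holds the third initializer.
int third_element() {
    int values[5] = { 10, 20, 30, 40, 50 };
    return values[2];
}
```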
Temporary Objects
In some cases, it is necessary for the compiler to create temporary objects. These
temporary objects can be created for the following reasons:
- To initialize a const reference with an initializer of a type different from that
of the underlying type of the reference being initialized.
- To store the return value of a function that returns a user-defined type. These
temporaries are created only if your program does not copy the return value to an object.
For example:
UDT Func1(); // Declare a function that returns a user-defined
// type.
...
Func1(); // Call Func1, but discard return value.
// A temporary object is created to store the return
// value.
Because the return value is not copied to another object, a temporary object is
created. A more common case where temporaries are created is during the evaluation of an
expression where overloaded operator functions must be called. These overloaded operator
functions return a user-defined type that often is not copied to another object.
Consider the expression ComplexResult = Complex1 + Complex2 + Complex3.
The expression Complex1 + Complex2 is evaluated, and the result is stored in
a temporary object. Next, the expression temporary + Complex3 is
evaluated, and the result is copied to ComplexResult (assuming the assignment
operator is not overloaded).
- To store the result of a cast to a user-defined type. When an object of a given type is
explicitly converted to a user-defined type, that new object is constructed as a temporary
object.
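Two of the cases above can be sketched as follows. The Complex type, its operator+, and the function names are assumptions made for this illustration, not definitions taken from this documentation.

```cpp
// Hypothetical user-defined type with an overloaded operator+.
struct Complex {
    double re;
    double im;
};

Complex operator+(const Complex& a, const Complex& b) {
    Complex sum = { a.re + b.re, a.im + b.im };
    return sum;
}

// c1 + c2 produces a temporary Complex; that temporary is then added
// to c3, and the final value initializes result.
Complex add_three(const Complex& c1, const Complex& c2, const Complex& c3) {
    Complex result = c1 + c2 + c3;
    return result;
}

// Binding a const reference to an initializer of a different type
// creates a temporary of the reference's underlying type.
double bound_value_after_change() {
    int i = 1;
    const double& r = i;  // r refers to a temporary double holding 1.0
    i = 2;                // modifies i only; the temporary is unchanged
    return r;
}
```

The second function shows that the reference is bound to the temporary rather than to i itself: a later change to i is not visible through r.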
The lifetime of a temporary object runs from its point of creation to the point at
which it is destroyed. Any expression that creates more than one temporary object
destroys them in the reverse order of their creation. The points at which destruction
occurs are shown in Table 11.3.
Destruction Points for Temporary Objects
| Reason Temporary Created |
Destruction Point |
| Result of expression evaluation |
All temporaries created as a result of expression
evaluation are destroyed at the end of the expression statement (that is, at the
semicolon), or at the end of the controlling expressions for for, if, while,
do, and switch statements. |
| Result of expressions using
the built-in (not overloaded)
logical operators (|| and &&) |
Immediately after the right operand. At this
destruction point, all temporary objects created by evaluation of the right operand are
destroyed. |
| Initializing const references |
If an initializer is not an l-value of the same
type as the reference being initialized, a temporary of the underlying object type is
created and initialized with the initialization expression. This temporary object is
destroyed immediately after the reference object to which it is bound is destroyed. |
|
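The destruction points for expression temporaries and for temporaries bound to const references can be observed with a sketch such as the following. The Noisy type, event_log, and both functions are hypothetical names introduced only to record creation and destruction events.

```cpp
#include <string>
#include <vector>

// Hypothetical event log for this illustration.
std::vector<std::string>& event_log() {
    static std::vector<std::string> log;
    return log;
}

// Records its own construction and destruction.
struct Noisy {
    std::string name;
    explicit Noisy(const std::string& n) : name(n) {
        event_log().push_back("create " + name);
    }
    ~Noisy() { event_log().push_back("destroy " + name); }
    bool ok() const { return !name.empty(); }
};

// Two temporaries are created left to right by the built-in &&
// operator and destroyed in reverse order before the next statement.
bool run_expression() {
    bool both = Noisy("a").ok() && Noisy("b").ok();
    event_log().push_back("after expression");
    return both;
}

// A temporary bound to a const reference survives until the
// reference goes out of scope at the end of the inner block.
void run_reference() {
    {
        const Noisy& ref = Noisy("bound");
        event_log().push_back("using " + ref.name);
    }
    event_log().push_back("after block");
}
```

In run_expression, both temporaries are gone before "after expression" is logged. In run_reference, the temporary outlives the statement that created it and is destroyed only when ref leaves scope.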