Modern C++ Type System


roynzh 2013-01-30 16:02:37 © Copyright


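© Copyright belongs to the author. This is an original work by roynzh, an author on the 51CTO blog. Please contact the author for permission to reprint; otherwise, legal liability will be pursued.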

Variable: The symbolic name of a quantity of data so that the name can be used to access the data it refers to throughout the scope of the code where it is defined. In C++, “variable” is generally used to refer to instances of scalar data types, whereas instances of other types are usually called “objects”.

Object: Any instance of a class or structure. When used in the general sense, the term includes all types, even scalar variables.

POD type (plain old data): This informal category of data types in C++ refers to types that are scalar (see the Fundamental types section) or are POD classes. A POD class has no non-static data members that aren’t also PODs, and has no user-defined constructors, user-defined destructors, or user-defined assignment operators. Also, a POD class has no virtual functions, no base class, and no private or protected non-static data members. POD types are often used for external data interchange, for example with a module written in the C language (which has POD types only).
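The POD rules can be checked in code. The sketch below assumes C++11 or later, where the standard splits the POD idea into "trivial" and "standard-layout" and a POD type is one that is both; is_pod_like and the struct names are hypothetical, chosen only for illustration.

```cpp
#include <type_traits>

// A C-compatible aggregate: public scalar members, no constructors, no virtuals.
struct PointPod {
    int x;
    int y;
};

// Not a POD: the user-provided constructor makes the class non-trivial.
struct PointWithCtor {
    PointWithCtor() : x(0), y(0) {}
    int x;
    int y;
};

// Not a POD: a virtual function (here, the destructor) adds hidden state.
struct Shape {
    virtual ~Shape() {}
};

// In C++11 and later, "POD" means both trivial and standard-layout.
template <typename T>
constexpr bool is_pod_like() {
    return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}

static_assert(is_pod_like<int>(), "scalars are PODs");
static_assert(is_pod_like<PointPod>(), "a plain aggregate is a POD");
static_assert(!is_pod_like<PointWithCtor>(), "user-provided constructor");
static_assert(!is_pod_like<Shape>(), "virtual functions");
```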

C++ is a strongly typed language, and it is also statically typed.

When you declare a variable in your code, you must either specify its type explicitly, or use the auto keyword to instruct the compiler to deduce the type from the initializer.
When you declare a function in your code, you must specify the type of each argument and its return value, or void if no value is returned by the function. The exception is when you are using function templates, which allow for arguments of arbitrary types.

After you first declare a variable, you cannot change its type at some later point. However, you can copy the variable’s value or a function’s return value into another variable of a different type. Such operations are called type conversions, which are sometimes necessary but are also potential sources of data loss or incorrectness.

When you declare a variable of POD type, we strongly recommend that you initialize it. Until you initialize a local variable, it has a "garbage" value that consists of whatever bits happened to be in that memory location previously. When you declare a variable of non-POD class type, the constructor handles initialization.

int result = 0;              // Declare and initialize an integer. 
double coefficient = 10.8;   // Declare and initialize a floating  
                             // point value. 
auto name = "Lady G.";       // Declare a variable and let compiler  
                             // deduce the type. 
auto address;                // error. Compiler cannot deduce a type  
                             // without an initializing value. 
age = 12;                    // error. Variable declaration must 
                             // specify a type or use auto! 
result = "Kenny G.";         // error. Can’t assign text to an int. 
string result = "zero";      // error. Can’t redefine a variable with 
                             // new type. 
int maxValue;                // Not recommended! maxValue contains  
                             // garbage bits until it is initialized.
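The valid declarations from the listing above compile on their own, and std::is_same from <type_traits> can confirm what auto deduced. This sketch adds two hypothetical variables (item_count, title) for contrast; note that a string literal such as "Lady G." deduces to const char*, not std::string.

```cpp
#include <string>
#include <type_traits>

int result = 0;                        // explicit type
double coefficient = 10.8;             // explicit type
auto name = "Lady G.";                 // auto: deduced from a string literal
auto item_count = 42;                  // auto: deduced from an int literal
auto title = std::string("Lady G.");   // auto: deduced from a std::string

// decltype reports the declared (here, deduced) type of each variable.
static_assert(std::is_same<decltype(name), const char*>::value,
              "a string literal decays to const char*");
static_assert(std::is_same<decltype(item_count), int>::value,
              "an int literal deduces int");
static_assert(std::is_same<decltype(title), std::string>::value,
              "a std::string initializer deduces std::string");
```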

1. Fundamental (Built-in) Types

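The sizes below are those used by Visual C++ on Windows. The C++ standard sets only minimum sizes, so other compilers and platforms may differ (for example, wchar_t is 4 bytes on most Linux systems).

Type	Size	Comment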
int	4 bytes	The default choice for integral values.
double	8 bytes	The default choice for floating point values.
bool	1 byte	Represents values that can be either true or false.
char	1 byte	Use for ASCII characters in older C-style strings or std::string objects that will never have to be converted to UNICODE.
wchar_t	2 bytes	Represents "wide" character values that may be encoded in UNICODE format (UTF-16 on Windows, other operating systems may differ). This is the character type that is used in strings of type std::wstring.
unsigned char	1 byte	C++ has no built-in byte type. Use unsigned char to represent a byte value.
unsigned int	4 bytes	Default choice for bit flags.
long long	8 bytes	Represents very large integer values.
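Because most of these sizes are implementation-defined, code that depends on them can check with sizeof. In the minimal sketch below, only the static_assert lines are guaranteed on every conforming compiler; wide_char_size is a hypothetical helper whose result depends on the platform.

```cpp
#include <climits>
#include <cstddef>

// Guarantees that hold on every conforming implementation.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(sizeof(unsigned char) == 1, "so is sizeof(unsigned char)");
static_assert(sizeof(long long) * CHAR_BIT >= 64, "long long has at least 64 bits");
static_assert(sizeof(int) <= sizeof(long long), "int is never wider than long long");

// Platform-dependent: 2 with Visual C++ on Windows, typically 4 on Linux.
std::size_t wide_char_size() {
    return sizeof(wchar_t);
}
```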

2. The Void Type

The void type is a special type; you cannot declare a variable of type void, but you can declare a variable of type void * (pointer to void), which is sometimes necessary when allocating raw (untyped) memory. However, pointers to void are not type-safe, and their use is generally strongly discouraged in modern C++. In a function declaration, a void return type means that the function does not return a value; this is a common and acceptable use of void. While the C language required functions that have zero parameters to declare void in the parameter list, for example, fou(void), this practice is discouraged in modern C++; such a function should be declared fou().
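A small sketch of the three uses of void described above: a parameterless function written as answer() rather than answer(void), a void return type, and a void* that loses the pointee's type until an explicit cast restores it. The function names are hypothetical.

```cpp
// A function that takes no parameters: write answer(), not answer(void), in C++.
int answer() { return 42; }

// A void return type: the function performs an action but returns no value.
void store_answer(int* out) {
    *out = answer();
}

// Any object pointer converts implicitly to void*, but the type is lost:
// converting back requires an explicit cast that the compiler cannot check.
int round_trip_through_void_pointer() {
    int value = 0;
    void* erased = &value;                    // implicit int* -> void*
    int* typed = static_cast<int*>(erased);   // explicit void* -> int*
    store_answer(typed);
    return value;
}
```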

3. Const Type Qualifier

Any built-in or user-defined type may be qualified by the const keyword. Additionally, member functions may be const-qualified and even const-overloaded. A const object cannot be modified after it is initialized.
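A minimal sketch of const objects, a const member function, and a const overload pair. The Label class is hypothetical; the compiler picks the text() overload based on whether the object it is called on is const.

```cpp
#include <cstddef>
#include <string>
#include <utility>

class Label {
public:
    explicit Label(std::string text) : text_(std::move(text)) {}

    // const member function: callable on const objects, may not modify members.
    std::size_t length() const { return text_.size(); }

    // const overload pair: chosen by the constness of the object.
    const std::string& text() const { return text_; }   // read-only access
    std::string& text() { return text_; }               // mutable access

private:
    std::string text_;
};
```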

4. String Types

Strictly speaking, the C++ language has no built-in “string” type; char and wchar_t store single characters. To approximate a string, you must declare an array of these types and add a terminating null value (for example, ASCII ‘\0’) to the array element one past the last valid character (this is called a “C-style string”). C-style strings require much more code to be written, or the use of external string utility library functions. In modern C++, we have the Standard Library types std::string (for 8-bit char-type character strings) and std::wstring (for wchar_t-type character strings, 16-bit on Windows). These Standard Library containers can be thought of as native string types because they are part of the standard libraries that are included in any compliant C++ build environment. Simply use the #include <string> directive to make these types available in your program. (If you are using MFC or ATL, the CString class is also available, but it is not part of the C++ standard.) The use of null-terminated character arrays (the C-style strings previously mentioned) is strongly discouraged in modern C++.
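For contrast, here is the same task written both ways. This is a sketch with hypothetical function names; the fixed 16-byte buffer in the C-style version is an assumption the caller must keep in mind, whereas std::string grows as needed. (Visual C++ may warn that strcpy and strcat are unsafe, which makes the same point.)

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// C-style: the caller must size the buffer, keep room for the terminating
// '\0', and call a library function for every operation.
std::size_t c_style_greeting_length() {
    char buffer[16];
    std::strcpy(buffer, "Hello, ");
    std::strcat(buffer, "C");   // would overflow silently if buffer were too small
    return std::strlen(buffer);
}

// std::string manages its own memory and length.
std::string modern_greeting(const std::string& name) {
    std::string greeting = "Hello, ";
    greeting += name;   // grows as needed
    return greeting;
}
```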

5. User-defined Types

When you define a class, struct, union, or enum, that construct is used in the rest of your code as if it were a fundamental type. It has a known size in memory, and certain rules about how it can be used apply to it for compile-time checking and, at runtime, for the life of your program. The primary differences between the fundamental built-in types and user-defined types are as follows:

The compiler has no built-in knowledge of a user-defined type. It “learns” of the type when it first encounters the definition during the compilation process.
You specify what operations can be performed on your type, and how it can be converted to other types, by defining (through overloading) the appropriate operators, either as class members or non-member functions.
Their run-time type can differ from their compile-time type. A variable's declared (static) type never changes, but through the mechanisms of inheritance and polymorphism, a pointer or reference declared as a user-defined class type might refer, at run time, to an object of a derived class.
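A sketch of both differences: a hypothetical Meters class that defines its own + operator and an explicit conversion to double, and a hypothetical Animal hierarchy in which a reference's static type (Animal&) differs from the object's dynamic type (Dog).

```cpp
#include <string>

// A user-defined value type: you decide which operators and conversions exist.
class Meters {
public:
    explicit Meters(double v) : value_(v) {}
    double value() const { return value_; }
    explicit operator double() const { return value_; }   // opt-in conversion
private:
    double value_;
};

// A non-member operator overload defines what "+" means for Meters.
Meters operator+(Meters a, Meters b) { return Meters(a.value() + b.value()); }

// Polymorphism: the static type is Animal&, the dynamic type decides the call.
struct Animal {
    virtual ~Animal() = default;
    virtual std::string sound() const { return "..."; }
};
struct Dog : Animal {
    std::string sound() const override { return "woof"; }
};

std::string speak(const Animal& a) { return a.sound(); }
```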

6. Pointer Types

Dating back to the earliest versions of the C language, C++ continues to let you declare a variable of a pointer type by using the special declarator * (asterisk). A pointer type stores the address of the location in memory where the actual data value is stored. In modern C++, these are referred to as raw pointers, and the data they refer to is accessed in your code through the special operators * (asterisk) or -> (dash with greater-than). This is called dereferencing. Which operator you use depends on whether you want the pointed-to value itself (*) or a member of the object that the pointer points to (->).

The first thing that you should know is that declaring a raw pointer variable allocates only the memory that is required to store an address: the location of the memory that the pointer will refer to when it is dereferenced. The memory for the data value itself (also called the backing store) is not allocated. In other words, by declaring a raw pointer variable, you are creating a memory address variable, not an actual data variable. Dereferencing a pointer variable before making sure that it contains a valid address to a backing store causes undefined behavior (usually a fatal error) in your program.
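A short sketch of the two dereferencing operators and of checking a pointer before use. Person and pointer_demo are hypothetical names; the variables count and alice serve as the backing store.

```cpp
#include <string>

struct Person {
    std::string name;
    int age;
};

int pointer_demo() {
    int count = 3;
    int* pCount = &count;   // pCount holds an address; count is the backing store
    *pCount += 1;           // * dereferences to the int itself

    Person alice{"Alice", 30};
    Person* pPerson = &alice;
    pPerson->age += 1;      // -> reaches a member through the pointer

    int* unset = nullptr;           // a pointer with no backing store
    if (unset != nullptr) {         // check before dereferencing
        return *unset;
    }
    return count + pPerson->age;    // 4 + 31
}
```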

In practice, the backing store for pointers is most often a user-defined type that is dynamically allocated in an area of memory called the heap (or “free store”) by using the new keyword (in C-style programming, the older malloc() C runtime library function was used). Once allocated, these “variables” are usually referred to as “objects”, especially if they are based on a class definition. Memory that is allocated with new must be deleted by a corresponding delete statement (or, if you used the malloc() function to allocate it, by the C runtime function free()).

However, it is easy to forget to delete a dynamically allocated object, especially in complex code, which causes a resource bug called a memory leak. For this reason, the use of raw pointers is strongly discouraged in modern C++. It is almost always better to wrap a raw pointer in a smart pointer, which automatically releases the memory when its destructor is invoked (that is, when the smart pointer goes out of scope). By using smart pointers you virtually eliminate a whole class of bugs in your C++ programs.
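The following sketch contrasts the two approaches. Widget is a hypothetical class, and std::make_unique requires C++14 (in C++11, write std::unique_ptr<Widget>(new Widget("smart")) instead).

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

struct Widget {
    explicit Widget(std::string n) : name(std::move(n)) {}
    std::string name;
};

// Raw pointer: every path must remember to call delete, or the memory leaks.
std::size_t raw_pointer_version() {
    Widget* w = new Widget("raw");
    std::size_t n = w->name.size();
    delete w;   // forgetting this (or leaving early via return/exception) leaks
    return n;
}

// Smart pointer: the unique_ptr destructor deletes the Widget automatically.
std::size_t smart_pointer_version() {
    auto w = std::make_unique<Widget>("smart");
    return w->name.size();   // no delete needed, even on early return
}
```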

7. Windows Data Types

In classic Win32 programming for C and C++, most functions use Windows-specific typedefs and #define macros (defined in windef.h) to specify the types of parameters and return values. These “Windows data types” are mostly just special names (aliases) given to C/C++ built-in types. For a complete list of these typedefs and preprocessor definitions, see Windows Data Types. Some of these typedefs, such as HRESULT and LCID, are useful and descriptive. Others, such as INT, have no special meaning and are just aliases for fundamental C++ types. Other Windows data types have names that are retained from the days of C programming and 16-bit processors, and have no purpose or meaning on modern hardware or operating systems. There are also special data types associated with the Windows Runtime Library, listed as Windows Runtime base data types. In modern C++, the general guideline is to prefer the C++ fundamental types unless the Windows type communicates some additional meaning about how the value is to be interpreted.

8. Value VS. Reference Types

C++ classes are by default value types. They can be specified as reference types, which enable polymorphic behavior to support object-oriented programming. Value types are sometimes viewed from the perspective of memory and layout control, whereas reference types are about base classes and virtual functions for polymorphic purposes. By default, value types are copyable, which means there is always a copy constructor and a copy assignment operator. For reference types, you make the class non-copyable (disable the copy constructor and copy assignment operator) and use a virtual destructor, which supports their intended polymorphism. Value types are also about the contents, which, when they are copied, always give you two independent values that can be modified separately. Reference types are about identity – what kind of object is it? For this reason, "reference types" are also referred to as "polymorphic types".

If you really want a reference-like type (base class, virtual functions), you need to explicitly disable copying, as shown in the MyRefType class in the following code.

// cl /EHsc /nologo /W4 
 
class MyRefType { 
private: 
    MyRefType & operator=(const MyRefType &); 
    MyRefType(const MyRefType &); 
public: 
    MyRefType () {} 
}; 
 
int main() 
{ 
    MyRefType Data1, Data2; 
    // ... 
    Data1 = Data2; 
}

Compiling the above code will result in the following error:

test.cpp(15) : error C2248: 'MyRefType::operator =' : cannot access private member declared in class 'MyRefType'
        meow.cpp(5) : see declaration of 'MyRefType::operator ='
        meow.cpp(3) : see declaration of 'MyRefType'
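Since C++11, the same intent can be expressed with = delete, which produces a clearer diagnostic than a private, undefined member. The sketch below also adds the virtual destructor that the text recommends for reference types. Shape, Circle, and Point are hypothetical names, and the static_assert lines use std::is_copy_constructible and std::is_copy_assignable to check the result at compile time.

```cpp
#include <string>
#include <type_traits>

// Reference (polymorphic) type: identity matters, so copying is disabled.
class Shape {
public:
    Shape() = default;
    Shape(const Shape&) = delete;              // no copy construction
    Shape& operator=(const Shape&) = delete;   // no copy assignment
    virtual ~Shape() = default;                // safe deletion through Shape*
    virtual std::string name() const { return "shape"; }
};

class Circle : public Shape {
public:
    std::string name() const override { return "circle"; }
};

static_assert(!std::is_copy_constructible<Shape>::value, "Shape is non-copyable");
static_assert(!std::is_copy_assignable<Circle>::value, "so are its derived classes");

// Value type: copies are independent and can be modified separately.
struct Point { int x = 0; int y = 0; };
static_assert(std::is_copy_constructible<Point>::value, "Point is copyable");
```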

9. Value Types And Move Efficiency

Move semantics avoid much of the allocation overhead of copying. For example, when you insert a string in the middle of a vector of strings, the existing strings are moved rather than deep-copied, even if the insertion causes the vector itself to grow. This also applies to other operations, for instance performing an add operation on two very large objects. How do you enable these value-operation optimizations? A conforming C++11 compiler implicitly generates a move constructor and a move assignment operator for a class that declares no copy operations, move operations, or destructor, much as it generates copy constructors. However, older versions of Visual C++ did not generate them, so a class had to "opt in" to move construction and move assignment by declaring them in its class definition. This is accomplished by using the double ampersand (&&) rvalue reference in the appropriate member function declarations and defining the move constructor and move assignment methods. You also need to insert the correct code to "steal the guts" out of the source object.

How do you decide if you need move enabled? If you already know you need copy construction enabled, you probably want move enabled as well if it can be cheaper than a deep copy. However, if you know you need move support, it doesn't necessarily mean you want copy enabled. This latter case is called a "move-only type". An example already in the standard library is unique_ptr. As a side note, the old auto_ptr is deprecated (and was removed in C++17); it was replaced by unique_ptr precisely because the previous version of C++ lacked move semantics.
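The claim that a growing vector moves its elements rather than copying them can be observed with a small counting type. This sketch assumes a C++11 standard library: std::vector uses the move constructor during reallocation only when that constructor is noexcept (otherwise it falls back to copying to keep its exception-safety guarantee). Tracked and copies_while_growing are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Counts how elements are transferred when a vector grows.
struct Tracked {
    static int copies;
    static int moves;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    // noexcept matters: without it, std::vector copies during reallocation.
    Tracked(Tracked&&) noexcept { ++moves; }
    Tracked& operator=(const Tracked&) { ++copies; return *this; }
    Tracked& operator=(Tracked&&) noexcept { ++moves; return *this; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Appends n temporaries; returns how many copies happened along the way.
int copies_while_growing(std::size_t n) {
    Tracked::copies = 0;
    Tracked::moves = 0;
    std::vector<Tracked> v;
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(Tracked());   // the temporary is moved in, never copied
    }
    return Tracked::copies;
}
```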

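With move semantics you can return large objects by value or insert into the middle of a container cheaply. Move is an optimization of copy, so there is no need for heap allocation as a workaround. Consider the following code, in which widget, IfIHadAMillionStrings, and HugeMatrix are placeholder definitions:

#include <set> 
#include <string> 
#include <utility> 
#include <vector> 
using namespace std; 
 
using widget = string;   // placeholder element type 
 
set<widget> LoadHugeData() { 
    set<widget> ret; 
    // ... load data from disk and populate ret 
    return ret;   // moved (or elided) into the caller, not deep-copied 
} 
 
vector<string> IfIHadAMillionStrings() {   // placeholder data source 
    return vector<string>(1000000, "some string"); 
} 
 
struct HugeMatrix { vector<double> cells = vector<double>(1000, 1.0); };   // placeholder 
 
// Adds b into a, element by element. 
void AddInto(HugeMatrix& a, const HugeMatrix& b) { 
    for (size_t i = 0; i < a.cells.size(); ++i) a.cells[i] += b.cells[i]; 
} 
// Overloads taking an rvalue reuse the temporary's storage instead of allocating. 
HugeMatrix operator+(const HugeMatrix& a, const HugeMatrix& b) { HugeMatrix r = a; AddInto(r, b); return r; } 
HugeMatrix operator+(const HugeMatrix& a,       HugeMatrix&& b) { AddInto(b, a); return std::move(b); } 
HugeMatrix operator+(      HugeMatrix&& a, const HugeMatrix& b) { AddInto(a, b); return std::move(a); } 
HugeMatrix operator+(      HugeMatrix&& a,       HugeMatrix&& b) { AddInto(a, b); return std::move(a); } 
 
int main() { 
    set<widget> widgets; 
    widgets = LoadHugeData();   // efficient, no deep copy 
 
    vector<string> v = IfIHadAMillionStrings(); 
    v.insert( begin(v)+v.size()/2, "scott" );   // efficient, no deep copy-shuffle 
    v.insert( begin(v)+v.size()/2, "Andrei" );  // (just 1M ptr/len assignments) 
 
    HugeMatrix hm1, hm2, hm3, hm4, hm5; 
    hm5 = hm1+hm2+hm3+hm4+hm5;   // efficient, no extra copies 
}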

10. Enabling Move For Appropriate Value Types

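For a value-like class where move can be cheaper than a deep copy, enable move construction and move assignment for efficiency. Consider the following code, in which BigHugeData is a placeholder type:

#include <memory> 
#include <stdexcept> 
#include <utility> 
#include <vector> 
using namespace std; 
 
struct BigHugeData { vector<double> values; };   // placeholder payload type 
 
class my_class { 
    unique_ptr<BigHugeData> data; 
public: 
    my_class() : data( new BigHugeData() ) { } 
    my_class( my_class&& other ) noexcept   // move construction 
        : data( std::move( other.data ) ) { } 
    my_class& operator=( my_class&& other ) noexcept   // move assignment 
    { data = std::move( other.data ); return *this; } 
    // ... 
    void method() {   // check (if appropriate): a moved-from object has no data 
        if( !data )  
            throw std::runtime_error("RUNTIME ERROR: Insufficient resources!"); 
    } 
}; 
 
int main() { 
    my_class a; 
    my_class b( std::move( a ) );   // b now owns the data; a is empty 
    b.method();                     // OK 
}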

If you enable copy construction/assignment, also enable move construction/assignment if it can be cheaper than a deep copy.

Some non-value types are move-only, such as when you can’t clone a resource, only transfer ownership. Example: unique_ptr.
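A minimal sketch of a move-only type, assuming C++11: the hypothetical UniqueHandle class deletes its copy operations and defaults its move operations, so ownership of the resource can be transferred but never duplicated. It holds a std::unique_ptr, which is itself move-only.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

class UniqueHandle {
public:
    explicit UniqueHandle(int id) : id_(new int(id)) {}
    UniqueHandle(UniqueHandle&&) noexcept = default;             // transfer ownership
    UniqueHandle& operator=(UniqueHandle&&) noexcept = default;
    UniqueHandle(const UniqueHandle&) = delete;                  // no cloning
    UniqueHandle& operator=(const UniqueHandle&) = delete;
    bool valid() const { return id_ != nullptr; }
    int id() const { return *id_; }
private:
    std::unique_ptr<int> id_;   // a moved-from unique_ptr is null
};

static_assert(!std::is_copy_constructible<UniqueHandle>::value, "not copyable");
static_assert(std::is_move_constructible<UniqueHandle>::value, "but movable");
static_assert(!std::is_copy_constructible<std::unique_ptr<int>>::value,
              "unique_ptr is the standard library's move-only example");
```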

11. Implicit Type Conversions

When an expression contains operands of different built-in types, and no explicit casts are present, the compiler uses built-in standard conversions to convert one of the operands so that the types match. The compiler tries the conversions in a well-defined sequence until one succeeds. If the selected conversion is a promotion, the compiler does not issue a warning. If the conversion is a narrowing, the compiler issues a warning about possible data loss. Whether actual data loss occurs depends on the actual values involved, but we recommend that you treat this warning as an error. If a user-defined type is involved, then the compiler tries to use the conversions that you have specified in the class definition. If it can't find an acceptable conversion, the compiler issues an error and does not compile the program.

1) Widening Conversions (Promotion)

In a widening conversion, a value in a smaller variable is assigned to a larger variable with no loss of data. Because widening conversions are always safe, the compiler performs them silently and does not issue warnings. The following conversions are widening conversions (as in Visual C++, where int and long are both 4 bytes).

From	To
Any signed or unsigned integral type except long long or __int64	double
bool or char	Any other built-in type
short or wchar_t	int, long, long long
int, long	long long
float	double
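A sketch of the conversions in the table: each function body converts implicitly, and no cast is needed. The function names are hypothetical, and the int-to-double case assumes int is at most 53 bits wide (true for the 32-bit int of Visual C++ and other mainstream compilers), so every int value is exactly representable as a double.

```cpp
#include <climits>

// Widening conversions preserve the value, so no cast or warning is needed.
double int_to_double(int i) { return i; }         // exact for 32-bit int
long long int_to_long_long(int i) { return i; }   // long long is at least as wide
double float_to_double(float f) { return f; }     // double holds every float exactly
int char_to_int(char c) { return c; }             // char promotes to int
```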

2) Narrowing Conversions (Coercion)

The compiler performs narrowing conversions implicitly, but it warns you about potential data loss. Take these warnings very seriously. If you are certain that no data loss will occur because the values in the larger variable will always fit in the smaller variable, then add an explicit cast so that the compiler will no longer issue a warning. If you are not sure that the conversion is safe, add to your code some kind of runtime check to handle possible data loss so that it does not cause your program to produce incorrect results.

Any conversion from a floating point type to an integral type is a narrowing conversion because the fractional portion of the floating point value is discarded and lost.
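Two ways to handle a narrowing conversion, sketched under the assumption that the caller wants an int: an explicit static_cast when the loss is intended (floating-point to integer truncates toward zero), and a runtime check that refuses out-of-range values. truncate_to_int and checked_narrow are hypothetical helpers, not standard functions.

```cpp
#include <limits>
#include <stdexcept>

// double -> int discards the fractional part (truncation toward zero).
// The explicit cast documents that the loss is intended.
int truncate_to_int(double d) {
    return static_cast<int>(d);
}

// Runtime-checked narrowing: long long -> int only when the value fits;
// otherwise report the problem instead of producing a wrong result.
int checked_narrow(long long value) {
    if (value < std::numeric_limits<int>::min() ||
        value > std::numeric_limits<int>::max()) {
        throw std::out_of_range("value does not fit in int");
    }
    return static_cast<int>(value);
}
```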

3) Signed-Unsigned Conversions

A signed integral type and its unsigned counterpart are always the same size, but they differ in how the bit pattern is interpreted. The following code example demonstrates what happens when the same bit pattern is interpreted as a signed value and as an unsigned value. The bit pattern stored in num and num2 never changes; only its interpretation does.

#include <iostream> 
#include <limits> 
using namespace std; 
 
int main() { 
    unsigned short num = numeric_limits<unsigned short>::max(); 
    short num2 = num;   // same 16-bit pattern, read as signed 
    cout << "unsigned val = " << num << " signed val = " << num2 << endl; 
    // Prints: unsigned val = 65535 signed val = -1 
 
    // Go the other way. 
    num2 = -1; 
    num = num2; 
    cout << "unsigned val = " << num << " signed val = " << num2 << endl; 
    // Prints: unsigned val = 65535 signed val = -1 
}

Notice that values are reinterpreted in both directions. If your program produces odd results in which the sign of the value seems inverted from what you expect, look for implicit conversions between signed and unsigned integral types. In the following example, the result of the expression (0 - 1) is implicitly converted from int to unsigned int when it's stored in u3. This causes the bit pattern to be reinterpreted.

unsigned int u3 = 0 - 1;  
cout << u3 << endl; // prints 4294967295

At default warning levels, compilers generally do not warn about implicit conversions between signed and unsigned integral types. Therefore, we recommend that you avoid signed-to-unsigned conversions altogether. If you can't avoid them, add a runtime check that the value being converted is greater than or equal to zero and less than or equal to the maximum value of the signed type. Values in this range transfer from signed to unsigned or from unsigned to signed without being reinterpreted.
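A sketch of the recommended runtime check, in both directions. The helper names are hypothetical; each converts only values in the range [0, INT_MAX], where int and unsigned int agree, and throws std::out_of_range otherwise.

```cpp
#include <limits>
#include <stdexcept>

unsigned int to_unsigned_checked(int value) {
    if (value < 0) {
        throw std::out_of_range("negative value cannot become unsigned");
    }
    return static_cast<unsigned int>(value);
}

int to_signed_checked(unsigned int value) {
    if (value > static_cast<unsigned int>(std::numeric_limits<int>::max())) {
        throw std::out_of_range("value exceeds the signed maximum");
    }
    return static_cast<int>(value);
}
```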

4) Pointer Conversions

In many expressions, a C-style array is implicitly converted to a pointer to the first element in the array, and constant conversions can happen silently. Although this is convenient, it's also potentially error-prone. For example, the following badly designed code example seems nonsensical, and yet it compiles and leaves s pointing at 'p'. First, the "Help" string literal is converted to a pointer to the first element of the array; that pointer is then incremented by three elements so that it now points to the last element, 'p'. (Standard C++11 and later no longer allow a string literal to initialize a plain char*, because string literals are constant, so the declaration uses const char*.)

const char* s = "Help" + 3;
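Array-to-pointer conversion also loses the array's size, which is a common source of bugs when an array is passed to a function. A minimal sketch with a hypothetical function name:

```cpp
#include <cstddef>

// Inside the function, the parameter is only a pointer: the array size is lost.
std::size_t size_seen_by_callee(const int* values) {
    return sizeof(values);   // size of a pointer, not of the caller's array
}
```

A caller that declares int numbers[10] sees sizeof(numbers) == 10 * sizeof(int), but passing numbers to the function yields sizeof(int*).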

12. Explicit Type Conversions (Cast)

By using a cast operation, you can instruct the compiler to convert a value of one type to another type. The compiler will raise an error in some cases if the two types are completely unrelated, but in other cases it will not raise an error even if the operation is not type-safe. Use casts sparingly because any conversion from one type to another is a potential source of program error. However, casts are sometimes required, and not all casts are equally dangerous. One effective use of a cast is when your code performs a narrowing conversion and you know that the conversion is not causing your program to produce incorrect results. In effect, this tells the compiler that you know what you are doing and to stop bothering you with warnings about it. Another use is to cast from a pointer-to-derived class to a pointer-to-base class. Another use is to cast away the const-ness of a variable to pass it to a function that requires a non-const argument. Most of these cast operations involve some risk.

In C-style programming, the same C-style cast operator is used for all kinds of casts.

(int) x; // old-style cast, old-style syntax 
int(x); // old-style cast, functional syntax

The C-style cast looks just like the function-call operator () and is therefore inconspicuous in code and easy to overlook. Both forms are bad because they're difficult to recognize at a glance or to search for, and they can perform any combination of static_cast, const_cast, and reinterpret_cast. Figuring out what an old-style cast actually does can be difficult and error-prone. For all these reasons, when a cast is required, we recommend that you use one of the following C++ cast operators, which in some cases are significantly more type-safe and which express the programming intent much more explicitly.

1) static_cast

static_cast, for casts that are checked at compile time only. The compiler reports an error if it detects that you are trying to cast between types that are completely incompatible. You can also use it to cast between pointer-to-base and pointer-to-derived, but the compiler can't always tell whether such conversions will be safe at run time.
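A sketch of both uses of static_cast. Base, Derived, percent, and extra_of are hypothetical names. The downcast in extra_of compiles whether or not the object really is a Derived; only the caller's knowledge makes it correct.

```cpp
struct Base { virtual ~Base() = default; };
struct Derived : Base { int extra = 7; };

// A numeric conversion the programmer has decided is acceptable.
int percent(double ratio) {
    return static_cast<int>(ratio * 100.0);
}

// Downcast with static_cast: nothing is checked at run time, so this is
// correct only if b actually points to a Derived.
int extra_of(Base* b) {
    return static_cast<Derived*>(b)->extra;
}
```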

2) dynamic_cast

dynamic_cast, for safe, runtime-checked casts of pointer-to-base to pointer-to-derived. A dynamic_cast is safer than a static_cast for downcasts, but the runtime check incurs some overhead.
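The same kind of downcast, checked at run time. This sketch assumes the base class is polymorphic (it has at least one virtual function), which dynamic_cast requires for downcasts; for pointers, a failed cast yields nullptr, and for references it throws std::bad_cast. The class and function names are hypothetical.

```cpp
#include <string>

struct Animal { virtual ~Animal() = default; };   // polymorphic base
struct Dog : Animal { std::string bark() const { return "woof"; } };
struct Cat : Animal {};

// dynamic_cast checks the object's actual type and returns nullptr
// (for pointers) when the downcast is not valid.
std::string try_bark(Animal* a) {
    if (Dog* d = dynamic_cast<Dog*>(a)) {
        return d->bark();
    }
    return "not a dog";
}
```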

3) const_cast

const_cast, for casting away the const-ness of a variable, or converting a non-const variable to be const. Casting away const-ness by using this operator is just as error-prone as using a C-style cast, except that with const_cast you are less likely to perform the cast accidentally. Sometimes you have to cast away the const-ness of a variable, for example, to pass a const variable to a function that takes a non-const parameter.
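A sketch of the legitimate case described above. legacy_length is a hypothetical stand-in for an older API that takes char* but does not modify the characters. Writing through the result of const_cast to an object that was originally defined const is undefined behavior.

```cpp
#include <cstring>

// Hypothetical older API: takes char* but never writes through it.
std::size_t legacy_length(char* text) {
    return std::strlen(text);
}

std::size_t call_legacy(const char* text) {
    // Safe only because legacy_length does not modify the characters.
    return legacy_length(const_cast<char*>(text));
}
```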

4) reinterpret_cast

reinterpret_cast, for casts between unrelated types, such as between a pointer type and an integer type that is large enough to hold it.
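A minimal sketch of a reinterpret_cast round trip through an integer. It assumes the platform provides std::uintptr_t (optional in the standard, but available on mainstream compilers); converting a pointer to uintptr_t and back yields a pointer that compares equal to the original. pointer_round_trip is a hypothetical name.

```cpp
#include <cstdint>

bool pointer_round_trip(int* p) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);   // pointer -> integer
    int* back = reinterpret_cast<int*>(bits);                    // integer -> pointer
    return back == p;
}
```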
