Value Type Representation Between the Original and Revised C++
In my current work on the machine translation of the original language design [thing1] to the revised design of the language [thing2], I have been making various stabs at understanding the possible uses of a managed value type [V] and of pointer modifications of that type [V*, __box V*]. Artur Laksberg and Mahesh Hariharan have both provided much helpful feedback.
Here is the canonical trivial value type used in the thing1 language spec, along with a reference type R that contains one:
__value struct V { int i; };
__gc struct R { V vr; };
In thing1, we can have four syntactic variants of a value type [where forms (2) and (3) are the same semantically]:
(1) V v = { 0 };
(2) V *pv = 0;
(3) V __gc *pvgc = 0; // Form (2) is an implicit form of (3)
(4) __box V* pvbx = 0; // must be local
Form (1) is the canonical value object, and it is reasonably well understood, except when someone attempts to invoke an inherited virtual method such as ToString(). For example,
v.ToString(); // error!
In order to invoke this method, the compiler must have access to the associated virtual table of the base class. Because a value type object is stored in place, without an associated vptr, v must first be boxed. In thing1, boxing is never implicit; the programmer must specify it explicitly, as in
__box( v )->ToString(); // thing1: note the arrow
The primary motive behind this design was pedagogical: it was meant to make the underlying mechanism visible to the programmer so that she would understand the ‘cost’ of not providing an instance of ToString within her value type. Were V to contain its own instance of ToString, the boxing would not be necessary.
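To make the mechanism concrete, here is a toy model in standard C++ of what explicit boxing buys. It is only an analogy under stated assumptions: `Object`, `BoxedV`, and `box()` are hypothetical names, not CLR or Managed C++ APIs, and the real runtime's boxing is not implemented this way. The point it shows is that the value type carries no vptr, while the boxed copy lives on the heap and does, so a virtual call becomes possible only through the box.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Stand-in for System::Object: polymorphic, so every instance carries a vptr.
struct Object {
    virtual ~Object() = default;
    virtual std::string ToString() const { return "Object"; }
};

// The value type: just its data, stored in place, with no vptr.
struct V { int i; };

// Hypothetical boxed form of V: a heap object holding a *copy* of the value.
// Its ToString stands in for the inherited ValueType::ToString, which
// reports the type name rather than the contents.
struct BoxedV : Object {
    V value;
    explicit BoxedV(const V& v) : value(v) {}
    std::string ToString() const override { return "V"; }
};

// Analog of __box( v ): copy v onto the heap and return an object
// through which virtual calls can be dispatched.
std::unique_ptr<Object> box(const V& v) {
    return std::make_unique<BoxedV>(v);
}
```

Usage mirrors the thing1 spelling: `box(v)->ToString()` plays the role of `__box( v )->ToString()`. Because `BoxedV` holds a copy, later changes to `v` are not visible through the box.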
In thing2 [yes, referring to the two languages in this way is annoying, isn’t it?], the implicit boxing is carried out transparently:
v.ToString(); // thing2
but at the cost of no longer encouraging the class designer to introduce an instance of ToString within V. Implicit boxing is preferred because, while there is usually one class designer, there is an unlimited number of users, none of whom has the freedom to modify V to eliminate the possibly onerous explicit box.
Another difference between value types in thing1 and thing2 is that thing2 removes support for a user-defined default constructor. [It has been explained to me that this is because there are cases in which the CLR can create an instance of the value type without invoking the associated default constructor. That is, the invocation of a default constructor that thing1 supported within a value type cannot be guaranteed. Given that absence of a guarantee, it was felt better to drop the support altogether rather than have it be non-deterministic in its application.]
This is not as bad as it might seem, because each object of a value type is zeroed out automatically, so the members of a local instance are never undefined. It also means that in thing1 a default constructor that simply zeroed out its members was redundant. The problem is that a non-trivial default constructor in a thing1 program has no mechanical mapping to thing2. The code within the constructor must be migrated into a named init function that the user then invokes explicitly.
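The migration can be sketched as a standard C++ analogy. The thing1 original is assumed, for illustration only, to be `__value struct V { V() { i = 42; } int i; };`; the name `init` is the hypothetical named function described above. One difference from the CLR is worth noting: standard C++ zeroes the members only when the object is value-initialized with `{}`, whereas the CLR zeroes every value type object automatically.

```cpp
#include <cassert>

// thing2-style value type: no user-defined default constructor.
// Assumed thing1 original:  __value struct V { V() { i = 42; } int i; };
struct V {
    int i;
    // Former body of the thing1 default constructor, now a named
    // function that the user must call explicitly.
    void init() { i = 42; }
};
```

A freshly created `V v{};` starts out zeroed; the user then calls `v.init()` wherever the old constructor would have run.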
The declaration of a value type object within thing2 is otherwise unchanged. [This means there is still no support for a destructor within a value type. Coupled with the continued requirement that non-POD native classes be held as pointer members within the value type, this makes a value type virtually useless for wrapping non-POD native classes.]
Forms (2) and (3) can address nearly anything in this world or the next [that is, anything managed or native]. So, for example, all the following are permitted in thing1:
R* r;
pv = &v; // address a value type on the stack
pv = __nogc new V; // address a value type on native heap
pv = pvgc; // we are not sure what this addresses
pv = pvbx; // address a boxed value type on managed heap
pv = &r->vr; // an interior pointer to value type within a
// reference type on the managed heap
So, a V* can address a location within an activation record [and therefore can be dangling] or within the global data segment, within the native heap [and therefore can be undefined], within the managed heap [and therefore will be tracked if the gc relocates it], and within the interior of a reference type object on the managed heap [again, requiring tracking].
Forms (2) and (3) map into interior_ptr<V>, although the revised language supports both interior_ptr<V> and V*. The primary behavioral difference is that the interior_ptr is a tracking pointer; that is, if the object addressed is on the managed heap and that object is relocated by the gc, the interior_ptr is updated with its new address. A V* is restricted to addressing memory outside the managed heap. It would be an error, for example, to assign a V* the address &r->vr, or the address held by pvbx [that is, a __box V*]. An interior_ptr requires a nullptr to indicate a pointer to no object; a V* would require a 0. For example,
V *pv = 0; // may not address within managed heap
interior_ptr<V> pvgc = nullptr;
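The tracking behavior can be illustrated with a toy model in standard C++. This is an assumption-laden sketch, not how the CLR garbage collector works: `ToyHeap` and `TrackingPtr` are hypothetical names, and `TrackingPtr` merely plays the role of interior_ptr<V>. The "collector" knows every tracking pointer, so when it moves its storage it patches each one to the same offset in the new block; an unregistered raw V* into the old block would simply dangle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct V { int i; };

class ToyHeap;

// Plays the role of interior_ptr<V>: only the heap may reseat it.
class TrackingPtr {
public:
    V* get() const { return p_; }
    V* operator->() const { return p_; }
private:
    friend class ToyHeap;
    V* p_ = nullptr;
};

class ToyHeap {
public:
    explicit ToyHeap(std::size_t n) : slots_(n) {}  // slots start zeroed

    // Point t at slot k and register it so relocation can update it.
    // In this sketch, tracked pointers must outlive any later relocate().
    void track(TrackingPtr& t, std::size_t k) {
        assert(k < slots_.size());
        t.p_ = &slots_[k];
        tracked_.push_back(&t);
    }

    // Simulated compaction: copy every object to a new block, patch each
    // tracked pointer to the same offset in that block, then free the old one.
    void relocate() {
        std::vector<V> moved(slots_);
        for (TrackingPtr* t : tracked_)
            t->p_ = moved.data() + (t->p_ - slots_.data());
        slots_.swap(moved);  // 'moved' now owns the old block and frees it on return
    }

private:
    std::vector<V> slots_;
    std::vector<TrackingPtr*> tracked_;
};
```

The design mirrors the distinction drawn above: because the heap is told about every tracking pointer, it is free to move objects, while a plain V* carries no such registration, which is why the revised language does not let it address the managed heap.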
Form (4) is a tracking handle. It addresses the whole object that has been boxed within the managed heap [remember that boxing copies the value into a reference type object on the managed heap]. It is translated in the revised language into a V^:
V^ pvbx = nullptr; // __box V* pvbx = 0;
Because the pointed-to types are value types [Int32 and Boolean within the System namespace, and E an enumeration], the following declarations in the original language design all map to interior_ptrs in the revised language design:
Int32 *pi; -> interior_ptr<Int32> pi;
Boolean *pb; -> interior_ptr<Boolean> pb;
E *pe; -> interior_ptr<E> pe; // Enumeration
The built-in types are not considered managed types, although they do serve as aliases to the types within the System namespace. Thus the following mappings hold true between thing1 and thing2:
int * pi; -> int * pi;
int __gc * pi; -> interior_ptr< int > pi;
So, when translating a V* in your existing thing1 program, the most conservative strategy is to always turn it into an interior_ptr<V>. This is how it was treated under the original language. In the revised language, the programmer has the option of restricting a pointer to a value type to non-managed heap addresses by specifying V* rather than interior_ptr<V>. If, on translating your program, you can do a transitive closure of all its uses and be sure that no assigned address is within the managed heap, then leaving it as V* is fine. All V __gc * should, of course, go to interior_ptr<V>.
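The translation rules in this section can be summarized as a small decision function. This is a hypothetical sketch of one piece of a translator, not part of any real tool: `PtrForm`, `Thing1Ptr`, and `translate_pointer` are illustrative names, and the `provablyUnmanaged` flag assumes some separate transitive-closure analysis has already been run.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of a thing1 pointer declaration.
enum class PtrForm { Plain, Gc, Box };  // T*, T __gc*, __box T*

struct Thing1Ptr {
    std::string pointee;  // e.g. "V", "Int32", "int"
    PtrForm form;
    bool builtin;         // built-in types such as int are not managed types
};

// provablyUnmanaged: true only if a transitive closure over all uses shows
// that no assigned address lies within the managed heap.
std::string translate_pointer(const Thing1Ptr& p, bool provablyUnmanaged) {
    const std::string interior = "interior_ptr<" + p.pointee + ">";
    switch (p.form) {
    case PtrForm::Box:
        return p.pointee + "^";            // __box V*  -> V^
    case PtrForm::Gc:
        return interior;                   // V __gc*   -> interior_ptr<V>
    case PtrForm::Plain:
        if (p.builtin)
            return p.pointee + "*";        // int*      -> int*
        // Conservative default: V* -> interior_ptr<V> unless proven safe.
        return provablyUnmanaged ? p.pointee + "*" : interior;
    }
    return interior;  // not reached for valid PtrForm values
}
```

Defaulting `Plain` value type pointers to `interior_ptr` keeps the translated program correct when the analysis is inconclusive; the cheaper V* is chosen only when the closure proves it safe.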
Comments
Anonymous
February 26, 2004
I am not sure that I understand you when you say v.ToString() results in an error because v needs to be boxed in order to access an inherited method. V does not need to be boxed; indeed, value types are not boxed in VB.NET or C# when ToString() is called.
The runtime does not require this. Since value types are sealed, the address of the method to call is known at compile time. Thus, vtables are not necessary.
In addition, the CLR provides ToString() with two entry points, so that the method can be called either through an object reference (with an initial pointer to the vtable) or through a pointer directly to the start of the value type (since value types on the stack don't have a vtable). The first entry point adjusts the object reference so that it points past the vtable pointer, and then falls into the second, which starts right after the instructions that do the pointer adjustment.
Anonymous
February 26, 2004
Wesner,
Try calling ToString() on an instance of:
__value class Complex
{
public:
Complex( double r, double i ) : m_r(r), m_i(i) {}
public:
//virtual String * ToString() { return String::Format( S"{0} + {1} i", m_r.ToString(), m_i.ToString() ) ; }
private:
double m_r,
m_i ;
} ;
It will fail with:
error C3610: 'Complex': value type must be 'boxed' before method 'ToString' can be called
Uncomment the overridden ToString method or use the __box operator, and it works.
Anonymous
February 26, 2004
Add
public override String ToString() { return String.Format( "{0} + {1} i", r.ToString(), i.ToString() ) ; }
to the struct and the boxing is gone:
IL_000d: ldloca.s c
IL_000f: ldc.r8 1.
IL_0018: ldc.r8 2.
IL_0021: call instance void NeedToBox.Complex::.ctor(float64, float64)
IL_0026: ldloca.s c
IL_0028: call instance string NeedToBox.Complex::ToString()
Anonymous
February 26, 2004
Of course, Complex is boxed in your example, because it is invoking System.ValueType::ToString. System.ValueType is a reference type!!
If you provide an overriding implementation of Complex.ToString, it will not be boxed.
The IL I get is the following:
L_0016: ldc.r8 1
L_001f: ldc.r8 2
L_0028: call Complex..ctor
L_002d: ldloca.s V_4
L_002f: call Complex.ToString
L_0034: stloc.3
Similarly, for primitive types, no boxing occurs for calls to the overridden GetHashCode or ToString():
int i=2;
int k=3;
int j = i.GetHashCode();
string str = k.ToString();
L_0000: ldc.i4.2
L_0001: stloc.0
L_0002: ldc.i4.3
L_0003: stloc.1
L_0004: ldloca.s V_0
L_0006: call int.GetHashCode
L_000b: stloc.2
L_000c: ldloca.s V_1
L_000e: call int.ToString
L_0013: stloc.3
Anonymous
February 26, 2004
I guess the clarification I needed from this post was that ToString() must be overridden for boxing to be avoided.
Anonymous
February 26, 2004
Feel free to delete my prior comments; if I had read your post more carefully, I would not have been confused.