String Literals are now a Trivial Conversion to String

There was a recent internal thread on the resolution of the following set of overloaded member functions of a reference class R. The current behavior represents a change from the earlier definition of C++/CLI, and a difference in type behavior that I reported in an earlier blog entry, so I believe it is worth discussing. Here is the code snippet:

 

using namespace System;

public ref class R {
public:
  void foo( String^ );
  void foo( const char* );
};

void bar( R^ r )
{
  // which one?
  r->foo( "abc" );
}

The question of the moment is, which instance of foo() is invoked? Since there is more than one instance, answering this requires applying the function overload resolution algorithm to the call. The presumption of this blog entry and the few that follow is that the majority of readers are likely unsure how to formally think through the algorithm. What I'd ask you to do before reading on is decide what you believe is the correct program behavior and have an explanation clear in your mind.

The formal resolution of a call to an overloaded function involves three steps.

 

  1. Collect the candidate functions. The candidate functions are those methods within scope that lexically match the name of the function being invoked. For example, since foo() is invoked through an instance of R, any function named foo that is not a member of R (or of its base class hierarchy) is not a candidate function. In our example, there are two candidate functions: the two member functions of R named foo. A call can fail during this phase if the candidate function set is empty.

  2. Select the set of viable functions from among the candidate functions. A viable function is one that can be invoked with the arguments specified in the call, given the number of arguments and their types. In our example, both candidate functions are also viable functions. A call can fail during this phase if the viable function set is empty.

  3. Select the function that represents the best match for the call. This is done by ranking the conversions applied to transform the arguments to the types of the viable function parameters. This is relatively straightforward with a single-parameter function; it becomes somewhat more complex when there are multiple parameters. A call can fail during this phase if there is no best match, that is, if the conversions necessary to transform the type of the actual argument to the type of the formal parameter are equally good for two or more viable functions. The call is then flagged as ambiguous.
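
The three steps can be traced in a small sketch. It is written in ISO C++ rather than C++/CLI so that it compiles with any standard compiler; the class Widget, its foo overloads, and the free function foo are hypothetical names invented for illustration, not part of the original example.

```cpp
#include <cassert>
#include <string>

// Hypothetical free function. It is never a candidate for w.foo(...),
// because a call through an object only looks up foo in Widget and its bases.
std::string foo(const char*) { return "free foo(const char*)"; }

struct Widget {  // hypothetical native class, used only for this sketch
    std::string foo(int)      { return "foo(int)"; }
    std::string foo(double)   { return "foo(double)"; }
    std::string foo(int, int) { return "foo(int, int)"; }
};

// Step 1: the candidates are the three members named foo.
// Step 2: with one argument, only foo(int) and foo(double) are viable.
// Step 3: the best match depends on the argument type.
std::string call_with_int()    { Widget w; return w.foo(1); }
std::string call_with_double() { Widget w; return w.foo(1.5); }
std::string call_with_two()    { Widget w; return w.foo(1, 2); }

// Calls that would be rejected, and the step at which they fail:
//   w.bar(1);      // step 1: no candidate named bar
//   w.foo("abc");  // step 2: no member foo accepts a const char[4]
//   w.foo(1L);     // step 3: long->int and long->double are both
//                  //         conversions, so the call is ambiguous

void check_phases() {
    assert(call_with_int()    == "foo(int)");
    assert(call_with_double() == "foo(double)");
    assert(call_with_two()    == "foo(int, int)");
}
```

Calling check_phases() from a main() of your own exercises all three successful calls; uncommenting any of the rejected calls shows the corresponding compiler diagnostic.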

In an earlier version of the language, the resolution of this call selected the const char* instance as the best match. In the present version of the language, the conversions necessary to match "abc" to const char* and to String^ are equivalent, that is, equally good, and so the call is flagged as an error, that is, as ambiguous.

This leads us to two questions:

  1. What is the type of the actual argument, "abc"?
  2. What is the algorithm for determining when one type conversion is better than another?

The type of the string literal "abc" is const char[4] – remember, there is an implicit null terminating character at the end of every string literal.
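
The array type can be checked at compile time. The following is a minimal ISO C++ sketch; the alias AbcType and the helper literal_extent are names made up for this illustration.

```cpp
#include <cstddef>
#include <type_traits>

// A string literal is an lvalue, so decltype yields a reference to the
// array type; strip the reference to look at the array type itself.
using AbcType = std::remove_reference<decltype("abc")>::type;

static_assert(std::is_same<AbcType, const char[4]>::value,
              "\"abc\" is an array of four const char");
static_assert(sizeof("abc") == 4,
              "three characters plus the null terminator");

// Binding the literal to an array reference avoids the array-to-pointer
// conversion, so the extent N can be deduced.
template <std::size_t N>
constexpr std::size_t literal_extent(const char (&)[N]) { return N; }

static_assert(literal_extent("abc") == 4, "extent includes the terminator");
```

If any of the assertions were wrong, the translation unit would fail to compile, so a clean build is the whole test.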

The algorithm for determining when one type conversion is better than another involves placing the possible type conversions in a hierarchy. Here is my understanding of that hierarchy; all these conversions, of course, are implicit. Using an explicit cast overrides the hierarchy, much as parentheses override the usual operator precedence of an expression.

1. An exact match is best. Surprisingly, for an argument to be an exact match, it does not need to exactly match the parameter type; it just needs to be close enough. This is the key to understanding what is going on in this example, and how the language has been changed.

2. A promotion is better than a conversion. For example, promoting a short int to an int is better than converting an int into a double.

3. A standard conversion is better than a boxing conversion. For example, converting an int into a double is better than boxing an int into an Object.

4. A boxing conversion is better than an implicit user-defined conversion. For example, boxing an int into an Object is better than applying a conversion operator of a SmallInt value class.

5. An implicit user-defined conversion is better than no conversion at all. An implicit user-defined conversion is the last exit before Error (with the caveat that the formal signature might contain a param array or ellipsis at that position).
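
The parts of this hierarchy that also exist in ISO C++ can be observed directly. The sketch below is only a partial illustration: boxing has no ISO C++ counterpart and is left out, the SmallInt struct is a hypothetical native stand-in for the SmallInt value class mentioned above, and the rank1/rank2/rank3 function names are invented for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the SmallInt value class.
struct SmallInt {
    SmallInt(int v) : value(v) {}  // implicit user-defined conversion from int
    int value;
};

// Promotion versus standard conversion.
std::string rank1(int)    { return "int"; }
std::string rank1(double) { return "double"; }

// Standard conversion versus implicit user-defined conversion.
std::string rank2(double)   { return "double"; }
std::string rank2(SmallInt) { return "SmallInt"; }

// Implicit user-defined conversion versus ellipsis.
std::string rank3(SmallInt) { return "SmallInt"; }
std::string rank3(...)      { return "ellipsis"; }

void check_ranking() {
    short s = 7;
    assert(rank1(s)  == "int");       // short->int is a promotion
    assert(rank2(42) == "double");    // int->double beats int->SmallInt
    assert(rank3(42) == "SmallInt");  // user-defined conversion beats ellipsis
}
```

In each pair the overload that wins is the one whose conversion sits higher in the list above.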

So, what does it mean to say that an exact match isn't necessarily exactly a match? For example, const char[4] does not exactly match either const char* or String^, and yet the ambiguity of our example is between two conflicting exact matches!

 

An exact match, as it happens, includes a number of trivial conversions. There are four trivial conversions under ISO-C++ that can be applied and still qualify as an exact match. Three are referred to as lvalue transformations. A fourth type is called a qualification conversion. The three lvalue transformations are treated as a better exact match than one requiring a qualification conversion.

 

One form of the lvalue transformation is the native-array-to-pointer conversion. This is what is involved in matching a const char[4] to const char*. Therefore, the match of foo("abc") to foo(const char*) is an exact match. In the earlier incarnations of our C++/CLI language, this was the best match, in fact.
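
The same array-to-pointer behavior can be seen in ISO C++. In this sketch the native struct R uses std::string as a rough stand-in for String^; that substitution is an assumption made for illustration only, since converting to std::string is a user-defined conversion in ISO C++. The pick overloads are likewise invented names.

```cpp
#include <cassert>
#include <string>

// Native analogue of the example class; std::string stands in for String^.
struct R {
    std::string foo(const std::string&) { return "foo(string)"; }
    std::string foo(const char*)        { return "foo(const char*)"; }
};

// Array-to-pointer alone is an exact match; adding a pointer conversion
// (const char* -> const void*) drops the match to conversion rank.
std::string pick(const char*) { return "pick(const char*)"; }
std::string pick(const void*) { return "pick(const void*)"; }

std::string call_foo_literal()  { R r; return r.foo("abc"); }
std::string call_pick_literal() { return pick("abc"); }

void check_array_to_pointer() {
    assert(call_foo_literal()  == "foo(const char*)");
    assert(call_pick_literal() == "pick(const char*)");
}
```

Here the const char* overload wins both times, which mirrors the outcome described for the earlier incarnations of the language, though not necessarily for the same underlying reason.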

 

For the compiler to flag the call as ambiguous, therefore, the conversion of a const char[4] to a String^ must also be an exact match through a trivial conversion. This is not currently documented in the public language specification, so we can all be forgiven for being surprised at the current behavior.

So this represents a fifth trivial conversion, one unique to C++/CLI. If you think about it, it makes good sense – in the same vein as having a CLI enum more nearly match Object than an arithmetic type. But it also represents the difficulties facing the experienced C++ programmer in crossing over from Kansas to Oz.
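
As noted earlier, an explicit cast overrides the ranking, which is the usual way out of an ambiguous call. The following ISO C++ sketch shows the idea by analogy only: the types ManagedText and NativeText are hypothetical, and the tie here is between two user-defined conversions rather than between two trivial conversions as in the C++/CLI case.

```cpp
#include <cassert>
#include <string>

// Two hypothetical types, each implicitly constructible from a literal.
struct ManagedText { ManagedText(const char*) {} };
struct NativeText  { NativeText(const char*) {} };

std::string foo(ManagedText) { return "foo(ManagedText)"; }
std::string foo(NativeText)  { return "foo(NativeText)"; }

// foo("abc");  // would not compile: both overloads need one user-defined
//              // conversion, so neither is better and the call is ambiguous

// Naming the target type explicitly leaves only one viable function.
std::string call_managed() { return foo(ManagedText("abc")); }
std::string call_native()  { return foo(static_cast<NativeText>("abc")); }

void check_disambiguation() {
    assert(call_managed() == "foo(ManagedText)");
    assert(call_native()  == "foo(NativeText)");
}
```

Once the argument already has the parameter's type, the other overload is no longer viable, so the question of ranking never arises.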

Comments

  • Anonymous
    July 20, 2004
    Won't it break most of the existing libraries that will be ported to C++/CLI? One overload for String^ will break a lot of user code and make calls to the overloaded function with string literals look much uglier. Maybe it is better to make two types of string literals differ by, say literal prefix (i.e. old string literals would look like "this", and new ones like c"this")?

  • Anonymous
    July 21, 2004
    "Maybe it is better to make two types of string literals differ by, say literal prefix (i.e. old string literals would look like "this", and new ones like c"this")?"

    That is exactly what Managed Extensions for C++ did with the S prefix, and what they are trying to avoid at this point (I would think)...
