Pointer to String chars - Everett style

Garrett asked:

If the source text is in a CLR String and we want to pass it (even read-only) to unmanaged code, it appears that there is no way to get a pointer to the String's buffer directly. We have to use the marshalling stuff to get it there, which in itself makes a copy.

Given that one of Managed C++'s and C++/CLI's goals (imnsho) is to facilitate leveraging existing native C++ code, has any thought been given to this?

Can I get a native pointer to the data in a CLR String? The short answer is yes, so long as you don't mind a wchar_t*, which is the native analog of the actual backing store type for a CLR String (the CLR type System::Char). Even in Everett, we supported doing this. You have to use a special function to get at it, located in the header file vcclr.h, which shipped with Everett. This header file includes a function, PtrToStringChars, which takes a String* and returns a wchar_t __gc*. You can use the returned pointer (called an “interior pointer”) to munge the string data in a fairly intuitive “native” way, as in this example code:

#using <mscorlib.dll>
#include <vcclr.h>
using namespace System;

int main(){
String *s = S"abcdefg";
wchar_t __gc* pc = PtrToStringChars(s); // interior pointer to the first character
for(int i=0; i<s->Length; i++){
*(pc+i)+=1; //increment each character in the string
}
Console::WriteLine(pc); //writes "bcdefgh"
}

Yeah, but I wanted to use a native function. I'm getting there. Now, you can't convert from a __gc* to a __nogc* (a “native pointer”), but you can convert from a __gc* to another type, __pin*, which has a conversion to a __nogc*:

#using <mscorlib.dll>
#include <vcclr.h>
using namespace System;

int unmanagedStrLenFunction(wchar_t *c){ //counts the length of c
int count=0;
while(*c){
count++;
c++; //heh
}
return count;
}

int main(){
String *s = S"abcdefg";
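wchar_t __gc* pc = PtrToStringChars(s); // interior pointer into the String
wchar_t __pin* ppc = pc; // pins the whole String while ppc is in scope
int x = unmanagedStrLenFunction(ppc); // __pin* converts to a native wchar_t*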
Console::WriteLine(__box(x)); //writes "7"
}

I could have turned the result of PtrToStringChars directly into a wchar_t __pin*, but I wanted to make it absolutely clear.

Wow! Pin pointers are cool! I'm going to use them everywhere! Whoa there, trigger. There are a few things to keep in mind about pin pointers:

  1. They can be extremely costly. A pin pointer works by literally pinning the enclosing object down, so the garbage collector can't move it around when it's doing collections. Do this too often, or keep the pin pointer around for a relatively long time, and you're seriously hurting the performance of the garbage collector, which is not a good idea.
  2. They can't be used everywhere. By design, because of the cost and lifetime problems involved with pin pointers, they can't be used as members of a type, function return types, function parameters, or temporary variables.
  3. Pin pointers only pin objects for their lifetime. This leaves open the possibility of GC holes. That is, you can get a native pointer into the GC heap, release the pin pointer, and then leave yourself a huge GC hole. For example:

__gc class A{
public:
int i;
};

int* gchole(A* a){
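int __pin* p = &(a->i); // pins *a, but only while p is in scope
return p; // converts to a native int*; the pin ends when p goes away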
}

What's so wrong with that code? On the surface, it looks pretty benign. But remember that the object passed in (a) is only pinned for the lifetime of the pin pointer p. When the function returns, p has been destroyed and the object is no longer pinned, yet the caller is holding a native pointer into the GC heap. That's a GC hole. The pointer returned from gchole is only going to be valid until the next garbage collection, and who knows when that will happen. In short, don't do this if you want to avoid unexplainable, untraceable, unreproducible application crashes.

Back to the original question, what about regular char*'s? No chance, not without incurring a copy cost (either by using API functions that turn wchar_t*'s into char*'s, or by marshalling).

In a future article, I'll describe the new syntax versions of the pinning and interior pointer, and some of the (mostly minor) differences.

Comments

  • Anonymous
    December 23, 2003
    Yeah, I'm actually not sure if the offset into the class is a constant or not, as the function uses the OffsetToStringData method of the runtime.


    Upon ildasming mscorlib.dll (which everyone should do from time to time, just to see what's in it), it looks as though the offset is 12 - but this probably varies by architecture.
  • Anonymous
    December 23, 2003
    Speaking of ildasm -- I was thinking that it'd be nice if some of these type utilities were built into Visual Studio. I'm trying to do mentoring on the finer points of software development with some folks and I find that they are less likely to use tools that are external to the development environment. (sigh--kids these days ;)

    That, and developer MSIL support in VS.NET would be nice too.

    Heck, while I'm at it, can I get a pony?

    Garrett