Contiguousness of STL containers and string

In C++, it is well known that a vector stores its data contiguously. To be more specific, here is the wording from the standard (C++03, 23.2.4/1):

The elements of a vector are stored contiguously, meaning that if v is a vector<T, Allocator> where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size().

Two points are worth noting. First, vector<bool> is special: it is optimized for size, so the bools are packed and the identity above does not apply to it. Second, &v[0] is only valid if v.size() > 0.
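
As a rough illustration of the identity quoted above, here is a small check. The function name is_contiguous is made up for this post and is not part of any library. It should not be instantiated with bool: for vector<bool> the const "operator []" returns a value rather than a reference, so taking its address does not compile.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not a standard function): checks the identity
// &v[n] == &v[0] + n for every valid index of a vector<T>, T != bool.
// For an empty vector the loop body never runs, so &v[0] is never evaluated.
template <typename T>
bool is_contiguous(const std::vector<T>& v) {
    for (std::size_t n = 0; n < v.size(); ++n) {
        if (&v[n] != &v[0] + n) {
            return false;
        }
    }
    return true;
}
```

On a conforming C++03 (or later) implementation, is_contiguous should return true for any vector<int>, vector<double> and so on.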

(BTW: the sentence quoted above does not exist in C++98; it was added in C++03. Here is the link to the history: https://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#69)

vector is designed to be an advanced version of the raw array, so the contiguity guarantee is convenient when a raw pointer has to be passed to a low-level API. We can simply use v.empty() ? NULL : &v[0] (provided you use the default allocator, which implies that "operator []" returns a real reference, not a proxy).
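
A minimal sketch of that idiom follows. The names data_or_null and fill_bytes are invented for illustration, and std::memset merely stands in for whatever low-level API you actually call. It assumes the default allocator and T other than bool.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: pointer to the first element, or NULL when the
// vector is empty (because &v[0] is not valid for an empty vector).
template <typename T>
T* data_or_null(std::vector<T>& v) {
    return v.empty() ? NULL : &v[0];
}

// Hypothetical wrapper: fills the buffer through a raw pointer, the way
// a C-style API would. Does nothing for an empty vector.
inline void fill_bytes(std::vector<unsigned char>& buf, unsigned char value) {
    if (unsigned char* p = data_or_null(buf)) {
        std::memset(p, value, buf.size());
    }
}
```

The empty check matters: without it, an empty vector would lead to undefined behavior before the low-level API is even called.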

C++0x adds a "data" member function to vector for the same purpose.
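
With a compiler that supports the C++0x "data" member, the empty check is no longer needed just to obtain the pointer. The sketch below uses two made-up functions, sum_raw (a stand-in for a C-style API) and sum_vector (the wrapper).

```cpp
#include <cstddef>
#include <vector>

// Stand-in for a C-style API taking a pointer and a length (hypothetical).
inline long sum_raw(const int* p, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += p[i];
    }
    return total;
}

// v.data() may be called on an empty vector; the pointer it returns must
// not be dereferenced in that case, and sum_raw never does so when n == 0.
inline long sum_vector(const std::vector<int>& v) {
    return sum_raw(v.data(), v.size());
}
```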

In contrast, the other STL containers in C++03 (deque, list, set, map and so on) do not guarantee contiguous storage.

string is a little different. It is not an STL container, and the standard doesn't explicitly say whether its data is contiguous. But it also provides a "data" member function, which returns "const charT *". Here is the definition in C++03:

If size() is nonzero, the member returns a pointer to the initial element of an array whose first size() elements equal the corresponding elements of the string controlled by *this
The program shall not alter any of the values stored in the array.

On the other hand, the standard also says this about "operator []":

If pos < size(), returns data()[pos].

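Because "operator []" returns a reference, for a non-empty string s we have &s[0] == s.data(). However, the array returned by data() must not be modified, according to the last sentence of its definition quoted above ("The program shall not alter any of the values stored in the array"). So it is confusing that the non-const "operator []", which is defined in terms of data(), hands out a non-const reference into that same array.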

There is an issue about whether string data is contiguous: https://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#530

Fortunately, it will be explicitly stated in C++0x, see n2798.pdf 21.3.1/3:

The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().

The C++0x draft also changes "operator []" so that it no longer relies on "data" (in fact, "data" will behave the same as "c_str" in C++0x):

If pos < size(), returns *(begin() + pos).

So, in C++0x, the data of a string is guaranteed to be contiguous, and, just as with vector, you can pass &s[0] (if s.size() > 0 and you use the default allocator) to a low-level API. Cheers!
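
Here is a sketch of what that looks like in practice, assuming a C++0x-conforming string. Both function names are invented for illustration: c_style_write plays the role of a C API that fills a caller-supplied buffer, and read_into_string is the wrapper around it.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for a C API (hypothetical): writes at most `cap` bytes of a
// fixed message into `out` and returns the number of bytes written.
inline std::size_t c_style_write(char* out, std::size_t cap) {
    static const char msg[] = "hello";
    std::size_t n = sizeof(msg) - 1;  // length without the trailing '\0'
    if (n > cap) {
        n = cap;
    }
    std::memcpy(out, msg, n);
    return n;
}

// Lets the C API write directly into the string's own storage.
inline std::string read_into_string(std::size_t cap) {
    std::string s(cap, '\0');  // reserve room for the API to write into
    if (!s.empty()) {          // &s[0] is only used when size() > 0
        std::size_t n = c_style_write(&s[0], s.size());
        s.resize(n);           // trim to what was actually written
    }
    return s;
}
```

Sizing the string first and trimming it afterwards avoids an intermediate char array and an extra copy.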
