Spatial Data Type Structure

Hi Folks,

As part of a larger effort to document our protocols, SQL Server has just released documentation on the structure of our spatial types.  The document is also available in PDF.  Note that all of this is preliminary, and while I doubt it will change much, it could.  I had very little to do with this effort; you have Ed to thank.

It turns out that this structure is pretty simple, and there are folks out there who have fairly easily reverse-engineered it.  Now that we’re publishing the structure, let me make one very specific warning: if you are going to play around with the binary format, do not set the valid bit (v) unless you’re absolutely sure the object is valid.

Setting this should not cause SQL Server itself any problems: you won’t crash anything.  We just can’t guarantee that you’ll get the most predictable results.  And things get very difficult to fix.

Let’s work an example.  Consider the following linestring: LINESTRING (0 0, 10 0, 5 0, 10 0).  This is clearly invalid according to the OGC, since it overlaps itself.  We can put this into a geometry element and play with it:

 declare @g geometry = 'LINESTRING (0 0, 10 0, 5 0, 10 0)'
 select @g.STIsValid()            -- 0
 select @g.MakeValid().STLength() -- 10

We can’t run the STLength() without first making the object valid or we’ll get an error.  We can now pull out the binary version of this to play with:

 select @g

And we get:

 0x000000000100040000000000000000000000000000000000000000000000000024400000000000000000000000000000144000000000000000000000000000002440000000000000000001000000010000000001000000FFFFFFFF0000000002

We’re particularly interested in the highlighted byte, which contains a number of flags, all currently false.  Most of these aren’t particularly dangerous to play with.  For example, the lowest-order bit tells SQL Server that the item contains Z values.  If you flip it, you’ll just end up with an error, since the data that follows doesn’t actually contain any Z values:

 declare @g geometry = 0x000000000101040000000000000000000000000000000000000000000000000024400000000000000000000000000000144000000000000000000000000000002440000000000000000001000000010000000001000000FFFFFFFF0000000002
  
 Msg 6522, Level 16, State 1, Line 1
 A .NET Framework error occurred during execution of user-defined routine or aggregate "geometry": 
 System.IO.EndOfStreamException: Unable to read beyond the end of the stream.

The valid bit, the third-least significant, is a little touchy, though.  Let’s see what happens if we flip that on our instance and run a few operations:

 declare @g geometry = 0x000000000104040000000000000000000000000000000000000000000000000024400000000000000000000000000000144000000000000000000000000000002440000000000000000001000000010000000001000000FFFFFFFF0000000002
 select @g.STIsValid()  -- 1
 select @g.STLength()   -- 20

What’s happening?  First, the system is trusting the bit and telling us that the instance is valid.  This shouldn’t be too surprising: the bit was added so that we wouldn’t have to perform an expensive check every time we check the validity, and we’re using the optimization.

Second, the result of the STLength() operation is a bit odd.  It’s calculating the length used to draw the invalid object, not the length of the object itself.

Okay, you say: let’s make this instance valid and see if we can fix things:

 declare @h geometry = @g.MakeValid()
 select @h.STLength()  -- 20
 select @h.ToString()  -- LINESTRING (0 0, 10 0, 5 0, 10 0)

Ack!  We’re trapped.  MakeValid() tries to preserve the input geometry whenever possible, and therefore refuses to touch a valid instance.  Our instance isn’t actually valid, but we said it was.

So if you’re creating a geometry from scratch, how are you to know how to set this bit?  That is, how on Earth do you figure out if an instance is valid?  Unless you’re absolutely sure, let the system do it for you: tell SQL that it isn’t valid and then run MakeValid().  Unlike setting the value true, there’s absolutely no harm in doing this—other than perf.  In fact, when MakeValid() sees an object that isn’t marked as valid, it will first check to see if the instance is valid.  If it is, MakeValid() won’t muck with the data, but will just flip the bit to true.

And keep in mind that although you can play with these structures, most of the time you shouldn’t need to.  Use the built-in methods, and use the builder/sink API if you need to do something fancy.  As always, there are plenty of examples on the SQL Server Spatial Tools CodePlex project.

Cheers,

-Isaac

Comments

  • Anonymous
    August 17, 2009
    The comment has been removed

  • Anonymous
    August 17, 2009
    The comment has been removed

  • Anonymous
    August 17, 2009
    The comment has been removed

  • Anonymous
    August 17, 2009
    The comment has been removed

  • Anonymous
    August 18, 2009
    The comment has been removed

  • Anonymous
    August 21, 2009
    The comment has been removed

  • Anonymous
    August 21, 2009
    The comment has been removed

  • Anonymous
    August 21, 2009
    The comment has been removed

  • Anonymous
    August 22, 2009
    The comment has been removed

  • Anonymous
    August 22, 2009
    Aha---I see!  I've been using the word "overlaps" in two very different ways.  I should be more careful, but the OGC spec unfortunately encourages this.  (More on that in a second.) The OGC spec has a very particular---and perhaps peculiar---definition of "overlaps".  Under this definition, a linestring can never overlap itself. The way I meant it is that the set of points covered by the figure more than once is infinite.  LINESTRING (0 0, 10 0, 5 0, 10 0) overlaps itself over the segment from (5 0) to (10 0).  These are the cases we disallow. Let me get on my soapbox for a second and complain that the OGC doesn't exactly encourage good use of the language.  The crosses and touches predicates are particularly strange.  Consider these two linestrings: G = LINESTRING (0 0, 10 0) H = LINESTRING (0 5, 5 0, 10 5) Do these two touch?  Do they cross?  Does this comport with your intuitive understanding of these terms?  :) Anyway, apologies for the confusion. Cheers. -Isaac

  • Anonymous
    August 22, 2009
    The comment has been removed

  • Anonymous
    August 23, 2009
    Isaac, I've read the OGC SFS 1.1 spec (Ref: OGC 05-126, Date: 2005-11-22, Version: 1.1.0) a number of times and I am confused. The spec defines validity only for Polygons (6.1.11.1 - "The assertions for Polygons (the rules that define valid Polygons) are as follows") and, in the version I have, does not even mention an IsValid function at all for any geometry type. It clearly defines IsSimple() and isClosed(). (Though clearly there are some basic things one doesn't want a linestring to have such as: 1) a single point; 2) duplicate points, that could form the basis of an IsValid() function. I have an old SQL3/MM document that has ST_IsValid() in it but your implementation is only OGC SFS 1.1 compliant, right?) I cannot find in the OGC SFS 1.1 document I have where is says that it...  "has a very particular---and perhaps peculiar---definition of "overlaps".  Under this definition, a linestring can never overlap itself." Where does it say this? Yes, it talks about determining Overlaps on page 26 but this is about determining the relationships between two objects and this is not about validity but about relationship. It seems to me from reading the specifications that linestrings are only subject to basic validity (see above) and isSimple, isClosed but that is all. Polygons are specifically subject to greater validity as one would expect. But the lack of extra documentation for linestrings as against polygons doesn't mean one can "fill in the gap" as if it wasn't properly filled in the first place. Since there are plenty of other people on the SFS 1.1 committee (Microsoft was not on those committees AFAIK) did you discuss these issues of interpretation with them? Finally, Regina, Oracle doesn't have an IsValid bit for any of its geometry types. You can store incorrect geometries if you want: Oracle will only report them as being incorrect if you run sdo_geom.validate_geometry. So, here, ESRI can store involuted polygons (See http://www.spatialdbadvisor.com/oracle_spatial_tips_tricks/110/esri-arcsde-exverted-and-inverted-polygons-and-oracle-spatial/) without Oracle complaining and even thought there are non-standard - having been created before the standards were written. regards Simon

  • Anonymous
    August 24, 2009
    The comment has been removed

  • Anonymous
    August 24, 2009
    The comment has been removed