SQL Server Spatial Application
Introduction
Our initial goal is to show off the new features of the Entity Framework June CTP release in a realistic way. The most significant features of that release are:
- SQL Server's spatial features
- enums
- table-valued functions
- and a few others
Of these, the spatial features appeared to have the most impact on the nature of the application. In addition, spatial applications are less widely known and understood than, for example, e-commerce order/entry apps, or inventory apps. So our first task was to brainstorm about possible apps.
Please feel free to contribute to this project. I am NOT an expert in SQL Server's spatial features, and have no background in this application domain. In one sense that is good: my experience will be similar to your boss handing you a project to evaluate possible uses for this new cool technology she or he has heard about. But my experience will definitely NOT simulate that of a spatial domain expert who is considering using SQL Server. So feel free to jump in, comment, criticize, add creative insights, etc. PLEASE do.
So far this project has gone through the following phases:
- initial brainstorming: throwing ideas about possible applications around, establishing constraints and assumptions
- feasibility: try out pieces of the necessary technology to verify that they appear to work
Initial Brainstorming
Constraints and Assumptions
Media
We likely want the UI to involve maps. There are other possible spatial apps referenced on the internet, that don't involve maps. But in these cases, the geometry data types appear to model objects in a CAD/CAM type of scenario, or possibly building layouts. These apps seemed somewhat more specialized than map-oriented applications, so I decided to exclude them from consideration.
In our initial brainstorming session, it was pointed out that there was a large body of photographs on the web which were geo-coded with location data that might be useful to use.
There was also the "coolness" factor to consider. Our initial tentative conclusion was that the app ideally could involve both maps and photos.
Simplicity
Since we are creating a sample app that users will install and play with, the data that it uses needs to be somewhat small and publicly available for free. We also likely want to have the app use a pre-existing database, rather than initializing it in the app. so that rules out a "Model-First" EF app. Others felt that we likely wanted to do a Code-First app because that is where customer interest is, so we tentatively decided on that. Code-First of course rules out the use of Table Valued Functions.
Availablilty of Data
There is a lot of geographic data available for free. We need to:
- be able to easily convert it into SQL Server data
- select an appropriate subset
We will likely select a subset of US data, and, likely from this local area, just to facilitate testing, because of our familiarity.
Another option that was mentioned was to take geocoded photos and upload those into the app, using spatial SQL INSERT statements.
Another option was to get data from some kind of service (Bing maps for example, or the "Dallas" Data Mart).
App Examples
We discussed a number of sample possiblities. We felt that we wanted more complex queries than just "show me all the stores within x miles of here", so that the app would exercise some of the Spatial SQL capabilities.
Path foto Viewer
Given an airplane flight path, show all photos within a mile of the point on the path where you currently are.
Or, show me all the hikes (with photos) within 10 miles of a travel path.
Real Estate House Finder
Show all houses within a price range in a neighborhood that are more than x miles from a major arterial, within a mile of an elementary school, and within some area that you cannot easily code into a table (zip code wouldn't work, but "within the legal city limits" might, since adjoining houses here in Seattle often use Seattle in their address, when they are actually outside the city limits).This is a good example of a complex spatial query, but house data is often proprietary.
TBD: is there public domain data available re "Recent house sales"?
Initial Research
We are actually in a similar position to someone whose boss has told them to investigat this new "SQL Spatial" technology and come up with a prototype for evaluating whether to go ahead with it or not. At this point it seems like the best thing to do is a feasibility study where we try out different Spatial operations, verify that they work the way we think they do from reading the documentation, verify that they work using Entity Framework code first, and hopefully learn more about the issues involved with any specific app. To do this, a simple console app is adequate, much like the ADO.Net team blog posting about the June CTP release that dealt with spatial, except that we would be extending it a fair amount to take into account possible applications.
Software Prerequisites
The ADO.Net team blog posting on the June CTP spatial features describes how to install the needed software.
Pay attention to the warning about only installing this software on a non-production machine, and if you need to uninstall, read the instructions carefully. I heeded the first bit of advice but not the 2nd, and so had the delightful experience of re-paving my machine.
Features to investigate
SQL Features
The ADO.Net team blog on spatial has code that demos one common scenario: "show me all the X's that are within Y {miles | kilometers | meters etc} of my present location". The posting code hard-codes "my location" and it would be cool to pick it up from your phone, but for now, let's limit the number of new features that we implement.
Additional queries that we may like to execute include:
- all the X's within Y miles from a highway
- all the X's within a given polygon (e.g. zip code, city, state, etc)
- complicated combinations of these (all houses within certain neighborhoods (zip codes?) that are within a mile of an elementary school and less than half a mile from a bus line, between a mile and 2 miles of a major freeway etc).
Non-SQL features
I've mentioned photos and maps. We should investigate what is involved in:
- calling Bing maps to display around a given point
- finding photos within X miles of a point
- we may want to investigate house listings: some of this data may be proprietary (houses for sale), but some of it is not (county data on house sales)
Investigating these initial features may disclose other factors we haven't considered, beyond availability of data, complexity of api to access the data, etc.
Feasibility Study
Can we start by using the app in the ADO.Net team blog on spatial? Its section on Code First with an existing database provides us with a number of points (landmarks) inside the city of Seattle, Washington.
We can simulate finding all X's within Y miles of a highway by creating lines using several landmarks as points, and then find all Landmarks with X distance of that line.
We can simulate finding all X's within an area by creating a polygon made up of lines.
We can create compound SQL queries from the above two items.
We can call Bing maps on a given landmark.
Finding all photos that satisfy some condition will require some research regarding sources of geo-encoded photos on the internet. Or we could insert such photos ourselves and query against them.
Housing listings may require more research, but we can certainly get started.
So it looks like we can get started with our feasibility study. Then we can evaluate the results and decide on an actual prototype to implement.
SQL Features
Lines
First let's create a Lines class, which will generate a table in the database, and then we will populate it with line instances. Initially these instances will contain just two points, but ultimately we'll want to create lines with multiple points, to reflect the real-world situation with actual highways.
So lets add some code to the spatial walkthrough, a new class for Lines, and a DbSet method call for Lines. The modified parts of the code now looks like this:
public class Line
{
public int LineID { get; set; }
public string LineName { get; set; }
public DbGeography LineLocation { get; set; }
}
public class SeattleLandmarksEntities : DbContext
{
public DbSet<Person> People { get; set; }
public DbSet<Landmark> Landmarks { get; set; }
public DbSet<Line> Line { get; set; }
}
Next we need to instantiate a few lines. Being lazy, I'll just add a few using points that are already defined as Landmarks.
This actually turned out to be harder than I initially thought. There are several non-intuitive aspects of the SQL Server Spatial api.
Every geographic or geometric object is of type DbGeography or DbGeometry, whether it is a point, line, polygon, etc. Speaking as a total newbie to the area, this seems way too abstract, since you would expect different methods to apply to the different kinds of objects. But that's the way it is, so I get to "just deal with it".
Instantiating an object has an obscure api: the method takes a single string parameter, which is parsed internally. Here's an example for creating a point, taken from the sample we are using as a base line.
var landmark16 = new Landmark
{
LandmarkID = 16,
Address = "1400 E. Galer Street",
Location = DbGeography.Parse("POINT(-122.31249 47.632342)"),
LandmarkName = "Volunteer Park"
};
The trouble with using "magic strings" as input parameters is that it is hard to guess what the string should be. As far as I could determine, there is no easily accessible and findable source of information describing its format: I couldn't find anything in Books On Line.
The other issue is, I don't want to instantiate a line by passing in a long list of coordinates. I already have a bunch of points: why can't I just create a line by passing 2 points to the constructor? It turns out that great minds think alike, and by using search I found a posting on the SQL Spatial forum asking the same thing (here). There was a proposed solution (using CLR functions) which might have performance issues, but I am going to ignore it for now, for the sake of simplicity. Anyway, the solution showed how to instantiate a line, so I was able to deduce the following code.
// make a line
var line1 = new Line
{
LineID = 1,
LineName = "whatever",
LineLocation = DbGeography.Parse("LINESTRING(-122.31249 47.632342, -122.317575 47.665229)")
};
I added this code to the sample, ran it, and got the same results as before. And we can look at the database in SSMS and verify that a new Line table has been created.
So now we are ready to do something productive: let's calculate the distance between a point and this line. Actually if we look at the existing query, which calculates all Landmarks within .5 miles from a fixed point, we can easily modify it to get all Landmarks within X miles of the single Line we inserted into the Lines table:
// Get landmarks within 2 miles of the line
var line = context.Line.Find(1);
var distanceInMiles2 = 2.5;
var distanceInMeters2 = distanceInMiles * 1609.344;
var landmarks2 =
from l in context.Landmarks
where l.Location.Distance(line.LineLocation) < distanceInMeters
select new
{
Name = l.LandmarkName,
Address = l.Address
};
Console.WriteLine("\nLandmarks within " + distanceInMiles2 + " mile(s) of " +
line.LineName + "'s location:");
foreach (var loc in landmarks2)
{
Console.WriteLine("\t" + loc.Name + " (" + loc.Address + ")");
}
This runs, produces plausible looking output. To verify it, it would be nice to use Bing Maps to plot all the Landmarks, and the Line. I spent about an hour searching on Bing Maps, and am satisfied this can be done, but did not find a quick app I could modify to do this. I did learn enough to be satisfied that it is feasible to use maps as a spatial interface, which was the goal at the point in the project.
I also discovered a blog discussing Spatial data available from the US Census Bureau. The author actually imported the data for a specific county into SQL Server. So another feasibility goal is satisfied: the free availability of a "nice" subset of spatial data.
But not to get distracted: how do I verify that the results of the above query are correct? The set of data is small enough that I should be able to manually plot it directly on Bing Maps, and verify visually that it is ok. After doing that, I discover that the query returned the two end points of the line ("University Library" and "Volunteer Park"), plus "Montlake Bridge", which visually looks less than 2 miles from the line. However there are several other points that look like they should have been selected ("Gas Works Park", "Anhalt Apartment Building"), but inspecting this visually is pretty approximate. My speculation is that we are only getting Landmarks within 2 miles of the end points of the Line, rather than the entire Line. So lets write a query that will get us those Landmarks. I'll write a method that will get the distance of every Landmark from a specified point, and we'll call it twice, once for each endpoint.
Actually my speculation was wrong, and the bugs were pretty typically dumb, of not much interest; but what might be interesting is a debugging method I wrote while trying to figure out what was going on. This method displays the distance of every Landmark object from a specified DbGeography object (point, line, polygon etc). It's pretty obvious code, but it was handy in brute force debugging. I'd suspect you would need to filter it once your set of Landmarks (or other Spatial object) became too numerous.
static void getLandmarkDistancesFromGeography(SeattleLandmarksEntities context, DbGeography geo, string Name)
{
var landmarks2 =
from l in context.Landmarks
select new
{
Name = l.LandmarkName,
Distance = l.Location.Distance(geo) / 1609.344
};
Console.WriteLine("\nLandmark distances from " +
Name );
foreach (var loc in landmarks2)
{
Console.WriteLine("\t" + loc.Name + " (" + loc.Distance + ")");
}
}
And the calls to this, which enabled me to figure out my problem, looked like this: note how you can call it with a variety of spatial objects.
getLandmarkDistancesFromGeography(context, person.Location, person.Name);
getLandmarkDistancesFromGeography(context, context.Landmarks.Find(15).Location, context.Landmarks.Find(15).LandmarkName);
getLandmarkDistancesFromGeography(context, context.Landmarks.Find(16).Location, context.Landmarks.Find(16).LandmarkName);
getLandmarkDistancesFromGeography(context, context.Line.Find(1).LineLocation, context.Line.Find(1).LineName);
The next thing I'd like to verify is that finding the distance from a line works when the line consists of more than two points. This will clearly be the real-world situation with most representations of roads and freeways.
So let's create a line consisting of two line segments. We want each segment to have Landmarks that it will or will not select, exclusive of the other segment, so the distance function is doing a bit more work. Here's the code, which gets added to the SeattleLandmarkSeed.cs file.
var line2 = new Line
{
LineID = 2,
LineName = "Queen Anne Through Downtown",
LineLocation = DbGeography.Parse("LINESTRING(-122.349755 47.647494, -122.352842 47.6186, -122.334571 47.604009 )")
};
.......
context.Line.Add(line2);
Executing the sample, we find via SSMS that the Line table now has two rows.
Let's see all the Landmarks within a half mile of this Line. And, so the code doesn't get too strung out, lets refactor by creating a new method. Here's the new method:
private static void FindLandmarksNearLine(SeattleLandmarksEntities context, int lineID, double DistanceInMiles)
{
var line = context.Line.Find(lineID);
var distanceInMeters2 = DistanceInMiles * 1609.344;
var landmarks2 =
from l in context.Landmarks
where l.Location.Distance(line.LineLocation) < distanceInMeters2
select new
{
Name = l.LandmarkName,
Distance = l.Location.Distance(line.LineLocation),
Limit = distanceInMeters2,
Address = l.Address
};
Console.WriteLine("\nLandmarks within " + DistanceInMiles + " mile(s) of " +
line.LineName + "'s location: (limit in meters: ");
foreach (var loc in landmarks2)
{
Console.WriteLine("\t" + loc.Name
+ " (" + loc.Address + ")"
+ "\t" + " (" + loc.Distance + ")"
+ "\t" + " (" + loc.Limit + ")"
);
}
}
And here is how it is called:
FindLandmarksNearLine(context, 1, 2.5);
FindLandmarksNearLine(context, 2, 0.5);
Now the only problem is, we need to add a couple of Landmarks so that each segment of the line will have Landmarks that it does select, and Landmarks that it does not select. So we do the following, in th SeattleLandmarksSeed.cs file.
var landmark17 = new Landmark
{
LandmarkID = 17,
Address = "",
Location = DbGeography.Parse("POINT(-122.34727 47.64677)"),
LandmarkName = "Aurora Bridge"
};
....
context.Landmarks.Add(landmark17);
Now each segment of the line has Landmarks that get selected and that don't.
If we were at the stage of doing thorough testing, we'd try adding a Line with a large number of segments, but that's for later.
Polygons
The final bit of spatial SQL to test is polygons: can we select all the Landmarks within a given polygon?
First we create a polygon that includes some but not all Landmarks, with this code.
Non-SQL Features
BING Maps
To verify the results of the above code, I thought it would be nice to use Bing Maps to plot all the Landmarks, and the Line (and Polygon). I spent about an hour searching on Bing Maps, and am satisfied this can be done, but did not find a quick app I could modify to do this. I did learn enough to be satisfied that it is feasible to use maps as a spatial interface, which was the goal at the point in the project.
Availability of Data
I discovered a blog discussing Spatial data available from the US Census Bureau. The author actually imported the data for a specific county into SQL Server, and provides sample code, including:
- how to import the data
- sample queries against it
In addition, by word of mouth I have heard of numerous other government sources of public domain spatial data. So another feasibility goal is satisfied: the free availability of a "nice" subset of spatial data.
Photos and Residential Sales data
Photos remain to be investigated. There are possible legal issues involved that at least need to be investigated (see issues in Europe with Google's street view photos) and resolved.
Residential sales data is a more specialized subset that we'll investigate later.