D3: Fun with Concurrency and Model First – Part 2
In my last post I began the story of how I added concurrency checking capabilities to D3—we conquered a bug in the model first workflow activities in VS 2010 beta 1. Next we need to look into ways to customize the SSDL generated so that the concurrency column will end up with the RowVersion type that SQL Server will automatically update when any part of the row changes.
This lead to the next surprise. At first I thought that I would add a new custom activity to the workflow which would fix-up the SSDL, but this turned out to have its own complexities. As it turns out, the model first process runs the workflow within the visual studio process which means the DLLs containing activities for the workflow are loaded in the same way other VS DLLs are loaded, namely from the VS install directories or from the GAC. This also means if the DLL changes, you have to shutdown VS and restart it before model first will reflect the change. Conclusion: If you want to create a general purpose activity that many folks will reuse, then this is a reasonable approach You can carefully debug an activity and set something up to install it centrally. But, if you need a one-off customization, this is just too hard.
So we’re looking into options to help here, but in the meantime I’ve discovered another approach which seems to work well. The default workflow described in DbGen.xaml has two activities. The first one takes the conceptual model and produces an MSL and SSDL string, while the second one takes the SSDL and produces something which will create the database (normally a SQL file, but after my first round of customizations for D3 it will produce a c-sharp file with a method that will create it).
It turns out, though, that the second activity under the covers is a general-purpose mechanism that can be configured to take in a string, run it through a T4 template and produce an output string. The nice thing about using a template is that it allows you to write arbitrary code as a one-off in your solution and easily iterate on your design, modifying and rerunning without any need to restart VS, etc.
Eureka! The second activity can be used twice. In between the two default activities, it can be inserted again and configured to take the SSDL in, transform it with a different template and send it on to the final activity. In addition model first makes some useful bits of information like a path to the EDMX file available to the template, so we can access the CSDL in order to make better decisions about how to modify the SSDL.
The relevant part of the resulting XAML looks like this:
<Sequence>
...
<edm1:SsdlToDdlActivity ExistingSsdlInput="[ExistingSsdl]"
SsdlInput="[Ssdl]"
DdlOutput="[Ssdl]"
TemplatePath="DbGen\SsdlUseRowVersion.tt" />
...
</Sequence>
For D3, the fix-up we need is to find properties in the SSDL which correspond to properties in the CSDL with the following characteristics: type is binary, maxlength is 8, concurrency mode is fixed and store generation pattern is computed. So we read the EDMX into an XElement tree in memory and then use LINQ to XML to gather the names of properties that need to be changed in the SSDL grouped by the name of the table in which they appear:
var query = from property in csdl.Elements(edm + "EntityType").SelectMany(t => t.Elements(edm + "Property"))
where (string)property.Attribute("Type").Value == "Binary" &&
(string)property.Attribute("MaxLength").Value == "8" &&
(string)property.Attribute("ConcurrencyMode").Value == "Fixed" &&
(string)property.Attribute(store + "StoreGeneratedPattern").Value == "Computed"
group (string)property.Attribute("Name").Value
by entitySetName(property.Parent, csdl, edm);
This query depends on knowledge of the algorithm used by the CsdlToSsdlAndMslActivity to map conceptual model properties to the database—namely that the column names are the same as the property names on the entities and (at least for the base types of entity sets) the name of the table is the same as the name of the entity set that contains the type. So we query the csdl for all the elements that represent properties which have the required facets and extract the name of the property, then we group those property names by the name of the entity set containing the entity type to which the property belongs using these two functions:
string entitySetName(XElement entityType, XElement csdl, XNamespace edm)
{
return (from set in csdl.Element(edm + "EntityContainer")
.Elements(edm + "EntitySet")
where entityTypeName(set) == entityType.Attribute("Name").Value
select (string)set.Attribute("Name").Value).Single();
}
string entityTypeName(XElement entitySet)
{
XElement schema = entitySet.Parent.Parent;
string nsPrefix = schema.Attribute("Namespace").Value + ".";
string aliasPrefix = schema.Attribute("Alias").Value + ".";
string typeName = ((string)entitySet.Attribute("EntityType").Value).Trim();
if (typeName.StartsWith(nsPrefix))
{
return typeName.Substring(nsPrefix.Length);
}
else if (typeName.StartsWith(aliasPrefix))
{
return typeName.Substring(aliasPrefix.Length);
}
else
{
return "UNEXPECTED TYPENAME -- Where is the namespace?";
}
}
The result of this query is an IGrouping<string, string> where the first string is the name of the entity set and the second string is a collection of property names from the base type within that set. With that information it’s a simple matter to load the SSDL into memory, search for the relevant entity types (in the case of the SSDL, each table is represented by an entity type and each column within the table is a property of that type) and properties within them. Then we swap out the attributes on the XElement for that property and set the name, the type to rowversion and the store generated pattern to computed. Finally, the template serializes the XML for the SSDL to its output.
But what about derived types?
The above algorithm is all well and good for a model with no inheritance—every entity’s corresponding SSDL can be fixed up properly, but what if I want to put a concurrency property on an inherited type? As it turns out the EF doesn’t allow that; concurrency properties can only be present on the base type for an entity set. While I generally don’t like limitations of any kind, I can’t really think of any scenario where I’d want a concurrency property only on a derived type, and if I do have them on a base type, then I don’t need a different or additional property on any of the derived types.
In fact, for standard TPH and TPC scenarios all of the properties for a particular entity instance appear in a single row of a single table. With TPH all of the instances are in a single table, and in TPC there are multiple tables involved but each type has its own table with all of the properties both from the base type and added in the derived type in that table. So the concurrency property will appear on each table and a rowversion will work just fine because any property changing in that row will cause the concurrency property to change.
I got to thinking about TPT, though, and became a little worried. First off, TPT is the default inheritance strategy used by model first. In this strategy the properties from a base type all live in one table which has a row for every single entity in the set, while extra properties added by derived types are placed in their own tables with just those properties and the primary key. So for example, if I have entity type A with an id, scalar properties a and b, and a version, and I have type B which inherits from A and adds scalar property c, then the database would have table A with columns for id, a, b and version PLUS table B with columns for id and c. (A similar sort of case exists with entity splitting or mixed mode mapping, but it’s easier to think about the pure TPH, TPC and TPT scenarios.) In this kind of mapping, what happens if I have an instance of type B and only property c is modified? Then only table B needs an update, and the rowversion would not be updated since SQL Server only maintains it for changes to the row of the table where it resides. Doh!
After banging my head against the wall for a little while, I decided to check with some folks on the team to see what might be done to solve this problem. Could we relax the restriction on concurrency properties only being allowed in the base type (in which case maybe for TPT scenarios we could add a second concurrency property on each derived type which would track changes on the extra tables)? As it turns out Colin Meek, one of the brightest guys I know and a key member of the team, was way ahead of me. He was able to tell me that the design for the update code that handles concurrency checks already takes this condition into account, and whenever any part of an entity which has a concurrency token is modified, the update system of the EF guarantees that the table with the concurrency value is “touched”. In cases where there isn’t a real modification being made, a fake update statement is issued which doesn’t really change anything but does force the rowversion to update.
So, in the end there was nothing to worry about. The SSDL modification algorithm described above works just fine, and once the updated workflow XAML and template were put into place in the project, model first sales through cleanly and produces code which creates a database with rowversion properties for each entity type. And the whole thing is verified by the successful completion of the unit test described in the first part of the article.
Conclusions
AppWeek has come and gone, and in the end I spent the whole week getting my one test to run, but we have that much more of a stable foundation for D3 work to build on. In addition at least one key bug was found and fixed, and some extensibility scenarios were explored and opportunities for improvement identified. The really good news, though, is that the EF in general is starting to become flexible enough that even if it doesn’t work out of the box in just the way I want, I can customize it for my needs.
- Danny