Working With Large Models In Entity Framework – Part 1
We have seen quite a few requests from various folks asking for guidance on best practices for working with large entity models in an Entity Framework application. The following post describes the typical issues you would face when using a large entity model and provides some guidance that will hopefully help mitigate them.
Issues with using one large Entity Model
The easiest way to create an Entity Model today is through the Entity Data Model Wizard in Visual Studio by pointing it at an existing database. The experience is very straightforward if the database is not too big. Of course, ‘big’ is a relative word. In general, you should start thinking about breaking up a model when it reaches 50-100 entities. The Entity Framework can handle larger models, but you could run into performance problems if the model is too interconnected (more details below). More importantly, though, it just becomes unwieldy to interact with very large models, and application complexity increases as the model grows beyond a certain size.
The typical problems you would see with a single large entity model:
I. Performance
One of the major problems you could run into with models generated from big database schemas is performance. There are two main areas where performance is impacted by the size of the model:
a. Metadata Load Times
The size of our XML schema files is roughly proportional to the number of tables in the database that you generated the model from. As the size of the schema files increases, the time we take to parse them and create an in-memory model for this metadata also increases. This is a one-time cost incurred per ObjectContext instance. We also cache this metadata per app domain, keyed on the EntityConnection string. So if you use the same EntityConnection string in multiple ObjectContext instances in a single app domain, you pay the cost of metadata loading only once. But this can still be a significant cost if the model is big and the application is not long-running.
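To make the caching behavior concrete, here is a minimal C# sketch; the `NorthwindEntities` context name and its entity sets are hypothetical placeholders for whatever your generated model uses.

```csharp
// Sketch only: "NorthwindEntities" stands in for your generated ObjectContext type.
using (var first = new NorthwindEntities())
{
    // First use of this EntityConnection string in the app domain: the
    // CSDL/SSDL/MSL metadata is parsed and an in-memory model is built
    // (the one-time metadata load cost).
    var products = first.Products.ToList();
}

using (var second = new NorthwindEntities())
{
    // Same EntityConnection string, same app domain: the cached metadata
    // is reused, so this context skips the metadata load cost.
    var suppliers = second.Suppliers.ToList();
}
```

The practical implication is to keep the connection string stable across context instances; constructing contexts with differing EntityConnection strings defeats the per-app-domain cache.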
b. View Generation
View generation is a process that compiles the declarative mapping provided by the user into client-side Entity SQL views that are used to query and store entities to the database. The process runs the first time either a query or SaveChanges happens. The performance of the view generation step depends not only on the size of your model but also on how interconnected the model is. If two entities are connected via an inheritance chain or an association, they are said to be connected. Similarly, if two tables are connected via a foreign key, they are connected. As the number of connected entities and tables in your schemas increases, the view generation cost increases.
II. Cluttered Designer Surface
When you generate an EDM model from a big database schema, the designer surface is cluttered with entities, and it is hard to get an overall picture of what your entity model looks like. If you don’t have a good overview of the entity model, how are you going to customize it? If you want to experience the problem I am talking about, try creating a default model for the AdventureWorks sample database and making sense of the entity model that is produced.
III. Intellisense experience is not great
When you generate an EDM model from a database with, say, 1000 tables, you will end up with 1000 different entity sets. Imagine what your IntelliSense experience will be like when you type “context.” in the VS code window.
IV. Cluttered CLR Namespaces
Since a model schema has a single EDM namespace, the generated code places all the classes in a single namespace. Some users have complained that they don’t like having so many classes in one namespace.
Possible Solutions
Unfortunately, there is no out-of-the-box solution that we can offer at this point to solve all of these problems. But there are quite a few things that can mitigate the specific issues listed above. Some of these make sense only in specific scenarios and should be chosen accordingly.
I. Compile time view generation
Because view generation is a significant part of the overall cost of executing a single query, the Entity Framework enables you to pre-generate these views and include them in the compiled project. The cost is especially significant in big, interconnected models, as described in the problem definition. So you should definitely pre-generate views for large models; in fact, the prescriptive guidance from the EF team is to pre-generate views for all EF applications. You can read more about the process of pre-generating views here.
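As a rough sketch, pre-generation can be done from the command line with EdmGen.exe and the resulting Views file compiled into your project; the schema file names below are placeholders for your own model's files.

```
EdmGen.exe /mode:ViewGeneration /language:CSharp ^
    /incsdl:MyModel.csdl /inssdl:MyModel.ssdl /inmsl:MyModel.msl ^
    /outviews:MyModel.Views.cs
```

Wiring this into a pre-build event keeps the generated views in sync with the model, so the runtime finds them at query time instead of generating views on first use.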
II. Choosing the right set of tables
There will be cases where your application does not require all the tables in a database to be mapped in the entity model. You can run into two different scenarios when selecting a subset of tables.
a. Naturally Disconnected Subset
In this scenario, the tables you want to work with are totally disconnected from the other tables in the database, i.e. there are no outgoing foreign keys. This case is pretty simple to implement from the designer. If this approach fits your needs, I would strongly suggest using it, since it is both straightforward and works great with the designer.
b. Choosing the subset by exposing foreign keys
This is a case where the subset of tables you want to work with may have outgoing foreign keys to other tables in the database. When you do this, you have to take responsibility for setting the foreign key appropriately. There is no navigation property that lets you get the entity the foreign key points to; you can manually query for this entity in the other container if needed. For example, let’s say your program works with just the Products and Suppliers tables in Northwind. You can choose these tables and work with them. But the CategoryID column in the Products table, which is a foreign key, shows up as a scalar property instead of an association. One important thing to note is that the Entity Framework’s update pipeline won’t be able to resolve dependencies across different subsets, since you have removed the foreign key information from your storage schema (SSDL file). You have to manage these dependencies yourself and order the SaveChanges calls correctly when working with multiple subsets.
The schemas for this example can be found in the attached .zip file under the SubsettingUsingForeignKeys folder.
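A minimal C# sketch of this pattern follows; the `NorthwindSubsetEntities` and `CategoriesEntities` context names (and the assumption that categories live in a second model) are hypothetical, chosen just to illustrate the exposed-foreign-key workflow.

```csharp
// Sketch: CategoryID is exposed as a scalar property because the
// Categories table was left out of this subset model.
using (var subset = new NorthwindSubsetEntities())
{
    var product = subset.Products.First(p => p.ProductName == "Chai");

    // No navigation property exists for the category; only the raw
    // foreign key value is available on the entity.
    int? categoryId = product.CategoryID;

    // If the category data is mapped in another model/container,
    // query it there manually.
    using (var other = new CategoriesEntities())
    {
        var category = other.Categories
                            .FirstOrDefault(c => c.CategoryID == categoryId);
    }

    // When inserting across subsets, order SaveChanges yourself so
    // principal rows exist before the dependents that reference them:
    //   otherContext.SaveChanges();  // save Categories first
    //   subset.SaveChanges();        // then Products referencing them
}
```

The comments on ordering reflect the point above: once the foreign key is out of the SSDL, the update pipeline no longer sequences the inserts for you.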
The solutions I have described in this post have one major advantage: they don’t require you to edit the XML directly; you can do it all in the designer. But these two options might not be ideal for your situation. You might want to split your model into smaller models where some types have to exist in multiple models simultaneously. You can still do this using the designer, but you would end up with the same type defined in multiple models. The other option is to use a feature of the Entity Framework usually referred to as “Using”, which allows you to reuse types defined in one CSDL file in another CSDL file. In my next post, I will have a couple of examples showing how to do model splitting with “Using” and type reuse.
Srikanth Mandadi
Development Lead, Entity Framework
Comments
Anonymous
November 24, 2008
PingBack from http://blog.a-foton.ru/index.php/2008/11/24/working-with-large-models-in-entity-framework-%e2%80%93-part-1/
Anonymous
November 24, 2008
I'm really looking forward to seeing these and have downloaded them, but they are the raw files. One of the points you have made is that using these patterns you can do all of the work in the designer. Any chance of sharing some EDMX files so we don't have to dizzy ourselves looking at the raw XML and moving back and forth from one file to another to mentally put it all together? thanks, Julie
Anonymous
November 24, 2008
You state above that "the prescriptive guidance from EF team is to pre-generate views for all EF applications." If this is the case, then why do you not provide a better integration scenario in Visual Studio? The steps that you suggest are not onerous, but they are also not obvious either. I would expect Visual Studio to implement the best practice by default, but allow me to easily change it. In the next release of EF, could you please do the best solution by default?
Anonymous
November 24, 2008
Julie, out of the 3 folders in the zip, only one (SubsettingUsingForeignKeys) corresponds to today's post. The other two are for the second part of the post, where I will go over type reuse with "Using". Since the designer does not support "Using", the Edmx files would not be very useful for those. I will try to share the Edmx file for the SubsettingUsingForeignKeys sample, but in the meantime you can put it together pretty easily from the CSDL, SSDL and MSL files following the steps from Sanjay in this post: http://blogs.msdn.com/dsimmons/archive/2007/12/07/how-to-use-your-existing-csdl-msl-ssdl-files-in-the-entity-designer-ctp2.aspx. Thanks, Srikanth
Anonymous
November 24, 2008
I was recently asked to propose solutions for resolving performance problems
Anonymous
November 24, 2008
I work with a model with more than 70 tables, and it will grow. I think it would be great to be able to work with an EDM model the way we work with database models in SQL Server. In SQL Server we are able to generate different diagrams, each describing some aspect of the relations. It could also be implemented in the EDM diagram in some way. It might also help to be able to create boundaries inside the model, so we could work with the whole model or only with a part of it, but the part would still have relations with other parts (tables in other parts). Example slices: an OrderSlice consisting of the tables Order, OrderDetails, OrderStatus, OrderType, OrderHistory; a ProductSlice consisting of the tables Product, ProductCategory, ProductFamily, ProductImages, ProductJme, ProductDescription, etc. Does it make sense to implement all this in a future EF?
Anonymous
November 24, 2008
A bit more helpful than Elisa Flasko's comment "Well, big entities are big entities...!" when someone asked this question at TechEd Europe recently.
Anonymous
November 24, 2008
Last week, a customer asked me how to solve a big EDM performance problem. In his case, his model was
Anonymous
December 02, 2008
More general information about Entity Framework runtime performance can be found at http://msdn.microsoft.com/en-us/library/cc853327.aspx.
Anonymous
December 03, 2008
Weekly digest of interesting stuff
Anonymous
December 08, 2008
The comment has been removed
Anonymous
December 10, 2008
Entity Framework Development Lead Srikanth Mandadi describes this two-part article as
Anonymous
February 16, 2009
Hey, we're working with quite a large database and using edmgen2.exe to generate our .edmx and .cs files. I found this link very helpful, as I didn't know that pre-generating the views would actually speed everything up. It created an 80 MB .cs file which VS actually struggles to build. Once it's built, though, development is much faster than it used to be. Every time we made a change and started up the web site, we'd have to wait ages before LINQ would respond. I'd recommend anyone do this view generation stuff before they work with LINQ to Entities on a day-to-day basis. I hope in the next version a lot of these speed issues and this hidden stuff become available as options or properties, and also that LINQ to Entities catches up with LINQ to SQL.
Anonymous
April 08, 2009
I would say that a good page to start from is this MSDN document: Performance Considerations for
Anonymous
May 26, 2009
One of my customers wants to develop an ERP with EF. His database contains more than 600 tables, almost all of them
Anonymous
May 27, 2009
One of my customers wants to code an ERP. To make it, he wants to use EF. His DB has more than 600 tables
Anonymous
June 23, 2009
Does anyone know if there has been any improvement for big database structures? Does VS 2010/.NET 4 handle it better? We are in the development process of an application that will grow. For the moment and for the next year we are not expecting a very huge model, but it might become large later on. What has changed in the new upcoming versions? Thanks
Anonymous
August 08, 2009
Entity Framework is a piece of junk..
Anonymous
October 08, 2009
52 entities, 55 associations (foreign keys). The Validate step worked slower and slower. Now it crashes both in VS and at run time... EF is big and clumsy. I even wonder if it can be fixed. Many unnecessary features that should have been orthogonal to the framework were built in. I actually wanted to use LINQ to SQL; that's a lean piece of software. But MS drops it and picks EF as the "winner". I apologize for being this harsh, but it's ridiculous to consider a 50-100 table system as big. What's a 500-table system then? I concur with Juan above ...
Anonymous
December 12, 2009
Could you rewrite the PetShop demo with Entity Framework, so we can get a best-practice sample?
Anonymous
December 22, 2009
What's wrong with the same type defined in multiple models? For example, with AdventureWorks, tables in the Person schema are related both to tables in the Sales and Human Resources schemas. Why not simply create two models, one for Sales and another for Human Resources, but with the Person tables in both models? What are the problems with this approach?
Anonymous
June 10, 2010
Good information and a good blog post. Good luck, blogger.