Udostępnij za pośrednictwem



May 2016

Volume 31 Number 5

[Data Points]

Dapper, Entity Framework and Hybrid Apps

By Julie Lerman

Julie LermanYou’ve probably noticed that I write a lot about Entity Framework, the Microsoft Object Relational Mapper (ORM) that’s been the prime .NET data access API since 2008. There are other .NET ORMs out there but a particular category, micro-ORMs, gets a lot of notice for great performance. The micro-ORM I’ve heard mentioned most is Dapper. What finally piqued my interest enough to take some time out to crack it open recently was various developers reporting that they’ve created hybrid solutions with EF and Dapper, letting each ORM do what it’s best at within a single application.

After reading numerous articles and blog posts, chatting with developers and playing a bit with Dapper, I wanted to share my discoveries with you, especially with those who, like me, perhaps heard of Dapper, but don’t really know what it is or how it works—or why people love it. Keep in mind that I am in no way an expert. Rather, I know enough to satisfy my curiosity for the time being, and hopefully spark your interest so you’ll dig even further.

Why Dapper?

Dapper has an interesting history, having spawned from a resource you might be extremely familiar with: Marc Gravell and Sam Saffron built Dapper while working at Stack Overflow, solving performance issues with the platform. Stack Overflow is a seriously high-traffic site that’s destined to have performance concerns. According to the Stack Exchange About page, Stack Overflow had 5.7 billion page views in 2015. In 2011, Saffron wrote a blog post about the work he and Gravell had done, titled, “How I Learned to Stop Worrying and Write My Own ORM” (aka.ms/Vqpql6), which explains the performance issues Stack was having at the time, stemming from its use of LINQ to SQL. He then details why writing a custom ORM, Dapper, was the answer for optimizing data access on Stack Overflow. Five years later, Dapper is now widely used and open source. Gravell and Stack and team member Nick Craver continue to actively manage the project at github.com/StackExchange/dapper-dot-net.

Dapper in a Nutshell

Dapper focuses on letting you exercise your SQL skills to construct queries and commands as you think they should be. It’s closer to “the metal” than a standard ORM, relieving the effort of interpreting queries such as LINQ to EF into SQL. Dapper does have some cool transformational features such as the ability to explode a list passed to a WHERE IN clause. But for the most part, the SQL you send to Dapper is ready to go and the queries get to the database much more quickly. If you’re good at SQL, you can be sure you’re writing the most performant commands possible. You need to create some type of IDbConnection, such as a SqlConnection with a known connection string, in order to execute the queries. Then, with its API, Dapper can execute the queries for you and—provided the schema of the query results can be matched up with the properties of the targeted type—automatically instantiate and populate objects with the query results. There is another notable performance benefit here: Dapper effectively caches the mapping it learned, resulting in very fast deserialization of subsequent queries. The class I’ll populate, DapperDesigner (shown in Figure 1), is defined to manage designers who make very dapper clothing.

Figure 1 DapperDesigner Class

public class DapperDesigner
{
  public DapperDesigner() {
    Products = new List<Product>();
    Clients = new List<Client>();
  }
  public int Id { get; set; }
  public string LabelName { get; set; }
  public string Founder { get; set; }
  public Dapperness Dapperness { get; set; }
  public List<Client> Clients { get; set; }
  public List<Product> Products { get; set; }
  public ContactInfo ContactInfo { get; set; }
}

The project where I’m executing queries has a reference to Dapper, which I retrieve via NuGet (install-package dapper). Here’s a sample call from Dapper to execute a query for all the rows in the DapperDesigners table:

var designers = sqlConn.Query<DapperDesigner>("select * from DapperDesigners");

Note that for code listings in this article, I’m using select * rather than explicitly projecting columns for queries when I want all columns from a table. sqlConn is an existing SqlConnection object that I’ve already instantiated, along with its connection string, but haven’t opened.

The Query method is an extension method provided by Dapper. When this line executes, Dapper opens the connection, creates a DbCommand, executes the query exactly as I wrote it, instantiates a DapperDesigner object for each row in the results and pushes the values from the query results into the properties of the objects. Dapper can match the result values with properties by means of a few patterns even if the property names don’t match the column names and even if the properties aren’t in the same order as the matching columns. It can’t read minds, though, so don’t expect it to figure out mappings involving, for example, numerous string values where the orders or names of the columns and properties are out of sync. I did try a few odd experiments with it to see how it would respond and there are also global settings that control how Dapper can infer mappings.

Dapper and Relational Queries

My DapperDesigner type has a number of relationships. There’s a one-to-many (with Products), a one-to-one (ContactInfo) and many-to-many (Clients). I’ve experimented with executing queries across those relationships, and Dapper can handle the relationships. It’s definitely not as easy as expressing a LINQ to EF query with an Include method or even a projection. My TSQL skills were pushed to the limit, however, because EF has allowed me to become so lazy over the past years.

Here’s an example of querying across the one-to-many relationship using the SQL I would use right in the database:

var sql = @"select * from DapperDesigners D
           JOIN Products P
           ON P.DapperDesignerId = D.Id";
var designers= conn.Query<DapperDesigner, Product,DapperDesigner>
(sql,(designer, product) => { designer.Products.Add(product);
                              return designer; });

Notice that the Query method requires I specify both types that must be constructed, as well as indicate the type to be returned—expressed by the final type parameter (DapperDesigner). I use a multi-line lambda to first construct the graphs, adding the relevant products to their parent designer objects, and then to return each designer to the IEnumerable the Query method returns.

The downside of doing this with my best attempt at the SQL is that the results are flattened, just like they’d be with the EF Include method. I’ll get one row per product with designers duplicated. Dapper has a MultiQuery method that can return multiple result­sets. Combined with Dapper’s GridReader, the performance of these queries will definitely outshine EF Includes.

Harder to Code, Faster to Execute

Expressing SQL and populating related objects are tasks I’ve let EF handle in the background, so this is definitely more effort to code. But if you’re working with lots of data and runtime performance is important, it can certainly be worth that effort. I have about 30,000 designers in my sample database. Only a few of them have products. I did some simple benchmark tests where I ensured I was comparing apples to apples. Before looking at the test results, there are some important points to understand about how I did these measurements.

Remember that, by default, EF is designed to track objects that are the results of queries. This means it creates additional tracking objects, which involves some effort, and it also needs to interact with those tracking objects. Dapper, in contrast, just dumps results into memory. So it’s important to take EF’s change tracking out of the loop when making any performance comparisons. I do this by defining all my EF queries with the AsNoTracking method. Also, when comparing performance, you need to apply a number of standard benchmark patterns, such as warming up the database, repeating the query many times and throwing out the slowest and fastest times. You can see the details of how I built my benchmarking tests in the sample download. Still, I consider these to be “lightweight” benchmark tests, just to provide an idea of the differences. For serious benchmarks, you’d need to iterate many more times than my 25 (start at 500), and to factor in the performance of the system on which you’re running. I’m running these tests on a laptop using a SQL Server LocalDB instance, so my results are useful only for comparison’s sake.

The times I’m tracking in my tests are for executing the query and building the results. Instantiating connections or DbContexts isn’t counted. The DbContext is reused so the time for EF to build the in-memory model is also not taken into account, as this would only happen once per application instance, not for every query.

Figure 2 shows the “select *” tests for Dapper and the EF LINQ query so you can see the basic construct of my testing pattern. Notice that outside of the actual time gathering, I’m collecting the time for each iteration into a list (called “times”) for further analysis.

Figure 2 Tests to Compare EF and Dapper When Querying All DapperDesigners

[TestMethod,TestCategory("EF"),TestCategory("EF,NoTrack")]
public void GetAllDesignersAsNoTracking() {
  List<long> times = new List<long>();
  for (int i = 0; i < 25; i++) {
    using (var context = new DapperDesignerContext()) {
      _sw.Reset();
      _sw.Start();
      var designers = context.Designers.AsNoTracking().ToList();
      _sw.Stop();
      times.Add(_sw.ElapsedMilliseconds);
      _trackedObjects = context.ChangeTracker.Entries().Count();
    }
  }
  var analyzer = new TimeAnalyzer(times);
  Assert.IsTrue(true);
}
[TestMethod,TestCategory("Dapper")
public void GetAllDesigners() {
  List<long> times = new List<long>();
  for (int i = 0; i < 25; i++) {
    using (var conn = Utils.CreateOpenConnection()) {
      _sw.Reset();
      _sw.Start();
      var designers = conn.Query<DapperDesigner>("select * from DapperDesigners");
      _sw.Stop();
      times.Add(_sw.ElapsedMilliseconds);
      _retrievedObjects = designers.Count();
    }
  }
  var analyzer = new TimeAnalyzer(times);
  Assert.IsTrue(true);
}

There’s one other point to be made about comparing “apples to apples.” Dapper takes in raw SQL. By default, EF queries are expressed with LINQ to EF and must go through some effort to build the SQL for you. Once that SQL is built, even SQL that relies on parameters, it’s cached in the application’s memory so that effort is reduced on repetition. Additionally, EF has the ability to execute queries using raw SQL, so I’ve taken both approaches into account. Figure 3 lists the comparative results of four sets of tests. The download contains even more tests.

Figure 3 Average Time in Milliseconds to Execute a Query and Populate an Object Based on 25 Iterations, Eliminating the Fastest and Slowest

*AsNoTracking queries Relationship LINQ to EF* EF Raw SQL* Dapper Raw SQL
All Designers (30K rows) 96 98 77
All Designers with Products (30K rows) 1 : * 251 107 91
All Designers with Clients (30K rows) * : * 255 106 63
All Designers with Contact (30K rows ) 1 : 1 322 122 116

 

In the scenarios shown in Figure 3, it’s easy to make a case for using Dapper over LINQ to Entities. But the narrow differences between raw SQL queries may not always justify switching to Dapper for particular tasks in a system where you’re otherwise using EF. Naturally, your own needs will be different and could impact the degree of variation between EF queries and Dapper. However, in a high-traffic system such as Stack Overflow, even the handful of milliseconds saved per query can be critical.

Dapper and EF for Other Persistence Needs

So far I’ve measured simple queries where I just pulled back all columns from a table that exactly matched properties of the types being returned. What about if you’re projecting queries into types? As long as the schema of the results matches the type, Dapper sees no difference in creating the objects. EF, however, has to work harder if the results of the projection don’t align with a type that’s part of the model.

The DapperDesignerContext has a DbSet for the DapperDesigner type. I have another type in my system called MiniDesigner that has a subset of DapperDesigner properties:

public class MiniDesigner {
    public int Id { get; set; }
    public string Name { get; set; }
    public string FoundedBy { get; set; }
  }

MiniDesigner isn’t part of my EF data model, so DapperDesigner­Context has no knowledge of this type. I found that querying all 30,000 rows and projecting into 30,000 MiniDesigner objects was 25 percent faster with Dapper than with EF using raw SQL. Again, I recommend doing your own performance profiling to make decisions for your own system.

Dapper can also be used to push data into the database with methods that allow you to identify which properties must be used for the parameters specified by the command, whether you’re using a raw INSERT or UPDATE command or you’re executing a function or stored procedure on the database. I did not do any performance comparisons for these tasks.

Hybrid Dapper Plus EF in the Real World

There are lots and lots of systems that use Dapper for 100 percent of their data persistence. But recall that my interest was piqued because of developers talking about hybrid solutions. In some cases, these are systems that have EF in place and are looking to tweak particular problem areas. In others, teams have opted to use Dapper for all queries and EF for all saves.

In response to my asking about this on Twitter, I received varied feedback.

@garypochron told me his team was “moving ahead with using Dapper in high call areas & using resource files to maintain org of SQL.” I was surprised to learn that Simon Hughes (@s1monhughes), author of the popular EF Reverse POCO Generator, goes the opposite direction—defaulting to Dapper and using EF for tricky problems. He told me “I use Dapper where possible. If it’s a complex update I use EF.”

I’ve also seen a variety of discussions where the hybrid approach is driven by separation of concerns rather than enhancing performance. The most common of these leverage the default reliance of ASP.NET Identity on EF and then use Dapper for the rest of the persistence in the solution.

Working more directly with the database has other advantages besides performance. Rob Sullivan (@datachomp) and Mike Campbell (@angrypets), both SQL Server experts, love Dapper. Rob points out that you can take advantage of database features that EF doesn’t give access to, such as full text search. In the long run, that particular feature is, indeed, about performance.

On the other hand, there are things you can do with EF that you can’t do with Dapper besides the change tracking. A good example is one I took advantage of when building the solution I created for this article—the ability to migrate your database as the model changes using EF Code First Migrations.

Dapper isn’t for everyone, though. @damiangray told me Dapper isn’t an option for his solution because he needs to be able to return IQueryables from one part of his system to another, not actual data. This topic, deferred query execution, has been brought up in Dapper’s GitHub repository at bit.ly/22CJzJl, if you want to read more about it. When designing a hybrid system, using some flavor of Command Query Separation (CQS) where you design separate models for particular types of transactions (something I’m a fan of) is a good path. That way, you aren’t trying to build data access code that’s vanilla enough to work with both EF and Dapper, which often results in sacrificing benefits of each ORM. Just as I was working on this article, Kurt Dowswell published a post called “Dapper, EF and CQS” (bit.ly/1LEjYvA). Handy for me, and handy for you.

For those looking ahead to CoreCLR and ASP.NET Core, Dapper has evolved to include support for these, as well. You can find more information in a thread in Dapper’s GitHub repository at bit.ly/1T5m5Ko.

So, Finally, I Looked at Dapper. What Do I Think?

And what about me? I regret not taking the time to look at Dapper earlier and am happy I’ve finally done so. I’ve always recommended AsNoTracking or using views or procedures in the database to alleviate performance problems. This has never failed me or my clients. But now I know I have another trick up my sleeve to recommend to developers who are interested in squeezing even more performance out of their systems that use EF. It’s not a shoo-in, as we say. My recommendation will be to explore Dapper, measure the performance difference (at scale) and find a balance between performance and ease of coding. Consider StackOverflow’s obvious use: querying questions, comments and answers, then returning graphs of one question with its comments and answers along with some metadata (edits) and user info. They’re doing the same types of queries and mapping out the same shape of results over and over again. Dapper is designed to shine with this type of repetitive querying, getting smarter and faster each time. Even if you don’t have a system with the insane number of transactions that Dapper was designed for, you’re likely to find that a hybrid solution gives you just what you need.


Julie Lerman is a Microsoft MVP, .NET mentor and consultant who lives in the hills of Vermont. You can find her presenting on data access and other .NET topics at user groups and conferences around the world. She blogs at thedatafarm.com/blog and is the author of “Programming Entity Framework,” as well as a Code First and a DbContext edition, all from O’Reilly Media. Follow her on Twitter: @julielerman and see her Pluralsight courses at juliel.me/PS-Videos.

Thanks to the following Stack Overflow technical experts for reviewing this article: Nick Craver and Marc Gravell
Nick Craver (@Nick_Craver) is a Developer, Site Reliability Engineer, and sometimes DBA for Stack Overflow. He specializes in performance tuning in all layers, overall system architecture, data center hardware, and maintaining many open source projects such as Opserver.

Marc Gravell is a developer at Stack Overflow, with a particular focus on high-performance libraries and tools for .NET, especially data-access, serialization, and network APIs, contributing to a range of open source projects in these areas.


Discuss this article in the MSDN Magazine forum