Code Generators: Can't live with them, can't live without them

I'm still not sure what I think about code generators. This may sound strange, coming from someone who has spent much of the last few years working on and talking up software factories, of which code generation is a significant part - but it's true. On one hand, I love the idea of eliminating manual coding of routine tasks and recurring patterns, improving productivity and minimising bugs. On the other hand, every code generator I've ever worked with has had problems, whether it is in the cost of maintaining the tool and templates, or issues with the generated code.

I like to divide code generators into two categories. The first is the "black magic" type, where you never change, or even look at, the generated code. The good thing about this type is that you can re-run the code generator as often as you want without worrying about overwriting any of your changes. The bad thing is that if the generated code isn't exactly what you want, you're in trouble. There are a few ways you can tweak the code without actually changing anything written by the generator, such as using partial classes or inheritance, but your options are always going to be very limited.
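
As a rough illustration of the inheritance route, here is a minimal Python sketch (the post's own context is C#, where partial classes play a similar role). The class names, the in-memory "table" and the name-normalising tweak are all hypothetical; the point is only that the hand-written subclass lives in a file the generator never touches.

```python
# Imagine this class lives in a file that the generator overwrites on every run.
class CustomerDataAccessBase:
    """Stand-in for generated code; never edited by hand."""

    def __init__(self):
        self._rows = {}  # in-memory stand-in for a database table

    def create(self, customer_id, name):
        self._rows[customer_id] = name

    def read(self, customer_id):
        return self._rows[customer_id]


# This class lives in a separate, hand-written file that survives regeneration.
class CustomerDataAccess(CustomerDataAccessBase):
    def create(self, customer_id, name):
        # Hand-written tweak: normalise the name, then delegate to the
        # generated implementation instead of editing it.
        super().create(customer_id, name.strip().title())


dal = CustomerDataAccess()
dal.create(1, "  ada lovelace ")
print(dal.read(1))
```

The limitation described above still applies: the subclass can only change behaviour at the seams the generated base class happens to expose.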

The other category is the "one-time accelerator" type, which will spit out code which is hopefully pretty close to what you want, but which will need to be modified by hand to get it exactly right. The advantage of this approach is that you should always eventually be able to get what you want, but it means that you'll have to manually re-apply your changes every time you regenerate. It also means you need to fully understand the generated code, since you're ultimately responsible for maintaining it.

My main quarrel with code generators stems from the fact that we all want the "black magic" type, but in my experience they hardly ever deliver on their promise. The problem is that all too often the generated code just doesn't do what you want. This leads to a few possible outcomes:

  1. You stubbornly stick with whatever the generator gives you, and consequently you are forced to engineer all sorts of hacks in your own code to work around the shortcomings in the generated code.
  2. You modify the code generation templates, resulting in a vast array of additional configuration knobs and dials, so that the generator is able to build the code you need for your application. But chances are that these changes won't actually help for any future applications, as they will all bring a brand new set of idiosyncrasies and require yet more knobs and dials.
  3. You bite the bullet and modify the generated code to meet your needs, dumping it into the "one-time accelerator" bucket and forcing you to live with its implications.

One possible explanation as to why these problems are so common is that if you ever do find a problem that can be solved well by "black magic" code generators, you can probably codify the solution in a framework or component library, eliminating the need for any kind of code generation whatsoever. The challenges with "black magic" code generation are the reason why patterns & practices software factories generally don't even try that approach. We tried to mitigate the "one-time accelerator" limitations by only generating a small amount of code in one go, but this brings its own set of problems.

This topic is at the front of my mind as code generators have caused a bit of angst in our team lately. We're using a generator (home-grown, but much the same as other solutions you've probably seen) to generate data access layers, stored procedures and business entities. The code that it generates is generally very good (otherwise we wouldn't be using it), but as always it isn't perfect. The biggest problem is that, for any given table, the generator will give you a complete suite of CRUD operations whether you want them or not. For many people this may not be a big deal - but I downright refuse to have code in my solution that is unnecessary and untested. My fear is that if we leave this code in the solution, at some stage some developer will be tempted to call it - and since nobody ever asked for it or tested it, it may be completely unsuitable for the application. So my rule is, if the generator builds something you don't need for your current task (even if it may be needed later), it's not allowed in the solution.

The problem is, since we're using a lot of agile development techniques, we tend to update our database schemas quite a lot. This means that we need to regenerate our data access artifacts a lot as well. To make matters worse, we've also found the need for the occasional tweak to the generated code to make sure it meets our requirements. So the combination of frequent schema changes, my rules about stripping out unneeded code, and the need to hand-tweak the code means that the generation process is fast becoming more trouble than it's worth. I know we could make changes to our generator or use an existing one with more features to get around some of these problems (such as being able to specify and save which operations are generated for each table), but I fear that we'll never get quite where we want to be. But on the other hand I'm concerned that if we stop using a generator for our data access artifacts then we'll face a swag of different problems, such as inconsistent implementations and increased development time.

This is where you can come in and save the day. The first person to explain how to make code generation work well in this situation (preferably without causing any disruption to our team or schedule) gets a six pack of the Aussie beer of their choice. Unfortunately due to customs regulations you'll need to come by to collect - but believe me it will be worth it.

Comments

  • Anonymous
    November 16, 2007
    For me, 1. and 3. are definitely not an option. So that leaves option number 2.

  • Leaving out unwanted code shouldn't be a problem :-) - you have to store a list of what should (or shouldn't) be generated somewhere, and pass it to the code generator at generation time.

  • For tweaks it depends (I don't know the extent or the nature of your tweaks). Perhaps you could implement tweaked methods in a partial class by hand?

  • Anonymous
    November 17, 2007
    Sounds like your complaints aren't necessarily about generated code, but about code you don't really have ownership of. Any external library you use will contain code that you're likely not going to use. If you view the generated code as an external library, unused code is just the cost of "reuse". Short of providing configuration to govern which of the C, R, U, or D methods to generate, I don't see much of a "friendly" alternative. That "configuration" could involve either meta-data on the database side (database-specific properties) or inferred information, for example: if there's a stored procedure named tablename_CreateRow, don't generate the create method for that table... Maybe partial methods in C# 3 might be helpful... If you're generating partial classes, maybe the generator can analyse the non-generated file and detect whether the C, R, U, or D methods already exist (maybe using CodeDOM). Or maybe test for attributes that configure which of the CRUD methods to generate... It's too bad a #define declared in one partial class file doesn't carry over to the others; otherwise you could simply generate code like: #if !NO_CREATE_METHOD public void CreateMethod() { } #endif and add #define NO_CREATE_METHOD in the non-generated CS file so that CreateMethod never gets compiled... Or, what about generating multiple files per class? If you keep each CRUD method in its own file, you can simply exclude that CS file from the build...
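
The "analyse the non-generated file" idea in the comment above can be sketched roughly as follows. This is not CodeDOM and not C#: it uses Python's standard `ast` module as a stand-in, and the class name `AuditLog`, the lowercase method names and the helper functions are assumptions made up for the illustration.

```python
import ast

CRUD_METHODS = ("create", "read", "update", "delete")


def handwritten_methods(source, class_name):
    """Return the names of methods defined on class_name in hand-written source."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return {
                item.name
                for item in node.body
                if isinstance(item, ast.FunctionDef)
            }
    return set()


def methods_to_generate(source, class_name):
    """CRUD methods the generator should still emit for class_name."""
    existing = handwritten_methods(source, class_name)
    return [m for m in CRUD_METHODS if m not in existing]


# Hypothetical hand-written file: the developer has supplied their own create().
HAND_WRITTEN = '''
class AuditLog:
    def create(self, entry):
        ...  # custom, hand-written version
'''

todo = methods_to_generate(HAND_WRITTEN, "AuditLog")
print(todo)
```

Note that this only stops the generator from colliding with hand-written methods; it does not by itself stop it from emitting operations nobody asked for.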

  • Anonymous
    November 17, 2007
    Being stuck in DB land for a few months now, I have an idea... Maybe you could expand your code generator so that it checks for the existence of a table within your db that explicitly outlines which objects in the database to generate code for, and which methods (CRUD) should be created. If this table doesn't exist, create all objects and CRUD methods...? This way you can dynamically create your code based on the data in this "generator template" table. This would give you some more control over the areas that you said cause you frustration. This doesn't even have to be a table; it could be an XML file similar to what netTiers uses, expanded to also contain the methods that should be created. I just realized I am echoing Miha's comments... Sorry Miha, but I agree with you wholeheartedly.
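
A minimal sketch of the "generator template" idea from the comment above, assuming the configuration has already been loaded into a plain dictionary (whether from a database table or an XML file is left open). The function names and the rule that unlisted tables are skipped are assumptions for the example, not a description of netTiers or of the author's generator.

```python
ALL_OPERATIONS = ("create", "read", "update", "delete")


def plan_generation(tables, config=None):
    """Map each table to the CRUD operations the generator should emit.

    config maps table name -> list of operations. With no config at all,
    every table gets the full CRUD suite (the fallback described above).
    With a config, tables that are not listed are skipped entirely.
    """
    if config is None:
        return {table: list(ALL_OPERATIONS) for table in tables}

    plan = {}
    for table in tables:
        if table not in config:
            continue
        unknown = set(config[table]) - set(ALL_OPERATIONS)
        if unknown:
            raise ValueError(f"{table}: unknown operations {sorted(unknown)}")
        # Keep a stable C, R, U, D order regardless of how the config lists them.
        plan[table] = [op for op in ALL_OPERATIONS if op in config[table]]
    return plan


tables = ["Customer", "AuditLog", "Scratch"]
config = {"Customer": ["update", "read"], "AuditLog": ["create"]}

plan = plan_generation(tables, config)
print(plan)
```

Because the plan is saved alongside the schema, regenerating after a schema change would reproduce the same trimmed set of operations instead of the full suite.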

  • Anonymous
    November 17, 2007
    Life (and software development) is full of trade-offs. I typically prefer the approach of creating a framework, and have done so for database access via stored procs. The developers just called the appropriate method (Add, Update, Select, Delete) and provided the name of the proc. The downside/trade-off is that in the case of a Select, the method returned a Recordset (this was ADO-days) instead of a pre-defined, strongly-typed object. The upside was that developers only created the procs they needed, and there were only four methods in the whole system that actually had to interact with the database.
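
The framework approach in the comment above might look something like this sketch. It is only an approximation: SQLite has no stored procedures, so a dictionary of named SQL statements stands in for them, and `ProcGateway`, the procedure names and the table are all hypothetical.

```python
import sqlite3


class ProcGateway:
    """Four generic entry points; callers pass the name of a 'procedure'."""

    def __init__(self, connection, procedures):
        self._conn = connection
        self._procs = procedures  # name -> SQL text (stand-in for stored procs)

    def _run(self, proc_name, params):
        return self._conn.execute(self._procs[proc_name], params)

    def add(self, proc_name, **params):
        return self._run(proc_name, params).lastrowid

    def update(self, proc_name, **params):
        return self._run(proc_name, params).rowcount

    def delete(self, proc_name, **params):
        return self._run(proc_name, params).rowcount

    def select(self, proc_name, **params):
        # Returns untyped rows, mirroring the Recordset trade-off above.
        return self._run(proc_name, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

# Only the procedures the application actually needs are defined.
procs = {
    "Customer_Add": "INSERT INTO customer (name) VALUES (:name)",
    "Customer_Rename": "UPDATE customer SET name = :name WHERE id = :id",
    "Customer_GetById": "SELECT id, name FROM customer WHERE id = :id",
}

db = ProcGateway(conn, procs)
new_id = db.add("Customer_Add", name="Ada")
db.update("Customer_Rename", id=new_id, name="Grace")
rows = db.select("Customer_GetById", id=new_id)
print(rows)
```

There is no delete procedure here because none was needed, which is the upside the comment describes; the cost is that `select` hands back plain tuples rather than a typed entity.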

  • Anonymous
    November 17, 2007
    Thanks for the comments so far everyone. Peter, I get your point that every framework or library will contain unused code, and it's fair to ask why I'm worried about unused generated code. I guess I see a difference between unused code in System.Globalization.HijriCalendar and in MyApplication.DataAcceess.DeleteAuditLog. The former is obviously not designed for my application (so some kind of analysis will be needed to see if it will help), but if you do choose it you can be confident that it's highly tested. The latter looks like it was designed just for my app (so there may be more temptation to call it without thinking carefully), but it may never be appropriate and it may not do what people assume.

  • Anonymous
    November 17, 2007
    This may be too simplistic an idea, but what about saving a copy of your original generated code? Then, when you have to regenerate because of a DB change, diff the original generated code against your current codebase, generate the new code, and apply those changes to the newly generated code. Hope this helps spark an idea that'll work for you. Take care, Jeff
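
The first half of the idea above, capturing the hand edits as a diff, can be sketched with Python's standard `difflib`. The file names and the sample code are made up for the example, and re-applying the diff to the regenerated output is assumed to be left to a tool such as `patch` or a three-way merge; `difflib` itself does not apply patches.

```python
import difflib


def capture_hand_edits(generated, edited, name="customer.py"):
    """Return a unified diff describing what was changed by hand."""
    return "".join(
        difflib.unified_diff(
            generated.splitlines(keepends=True),
            edited.splitlines(keepends=True),
            fromfile=f"generated/{name}",
            tofile=f"current/{name}",
        )
    )


# The pristine generator output, saved at generation time.
generated_v1 = (
    "class Customer:\n"
    "    def save(self):\n"
    "        pass\n"
)

# The same file after a developer's hand tweak.
edited_v1 = (
    "class Customer:\n"
    "    def save(self):\n"
    "        self.validate()\n"
    "        pass\n"
)

patch_text = capture_hand_edits(generated_v1, edited_v1)
print(patch_text)
```

How cleanly such a patch applies will depend on how much the regenerated code moves around the edited lines, so frequent schema changes may still produce conflicts that need resolving by hand.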

  • Anonymous
    November 17, 2007
    Have you had a look at nettiers (www.nettiers.com)? This tool has given us minimal disruption after schema changes, and via partial classes allows us to keep our additions to both entities and data access methods. Highly recommended. It has advanced features like a processing pipeline, extensible entity validation and Deepload / Deepsave for working with fuller object graphs.

  • Anonymous
    November 18, 2007
    Code generators can be useful. And where they are useful, I think that they identify an opportunity for one or a combination of the following:

  • Improve the framework. With good methods you can reduce the amount of generated code needed. You can do this yourself.
  • Improved tools.  Maybe the way that you want to work with the form isn't the way the designers of the form automation tools imagined. This is where the code generator lives.  It would be nice if it were easier to hook a code generator into Visual Studio where you need it.
  • Improved or alternate languages. Language extensions like LINQ reduce the amount of code needed for common tasks. The danger is that over time the language turns into a monster. You could look upon a data file that drives a code generator as a small special-purpose alternate language.

  • Anonymous
    November 18, 2007
    Tom Hollander just posted a note Code Generators: Can't live with them, can't live without them . His

  • Anonymous
    November 18, 2007
    Most of the comments are about the specific case of generating the DB mapping code, but don't address the key question: When is it worth creating a code generator as opposed to either (a) writing the code by hand or (b) trying to bake the logic (in this case db mapping) into a framework? Please see my blog entry that tries to address this very question: http://blogs.msdn.com/wojtek/archive/2007/11/18/code-generators-when-can-you-live-with-them.aspx

  • Anonymous
    November 18, 2007
    I haven't found too many scenarios like you describe where the code generation can't be replaced by a general purpose framework. Sorry for the typo.

  • Anonymous
    November 26, 2007
    "Don't generate code, generate the config files and have generic processes to manage record maintenance and other common requirements"... Well, I quite disagree with this. This is, for example, the pure meta-data based dynamic approach. To me, it should be avoided whenever you can (except if everything is really dynamic, which means you do not know your entities at design time). In our experience, customers who have tried this kind of approach, or frameworks like N-Hibernate, have ended up with nightmares. Performance is only a small part of the issue. The real issue is operations and debugging. Personally, I hate messages like "Data Layer Error, Save Method, Entity #232"... whereas when you use strongly typed classes aligned to your business vocabulary, it is much easier to maintain: "Error on Customer.Save, Id 31". Furthermore, I think we should learn from the past. Code generators have always been at the heart of the CASE tools that were quite popular on mainframes. So to me, they are definitely part of the solution for industrializing software. But this does not mean you need to build one yourself. As Wojtek posted, it can be very costly, as complexity grows fast. Instead, one can rely on products on the market that have fully tested their generated code. My 0.02 Cents, Daniel

  • Anonymous
    November 28, 2007
    I second the comments by Daniel above.  Go the (N)Hibernate (config driven runtime framework) approach and you won't scale to large domains (if you want to go there).  If you find yourself wanting to customize generated code then imagine how much fun you'll have trying to customize (N)Hibernate. Codegen is only as good as its input (including 'templates'), i.e. GIGO. If your only input (besides templates) is the data model then you will soon run into limitations (e.g. your unwanted output code). Consider:

  1. DataModel+Templates -> CodeGenCode vs.
  2. (MetaData+DataModel)+Templates -> CodeGenCode

    This MetaData is not 'configuration' for the templates but something higher level than (but deferring detail to) the DataModel. E.g. "Customer contains fields X,Y,Z where X is typed as per field X1542 in table Cust101". The code generator gets detail from the data model but structure from the meta data. The metadata becomes a knowledgebase unto itself and can grow to serve any number of needs.

    Having said all this, codegen is like a cookie: messy to make but yummy in the end, even when only half baked. Cheers, -Matthew Hobbs
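
The second pipeline in the comment above, (MetaData+DataModel)+Templates, could be sketched like this. `Cust101` and `X1542` come from the comment's example; every other table column, field name, the type mapping and the tiny inline "template" are assumptions made for illustration.

```python
# Data model: table -> column -> SQL type (normally read from the database schema).
DATA_MODEL = {
    "Cust101": {"X1542": "int", "N0007": "varchar", "LEGACY9": "varchar"},
}

# Metadata: entity -> field -> (table, column). Structure only, no types.
META_DATA = {
    "Customer": {"id": ("Cust101", "X1542"), "name": ("Cust101", "N0007")},
}

SQL_TO_PY = {"int": "int", "varchar": "str"}


def generate_entity(entity, meta_data, data_model):
    """Emit source for one entity class: shape from metadata, types from the model."""
    lines = [
        "from dataclasses import dataclass",
        "",
        "@dataclass",
        f"class {entity}:",
    ]
    for field, (table, column) in meta_data[entity].items():
        py_type = SQL_TO_PY[data_model[table][column]]
        lines.append(f"    {field}: {py_type}")
    return "\n".join(lines) + "\n"


source = generate_entity("Customer", META_DATA, DATA_MODEL)
print(source)
```

The column `LEGACY9` exists in the data model but never reaches the generated class, because the metadata does not mention it: unwanted output is avoided by what the metadata leaves out rather than by extra switches on the templates.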