Condividi tramite


Source for a C# compiler written in pure C#.

For anybody looking for the full source to a bootstrapping C# compiler, today’s your lucky day.

 

A while ago (back in 2001 before we shipped v1.0), I wrote a C# compiler called “Blue”. I know it’s 3.5 years after I wrote it, but I figured releasing it now was better late then never. Some fast facts:

- It’s written 100% in C#.

- It uses Reflection to import all references and Reflection.Emit to emit the IL.

- Everything (particularly the parser and lexer) were written by hand.

- It has the standard compiler pipeline as described in the dragon book.

- It produces verifiable IL (you can run PEVerify on the output and it passes).

 

Here’s the source: https://mikewinisp.members.winisp.net/blog/blue/blue.zip

There’s a ReadMe.html in there with more information. Unzip it do a directory , and then in a v1.0 or v1.1 command shell, run build_all.bat.

 

 

It’s a giant reflection-emit test.

Blue demonstrates a whole bunch of different parts of Reflection.Emit such as:

- Baking types.

- Creating nested types.

- Creating delegates - you’ll notice there’s no DelegateBuilder.

- Creating Enums – I recall having to dance around some issues with EnumBuilder.

- Emitting debugging information for all the code constructs (sequence points, local names, parameter names, etc) - I gave a brief example of how Ref.Emit can generate debuggable code, here. Blue is an extreme example here because it includes most C# language features and generates fully debuggable + verifiable code.

 

What’s missing?

I view Blue primarily as a sample of Reflection.Emit, and not actually as a viable production compiler. Although it’s complete enough to compile itself, it’s missing a bunch of features like:

- Custom attributes

- unsafe code

- checked/unchecked, locked keyword, using keyword

- floating point, decimal types.

- Multi dimensional arrays. (note that it can handle jagged arrays)

- Some operator overloading. (Allows binary operators like + to

- I didn’t quite get function overload resolution entirely correct. I didn’t follow the spec close enough.

be overloaded, doesn't yet handle overloaded type casts, unary ops, or ++/--.)

- Error handling: It still may assert when compiling an illegal program. (I would use csc.exe to verify inputs before sending them through blue)

I notice it also doesn’t work on whidbey because it gets confused when importing generics from mscorlib. There also may be some issues with v2.0 Ref.Emit that we’re still looking at. So you need to do this demo with V1.1. I may explore trying to update the sample to at least run on v2.0.

 

But at least it dogfoods itself…

Despite the limitations, it’s still complete enough to compile itself (eg, “dogfood”, “bootstrap”). One confession: when I was trying to have it dogfood itself, there were a few places were features that I had naturally used when writing blue weren’t yet supported in blue. In order to get it dogfood, I made local changes to avoid these features. I marked these with “@dogfood” in the code.

 

The blue download includes a batch file, build_all.bat, which will first build blue.exe from csc, and then use blue to compile itself to produce dogfood.exe:

@echo ***** Building the Blue compiler

@rem Build the compiler first using C#. This will produce a main.exe

csc /out:blue.exe @blue_source_core

@echo ***** C# compiled the blue sources to produce Blue.exe

@echo *** Show that the newly compiled blue.exe (which came from csc.exe) is Verifiable IL

peverify blue.exe

@rem Now have the compiler build itself and produce a Dogfood.exe

blue.exe /out:Dogfood.exe @blue_source_core

@echo ***** Blue.exe compiled itself to produce Dogfood.exe

@echo *** Show that blue.exe produced verifiable code

peverify dogfood.exe

 

You’ll notice the peverify stages pass.

 

You can verify that blue.exe actually compiles itself correctly by cycling the compilation 1 more step:

dogfood.exe /out:Dogfood2.exe @blue_source_core

 

Now Dogfood and dogfood2 were produced by the same algorithm (Blue) which was just compiled 2 different ways. So we expect them to be the same output. We can ildasm both dogfood.exe and dogfood2.exe and verify that (the only difference is assembly name and mvid).

 

Now what?

This is just sample code I’m throwing out there. It was just a pet project of mine a few years ago, so it’s not a supported product. And I’m not very proud of the code quality either (it was one of the first things I wrote in C#). And it’s definitely not a production quality tool either. All that aside, feel free to play around with it or try out various ideas. For example, you could:

- Play around with codegen. Perhaps go transform all the calls to be late bound and then compare perf.

- Add “inline IL”, similar to inline assembly in C/C++.

- add any of the missing v1.1 features.

- cannabalize it to get the ref-emit code.

- add any of the new C# v2.0 features

If anybody does end up doing something interesting with it, let me know!

Comments

  • Anonymous
    February 06, 2005
    Awesome!

    I just have one question. Are you releasing it into the public domain, or are you releasing it under some certain license. I would suggest public domain, BSD or MT licenses.

  • Anonymous
    February 06, 2005
    I appreciate this.

  • Anonymous
    February 06, 2005
    This is in the public domain. Go wild. If you do anything interesting with it, let me know.

  • Anonymous
    February 06, 2005
    Nice job Mike, thanks!

  • Anonymous
    February 06, 2005
    Source for a C# compiler written in pure C#.

  • Anonymous
    February 07, 2005
    I know you aren't citing this as the canonical example of how to write a compiler, but I have to make one suggestion (maybe others will be interested):

    In factoring the expression grammar to permit recursive descent parsing, you've broken up the productions into halves, but are left with the problem that it creates trees the wrong way around, so in order to get around that you pass the left part to the recursive routine (the other half of the broken up production) (I hope I didn't get that too wrong, I've only looked at the code for 5 minutes).

    There is an easier way: that is, don't use recursion for same-precedence-level expressions, and instead use iteration.

    Thus, for a simple grammar that looks like this:

    E ::= T (('+'|'-') T)
    T ::= F (('
    '|'/') F)
    F ::= <ident> | <const> // etc...

    One can code E, T, F, like the following (in pseudocode, indentation indicates blocks, with Pascal-style set expressions, case statements implicitly end in break; I don't know how the formatting is going to end up with this):

    function E(skip: bool) : Exp
    Exp result = T(skip)
    while true
    switch (PeekToken())
    case '+': result = new Add(result, T(true));
    case '-': result = new Sub(result, T(true));
    default: break loop;
    return result;

    function T(skip: bool) : Exp
    Exp result = F(skip)
    while true
    switch (PeekToken())
    case '
    ': result = new Mul(result, F(true));
    case '/': result = new Div(result, F(true));
    default: break loop;
    return result;

    function F(skip: bool) : Exp
    if (skip)
    NextToken();
    switch (PeekToken())
    <const>: return new Const(<constValue>);
    <ident>: return new VarOrFuncEtc(<ident>);
    default: throw new Error("Expected a factor");

  • Anonymous
    February 07, 2005
    Barry - thanks for the suggestion - that's a good point. It's like inlining tail recursion.

    I think you're right about the recursion. On a historical note - I was actually creating the trees backwards for a while. I didn't notice it until I tried to dogfood it, and then applied the quick fix you observed.

    It's also very true that this is definitely not a model compiler example. That's actually why I decided to just release it as a private sample and not put it on MSDN. We believed that if we put it on MSDN, that people we interpret that as Microsoft's official blessing that this was the way to write compilers; when in reality, this is just a glorified reflection-emit demo.

  • Anonymous
    February 07, 2005
    link

  • Anonymous
    February 07, 2005
    http://blog.csharpbox.com/default.aspx#ae043b654-9502-4077-9925-95d1c71f1b01

  • Anonymous
    February 07, 2005
    TrackBack From:http://www.cnblogs.com/atempcode/archive/2005/02/08/103274.html

  • Anonymous
    February 07, 2005
    TrackBack From:http://www.cnblogs.com/atempcode/archive/2005/02/08/103275.html

  • Anonymous
    February 09, 2005
    This is sweet! I've been trying to look for an example besides the ones that come with the fx SDK, this is the best one! Thanks for posting this Mike!

  • Anonymous
    February 15, 2005
    Any interest in making a sourceforge project out of this and getting some collaboration going to put some polish on it?

  • Anonymous
    February 15, 2005
    Dan - that's a great idea and several people have given me that suggestion. I'd like to do it and am following up with it. If I can get it, I'll certainly post back here to announce it.

  • Anonymous
    February 27, 2005
    Can you provide us some sample applications developed using BLUE.

    It will be really helpful if we get some script files of all the features so that we can do the R&D on the compiler.

  • Anonymous
    February 28, 2005
    Sandeep - there's not really any sample applications since Blue's not a production quality compiler. Blue dogfoods itself, but certainly wasn't developed with itself.

    However, I do have about 100 unit tests for Blue. I'll try to get those up.

  • Anonymous
    May 08, 2006
    David&amp;nbsp; Srbecky asked:

    Can a EnC capable compiler work on top of System.Refletion.Emit? (ie. If...

  • Anonymous
    August 02, 2006
    PingBack from http://xiu.shoeke.com/archives/2006/07/29/links-for-2006-07-29/

  • Anonymous
    August 08, 2007
    When people start using C# generics for the first time, they are sometimes surprised that they can’t

  • Anonymous
    January 14, 2008
    Azul: Teaching .NET Some Espa

  • Anonymous
    October 08, 2008
    Source for a C# compiler written in pure C#...

  • Anonymous
    October 31, 2008
    PingBack from http://megos.wordpress.com/2008/11/01/compiler-bootstrapping/

  • Anonymous
    January 20, 2009
    PingBack from http://www.hilpers-esp.com/441514-hacer-un-compilador-con-c