Roslyn Code Quoter tool – generating syntax API calls to generate any C# program

Whether you’ve played with the Roslyn CTP or are planning to do so in the future, chances are that sooner or later you’ll run into a need to generate code.

One approach is to concatenate the source text with a StringBuilder and then call one of the parse methods (SyntaxTree.ParseCompilationUnit, Syntax.ParseExpression, Syntax.ParseStatement, etc.) on it to get a syntax node or the whole syntax tree:

 var root = Syntax.ParseCompilationUnit("class C{}");

Another way is to manually call the fluent API on the Syntax class (factory methods to create nodes, tokens and trivia). For those who don’t know, nodes are non-terminals, tokens are terminals, and trivia are whitespace, comments, preprocessor directives, etc. To create the syntax tree for this program: “class C{}”, you’d need to make a series of syntax API calls similar to this:

var root = Syntax.CompilationUnit()
.WithMembers(
    Syntax.List<MemberDeclarationSyntax>(
        Syntax.ClassDeclaration(
            Syntax.Identifier(
                @"C"))
        .WithKeyword(
            Syntax.Token(
                SyntaxKind.ClassKeyword,
                Syntax.TriviaList(
                    Syntax.Space)))
        .WithOpenBraceToken(
            Syntax.Token(
                SyntaxKind.OpenBraceToken))
        .WithCloseBraceToken(
            Syntax.Token(
                SyntaxKind.CloseBraceToken))))
.WithEndOfFileToken(
    Syntax.Token(
        SyntaxKind.EndOfFileToken));

The first thing you notice is that the second approach is slightly more verbose (I’ve calculated that it explodes ~50-100 times in code length on an average program). So using this to serialize syntax trees is (though absolutely possible) quite impractical (might as well use Charles Petzold’s CSAML). It just so happens that the C# syntax is the most terse human-readable format to serialize C# syntax trees.

However the second approach still has its advantages:

  1. You don’t always need to construct the whole tree from scratch. More likely you already have most of the nodes you need and only have to recombine them in a certain configuration using a few glue syntax nodes. In this case the factory methods are really handy.
  2. With a StringBuilder it quickly gets hairy when you generate code based on some non-trivial logic (e.g. open curly, repeat this part N times, after each time (but not after the last one) insert a comma, close curly, indent this much, etc). You need to worry about creating a syntactically valid tree, keep track of closing braces, brackets and parentheses, indentation, etc.
  3. It’s faster than parsing text – the parser will eventually call the same APIs to construct the tree, but it first needs to parse and that takes time and memory. By specifying the structure of the tree in our API calls, we eliminate the need for the parser.
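
To make the second point concrete, here is a minimal sketch of the bookkeeping that text-based generation needs even for a trivial declaration. EnumTextGenerator and its Generate method are hypothetical names invented for this illustration; they are not part of Roslyn or of the Quoter tool.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, for illustration only: builds an enum declaration as text.
public static class EnumTextGenerator
{
    public static string Generate(string name, IList<string> members)
    {
        var sb = new StringBuilder();
        sb.Append("enum ").Append(name).Append(" { ");
        for (int i = 0; i < members.Count; i++)
        {
            sb.Append(members[i]);
            // Separator bookkeeping: a comma after every member except the last.
            if (i < members.Count - 1)
            {
                sb.Append(", ");
            }
        }
        // Forgetting this closing brace would still produce a string,
        // just not one that parses into the tree we wanted.
        sb.Append(" }");
        return sb.ToString();
    }
}
```

Nothing here checks that the result is syntactically valid; that only becomes apparent once the text is handed to the parser.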

So sometimes there’s a need to manually write calls to those Syntax.* APIs to construct syntax nodes. And it can get tedious. Also, sometimes it’s not obvious which APIs need to be called to construct syntax nodes of the desired shape.

So I decided to write a sample demo tool that would automate generating syntax API calls to construct any given program. The tool is called Quoter, because what it does is basically quasi-quotation: given a source program, generate a program that, when run, will generate the source program. Among programming languages that natively support quasi-quotation are F#, Nemerle, and, among others, … C#! Surprised? Yes, C# has at least two features where the compiler is quoting your code – generating code that describes how to generate your code at runtime. The first one is, of course, expression trees, where you just write an expression like you would otherwise, and the compiler emits calls to Expression.* factory methods for you. The second one is more subtle: when you have calls on a variable or expression typed as dynamic, the compiler bakes in the information about the calls you made into a call site, which is basically a description of the syntax tree that you had in the source program, that is interpreted (and cached!) at runtime. I guess you could say that expression trees do “early bound quoting” (type resolution occurs at compile time), whereas dynamic does “late bound quoting” (it produces unbound trees and binding occurs at runtime).
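
As a small illustration of the expression tree case, the sketch below puts a lambda that the compiler quotes next to roughly the factory calls it stands for. ExpressionQuotingDemo is a name made up for this example, and the hand-built version is an approximation of the compiler's output, not a copy of it.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionQuotingDemo
{
    // Because the target type is Expression<...>, the compiler quotes the lambda:
    // it emits calls to Expression.* factory methods instead of a delegate body.
    public static Expression<Func<int, int>> Quoted()
    {
        Expression<Func<int, int>> e = x => x + 1;
        return e;
    }

    // Roughly the same tree, constructed by hand with the factory methods.
    public static Expression<Func<int, int>> HandBuilt()
    {
        ParameterExpression x = Expression.Parameter(typeof(int), "x");
        return Expression.Lambda<Func<int, int>>(
            Expression.Add(x, Expression.Constant(1)),
            x);
    }
}
```

Both trees can be turned into delegates with Compile() and invoked, which is the runtime half of the quotation.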

How would one write a quoting tool? A good approach would be to use a parser to obtain the syntax tree, and then apply a syntax visitor to that tree – the classic Visitor design pattern where you declare a VisitXXX method for each type of syntax node, token and trivia. You get a class declaration syntax node and you emit "Syntax.ClassDeclaration(" + ... + ")". Simple, but tedious. We have lots and lots of kinds of nodes!

Fortunately for me, a lot of the Roslyn classes that model nodes, tokens and trivia, as well as most factory methods, are themselves generated (we didn’t write all of them by hand). Because of this they all look uniform and adhere to simple, predictable rules. In fact, to generate a call to Syntax.ClassDeclaration() for a node of type ClassDeclarationSyntax, you could use Reflection to inspect all public static methods on the type Roslyn.Compilers.CSharp.Syntax, and select the one that has ClassDeclarationSyntax as its ReturnType.
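
The Reflection lookup can be sketched as follows. To keep the example self-contained it runs against stand-in types (FakeSyntax, FakeClassDeclarationSyntax, FakeIdentifierNameSyntax) rather than the real Roslyn assemblies, and FactoryLookup is a hypothetical helper, not the tool's actual PickFactoryMethodToCreateNode. Preferring the overload with the fewest parameters is an assumption made for this sketch.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Stand-in types for illustration; the real tool would reflect over
// Roslyn.Compilers.CSharp.Syntax instead.
public class FakeClassDeclarationSyntax { }
public class FakeIdentifierNameSyntax { }

public static class FakeSyntax
{
    public static FakeClassDeclarationSyntax ClassDeclaration(string identifier)
    {
        return new FakeClassDeclarationSyntax();
    }

    public static FakeIdentifierNameSyntax IdentifierName(string name)
    {
        return new FakeIdentifierNameSyntax();
    }
}

public static class FactoryLookup
{
    // Finds a public static method on factoryType whose return type is nodeType.
    // If several overloads match, this sketch takes the one with the fewest
    // parameters. Returns null when there is no candidate.
    public static MethodInfo FindFactoryMethod(Type factoryType, Type nodeType)
    {
        return factoryType
            .GetMethods(BindingFlags.Public | BindingFlags.Static)
            .Where(m => m.ReturnType == nodeType)
            .OrderBy(m => m.GetParameters().Length)
            .FirstOrDefault();
    }
}
```

The quoted call name is then just the declaring type's name, a dot, and the method's name, which is what the QuoteNode method in this post concatenates.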

Moreover, for the properties of nodes that aren’t initialized from the factory method, you could enumerate them, put them in a bag, and then for each property pick out a With*** modification method that accepts the type of the property as parameter. Using this simple approach, the main recursive method of the tool can look surprisingly simple:

    /// <summary>
    /// The main recursive method that given a SyntaxNode recursively quotes the entire subtree.
    /// </summary>
    private ApiCall QuoteNode(SyntaxNode node, string name)
    {
        List<ApiCall> quotedPropertyValues = QuotePropertyValues(node);
        MethodInfo factoryMethod = PickFactoryMethodToCreateNode(node);
 
        var factoryMethodCall = new MethodCall()
        {
            Name = factoryMethod.DeclaringType.Name + "." + factoryMethod.Name
        };
 
        var codeBlock = new ApiCall(name, factoryMethodCall);
 
        AddFactoryMethodArguments(factoryMethod, factoryMethodCall, quotedPropertyValues);
        AddModifyingCalls(node, codeBlock, quotedPropertyValues);
 
        return codeBlock;
    }

You will immediately notice that I myself am using simple string concatenation, basically the StringBuilder approach, to generate the code that will generate the code. Using the Roslyn syntax APIs here to generate calls to the Roslyn syntax APIs that will generate the target source code is left as an exercise to the reader. I almost wish that the tool would quote its own source, find a fixed point and eventually converge to write itself; however, for obvious reasons this is not going to happen (quoting is not a contraction mapping on the space of C# programs).

For now, visiting the tree creates a simple data structure I defined (ApiCall), which is basically a tree of strings. This is the simplest representation I could find for method calls of the form Syntax.A(b, c).WithD(e).WithF(g). If we later want to actually use the generated code to construct the syntax tree, we could copy-paste the generated program into our own project. To verify that my Quoter tool does the right thing, I wrote a simple Evaluator based on Roslyn scripting that executes the generated code, produces a syntax tree, gets its text and compares it to the original source:

            var sourceText = "class C{}";
            var generatedCode = new Quoter()
            {
                OpenParenthesisOnNewLine = false,
                ClosingParenthesisOnNewLine = false
            }.Quote(sourceText);
 
            var evaluator = new Evaluator();
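            // A direct cast fails loudly if the script did not return a compilation unit,
            // instead of surfacing later as a NullReferenceException.
            var generatedNode = (CompilationUnitSyntax)evaluator.Evaluate(generatedCode);
            var resultText = generatedNode.GetFullText();
            if (sourceText != resultText)
            {
                throw new Exception("Round-trip failed: the generated code does not reproduce the original source.");
            }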
 
            Console.WriteLine(generatedCode);
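
For readers wondering what a “tree of strings” for calls of the form Syntax.A(b, c).WithD(e).WithF(g) might look like, here is a minimal sketch. CallNode is a hypothetical type written for this illustration; it is not the ApiCall class from Quoter.cs, whose actual shape may differ.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical "tree of strings": each node is either a literal leaf or a
// method call with arguments, optionally followed by chained calls.
public class CallNode
{
    private readonly string name;
    private readonly bool isCall;
    private readonly List<CallNode> arguments;
    private readonly List<CallNode> chained = new List<CallNode>();

    private CallNode(string name, bool isCall, IEnumerable<CallNode> arguments)
    {
        this.name = name;
        this.isCall = isCall;
        this.arguments = arguments.ToList();
    }

    // A leaf is emitted verbatim, e.g. an identifier or a SyntaxKind value.
    public static CallNode Leaf(string text)
    {
        return new CallNode(text, false, Enumerable.Empty<CallNode>());
    }

    // A call is emitted as name(arg1, arg2, ...).
    public static CallNode Call(string name, params CallNode[] arguments)
    {
        return new CallNode(name, true, arguments);
    }

    // Appends a chained call such as .WithD(e) and returns this node for fluency.
    public CallNode Then(CallNode call)
    {
        chained.Add(call);
        return this;
    }

    public override string ToString()
    {
        if (!isCall)
        {
            return name;
        }

        string text = name + "(" + string.Join(", ", arguments.Select(a => a.ToString())) + ")";
        foreach (CallNode call in chained)
        {
            text += "." + call;
        }
        return text;
    }
}
```

Building CallNode.Call("Syntax.A", CallNode.Leaf("b"), CallNode.Leaf("c")) and chaining WithD and WithF calls onto it with Then prints the call chain on a single line; line breaks and indentation, such as the parenthesis placement options mentioned in this post, would be layered on top of a structure like this.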

And here’s the source code of the Evaluator:

using Roslyn.Compilers.Common;
using Roslyn.Compilers.CSharp;
using Roslyn.Scripting;
using Roslyn.Scripting.CSharp;
 
public class Evaluator
{
    private ScriptEngine engine;
    private Session session;
 
    public Evaluator()
    {
        engine = new ScriptEngine(
            importedNamespaces: new[] { "Roslyn.Compilers", "Roslyn.Compilers.CSharp" });
        session = Session.Create();
        session.AddReference(typeof(CommonSyntaxNode).Assembly);
        session.AddReference(typeof(SyntaxNode).Assembly);
    }
 
    public object Evaluate(string code)
    {
        var result = engine.Execute(code, session);
        return result;
    }
}

I tested it on round-tripping this little program and it seems to me that we’ve got all the C# 4.0 syntax covered.

A couple of tips. You can adjust OpenParenthesisOnNewLine and ClosingParenthesisOnNewLine boolean properties to configure how the generated code is formatted. I personally prefer this more verbose format because the nesting of the blocks is clearly visible:

Syntax.CompilationUnit()
.WithMembers
(
    Syntax.List<MemberDeclarationSyntax>
    (
        Syntax.ClassDeclaration
        (
            Syntax.Identifier
            (
                @"C"
            )
        )
        .WithKeyword
        (
            Syntax.Token
            (
                SyntaxKind.ClassKeyword,
                Syntax.TriviaList
                (
                    Syntax.Space
                )
            )
        )
        .WithOpenBraceToken
        (
            Syntax.Token
            (
                SyntaxKind.OpenBraceToken
            )
        )
        .WithCloseBraceToken
        (
            Syntax.Token
            (
                SyntaxKind.CloseBraceToken
            )
        )
    )
)
.WithEndOfFileToken
(
    Syntax.Token
    (
        SyntaxKind.EndOfFileToken
    )
)

Another tip: you generally don’t need to generate whitespace trivia yourself. Just generate a tree without any whitespace trivia and then call the SyntaxNode.NormalizeWhitespace() method, which automatically inserts whitespace using the common C# formatting rules. NormalizeWhitespace() is quite a simple formatter. To use the full-blown, feature-rich formatter that Roslyn Services and Visual Studio use to format code, you’ll additionally need to reference Roslyn.Services.dll and Roslyn.Services.CSharp.dll and then call the Format() extension method on the node.

The full source (Quoter.cs, 833 lines) is published over at https://code.msdn.microsoft.com/Roslyn-Code-Quoter-f724259e. I’ll be happy to answer any questions.

Comments

  • Anonymous
    November 02, 2012
    Great stuff. I hope it will be updated when necessary to keep it in sync with roslyn progress.

  • Anonymous
    February 17, 2013
    The comment has been removed

  • Anonymous
    February 17, 2013
    One way to do it would be to change the Quoter source code to also accept rewriting rules or transformations. And then you could add a rule such as "Replace all identifier nodes named "SDFIJWE" with this code", and after the Quoter parses the source code into a tree, apply a visitor to the tree that would rewrite it based on the rules. You can have custom rules as SyntaxRewriter and pass them to Quoter to transform the tree after parsing it.

  • Anonymous
    September 28, 2014
    I tried to change the code to get this working with latest Roslyn/Microsoft.CodeAnalysis, but was not successful.  Is it possible to share the same program that works with latest release. Thanks, Sreenath.

    • Anonymous
      October 20, 2016
      i too tried to get this working with the latest microsoft.CodeAnalysis but came unstuck at the point that the CSharp.Syntax class no longer exists! - is there an alternative class that i could use to reflect from in a similar way? have you got a more recent version?