Custom Rule Projections

Article
09/22/2010

[This content is no longer valid. For the latest information on "M", "Quadrant", SQL Server Modeling Services, and the Repository, see the Model Citizen blog.]

When your language's parser succeeds in parsing an input stream, it builds an abstract syntax tree (AST), which is a hierarchical representation that shows each syntax or token rule used to parse the input. The parser generates an output stream, based on the rules that occur in the AST. You can customize this output by specifying projections for syntax and token rules. A projection describes how to transform the part of the input stream that matched a rule into an output.

A Simple Example

The following example shows a projection. The => operator in the Main syntax rule marks the start of the projection definition.

module test
{
    language something
    {
        syntax Main = a:A => Inputs{a};
        token A     = 'X'#2..4;
    }
}

This language successfully parses any string that consists of 2 to 4 "X" literals. When the string "XX" is input, the following output is generated.

Inputs{
  "XX"
}

If you remove the rule projection (the string "=> Inputs{a}" in the Main rule), the following output is produced by the default projections.

Main[
  "XX"
]
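The difference between the two outputs can be modeled outside of "M". The following is a minimal Python sketch, not "M" code: the function name parse_main and its projected flag are hypothetical, and the regular expression only approximates the token A = 'X'#2..4 rule.

```python
import re


def parse_main(text, projected=True):
    """Model the sample grammar: token A = 'X'#2..4, syntax Main = a:A."""
    # 'X'#2..4 accepts two to four consecutive "X" characters.
    if re.fullmatch(r"X{2,4}", text) is None:
        raise ValueError(f"input {text!r} does not match token A")
    if projected:
        # Custom projection (=> Inputs{a}): a collection labeled Inputs.
        return f'Inputs{{\n  "{text}"\n}}'
    # Default projection: a list labeled with the rule name.
    return f'Main[\n  "{text}"\n]'


print(parse_main("XX"))
print(parse_main("XX", projected=False))
```

The sketch only mirrors the two output shapes shown above. It says nothing about how the "M" parser builds them internally.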

Projection Syntax

A syntax or token rule consists of one or more productions. Each production specifies one or more terms that the input stream is compared to. As the parser reads the input stream, it tries to match parts of the stream against any of the productions in any of the grammar rules.

A production may specify a projection, which describes how to create the output from the part of the input stream that matched that production's pattern declaration. If no projection is specified, a default projection creates the output (see The Default Rule Projections).

A projection specifies how to transform the input data to arrive at the desired output. Typically you derive the output from the terms in the syntax or token rule production that recognized part of the input stream. You bind a variable to each term in the production that you want to reference in the projection. The projection then consists of one or more of these variables, often arranged in a list or collection. The special functions valuesof, labelof, and id can be applied to these variables. The complete output derived from the input can be deeply nested, because of the hierarchical nature of language syntax.

If present, a projection immediately follows the pattern declaration in a syntax rule production, and consists of the => operator, followed by a Term Projection, which is the body of the projection. The Term Projection consists of one or more terms and each term can be one of the following:

- An atom, which can be one of the following:
  - A literal.
  - A reference to a variable that has been bound to one of the terms in the production pattern declaration.
  - An operation on a reference.
- A list with an optional label.
- A collection with an optional label.
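The shape of a Term Projection described above (atoms nested inside optionally labeled lists and collections) can be pictured as a small tree type. The following Python sketch is an illustration only: the Node class and the render function are hypothetical names, atoms are simplified to plain strings, and the text layout is modeled on the output samples in this topic.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Node:
    """A list ([]) or a collection ({}) with an optional label."""
    kind: str                      # "list" or "collection"
    label: Optional[str] = None    # None models an unlabeled node
    items: List[Union["Node", str]] = field(default_factory=list)


def render(term, indent=0):
    """Format a term the way the output samples in this topic are laid out."""
    pad = "  " * indent
    if isinstance(term, str):      # an atom, simplified here to a literal
        return f'{pad}"{term}"'
    open_, close = ("[", "]") if term.kind == "list" else ("{", "}")
    body = ",\n".join(render(item, indent + 1) for item in term.items)
    return f"{pad}{term.label or ''}{open_}\n{body}\n{pad}{close}"


print(render(Node("collection", "Inputs", ["XX"])))
print(render(Node("list", "Main", [Node("list", None, ["XX", "XXX"])])))
```

The first call reproduces the Inputs{ "XX" } output from the simple example. The second shows a labeled list that contains an unlabeled list.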

Customizing Your Output

One way to define a projection is to use the “Intellipad” tool to examine the output created by the default projection, and then customize that output to your requirements. There are several ways to customize the output of the default projection:

Naming and renaming nodes.
Consolidating node content with the valuesof() function.
Removing and adding nodes.
Changing the graph structure from a list to a collection.

For a description of the default projection output format, see The Default Rule Projections.

To illustrate these techniques, this topic uses input data that lists the types in a .NET assembly, together with their names and access properties. A DSL transforms the input into a format that can be loaded into the repository. The following sample data is the input to the custom language.

TYPE Name=System.String Access=public
TYPE Name=System.Integer32 Access=private

The complete code for this application appears at the end of this topic. Each of the following sections examines part of that code to illustrate one way of customizing the default projection's output.

Renaming Nodes

Frequently you may want to rename a node to improve legibility or to conform to the standards of an application that accesses the output. A very common requirement is to change the name of the default Main node to something more meaningful.

The sample starts with the following code.

        syntax Main     = Types;
        syntax Types    = Type+;
        syntax Type     = TypeLit Name Access;

The default projection produces the following output fragment.

Main[
  Types[
    [
      Type[
        "TYPE",
        Name[
//  More here...

The following code change shows how to rename the Main node to AssemblyTypes.

        syntax Main = at:Types => AssemblyTypes[at];

The variable at is bound to the Types rule reference. To the right of the => operator, the new name AssemblyTypes labels the list operator [], which encloses the bound variable at. The result is the following output fragment.

AssemblyTypes[
  Types[
    [
      Type[
        "TYPE",
        Name[
//  More here...

Consolidating Lists

If you use any of the quantifier operators, the default projection wraps the quantified values in a new, unnamed list. You may want to suppress this extra list. In the preceding code, the + operator on the Type reference in the Types rule causes the extra list to appear. Note the two opening brackets that follow the Types label in the preceding section's output fragment.

The following code fragment shows how to remove the extra level of brackets. The valuesof() function removes the extra list.

        syntax Main    = at:Types => AssemblyTypes[at];
        syntax Types   = ts:Type+ => Types[valuesof(ts)];
        syntax Type    = TypeLit Name Access ;

This code produces the following output fragment:

AssemblyTypes[
  Types[
    Type[   // one or more Type nodes, comma separated
      ...   // contents of the Type node go here
    ]
  ]
]
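The effect of valuesof() in the Types rule can be modeled as splicing the children of the unnamed list into the enclosing node. The following Python sketch rests on that assumption. It represents a node as a hypothetical (label, children) tuple and is not the "M" implementation.

```python
def valuesof(node):
    """Return only the children of a node, discarding the node itself."""
    label, children = node
    return children


# Default projection of Type+: an unnamed list that holds each Type node.
ts = (None, [("Type", ["..."]), ("Type", ["..."])])

# Types[ts]: the unnamed list survives as an extra nesting level.
nested = ("Types", [ts])

# Types[valuesof(ts)]: the Type nodes sit directly under Types.
flattened = ("Types", list(valuesof(ts)))

print(nested)
print(flattened)
```

In the nested form, Types has a single child, the unnamed list. In the flattened form, Types has one child per Type node.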

Changing Graph Structure to a Collection

The default projection creates a list structure, as indicated by the bracket operators []. Often the output structure is required to be a collection to be compatible with the relational data model, where the order of the input does not matter. To make this happen, do the following:

For any syntax rule production that has no projection, create one. The projection can specify a collection made up of variables bound to the pattern terms from the production and the collection can be labeled with the name of the syntax rule.
Change all instances of the list operator [] to the collection operator {}.

If this procedure is applied to the code sample, the following fragment is generated.

        syntax Main     = at:Types => AssemblyTypes{at};
        syntax Types    = ts:Type+ => Types{valuesof(ts)};
        syntax Type     = tl:TypeLit n:Name a:Access => Type{tl, n, a};

This code fragment generates the following output fragment.

AssemblyTypes{
  Types{
    Type{

Removing Nodes

Often a language contains rules that require the input to contain character strings or keywords for structural or validation reasons, but the keyword is not really required in the output. For example, the following code fragment generates, for each type, a type literal, and name and access nodes.

        syntax Main     = at:Types => AssemblyTypes{at};
        syntax Types    = ts:Type+ => Types{valuesof(ts)};
        syntax Type     = tl:TypeLit n:Name a:Access => Type{tl, n, a};
        token TypeLit   = "TYPE";
        syntax Name     = nl:NameLit nv:NameValue => Name{nl, nv};
        token NameLit   = "Name=";
        syntax NameValue = cc:chs => NameValue{cc};       
        syntax Access   = al:AccessLit av:AccessValue => Access {al, av};
        token AccessLit = "Access=";
        token AccessValue = 
                            "public" 
                          | "private" 
                          | "internal" 
                          | "protected";

It produces the following output fragment.

AssemblyTypes{
  Types{
    Type{
      "TYPE",
      Name{
        "Name=",
        NameValue{
          "System.String"
        }
      },
      Access{
        "Access=",
        "public"
      }
    },
    Type{
      "TYPE",
//  etc.

The string "TYPE", which is generated from the TypeLit node, contributes nothing to the meaning of the output, because the collection it is part of is already labeled Type, so it is reasonable to delete it. To remove a node, omit it from the list of nodes in the custom projection. So to suppress creation of the TypeLit node, change the Type syntax rule to the following.

        syntax Type     = tl:TypeLit n:Name a:Access => Type{n, a};

Now the following output is generated.

AssemblyTypes{
  Types{
    Type{
      Name{
        "Name=",
        NameValue{
          "System.String"
        }
      },
      Access{
        "Access=",
        "public"
      }
    },
    Type{
      Name{
//  etc.

Note that it would be just as reasonable to suppress the "Name=" and "Access=" literals the same way.
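To make that last point concrete, the following Python sketch models each rule's projection as a function that receives the bound variables and returns only the ones it keeps. The function names are hypothetical, nodes are represented as (label, children) tuples, and the "M" rules quoted in the comments are untested variants written by analogy with the Type rule above.

```python
def project_name(nl, nv):
    # Models: syntax Name = nl:NameLit nv:NameValue => Name{nv};
    return ("Name", [nv])          # nl ("Name=") is bound but not emitted


def project_access(al, av):
    # Models: syntax Access = al:AccessLit av:AccessValue => Access{av};
    return ("Access", [av])        # al ("Access=") is bound but not emitted


def project_type(tl, n, a):
    # Models: syntax Type = tl:TypeLit n:Name a:Access => Type{n, a};
    return ("Type", [n, a])        # tl ("TYPE") is bound but not emitted


type_node = project_type(
    "TYPE",
    project_name("Name=", ("NameValue", ["System.String"])),
    project_access("Access=", "public"),
)
print(type_node)
```

Every literal is still required in the input and still bound to a variable. It disappears from the output only because the projection does not mention it.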

Adding Nodes

You can also add nodes that have no counterpart in the grammar. The following code wraps each Type collection in a new collection labeled ExtraNode.

        syntax Main     = at:Types => AssemblyTypes{at};
        syntax Types    = ts:Type+ => Types{valuesof(ts)};
        syntax Type     = tl:TypeLit n:Name a:Access => ExtraNode {Type{n, a}};

This code generates the following output.

AssemblyTypes{
  Types{
    ExtraNode{
      Type{

Complete Code Sample

The following is the complete code sample used in this topic. All instances of lists have been changed to collections.

module Types 
{
   language Parser
   {
        syntax Main     = at:Types => AssemblyTypes{at};
        syntax Types    = ts:Type+ => Types{valuesof(ts)};
        syntax Type     = tl:TypeLit n:Name a:Access => ExtraNode {Type{n, a}};
        token TypeLit   = "TYPE";
        syntax Name     = nl:NameLit nv:NameValue => Name{nl, nv};
        token NameLit   = "Name=";
        syntax NameValue = cc:chs => NameValue{cc};       
        syntax Access   = al:AccessLit av:AccessValue => Access {al, av};
        token AccessLit = "Access=";
        token AccessValue = 
                            "public" 
                          | "private" 
                          | "internal" 
                          | "protected";
        token Char  = 
                    "A".."Z" 
                    | "a".."z" 
                    | "0".."9" 
                    | ".";
        token chs   = Char+; 
        token echs  = chs "@" chs;
              
        token LF                = "\u000A";
        token CR                = "\u000D";
        token Space             = "\u0020";
        interleave Whitespace   = Space | LF | CR;       
   } 
}

This code generates the following output.

AssemblyTypes{
  Types{
    ExtraNode{
      Type{
        Name{
          "Name=",
          NameValue{
            "System.String"
          }
        },
        Access{
          "Access=",
          "public"
        }
      }
    },
    ExtraNode{
      Type{
        Name{
          "Name=",
          NameValue{
            "System.Integer32"
          }
        },
        Access{
          "Access=",
          "private"
        }
      }
    }
  }
}
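As a cross-check on the shape of this output, the following Python sketch builds the same nesting from the two sample records. It is a rough approximation rather than a port of the grammar: the names TYPE_RE and project are hypothetical, nodes are (label, children) tuples, whitespace handling is simplified compared to the interleave rule, and text that does not match is skipped instead of being reported as a parse error.

```python
import re

# Approximates the tokens: chs = Char+ (letters, digits, "."), followed by
# one of the four AccessValue literals.
TYPE_RE = re.compile(
    r"TYPE\s+Name=([A-Za-z0-9.]+)\s+Access=(public|private|internal|protected)"
)


def project(source):
    """Build the AssemblyTypes{Types{ExtraNode{Type{...}}}} shape."""
    types = []
    for name, access in TYPE_RE.findall(source):
        type_node = ("Type", [
            ("Name", ["Name=", ("NameValue", [name])]),
            ("Access", ["Access=", access]),
        ])
        # ExtraNode wraps each Type, as in the Type rule's projection.
        types.append(("ExtraNode", [type_node]))
    # valuesof(ts) places the wrapped nodes directly under Types.
    return ("AssemblyTypes", [("Types", types)])


sample = (
    "TYPE Name=System.String Access=public\n"
    "TYPE Name=System.Integer32 Access=private"
)
result = project(sample)
print(result)
```

The two ExtraNode entries under Types correspond to the two ExtraNode collections in the output above.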
