The Default Rule Projections

09/22/2010

[This content is no longer valid. For the latest information on "M", "Quadrant", SQL Server Modeling Services, and the Repository, see the Model Citizen blog.]

If you do not specify a custom projection, the language parser created by Microsoft code name “M” generates a default projection, which this topic describes.

Default Output Format

The default projection is a representation of the input that reflects the abstract syntax tree (AST) produced by successfully parsing the input data. Because languages are generally sensitive to the order of their input, the output takes the form of an ordered list rather than an unordered collection. The list is made up of nodes, each of which is either a list or a terminal literal.

The syntax rules of the grammar generate lists when they are successfully applied to input data. Each syntax rule in the grammar used in parsing the input generates a list that is labeled with the name of the syntax rule. The contents of the list are lists or terminal literals derived from the right side of the matching syntax rule.

In the default projection, the top node is always labeled Main, because the top-most syntax rule is always named Main.

Thus the output displays the exact syntax tree that successfully matches the input. When developing and testing a custom language, seeing the syntax tree can be a valuable aid in debugging the design of the language.

As parsing proceeds, the parser eventually reaches literal values that match parts of the input. These literal values, or tokens, produce string values that are attached to the nodes of the syntax rule projections that include them.
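The node structure described above can be modeled outside of “M” to make it concrete. The following is a minimal Python sketch, not part of the “M” toolchain: the Node class and the render function are hypothetical names introduced only for illustration. It assumes that a node is either a string (a terminal literal) or a list with an optional label, and it prints the same bracket notation used by the default projection.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Node:
    """A list node. label is None for an unlabeled list."""
    label: Optional[str]
    children: List[Union["Node", str]] = field(default_factory=list)


def render(item: Union[Node, str], indent: int = 0) -> str:
    """Render a node or literal in the bracket notation of the default projection."""
    pad = "  " * indent
    if isinstance(item, str):
        # Terminal literals appear as quoted strings.
        return f'{pad}"{item}"'
    head = f"{pad}{item.label or ''}["
    body = ",\n".join(render(child, indent + 1) for child in item.children)
    return f"{head}\n{body}\n{pad}]"


# A small tree: a labeled top node that holds one labeled list of two literals.
tree = Node("Main", [Node("Access", ["Access=", "public"])])
print(render(tree))
```

Because every node keeps its children in a Python list, the order of the input is preserved, which mirrors the reason the default projection uses lists.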

Examples

Consider the following input text.

TYPE Name=System.String Access=public  
TYPE Name=System.Integer32 Access=private

This text could come from an application built on the .NET Framework Reflection API that lists the types in an assembly along with their names and access properties. A default projection of this input might look like the following. The “M” code and an explanation of the contents of the projection follow it.

Main[
  Types[
    [
      Type[
        "TYPE",
        Name[
          "Name=",
          NameValue[
            "System.String"
          ]
        ],
        Access[
          "Access=",
          "public"
        ]
      ],
      Type[
        "TYPE",
        Name[
          "Name=",
          NameValue[
            "System.Integer32"
          ]
        ],
        Access[
          "Access=",
          "private"
        ]
      ]
    ]
  ]
]

Note the list notation ("[" and "]" characters), and the fact that the entire list is labeled Main, which is due to the requirement that the top-most syntax rule be named Main.

The entire “M” code is shown at the end of this topic. The following code shows the first three syntax rules used in defining the language.

        syntax Main  = Types;
        syntax Types = Type+;
        syntax Type  = TypeLit Name Access ;

The Main list contains a single list named Types. The syntax rule Types specifies that it is made up of one or more Type nodes, each of which is made up of a TypeLit node followed by a Name node, followed by an Access node. This structure is reflected in the output projection: there is a Types list that contains an unlabeled list corresponding to its right side (Type+), and this unlabeled list contains two Type lists.

The right side of the Type rule references three additional rules, which are defined by the following code.

        token TypeLit   = "TYPE";
        syntax Name     = NameLit NameValue;
        syntax Access   = AccessLit AccessValue;

The first rule is a token rule, TypeLit, which recognizes the literal "TYPE". The next two rules are syntax rules. As a result, the projection for the Type rule is a Type list that contains the literal "TYPE", a Name list, and an Access list.

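Likewise, each Name list consists of the NameLit literal "Name=" and a NameValue list. Each Access list consists of the AccessLit literal "Access=" and an AccessValue literal; because AccessValue is defined by a token rule rather than a syntax rule, its value appears as a plain string instead of a labeled list. Repeating this process for every rule yields a complete parse of the input text.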
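The rule-by-rule walk above can be approximated by hand to show how each syntax rule contributes one labeled list. The following Python sketch is an illustration only and is not how the “M” parser is implemented: the function names are hypothetical, nodes are modeled as (label, children) tuples, and splitting on whitespace stands in for the grammar's interleave rule, on the assumption that each field is separated by whitespace as in the sample input.

```python
import re
from pprint import pprint

ACCESS_VALUES = {"public", "private", "internal", "protected"}
CHS = re.compile(r"[A-Za-z0-9.]+")  # mirrors: token chs = Char+

SAMPLE = """TYPE Name=System.String Access=public
TYPE Name=System.Integer32 Access=private"""


def parse_name(word):
    """syntax Name = NameLit NameValue; NameValue is a syntax rule, so it gets its own list."""
    if not word.startswith("Name="):
        raise ValueError(f"expected Name=..., got {word!r}")
    value = word[len("Name="):]
    if not CHS.fullmatch(value):
        raise ValueError(f"invalid name value {value!r}")
    return ("Name", ["Name=", ("NameValue", [value])])


def parse_access(word):
    """syntax Access = AccessLit AccessValue; AccessValue is a token, so it stays a plain string."""
    if not word.startswith("Access="):
        raise ValueError(f"expected Access=..., got {word!r}")
    value = word[len("Access="):]
    if value not in ACCESS_VALUES:
        raise ValueError(f"invalid access value {value!r}")
    return ("Access", ["Access=", value])


def parse_main(text):
    """syntax Main = Types; syntax Types = Type+; syntax Type = TypeLit Name Access."""
    words = text.split()
    if not words or len(words) % 3 != 0:
        raise ValueError("input must contain one or more complete TYPE entries")
    types = []
    for i in range(0, len(words), 3):
        type_lit, name, access = words[i:i + 3]
        if type_lit != "TYPE":
            raise ValueError(f"expected TYPE, got {type_lit!r}")
        types.append(("Type", ["TYPE", parse_name(name), parse_access(access)]))
    # Type+ produces an unlabeled list, modeled here with a label of None.
    return ("Main", [("Types", [(None, types)])])


pprint(parse_main(SAMPLE))
```

The nesting of the returned tuples follows the projection shown earlier: Main holds Types, Types holds one unlabeled list, and that list holds one Type entry per line of input.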

The Complete Code

module Types
{
    language Parser
    {
        syntax Main      = Types;
        syntax Types     = Type+;
        syntax Type      = TypeLit Name Access;

        token TypeLit    = "TYPE";
        syntax Name      = NameLit NameValue;
        token NameLit    = "Name=";
        syntax NameValue = chs;
        syntax Access    = AccessLit AccessValue;
        token AccessLit  = "Access=";
        token AccessValue =
              "public"
            | "private"
            | "internal"
            | "protected";
        token Char =
              "A".."Z"
            | "a".."z"
            | "0".."9"
            | ".";
        token chs        = Char+;
        token LF         = "\u000A";
        token CR         = "\u000D";
        token Space      = "\u0020";
        interleave Whitespace = Space | LF | CR;
    }
}
