The Default Rule Projections
[This content is no longer valid. For the latest information on "M", "Quadrant", SQL Server Modeling Services, and the Repository, see the Model Citizen blog.]
If you do not specify a custom projection, the language parser created by Microsoft code name “M” generates a default projection, which this topic describes.
Default Output Format
The default projection is a representation of the input that reflects the Abstract Syntax Tree (AST) that was generated by successfully parsing the input data. Because languages in general are sensitive to the order of their input, the output takes the form of an ordered list rather than an unordered collection. The list is made up of nodes, which are themselves either lists or terminal literals.
The syntax rules of the grammar generate lists when they are successfully applied to input data. Each syntax rule in the grammar used in parsing the input generates a list that is labeled with the name of the syntax rule. The contents of the list are lists or terminal literals derived from the right side of the matching syntax rule.
In the default projection, the top node is always labeled Main, because the top-most syntax rule is always named Main.
Thus the output displays the exact syntax tree that successfully matches the input. When developing and testing a custom language, seeing the syntax tree can be a valuable aid in debugging the design of the language.
As parsing proceeds, the parser eventually reaches token rules that match literal parts of the input. Each matched token produces a string value, which is attached to the node of the syntax rule that references the token.
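The shape described above (labeled or unlabeled lists whose items are either further lists or string literals) can be modeled in a few lines. The following Python sketch is only an illustration of that shape, not part of the “M” tooling; the names Node and render are hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Node:
    """One list in a projection. label is None for an unlabeled list."""
    label: Optional[str]
    children: List[Union["Node", str]] = field(default_factory=list)


def render(item: Union[Node, str], indent: int = 0) -> str:
    """Render a node or a string literal in bracketed list notation."""
    pad = "  " * indent
    if isinstance(item, str):
        # Terminal literal: a quoted string value produced by a token.
        return f'{pad}"{item}"'
    # List: optional label, then the children in input order.
    head = f"{pad}{item.label or ''}["
    body = ",\n".join(render(child, indent + 1) for child in item.children)
    return f"{head}\n{body}\n{pad}]"


# A small list holding two string literals.
access = Node("Access", ["Access=", "public"])
print(render(access))
```

Because children are kept in a Python list, their order is preserved, which mirrors the reason the projection uses lists in the first place.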
Examples
Consider the following input text.
TYPE Name=System.String Access=public
TYPE Name=System.Integer32 Access=private
This text could come from an application built on the .NET Framework Reflection API that lists the types in an assembly along with their names and access levels. A default projection might look like the following. The “M” code and an explanation of the contents of the projection follow it.
Main[
  Types[
    [
      Type[
        "TYPE",
        Name[
          "Name=",
          NameValue[
            "System.String"
          ]
        ],
        Access[
          "Access=",
          "public"
        ]
      ],
      Type[
        "TYPE",
        Name[
          "Name=",
          NameValue[
            "System.Integer32"
          ]
        ],
        Access[
          "Access=",
          "private"
        ]
      ]
    ]
  ]
]
Note the list notation (the "[" and "]" characters) and the fact that the entire list is labeled Main, which follows from the requirement that the top-most syntax rule be named Main.
The entire “M” code is shown at the end of this topic. The following code shows the first three syntax rules used in defining the language.
syntax Main = Types;
syntax Types = Type+;
syntax Type = TypeLit Name Access ;
The Main list contains a single list named Types. The syntax rule Types specifies that it is made up of one or more Type nodes, each of which is made up of a TypeLit node followed by a Name node and then an Access node. This structure is reflected in the output projection: there is a Types list that contains an unlabeled list corresponding to its right side (Type+), and this unlabeled list contains two Type lists.
The right side of the Type rule references three additional rules, which are defined by the following code.
token TypeLit = "TYPE";
syntax Name = NameLit NameValue;
syntax Access = AccessLit AccessValue;
The first rule, TypeLit, is a token rule, which recognizes the literal "TYPE". The following two rules are syntax rules. As a result, the projection for the Type rule is a Type list that contains the string "TYPE" matched by TypeLit, a Name list, and an Access list.
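Likewise, each Name list consists of the "Name=" string matched by NameLit and a NameValue list. The Access list similarly begins with the "Access=" string matched by AccessLit, but because AccessValue is a token rule rather than a syntax rule, its match appears as a plain string and not as a nested list. Repeating this process for every rule yields a complete parse of the input text.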
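To make the rule-by-rule walk concrete, the following Python sketch hand-codes one function per syntax rule and builds the same nested structure as the default projection shown earlier. It is an illustration under stated assumptions, not how the “M” parser is implemented: the names tokenize, parse_main, TOKEN_RE, and ACCESS_VALUES are hypothetical, each list is modeled as a (label, children) tuple, and the regular expression is a simplified stand-in for the token rules.

```python
import re

# Simplified stand-ins for the token rules (assumption: the fixed literals
# are tried before the general run of name characters).
ACCESS_VALUES = ("public", "private", "internal", "protected")
TOKEN_RE = re.compile(r"TYPE|Name=|Access=|[A-Za-z0-9.]+")


def tokenize(text):
    # findall skips anything the pattern does not match, so spaces and line
    # breaks are dropped, loosely imitating the interleave rule.
    return TOKEN_RE.findall(text)


def parse_main(text):
    """Return the projection as nested (label, children) tuples."""
    tokens = tokenize(text)
    pos = 0

    def expect(literal):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != literal:
            raise ValueError(f"expected {literal!r} at token {pos}")
        pos += 1
        return literal

    def parse_name():
        # syntax Name = NameLit NameValue;
        nonlocal pos
        lit = expect("Name=")
        if pos >= len(tokens):
            raise ValueError("expected a name value")
        value = tokens[pos]
        pos += 1
        # NameValue is a syntax rule, so it gets its own labeled list.
        return ("Name", [lit, ("NameValue", [value])])

    def parse_access():
        # syntax Access = AccessLit AccessValue;
        nonlocal pos
        lit = expect("Access=")
        if pos >= len(tokens) or tokens[pos] not in ACCESS_VALUES:
            raise ValueError("expected an access value")
        value = tokens[pos]
        pos += 1
        # AccessValue is a token rule, so its match stays a plain string.
        return ("Access", [lit, value])

    def parse_type():
        # syntax Type = TypeLit Name Access;
        return ("Type", [expect("TYPE"), parse_name(), parse_access()])

    # syntax Types = Type+;  one or more, collected in an unlabeled list.
    types = [parse_type()]
    while pos < len(tokens):
        types.append(parse_type())
    return ("Main", [("Types", [(None, types)])])
```

Calling parse_main on the two-line input from the example returns a Main tuple whose Types child holds one unlabeled list with two Type entries, matching the projection above.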
The Complete Code
module Types
{
    language Parser
    {
        syntax Main = Types;
        syntax Types = Type+;
        syntax Type = TypeLit Name Access ;
        token TypeLit = "TYPE";
        syntax Name = NameLit NameValue;
        token NameLit = "Name=";
        syntax NameValue = chs;
        syntax Access = AccessLit AccessValue;
        token AccessLit = "Access=";
        token AccessValue =
            "public"
            | "private"
            | "internal"
            | "protected";
        token Char =
            "A".."Z"
            | "a".."z"
            | "0".."9"
            | ".";
        token chs = Char+;
        token LF = "\u000A";
        token CR = "\u000D";
        token Space = "\u0020";
        interleave Whitespace = Space | LF | CR;
    }
}