Tokens: Handling Variable Fields (MGrammar)
[This content is no longer valid. For the latest information on "M", "Quadrant", SQL Server Modeling Services, and the Repository, see the Model Citizen blog.]
This tutorial builds on the preceding ones, Hello World (MGrammar) and Handling Spaces (MGrammar). In this tutorial, you tokenize the input: you use the token keyword to identify fixed parts of the input text (keywords or field names) and to specify the conditions that variable parts of the text (field values) must satisfy. You also learn how to make the domain-specific language (DSL) accept more than a single line of input text.
This tutorial builds a DSL that parses a text file that consists of object model information that can be generated by using the .NET Framework Reflection API against an assembly. The text file obeys the following rules:
Each line starts with the string TYPE.
The line contains the string Name= followed by the name of the type.
After the Name field, there is a string Access= followed by one of the following strings: public, private, protected, internal.
The Access field is followed by a string Email= followed by an e-mail address.
A typical file might look like this.
TYPE Name=System.String Access=public Email=janedoe@contoso.com
TYPE Name=System.Integer32 Access=private Email=bbrown@contoso.com
TYPE Name=System.Byte Access=public Email=johndoe@contoso.com
TYPE Name=System.Boolean Access=public Email=janedoe@contoso.com
You will learn to do the following in this tutorial:
Add token rules that enable you to identify the keywords in the input.
Add token rules that define the allowable characters for variable fields.
Modify the grammar rules so that the DSL parses multiple input lines.
To show a DSL that recognizes one line of input
The following DSL was created in the preceding tutorial, Handling Spaces (MGrammar). It parses a single line of input and allows an arbitrary number of spaces between the fields. Open Intellipad and add the following code.
module Types
{
    language Parser
    {
        syntax Main = Type Name Access Email;
        syntax Type = "TYPE";
        syntax Name = "Name=System.String";
        syntax Access = "Access=public";
        syntax Email = "Email=janedoe@contoso.com";
        syntax Space = "\u0020";
        interleave Whitespace = Space;
    }
}
To test this language, enter the following line into the left "DSL Input Mode" pane.
TYPE Name=System.String Access=public Email=janedoe@contoso.com
Change the input (in the left-most pane) by changing the type name to System.Integer32. Note that errors are generated. Press CTRL+Z to restore the previous valid input.
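The reason for those errors can be illustrated outside Intellipad. The following is a rough Python analogue, not MGrammar: it assumes the grammar behaves like a comparison against four fixed strings separated by spaces, and the names FIXED_FIELDS and accepts_fixed are hypothetical.

```python
# The literal fields that the single-line grammar accepts, in order.
FIXED_FIELDS = [
    "TYPE",
    "Name=System.String",
    "Access=public",
    "Email=janedoe@contoso.com",
]

def accepts_fixed(line):
    """Return True if the line holds exactly the fixed fields, separated by spaces."""
    # Dropping empty parts tolerates runs of several spaces between fields.
    fields = [part for part in line.split(" ") if part]
    return fields == FIXED_FIELDS
```

Because every field is a fixed literal, any change to a field value makes the comparison fail, which is what the rest of this tutorial addresses.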
To handle different field values
First we break the Name rule apart, into a token rule that identifies the "Name=" string, and another token rule that specifies what a name should look like. We add the first token rule with the following code.
token NameLit = "Name=";
Next, define what a Type name must look like by specifying which characters are allowed. A type name consists of alphabetic and numeric characters, plus the "." character. The following rule says that a character is any one of those.
token Char = "A".."Z" | "a".."z" | "0".."9" | ".";
Note the use of the "|" character to specify "or", and the use of the range operator "..". We can specify a string of these characters by using the "+" operator in the following code.
token chs = Char+;
The chs rule recognizes a string of one or more of the characters allowed by the Char rule.
Next, specify that the value of a name must conform to the chs rule, with the following code.
syntax NameValue = chs;
Finally, change the Name rule to the following.
syntax Name = NameLit NameValue;
Note that you can now change the value of the Name field without generating errors.
Now apply this procedure to the Access field. However, instead of allowing the Access field to be as unconstrained as the Name field, restrict it to one of a set of values. The result is the following code.
syntax Access = AccessLit AccessValue;
token AccessLit = "Access=";
token AccessValue = "public" | "private" | "internal" | "protected";
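For readers more familiar with regular expressions, the Name and Access token rules correspond roughly to the patterns below. This is a Python sketch under that analogy, not MGrammar; the helper names name_value and access_value are hypothetical.

```python
import re

# Rough regex counterparts of the token rules (an analogy, not MGrammar itself).
CHAR = r"[A-Za-z0-9.]"    # token Char = "A".."Z" | "a".."z" | "0".."9" | ".";
CHS = CHAR + r"+"         # token chs = Char+;
ACCESS_VALUE = r"(?:public|private|internal|protected)"  # token AccessValue

NAME = re.compile(r"Name=(" + CHS + r")")                # NameLit NameValue
ACCESS = re.compile(r"Access=(" + ACCESS_VALUE + r")")   # AccessLit AccessValue

def name_value(field):
    """Return the value of a Name field, or None if the field does not match."""
    match = NAME.fullmatch(field)
    return match.group(1) if match else None

def access_value(field):
    """Return the value of an Access field, or None if it is not an allowed keyword."""
    match = ACCESS.fullmatch(field)
    return match.group(1) if match else None
```

The Name value is open-ended (any run of allowed characters), whereas the Access value is a closed set of alternatives, which mirrors the difference between the two rules.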
Finally, do the same thing to the Email field, resulting in the following code.
syntax Email = EmailLit EmailValue;
token EmailLit = "Email=";
syntax EmailValue = chs;
Note that this code generates errors because the character "@" is not allowed by the Char rule. That character does not appear in type names, so rather than adding it to Char, replace the EmailValue rule in the preceding fragment with the following code. Note that this is not a general parser for e-mail addresses, which can be considerably more complex.
token echs = chs "@" chs;
syntax EmailValue = echs;
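How loose the echs rule is can be seen from a rough regular-expression equivalent. The following Python sketch assumes that analogy holds; it is not MGrammar, and is_email_value is a hypothetical helper.

```python
import re

CHS = r"[A-Za-z0-9.]+"               # token chs = Char+;
ECHS = re.compile(CHS + "@" + CHS)   # token echs = chs "@" chs;

def is_email_value(text):
    """Return True if text is one run of allowed characters, an "@", then another run."""
    return ECHS.fullmatch(text) is not None
```

Under this pattern, janedoe@contoso.com is accepted, but so is a string such as ..@.. because "." is an allowed character, while an address containing an underscore is rejected. That illustrates why the rule is not a general e-mail parser.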
To handle multiple lines of input
Replace the input in the left pane with the following code. Note the errors that are generated: the parser does not recognize the "return" character ("\r").
TYPE Name=System.String Access=public Email=janedoe@contoso.com
TYPE Name=System.Integer32 Access=private Email=bbrown@contoso.com
Now change the interleave statement to handle carriage returns, and also line feeds ("\n"), with the following code, which replaces the existing interleave statement.
token LF = "\u000A";
token CR = "\u000D";
interleave Whitespace = Space | LF | CR;
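The effect of the interleave rule is roughly that space, line feed, and carriage return characters are skipped between fields. The following Python sketch shows that idea with a split on those three characters; it is an analogy rather than MGrammar, and the function name fields is hypothetical.

```python
import re

# Space (U+0020), line feed (U+000A), and carriage return (U+000D),
# the three characters named in the interleave rule.
WHITESPACE = re.compile("[\u0020\u000A\u000D]+")

def fields(text):
    """Split text into fields, discarding runs of the interleaved characters."""
    return [part for part in WHITESPACE.split(text) if part]
```

Tab characters are not listed in the interleave rule, so this sketch does not treat them as separators either.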
Now the error panel says that the text "TYPE" is unexpected in the second line of input text. This is because the Main rule defines a single type, whereas you really want to specify a collection of one or more types. Replace the Main and Type rules with the following code.
syntax Main = Types;
syntax Types = Type | Types Type;
syntax Type = TypeLit Name Access Email;
token TypeLit = "TYPE";
Note the Types rule: this left-recursive form is a common grammar idiom for specifying one or more of something.
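In a hand-written parser, a "one or more" rule of this kind usually becomes a loop. The following Python sketch illustrates that under simplifying assumptions: the input is already split into whitespace-separated fields, field values are not validated, and the names parse_type and parse_types are hypothetical. It is not how the MGrammar tooling works internally.

```python
def parse_type(tokens, pos):
    """Type = TypeLit Name Access Email: consume four tokens starting at pos."""
    if pos + 4 > len(tokens) or tokens[pos] != "TYPE":
        raise ValueError(f"expected a TYPE record at token {pos}")
    record = {}
    for key, token in zip(("Name", "Access", "Email"), tokens[pos + 1:pos + 4]):
        prefix = key + "="
        if not token.startswith(prefix):
            raise ValueError(f"expected {prefix} but found {token!r}")
        record[key] = token[len(prefix):]
    return record, pos + 4

def parse_types(tokens):
    """Types = Type | Types Type: one or more Type records."""
    record, pos = parse_type(tokens, 0)   # first alternative: a single Type
    records = [record]
    while pos < len(tokens):              # each further pass mirrors "Types Type"
        record, pos = parse_type(tokens, pos)
        records.append(record)
    return records
```

An empty input raises an error because the rule requires at least one Type, and each additional record extends the list in the same way that the recursive alternative extends Types.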
Example
The following is the complete “M” code used in this tutorial.
module Types
{
    language Parser
    {
        syntax Main = Types;
        syntax Types =
            Type
            | Types Type;
        syntax Type = TypeLit Name Access Email;
        token TypeLit = "TYPE";
        syntax Name = NameLit NameValue;
        token NameLit = "Name=";
        syntax NameValue = chs;
        syntax Access = AccessLit AccessValue;
        token AccessLit = "Access=";
        token AccessValue =
            "public"
            | "private"
            | "internal"
            | "protected";
        syntax Email = EmailLit EmailValue;
        token EmailLit = "Email=";
        syntax EmailValue = echs;
        token Char =
            "A".."Z"
            | "a".."z"
            | "0".."9"
            | ".";
        token chs = Char+;
        token echs = chs "@" chs;
        token LF = "\u000A";
        token CR = "\u000D";
        token Space = "\u0020";
        interleave Whitespace = Space | LF | CR;
    }
}