Lightweight syntax option in F# 1.1.12.3

We're glad to announce that F# 1.1.12.3 supports an optional lightweight syntax, in which whitespace is used to make indentation significant. At the time of this release this is an experimental feature, though we expect its use to become widespread.

The F# indentation-aware syntax option is a conservative extension of the explicit language syntax, in the sense that it simply lets you leave out certain tokens such as in and ;; by having the parser take indentation into account. This can make a surprising difference to the readability of code. 

[ Note: This feature is similar in spirit to the use of indentation by Python and Haskell, and we thank Simon Marlow (of Haskell fame) for his help in designing this feature and sketching the implementation technique. We also thank all the F# users at MSR Cambridge who've been helping us iron out the details of this feature. ]

Compiling your code with the indentation-aware syntax option is useful even if you continue to use explicit tokens, as it reports many indentation problems with your code and ensures a regular, clear formatting style. The F# library is written in this way.

In this article we call the indentation-aware syntax option the "light" syntax option. It is also occasionally called the "hardwhite" or "white" option (because whitespace is "hard", i.e. significant as far as the lexer and the parser are concerned).

The light syntax option is enabled using the #light directive in a source file. This directive scopes over all of the subsequent text of a file.

When the light syntax option is enabled, comments are considered pure whitespace. This means the indentation position of comments is irrelevant and ignored. Comments act entirely as if they were replaced by whitespace characters.
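As a small illustration of this rule, the sketch below places comments at columns that would be offside for ordinary tokens. The function name sumWithOddComments is hypothetical, and the example assumes an F# release in which the light syntax option behaves as described above.

```fsharp
#light
// Hypothetical example: comments at "wrong" columns do not affect parsing.
let sumWithOddComments () =
    let x = 1
  // this comment is undented relative to the 'let' bindings
            (* and this one is indented further than anything else *)
    let y = 2
    x + y
```

Both comments are treated as whitespace, so the two let bindings and the final expression still form a single block.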

TAB characters may not be used when the light syntax option is enabled. You should ensure your editor is configured to replace TAB characters with spaces, e.g. in Visual Studio 2005 go to "Tools\Options\Text Editor\F#\Tabs" and select "Insert spaces".

Using the light syntax option makes code clearer by doing three things:

  • Fewer tokens. Nearly all end-of-line separator tokens become optional in well-indented code. In particular, ;; , in and ; tokens can generally be omitted.

  • Clearer disambiguation. It uses indentation to disambiguate the parsing of certain constructs, e.g. nested if/then/else blocks and nested match blocks. This greatly reduces the number of parentheses in code with nested branching constructs.

  • Sanity checks. It applies additional sanity checks on formatting, reporting places where "undentation" has been used. Undentation is where a language construct has been used at a column position that is "undented" from an enclosing construct, which breaks the important principle that nested constructs appear at increasing column positions. Some manifestations of undentation are permitted in certain positions in the language syntax.
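To make the disambiguation point concrete, here is a minimal sketch of a nested match written with the light syntax option. The function name classify and its arguments are hypothetical and do not come from the F# library.

```fsharp
#light
// Hypothetical example: indentation decides which 'match' each rule belongs to.
let classify (x: int option) (y: int option) =
    match x with
    | Some a ->
        // Inner match: its rules are indented further than the outer rules.
        match y with
        | Some b -> a + b
        | None -> a
    // This rule lines up with the outer '| Some a', so it belongs to the
    // outer match even though no parentheses are used.
    | None -> 0
```

In the explicit syntax, as in OCaml, the final rule would be read as part of the inner match unless the inner match were wrapped in parentheses or begin/end.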

The basic rules applied when the light syntax option is activated are shown below, illustrated by example.

    // When the light syntax option is
    // enabled top level expressions do not
    // need to be delimited by ';;' since every construct
    // starting at first column is implicitly a new
    // declaration. NOTE: you still need to enter ';;' to
    // terminate interactive entries to fsi.exe, though
    // this is added automatically when using F#
    // Interactive from Visual Studio.
    #light
    printf "Hello"
    printf "World"

    // Without the light syntax option the
    // source code must contain ';;' to separate top-level
    // expressions.
    printf "Hello";;
    printf "World";;

    // When the light syntax option is
    // enabled 'in' is optional. The token after the '='
    // of a 'let' definition begins a new block, where
    // the pre-parser inserts an implicit separating 'in'
    // token between each 'let' binding that begins at
    // the same column as that token.
    #light
    let SimpleSample() =
        let x = 10 + 12 - 3
        let y = x * 2 + 1
        let r1,r2 = x/3, x%3
        (x,y,r1,r2)

    // Without the light syntax option 'in'
    // is very often required. The 'in' is optional when
    // the light syntax option is used.
    let SimpleSample() =
        let x = 10 + 12 - 3 in
        let y = x * 2 + 1 in
        let r1,r2 = x/3, x%3 in
        (x,y,r1,r2)

    // When the light syntax option is
    // enabled 'done' is optional and the scope of
    // structured constructs such as match, for, while
    // and if/then/else is determined by indentation.
    #light
    let FunctionSample() =
        let tick x = printf "tick %d\n" x
        let tock x = printf "tock %d\n" x
        let choose f g h x =
            if f x then g x else h x
        for i = 0 to 10 do
            choose (fun n -> n%2 = 0) tick tock i
        printf "done!\n"

    // Without the light syntax option
    // 'done' is required
    let FunctionSample() =
        let tick x = printf "tick %d\n" x in
        let tock x = printf "tock %d\n" x in
        let choose f g h x =
            if f x then g x else h x in
        for i = 0 to 10 do
            choose (fun n -> n%2 = 0) tick tock i
        done;
        printf "done!\n"

    // When the light syntax option is
    // enabled the scope of if/then/else is implicit from
    // indentation.
    #light
    let ArraySample() =
        let numLetters = 26
        let results = Array.create numLetters 0
        let data = "The quick brown fox"
        for i = 0 to data.Length - 1 do
            let c = data.Chars(i)
            let c = Char.ToUpper(c)
            if c >= 'A' && c <= 'Z' then
                let i = Char.code c - Char.code 'A'
                results.[i] <- results.[i] + 1
        printf "done!\n"

    // Without the light syntax option
    // 'begin'/'end' or parentheses are often needed
    // to delimit structured language constructs
    let ArraySample() =
        let numLetters = 26 in
        let results = Array.create numLetters 0 in
        let data = "The quick brown fox" in
        for i = 0 to data.Length - 1 do
            let c = data.Chars(i) in
            let c = Char.ToUpper(c) in
            if c >= 'A' && c <= 'Z' then begin
                let i = Char.code c - Char.code 'A' in
                results.[i] <- results.[i] + 1
            end
        done;
        printf "done!\n"

Undentation. In general, nested expressions must occur at increasing column positions in indentation-aware code. This is called the "incremental indentation" rule. Warnings or syntax errors will be given where this is not the case. However, for certain constructs "undentation" is permitted. In particular, undentation is permitted in the following situations:

    // The bodies of functions may be undented
    // from the 'fun' or 'function' symbol. This means the 'fun' is
    // ignored when determining whether the body of the function
    // satisfies the incremental indentation rule. The block
    // may not undent further than the next significant construct.
    #light
    let HashSample(tab: Collections.HashTable<_,_>) =
        tab.Iterate (fun c v ->
            printf "Entry (%O,%O)\n" c v)

    // The bodies of a '(' ... ')' or
    // 'begin' ... 'end' may be undented when the expressions
    // follow a 'then' or 'else'. They may not undent further
    // than the 'if'.
    #light
    let IfSample(day: System.DayOfWeek) =
        if day = System.DayOfWeek.Monday then (
            printf "I don't like Mondays"
        )

    // Likewise the bodies of modules and module types
    // delimited by 'sig' ... 'end', 'struct' ... 'end' and
    // 'begin' ... 'end' may be undented, e.g.
    #light
    module MyNestedModule = begin
       let one = 1
       let two = 2
    end

More details: offside lines and contexts. Indentation-aware syntax is sometimes called the "offside rule". This pleasant terminology comes from a 1965 paper where Peter Landin introduced the idea, and derives from football (soccer), where the last defending player causes an imaginary line to be drawn across the pitch, and if an attacker is beyond this line the referee will blow the whistle and call "offside!". In F# code offside lines occur at column positions. For example, a = token associated with let introduces an offside line at the column of the first token after the = token.

When a token occurs prior to an offside line, one of three things happens:

  • (1) enclosing constructs are terminated. This may result in a syntax error, e.g. when there are unclosed parentheses.

  • (2) extra delimiting tokens are inserted. In particular, when the offside line associated with the token after a do in a while...do construct is violated, a done token is inserted.

  • (3) an "undentation" warning or error is given, indicating that the construct is badly formatted. This is usually simple to remove by adding extra indentation and applying standard structured formatting to your code.

When a token occurs directly on an offside line, an extra delimiting token may be inserted. For example, when a token occurs directly on the offside line of a context introduced by a let, an appropriate delimiting separator token, i.e. an in token, is inserted.

Offside lines are also introduced by other structured constructs, in particular at the column of the first token after the then in an if/then/else construct, and likewise after try, else, -> and with (in a match/with or try/with) and with (in a type augmentation). "Opening" bracketing tokens ( , { and begin also introduce an offside line. In all these cases the offside line introduced is determined by the column number of the first token following the significant token. Offside lines are also introduced by let, if and module. In these cases the offside line occurs at the start of the identifier.

The "light" syntax option is implemented as a pre-parse of the token stream coming from a lexical analysis of the input text (according to the lexical rules above), and uses a stack of contexts. When a column position becomes an offside line a "context" is pushed. "Closing" bracketing tokens (" ) ", " } " and "end") automatically terminate offside contexts up to and including the context introduced by the corresponding "opening" token.
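The idea of a pre-parse over the token stream with a stack of contexts can be sketched with a toy model. The code below is not the F# compiler's implementation. It is a deliberately simplified illustration, written against a current F# release, that only handles contexts opened by the token after an = and only inserts in tokens. The names Token, insertIn and tok are hypothetical.

```fsharp
#light
// Toy model only: Token, insertIn and tok are hypothetical names, and this
// is not how the F# compiler itself is written.
type Token = { Text: string; Line: int; Col: int }

let insertIn (tokens: Token list) : string list =
    // Stack of contexts: (offside column, current item began with "let").
    let mutable stack : (int * bool) list = []
    // True when the previous token was "=", so the next token opens a context.
    let mutable pushNext = false
    let mutable prevLine = 0
    let output = System.Collections.Generic.List<string>()
    for t in tokens do
        if pushNext then
            // The first token after '=' sets the offside line of a new context.
            stack <- (t.Col, t.Text = "let") :: stack
        elif t.Line > prevLine then
            // First token on a new line: pop the contexts it is offside of.
            stack <- List.skipWhile (fun (col, _) -> col > t.Col) stack
            match stack with
            | (col, wasLet) :: rest when col = t.Col ->
                // Token sits exactly on the offside line: if the previous
                // item was a 'let' binding, make the separating 'in' explicit.
                if wasLet then output.Add("in")
                stack <- (col, t.Text = "let") :: rest
            | _ -> ()
        output.Add(t.Text)
        pushNext <- (t.Text = "=")
        prevLine <- t.Line
    List.ofSeq output

let tok text line col = { Text = text; Line = line; Col = col }

// Hand-written tokens (1-based line and column) for:
//   let f () =
//       let x = 10
//       let y = x
//       y
let sample =
    [ tok "let" 1 1; tok "f" 1 5; tok "(" 1 7; tok ")" 1 8; tok "=" 1 10
      tok "let" 2 5; tok "x" 2 9; tok "=" 2 11; tok "10" 2 13
      tok "let" 3 5; tok "y" 3 9; tok "=" 3 11; tok "x" 3 13
      tok "y" 4 5 ]

let result = insertIn sample |> String.concat " "
// result is "let f ( ) = let x = 10 in let y = x in y"
```

The sketch ignores bracketing tokens, done insertion and undentation warnings. It is meant only to show how pushing a context at a column and comparing later tokens against that column yields the implicit in tokens.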

Here are some examples of the offside rule being applied to F# code:

    // 'let' and 'type' declarations in
    // modules must be precisely aligned.
    #light
    let x = 1
     let y = 2  <-- unmatched 'let'
    let z = 3   <-- warning FS0058: possible
                    incorrect indentation: this token is offside of
                    context at position (2:1)

    // The '|' markers in patterns must align.
    // The first '|' should always be inserted. Note: a future revision
    // may also permit the optional complete omission of the '|' markers.
    #light
    let f () =
        match 1+1 with
        | 2 -> printf "ok"
      | _ -> failwith "no!"   <-- syntax error

Enjoy!

Don and James for the F# team

Comments

  • Anonymous
    August 23, 2006
    We're very pleased to announce that F# 1.1.12.3 is available for download.
    This release incorporates...

  • Anonymous
    August 25, 2006
    Wonderful! It is rather easy to create buggy programs in Caml using nested match, and you will not get any warnings.

    Maybe an option is to still use "(, ), begin, end, in", and get a warning if the indentation isn't consistent with the "(, ), ..."

  • Anonymous
    May 27, 2007
    We kindly invite all of you hurting yourself an many others with this just another sharp language... Ouch! Stop hurting! Stop #!

  • Anonymous
    May 29, 2007
    thanks for the implementation notes at the end.  ive been a boo fan for a while, an indent-aware python-inspired dotnet language. although the parser has always supported indentation, theres a number of added features we're trying to coax our tokenizer into doing.  we still havent added the parsers routines to dump inline ndoc, and the interpretter still cannot process the backspace key (since characters are fed in via an S.I.Stream).  just two silly things, but both seem much more complex when you try actually effecting change.  i greatly enjoy hearing tales of other people modifying their compilers and systems to the benefit of coders and wrists everywhere, and its wonderful hearing the technical successes behind this growing and altogether-wonderful project.

  • Anonymous
    June 25, 2007
    Mattias, OCaml automatically indents code for you.

  • Anonymous
    September 05, 2009
    Coming from OCaml but having also programmed in Python, Boo and Haskell, I have to say that I don't like that feature much for a Caml-like language. It makes sense for Python and for Haskell too, but to stay close to OCaml it should stay optional and not be the default setting. Just saving a few in's here and there is not a good argument for introducing such a "feature". It also hides the fact that things declared with "let ... in" are defined for the actual scope (just using indenting for this is not as clear). So I think it does not make the source code clearer but is more confusing, even more so for beginners. OCaml folks have their own indentation rules, mostly coming from the tuareg mode of Emacs, and if they switch to F# they will be a bit disappointed. A great strength of OCaml is the ability to indent the code as the programmer wishes. I tried the Visual Studio 2008 F# prerelease and was dismayed by all the "errors" I got when writing code until I discovered this "feature". So please let it at most be optional and please not the default setting for writing F# code. A lot of people switching from OCaml or other languages will thank you a lot.