Writing Succinct and Correct Numerical Computations with F#
Applies to: Functional Programming
Authors: Yin Zhu
Get this book in Print, PDF, ePub and Kindle at manning.com. Use code “MSDN37b” to save 37%.
Summary: This article presents a brief survey of the F# language with the focus on the features that are used when writing numerical computations. It covers the F# development style, type safety, and units of measure, as well as the language's expressivity.
This topic contains the following sections.
- Introducing F# for Mathematical Programming
- Understanding the F# Development Process
- Declaring and Using Types in F#
- Writing Correct Code with Types
- Expressing Complex Problems
- Summary
- Additional Resources
- See Also
This article is associated with Real World Functional Programming: With Examples in F# and C# by Tomas Petricek with Jon Skeet from Manning Publications (ISBN 9781933988924, copyright Manning Publications 2009, all rights reserved). No part of these chapters may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, electrostatic, mechanical, photocopying, recording, or otherwise—without the prior written permission of the publisher, except in the case of brief quotations embodied in critical articles or reviews.
Introducing F# for Mathematical Programming
When writing programs to solve mathematical problems, the software engineer must translate concepts from mathematics into computer programs. If the gap between the two is large, the translation can be quite difficult. In imperative languages such as C++, C# or Java, mathematical equations need to be translated into commands that specify how to get the required results step by step.
Functional languages including F# use a more mathematical notation. Programs are written as equations that can be evaluated, which is a more natural approach for most of the numerical problems. In addition, F# provides a very succinct syntax, which makes the gap between mathematical equations and computer programs smaller.
Before discussing the syntax of F# in more detail, the following section gives a brief and high-level overview of the common development process used by F# programmers.
Understanding the F# Development Process
When developing some code, it is difficult to predict how the final version will look. This is not surprising, but only a few languages help with the transition from simple ideas to robust programs. In many ways, the F# language provides the solution to this problem in both the language and the associated tools.
It is easy to start writing F# code using the basic functional concepts, such as lists, and test and run the code immediately using F# Interactive. By the time the initial version of a program can be executed, most of the algorithms have already been tested. As the code evolves, it can easily be modified to be more robust and integrated with the .NET platform. The following section discusses the interactive development in F#.
Using the F# Interactive Environment
Some languages provide an interactive environment using a read-eval-print loop (REPL) console. Developers can write snippets of code and the console immediately evaluates them and prints the result. F# provides a command-line REPL tool as well, but the interactive environment is also integrated inside Visual Studio. This means that all of the code can be written in the Visual Studio editor (using the full code completion and syntax highlighting support) and then executed immediately in an integrated console. The screenshot in Figure 1 shows an example of this usage.
Figure 1. Using F# Interactive inside Visual Studio
The screenshot in Figure 1 demonstrates a simple scenario that shows how to develop a function that calculates a specified number of days in a year as a percentage. The application can be created using the following process.
Developing and Testing F# Code Interactively
The first step is to create sample inputs by setting total to 365 and days to 90. To do that, write the first two lines, select them, and send the input to F# Interactive using Alt+Enter. The values will appear in the F# Interactive window.
Try writing the first computation in the interactive window (100 / total * days). The result is 0, which is incorrect, because the calculation uses integer division, and 100/365 gives a zero. The problem can be fixed by converting numbers to floating points using the float function or by reordering the operations.
Next, write the computation correctly by first multiplying the number of days by 100. Once the calculation is corrected, wrap it into a reusable function calculatePercent.
To make the function available interactively, select it and hit Alt+Enter to compile and declare it in F# Interactive. Then call the function using the sample input to verify that it works as expected (calculatePercent 90).
These steps demonstrate how an interactive environment can help when developing code. Although the example is trivial, it was complex enough to demonstrate how the interactive testing can help to discover errors early.
Evolving from Simplicity to Robustness
F# programming always starts with simple code. The code from the previous snippet could hardly be shorter, but it is fully type checked. Calling calculatePercent with a string as an argument would be immediately highlighted as an error because the argument was inferred to be an integer. This simplicity is supported by other functional concepts including type declarations and list comprehensions that will be demonstrated later in this article. This simplicity makes it possible to try several different approaches to solve the problem and choose the best option early in the development process.
Unlike other languages popular for their simplicity, F# lives on the other side as well. It can be used for writing robust code that integrates with other .NET languages and uses object-oriented features. The usual process is to start with a simple codebase, refactor it, and wrap complete and tested algorithms in class types that can be used by other developers from F# as well as C#. This also means that the API design is clarified later in the process, after identifying the needed extensibility points. Postponing this decision makes it easier to develop code without unnecessary complexity.
The following section demonstrates the simplicity of the F# language as well as some of the more advanced feature such as type declarations.
Declaring and Using Types in F#
As already mentioned, F# is a statically typed language, which means that the type of all expressions is known at compile time. Thanks to type inference, the types are rarely written explicitly. This makes the F# syntax quite similar to numerical languages such as Matlab or R. However, the safety and robustness of the F# language is closer to C# or Java. This section demonstrates some of the basic F# language constructs and features that help with writing correct programs. This article presents just a brief overview that is focused on numerical computing. More detailed information can be found in articles referenced at the end of the document.
Working with Basic Types
Numerical code in F# is usually written using primitive data types such as sequences, lists, arrays, tuples, and records. Many of the numerical libraries also provide their own matrix and vector types, but they usually follow the same design as standard types such as lists and arrays. This section demonstrates some of the basic types and expressions.
Tuples can be used to store a fixed number of values of different types:
> let tuple = (true, 42);;
val tuple : bool * int = (true, 42)
> let (b, n) = tuple;;
val n : int = 42
val b : bool = true
> if b then n else -1;;
val it : int = 42
The syntax (true, 42) is used to create a tuple value. F# Interactive infers that the value type is bool * int, which denotes a tuple containing a Boolean and an integer. The same syntax that was used to create a tuple can be used to deconstruct a tuple and extract the individual components.
Lists can be used to store an arbitrary number of values of the same type. They are immutable, which means that lists cannot be changed once they are created. This makes it easier to write correct programs. (For more information, see the references at the end of the article.)
> let data = [ 3.1; 2.7; 2.5; 5.6; 6.4; 4.1 ];;
val data : float list = [3.1; 2.7; 2.5; 5.6; 6.4; 4.1]
> List.sum data / float (List.length data);;
val it : float = 4.066666667
> [ for d in data do
if d > 3.0 then yield d - 3.0 ];;
val it : float list = [0.1; 2.6; 3.4; 1.1]
The first command creates a list containing floating point values. Lists can be processed using functions from the List module such as List.sum or List.length. The second snippet shows how to use these functions (together with the float function, which converts any numeric type to a floating point number) to calculate an average. The last command uses list comprehensions to create a new list. The result subtracts 3 from the numbers that were originally larger than 3.
The next data type is an array. Arrays are similar to lists and can be created using the [| … |] notation. The main differences are that arrays can be modified and are more efficient for larger, randomly accessed data collections:
> let data = Array.init 10 (fun i -> sin (float i / 10.0 * 3.1415));;
val data : float [] =
[| 0.0; 0.3090081825; 0.5877702605; 0.8090006559;
0.951045063; 0.9999999989; 0.9510736937;
0.809055115; 0.5878452173; 0.3090963002|]
> data.[5] <- 1.0;;
val it : unit = ()
> data.[0 .. 5];;
val it : float [] = [| 0.0; 0.3090081825; 0.5877702605;
0.8090006559; 0.951045063; 1.0|]
The snippet uses a function Array.init to create an array. The function takes the desired length of the array and a function that is called to initialize every element of the array. The notation fun i -> … creates an unnamed (lambda) function that can be easily passed as an argument to other function. After initializing the array, the snippet uses the <- operator to modify the sixth value in the array and then uses array slicing to create a new array containing the first six elements of the array.
Writing Type Declarations
The previous section demonstrated the succinctness of F# when working with primitive data types that are already available in the language. Many numerical computations can be written just using these types. A two-dimensional point can be stored as a tuple and a polygon can be represented as a list of tuples.
As the code becomes more robust, it is useful to declare your own data types. This makes it possible to add a meaning to individual elements. A tuple may represent a vector as well as a range. When using a named type, there is no potential for confusion. Moreover, types declared in F# can also be used from C#.
To show how F# makes the writing of types easier, look at the following immutable vector type written in C#:
public class Vector {
readonly double x;
readonly double y;
public Vector(double x, double y) {
this.x = x;
this.y = y;
}
public double X {
get { return this.x; }
}
public double Y {
get { return this.y; }
}
}
The type provides a single constructor for creating Vector values that just copies the arguments to local fields, so that they can be accessed later. Then, it defines two properties for reading the coordinates. The Vector type is immutable, which means that a value of this type cannot be modified once it is created. In functional programming, this is a preferred way of declaring types. Some of the pros and cons of this approach are discussed below. In F#, the same type can be written as follows:
type Vector(x : float, y : float) =
member this.X = x
member this.Y = y
The F# version uses implicit constructor syntax, which means that parameters are written immediately after the type name. This makes the values x and y automatically accessible in the rest of the code. F# encourages programming with immutable types, so the declaration is very easy to write. For example, the member declaration is a succinct way for declaring read-only properties. A mutable vector would be more complicated to write, but it is still slightly simpler than the C# equivalent.
As a functional language, F# prefers the use of immutable types, but it is possible to define a mutable type when needed. Immutable types are particularly useful for high-level data structures that represent program data. As discussed in Overview: Understanding What a Program Does, immutable types make it easier to understand what a program does and also simplify testing. In addition, programs written using immutable types tend to be easier to parallelize.
The disadvantage of using immutable types is that operations for working with them need to create a new instance of the type. However, an operation doesn't need to create a deep copy of the object. The state that doesn't change can be reused and so the allocation isn't as high. A good example is a functional list, which is explained in Overview: Working with Immutable Data. For some purposes (such as representing a matrix), mutable types are more appropriate. When the mutation is used in a localized way, it doesn't break any of the nice properties of functional code. More details can be found in Chapter 10 of Real World Functional Programming.
Writing Correct Code with Types
The well-known benefits of using compile-time typing are that it prevents many common mistakes and the compiled code is more efficient. For example, languages that don't have a strict Boolean type, like C or C++, make it possible to write:
if (num = 10) { y = 41; }
The code can be compiled correctly, but it is most likely not what the programmer wanted to write. Instead of comparing num with 10, it assigns 10 to the variable! In a type-safe language like F# or C#, this confusion cannot happen because Boolean is a separate type.
In functional languages, types specify a lot of information, because they are very expressive. Thanks to the type inference, types are nonintrusive and do not have to be written explicitly. Additionally, types are not only useful for writing correct code but serve as valuable information:
For the developer, as part of the documentation.
For the IDE, which can use types to provide useful hints when writing the code.
The next section shows one example of a feature that nicely demonstrates the purpose of types in F#. The goal of types is to make sure that the code is correct as early as possible and to provide useful hints when writing it.
Working with Units of Measure
In 1999, NASA’s Climate Orbiter was lost because a part of the development team expected that distances were measured in meters and weight in kilograms, while the other part provided data in inches and pounds. This incident was one of the motivations for an F# feature called units of measure. Units of measure nicely demonstrate the overall goal of F# types: help the programmers to discover errors early and to write correct code.
The following listing demonstrates working with units in F# Interactive. It implements a calculation that tests whether a car violates the specified maximum speed limit:
> [<Measure>] type km
[<Measure>] type mile
[<Measure>] type h;;
(...)
> let maxSpeed = 50.0<km/h>
let actualSpeed = 40.0<mile/h>
val maxSpeed : float<km/h>
val actualSpeed : float<mile/h>
The snippet starts with declaring three unit types (km, mile and h). The next command creates two values (maxSpeed and actualSpeed) annotated with units. The first value is in kilometers per hour and the second is in miles per hour. As the output from F# Interactive shows, this information is captured in the type. The following snippet implements the calculation:
> if (actualSpeed > maxSpeed) then
printfn "Speeding!";;
Error FS0001: Type mismatch XE "type mismatch:units of measure" . Expecting a float<mile/h> but given a float<km/h>. The unit of measure 'mile/h' does not match the unit of measure 'km/h'
> let mphToKmph (speed:float<mile/h>) =
speed * 1.6<km/mile>;;
val mphToKmph : float<mile/h> -> float<km/h>
> if (mphToKmph actualSpeed > maxSpeed) then
printfn "Speeding!";;
Speeding!
The first command tries to compare the actual speed with the speed limit. In a language without units of measure, this comparison would be perfectly valid and the result would be false (because 40 is less than 50). In F#, the compiler reports an error because the type float<km/h> is a different from float<mile/h>!
To solve the problem, the next command implements a function that converts the speed from one type to another. The function takes an argument of type float<mile/h> and returns float<km/h>. It is implemented by multiplying the argument by a conversion factor with a unit km/mile. Using the conversion function, it is possible to perform the comparison. As the last command shows, 40 mph is more than 50 km/h.
If the code is included in a standalone (compiled) application, the error about units is reported during compilation, so an invalid program cannot be executed. Additionally, units are shown in Visual Studio tool tips. This is very helpful when writing programs. For example, if a value that should represent speed has a type float<km^2>, something is probably wrong. Finally, the verification of units is done at compile-time and there is no runtime cost to using units of measure.
Expressing Complex Problems
The previous section demonstrated that F# makes it possible to write succinct code but, at the same time, keep the code fully type-safe and robust. The previous examples were quite simple, so this section looks at some more complex problems.
When solving a complex problem (such as querying data sets), the best way is to create a library that makes it possible to express the problem using reusable primitives (such as filtering or iteration). Writing the solution to a particular problem is then easy—just compose several primitives from the library. This section demonstrates a few options for creating such libraries.
Processing Collections Using Functions
Programming is about abstraction. Why do languages introduce foreach loops? Iterating over all items of a collection is a common pattern that is used very frequently and can be captured by reusable language syntax.
Functional languages make it possible to capture many such patterns by combining functions. For example, foreach loop can be viewed as a function that calls the body (a function) for every element in the collection. In F#, it isn't necessary to modify the language in order to capture a new abstraction because a function such as foreach can be written in a library. The following snippet generates a multiplication table. It uses a function that captures a pattern of generating a two-dimensional array using a specified operation:
> let multiplicationTable =
Array2D.init 5 5 (fun i j -> (i+1) * (j+1))
;;
val multiplicationTable : int [,] =
[ [1; 2; 3; 4; 5]
[2; 4; 6; 8; 10]
[3; 6; 9; 12; 15]
[4; 8; 12; 16; 20]
[5; 10; 15; 20; 25] ]
The above example uses a function named init from the Array2D module. The function behaves like two nested loops and takes another function as an argument. The snippet uses anonymous function written as (fun i j -> (i+1) * (j+1)). The argument specifies that the value at the location i, j should be calculated using the expression (i+1) * (j+1).
In a language like C, this would have to be written explicitly using two nested loops. When used in 10 places, this can save quite a lot of writing. Of course, initializing an array is a simple operation. The following example implements several more complex operations using F# sequences:
> let fibs = Seq.unfold (fun (a,b) ->
Some(a+b, (b,a+b))) (1,1)
;;
val fibs : seq<int>
> fibs
|> Seq.takeWhile (fun x -> x <= 4000000)
|> Seq.filter (fun x -> x%2 = 0)
|> Seq.map (fun x -> 3*x + 1)
|> Seq.sum;;
val it : int = 13841207
The example first initializes an infinite sequence using the Seq.unfold function. The function captures a pattern of generating a sequence of numbers using some state. At every step, the function takes the previous state and calculates one number of the sequence and the new state. In the example, the state is a tuple containing two integers and the initial step is (1, 1). The function used to generate the next step is (fun (a,b) -> Some(a+b, (b, a+b)). When given a tuple with the current state, it sums the two previous numbers as the next element (a + b) and returns a new state (b, a+b) that contains the current and the previous Fibonacci numbers. The value is wrapped using Some to indicate that there is always a next element (returning None would terminate the sequence).
The numbers of the returned sequence fibs are not generated after it is declared. The value fibs is just a specification of how to calculate numbers when they are actually needed. This concept is called lazy computations and makes it possible to separate computational logic (generating Fibonacci numbers) from the details of the usage (how many numbers are needed).
The next part of the snippet builds on this sequence to express a more complex numerical calculation. It uses the pipelining operator (|>) to compose several processing steps. The result of one step is simply passed on to the next step. It uses Seq.takeWhile to specify that the calculation should only consider numbers smaller than 4 million. Then, it only takes even numbers and maps each number x to 3*x+1. Finally, it sums all of the numbers generated using this specification.
Aside from its conciseness, this piece of F# code is also very declarative. It hides the implementation details and it closely corresponds to how a mathematician would describe the problem. Writing the same code using explicit loops in a language like C would make the code longer and also more difficult to understand.
Writing Calculations Using Overloaded Operators
The previous snippet composed several processing steps using the pipelining operator (|>). It may be surprising, but the operator is not built in the F# language. Instead, it is declared in the standard F# library:
let (|>) input f = f input
The declaration specifies that the operator name is |>. When used, binary operators can be written using infix syntax. This means that the left argument is input and the right hand side argument is some function f. The operator simply calls the function with the input as an argument.
The F# language is very flexible and makes it possible to define large number of operators such as --> or <$>. Using these operators often makes numerical or symbolic code clearer, because the program code can closely follow the usual mathematical notation. To give another example, the next snippet extends the Vector type and adds the dot product operator (*) and point-wise multiplication (*.):
type Vector(x: float, y : float) =
member this.x = x
member this.y = y
/// Calculates dot product (scalar) of two vectors
static member ( * ) (v1: Vector, v2 : Vector) =
v1.x * v2.x + v1.y * v2.y
/// Calculates point-wise product of two vectors
static member ( .* ) (v1: Vector, v2 : Vector) =
new Vector(v1.x*v2.x, v1.y*v2.y)
In the previous example, the (|>) operator was defined as a standalone operator using let. The snippet above uses a static member to declare an operator that belongs to a specific type. The (*) operator will continue to be available as the standard multiplication operator for integers and floating point values, but can now be used as the dot product when working with vectors. The following F# Interactive session demonstrates the two operators:
> let a, b = Vector(0.5, 1.0), Vector(0.3, 2.0);;
(...)
> a * b;;
val it : float = 2.15
> a .* b;;
val it : Vector = FSI_0002+Vector { x = 0.15; y = 2.0; }
The first command creates two vector values using the constructor of the Vector type. The next two commands demonstrate how to use the overloaded operator for multiplication (*), which returns a scalar value and the custom operator for point-wise multiplication (.*) that returns a new vector of the Vector type. The F# Interactive prints the internal structure of the vector, but it is possible to specify a nicer formatting (using a global fsi.AddPrinter method).
Creating Languages Using F# Quotations
The expressions written in the previous sections were all executable code that can be evaluated to get some result. Most of the F# code is written in this way. Using higher-order functions, it is possible to write a very expressive library that can capture almost most typical patterns.
However, some problems require looking at an entire expression (or an equation) and analyzing it before it can be successfully evaluated. Some mathematical equations also do not specify how to calculate the result. Instead, they can specify a set of constraints that needs to be solved or a symbolic representation of some object. For this purpose, F# supports a feature called quotations:
let inequalities = <@ fun num ->
(num*num - 5.0*num - 3.0 > 0.0) &&
(0.1*num*num < 100.0) @>
In this example, the F# code is wrapped in <@ … @>. This means that the code will not be compiled as an ordinary F# program that can be executed. Instead, the source code is represented as a data (expression tree) and can be accessed at runtime. For example, the above snippet declares a quoted function that tests whether a value num satisfies certain constraints.
How could one write a library that finds some valid solution for num? A normal function can only be evaluated, so an algorithm could only implement a naive search. Code written as a quotation can be analyzed, so it is possible to write an algorithm that takes the inequalities value and finds a solution symbolically. The next section briefly introduces Microsoft Solver Foundation, which uses this approach in practice.
Solving Constraints Using F# Quotations
The use of F# quotations sketched in the previous section is almost exactly the approach used by an F# domain-specific language (DSL) for Microsoft Solver Foundation (MSF). This section gives a very high-level overview of a practical example that uses MSF to implement Support Vector Machines (SVM). A complete example can be found in a blog post referenced at the end of the article. However, the details are not important for now. The main purpose is to demonstrate the flexibility of the F# language.
Support Vector Machines can be formally written as a quadratic optimization problem (essentially, finding the maximum of a function that may include members x and x2). When given some sample inputs xi and expected results yi, the computer needs to find the values of variable αi to maximize an expression. In the case of SVM, the expression looks like this:
The values of x represent the inputs of the training set, and the ϕ function implements feature projection (returning inputs for the training algorithm). The y values describe the required result, and α are coefficients that the SVM needs to infer. The values of α need to be within a given range that specifies additional constraints for the solver:
Using F# quotations, it is possible to write this specification as ordinary F# code. The F# library for Microsoft Solver Foundation can then analyze the equations and use algorithms to solve quadratic optimization problems. The encoding of the equations is very close to the mathematical notation used above:
let svmSolver =
let index = [ 0..n-1]
Solver <@
qp()
let alpha = vararray1 index
maximise
(sum index (fun i -> alpha.[i]) -
(sum index (fun i -> sum index (fun j -> .
coef.[i,j] * alpha.[i] * alpha.[j]))))
where [
foreach index (fun i -> 0.0 <= alpha.[i])
foreach index (fun i -> alpha.[i] <= C)
sum index (fun i -> alpha.[i] * y.[i]) = 0.0 ]
@>
As mentioned before, the details of the example are not important, but it demonstrates several interesting points. In particular, the code enclosed in F# quotations doesn't have to be an executable F# program. The constructs such as qp, maximise, or sum are functions that do not have implementations. They are used only for writing mathematical formulas that Solver Foundation reads and solves.
Summary
This article introduced the F# language from the numerical computing point of view. It began with a brief discussion about the common F# development style. In F#, code can be written using simple concepts, tested interactively, and then encapsulated in robust .NET objects.
The first couple of examples demonstrated that F# is a very succinct language. Unlike other concise numerical languages, such as Matlab or R, the F# language uses static typing, which means that it can catch many errors early. The article demonstrated this using the F# support for checking units of measure.
Finally, the last part of the article focused on expressivity. Thanks to the support for higher-order functions, it is possible to capture a large number of common computational patterns such as iteration or filtering. Custom operators can be used to make code more readable. Finally, quotations can be used to evaluate code in a different way—for example, finding a solution to a set of constraints instead of just evaluating a function with a sample input.
Additional Resources
This article briefly introduced several features of the F# language. The main purpose was to demonstrate how numerical computations are written in F#. More information about F#, functional programming, and standard types can be found in the following articles:
Numerical computations in F# can be written using a wide range of libraries. The following articles discuss the most prominent F# and .NET libraries for numerical computing:
The last example demonstrated how to write a core part of Support Vector Machines using Microsoft Solver Foundation (MSF). The following articles include complete example and additional demos of using MSF:
Support Vector Machines (SVMs) in F# U (Yin Zhu's blog)
Solvers, Optimization and More on DSLs (Robert Pickering's blog)
To download the code snippets shown in this article, go to https://code.msdn.microsoft.com/Chapter-4-Numerical-3df3edee
See Also
This article is based on Real World Functional Programming: With Examples in F# and C#. Book chapters related to the content of this article are:
Book Chapter 10: “Efficiency of data structures” explains how to write efficient programs in F#. The chapter covers advanced functional techniques and explains how and when to use arrays and functional lists.
Book Chapter 12: “Sequence expressions and alternative workflows” contains detailed information on processing in-memory data (such as lists and seq<'T> values) using higher-order functions and F# sequence expressions.
Book Chapter 13: “Asynchronous and data-driven programming” explains how asynchronous workflows work and uses them to write an interactive script that downloads a large dataset from the Internet.
Previous article: Overview: Introduction to Numerical Computing in F#
Next article: Overview: Numerical Libraries for F# and .NET Framework