Freeing your Azure data with F# Type Providers
Editor’s note: The following post was written by Visual Studio and Development Technologies MVP Isaac Abraham as part of our Technical Tuesday series with support from his technical editor, Visual Studio and Development Technologies MVP Steffen Forkmann.
F# is a mature, open source, cross-platform, functional-first programming language. It empowers users and organizations to tackle complex computing problems with simple, maintainable and robust code. In this post, I want to discuss how we can use F# to reduce the friction, and lower the barrier to entry, of working with cloud storage in .NET, compared with the conventional mechanisms that you might be used to.
F#, Type Providers and the Cloud
One of the features that I love showing people new to F# is Type Providers, not only because they are fantastically powerful, but also because they’re just plain awesome to demo! An F# type provider is a component that provides types, properties, and methods for use in your program without you needing to manually author and maintain these types. As we deal with more and more disparate (and distant) data sources, it’s crucial that we make accessing such systems as painless as possible. Azure Storage is one such system. It’s cheap, readily available and quickly scalable. Blobs and Tables are two elements of Azure Storage that we’ll review in this article.
Working with Blobs
When working with blobs in .NET, we normally use the .NET Azure SDK, which allows us to interrogate our Storage assets relatively easily. Here’s a C# snippet that shows how we might interrogate a container and a blob that has a well-known path: -
Of course, we’re having to use magic strings here. There’s no compile-time safety to ensure that the container or blob actually exists. Indeed, we can’t actually validate this until we run our application and reach this section of code, unless we resort to unit tests or perhaps copy and paste our code into LINQPad or similar.
The F# Azure Storage Type Provider solves all these problems in one go, by generating a strongly-typed object model at edit and compile time that matches the contents of your blob storage. Here’s how we would achieve the same result as above in F#: -
In two lines we can achieve the same thing in a completely strongly-typed manner. You won’t need to write a console test runner either – you can simply open an F# script file and start exploring your storage assets. We can’t mistype the name of a container because we get full IntelliSense as we “dot into” each level of blobs: -
And because this is statically typed, and checked at compile time, if the blob is removed from your container, your code will not even compile. Of course, if you do need weak typing, for example to work with dynamically generated blobs, you can easily fall back to the standard SDK directly from within the Type Provider (as seen from the AsCloudBlobContainer() method above).
Working with large data sets
In the example above, we’re downloading the entire contents of the blob to our application. When working with large files in blob storage, this might be a problem, so the type provider allows us to treat text files as streams of lines: -
Here we’re lazily streaming a potentially large file, and reading just up until we find the first 10 lines that contain the word “alice” – we don’t have to download the entire file, and we are using standard sequence query functionality such as filter and take (you can think of Seq as equivalent to LINQ’s IEnumerable extension methods).
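The lazy behaviour described above can be tried without a storage account. The following is a minimal sketch, not the type provider itself: the `lines` sequence is a hypothetical stand-in for the lines of a large blob, and the `linesRead` counter exists only to show how much of the source is actually consumed.

```fsharp
// Hypothetical stand-in for a large text blob: a lazy sequence of lines.
// The counter records how many lines are actually pulled from the source.
let linesRead = ref 0

let lines =
    seq {
        for i in 1 .. 1000000 do
            linesRead.Value <- linesRead.Value + 1
            // Every third line mentions "alice"; the rest mention "bob".
            yield (if i % 3 = 0 then sprintf "%d,alice" i else sprintf "%d,bob" i)
    }

// Nothing is read until the pipeline is enumerated by Seq.toList.
let firstTenAlice =
    lines
    |> Seq.filter (fun line -> line.Contains("alice"))
    |> Seq.take 10
    |> Seq.toList

printfn "Matches found: %d" (List.length firstTenAlice)
printfn "Lines read from the source: %d" linesRead.Value
```

Only a small prefix of the million lines is consumed before the pipeline stops. Note that `Seq.take` raises an exception if the source runs out before ten matches are found; `Seq.truncate` is the more forgiving choice when fewer matches are possible.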
Working with Tables
Tables are another Storage component that is simple and relatively easy to reason about. Cheap and lightweight, they’re a good way to start storing and querying tabular data in Azure. The trade-off is that they offer relatively few computational features, e.g. Tables do not support relationships or aggregations. Here’s how we might query a table structure that looks like this: -
The need for stronger typing
If we wish to query this using the standard SDK, we’ll need to manually create a POCO that implements ITableEntity, or inherits from TableEntity, and has properties that match the table schema (again, we’ll only know if this is correct at runtime). Then, we need to create a query. The Azure SDK is somewhat inconsistent here in that you can create queries in several ways, and none of them is particularly satisfactory.
Firstly, we can use the weakly-typed TableQuery builder class to manually build the Azure Table query string, which offers us little or no compile-time safety. Alternatively, we can use the TableQuery<T> query builder. Unfortunately, this API is somewhat awkward to use in that you can create it in two different ways and, depending on how you construct it, certain methods on the class must not be called. Failing to adhere to this will lead to runtime exceptions: -
There’s also an IQueryable implementation for tables. Its weakness is that Azure Tables support only a very limited set of query features, so it’s easy to write a query that compiles but throws an exception at runtime: -
Smarter and quicker Tables with F#
Again, it’s the F# Type Provider to the rescue. Firstly, we don’t need to worry about the hassle of navigating to a specific table, nor about manually building a POCO to handle the incoming data. The type provider creates all of this for us, based on the schema that is inferred from the EDM metadata on the table, so we can immediately get up and running with any table we already have: -
This outputs the following to the F# REPL within Visual Studio: -
We also have access to a strongly typed Azure Table query DSL that is statically generated based on the schema of the table. This is guaranteed to only generate queries that are supported by the Azure Table runtime, yet also gives us an IQueryable-like flexibility: -
Notice that query methods for each field are typed for that field: Cost takes in floats, Team takes in strings, and so on, so there’s no chance of supplying data of the incorrect type at runtime.
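To make the idea of per-field typing concrete, here is a small hand-written sketch. It is not the type provider's API: the `Filter` type, the `Cost` and `Team` modules and the `both` helper are all hypothetical names, written out by hand to mimic the kind of typed query methods that the provider generates from the table schema. The filter text it builds is only an OData-style illustration.

```fsharp
open System.Globalization

/// A single OData-style filter clause (illustrative only).
type Filter = Filter of string

/// Hypothetical stand-in for the generated query methods on a float column.
module Cost =
    let greaterThan (value: float) =
        Filter (sprintf "Cost gt %s" (value.ToString("R", CultureInfo.InvariantCulture)))

/// Hypothetical stand-in for the generated query methods on a string column.
module Team =
    let equalTo (value: string) =
        // Single quotes inside the value are doubled so the clause stays well formed.
        Filter (sprintf "Team eq '%s'" (value.Replace("'", "''")))

/// Combines two clauses so that both must hold.
let both (Filter a) (Filter b) = Filter (sprintf "(%s) and (%s)" a b)

let query = both (Cost.greaterThan 10.5) (Team.equalTo "Blue")

// Cost.greaterThan "ten" would be rejected by the compiler,
// because the Cost column only accepts floats.

let (Filter text) = query
printfn "%s" text
```

The point of the design is that the only way to build a clause is through a function whose parameter type matches the column, so a mismatched value is a compile error rather than a runtime failure.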
Conclusion
The Storage Type Provider lets us point to any Azure Storage account that we might already have and start working with our data in less than a minute. It changes the way we interact with our Azure Storage assets.
Download the Azure Storage Type Provider via NuGet, create an F# script file, provide your connection string, and then just use Visual Studio (or Visual Studio Code) to immediately start navigating through your storage assets.
There’s no need to leave your IDE for an external tool: you can continue with your standard workflow, using an F# script to explore your data. When you’re happy with your code, you can easily move it into a full F# assembly, which can be called from C# as part of your existing solution.
More than just using a REPL and a script though, the combination of F# and the Storage Type Provider gives us an unparalleled experience through a stronger type system that lets us be more productive and confident when working with Azure cloud assets.
About the author
Isaac is an F# MVP and has been a .NET developer since .NET 1.0, with an interest in cloud computing and distributed data problems. He now lives in both the UK and Germany, and is the director of Compositional IT. He specializes in consultancy, training and development, helping customers adopt high-quality, functional-first solutions on the .NET platform. You can follow his blog here.