Monday 6 February 2012

Build systems for the .NET Framework

When Microsoft released the first stable version of the .NET Framework on 13th of February 2002, the ecosystem lacked an officially supported build platform. However, since early betas had been available shortly after the July 2000 Professional Developers Conference, a native solution – NAnt – emerged in August 2001, about six months before the framework itself became officially available. But it was not until 7th of November 2005 that Microsoft presented its own tool: MSBuild. For two years the competing systems coexisted in the .NET world, as MSBuild was a new, and relatively unpolished, product. When a second version of MSBuild (labeled 3.5, to match the .NET Framework version it accompanied) was released on 19th of November 2007, it brought multiple improvements that developers had asked for. The community’s focus switched from NAnt to the Microsoft solution, and NAnt 0.86-beta1, released on 8th of December 2007, was the last release for almost three years. Although NAnt development started again in April 2010, this long stagnation led many of its previous users to believe the Open Source solution had been abandoned.

MSBuild 4.0 offers multiple improvements over NAnt: it ships with packaged Target files for commonly used project types, in accordance with the “convention over configuration” paradigm; it has an ever-growing collection of community Tasks which perform various commonly executed build operations; it supports parallel builds; it integrates with Team Build (a Continuous Integration component of Microsoft Team Foundation Server) and other CI systems; and most importantly, it is used internally by Visual Studio, which presents most build options through a graphical user interface – developers creating a build project with the help of an IDE may not even be aware that MSBuild is being used underneath.

Nowadays MSBuild is the de facto standard tool for build automation in the .NET ecosystem. However, multiple projects still employ a legacy NAnt build system – the main obstacles to migration being the complexity of the existing build infrastructure and the need to support Mono, which, until 2.4 (released on 8th of December 2009), lacked an MSBuild implementation. Although the Mono version of MSBuild 3.5 is now relatively complete, version 4.0 is still virtually non-existent.

Pre-existing Solutions

Apart from the two already mentioned build platforms, there are several others. The first of them, dating way back into the Unix times, is called Autotools, officially known as the GNU Build System. The core of Autotools – make – dates back to 1976. Although this system is widely used by projects developed in C or C++, such as the Mono runtime engine, it has no built-in support for .NET-specific compilers, requiring a large amount of custom per-project work by developers. It also has a reputation for being convoluted and unfriendly, although extremely powerful.

Developers and users of other build systems, such as CMake, Ant or Maven, have on numerous occasions undertaken efforts to enhance .NET support. The Maven community in particular has spawned numerous .NET-targeted clones – NPanday, Byldan, NMaven – none of which has gained any traction. The only exception seems to be maven-dotnet-plugin, which delegates the build process back to MSBuild.

An interesting new tool that is worth mentioning is FAKE – F# Make. Although still very much an experimental project, started on 30th of March 2009, this tool is under active development by several contributors. It borrows heavily from ideas explored by Rake (written in Ruby, for the needs of Ruby projects), and allows users to describe the build process configuration in the same language they are using to write their code.

This post looks in depth at three existing build platforms employed on the .NET Framework: NAnt – which used to be the de facto standard, Microsoft Build – the officially supported tool, and FAKE – an interesting build tool employing an entirely different build description paradigm.

All three tools present the same basic functionality of a build platform: a project file contains tasks, enclosed in targets, which may have specified dependencies upon other targets. During the build process, those targets are first sorted topologically and then tasks within each target are executed in sequence. However, the structure of a project file differs greatly between tools.

NAnt

When Stefan Bodewig announced the first official Ant release on 19th July 2000, the project had already undergone over a year of public development as part of the Tomcat servlet container, and had been used for a year before that as an internal tool at Sun Microsystems (under the name Another Neat Tool). In August 2001, Gerry Shaw made a decision to base the new .NET build platform on the existing Ant file syntax (the initial code for a .NET Beta 1 Ant clone was written by David Buksbaum of Hazware and released under the name of XBuild). Keeping with the open source tradition of self-recursive names, he aptly named this new tool NAnt, from NAnt is not Ant.

After almost ten years of separate development, NAnt’s Project.build is still difficult to distinguish from Ant’s build.xml file; the only obvious giveaway being the use of C#’s csc compiler task instead of Java’s javac. An absolutely minimal working NAnt build file looks as follows:

<project default="build">
  <target name="build">
    <csc target="exe" output="Hello.exe">
      <sources>
        <include name="*.cs" />
      </sources>
    </csc>
  </target>
</project>

This short example contains a single target (build), which in turn contains a single task, with a simple nested fileset. Executing this file starts the default target, which invokes the csc task to compile the code using the appropriate C# compiler.

NAnt projects consist of several basic entities: tasks, types, properties, functions and loggers. Tasks wrap fundamental operations, such as copying a file, performing source control operations or invoking the compiler. Types represent strongly typed parameters, are aware of their content and validate their correctness on creation. A fileset is perhaps the most often used type – it is a lazily evaluated collection of files (the sources element in the example above is a fileset). Properties can be used for storing text values that are used multiple times. They are evaluated in the place of their declaration. Functions, along with operators, can be used in any attribute value, and are evaluated when the attribute is read (usually upon task execution). Loggers are usually employed for reporting build progress to the user through various front-ends, but can also serve for tracking project execution for other purposes. NAnt ships with a large collection of predefined elements; additional ones can be either loaded from external assemblies or defined in-line using a script task. Scripts can be written in any .NET language that has a System.CodeDom.Compiler.CodeDomProvider available.

A more advanced example, showing properties, functions and global tasks (not enclosed inside a target):

<project>
  <property name="is-mono"
    value="${string::contains(framework::get-target-framework(), 'mono')}" />
  <property name="runtime-engine"
    value="${framework::get-runtime-engine(framework::get-target-framework()) }" />
  <echo message="Checking Mono version" if="${is-mono}"/>
  <exec program="${runtime-engine}" commandline="-V" if="${is-mono}" />
  <echo message="Using non-Mono runtime engine: '${runtime-engine}'"
    unless="${is-mono}" />
</project>

Global tasks are always executed in the order they are declared and are used for setting up the project. Functions and properties are evaluated inside ${} blocks; the two can be told apart because functions use :: to separate the prefix from the function name. Also visible in this example are the if and unless attributes, which are available on every task and are used for conditional task execution.

While NAnt inherited Ant’s mature syntax, along with such brilliant constructs as a distinction between * (match in current directory) and ** (recursive directory match) for file inclusion/exclusion, it also inherited Ant’s deficiencies. The most glaring one is the inherently single-threaded nature of the build process – although the engine itself can be relatively easily extended to invoke targets in parallel, existing build files rely on targets being executed sequentially.
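
The wildcard distinction and the target dependencies described above can be combined in one short sketch. The directory names (bin, Tests) and the prepare target are illustrative assumptions and are not part of the earlier examples:

```xml
<project name="Hello" default="build">
  <!-- Properties are evaluated where they are declared -->
  <property name="out.dir" value="bin" />

  <!-- Runs first, because the build target depends on it -->
  <target name="prepare">
    <mkdir dir="${out.dir}" />
  </target>

  <target name="build" depends="prepare">
    <csc target="exe" output="${out.dir}/Hello.exe">
      <sources>
        <!-- *.cs would match only the base directory;
             **/*.cs also descends into subdirectories -->
        <include name="**/*.cs" />
        <!-- Hypothetical folder holding test sources that should
             stay out of the executable -->
        <exclude name="Tests/**" />
      </sources>
    </csc>
  </target>
</project>
```

Requesting the build target makes NAnt run prepare before it, and the two targets always run one after the other, which is exactly the sequential behaviour that existing build files have come to rely on.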

Microsoft Build

MSBuild 2.0 (releases are numbered after the Microsoft .NET Framework they accompany, thus the first release is labeled 2.0, second – 3.5 and third – 4.0) was released on the 7th of November, 2005, as part of the Microsoft .NET 2.0 release. It came bundled as the default build tool for Visual Studio 2005. MSBuild’s initial design was similar to NAnt’s, but because at that time company policy forbade Microsoft employees from looking at the implementation of open source solutions (in order to prevent intellectual property violation claims), it does differ in many subtle ways.

Visual Studio 2005 used Microsoft Build for compiling C# and Visual Basic projects; all other project types were still handled by the built-in mechanisms inherited from the 2003 release. Before version 2.0 MSBuild was not used internally by Microsoft, but as soon as it had reached the Release To Manufacturing stage, an intense build process conversion effort was launched, and by early November 2005 it was already building about 40% of the Visual Studio project itself. This internal version added support for parallel builds (released to the general audience on 19th November 2007 as 3.5) and compiled all types of projects available in Visual Studio, including Visual C++ (this last feature was released on 12th April 2010 as part of version 4.0). Another important improvement released with Visual Studio 2010 was a graphical debugging tool.

A minimal MSBuild Project.proj looks as follows:

<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003"
  DefaultTargets="Build">
  <Target Name="Build">
    <ItemGroup>
      <Compile Include="*.cs" />
    </ItemGroup>
    <CSC Sources="@(Compile)" OutputAssembly="Hello.exe"/>
  </Target>
</Project>

Although a different naming convention is used (uppercase identifiers instead of lowercase), this file shows great similarity to NAnt’s Project.build. It contains a single target, which in turn contains an item group and a task. The namespace definition is required and uses the same schema regardless of the MSBuild version. Executing this file starts the default target, named Build, which calls the CSC task to compile the code. File collections (and item groups in general) can be declared at target or project level, but (unlike NAnt) cannot be nested inside tasks (some tasks allow for embedding item groups and property groups, but this is rare behavior). Prior to version 4.0, items could not be modified once declared.

Despite being a valid MSBuild file, the above example would not be recognized by Visual Studio (and by most .NET developers). Instead of requiring the user to describe the whole build process verbosely, MSBuild offers .targets files, which allow a “convention over configuration” approach to the build process: the user only specifies those settings and actions that differ from the default ones. MSBuild projects use the .proj extension for generic build scripts, and language-specific extensions are used for files importing specific .targets (for example .csproj for C# projects). Thus, a minimal Project.csproj might be written as:

<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <ItemGroup>
    <Compile Include="*.cs" />
  </ItemGroup>
  <Import Project="$(MSBuildBinPath)\Microsoft.CSharp.targets" />
</Project>

By replacing an explicit invocation of the CSC task with an Import directive, this file inherits the whole build pipeline defined for Visual Studio, including automatic dependency tracking (should one declare Reference items), a graphical user interface for configuring the build, targets for cleaning and rebuilding the assembly, and standardized extension points.
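
As a rough illustration of those extension points, the sketch below overrides the AfterBuild target that the imported C# targets leave empty for this purpose. The property values (Exe, Hello, bin\) and the System reference are assumptions chosen for the example:

```xml
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003"
  DefaultTargets="Build">
  <PropertyGroup>
    <!-- Illustrative values, stated explicitly so the sketch
         does not depend on imported defaults -->
    <OutputType>Exe</OutputType>
    <AssemblyName>Hello</AssemblyName>
    <OutputPath>bin\</OutputPath>
  </PropertyGroup>
  <ItemGroup>
    <Compile Include="*.cs" />
    <!-- Reference items feed the inherited dependency resolution -->
    <Reference Include="System" />
  </ItemGroup>
  <Import Project="$(MSBuildBinPath)\Microsoft.CSharp.targets" />
  <!-- Declared after the Import, so that this definition replaces
       the empty AfterBuild target from the imported files -->
  <Target Name="AfterBuild">
    <Message Text="Finished building $(TargetPath)" />
  </Target>
</Project>
```

The position of the override matters: when a target is defined more than once, MSBuild uses the last definition it reads, so the custom AfterBuild has to follow the Import element.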

Basic entities in an MSBuild project are properties, items and tasks. Properties represent simple values. Items are untyped key-value collections, mostly used to represent files. Both types are evaluated as soon as they are encountered. They must be wrapped in groups, but this only allows them to share a Condition: properties cannot be bundled and items are always grouped by name (in the example above the ItemGroup generates items named Compile, one for each matching file). MSBuild has a mechanism named batching that splits items sharing a name according to a specified metadata value – when this is used, a task defined once will be executed for each batch of items separately. Item definitions allow setting default item metadata values. MSBuild, like NAnt, distinguishes between * (match inside current folder) and ** (recursive directory match). Loggers can be used for tracking project execution, but they must be attached from the command line. There is quite an extensive task collection available out of the box, and many of the tasks are direct replacements for NAnt tasks. Since 4.0 it is also possible to define a task in-line with the help of UsingTask.
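
Batching is easier to follow with a small example. The sketch below is only an illustration: the Source item name, the Kind metadata and the file names are all made up, and the files do not need to exist, since items declared without wildcards are created as plain strings:

```xml
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003"
  DefaultTargets="ShowBatches">
  <ItemGroup>
    <!-- Three items sharing the name Source, each carrying
         a hypothetical Kind metadata value -->
    <Source Include="Main.cs">
      <Kind>App</Kind>
    </Source>
    <Source Include="Util.cs">
      <Kind>App</Kind>
    </Source>
    <Source Include="MainTests.cs">
      <Kind>Test</Kind>
    </Source>
  </ItemGroup>
  <Target Name="ShowBatches">
    <!-- %(Source.Kind) switches the task into batching mode:
         Message runs once per distinct Kind value, and @(Source)
         then expands only to the items of the current batch -->
    <Message Text="%(Source.Kind): @(Source)" />
  </Target>
</Project>
```

Although the Message task is written once, it should run twice here: once for the two items marked App and once for the single item marked Test.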

An example of using functions for evaluating task conditions (this example does not work as of Mono 2.10 because functions are still not implemented):

<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003"
  InitialTargets="Info">
  <PropertyGroup>
    <IsMono>$(MSBuildBinPath.Contains('mono'))</IsMono>
    <RuntimeEngine>$(MSBuildBinPath)/../../../bin/mono</RuntimeEngine>
  </PropertyGroup>
  <Target Name="Info">
    <Message Text="Checking Mono version" Condition="$(IsMono)"/>
    <Exec Command="$(RuntimeEngine) -V" Condition="$(IsMono)"/>
    <Message Text="Using non-Mono runtime engine: '$(MSBuildBinPath)'"
      Condition="!$(IsMono)"/>
  </Target>
</Project>

Property values and functions are evaluated inside $() blocks; basic operators (such as ==) are also recognized outside of those markers. The @() syntax is used for referencing collections of items and %() triggers the batching mode using item metadata. MSBuild 4.0 keeps track of the actual underlying type of each property value and is able to invoke any .NET instance methods defined on such an object – however, because of security concerns, only methods marked as safe (number/date/string/version manipulation and file system read-only access) are available in scripts (this security mechanism can be disabled by setting the environment variable MSBUILDENABLEALLPROPERTYFUNCTIONS to 1). The syntax for method invocation comes from PowerShell – instance methods are called with a simple Value.Method(), while static methods can be invoked with [Full.Type.Name]::Method().
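
Both invocation styles can be seen side by side in the following sketch, which assumes MSBuild 4.0. The property names (Greeting, Upper, Year, OutDir) are invented for the example, and all the methods used belong to the string, date and path manipulation groups that are permitted by default:

```xml
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003"
  ToolsVersion="4.0" DefaultTargets="Show">
  <PropertyGroup>
    <Greeting>hello</Greeting>
    <!-- Instance method called on the string value of a property -->
    <Upper>$(Greeting.ToUpper())</Upper>
    <!-- Static property reached through the [Type]::Member syntax -->
    <Year>$([System.DateTime]::Now.Year)</Year>
    <!-- Static method with arguments -->
    <OutDir>$([System.IO.Path]::Combine('build', 'out'))</OutDir>
  </PropertyGroup>
  <Target Name="Show">
    <Message Text="$(Upper) / $(Year) / $(OutDir)" />
  </Target>
</Project>
```

Each function is evaluated when its property declaration is read, so the Message task only sees the resulting strings.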

The MSBuild syntax draws heavily from NAnt and should feel quite familiar for any developer once one grasps how items differ from NAnt’s strongly typed collections. The tool is under active development, has extensive support from both Microsoft and the community, and – since Mono 2.4 was released on 8th of December 2009 – is usable as a cross-platform build system.

FAKE

Fake was published by Steffen Forkmann on the 1st of April, 2009. His goal was to create a build platform using the same language he wrote his programs in – F# (this trend is also observed in Ruby, Python and other languages which allow executable domain-specific languages to be defined at the language level). Three years later, Fake still remains more of an academic exercise than a widely deployed tool, but it does explore a very interesting approach to build management. Fake executes its scripts through the F# interpreter, extending the syntax of the language with three simple additions: defining build steps (Target? TargetName), declaring coupling between targets (For? TargetName <- Dependency? AnotherTargetName) and specification of default targets (Run? TargetName).

A basic build.fsx might look as shown in the listing below:

#I @"tools\FAKE"
#r "FakeLib.dll"
open Fake

Target? Default <-
  fun _ ->
    let appReferences  = !+ @"**.csproj" |> Scan
    let apps = MSBuildRelease @".\build\" "Build" appReferences
    Log "AppBuild-Output: " apps

Run? Default

The first three lines import the Fake namespace from the FakeLib.dll file in the tools\FAKE directory. Following them is a target definition, with a fileset wildcard match pipelined (using F#’s |> operator) to the Scan function, then an MSBuildRelease task invocation, log output, and finally – the declaration of the default target. It should be noted here that Fake does not have built-in tasks for compiling code – it relies on the presence of MSBuild instead. There is also no need for a special in-line task definition syntax, as arbitrary F# code can be embedded anywhere in the script. This can be seen in the following example:

#I @"tools\FAKE"
#r "FakeLib.dll"
open Fake
open System

let isMono = Type.GetType ("Mono.Runtime") <> null
let stringType = Type.GetType ("System.String")
let corlibLocation = IO.Path.GetDirectoryName (stringType.Assembly.Location)
let notMono = String.Format("Using non-Mono runtime engine: {0}", corlibLocation)

Target? Info <-
  fun _ ->
    if isMono then
      trace "Running on Mono"
    else
      trace notMono

Run? Info

The keyword let declares an F# value, which is equivalent to the property declarations used by NAnt and MSBuild. However, unlike those two tools, Fake allows the developer to invoke any .NET method, without security constraints.

As an experimental project, Fake does have some shortcomings. It does not execute its targets in parallel, although the code inside them can be easily parallelized. It also does not track whether a target’s outputs are up to date, so the target commands are executed during every project rebuild, which makes it unsuitable for large projects. There is no support for using Fake under operating systems other than Windows. And the F# language itself still remains exotic to most .NET developers, making the build scripts hard to understand and maintain.