Tuesday, 6 March 2012

Converting NAnt build files to MSBuild projects

TL;DR: I have a NAnt-to-MSBuild converter available at https://github.com/skolima/generate-msbuild.

Initially, I envisioned to implement as faithful translation of the build script as possible. However, after examining the idioms of both NAnt and MSBuild scripts I decided that a conversion producing results in accordance with those established patterns is a better choice. Investigating the build process of available projects revealed that converting the invocation of the csc task is enough to produce a functional Visual Studio solution. Translating tasks such as mkdir, copy, move or delete, while trivial to perform, would be actually detrimental to the final result. Those tasks are mostly employed in NAnt to prepare the build environment and to implement the “clean” target – the exact same effect is achieved in MSBuild by simply importing the Microsoft.CSharp.targets file. In a .csproj project conforming to the conventional file structure, such as is generated by the conversion tool, targets such as “PrepareForBuild” or “Clean” are automatically provided by the toolkit.

I planned to use the build listener infrastructure to capture the build process as it happends. The listener API of NAnt is not comprehensively documented, but exploring the source code of the project provides examples of its usage. Registering an IBuildListener reveals some clumsiness that suggest this process has not seen much usage:

protected override void ExecuteTask()
  Project.BuildStarted += BuildStarted;
  Project.BuildFinished += BuildFinished;
  Project.TargetStarted += TargetStarted;
  Project.TargetFinished += TargetFinished;
  Project.TaskStarted += TaskStarted;
  Project.TaskFinished += TaskFinished;
  Project.MessageLogged += MessageLogged;

  // this ensures we are propagated to child projects

Last line of this code sample is crucial, as it is a common practice to split the script into multiple files, with a master file performing initial setup and separate per-directory build files, one for each output assembly. This allows shared tasks and properties to be defined once in the master file and inherited by the child scripts. Surprisingly, build listeners registered for events are not passed to the included scripts by default.

Practically every operation in the NAnt build process is broadcasted to the project’s listeners, with *Started events providing opportunity to modify the subject before it is executed and *Finished events exposing final properties state, along with information on step execution status (success or failure). Upon receiving each message the logger is able to access and modify the current state of the whole project.

Typical MSBuild use case scenarios

I have inspected several available open source projects to establish common MSBuild usage scenarios. I determined that although the build script format allows for deep customization, most users do not take advantage of this, instead relying on Visual Studio to generate the file automatically. One notable exception from this usage pattern is NuGet, which employs MSBuild full capabilities for a custom deployment scenario. However, in order to comply with the limitations that the Visual Studio UI imposes on the script authors, the non-standard code is moved to a separate file and invoked through the BeforeBuild and AfterBuild targets.

Thus, in practice, users employ the convenience of .targets files “convention over configuration” approach (as mentioned in the previous post) and restrict the changes to those that can be performed through the graphical user interface: setting compiler configuration property values; choosing references, source files and resources to be compiled; or extending pre- and post-build targets. When performing incremental conversion, those settings are preserved, so the user does not need to edit the build script manually.

The only exception to this approach is handling of the list of source files included in the build: it is always replaced with the files used in the recorded NAnt build. I opted for this behavior because it is coherent with what developers do in order to conditionally exclude and include code in the build – instead of decorating Item nodes with Condition attributes, they wrap code inside the source files with #if SYMBOL_DEFINED/#else/#endif preprocessor directives. This technique is employed, for example, in the NAnt build system itself and has been verified to work correctly after conversion. It has the additional benefit of being easily malleable within the Visual Studio – conditional attributes, on the other hand, are not exposed in the UI.

NAnt converter task

Because I meant the conversion tool to be as easy to use for the developer as possible, I have implemented it as a NAnt task. It might be even more convenient if the conversion was available as a command line switch to NAnt, but this would require the user to compile a custom version of NAnt instead of using it as a simple, stand-alone drop-in. To use the current version, you just have to add <generate-msbuild/> as the first item in the build file and execute a clean build.

As I shown in my previous post, Microsoft Build project structure is sufficiently similar to NAnt’s syntax that almost verbatim element-to-element translation is possible. However, as the two projects mature and introduce more advanced features (such as functions, in-line scripts and custom tasks), the conversion process becomes more complex. Instead of shallow translation of unevaluated build variables, the converter I designed captures the flow of the build process and maps all known NAnt tasks to appropriate MSBuild items and properties. The task registers itself as a build listener and handles TaskFinished and BuildFinished events.

Upon each successful execution of a csc task, its properties and sub-items are saved as appropriate MSBuild constructs. When the main project file execution finishes (because a NAnt script may include sub-project files, as is the case with the script NAnt uses to build itself), a solution file is generated which references all the created Microsoft Build project files.

As I mentioned earlier, I initially anticipated that translators would be necessary for numerous existing NAnt tasks. However, after performing successful conversion of NAnt and CruiseControl.NET, I found out that only a csc to .csproj translation is necessary. The converter captures the output file name of the csc invocation and saves a project file with the same name, replacing the extension (.dll/.exe) with .csproj. If the file already exists then its properties are updated, to the extent possible. In the resulting MSBuild file all variables are expanded and all default values are explicitly declared.

All properties that are in use by the build scripts on which the converter was tested have been verified to be translated properly. Several known items (assembly and project references, source files and embedded resources) are always replaced, but other items are preserved. Properties are set without any Condition attribute, thus if if the user sets them from the Visual Studio UI, then those more specific values will override the ones copied from the NAnt script.

I have initially developerd and tested the MSBuild script generator on the Microsoft.NET Framework, but I always plannedfor it to be usable on Mono as well. I quickly found out that Mono had no implementation of the Microsoft.Build assembly. This is a relatively new assembly, introduced in Microsoft .NET Framework version 4.0. As this new API simplified development of the converter greatly, I decided that instead of re-writing the tool using classes already existing in Mono, I would implement the missing classes myself.

Mono Project improvements

I created a complete implementation of Microsoft.Build.Construction namespace, along with necessary classes and methods from Microsoft.Build.Evaluation and Microsoft.Build.Exceptions namespaces. The Construction namespace deals with parsing the raw build file XML data, creating new nodes and saving them to a file. It contains a single class for every valid project file construct, along with several abstract base classes, which encapsulate functionality common to their descendants, e.g. ProjectElement is able to load and save a simple node, storing information in XML attributes, while ProjectElementContainer extends it and can also store child sub-nodes.

While examining the behavior of the Microsoft implementation of those classes strongly suggest that they store the XML in memory, as they are able to save the loaded file without any formatting modifications, the documentation does not require this behavior. As this would bring no additional advantages, and is detrimental to the memory usage, my implementation only stores the parsed representation of the build script. Two exceptions from this are the ProjectExtensionsElement and ProjectCommentElement, as they represent nodes that have no syntactic meaning from the MSBuild point of view and it is not possible to parse them in any way – thus the raw XML is kept and saved as-is.

A project file is parsed using an event-driven parsing model, also known as SAX. This is preferable because of performance reasons – the parser does not backtrack, and there is no need to ever store the whole file in memory. As subsequent nodes are encountered, the parent node checks whether its content constitutes a valid child, and creates an appropriate object.

As is suggested for Mono contributions, the code was created using a test-driven development approach, with NUnit test cases written first, followed by class stubs to allow the code to compile, and finally the actual API was implemented. As the tests’ correctness was first verified by executing them on Microsoft .NET implementation, this method ensures that the code conforms to the expected behavior even in places where the MSDN documentation is vague or incomplete.

Evaluation in practice

After completing the implementation work, I tested the tool using two large open source projects that employ NAnt in their build process: Boo and IKVM.NET.

Boo project consists mostly of code written in Boo itself and ships with a custom compiler, NAnt task and Boo.Microsoft.Build.targets file for MSBuild, so a full conversion would require referencing those additional assemblies and would not provide much value. However, the compiler itself and bootstrapping libraries are written in C#, thus providing a suitable test subject.

Executing the conversion tool required forcing the build using the 4.0 .NET Framework (instead of 3.5) and disabling the Boo script that the project uses internally to populate MSBuild files. Initial conversion attempt revealed a bug in my implementation, as Boo employs a different layout of NAnt project files than the previously tested projects. Once I fixed the converter to take this into account and generate paths rooted against the .csproj file location instead of the NAnt .build file, the tool executed successfully and produced a fully working Visual Studio 2010 project that can be used for building the C# parts of the Boo project.

Testing using IKVM.NET followed a similar path, as most of the project consists of Java code, which can not be compiled using MSBuild and does not lend itself to conversion. After I successfully managed to perform the daunting task of getting IKVM.NET to compile, the <generate-msbuild/> task was executed and produced a correct Visual Studio solution, with no further fixes or manual tweaks necessary. The update functionality also worked as expected, setting build properties copied from NAnt where they were missing from the MSBuild projects.