Monday 4 June 2012

Why aren't C# methods virtual by default?

Recently, during the GeeCON 2012 conference, I had a very interesting conversation with Martin Skurla on the differences between the .NET runtime and the Java Virtual Machine. One of the more surprising divergences centres on the virtual keyword.

Virtual methods are one of the central mechanisms of polymorphic objects: they allow a descendant class to replace the implementation provided by the base class with its own. In fact, they are so important that in Java all instance methods are virtual by default (unless they are private or marked final), even though this does carry a small runtime overhead. Virtual method dispatch is usually implemented using a virtual method table, so each call to such a method requires an additional memory read to fetch the code address, and the compiler cannot, in general, inline the call because the target is not known statically. On the other hand, a non-virtual method can have its address embedded directly in the calling code - or can even be inlined whole, as is the case with trivial methods such as most C# properties.
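To make the dispatch rules concrete, here is a minimal Java sketch. The classes Shape, Circle and DispatchDemo are hypothetical names invented for this illustration; the sketch shows only the language-level behaviour, not how any particular JVM implements the lookup.

```java
// Hypothetical classes, for illustration only.
class Shape {
    // Instance methods are virtual by default in Java.
    String name() { return "shape"; }

    // 'final' forbids overriding, so the target of a call to kind()
    // is known statically. The inner call to name() is still virtual.
    final String kind() { return "generic " + name(); }
}

class Circle extends Shape {
    @Override
    String name() { return "circle"; }
}

class DispatchDemo {
    static String describe(Shape s) {
        // Resolved at run time against the actual class of 's'.
        return s.name();
    }

    public static void main(String[] args) {
        System.out.println(describe(new Shape()));   // prints "shape"
        System.out.println(describe(new Circle()));  // prints "circle"
        System.out.println(new Circle().kind());     // prints "generic circle"
    }
}
```

The call in describe() is the kind of call site that has to consult the receiver's class, while a call to kind() has exactly one possible target.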

There are several ways of dealing with this overhead. The HotSpot JVM starts program execution in interpreted mode and does not compile the bytecode into machine code until it has gathered some execution statistics - among them, for every method, whether its virtual dispatch has more than a single target. If not, then the method call does not need to go through the vtable. When additional classes are loaded and invalidate that assumption, the JVM performs what is called a de-optimization, falling back to interpreted execution of the affected bytecode until it re-verifies the optimization assumptions. While technically complex, this is a very efficient approach.

.NET takes a different approach, akin to the C++ philosophy: don't pay for what you don't use. Methods are non-virtual by default and the JIT performs the optimization and machine code compilation only once. Because virtual calls are much rarer, the overhead becomes negligible. Non-virtual dispatch is also crucial for the aforementioned special 'property' methods - if they weren't inlineable (and equivalent in performance to straight field access), they wouldn't be as useful. This somewhat simpler approach also has the benefit of allowing for full compilation - the JVM needs to leave some trampoline code between methods that will allow it to de-optimize them selectively, while the .NET runtime, once it has generated the machine code for the invoked method, can replace (patch) the references to it with simple machine instructions.

I am not familiar with any part of the ECMA specification that would prohibit the .NET runtime from performing the de-optimization step and so rule out the HotSpot approach to the issue (apart from the huge Oracle patent portfolio covering the whole area). What I do know is that since the first version of the C# language did not make virtual the default, future versions will not change this behaviour - it would be a huge breaking change for the existing code. I've always assumed that the performance trade-off rationale was the reason for the difference in behaviour - and this was also what I explained to Martin. Mistakenly, as it turns out.

As Anders Hejlsberg, the lead C# architect, explains in one of his interviews from the beginning of the .NET Framework, a virtual method is an important API entry point that requires proper consideration. From a software versioning point of view, it is much safer to assume method hiding as the default behaviour, because it preserves substitutability in the sense of the Liskov principle: if an instance of the subclass is used in place of an instance of the base class, the behaviour of the code will be preserved. The programmer has to consciously design with substitutability in mind and has to choose to allow derived classes to plug into certain behaviours - and that prevents mistakes. C# is on its fifth major release, Java on its seventh, and each of those releases introduces new methods into some basic classes. If your code has a derived class that already uses the name of one of those new methods, they constitute breaking changes (if you are using Java) or merely compilation warnings (on the .NET side). So yes, a good public API should definitely expose as many plug-in points as possible, and most methods in publicly extendable classes should be virtual - but the C# designers did not want to force this additional responsibility upon each and every language user, leaving it up to a deliberate decision.
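The versioning hazard described above can be sketched in Java as follows. LibraryBuffer, AppBuffer and VersioningDemo are hypothetical names made up for this example; the scenario assumes that flush() was added to the library class in a later release, after AppBuffer had already defined an unrelated method with the same signature.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "library" class, as it looks after an upgrade.
class LibraryBuffer {
    final List<String> log = new ArrayList<>();

    // Assumed to be new in "version 2"; close() now relies on it.
    void flush() { log.add("library flush"); }

    void close() {
        flush();            // virtual call: may land in a subclass
        log.add("closed");
    }
}

// Assumed to be written against "version 1", which had no flush().
class AppBuffer extends LibraryBuffer {
    // Meant as an unrelated helper; it now silently overrides the new method.
    void flush() { log.add("app flush (discard data)"); }
}

class VersioningDemo {
    static List<String> closeAppBuffer() {
        LibraryBuffer b = new AppBuffer();
        b.close();
        return b.log;
    }

    public static void main(String[] args) {
        // prints [app flush (discard data), closed]
        System.out.println(closeAppBuffer());
    }
}
```

The library's own flush() never runs, and nothing warns the author of AppBuffer. In C#, the equivalent derived method declared without the override keyword would hide the base method instead: the base class's close() would keep calling its own flush(), and the compiler would emit a warning suggesting the new keyword.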