Evaluation of the Microsoft CLR
No, I’m not an expert. Also, I didn’t come up with this stuff; I’m usually just taking sides on some ongoing debate about the “right way” to do things.
What is the CLR?
From what I can decipher, the CLR is a VM-type execution platform. It is the only interesting part of the big ball of marketing slime called “.Net”. I don’t have an exact definition of it and when I use the term, I might be including the standard runtime libraries (I probably won’t, though, because I’m not very familiar with them).
I could be including C# as well. While you can’t technically call C# the canonical CLR language (some CLR features are only available in Visual Basic), I think C# is the language that most closely matches up with the CLR. The C++-specific features of the CLR are legacy support (surprisingly comprehensive legacy support!).
To sum things up, I don’t know what I mean when I say “CLR” and so you’re going to have to figure it out from the context.
"What’s your problem?"
First of all, lemme state that I do like the CLR. There are a lot of things the CLR does better than the Java platform. On the other hand, big deal.
When the Microsoft guys designed their system, they had a fully implemented and deployed example to learn from. Given that, the CLR is depressingly similar to Java. All they had to do was evaluate the complaints against Java and deal with them one by one, but they didn’t. The CLR has some brand-new bad ideas, too.
Interestingly, many of the features that deal with practical deployment details (such as versioning, assemblies and app domains) are pretty good.
Good: Competition
Sun avoided adding new language-level features to Java for a long time. The lack of autoboxing, generics, and enumerations resulted in many, many hours of lost productivity. Luckily, Microsoft lit a fire under their asses.
Though some of those features were already in the works before C#, the new competition has really forced Sun to get its act together. I don’t know if Java would have all the language-level features it has now if C# hadn’t come along.
Good: Virtual Method Annotations
All methods are non-virtual by default (final
in Java-speak). If you want to create a new virtual method, you have to say so explicitly. If you want to override a virtual method, you have to say so explicitly.
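A minimal sketch of what the annotations look like in C# (class and method names made up):

class Shape
{
    // "virtual" must be stated explicitly to allow overriding.
    public virtual void Draw() { }

    // No modifier: this method cannot be overridden.
    public void Move() { }
}

class Circle : Shape
{
    // "override" must also be stated explicitly.
    public override void Draw() { }
}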
Why creating virtual methods should be explicit
It takes extra care to design a method so that it can be safely overridden. The explicit annotation is a good way to force the programmer to think about it (this argument was taken directly from
the man himself).
Why overriding should be explicit
This one is the real big win. In Java, when you write a new method that matches the signature of one of your parent class’ methods, the new method automatically overrides the old one. This can easily happen accidentally. What’s worse is that this can also happen when your parent class gets “upgraded” and now has methods that weren’t previously there. After you recompile the base class, weird things will start happening at runtime.
In C#, if a parent class is upgraded like that, you’ll get a warning when you try to recompile your derived class, which lets you know that something has changed. (To fix it, you can add the
new
modifier.)
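A sketch of the upgrade scenario (names made up); assume Frob didn’t exist in Base when Derived was first written:

class Base
{
    // Added in a later version of the base class.
    public void Frob() { }
}

class Derived : Base
{
    // Without "new", the compiler warns that this hides Base.Frob.
    // "new" declares that the hiding is intentional.
    public new void Frob() { }
}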
Good: Value Types
The addition of value types allowed the semantics of numeric and boolean primitives to be described within the CLR itself. This is good.
They’re good for interoperability with C code. In Java, laying out message structures is painful (to write) and inefficient (to run). It’s good that C# has the ability to manipulate complex structures in place.
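For example, here’s a sketch of a value type with an explicit, C-compatible layout (the struct and its fields are made up for illustration):

using System.Runtime.InteropServices;

// Fields are laid out in declaration order, matching the C struct.
[StructLayout(LayoutKind.Sequential)]
struct MessageHeader
{
    public int Type;
    public int Length;
}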
Value types also unified the type system. Just joking. It didn’t unify jack.
Bad: Value Types
When I first read about value types, I was impressed. Had I written this article back then, I would have hailed value types as the CLR’s most goodest piece of goodness. Finally, a unified type system. Except not really. They behave differently from other types. But in C#, they still
look exactly the same when you use them.
Color originalColor = new Color();
originalColor.Red = 111;
Color x = originalColor;  // copy if Color is a struct, alias if it's a class
x.Red = 222;
Console.WriteLine("originalColor.Red = " + originalColor.Red);  // 111 if struct, 222 if class
Console.WriteLine("x.Red = " + x.Red);                          // 222 either way
The programmer has to know whether
Color
is a value type or a class type to know how the above code will behave.
I hear you saying “No big deal. A programmer is expected to know
something about the classes he uses. What’s wrong with making the programmer know whether it’s a value type or a reference type?”. That’s a hard question and I don’t know if I can answer it to your satisfaction, but I’m going to have to try:
The value/class status of an object is another piece of information about
every object. The worst part is that the value/class decision is essentially an optimization hint. The effects of optimization hints shouldn’t leak into the semantics.
Why doesn’t Java have this problem? After all, aren’t
int
s the same as value types? Yes they are, but they’re also immutable. You can share the object with anyone you want and can be assured that the sharees won’t mess around with the object behind your back. This is also true in C# for the immutable value types, but now you can have mutable value types (like
Color
), which don’t play well with others.
One of the main reasons for this feature was performance. Unfortunately, much of the performance gain could have been realized with a smarter compiler (
without mucking with the semantics).
Good: Runtime Generic Instantiation
The CLR’s design for generics was OK. Nothing spectacular. They scored a point by allowing constructor constraints on type parameters, but other than that it was pretty straightforward. They did miss out on adding variance syntax to C#, but they can bolt that on later.
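For reference, a constructor constraint looks like this (the Factory class is a made-up example):

// "where T : new()" requires T to have a public parameterless
// constructor, which makes "new T()" legal inside the class.
class Factory<T> where T : new()
{
    public T Create()
    {
        return new T();
    }
}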
The CLR’s generics
implementation is great. Normal reference types are all handled the same way by the same class. But when you use a type like
int
or
bool
the runtime system will compile new instances of the generic methods to handle the primitive types efficiently. That was a good call.
The alternative is to box each primitive inside an object (in Java, this is done by converting
int
s to
java.lang.Integer
objects). It’s a lot slower. But I suppose that you can forgive Java for that. The truly sucky aspect of the Java implementation is that they don’t save any runtime information about type parameters (“type erasure”). Though
type erasure itself is not necessarily a bad thing, it wasn’t a good fit for Java.
In a well-designed programming language, you wouldn’t ever need that information. But since both C# and Java programmers have developed an unhealthy dependence on unsafe type casts, the lack of runtime type information causes things to
not work as expected.
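For example, in C# the type argument survives to runtime, so a sketch like this works (Java’s erased generics have no typeof(T) to consult):

using System;

class TypeInfoDemo
{
    static void PrintType<T>()
    {
        // typeof(T) reports the actual instantiated type at runtime.
        Console.WriteLine(typeof(T));
    }

    static void Main()
    {
        PrintType<int>();     // prints System.Int32
        PrintType<string>();  // prints System.String
    }
}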
Bad: Covariant Arrays
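The CLR copied Java’s covariant arrays: any string[] can be passed off as an object[], so the compiler can’t catch bad stores, and every array write pays for a runtime check. A quick sketch of the problem:

object[] objs = new string[1];  // legal: arrays are covariant
objs[0] = 42;                   // compiles fine, then throws
                                // ArrayTypeMismatchException at runtime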
Good: Assemblies
Finally, a real way of packaging your code. Java’s JARs are pathetic. Though versioning/linking is still something you have to think hard about, assemblies and “strong names” are a step forward.
On the more technical side, assemblies give the JIT compiler some flexibility. By making certain classes only visible within an assembly, the compiler can do some inter-class optimization without worrying about dynamically-loaded code coming in and screwing everything up. Though I’ve heard that JIT compilers are getting better at dealing with this problem, assembly-level encapsulation makes things easier.
Assemblies are a good example of a feature that makes things easier for the programmer
and for the compiler.
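For example (a made-up sketch), a class marked internal is invisible outside its own assembly:

// The JIT can optimize uses of this class freely, because no
// dynamically-loaded code from another assembly can ever see it.
internal class ConnectionPool
{
    internal void Recycle() { }
}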
Good: Application Domains
On my old machine, the startup time for Java applications was quite high. So I implemented a wrapper Java application that always keeps the Java compiler loaded in memory and invokes it upon request. It would be nice if you could do this for any application in general and I tried to do just that. Unfortunately, it’s not possible. Wait. I take that back. I think that some people have done this by messing around with the bytecode of external programs before loading them (replacing references to
java.io.File
and
java.lang.System
on the fly). So while they may have beaten the Java environment into submission, it still isn’t an ideal solution (see
Echidna (apparently abandoned) and
JNode).
With the CLR, it’s a lot easier. You can launch multiple CLR applications in the same process, each in its own “application domain”. This cuts down on resource usage, startup time and IPC overhead.
Application domains also let you define lightweight isolation boundaries between different programs. If you just let pointers run all over the place, you can’t cleanly shut down or reload one component without affecting all the others. I think the development of the ASP server forced Microsoft to deal with these issues properly (since the server has to continuously load and shutdown user programs).
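A minimal sketch of the idea (the assembly path is made up):

using System;

class Host
{
    static void Main()
    {
        // Run another program inside this process, in its own domain.
        AppDomain domain = AppDomain.CreateDomain("plugin");
        domain.ExecuteAssembly("Plugin.exe");

        // Tear that component down without touching the rest of the process.
        AppDomain.Unload(domain);
    }
}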
(There’s an
active JSR to add a similar feature to Java).
Bad: out parameters
I’m sure a lot of people think they love this feature. Every C/C++/Java programmer has run into the problem of wanting to return more than one value from a function. But “
out
” parameters are the bad way of doing it. What they should have done is allowed for
tuples (and multiple return values would have naturally followed). There are some legitimate uses for “
ref
” pointers (though I’m sure people end up using them where tuples would be more appropriate), but “
out
” parameters are almost always the wrong solution.
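For comparison, here’s the “out” style next to the tuple style (the tuple syntax in the second half is hypothetical, not actual C#):

// Today: the caller pre-declares a slot for the out parameter to fill.
int value;
bool ok = int.TryParse("123", out value);

// With tuples, both results would come back together (hypothetical syntax):
// (bool ok, int value) = Parse("123");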
Currently, you can’t pass properties as “
ref
” or “
out
” parameters because they’re essentially C-style pointers. Implementation details are leaking out again. Tuple return values would use copying semantics instead of direct pointers and so you can, once again, treat properties like regular fields.
Good: "unsafe" code
The JVM and CLR provide two things:
- Type safety
- Platform independence
With the JVM, those two are tied together. The whole system provides both type safety and platform independence. So if you want to do type-unsafe things in Java, you need to write native code and compile it ahead-of-time for each target platform.
With the CLR’s unsafe subset of instructions, you lose type-safety (just like you do with JNI), but you still have platform independence. C#’s support for unsafe code makes everything more convenient too (have you used JNI?).
Of course, you can’t do
everything in CLR bytecode that you can do in native code but it covers a good percentage.
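A small sketch of what the unsafe subset looks like (this has to be compiled with the /unsafe flag):

class Buffers
{
    // Raw pointer manipulation: type safety is gone, but this still
    // compiles to platform-independent CIL, not native code.
    unsafe static void Zero(byte* buf, int len)
    {
        for (int i = 0; i < len; i++)
            buf[i] = 0;
    }
}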
Bad: Redundancy
These are mostly lessons that most of us (obviously not all of us) have learned from Java.
What do you have to do when you rename a Java class? Rename the file, rename the “
class Blah
” declaration and then rename all the constructors. Those are all redundant pieces of information. You also have to rename all the references to the class, but this is unavoidable with a plain-text storage format.
In C#, I think you can name a file whatever you want, but that doesn’t really help in the common case. You’re going to end up making sure the class name matches the file name anyway. They also tacked the name of the class onto a class’ static initializer, so there’s an additional rename (the
D Programming Language people did the smart thing and
called their constructors "this").
Also, Java package names were always absolute paths, making reorganization a pain. The same holds for C#. They could have fixed this, but they didn’t.
You might say that editors with language-sensitive renaming features can eliminate these problems. Yes they can (though I won’t be able to use them until they add decent support for Vim-style editing; the pile of crap that comprises the CodeWright Vi bindings doesn’t count). I don’t like that kind of redundancy in the source file. It’s probably just a personal preference.
Good: Nullable value types
A recent addition was a simple syntax to make value types “nullable”. This means that you can declare a variable to have type
int?
and store either an integer value or
null
in it. Now value types and class types have become a little more uniform (not really, see below).
I think they forgive you if you perform arithmetic on nullable values. So if you do:
int? x = 5;
int? y = null;
int? z = x + y;
Instead of bombing out with a
NullReferenceException
, I think they’ll just set
z
to
null
. This “silent failure” behavior is dangerous, but that’s not even the worst part.
Reference types
do throw a
NullReferenceException
when you try to do things to them. So, again, the split type system is causing problems.
Bad: Nullable by default
What’s wrong with this picture:
- int means “integer”
- int? means “integer or null”
- String means “string or null”
When you have a type
String
, you’re really using the type
String?
because it could either be a pointer to a string object or it could be
null
. The fact that
null
is considered to be a valid pointer value is a hack from C that should have been fixed by now.
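A minimal illustration of the problem:

string s = null;     // legal: "string" really means "string or null"
int len = s.Length;  // compiles, then throws NullReferenceException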
All reference types should be non-nullable by default. Unfortunately, there’s no way to express a non-nullable reference type. The CLR is too widely deployed to make non-nullable the default, but they might be able to salvage things by allowing explicit annotations to indicate that a reference is non-nullable. This, sadly, makes the common case more tedious. I think it’ll also suffer from the same problem C++’s
const
does in that it’ll be painful to add in the non-nullable annotations if you forget them in the beginning (and since the default case is less restrictive and involves more typing, it’ll be very easy to forget them in the beginning).
It might seem unfair to blame them for not fixing this – the only reason I’m even aware of the problem is that I randomly stumbled upon the
Nice programming language and read about the elegant way it’s handled in that language. But I think we can hold a “legendary language designer” to a higher standard (I think that’s what the Microsoft PR machine has been calling
him). This is the single biggest screw up in the CLR.
Good: Function Pointers (and Anonymous Functions)
Actually, they’re called delegates, but they’re essentially the same thing. Great feature. But this was a pretty obvious one. It would have been stupid not to have added this feature. Then again, Java still doesn’t have it…
Java’s anonymous classes can fake it a little bit, but it’s way too inconvenient. (Heck, I think the word “
delegate
” is too long a prefix for anonymous functions; something like “
#
” would have been better).
[C# 3.0 Update: The “delegate” keyword is no longer necessary!] Also, “multicast delegates” are a huge convenience. Not having to implement these yourself for all of your events is nice.
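A small sketch of both features together (C# 2.0 syntax, names made up):

using System;

delegate void Notify(string msg);

class Events
{
    static void Main()
    {
        // An anonymous method assigned to a delegate...
        Notify n = delegate(string msg) { Console.WriteLine("first: " + msg); };

        // ...and a second target added with +=, making it multicast.
        n += delegate(string msg) { Console.WriteLine("second: " + msg); };

        n("hello");  // invokes both targets, in order
    }
}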
But, unfortunately, the implementation isn’t clean.
Bad: Function Pointer Hacks
While multicast delegates are useful things, they’re shoved onto the same type space as regular delegates. So you can’t statically tell whether a delegate points to a single target or if it points to multiple targets (and you often need to know this, because return values behave differently). There’s nothing in C# to protect you from this. There are separate
Delegate
and
MulticastDelegate
classes in the library, but
Delegate
is deprecated.
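A sketch of the return-value problem (names made up):

delegate int Getter();

class ReturnValues
{
    static void Main()
    {
        Getter g = delegate { return 1; };
        g += delegate { return 2; };

        // Invoking a multicast delegate returns only the last target's
        // value; the first one is silently discarded.
        int r = g();  // r == 2
    }
}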
The actual implementation of delegates in the CLR is a behind-the-scenes hack. They just let the VM mess around with the parameters and hack up a new class on the fly to make up for the lack of a powerful-enough type system. With generics and
tuples (and a little syntactic sugar), delegates could have been implemented cleanly. The C# language does a pretty good job of shielding you from the mess, though, so a C# programmer doesn’t have to be aware of the back-end nonsense. On the other hand, there are a lot of other delegate-like things that you can’t do without tuples.
I realize that this is kind of unfair. The CLR didn’t have generics at first, and so they didn’t really have the necessary mechanisms to implement things cleanly. First of all, maybe they
should have planned on adding generics (after all, the Java people had generics in the works for a long time and so it was inevitable that C# would need them). But, ignoring the past, they really, really should do this for future releases. I have a pretty strong feeling that they won’t, but maybe that’s just because we’ve all become used to Sun adamantly refusing to change Java, no matter how awesome the feature request.
BTW, the multicast delegate API should let you pass in a reduction function to handle multiple return values. Or, at least, return an array of all the return values so you can run the reduction yourself. Actually, because of the silly way delegates are implemented, you can’t add this to the static delegate API because the actual invocation routine is generated at runtime. Fixing this requires a VM change.
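To be fair, you can run the reduction yourself by walking the invocation list (GetInvocationList is a real API), though it’s clunky. A sketch (names made up):

delegate int Getter();

class Reduce
{
    static int Sum(Getter g)
    {
        int sum = 0;
        // Invoke each target individually and collect every return
        // value, instead of losing all but the last one.
        foreach (Getter target in g.GetInvocationList())
            sum += target();
        return sum;
    }
}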
Bad: The CLR is not language agnostic
This is not really a complaint against the CLR, as it has better inter-language support than Java does. This is a complaint against the people who are convinced that compiling to CLR bytecode takes you to interoperability heaven.
The CLR is highly geared towards a Java-like language. There are additions to support C++, but that’s about it (and in the grand scheme of things, C++ is not very different from C#). “But what about Visual Basic?”
VB.Net is useless
From what I’ve read, it seems like the new Visual Basic is very different from the old one. Old VB programmers are
complaining that too much has changed. To me, it looks exactly the same because I see similar syntax. The reason is that the type system (which is probably the most important aspect of a programming language) has changed. VB.Net is just a C# core dressed up in different syntax. Most of the features unique to VB.Net are decidedly stupid and left over from attempts at pushing a language past its limits.
They shouldn’t have even bothered with VB.Net. They should have created a variant of C# that looked just like C# except had VB-style “duck-typed” variables and a VB-style development environment (and, of course, called it something stupid like IntelliC#). There are a couple of reasons (that I can think of) they didn’t do this:
- Creating VB.Net was a way to convince the casual observer that the CLR was language neutral.
- VB programmers would have complained loudly that they were being abandoned (though they still are being abandoned with VB.Net).
- Some genius decided that the VB syntax was valuable on its own and decided to keep it around.
Your guess is as good as mine. Unlike C++, VB doesn’t really add any value.
Narrow-minded type system
If they wanted to be language neutral, they should have come up with a solid type system. Instead, they did just enough to accommodate both C++ and Java (and many of the differences are superficial and due to historic reasons). Too many real features are missing. The biggest being
option types (yes, I mentioned this already, but it’s
really important).
The lack of a CLR-level “
const
” restriction on parameters and return values is also kind of sad because it seems like an obvious feature. I have a feeling that this can be bolted on to the CLR later but making the libraries take advantage of this will be seriously disruptive (like trying to make an existing C++ program
const
-correct). They could phase it in little by little by letting “
const
” mistakes pass as warnings, but then the optimizer can’t take full advantage of this extra information. It would still help with program correctness, though.
Your core libraries can’t be language-agnostic
Well, maybe someone, someday, will come up with a way to write language-independent libraries (and that would be an impressive feat), but that definitely hasn’t happened yet. Sure, you can probably get away with using the same libraries for Java and C#. But try and translate that a little further to C++ and see what happens. Imagine trying to implement (or even use) the STL from within C#. It doesn’t make sense. And C++ isn’t even the biggest challenge; there is no way you’re going to get Haskell programmers to ditch the Prelude. Of course, Microsoft knows this and will create separate core libraries for each language.
The silver lining
While I think that the CLR isn’t even close to being a unified language-agnostic runtime, it does appear to be a decent candidate for replacing the C calling convention as the standard foreign function/object interface (becoming, as originally intended, a COM replacement). The key here is JIT compilation to avoid the virtual method issues statically compiled C++ libraries have.
Yes, Java did this before the CLR, but the CLR’s comprehensive support for unmanaged code means that you can totally ignore the not-so-language-neutral CLR type system most of the time, playing by the rules only when you want to use C# libraries. So while Haskell programmers will not be able to ditch the Haskell Prelude, they’ll be able to take advantage of C# libraries when there’s no Haskell equivalent, even though it might be a little inconvenient.
"You made a mistake"
I realize that I’m probably wrong about some of the things written here (hopefully not too many). I’d really appreciate it if you’d
point out my mistakes.