What is type inference?
Type inference is the act of determining the type of an expression that isn’t explicitly typed. What does that mean? It means that I can write:
var result = 1 + 2
The compiler should know that even though I used the generic type
var
, it can be
inferred that the variable
result
is an integer.
Most mainstream languages will not do that for you. But there’s no reason they shouldn’t.
The advantages of type inference
A lot of the time, type specifiers are useless. You don’t need the type annotation because it’s already obvious from the context. The compiler doesn’t need the annotation because it can figure it out. So it’s a redundant piece of information. And it isn’t harmless either; the explicit type specifiers get in the way.
It’s nice to have
longVariableAndTypeIdentifiers
. While you can go overboard with long identifiers, I find myself often using less-descriptive-than-ideal names to avoid having to type so much. Even if I didn’t mind typing them in, long names can still make the code less readable. Take this example from Java:
Set<Map.Entry<ComponentName,ComplexComponentImplementation>> entries = components.entrySet();
I might decide to use the identifiers
CName
and
CCImpl
instead of
ComponentName
and
CCImpl
to keep it readable:
Set<Map.Entry<CName,CCImpl>> entries = components.entrySet();
With the shortened names, this particular line of code is more readable, but now you’re going to have to use
CName
and
CCImpl
everywhere, making the entire program harder to understand.
What it comes down to is that, in this situation, the variable name (
entries
) is all the information you need. The surrounding context usually provides enough information to the reader. With type inference, I could just write:
var entries = components.entrySet();
The longer the type specifier, the less willing you’d be to type it in redundantly. So, naturally, you’ll either make your identifiers shorter than they should be or you’ll try to avoid using intermediate variables. Both are bad.
Why it’s stupid that many languages don’t support it already
Every statically-typed language I can think of (this includes Java, C# and C++) already does some type inference. It’s just that they make it inconvenient to use. If Java compilers didn’t have any sort of ability to infer types, you couldn’t do this:
String filename = pictureAlbum.getStuff().get("Big").getInfo().getFileName();
Instead, you’d have to add an annotation to every single term:
Map<String,List<Picture>> album = pictureAlbum.getStuff();
List<Picture> pictures = album.get("Big");
PictureInformation info = pictures.getInfo();
String filename = info.getFileName();
Also, if the compiler really needed you to specify all the types, then why does it complain when you do this:
int album = pictureAlbum.getStuff();
It complains because it knows the exact type of the expression
pictureAlbum.getStuff()
but still wants you to type it in. Why won’t it just accept the “
var
” specifier and stop complaining?
var album = pictureAlbum.getStuff();
var pictures = map.get("Big");
var info = pictures.getInfo();
var filename = info.getFileName();
It’s really too bad that local variable inference was left out of so many recently-designed languages.
Type inference can get complicated, but it doesn’t have to
As far as I can tell, there seem to be two basic types of inference: inference based on variable assignment and inference based on variable use. (I’m probably not using the right terms).
Inference based on variable assignment
In all the examples above, I’ve only used inference based on variable assignment. This is the easy one. This is the one all Java/C#/C++ compilers can do already. This is where you determine the type of the variable based on how it’s value is assigned. Since compilers already know the type of almost every expression, they can just use that type for the target variable.
This can also be used for function return types (and I believe C# allows this on closures):
GetFirst(List<Integer> list)
{
return list.get(0);
}
The return type is automatically inferred to be of type
Integer
.
Tricky error messages
Some strange things can happen with even the most basic type inference:
1 var map = new Map();
2 map.put("Hello", "Guy");
3 var value = map.get("Hello");
4 int i = value.length();
Can you see the error?
Error on line 4: Could not find instance method 'length()' in class 'Object'.
The reason for the error is that
map.get()
returns an
Object
. This caused the compiler to make
value
a variable of type
Object
. You can’t call the
length()
method on
Object
. If you had to explicitly type the
value
variable, the compiler would have flagged the error at that point. With type inference, the error message has commuted a couple lines down.
First of all, I’m aware that real programs could trigger more complex error messages. On the other hand, it wont take long before we programmers learn to recognize that type of mistake and are able to deal with it just like any other seemingly-strange-at-first error message (like GCC’s “implicit declaration of function blah()”).
Multiple assignment and subtyping
It can get even tricker, though, when you allow for multiple assignments and subtyping. Let’s say you had the following class heirarchy:
Thing
+-Human
| +-Doctor
| +-Clown
+-Lawyer
The type inference policy may try to find a common supertype to handle multiple assignments:
var x;
x = new Lawyer();
x = new Clown();
var y;
y = new Doctor();
y = x;
Here, the type of
x
will be inferred to be
Thing
, because that’s the greatest common denominator. This adds complexity because, unlike before, we couldn’t determine the type of a variable based on a variable’s very first assignment statement. Remove the second assignment to
x
and its type changes to
Lawyer
.
The type of
y
is also interesting. After the first assignment, the “running” type is
Doctor
. After the second assignment, what happens? Since the type of
x
was determined to be
Thing
,
y
’s type might also be pulled down to
Thing
. However, the compiler might do some control flow analysis and see that when the assignment occurs,
x
cannot be a
Lawyer
(since the original assignment has been replaced by
Clown
). It might then decide to make it a
Human
.
This could get complicated. However, notice that the smarter compiler would give us a more specific type (i.e.
Human
as opposed to
Thing
). I’m guessing that even though this gets complicated, a good compiler will usually allow us to do more of what we
mean to do without complaining about type errors.
Inference based on variable use
This is the hard stuff. This is also the stuff that lets you write entire Haskell programs with barely any type annotations at all. (However, I think that most Haskell programmers suggest that you use type annotations for publicly exported functions so that type errors are localized and easier to debug).
String GetName(key)
{
return map.get(key);
}
What’s the type of
key
? Well it all depends on the type of
map.get
. If
map
were of type
Map<Integer,String>
, then
key
would be inferred to be of type
Integer
(because that’s the only value that fits).
I’ve done a (very) little amount of programming in ML and Haskell and have come across weird type error messages. This kind of inference can get out of control. It’s usually accompanied by parametric polymorphism (where the genericity is also automatically inferred) so maybe some of the complexity comes from the combination and not from the inference itself. Either way, I can get by fine without it.
Just give me the simple stuff
All I want is the simple form of type inference (in a mainstream language). It’s not hard to implement and it’s not hard to use. I don’t even need the compiler to handle multiple assignments. I can’t guarantee that I wont start complaining again, but this should shut me up for a while.
The
Nemerle Programming Language and the
Nice Programming Language both have this level of type inference.