compareTo(…)
Is Bad
Traditionally, generic operations on ordered objects are provided a comparison function to use. In Java, you define a class’ natural ordering by writing a virtual method that performs the comparison. I think that was a mistake made by people who tried to force OO-ness where it didn’t belong.
The Problem
Let’s take a 2-dimensional
Point
class as an example.
class Point implements Comparable
{
int x, y;
int compareTo(Object otherObject)
{
Point p = (Point) otherObject;
// Do something with: x, y, p.x, p.y
}
}
Type Safety
The comparison function isn’t type safe. The first line could bomb out with a
ClassCastException
at runtime. Though Java’s type system isn’t the greatest, comparing values is fundamental to programming. We shouldn’t have to hack up a workaround for something so simple.
What could help is if the parameter was a
Point
instead of an
Object
. Doing that would not allow it to be a virtual method, though, since it wouldn’t be compatible with
Comparable
’s method signature. You can make the return type more specific or a parameter type more general, but not vice versa.
But wait a minute…why
is it a virtual method?
Overriding
The reason you’d want to make a method virtual is to let you override the behavior in a subtype. However, overriding
compareTo(…)
is fundamentally problematic.
To be useful, a comparison function must behave sanely. One of the requirements for sane behavior is that the order of arguments passed in shouldn’t matter. This means that “
a.compareTo(b)
” should return the opposite of “
b.compareTo(a)
” (symmetry).
If object “
b
” is a subtype of “
a
”, then overriding
compareTo(…)
in a subclass is almost guaranteed to break the symmetry requirement because you’re changing the behavior of one of the comparisons without changing the behavior of the other. Not sane.
Let’s say we defined a new subtype:
class Point3D extends Point
{
int z;
int compareTo(Object otherObject)
{
Point3D p = (Point3D) otherObject;
// Do something with: x, y, z, p.x, p.y, p.z
}
}
Now the comparison is not symmetric and will behave unpredictably:
Point Max(Point p1, Point p2)
{
if (p1.compareTo(p2) > 0)
return p1;
else
return p2;
}
If both the parameters are
Point
objects, everything will be fine. The same goes for when they are both
Point3D
objects.
If you pass in a
Point3D
for “
p1
” and a plain
Point
for “
p2
”, you’ll hit the
ClassCastException
in
Point3D
’s comparison function. In the inverse case,
Point
’s comparison function might think that the two objects are equal without even looking at the
Point3D
’s “
z
” component.
This is the kind of garbage that can be detected by the compiler if the API is designed to properly take advantage of the type system.
But Why Don’t I Run Into This Problem Often?
Because you almost always compare objects that are of the exact same type, which means you don’t need
compareTo(…)
to be virtual method.
The API makes it seem like
compareTo(…)
is this great method that lets you compare any object to any other object, but…
- It’s unsafe.
- There’s no need for it.
Put another way, the
compareTo()
technique destroys type safety to let you do something you don’t want to do anyway. Though there are ways to use
compareTo(…)
that avoid some of the issues (such as explicitly checking for subtype relations and deferring to the appropriate class), I can’t come up with one that does
The Right Thing in every case.
The Solution
A proper solution is to pass around a separate comparison function. This is definitely not a new idea; it’s what everyone was doing before
compareTo(…)
came along and screwed everything up.
Since Java
still doesn’t let you pass around functions, you’d have to use something like
java.util.Comparator
. (For simplicity, I’m going to pretend it works like a function, though.)
void QuickSort<T>(T[] array, Comparator<T> cmp)
{
...
}
class TreeMap<T>
{
Comparator<T> cmp;
...
TreeMap(Comparator<T> cmp)
{
this.cmp = cmp;
...
}
}
Default Comparison Function
The only problem with a separate function is that it’s kinda inconvenient to use. It’s an extra value to pass around when most classes only have a single commonly-used ordering function. With some modest language extensions, there might be a way around this. You could define some standard some location to look for the “default” comparison function.
void QuickSort<T>(T[] array, Comparator<T> cmp = T.cmp)
{
// sort it
}
class Point
{
static final Comparator<Point> cmp = new Comparator<Point>()
{
compare(Point p1, Point p2) { ... }
}
}
The “
cmp
” parameter is given a default value to use if none is passed in. This lets you call
QuickSort
with only an array of
Point
s and the compiler will automatically fill in the default value for “
cmp
”. But you still have the option of passing in your own comparison function.
In C++, the appropriate comparison function is matched at compile time and the compiler will let you know if one is not found. This works because C++ templates are also expanded at compile time.
Java and C# both use explicit constraint specifications on generic type parameters. Since
QuickSort
doesn’t place any constraints on what “
T
” must look like, there’s no way to guarantee that “
T
” will always have “
T.cmp
”. Using C#-style type constraints,
QuickSort
could possibly be written:
void QuickSort<T>(T[] array, Comparator<T> cmp = T.cmp)
where T.cmp : Comparator<T>
{
// sort it
}
But there’s still a problem. This method now requires that “
T.cmp
” exist, even if the caller passes in its own comparison function. If we remove the default parameter feature, we can still use the painful (but simple) technique of creating a pair of overloaded functions:
void QuickSort<T>(T[] array)
where T.cmp : Comparator<T>
{
QuickSort(array, T.cmp);
}
void QuickSort<T>(T[] array, Comparator<T> cmp)
{
// sort it
}
Oh well. Not an ideal situation, but what are you going to do? If this idiom becomes more common, it might be a good idea to have the language automatically handle the details for you. If the default parameter value is part of the function’s interface specification, then I think it still doesn’t violate the idea of specifying generic constraints explicitly.
If you know another way to accommodate the concept of a “default comparision function”, let me know.
What About equals(…)
?
One would think that Java’s technique for checking equality also suffers from similar problems, but it looks like the relative simplicity
equals(…)
lets it squeeze by. For one, the
ClassCastException
isn’t really an issue because the
equals(…)
function can just return false if the types aren’t compatible (an option that
compareTo(…)
doesn’t have).
To avoid the problem of asymmetry (“
a.equals(b)
” vs. “
b.equals(a)
”), we just have to match the type
exactly instead of simply checking for compatibility using “
instanceof
”:
class Point
{
int x, y;
boolean equals(Object o)
{
// Check for an EXACT type match.
if (this.getClass() != o.getClass()) return false;
Point p = (Point) o;
// Do something with: x, y, p.x, p.y
}
}
All
equals(…)
functions must follow the rule of checking for an exact type match. Even though everything works out, it still
feels like a bad idea. Any time you need to use runtime type information for something so simple, you’re probably doing it the wrong way.
Though I think a separate comparison function would be a better idea, the arguments against
compareTo(…)
don’t really apply here; the ulimate behavior of the
equals(…)
function is just fine. So far, all I can come up with is:
- Comparison functions are more generic (easier to substitute custom ones).
- The extra type checking is slow.
Of course, that’s not really a solid argument against it, so if you can come up with a better one, let me know.
Keywords
This is just to get hits from web searches:
- compareTo considered harmful
- compareTo is evil
- compareTo sucks
Incidentally, when I was first looking for information on this topic, I searched for “compareTo considered harmful”. I woulda used that title for this page if it didn’t feel so blasphemous.