Friday, April 23, 2010

The Brilliance of Maybe and the Value of Static Type Checking

I am sometimes surprised by the strokes of genius hidden inside Haskell's type system. For example: the Maybe type.

Maybe a is a parameterized type that can hold either the value Nothing, or a value Just a . Seems simple enough - it's an optional value.

But it goes deeper than that. Look at an object of type String in Java. This object's type is actually larger than String - it can either represent a string of characters or the special value null. Languages like Java, Ruby, and C# all make the mistake of allowing any type at all to hold null. This greatly diminishes the value of static typing, since any call of the form string.length() will sail happily past the compiler and then throw the dread NullPointerException at runtime if passed a null. The compiler will never see it coming, and your program could work for some time before you discover the problem.

But in Haskell, we must be explicit. By default, an object of type Integer can only hold an integral value - there is no null, and no possibility of a runtime null pointer exception. The Haskell compiler will not even produce a program if you try to pass Nothing into a non-Maybe type. A datatype cannot be instantiated with null values unless you explicitly allow it to hold them.

This eliminates an enormous class of errors. Anyone who has maintained a large Java application instinctively shudders at the prospect of an NPE - the value could have become null at any point in the program and resolving it requires extensive tracing.

Further, in order to avoid the possibility of an NPE, you must litter your code with distracting and unnecessary conditionals like:

if( foo == null ) throw new IllegalArgumentException( ..
if( foo.bar() == null ) throw new IllegalArgumentException( ... );


before doing any real work. But what's worse is that you're turning a programming error into a runtime exception. The fact that your program compiles does not guarantee that you will have no NPE's and running it does not guarantee that you will hit every branch that might generate one.

In Haskell, it is simply impossible to write a program that is susceptible to NullPointerExceptions at runtime unless you explicitly make the type in question optional by wrapping with Maybe.

10 comments:

  1. There's a second part to this story, just as important as the existence of the data type: the functor, applicative and monad type-classes for Maybe, which constitute the abstractions that let us do this layer of indirection without causing undue burden to the programmer.

    ReplyDelete
  2. Silly question of haskell newbie: an object of type Integer in haskell can be undefined. It will pass compilation and throw exception in runtime if used as argument for some strict function. Is there any difference between undefined in haskell and null in java?

    ReplyDelete
  3. I believe that you overstate the case. The existence of Maybe in Haskell *is* an advantage, but not as big an advantage as you state.

    When I am programming in Java, I often create variables of type String. Generally it is something like this:

    String s = name.substring(PREFIX_LENGTH);
    String t = httpServletRequest.getParameter(name);

    Just from reading these lines of code, I can be certain that s is never null (because substring() never returns a null) and that t MAY be null (because getParameter() can return null depending on input, and input is never reliable).

    As I continue to work through my program, I keep in mind which String variables have a chance of being null and which ones are certain not to. I will code checks for null when using the former and not when using the latter. In fact, I write essentially the SAME code I would write working in Haskell.

    The difference is that string-that-might-be-null and string-that-is-not-null are two different types to the compiler in Haskell, whereas in Java the programmer has to keep track of which is which. There is no savings in code: every place in Java where a null check is required will, in Haskell, require a dereference of Maybe. What you get in Haskell is this: the compiler keeps track of where it is required, and compilers make fewer mistakes than humans. Also, in Java I often call a function which has poor documentation and doesn't make it clear whether or not it can return null; in Haskell it is impossible to fail to document that fact.

    So: useful in that it provides documentation and checking, but not actually better in terms of the code required.

    ReplyDelete
  4. mcherm,

    But there's a difference - you can statically assert that substring (or your own method) never returns a null in Haskell but not in Java, so clients of your Haskellfunction never have to worry about NPE's.

    More importantly, although it may not be much work to check for nulls in Java, there's frequently not much you can do once you've detected one - except throw a RuntimeException. But the presence of a null in an unexpected place is a programming error that should be caught at compile time. There's nothing you can reasonably do when something you need to be non-null turns out to be null. So you have to enforce it at compile-time so that you're not writing programs where such a thing can arise.

    And there is a savings in code. Only certain Haskell functions are defined to return "Maybe" so you only need to do checks in those places. Most functions return a simple, non-Maybe value. Once you've extracted the value from a Just, you can freely pass it around without having your other functions have to do defensive null-checking, as you would in Java.

    Also, your example is substring, but consider something like "getName()" in a large JavaBean. If the user never calls setName, but you're later expecting to see a name, you'll get an NPE. In other words, you can only rely on getName() if a certain series of steps have been performed before you call it (i.e. the user called setName with a non-null value). You can get around this by making the class immutable, but then most Java libraries (Spring, Hibernate, Struts) will have problems working with it, and there isn't much support in the language for working with immutable objects.

    ReplyDelete
  5. Dmitry,

    True. undefined and error both evaluate to "bottom," which is the closest thing in Haskell to an NPE.

    But I would argue that the situation is still better. In Haskell, identifiers and record slots are not defaulted to "undefined." If you call undefined or error, you're saying very explicitly that you want the program to end. Whereas null can mean a legitimately unprovided value or a simple programming error.

    ReplyDelete
  6. mcherm,

    To add to MI's point, for a simple statement like
    String s = name.substring(PREFIX_LENGTH);

    It may be obvious to you that substring() does indeed have such behaviour, because you *know* the behaviour of substring. However consider that you work on somebody else's code, and you see this:

    String s = InterestingStringFactory.createInterestingString(...);

    ... as a programmer you are forced to know the exact behaviour of InterestingStringFactory.createInterestingString() to make the same assertion - the code or documentation of which may not be available to you.

    Worse still, if you're editing somebody else's class:

    class ComplicatedWebServiceConsumer {
    // constructor...

    // lots of methods...
    String url;
    }

    ... and you want to implement, say, a function:

    void fetchSomeData(){
    connect(url);
    // do something...
    }

    You can't be *sure* that url is non-null here, yet the compiler accepts this as a 'correct' function.
    Of course, to ensure you have a non-null url when you call connect() you will actually have to analyse the entire ComplicatedWebServiceConsumer class:
    - look at all constructors that possibly initialise this member
    - any assignments onto null that can occur in ANY method of this class, and think about when they are called
    - ensure that your call to fetchSomeData() must occur only at times when the state of this member is non-null
    OR
    you must explicitly check for null values before calling connect().

    In a strongly typed language like Haskell, when a programmer sees 'url' in code, they need not worry about it being non-null at all because the compiler guarantees it. They would instead spend more time focusing on other things, rather than pondering whether at any point a NPE can possibly occur.

    ReplyDelete
  7. Yep. I've had this same thought. Some of the best parts of Haskell are the simple ones, like Maybe and Functor. And it's not just that other languages conflate references with nullability: it's really hard to define a type which doesn't, and then a maybe-null type on top of it which enforces safety like Maybe does. (Take a look at Boost.Variant, for example.)

    Though interestingly, C++'s references (as opposed to pointers) do somewhat fit the bill: they can only be assigned to once at initialization, and though it's -possible- to construct a null reference (and memory the reference points to can be deleted), you have to go out of your way to do so[1]. Normally, any references you encounter can be assumed to be 'good', and it really does count as an exceptional circumstance when one isn't.

    [1] int& nullref = *((int*)0);

    ReplyDelete
  8. mcherm > In fact as ezyang asserted, even if you are able to keep all this information in your mind (which appears to be exactly what the static typing should avoid) there is a saving in code in Haskell due to its superior infrastructure to deal with Maybe values (Functor, Monad...).

    ReplyDelete
  9. Even assuming mcherm can reasonably and accurately keep all this information in his head, the real issue is that it may and often does change later. Maybe currently your foo.Bar() method doesn't return null ever, but you may change that later and not recall everywhere it is used, or someone else may change it. Anyway, common object-oriented design is to use things like factories and interfaces where you can't know what code will actually be run, and thus you're back to defensively checking.

    Also, as a side note, not every type can contain nulls in C#, and you can create new such types using the struct declaration. Unfortunately, this is also bound to different semantics in other ways. You could, however, make a NonNull generic struct that almost behaved like the parameter type except that it wouldn't have nulls.

    ReplyDelete
  10. As Dmitry already pointed out, Haskell has "undefined", "error", incomplete patterns, which are the same headache.

    1. May the compiler break when a program has the "headache"? I am aware only of GHC having "-fwarn-incomplete-patterns". I'd like to hear more, about Java as well.
    2. Apparently you should use some class in Java analogous to Maybe. If this class is not as handful as Maybe, this is because of Java missing pattern matching. This deficiency is more broad than just absence of Maybe.
    3. Implementations of Java and Haskell may differ in what amount of data they throw when the "headache" occurs. Though I doubt that any data is useful in such a situation.

    The following is a digression. The explanation why "error" in Haskell is used reveals another difficulty. "error" sometimes used in place of the "unreachable code" (see Wikipedia), which is perfectly legitimate. If a compiler breaks on "error", we have 2 ways to fix our program:
    a) Return some a nonsense value. Selecting that nonsense value is a burden not related to the task. Also it hides the error in the program.
    b) Write a formal proof that the unreachable code will never execute. But Haskell is not capable of checking proofs.

    ReplyDelete