Tuesday, August 26, 2014

NSNotificationCenter, Swift and blocks

The conventional way to register observers with NSNotificationCenter is to use the target-action pattern. While this gets the job done, it's inherently not type-safe.

For example, the following Swift snippet will compile perfectly:

    NSNotificationCenter.defaultCenter().addObserver(self, selector: Selector("itemAdded:"),
      name: MyNotificationItemAdded, object: nil)

even though at runtime it will fail unless self has a method named itemAdded that takes exactly one parameter (leaving off that last colon in the selector will turn this line into a no-op). Plus, this method gives you no way to take advantage of Swift's closures, which would allow the observer to access local variables in the method that adds the observer and would eliminate the need to create a dedicated method to handle the event.

A better way to do this is to use blocks. And NSNotificationCenter does include a block-based API:

    NSNotificationCenter.defaultCenter().addObserverForName(MyNotificationItemAdded, object: nil, queue: nil) { note in
      // ...
    }

This is much nicer, especially with Swift's trailing closure syntax. There are no method names to be looked up at runtime, we can refer to local variables in the method that registered the observer and we can perform small bits of logic in reaction to events without having to create and name dedicated methods.

The catch comes in resource management. It's very important that an object remove its event observers when it's deallocated, or else NSNotificationCenter will try to invoke methods on invalid pointers.

The traditional target-action method has the one advantage that we can easily handle this requirement with a single call in deinit:

  deinit {
    NSNotificationCenter.defaultCenter().removeObserver(self)
  }

With the block API, however, since there is no explicit target object, each call to addObserverForName returns "an opaque object to act as observer." So your observer class would need to track all of these objects and then remove them all from the notification center in deinit, which is a pain.

In fact, the hassle of having to do bookkeeping on the observer objects almost cancels out the convenience of using the block API. Frustrated by this situation, I sat down and created a simple helper class, NotificationManager:

class NotificationManager {
  private var observerTokens: [AnyObject] = []

  deinit {
    deregisterAll()
  }

  func deregisterAll() {
    for token in observerTokens {
      NSNotificationCenter.defaultCenter().removeObserver(token)
    }

    observerTokens = []
  }

  func registerObserver(name: String!, block: (NSNotification! -> ()?)) {
    let newToken = NSNotificationCenter.defaultCenter().addObserverForName(name, object: nil, queue: nil) {note in
      block(note)
      ()
    }

    observerTokens.append(newToken)
  }
  
  func registerObserver(name: String!, forObject object: AnyObject!, block: (NSNotification! -> ()?)) {
    let newToken = NSNotificationCenter.defaultCenter().addObserverForName(name, object: object, queue: nil) {note in
      block(note)
      ()
    }
    
    observerTokens.append(newToken)
  }
}

This simple class provides a Swift-friendly API around NSNotificationCenter. It offers an additional convenience method without an object parameter (rarely used, in my experience) to make it easier to use trailing-closure syntax. But most importantly, it keeps track of the observer tokens generated when observers are registered, and removes them from the notification center when it's deinit'd.

A client of this class can simply keep a member variable of type NotificationManager and use it to register its observers. When the parent class is deallocated, the deinit method will automatically be called on its NotificationManager member variable, and its observers will be properly disposed of:

class MyController: UIViewController {
  private let notificationManager = NotificationManager()
  
  override init() {
    super.init()
    
    notificationManager.registerObserver(MyNotificationItemAdded) { note in
      println("item added!")
    }
  }
  
  required init(coder: NSCoder) {
    fatalError("decoding not implemented")
  }
}

When the MyController instance is deallocated, its NotificationManager member variable will be automatically deallocated, triggering the call to deregisterAll that will remove the dead objects from NSNotificationCenter.

In my apps, I add a notificationManager instance to my common UIViewController base class so I don't have to explicitly declare the member variable in all of my controller subclasses.
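That base class amounts to just a few lines (a sketch; the class name here is hypothetical):

```swift
class BaseViewController: UIViewController {
  // Subclasses register observers through this manager; when the
  // controller is deallocated, the manager's deinit deregisters them.
  let notificationManager = NotificationManager()
}
```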

Another benefit of using my own wrapper around NSNotificationCenter is that I can add useful functionality, like group observers: an observer that's triggered when any one of a group of notifications are posted:

struct NotificationGroup {
  let entries: [String]
  
  init(_ newEntries: String...) {
    entries = newEntries
  }

}

extension NotificationManager {
  func registerGroupObserver(group: NotificationGroup, block: (NSNotification! -> ()?)) {
    for name in group.entries {
      registerObserver(name, block: block)
    }
  }
}

This can be a great way to easily set up an event handler to run when, for example, an item is changed in any way at all:

   let MyNotificationItemsChanged = NotificationGroup(
      MyNotificationItemAdded,
      MyNotificationItemDeleted,
      MyNotificationItemMoved,
      MyNotificationItemEdited
    )

    notificationManager.registerGroupObserver(MyNotificationItemsChanged) { note in
      // ...
    }

Thursday, June 26, 2014

Unit Testing in Swift

Since Swift was released at the beginning of the month, I've been using it for most of my iOS development. It's been a pleasant experience: I've been able to discard huge amounts of boilerplate and take advantage of a few functional programming techniques that were previously unavailable on the iPhone and iPad.

One area where Swift has made huge improvements over Objective-C is unit tests. Objective-C's verbosity made it difficult to create small, focused classes to perform specific tasks. Plus, the language's insistence on keeping only one class to a file and the cumbersome pairing of every implementation file with a header imposed a hefty penalty on programmers who tried to divide their work up into discrete, testable components.

Unit testing in Swift is done with the same XCTest framework introduced back in Xcode 5 for Objective-C. But Swift's concision and its inclusion of modern language features like closures make XCTest much more pleasant to use than it was under Objective-C. We'll walk through a very simple example of Swift unit testing below.

To get started, create an empty iOS Application project in Xcode called Counter. Xcode will generate a CounterTests folder for you and an associated test target.

First, let's create a simple class to be tested. Create the file "Counter.swift" and add the following code to it:

import Foundation

class Counter {
  var count: Int
  
  init(count: Int) {
    self.count = count
  }
  
  convenience init() {
    self.init(count: 0)
  }
  
  func increment() {
    self.count++
  }

}

This is a very simple class, but it will be enough to illustrate how to use XCTest to test your own Swift code.

Create a file called "CounterTest.swift" in the CounterTests folder Xcode generated for you (this simple test will be your "Hello, world" for Swift testing):

import XCTest
import Counter

class CounterTest: XCTestCase {
  func testSimpleAddition() {
    let counter = Counter()
    XCTAssertEqual(0, counter.count)
  }

}

NOTE: In the current version of Swift (Beta 2), you have to import your main target into the test target to get your tests to compile and run. This is why we import Counter at the top.

NOTE: I've seen a few Swift tutorials recommend that you use the built-in Swift function assert in your test cases - do not do this! assert will terminate your entire program if it fails. Using the XCTAssert functions provides a number of important benefits:

  • If one test case fails, your other cases can continue running; assert stops the entire program.
  • Because the XCTAssert functions are more explicit about what you're expecting, they can print helpful failure messages (e.g. "2 was not equal to 3") whereas assert can only report that its condition was false. There's a broad variety of assert functions, including XCTAssertLessThan, XCTAssertNil, etc.
  • The Swift language specification explicitly forbids string interpolation in the message passed to assert; the XCTAssert functions don't face this limitation.
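For instance, an XCTAssertEqual failure pinpoints both values, which a bare assert can't do (a sketch; the exact message format may vary by Xcode version):

```swift
func testIncrementOnce() {
  let counter = Counter(count: 2)
  counter.increment()
  // On failure, XCTest prints something like:
  //   XCTAssertEqual failed: ("3") is not equal to ("4") - increment is broken
  XCTAssertEqual(3, counter.count, "increment is broken")
}
```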

To try your test code out, click "Test" on the "Product" menu. Your single test should pass.

We'll add two more test cases to create and exercise several instances of Counter and to ensure that the counter wraps around when it overflows:

import XCTest
import Counter

class CounterTest: XCTestCase {
  func testInvariants() {
    let counter = Counter()
    XCTAssertEqual(0, counter.count, "Counter not initialized to 0")
    
    counter.increment()
    XCTAssertEqual(1, counter.count, "Increment is broken")

    XCTAssertEqual(1, counter.count, "Count has unwanted side effects!")
  }
  
  func testMultipleIncrements() {
    let counts = [1, 2, 3, 4, 5, 6]
    
    for count in counts {
      let counter = Counter()
      
      for i in 0..count {
        counter.increment()
      }
      
      XCTAssertEqual(counter.count, count, "Incremented value does not match expected")
    }
  }
  
  func testWraparound() {
    let counter = Counter(count: Int.max)
    counter.increment()
    
    XCTAssertEqual(counter.count, Int.min)
  }
}

These tests should pass as well.

You can find out more about XCTest in the Apple guide "Testing with Xcode." I hope this was helpful - please feel free to comment if anything is unclear.

Monday, February 11, 2013

anorm-typed: Statically-Typed SQL Queries for Scala Play Applications

The Play framework's default persistence framework, Anorm, is a very thin wrapper around JDBC (the whole library is about 800 lines of code). Although I like the idea of a framework that treats a database as a database - instead of trying to shoehorn databases into the OO paradigm - Anorm has never really appealed to me. Since it's just a wrapper around SQL, you end up writing lots of raw SQL in your application. This is a problem, because the Scala compiler and typechecker have no opportunity to check your database interaction for errors. As flawed as ORM approaches can be, at least they can generate valid SQL for you. Consider this Anorm call from the Play! documentation:
  SQL(
    """
      select * from Country c 
      join CountryLanguage l on l.CountryCode = c.Code 
      where c.code = {countryCode};
   """
  ).on("countryCode" -> "FRA")

Here are just some of the ways this code can go wrong at runtime:
  • A typo in an SQL keyword
  • A typo in a column or table name
  • Reference to a column or table that doesn't exist
  • A typo in the "countryCode" key passed to the "on" function
  • Passing in a non-string value for "countryCode"
  • A mismatch between the parameters named in the query string and the keys passed to "on"
With Anorm's primary competitors (SLICK and Squeryl), you create mappings between columns and class fields, then use a query DSL to translate Scala Collections-like code into SQL. These frameworks are still vulnerable to some of the above problems, but they have some advantages:
  • You map each column only once, so if you get the column's name or type wrong, there's only one place to correct it, and then the rest of your program will be free of that particular bug.
  • These frameworks generate SQL themselves from a simple Scala DSL, so most syntax errors are ruled out.
Yet, these frameworks also introduce a number of issues:
  • You need to manually maintain model mappings that can drift out of sync with the database
  • The DSL's these libraries provide are necessarily limited. Some queries that would be straightforward and fast with pure SQL are simply inexpressible in these DSL's.
  • The mappings in both frameworks are database-agnostic. This has obvious advantages, but if you need to take advantage of a database-specific data type, function or syntactical convenience, you're out of luck.
About a month ago, Play developer Guillaume Bort announced a proof-of-concept implementation of a statically-checked version of Anorm, Play's persistence framework (source on Github). The framework was inspired by Joni Freeman's sqltyped framework. The main API of anorm-typed is the TypedSQL macro. When you compile a Scala file that contains TypedSQL calls, these calls are expanded into type-safe code that accepts parameters for any placeholders in the SQL and returns a tuple based on the column types selected in the query. Here's a short example:

  // assume
  // CREATE TABLE users(
  //    id integer,
  //    best_friend_id integer,
  //    name varchar(256)
  // );

  val q = TypedSQL("select * from users")
  q().list() // returns List[(Int, Int, String)]

  val q2 = TypedSQL("select name from users where id = ?")
  q2(5).single() // returns String

  val q3 = TypedSQL("update users set name = ? where id = 5")
  q3("Tyrone Slothrop").execute()

The anorm-typed module will catch every one of the errors I listed above - before your application can even compile. Note that everything here is type-checked, and that the code simply will not compile if we make a mistake matching Scala types to SQL types, if the SQL has a syntax error, if we use a nonexistent column, or if we provide the wrong number of arguments to the query. Awesome. Of course, there are some drawbacks to this approach:
  • The TypedSQL macro needs to connect to your database during compilation. This can cause a number of issues:
    • CI servers or other automated builds will need to be able to access a database to finish compilation
    • IDE's have no idea what to do with the TypedSQL macro - IntelliJ highlights every call as an error, even though the code compiles fine.
Still, this is pretty close to my holy grail for database interaction. I'm planning to set aside some time to work on an alternative implementation that would suit my needs a little better: instead of a macro, I'm planning to build an SBT plugin for Play apps that would, as with the conf/routes compiler, compile a list of SQL queries into an autogenerated file.

Tuesday, November 15, 2011

I should also say, there's really nothing wrong in just coding the
imperative algorithm in Scala. It's not a sacrilege to use imperative code,
in particular if it's inside a function. Sometimes it's the clearest way to
express things. Many other times it is not.

- Martin Odersky

Saturday, August 20, 2011

The Implicit Environment Pattern

In general, I'm pretty skeptical of the purported benefits of test-driven development. Nevertheless, there are cases where some unit tests can be extremely helpful. Recently, I wanted to write a simple command-line utility to analyze some sales data for me. The problem was simple and well-contained enough that I thought it would make sense to write a few unit tests.

One of the perennial problems with writing good unit tests is cleanly separating out the objects being tested from their dependencies and collaborators. Most discussions of this topic limit this to separating out references to global variables or singleton objects. They typically do this with dependency injection. Conventional dependency injection is a pretty good solution to this problem, but it's often an extralingual solution, relying on XML files that are kept separate from your source code and which fall outside the purview of the type system. Even if they avoid XML files (Guice comes to mind), they're fairly heavy systems, in that you have to bring in jar's, set up your configuration, and get familiar with the API.

And aside from simply avoiding hardcoded class references, there were some other kinds of dependencies that I wanted to isolate:

  • Environment information, such as where to find configuration files, server addresses and ports, or flags
  • Side effects, including I/O. For example, println assumes the presence of a console and offers no way for you to compare what was printed to what you expected to print. Similarly, reading from the console assumes that someone is present to type in commands and provides no way for you to provide mock input.

I eventually hit on a solution I'm calling the Implicit Environment Pattern. It makes heavy use of Scala-specific features, like traits and implicits. Unlike most dependency injection solutions, this is simply a code pattern and requires no external jar's, dependencies or configuration languages.

The key principles of this pattern are:

  • Separate configuration, hardcoded class references and I/O operations into abstract traits
  • Define concrete implementations of these traits for different environments, e.g. production, testing, QA, development
  • Pass these implementations into their dependent classes via implicit parameters to the constructor
  • Dependencies are transitive - the implicit values passed into one class will be transparently passed into any other instances they create, as long as those classes declare their dependencies via implicit parameters

To start with, I'll create a number of traits to declare very fine-grained dependencies:

trait WritesOutput { 
  def println(x: Any): Unit 
}

trait ReadsSalesDirectory { 
  def salesDirectory: File 
}

trait ReadsHistoricalRecords { 
  def historicalRecordsFile: File 
}

trait NeedsConsoleInput { 
  def readLine(completionOptions: Iterable[String] = Nil): String 
}

In this example, I've made the choice to treat the sales directory as a configuration parameter, so I can simply pass in a different directory in different environments. Depending on how much I/O is going on, I could also create a trait with a readFromFiles method that abstracts the I/O completely; in that case, my test implementations could just return an in-memory list from readFromFiles without touching the filesystem at all. Which of these paths you choose depends on how complicated the I/O is and on whether it's easier to provide test files or just create in-memory data structures. In this case, I have easy access to sample files, I'm pretty confident that basic line-reading code will work and I'm only reading data and not changing anything, so I've made this choice in the interest of pragmatism.
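For reference, that fully abstracted alternative might look like this (a sketch; the trait and class names are hypothetical):

```scala
// Abstracts the I/O completely: the production implementation reads
// files from the sales directory, while tests return canned data.
trait ReadsSalesData {
  def readSalesLines(): List[String]
}

class TestSalesData extends ReadsSalesData {
  def readSalesLines() = List("2011-08-01,42.50", "2011-08-02,17.25")
}
```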

Now let's write a class that makes use of these traits via implicit parameters:

class SalesReportParser(val startingDay: LocalDate)(
     implicit salesDir: ReadsSalesDirectory, writer: WritesOutput) {
  ...
  
  def parseAll(): List[SalesRecord] = {
    val result = withSource(Source.fromFile(salesDir.salesDirectory)) { source => 
      ... 
    }

    writer.println("Parsed %d files.".format(result.size))
    result
  }
}

Clients of the class will pass in its dependencies transparently via implicits. So in my main method, I simply write:

object SalesReport {
  class ProdEnv extends WritesOutput
        with ReadsSalesDirectory
        with ReadsHistoricalRecords
        with NeedsConsoleInput {
    def println(x: Any) = Predef.println(x)
    val salesDirectory = new File(...)
    val historicalRecordsFile = new File(...)
    def readLine(completionOptions: Iterable[String]) = {
       ...
    }    
  }

  def main(args: Array[String]) {
    implicit val env: ProdEnv = new ProdEnv

    val parser = new SalesReportParser(new LocalDate)
    parser.parseAll()

    ...
  }
}

(For convenience, one class implements all of the necessary traits. In tests, I might want to create separate implementations for individual dependencies.)

The implicit value env in main is transparently passed into SalesReportParser's constructor. Importantly, if SalesReportParser instantiates any classes that have their own dependencies declared as implicits, they'll be automatically passed in from the SalesReportParser instance.

If I fail to provide any of the parser's dependencies, the bug will be caught at compile time, since implicit resolution will fail when SalesReportParser's constructor is called. This is what I want: it's a beautiful thing when you're able to structure your code such that certain bugs are "inexpressible," i.e. the compiler will reject them outright before they can even become bugs.

So now I've isolated the console output dependency and sales directory path from my sales report parsing logic. Further, each class specifies exactly what dependencies it needs - the SalesReportParser can't read the historical records without explicitly adding ReadsHistoricalRecords as an implicit parameter, so readers of this code can immediately determine how a given class depends on and affects the world around it. In effect, the SalesReportParser class acts like a function in the functional programming sense - since I provide the class with the objects that will accept its input and emit its output, it need not affect the real world unless I provide it dependencies that do so.

But the other benefit is that my unit tests become extremely straightforward:

class TestTheParser extends TestCase {
  class TestEnv extends ReadsSalesDirectory with WritesOutput {
    var writtenData: List[String] = Nil

    def salesDirectory = new File("test/testSalesData")
    def println(x: Any) = {
      writtenData ::= x.toString
    }
  }

  val testDate = ...

  def testMe = {
    implicit val env = new TestEnv
    val parser = new SalesReportParser(testDate)
    parser.parseAll()

    env.writtenData should_== List("Parsed 10 files.")
  }
}

The Implicit Environment pattern also allows us to create a single TestEnv class to be used throughout all test cases and override its behavior only when necessary. For example, our TestEnv class will by default accumulate output into the writtenData array, but we may want to override this in certain cases:

class MyTest extends TestCase {
  def testMe = {
    implicit val testEnvCatchesPrinting = new TestEnv {
      override def println(x: Any) = fail("should not print to stdout")
    }

    val parser = new SalesReportParser(new LocalDate)
    parser.parseAll() // any output here will cause the test to fail
  }
}

Another advantage of this architecture - with strict separation of I/O into explicitly specified dependencies - is that it's less likely that test code will accidentally perform operations on production data. By isolating as much I/O as is practical into dependency traits, the only way the two could intermingle would be by intentionally passing a production implementation into test code, or vice-versa.

Saturday, July 9, 2011

Prefer recursion to var's and while loops

A while back, when I was first picking up Scala, I needed to prompt the user for a line of text and keep prompting until I got a non-blank line.

Apparently, I threw the following together:

def readCompleteLine(prompt: String) = {
  val reader = new jline.ConsoleReader()
  reader.setDefaultPrompt(prompt)

  var line: String = null

  do {
    line = reader.readLine()
  } while( line != null && line.length == 0 )

  line
}

I came across this function yesterday and involuntarily made a gross-out sound. var's, mutability, a while loop (a do-while loop, no less!) and even the dread null. There had to be a better way.

Using higher-order functions and recursion, I rewrote this to be less imperative, more readable and more general:

def readUntil[X](prompt: String, transform: String => X, pred: X => Boolean): X =  {
  val reader = new jline.ConsoleReader() { setDefaultPrompt(prompt) }

  @tailrec def ruHelper: X = Option(reader.readLine()).map(transform(_)) match {
    case Some(x) if pred(x) => x
    case _ => ruHelper
  }

  ruHelper
}

/* simple, string-only version */
def readUntil(prompt: String, pred: String => Boolean): String =  {
  readUntil(prompt, x => x, pred)
}

Using this as a primitive, it's easy to write a method that repeatedly prompts the user for a string until it's non-empty.

def readUntilNonBlankLine(prompt: String) = readUntil(prompt, _.length > 0)

But it's also easy to create a method that only allows input from a list of accepted strings:

def readOneOf(prompt: String, acceptableValues: List[String]): String = 
    readUntil(prompt, acceptableValues.contains(_))

Or even a function that reads until the value is a properly-formed floating-point number:

import util.control.Exception._

def tryToReadDouble(s: String) = 
  catching(classOf[NumberFormatException]) opt (s.toDouble)

def readUntilValidDouble(prompt: String) = ReadingUtils.readUntil(prompt,
      tryToReadDouble(_), (s: Option[Double]) => s.isDefined).get

Higher-order functions and genericity can really pay off - the base readUntil
is now a nicely abstracted primitive that I can test in isolation and then plug in elsewhere in my program, without having to write and rewrite error-prone imperative code.

Wednesday, June 8, 2011

Migrating to sbt 0.10: sbt-idea-plugin

In pre-0.10 versions of sbt, you could use sbt-idea-plugin to generate IDEA projects from your SBT files. That plugin doesn't work in 0.10, but here's how to make it happen:

Create a new file ~/.sbt/plugins/project/Build.scala and put the following code in it:

import sbt._

object MyPlugins extends Build {
  lazy val root = Project("root", file(".")) 
                    dependsOn (uri("git://github.com/ijuma/sbt-idea.git#sbt-0.10"))
}

Now restart SBT and type gen-idea (this command replaces the old idea). SBT 0.10 includes any code in this directory in all projects, so any other sbt-0.10 project will immediately get this functionality.

The initial project created by gen-idea will have a separate IDEA module for the project folder. I haven't been able to get IDEA to compile anything while this module is there, so I remove the module and then mark the project as excluded in the main module.

Note that if you later make a change to your .sbt file (e.g. adding a new dependency), you need to bring the IDEA project up-to-date. To do this, open the SBT Console in IDEA, run "update" and then run "gen-idea". Your IDEA project will be brought up to date and IDEA will prompt you to reload the project. Choose 'OK' and you're good to go.

(thanks to Graham Tackley for figuring out the proper way to do this)

Update: Updated the path to ~/.sbt/plugins/project/Build.scala, as it's not necessary to configure the IDEA plugin on a per-project basis.

Migrating to sbt 0.10: Lift

The latest release of sbt (version 0.10) is utterly incompatible with build scripts from older versions of SBT. Here's how to migrate a Lift project to SBT 0.10 (these instructions work with Lift 2.4-M1; I believe you can use earlier versions of Lift, as long as you downgrade scalaVersion to 2.8.1):

  1. Rename project/ to project-old/
  2. Create a new build.sbt file in the root of your project

    name := "projname"
    
    scalaVersion := "2.9.0"
    
    seq(WebPlugin.webSettings: _*)
    
    libraryDependencies ++= Seq(
      "net.liftweb" %% "lift-webkit" % "2.4-M1" % "compile->default",
      "net.liftweb" %% "lift-mapper" % "2.4-M1" % "compile->default",
      "net.liftweb" %% "lift-wizard" % "2.4-M1" % "compile->default")
    
    
    libraryDependencies ++= Seq(
      "junit" % "junit" % "4.5" % "test->default",
      "org.mortbay.jetty" % "jetty" % "6.1.22" % "jetty",
      "javax.servlet" % "servlet-api" % "2.5" % "provided->default",
      "com.h2database" % "h2" % "1.2.138",
      "ch.qos.logback" % "logback-classic" % "0.9.26" % "compile->default"
    )
  3. mkdir -p project/plugins
  4. Create a new file project/plugins/build.sbt and put the following in it:
    resolvers += "Web plugin repo" at "http://siasia.github.com/maven2"
    
    libraryDependencies <+= sbtVersion("com.github.siasia" %% "xsbt-web-plugin" % _)

Now you're good to go.

Note that the Jetty dependency has changed. If you execute jetty-run and no output appears, make sure you changed the scope for the Jetty dependency from "test" (the correct value in older SBT's) to "jetty".

Also note that the new SBT parser appears to be newline-sensitive; make sure you keep blank lines between each setting, as in the example above.

Wednesday, February 23, 2011

I want to go to there

From the excellent Deprecating the Observer Pattern by Odersky et al:

// step 1:
val path = new Path((self next mouseDown).position) 

// step 2: 
self loopUntil mouseUp {
 val m = self next mouseMove 
 path.lineTo(m.position) 
 draw(path)
}

// step 3:
path.close()
draw(path)

This is a complete implementation of a sketching application using scala-react.

Friday, January 21, 2011

Thoughts on node.js and Haskell

I've been hearing chatter about node.js in the last few months. A few weeks ago, a colleague gave me a quick demo and I decided to take a closer look. I was impressed with how much scale it could handle without batting an eye, and I was especially impressed with the claim (wish I could find where I saw it) that you could replace a cluster of five or six thread-based web servers with a single node.js process and see comparable performance.

Many web servers (e.g. Tomcat, Apache) dedicate a thread to every open connection. When a web request comes in, a thread is either created fresh or, if possible, dequeued from a thread pool. The thread performs some computation and I/O, and the thread scheduler puts it to sleep when it blocks for I/O or when it exceeds its quantum. This works well enough, but as more and more threads are created to handle simultaneous connections, the paradigm starts to break down: each new thread adds some overhead, and a large enough number of simultaneous connections will grind the server to a halt.

node.js is a JavaScript environment built from the ground up to use asynchronous I/O. Like other JavaScript implementations it has a single main thread. It achieves concurrency through an API that requires you to provide a callback (a closure) specifying what should happen when any I/O operation completes. Instead of using threads or blocking, node.js uses low-level event primitives to register for I/O notifications. The overhead of context switching and thread tracking is replaced by a simple queue of callbacks. As the events come in, node.js invokes the corresponding callback. The callback then carries on its work until it either begins another I/O request or completes its task, at which point node handles the next event or sleeps while waiting for a new one. Here's a simple web-based "Hello, world!" in node.js from the snap-benchmarks repository:

As you can see, all I/O operations accept a callback as their final argument. Instead of a series of calls that happen to block, we explicitly handle each case where an I/O request completes, giving us code consisting of nested callbacks.

In addition to reducing the strain placed on a system by a large number of threads, this approach also removes the need for thread synchronization - everything is happening in rapid sequence, rather than in parallel, so race conditions and deadlock are not a concern. The disadvantages are that any runaway CPU computation will hose your server (there's only one thread, after all) and that you must write your code in an extremely unnatural style to get it to work at all. The slogan of this blog used to be a famous Olin Shivers quote: "I object to doing things computers can do" - and explicitly handling I/O responses certainly falls into the category of Things a Computer Should Do.

Anyway, this got me thinking about how Haskell could take advantage of the asynchronous I/O paradigm. I was delighted to find that others were thinking the same thing. As it happens, the new I/O manager in GHC 7 uses async I/O behind the scenes - so your existing code with blocking calls automatically takes advantage of the better scalability of asynchronous I/O. The Snap framework has published benchmarks that show it handling about 150% more requests per second than node.js.

But what's even nicer is that there's no need to contort your code to get this performance. Snap and the underlying GHC I/O manager completely abstract away the asynchronous nature of what I'm doing. I don't need to set callbacks because the I/O manager is handling that under the covers. I pretend I'm using blocking I/O and the I/O manager takes care of everything else. I get some defense against long-running computations (because more than one thread is processing the incoming events) and, most importantly, I can write code in what I consider a saner style.
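To make the point concrete, here's a base-only sketch (mine, not from the post) of why the blocking style stays cheap in GHC: forkIO threads are lightweight green threads that the runtime multiplexes over a few OS threads, so "blocking" in thousands of them at once is perfectly fine:

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Fork n lightweight threads that each "block" briefly, then wait for
-- all of them to finish. The GHC I/O manager handles the underlying
-- event registration; our code just looks like ordinary blocking calls.
spawnMany :: Int -> IO Int
spawnMany n = do
  vars <- mapM (const newEmptyMVar) [1 .. n]
  mapM_ (\v -> forkIO (threadDelay 1000 >> putMVar v ())) vars
  mapM_ takeMVar vars
  return n
```

Running `spawnMany 10000` completes almost instantly; 10,000 OS threads would be far more expensive.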

The catch is that most I/O libraries are not designed for asynchronous I/O. So, network and other I/O calls made at the Haskell level will work appropriately, but external libraries (say, MySQL) are often written with blocking I/O at the C level, which can negate some of the scalability gains from asynchronous I/O.

Thursday, December 9, 2010

parallel-io

A few days ago, parallel-io was uploaded to Hackage. Previously, when I wanted to perform a number of I/O actions in parallel, I used some homegrown code built on forkIO and MVars to run the actions in parallel and collect their results.
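For reference, the homegrown approach looks something like this sketch (my reconstruction, not the code I actually used): fork a thread per action and collect each result through an MVar:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Run each action in its own thread; each thread reports its result
-- through a dedicated MVar, and we collect the results in order.
parallelHomegrown :: [IO a] -> IO [a]
parallelHomegrown actions = do
  vars <- mapM (\act -> do
            v <- newEmptyMVar
            _ <- forkIO (act >>= putMVar v)
            return v) actions
  mapM takeMVar vars
```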

The new package makes all that unnecessary. Its two most important functions are:

    parallel_ :: [IO a] -> IO ()  -- ignores results
    parallel  :: [IO a] -> IO [a] -- collects results into a list

I have some code that fetches a number of exchange rates in relation to the US dollar. I want to do this work in parallel, so previously I used my own combinators. Now it's as simple as:
    getExchangeRateToUSD :: String -> IO Double
    getExchangeRateToUSD = ...

    parallel $ map getExchangeRateToUSD ["CAD", "EUR", "JPY", "GBP"]

The map call returns a list of IO actions. parallel-io dutifully forks off a thread for each of these actions and returns a value of type IO [Double]. Nice and simple.

Sunday, September 12, 2010

Getting started with Leksah

I've been wanting to try Leksah for some time, but found that it doesn't give you many cues about how to get started when you first run it. Today, I sat down and got a simple example running.

Here's how to get a simple Haskell program running in Leksah:

  1. Open Leksah. If this is the first time you're running it, just hit OK when asked for search paths and wait for a bit (possibly quite a bit) while it indexes your installed packages.
  2. Click "Workspace | New Workspace" and specify a location for the Leksah workspace, which is a single file that will track references to the Leksah projects you'll create.
  3. Click Module | New Module. This creates a new Cabal module under your workspace. The New Module dialog is looking for a folder which will hold the Cabal files. Create a new directory, navigate to it, and press Open.
  4. Press the Save button at the bottom of the Package tab.
  5. Write your code within the module you created. Note that Leksah will complete function names as you type. It will also repeatedly compile your package and show the output in the Log window to alert you to errors.
  6. Use the Package menu to run your code or to install it via cabal. Your output will appear in the Log pane.

Hope that's helpful!

Tuesday, August 24, 2010

Haskell Syntax Highlighting for Blogs

Can anyone recommend a good way to get Haskell syntax highlighting for blog posts? I've been using Gist, which works very well, except that Gists don't show up in RSS feeds.

Monday, August 23, 2010

Planet Haskell

I give permission to include this blog and its contents on the Planet Haskell aggregator.

Saturday, August 21, 2010

Word Wrapping in Haskell

Prompted by a question on haskell-cafe, here's a relatively concise word-wrapping implementation I wrote:
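The original implementation was embedded as a Gist that is lost here; a reconstruction in the same spirit (hypothetical - it borrows the beforeMax naming from the update below, with the fix applied) might look like:

```haskell
import Data.Char (isSpace)

-- Wrap a single line to at most maxLen characters, breaking at the
-- last space before the limit when one exists; otherwise break mid-word.
wrapLine :: Int -> String -> [String]
wrapLine maxLen line
  | length line <= maxLen = [line]
  | any isSpace beforeMax = beforeSpace : wrapLine maxLen (afterSpace ++ afterMax)
  | otherwise             = beforeMax : wrapLine maxLen afterMax
  where
    (beforeMax, afterMax) = splitAt maxLen line
    -- split beforeMax at its last space, dropping the space itself
    (beforeSpace, afterSpace) =
      let i = last [ix | (ix, c) <- zip [0 ..] beforeMax, isSpace c]
      in (take i beforeMax, drop (i + 1) beforeMax)
```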

Update: Thanks to Yitz in the comments for pointing out that "any isSpace line" should be "any isSpace beforeMax"

Sunday, July 25, 2010

cabal unpack

If you want to browse the source code of a package on Hackage, "cabal unpack" is a useful command.

packages % ls
packages % cabal unpack web-routes-quasi
Unpacking to web-routes-quasi-0.5.0/
packages % ls
web-routes-quasi-0.5.0
packages % 

Just pass it the name of a package and you'll get an unzipped tarball in your current directory. Very useful.

Thursday, May 27, 2010

readFile and lazy I/O

Recently, I came across a problem in a Haskell script I run frequently. Every so often, I drop a report file into a designated folder. Then I run my script, which performs an operation like the following:

getReportFiles >>= mapM readFile

This worked fine for months - until this morning, when the program crashed with an error indicating that too many files had been opened.

The problem is that readFile uses lazy I/O. Generally, when we write code like getLine >>= putStrLn, we expect these calls to happen in order - indeed, that's one of the primary purposes of the IO monad. But readFile uses hGetContents internally, which is an exception to the strict I/O found elsewhere in Haskell. So readFile opens the file and then returns a thunk instead of actually reading the file. Only when the thunk is evaluated is the I/O performed and the file read into memory. And only when the thunk has been fully evaluated will the open file be closed.

So in my snippet, I was reading in hundreds of files as thunks, and until the full contents of those thunks were evaluated, the files all remained open. This was no problem until the number of reports I had to process reached a certain point and exposed the bug.

The solution in my case was to use Data.ByteString:

import Control.Applicative ((<$>))
import qualified Data.ByteString.Char8 as BS

eagerReadFile :: FilePath -> IO String
eagerReadFile file = BS.unpack <$> BS.readFile file


ByteString's readFile function is strict, so you get back the complete file contents.


Update: In the comments, Chris points out System.IO.Strict, which has a strict readFile function that simply replaces Prelude.readFile.

Lazy I/O can be very useful: instead of reading in the complete contents of a large file, you can read it lazily using a function in the hGetContents family and then process it without having to read the entire contents into memory at once. But lazy I/O can surprise you if you're not expecting it.
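For instance, here's a hypothetical sketch (not from the post) of that streaming style: counting the lines of a large file with hGetContents, which lets length consume the file incrementally instead of loading it whole:

```haskell
import System.IO (IOMode (ReadMode), hClose, hGetContents, openFile)

-- Count the lines of a file without holding the whole file in memory.
-- hGetContents returns the contents lazily; forcing the count with seq
-- before hClose ensures the file is fully read before we close it.
countLines :: FilePath -> IO Int
countLines path = do
  h <- openFile path ReadMode
  contents <- hGetContents h
  let n = length (lines contents)
  n `seq` hClose h
  return n
```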

(thanks to #haskell for pointing out the eager Data.ByteString.Char8.readFile)

Friday, April 23, 2010

The Brilliance of Maybe and the Value of Static Type Checking

I am sometimes surprised by the strokes of genius hidden inside Haskell's type system. For example: the Maybe type.

Maybe a is a parameterized type that can hold either the value Nothing, or a value Just a. Seems simple enough - it's an optional value.

But it goes deeper than that. Look at an object of type String in Java. This object's type is actually larger than String - it can represent either a string of characters or the special value null. Languages like Java, Ruby, and C# all make the mistake of allowing any type at all to hold null. This greatly diminishes the value of static typing, since a call like string.length() will sail happily past the compiler and then throw the dreaded NullPointerException at runtime if string happens to be null. The compiler will never see it coming, and your program could work for some time before you discover the problem.

But in Haskell, we must be explicit. By default, an object of type Integer can only hold an integral value - there is no null, and no possibility of a runtime null pointer exception. The Haskell compiler will not even produce a program if you try to pass Nothing into a non-Maybe type. A datatype cannot be instantiated with null values unless you explicitly allow it to hold them.
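A small hypothetical example (the names are mine, not from the post): absence is visible in the type, so the compiler forces every caller to handle the Nothing case before it can touch the value:

```haskell
-- lookup returns Maybe Double, not Double: the possibility of absence
-- is part of the type, and the compiler rejects any code that ignores it.
rateFor :: String -> Maybe Double
rateFor code = lookup code [("CAD", 1.04), ("EUR", 0.78)]

describe :: String -> String
describe code = case rateFor code of
  Just r  -> code ++ " = " ++ show r
  Nothing -> code ++ " is unknown"
```

Writing `rateFor "CAD" + 1` is a compile-time type error - exactly the class of mistake that Java happily defers to a runtime NPE.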

This eliminates an enormous class of errors. Anyone who has maintained a large Java application instinctively shudders at the prospect of an NPE - the value could have become null at any point in the program and resolving it requires extensive tracing.

Further, in order to avoid the possibility of an NPE, you must litter your code with distracting and unnecessary conditionals like:

if (foo == null) throw new IllegalArgumentException( ... );
if (foo.bar() == null) throw new IllegalArgumentException( ... );


before doing any real work. But what's worse is that you're turning a programming error into a runtime exception. The fact that your program compiles does not guarantee that you have no NPEs, and running it does not guarantee that you will hit every branch that might generate one.

In Haskell, it is simply impossible to write a program that is susceptible to NullPointerExceptions at runtime unless you explicitly make the type in question optional by wrapping with Maybe.