Saturday, August 20, 2011

The Implicit Environment Pattern

In general, I'm pretty skeptical of the purported benefits of test-driven development. Nevertheless, there are cases where some unit tests can be extremely helpful. Recently, I wanted to write a simple command-line utility to analyze some sales data for me. The problem was simple and well-contained enough that I thought it would make sense to write a few unit tests.

One of the perennial problems with writing good unit tests is cleanly separating the objects being tested from their dependencies and collaborators. Most discussions of this topic limit themselves to separating out references to global variables or singleton objects, and they typically do this with dependency injection. Conventional dependency injection is a pretty good solution to this problem, but it's often an extralingual solution, relying on XML files that are kept separate from your source code and which fall outside the purview of the type system. Even frameworks that avoid XML files (Guice comes to mind) are fairly heavy systems, in that you have to bring in JARs, set up your configuration, and get familiar with the API.

And aside from simply avoiding hardcoded class references, there were some other kinds of dependencies that I wanted to isolate:

  • Environment information, such as where to find configuration files, server addresses and ports, or flags
  • Side effects, including I/O. For example, println assumes the presence of a console and offers no way for you to compare what was printed to what you expected to print. Similarly, reading from the console assumes that someone is present to type in commands and gives you no way to supply mock input.

I eventually hit on a solution I'm calling the Implicit Environment Pattern. It makes heavy use of Scala-specific features, like traits and implicits. Unlike most dependency injection solutions, this is simply a code pattern and requires no external JARs, dependencies or configuration languages.

The key principles of this pattern are:

  • Separate configuration, hardcoded class references and I/O operations into abstract traits
  • Define concrete implementations of these traits for different environments, e.g. production, testing, QA, development
  • Pass these implementations into their dependent classes via implicit parameters to the constructor
  • Dependencies are transitive - the implicit values passed into one class will be transparently passed into any other instances they create, as long as those classes declare their dependencies via implicit parameters

To start with, I'll create a number of traits to declare very fine-grained dependencies:

import java.io.File

trait WritesOutput {
  def println(x: Any): Unit
}

trait ReadsSalesDirectory {
  def salesDirectory: File
}

trait ReadsHistoricalRecords {
  def historicalRecordsFile: File
}

trait NeedsConsoleInput {
  def readLine(completionOptions: Iterable[String] = Nil): String
}

In this example, I've chosen to treat the sales directory as a configuration parameter, so I can simply pass in a different directory in different environments. Depending on how much I/O is going on, I could also create a trait with a readFiles method that abstracts the I/O completely; in that case, my test implementations could just return an in-memory list from readFiles without touching the filesystem at all. Which path you choose depends on how complicated the I/O is and on whether it's easier to provide test files or to create in-memory data structures. In this case, I have easy access to sample files, I'm pretty confident that basic line-reading code will work, and I'm only reading data, not changing anything, so I've taken the configuration-parameter route in the interest of pragmatism.
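For comparison, here is a minimal sketch of the other path, where the file reading itself sits behind a trait. The names ReadsSalesFiles, DirectorySalesFiles, InMemorySalesFiles and LineCounter are hypothetical and exist only for this illustration; the readFiles signature is likewise an assumption.

```scala
import java.io.File
import scala.io.Source

// Hypothetical trait: abstracts the file reading itself, not just the path.
trait ReadsSalesFiles {
  def readFiles(): List[String]
}

// Production-style implementation: returns every line of every regular file
// in the given directory, visiting files in name order.
class DirectorySalesFiles(dir: File) extends ReadsSalesFiles {
  def readFiles(): List[String] = {
    // listFiles() returns null when dir is not a directory, hence the Option.
    val files = Option(dir.listFiles()).map(_.toList).getOrElse(Nil)
    files.filter(_.isFile).sortBy(_.getName).flatMap { f =>
      val source = Source.fromFile(f)
      try source.getLines().toList finally source.close()
    }
  }
}

// Test implementation: hands back canned lines without touching the filesystem.
class InMemorySalesFiles(lines: List[String]) extends ReadsSalesFiles {
  def readFiles(): List[String] = lines
}

// Hypothetical consumer that only knows about the trait.
class LineCounter()(implicit files: ReadsSalesFiles) {
  def countNonEmpty(): Int = files.readFiles().count(_.trim.nonEmpty)
}
```

A test would construct LineCounter with an InMemorySalesFiles in implicit scope, while production code would supply a DirectorySalesFiles. The trade-off is that the real file-reading code is then exercised only in production.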

Now let's write a class that makes use of these traits via implicit parameters:

class SalesReportParser(val startingDay: LocalDate)(
     implicit salesDir: ReadsSalesDirectory, writer: WritesOutput) {
  ...
  
  def parseAll(): List[SalesRecord] = {
    val result = withSource(Source.fromFile(salesDir.salesDirectory)) { source => 
      ... 
    }

    writer.println("Parsed %d files.".format(result.size))
    result
  }
}

Clients of the class will pass in its dependencies transparently via implicits. So in my main method, I simply write:

object SalesReport {
  class ProdEnv extends WritesOutput
        with ReadsSalesDirectory
        with ReadsHistoricalRecords
        with NeedsConsoleInput {
    def println(x: Any) = Predef.println(x)
    val salesDirectory = new File(...)
    val historicalRecordsFile = new File(...)
    def readLine(completionOptions: Iterable[String]) = {
       ...
    }    
  }

  def main(args: Array[String]): Unit = {
    implicit val env: ProdEnv = new ProdEnv

    val parser = new SalesReportParser(new LocalDate)
    parser.parseAll()

    ...
  }
}

(For convenience, one class implements all of the necessary traits. In tests, I might want to create separate implementations for individual dependencies.)

The implicit value env in main is transparently passed into SalesReportParser's constructor. Importantly, if SalesReportParser instantiates any classes that have their own dependencies declared as implicits, they'll be automatically passed in from the SalesReportParser instance.
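To make the transitive step concrete, here is a small sketch under stated assumptions. RecordFormatter, Summarizer and RecordingOutput are hypothetical classes invented for this illustration (they are not part of the sales report code); only WritesOutput matches the trait defined earlier.

```scala
trait WritesOutput {
  def println(x: Any): Unit
}

// Hypothetical collaborator that declares its own implicit dependency.
class RecordFormatter()(implicit writer: WritesOutput) {
  def report(name: String, total: Int): Unit =
    writer.println("%s: %d".format(name, total))
}

// Hypothetical stand-in for a class like SalesReportParser.
class Summarizer()(implicit writer: WritesOutput) {
  // No WritesOutput is passed explicitly here: the implicit constructor
  // parameter `writer` is in implicit scope inside the class body, so the
  // compiler forwards it to RecordFormatter's constructor.
  private val formatter = new RecordFormatter()

  def summarize(totals: List[(String, Int)]): Unit = {
    totals.foreach { case (name, total) => formatter.report(name, total) }
    writer.println("Done.")
  }
}

// Test double that records output in the order it was written.
class RecordingOutput extends WritesOutput {
  var written: List[String] = Nil
  def println(x: Any): Unit = {
    written = written :+ x.toString
  }
}
```

Whoever constructs Summarizer supplies one WritesOutput, and both classes end up writing to it. The forwarding only works if every class along the chain declares the dependency as an implicit parameter.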

If I fail to provide any of the parser's dependencies, the bug will be caught at compile time, since implicit resolution will fail when SalesReportParser's constructor is called. This is what I want: it's a beautiful thing when you're able to structure your code such that certain bugs are "inexpressible," i.e. the compiler will reject them outright before they can even become bugs.
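A rough sketch of what this looks like in practice follows. Greeter and CompileTimeCheck are hypothetical names used only here. The optional @implicitNotFound annotation from the standard library replaces the compiler's default "implicit not found" message with a custom one; its wording below is my own.

```scala
import scala.annotation.implicitNotFound

// The annotation is optional; it only customizes the compile error text.
@implicitNotFound("No WritesOutput available: define an implicit environment before constructing this class.")
trait WritesOutput {
  def println(x: Any): Unit
}

// Hypothetical class with a single implicit dependency.
class Greeter()(implicit writer: WritesOutput) {
  def greet(name: String): Unit = writer.println("Hello, " + name)
}

object CompileTimeCheck {
  // Compiles: an implicit WritesOutput is in scope when Greeter is constructed.
  def run(): List[String] = {
    var captured = List.empty[String]
    implicit val writer: WritesOutput = new WritesOutput {
      def println(x: Any): Unit = {
        captured = captured :+ x.toString
      }
    }
    new Greeter().greet("world")
    captured
  }

  // Does not compile if uncommented, because no implicit WritesOutput is in
  // scope at this point; the mistake is rejected before the program can run:
  //   def broken(): Unit = new Greeter().greet("world")
}
```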

So now I've isolated the console output dependency and sales directory path from my sales report parsing logic. Further, each class specifies exactly what dependencies it needs - the SalesReportParser can't read the historical records without explicitly adding ReadsHistoricalRecords as an implicit parameter, so readers of this code can immediately determine how a given class depends on and affects the world around it. In effect, the SalesReportParser class acts like a function in the functional programming sense - since I provide the class with the objects that will accept its input and emit its output, it need not affect the real world unless I provide it dependencies that do so.

But the other benefit is that my unit tests become extremely straightforward:

class TestTheParser extends TestCase {
  class TestEnv extends ReadsSalesDirectory with WritesOutput {
    var writtenData: List[String] = Nil

    def salesDirectory = new File("test/testSalesData")
    def println(x: Any) = {
      // x is Any, so convert it before prepending to the List[String]
      writtenData ::= x.toString
    }
  }

  val testDate = ...

  def testMe = {
    implicit val env = new TestEnv
    val parser = new SalesReportParser(testDate)
    parser.parseAll()

    env.writtenData should_== List("Parsed 10 files.")
  }
}

The Implicit Environment Pattern also allows us to create a single TestEnv class to be used throughout all test cases and override its behavior only when necessary. For example, our TestEnv class will by default accumulate output into the writtenData list, but we may want to override this in certain cases:

class MyTest extends TestCase {
  def testMe = {
    implicit val testEnvCatchesPrinting = new TestEnv {
      override def println(x: Any) = fail("should not print to stdout")
    }

    // testDate is the same value used in the previous test
    val parser = new SalesReportParser(testDate)
    parser.parseAll() // any output here will cause the test to fail
  }
}

Another advantage of this architecture - with strict separation of I/O into explicitly specified dependencies - is that it's less likely that test code will accidentally perform operations on production data. By isolating as much I/O as is practical into dependency traits, the only way the two could intermingle would be by intentionally passing a production implementation into test code, or vice versa.
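The input side can be isolated in the same way. The sketch below assumes the NeedsConsoleInput trait from earlier; ScriptedInput and ConfirmPrompt are hypothetical classes made up for this illustration.

```scala
trait NeedsConsoleInput {
  def readLine(completionOptions: Iterable[String] = Nil): String
}

// Hypothetical test double: replays canned lines instead of waiting for
// someone to type at a console. The completion options are ignored.
class ScriptedInput(script: List[String]) extends NeedsConsoleInput {
  private var remaining = script

  def readLine(completionOptions: Iterable[String]): String = remaining match {
    case next :: rest =>
      remaining = rest
      next
    case Nil =>
      throw new IllegalStateException("script exhausted")
  }
}

// Hypothetical consumer that asks a yes/no question.
class ConfirmPrompt()(implicit input: NeedsConsoleInput) {
  def confirmed(): Boolean =
    input.readLine(List("yes", "no")).trim.toLowerCase == "yes"
}
```

A test puts a ScriptedInput in implicit scope and checks the resulting behavior, and no console is involved at any point.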

4 comments:

  1. Nice; thanks! This seems to be the Scala version of the "implicit configurations" technique that Oleg Kiselyov and I described in http://www.cs.rutgers.edu/~ccshan/prepose/prepose.pdf . One difference is that you don't need to translate values to types and back to values in Scala.

  2. Nice. With scala you can also do compilation at run time, see e.g. my little project http://bitbucket.org/egh/ssconf, so you can change those paths, etc. without rebuilding completely

  3. Nice post. Have you looked at the Cake Pattern ? (http://debasishg.blogspot.com/2011/03/pushing-envelope-on-oo-and-functional.html)
