The Neophyte’s Guide to Scala Part 10: Staying DRY With Higher-order Functions

Jan 23rd, 2013

In the previous articles, I discussed the composable nature of Scala’s container types. As it turns out, being composable is a quality that you will not only find in Future, Try, and other container types, but also in functions, which are first class citizens in the Scala language.

Composability naturally entails reusability. While the latter has often been claimed to be one of the big advantages of object-oriented programming, it’s a trait that is definitely true for pure functions, i.e. functions that do not have any side-effects and are referentially transparent.

One obvious way is to implement a new function by calling already existing functions in its body. However, there are other ways to reuse existing functions: In this blog post, I will discuss some fundementals of functional programming that we have been missing out on so far. You will learn how to follow the DRY principle by leveraging higher-order functions in order to reuse existing code in new contexts.

On higher-order functions

A higher-order function, as opposed to a first-order function, can have one of three forms:

One or more of its parameters is a function, and it returns some value.
It returns a function, but none of its parameters is a function.
Both of the above: One or more of its parameters is a function, and it returns a function.

If you have followed this series, you have seen a lot of usages of higher-order functions of the first type: We called methods like map, filter, or flatMap and passed a function to it that was used to transform or filter a collection in some way. Very often, the functions we passed to these methods were anonymous functions, sometimes involving a little bit of duplication.

In this article, we are only concerned with what the other two types of higher-order functions can do for us: The first of them allows us to produce new functions based on some input data, whereas the other gives us the power and flexibility to compose new functions that are somehow based on existing functions. In both cases, we can eliminate code duplication.

And out of nowhere, a function was born

You might think that the ability to create new functions based on some input data is not terribly useful. While we mainly want to deal with how to compose new functions based on existing ones, let’s first have a look at how a function that produces new functions may be used.

Let’s assume we are implementing a freemail service where users should be able to configure when an email is supposed to be blocked. We are representing emails as instances of a simple case class:

case class Email(
  subject: String,
  text: String,
  sender: String,
  recipient: String)

We want to be able to filter new emails by the criteria specified by the user, so we have a filtering function that makes use of a predicate, a function of type Email => Boolean to determine whether the email is to be blocked. If the predicate is true, the email is accepted, otherwise it will be blocked:

type EmailFilter = Email => Boolean
def newMailsForUser(mails: Seq[Email], f: EmailFilter) = mails.filter(f)

Note that we are using a type alias for our function, so that we can work with more meaningful names in our code.

Now, in order to allow the user to configure their email filter, we can implement some factory functions that produce EmailFilter functions configured to the user’s liking:

val sentByOneOf: Set[String] => EmailFilter =
  senders => email => senders.contains(email.sender)
val notSentByAnyOf: Set[String] => EmailFilter =
  senders => email => !senders.contains(email.sender)
val minimumSize: Int => EmailFilter = n => email => email.text.size >= n
val maximumSize: Int => EmailFilter = n => email => email.text.size <= n

Each of these four vals is a function that returns an EmailFilter, the first two taking as input a Set[String] representing senders, the other two an Int representing the length of the email body.

We can use any of these functions to create a new EmailFilter that we can pass to the newMailsForUser function:

val emailFilter: EmailFilter = notSentByAnyOf(Set("johndoe@example.com"))
val mails = Email(
  subject = "It's me again, your stalker friend!",
  text = "Hello my friend! How are you?",
  sender = "johndoe@example.com",
  recipient = "me@example.com") :: Nil
newMailsForUser(mails, emailFilter) // returns an empty list

This filter removes the one mail in the list because our user decided to put the sender on their black list. We can use our factory functions to create arbitrary EmailFilter functions, depending on the user’s requirements.

Reusing existing functions

There are two problems with the current solution. First of all, there is quite a bit of duplication in the predicate factory functions above, when initially I told you that the composable nature of functions made it easy to stick to the DRY principle. So let’s get rid of the duplication.

To do that for the minimumSize and maximumSize, we introduce a function sizeConstraint that takes a predicate that checks if the size of the email body is okay. That size will be passed to the predicate by the sizeConstraint function:

type SizeChecker = Int => Boolean
val sizeConstraint: SizeChecker => EmailFilter = f => email => f(email.text.size)

Now we can express minimumSize and maximumSize in terms of sizeConstraint:

val minimumSize: Int => EmailFilter = n => sizeConstraint(_ >= n)
val maximumSize: Int => EmailFilter = n => sizeConstraint(_ <= n)

Function composition

For the other two predicates, sentByOneOf and notSentByAnyOf, we are going to introduce a very generic higher-order function that allows us to express one of the two functions in terms of the other.

Let’s implement a function complement that takes a predicate A => Boolean and returns a new function that always returns the opposite of the given predicate:

def complement[A](predicate: A => Boolean) = (a: A) => !predicate(a)

Now, for an existing predicate p we could get the complement by calling complement(p). However, sentByAnyOf is not a predicate, but it returns one, namely an EmailFilter.

Scala functions provide two composing functions that will help us here: Given two functions f and g, f.compose(g) returns a new function that, when called, will first call g and then apply f on the result of it. Similarly, f.andThen(g) returns a new function that, when called, will apply g to the result of f.

We can put this to use to create our notSentByAnyOf predicate without code duplication:

val notSentByAnyOf = sentByOneOf andThen(g => complement(g))

What this means is that we ask for a new function that first applies the sentByOneOf function to its arguments (a Set[String]) and then applies the complement function to the EmailFilter predicate returned by the former function. Using Scala’s placeholder syntax for anonymous functions, we could write this more concisely as:

val notSentByAnyOf = sentByOneOf andThen(complement(_))

Of course, you will now have noticed that, given a complement function, you could also implement the maximumSize predicate in terms of minimumSize instead of extracting a sizeConstraint function. However, the latter is more flexible, allowing you to specify arbitrary checks on the size of the mail body.

Composing predicates

Another problem with our email filters is that we can currently only pass a single EmailFilter to our newMailsForUser function. Certainly, our users want to configure multiple criteria. We need a way to create a composite predicate that returns true if either any, none or all of the predicates it consists of return true.

Here is one way to implement these functions:

def any[A](predicates: (A => Boolean)*): A => Boolean =
  a => predicates.exists(pred => pred(a))
def none[A](predicates: (A => Boolean)*) = complement(any(predicates: _*))
def every[A](predicates: (A => Boolean)*) = none(predicates.view.map(complement(_)): _*)

The any function returns a new predicate that, when called with an input a, checks if at least one of its predicates holds true for the value a. Our none function simply returns the complement of the predicate returned by any – if at least one predicate holds true, the condition for none is not satisfied. Finally, our every function works by checking that none of the complements to the predicates passed to it holds true.

We can now use this to create a composite EmailFilter that represents the user’s configuration:

val filter: EmailFilter = every(
    notSentByAnyOf(Set("johndoe@example.com")),
    minimumSize(100),
    maximumSize(10000)
  )

Composing a transformation pipeline

As another example of function composition, consider our example scenario again. As a freemail provider, we want not only to allow user’s to configure their email filter, but also do some processing on emails sent by our users. These are simple functions Email => Email. Some possible transformations are the following:

val addMissingSubject = (email: Email) =>
  if (email.subject.isEmpty) email.copy(subject = "No subject")
  else email
val checkSpelling = (email: Email) =>
  email.copy(text = email.text.replaceAll("your", "you're"))
val removeInappropriateLanguage = (email: Email) =>
  email.copy(text = email.text.replaceAll("dynamic typing", "**CENSORED**"))
val addAdvertismentToFooter = (email: Email) =>
  email.copy(text = email.text + "\nThis mail sent via Super Awesome Free Mail")

Now, depending on the weather and the mood of our boss, we can configure our pipeline as required, either by multiple andThen calls, or, having the same effect, by using the chain method defined on the Function companion object:

val pipeline = Function.chain(Seq(
  addMissingSubject,
  checkSpelling,
  removeInappropriateLanguage,
  addAdvertismentToFooter))

Higher-order functions and partial functions

I won’t go into detail here, but now that you know more about how you can compose or reuse functions by means of higher-order functions, you might want to have a look at partial functions again.

Chaining partial functions

In the article on pattern matching anonymous functions, I mentioned that partial functions can be used to create a nice alternative to the chain of responsibility pattern: The orElse method defined on the PartialFunction trait allows you to chain an arbitrary number of partial functions, creating a composite partial function. The first one, however, will only pass on to the next one if it isn’t defined for the given input. Hence, you can do something like this:

val handler = fooHandler orElse barHandler orElse bazHandler

Lifting partial functions

Also, sometimes a PartialFunction is not what you need. If you think about it, another way to represent the fact that a function is not defined for all input values is to have a standard function whose return type is an Option[A] – if the function is not defined for an input value, it will return None, otherwise a Some[A].

If that’s what you need in a certain context, given a PartialFunction named pf, you can call pf.lift to get the normal function returning an Option. If you have one of the latter and require a partial function, call Function.unlift(f).

Summary

In this article, we have seen the value of higher-order functions, which allow you to reuse existing functions in new, unforeseen contexts and compose them in a very flexible way. While in the examples, you didn’t save much in terms of lines of code, because the functions I showed were rather tiny, the real point is to illustrate the increase in flexibility. Also, composing and reusing functions is something that has benefits not only for small functions, but also on an architectural level.

In the next article, we will continue to examine ways to combine functions by means of partial function application and currying.

The Neophyte’s Guide to Scala Part 9: Promises and Futures in Practice

Jan 16th, 2013

In the previous article in this series, I introduced you to the Future type, its underlying paradigm, and how to put it to use to write highly readable and composable asynchronously executing code.

In that article, I also mentioned that Future is really only one piece of the puzzle: It’s a read-only type that allows you to work with the values it will compute and handle failure to do so in an elegant way. In order for you to be able to read a computed value from a Future, however, there must be a way for some other part of your software to put that value there. In this post, I will show you how this is done by means of the Promise type, followed by some guidance on how to use futures and promises in practice.

Promises

In the previous article on futures, we had a sequential block of code that we passed to the apply method of the Future companion object, and, given an ExecutionContext was in scope, it magically executed that code block asynchronously, returning its result as a Future.

While this is an easy way to get a Future when you want one, there is an alternative way to create Future instances and have them complete with a success or failure. Where Future provides an interface exclusively for querying, Promise is a companion type that allows you to complete a Future by putting a value into it. This can be done exactly once. Once a Promise has been completed, it’s not possible to change it any more.

A Promise instance is always linked to exactly one instance of Future. If you call the apply method of Future again in the REPL, you will indeed notice that the Future returned is a Promise, too:

import concurrent.Future
import concurrent.ExecutionContext.Implicits.global
val f: Future[String] = Future { "Hello world!" }
// REPL output: 
// f: scala.concurrent.Future[String] = scala.concurrent.impl.Promise$DefaultPromise@793e6657

The object you get back is a DefaultPromise, which implements both Future and Promise. This is an implementation detail, however. The Future and the Promise to which it belongs may very well be separate objects.

What this little example shows is that there is obviously no way to complete a Future other than through a Promise – the apply method on Future is just a nice helper function that shields you from this.

Now, let’s see how we can get our hands dirty and work with the Promise type directly.

Promising a rosy future

When talking about promises that may be fulfilled or not, an obvious example domain is that of politicians, elections, campaign pledges, and legislative periods.

Suppose the politicians that then got elected into office promised their voters a tax cut. This can be represented as a Promise[TaxCut], which you can create by calling the apply method on the Promise companion object, like so:

import concurrent.Promise
case class TaxCut(reduction: Int)
// either give the type as a type parameter to the factory method:
val taxcut = Promise[TaxCut]()
// or give the compiler a hint by specifying the type of your val:
val taxcut2: Promise[TaxCut] = Promise()

Once you have created a Promise, you can get the Future belonging to it by calling the future method on the Promise instance:

val taxcutF: Future[TaxCut] = taxcut.future

The returned Future might not be the same object as the Promise, but calling the future method of a Promise multiple times will definitely always return the same object to make sure the one-to-one relationship between a Promise and its Future is preserved.

Completing a Promise

Once you have made a Promise and told the world that you will deliver on it in the forseeable Future, you better do your very best to make it happen.

In Scala, you can complete a Promise either with a success or a failure.

Delivering on your Promise

To complete a Promise with a success, you call its success method, passing it the value that the Future associated with it is supposed to have:

taxcut.success(TaxCut(20))

Once you have done this, that Promise instance is no longer writable, and future attempts to do so will lead to an exception.

Also, completing your Promise like this leads to the successful completion of the associated Future. Any success or completion handlers on that future will now be called, or if, for instance, you are mapping that future, the mapping function will now be executed.

Usually, the completion of the Promise and the processing of the completed Future will not happen in the same thread. It’s more likely that you create your Promise, start computing its value in another thread and immediately return the uncompleted Future to the caller.

To illustrate, let’s do something like that for our taxcut promise:

object Government {
  def redeemCampaignPledge(): Future[TaxCut] = {
    val p = Promise[TaxCut]()
    Future {
      println("Starting the new legislative period.")
      Thread.sleep(2000)
      p.success(TaxCut(20))
      println("We reduced the taxes! You must reelect us!!!!1111")
    }
    p.future
  }
}

Please don’t get confused by the usage of the apply method of the Future companion object in this example. I’m just using it because it is so convenient for executing a block of code asynchronously. I could just as well have implemented the computation of the result (which involves a lot of sleeping) in a Runnable that is executed asynchrously by an ExecutorService, with a lot more boilerplate code. The point is that the Promise is not completed in the caller thread.

Let’s redeem our campaign pledge then and add an onComplete callback function to our future:

import scala.util.{Success, Failure}
val taxCutF: Future[TaxCut] = Government.redeemCampaignPledge()
  println("Now that they're elected, let's see if they remember their promises...")
  taxCutF.onComplete {
    case Success(TaxCut(reduction)) =>
      println(s"A miracle! They really cut our taxes by $reduction percentage points!")
    case Failure(ex) =>
      println(s"They broke their promises! Again! Because of a ${ex.getMessage}")
  }

If you try this out multiple times, you will see that the order of the console output is not deterministic. Eventually, the completion handler will be executed and run into the success case.

Breaking Promises like a sir

As a politician, you are pretty much used to not keeping your promises. As a Scala developer, you sometimes have no other choice, either. If that happens, you can still complete your Promise instance gracefully, by calling its failure method and passing it an exception:

case class LameExcuse(msg: String) extends Exception(msg)
object Government {
  def redeemCampaignPledge(): Future[TaxCut] = {
       val p = Promise[TaxCut]()
       Future {
         println("Starting the new legislative period.")
         Thread.sleep(2000)
         p.failure(LameExcuse("global economy crisis"))
         println("We didn't fulfill our promises, but surely they'll understand.")
       }
       p.future
     }
}

This implementation of the redeemCampaignPledge() method will to lots of broken promises. Once you have completed a Promise with the failure method, it is no longer writable, just as is the case with the success method. The associated Future will now be completed with a Failure, too, so the callback function above would run into the failure case.

If you already have a Try, you can also complete a Promise by calling its complete method. If the Try is a Success, the associated Future will be completed successfully, with the value inside the Success. If it’s a Failure, the Future will completed with that failure.

Future-based programming in practice

If you want to make use of the future-based paradigm in order to increase the scalability of your application, you have to design your application to be non-blocking from the ground-up, which basically means that the functions in all your application layers are asynchronous and return futures.

A likely use case these days is that of developing a web application. If you are using a modern Scala web framework, it will allow you to return your responses as something like a Future[Response] instead of blocking and then returning your finished Response. This is important since it allows your web server to handle a huge number of open connections with a relatively low number of threads. By always giving your web server Future[Response], you maximize the utilization of the web server’s dedicated thread pool.

In the end, a service in your application might make multiple calls to your database layer and/or some external web service, receiving multiple futures, and then compose them to return a new Future, all in a very readable for comprehension, such as the one you saw in the previous article. The web layer will turn such a Future into a Future[Response].

However, how do you implement this in practice? There are are three different cases you have to consider:

Non-blocking IO

Your application will most certainly involve a lot of IO. For example, your web application will have to talk to a database, and it might act as a client that is calling other web services.

If at all possible, make use of libraries that are based on Java’s non-blocking IO capabilities, either by using Java’s NIO API directly or through a library like Netty. Such libraries, too, can serve many connections with a reasonably-sized dedicated thread pool.

Developing such a library yourself is one of the few places where directly working with the Promise type makes a lot of sense.

Blocking IO

Sometimes, there is no NIO-based library available. For instance, most database drivers you’ll find in the Java world nowadays are using blocking IO. If you made a query to your database with such a driver in order to respond to a HTTP request, that call would be made on a web server thread. To avoid that, place all the code talking to the database inside a future block, like so:

// get back a Future[ResultSet] or something similar:
Future {
  queryDB(query)
}

So far, we always used the implicitly available global ExecutionContext to execute such future blocks. It’s probably a good idea to create a dedicated ExecutionContext that you will have in scope in your database layer.

You can create an ExecutionContext from a Java ExecutorService, which means you will be able to tune the thread pool for executing your database calls asynchronously independently from the rest of your application:

import java.util.concurrent.Executors
import concurrent.ExecutionContext
val executorService = Executors.newFixedThreadPool(4)
val executionContext = ExecutionContext.fromExecutorService(executorService)

Long-running computations

Depending on the nature of your application, it will occasionally have to call long-running tasks that don’t involve any IO at all, which means they are CPU-bound. These, too, should not be executed by a web server thread. Hence, you should turn them into Futures, too:

Future {
  longRunningComputation(data, moreData)
}

Again, if you have long-running computations, having them run in a separate ExecutionContext for CPU-bound tasks is a good idea. How to tune your various thread pools is highly dependent on your individual application and beyond the scope of this article.

Summary

In this article, we looked at promises, the writable part of the future-based concurrency paradigm, and how to use them to complete a Future, followed by some advice on how to put futures to use in practice.

In the next part of this series, we are taking a step back from concurrency issues and examine how functional programming in Scala can help you to make your code more reusable, a claim that has long been associated with object-oriented programming.

The Neophyte’s Guide to Scala Part 8: Welcome to the Future

Jan 9th, 2013

As an aspiring and enthusiastic Scala developer, you will likely have heard of Scala’s approach at dealing with concurrency – or maybe that was even what attracted you in the first place. Said approach makes reasoning about concurrency and writing well-behaved concurrent programs a lot easier than the rather low-level concurrency APIs you are confronted with in most other languages.

One of the two cornerstones of this approach is the Future, the other being the Actor. The former shall be the subject of this article. I will explain what futures are good for and how you can make use of them in a functional way.

Please make sure that you have version 2.9.3 or later if you want to get your hands dirty and try out the examples yourself. The futures we are discussing here were only incorporated into the Scala core distribution with the 2.10.0 release and later backported to Scala 2.9.3. Originally, with a slightly different API, they were part of the Akka concurrency toolkit.

Why sequential code can be bad

Suppose you want to prepare a cappuccino. You could simply execute the following steps, one after another:

Grind the required coffee beans
Heat some water
Brew an espresso using the ground coffee and the heated water
Froth some milk
Combine the espresso and the frothed milk to a cappuccino

Translated to Scala code, you would do something like this:

import scala.util.Try
// Some type aliases, just for getting more meaningful method signatures:
type CoffeeBeans = String
type GroundCoffee = String
case class Water(temperature: Int)
type Milk = String
type FrothedMilk = String
type Espresso = String
type Cappuccino = String
// dummy implementations of the individual steps:
def grind(beans: CoffeeBeans): GroundCoffee = s"ground coffee of $beans"
def heatWater(water: Water): Water = water.copy(temperature = 85)
def frothMilk(milk: Milk): FrothedMilk = s"frothed $milk"
def brew(coffee: GroundCoffee, heatedWater: Water): Espresso = "espresso"
def combine(espresso: Espresso, frothedMilk: FrothedMilk): Cappuccino = "cappuccino"
// some exceptions for things that might go wrong in the individual steps
// (we'll need some of them later, use the others when experimenting
// with the code):
case class GrindingException(msg: String) extends Exception(msg)
case class FrothingException(msg: String) extends Exception(msg)
case class WaterBoilingException(msg: String) extends Exception(msg)
case class BrewingException(msg: String) extends Exception(msg)
// going through these steps sequentially:
def prepareCappuccino(): Try[Cappuccino] = for {
  ground <- Try(grind("arabica beans"))
  water <- Try(heatWater(Water(25)))
  espresso <- Try(brew(ground, water))
  foam <- Try(frothMilk("milk"))
} yield combine(espresso, foam)

Doing it like this has several advantages: You get a very readable step-by-step instruction of what to do. Moreover, you will likely not get confused while preparing the cappuccino this way, since you are avoiding context switches.

On the downside, preparing your cappuccino in such a step-by-step manner means that your brain and body are on wait during large parts of the whole process. While waiting for the ground coffee, you are effectively blocked. Only when that’s finished, you’re able to start heating some water, and so on.

This is clearly a waste of valuable resources. It’s very likely that you would want to initiate multiple steps and have them execute concurrently. Once you see that the water and the ground coffee is ready, you’d start brewing the espresso, in the meantime already starting the process of frothing the milk.

It’s really no different when writing a piece of software. A web server only has so many threads for processing requests and creating appropriate responses. You don’t want to block these valuable threads by waiting for the results of a database query or a call to another HTTP service. Instead, you want an asynchronous programming model and non-blocking IO, so that, while the processing of one request is waiting for the response from a database, the web server thread handling that request can serve the needs of some other request instead of idling along.

“I heard you like callbacks, so I put a callback in your callback!”

Of course, you already knew all that - what with Node.js being all the rage among the cool kids for a while now. The approach used by Node.js and some others is to communicate via callbacks, exclusively. Unfortunately, this can very easily lead to a convoluted mess of callbacks within callbacks within callbacks, making your code hard to read and debug.

Scala’s Future allows callbacks, too, as you will see very shortly, but it provides much better alternatives, so it’s likely you won’t need them a lot.

“I know Futures, and they are completely useless!”

You might also be familiar with other Future implementations, most notably the one provided by Java. There is not really much you can do with a Java future other than checking if it’s completed or simply blocking until it is completed. In short, they are nearly useless and definitely not a joy to work with.

If you think that Scala’s futures are anything like that, get ready for a surprise. Here we go!

Semantics of Future

Scala’s Future[T], residing in the scala.concurrent package, is a container type, representing a computation that is supposed to eventually result in a value of type T. Alas, the computation might go wrong or time out, so when the future is completed, it may not have been successful after all, in which case it contains an exception instead.

Future is a write-once container – after a future has been completed, it is effectively immutable. Also, the Future type only provides an interface for reading the value to be computed. The task of writing the computed value is achieved via a Promise. Hence, there is a clear separation of concerns in the API design. In this post, we are focussing on the former, postponing the use of the Promise type to the next article in this series.

Working with Futures

There are several ways you can work with Scala futures, which we are going to examine by rewriting our cappuccino example to make use of the Future type. First, we need to rewrite all of the functions that can be executed concurrently so that they immediately return a Future instead of computing their result in a blocking way:

import scala.concurrent.future
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Random

def grind(beans: CoffeeBeans): Future[GroundCoffee] = Future {
  println("start grinding...")
  Thread.sleep(Random.nextInt(2000))
  if (beans == "baked beans") throw GrindingException("are you joking?")
  println("finished grinding...")
  s"ground coffee of $beans"
}

def heatWater(water: Water): Future[Water] = Future {
  println("heating the water now")
  Thread.sleep(Random.nextInt(2000))
  println("hot, it's hot!")
  water.copy(temperature = 85)
}

def frothMilk(milk: Milk): Future[FrothedMilk] = Future {
  println("milk frothing system engaged!")
  Thread.sleep(Random.nextInt(2000))
  println("shutting down milk frothing system")
  s"frothed $milk"
}

def brew(coffee: GroundCoffee, heatedWater: Water): Future[Espresso] = Future {
  println("happy brewing :)")
  Thread.sleep(Random.nextInt(2000))
  println("it's brewed!")
  "espresso"
}

There are several things that require an explanation here.

First off, there is the apply method on the Future companion object, that requires two arguments:

object Future {
  def apply[T](body: => T)(implicit execctx: ExecutionContext): Future[T]
}

The computation to be computed asynchronously is passed in as the body by-name parameter. The second argument, in its own argument list, is an implicit one, which means we don’t have to specify one if a matching implicit value is defined somewhere in scope. We make sure this is the case by importing the global execution context.

An ExecutionContext is something that can execute our future, and you can think of it as something like a thread pool. Since the ExecutionContext is available implicitly, we only have a single one-element argument list remaining. Single-argument lists can be enclosed with curly braces instead of parentheses. People often make use of this when calling the future method, making it look a little bit like we are using a feature of the language and not calling an ordinary method. The ExecutionContext is an implicit parameter for virtually all of the Future API.

Furthermore, of course, in this simple example, we don’t actually compute anything, which is why we are putting in some random sleep, simply for demonstration purposes. We also print to the console before and after our “computation” to make the non-deterministic and concurrent nature of our code clearer when trying it out.

The computation of the value to be returned by a Future will start at some non-deterministic time after that Future instance has been created, by some thread assigned to it by the ExecutionContext.

Callbacks

Sometimes, when things are simple, using a callback can be perfectly fine. Callbacks for futures are partial functions. You can pass a callback to the onSuccess method. It will only be called if the Future completes successfully, and if so, it receives the computed value as its input:

grind("arabica beans").onSuccess { case ground =>
  println("okay, got my ground coffee")
}

Similarly, you could register a failure callback with the onFailure method. Your callback will receive a Throwable, but it will only be called if the Future did not complete successfully.

Usually, it’s better to combine these two and register a completion callback that will handle both cases. The input parameter for that callback is a Try:

import scala.util.{Success, Failure}
grind("baked beans").onComplete {
  case Success(ground) => println(s"got my $ground")
  case Failure(ex) => println("This grinder needs a replacement, seriously!")
}

Since we are passing in baked beans, an exception occurs in the grind method, leading to the Future completing with a Failure.

Composing futures

Using callbacks can be quite painful if you have to start nesting callbacks. Thankfully, you don’t have to do that! The real power of the Scala futures is that they are composable.

If you have followed this series, you will have noticed that all the container types we discussed made it possible for you to map them, flat map them, or use them in for comprehensions and that I mentioned that Future is a container type, too. Hence, the fact that Scala’s Future type allows you to do all that will not come as a surprise at all.

The real question is: What does it really mean to perform these operations on something that hasn’t even finished computing yet?

Mapping the future

Haven’t you always wanted to be a traveller in time who sets out to map the future? As a Scala developer you can do exactly that! Suppose that once your water has heated you want to check if its temperature is okay. You can do so by mapping your Future[Water] to a Future[Boolean]:

val temperatureOkay: Future[Boolean] = heatWater(Water(25)).map { water =>
  println("we're in the future!")
  (80 to 85).contains(water.temperature)
}

The Future[Boolean] assigned to temperatureOkay will eventually contain the successfully computed boolean value. Go change the implementation of heatWater so that it throws an exception (maybe because your water heater explodes or something) and watch how we're in the future will never be printed to the console.

When you are writing the function you pass to map, you’re in the future, or rather in a possible future. That mapping function gets executed as soon as your Future[Water] instance has completed successfully. However, the timeline in which that happens might not be the one you live in. If your instance of Future[Water] fails, what’s taking place in the function you passed to map will never happen. Instead, the result of calling map will be a Future[Boolean] containing a Failure.

Keeping the future flat

If the computation of one Future depends on the result of another, you’ll likely want to resort to flatMap to avoid a deeply nested structure of futures.

For example, let’s assume that the process of actually measuring the temperature takes a while, so you want to determine whether the temperature is okay asynchronously, too. You have a function that takes an instance of Water and returns a Future[Boolean]:

def temperatureOkay(water: Water): Future[Boolean] = Future {
  (80 to 85).contains(water.temperature)
}

Use flatMap instead of map in order to get a Future[Boolean] instead of a Future[Future[Boolean]]:

val nestedFuture: Future[Future[Boolean]] = heatWater(Water(25)).map {
  water => temperatureOkay(water)
}
val flatFuture: Future[Boolean] = heatWater(Water(25)).flatMap {
  water => temperatureOkay(water)
}

Again, the mapping function is only executed after (and if) the Future[Water] instance has been completed successfully, hopefully with an acceptable temperature.

For comprehensions

Instead of calling flatMap, you’ll usually want to write a for comprehension, which is essentially the same, but reads a lot clearer. Our example above could be rewritten like this:

val acceptable: Future[Boolean] = for {
  heatedWater <- heatWater(Water(25))
  okay <- temperatureOkay(heatedWater)
} yield okay

If you have multiple computations that can be computed in parallel, you need to take care that you already create the corresponding Future instances outside of the for comprehension.

def prepareCappuccinoSequentially(): Future[Cappuccino] = {
  for {
    ground <- grind("arabica beans")
    water <- heatWater(Water(20))
    foam <- frothMilk("milk")
    espresso <- brew(ground, water)
  } yield combine(espresso, foam)
}

This reads nicely, but since a for comprehension is just another representation for nested flatMap calls, this means that the Future[Water] created in heatWater is only really instantiated after the Future[GroundCoffee] has completed successfully. You can check this by watching the sequential console output coming from the functions we implemented above.

Hence, make sure to instantiate all your independent futures before the for comprehension:

def prepareCappuccino(): Future[Cappuccino] = {
  val groundCoffee = grind("arabica beans")
  val heatedWater = heatWater(Water(20))
  val frothedMilk = frothMilk("milk")
  for {
    ground <- groundCoffee
    water <- heatedWater
    foam <- frothedMilk
    espresso <- brew(ground, water)
  } yield combine(espresso, foam)
}

Now, the three futures we create before the for comprehension start being completed immediately and execute concurrently. If you watch the console output, you will see that it’s non-deterministic. The only thing that’s certain is that the "happy brewing" output will come last. Since the method in which it is called requires the values coming from two other futures, it is only created inside our for comprehension, i.e. after those futures have completed successfully.

Failure projections

You will have noticed that Future[T] is success-biased, allowing you to use map, flatMap, filter etc. under the assumption that it will complete successfully. Sometimes, you may want to be able to work in this nice functional way for the timeline in which things go wrong. By calling the failed method on an instance of Future[T], you get a failure projection of it, which is a Future[Throwable]. Now you can map that Future[Throwable], for example, and your mapping function will only be executed if the original Future[T] has completed with a failure.

Outlook

You have seen the Future, and it looks bright! The fact that it’s just another container type that can be composed and used in a functional way makes working with it very pleasant.

Making blocking code concurrent can be pretty easy by wrapping it in a call to future. However, it’s better to be non-blocking in the first place. To achieve this, one has to make a Promise to complete a Future. This and using the futures in practice will be the topic of the next part of this series.

The Neophyte’s Guide to Scala Part 7: The Either Type

Jan 2nd, 2013

In the previous article, I discussed functional error handling using Try, which was introduced in Scala 2.10. I also mentioned the existence of another, somewhat similar type called Either, which is the subject of this article. You will learn how to use it, when to use it, and what its particular pitfalls are.

Speaking of which, at least at the time of this writing, Either has some serious design flaws you need to be aware of, so much so that one might argue about whether to use it at all. So why then should you learn about Either at all?

For one, people will not all migrate their existing code bases to use Try for dealing with exceptions, so it is good to be able to understand the intricacies of this type, too.

Moreover, Try is not really an all-out replacement for Either, only for one particular usage of it, namely handling exceptions in a functional way. As it stands, Try and Either really complement each other, each covering different use cases. And, as flawed as Either may be, in certain situations it will still be a very good fit.

The semantics

Like Option and Try, Either is a container type. Unlike the aforementioned types, it takes not only one, but two type parameters: An Either[A, B] instance can contain either an instance of A, or an instance of B. This is different from a Tuple2[A, B], which contains both an A and a B instance.

Either has exactly two sub types, Left and Right. If an Either[A, B] object contains an instance of A, then the Either is a Left. Otherwise it contains an instance of B and is a Right.

There is nothing in the semantics of this type that specifies one or the other sub type to represent an error or a success, respectively. In fact, Either is a general-purpose type for use whenever you need to deal with situations where the result can be of one of two possible types. Nevertheless, error handling is a popular use case for it, and by convention, when using it that way, the Left represents the error case, whereas the Right contains the success value.

Creating an Either

Creating an instance of Either is trivial. Both Left and Right are case classes, so if we want to implement a rock-solid internet censorship feature, we can just do the following:

import scala.io.Source
import java.net.URL
def getContent(url: URL): Either[String, Source] =
  if (url.getHost.contains("google"))
    Left("Requested URL is blocked for the good of the people!")
  else
    Right(Source.fromURL(url))

Now, if we call getContent(new URL("http://danielwestheide.com")), we will get a scala.io.Source wrapped in a Right. If we pass in new URL("https://plus.google.com"), the return value will be a Left containing a String.

Working with Either values

Some of the very basic stuff works just as you know from Option or Try: You can ask an instance of Either if it isLeft or isRight. You can also do pattern matching on it, which is one of the most familiar and convenient ways of working with objects of this type:

getContent(new URL("http://google.com")) match {
  case Left(msg) => println(msg)
  case Right(source) => source.getLines.foreach(println)
}

Projections

You cannot, at least not directly, use an Either instance like a collection, the way you are familiar with from Option and Try. This is because Either is designed to be unbiased.

Try is success-biased: it offers you map, flatMap and other methods that all work under the assumption that the Try is a Success, and if that’s not the case, they effectively don’t do anything, returning the Failure as-is.

The fact that Either is unbiased means that you first have to choose whether you want to work under the assumption that it is a Left or a Right. By calling left or right on an Either value, you get a LeftProjection or RightProjection, respectively, which are basically left- or right-biased wrappers for the Either.

Mapping

Once you have a projection, you can call map on it:

val content: Either[String, Iterator[String]] =
  getContent(new URL("http://danielwestheide.com")).right.map(_.getLines())
// content is a Right containing the lines from the Source returned by getContent
val moreContent: Either[String, Iterator[String]] =
  getContent(new URL("http://google.com")).right.map(_.getLines)
// moreContent is a Left, as already returned by getContent

Regardless of whether the Either[String, Source] in this example is a Left or a Right, it will be mapped to an Either[String, Iterator[String]]. If it’s called on a Right, the value inside it will be transformed. If it’s a Left, that will be returned unchanged.

We can do the same with a LeftProjection, of course:

val content: Either[Iterator[String], Source] =
  getContent(new URL("http://danielwestheide.com")).left.map(Iterator(_))
// content is the Right containing a Source, as already returned by getContent
val moreContent: Either[Iterator[String], Source] =
  getContent(new URL("http://google.com")).left.map(Iterator(_))
// moreContent is a Left containing the msg returned by getContent in an Iterator

Now, if the Either is a Left, its wrapped value is transformed, whereas a Right would be returned unchanged. Either way, the result is of type Either[Iterator[String], Source].

Please note that the map method is defined on the projection types, not on Either, but it does return a value of type Either, not a projection. In this, Either deviates from the other container types you know. The reason for this has to do with Either being unbiased, but as you will see, this can lead to some very unpleasant problems in certain cases. It also means that if you want to chain multiple calls to map, flatMap and the like, you always have to ask for your desired projection again in between.

Flat mapping

Projections also support flat mapping, avoiding the common problem of creating a convoluted structure of multiple inner and outer Either types that you will end up with if you nest multiple calls to map.

I’m putting very high requirements on your suspension of disbelief now, coming up with a completely contrived example. Let’s say we want to calculate the average number of lines of two of my articles. You’ve always wanted to do that, right? Here’s how we could solve this challenging problem:

val part5 = new URL("http://t.co/UR1aalX4")
val part6 = new URL("http://t.co/6wlKwTmu")
val content = getContent(part5).right.map(a =>
  getContent(part6).right.map(b =>
    (a.getLines().size + b.getLines().size) / 2))

What we’ll end up with is an Either[String, Either[String, Int]]. Now, content being a nested structure of Rights, we could flatten it by calling the joinRight method on it (you also have joinLeft available to flatten a nested structure of Lefts).

However, we can avoid creating this nested structure altogether. If we flatMap on our outer RightProjection, we get a more pleasant result type, unpacking the Right of the inner Either:

val content = getContent(part5).right.flatMap(a =>
  getContent(part6).right.map(b =>
    (a.getLines().size + b.getLines().size) / 2))

Now content is a flat Either[String, Int], which makes it a lot nicer to work with, for example using pattern matching.

For comprehensions

By now, you have probably learned to love working with for comprehensions in a consistent way on various different data types. You can do that, too, with projections of Either, but the sad truth is that it’s not quite as nice, and there are things that you won’t be able to do without resorting to ugly workarounds, out of the box.

Let’s rewrite our flatMap example, making use of for comprehensions instead:

def averageLineCount(url1: URL, url2: URL): Either[String, Int] =
  for {
    source1 <- getContent(url1).right
    source2 <- getContent(url2).right
  } yield (source1.getLines().size + source2.getLines().size) / 2

This is not too bad. Note that we have to call right on each Either we use in our generators, of course.

Now, let’s try to refactor this for comprehension – since the yield expression is a little too involved, we want to extract some parts of it into value definitions inside our for comprehension:

def averageLineCountWontCompile(url1: URL, url2: URL): Either[String, Int] =
  for {
    source1 <- getContent(url1).right
    source2 <- getContent(url2).right
    lines1 = source1.getLines().size
    lines2 = source2.getLines().size
  } yield (lines1 + lines2) / 2

This won’t compile! The reason will become clearer if we examine what this for comprehension corresponds to, if you take away the sugar. It translates to something that is similar to the following, albeit much less readable:

def averageLineCountDesugaredWontCompile(url1: URL, url2: URL): Either[String, Int] =
  getContent(url1).right.flatMap { source1 =>
    getContent(url2).right.map { source2 =>
      val lines1 = source1.getLines().size
      val lines2 = source2.getLines().size
      (lines1, lines2)
    }.map { case (x, y) => x + y / 2 }
  }

The problem is that by including a value definition in our for comprehension, a new call to map is introduced automatically – on the result of the previous call to map, which has returned an Either, not a RightProjection. As you know, Either doesn’t define a map method, making the compiler a little bit grumpy.

This is where Either shows us its ugly trollface. In this example, the value definitions are not strictly necessary. If they are, you can work around this problem, replacing any value definitions by generators, like this:

def averageLineCount(url1: URL, url2: URL): Either[String, Int] =
  for {
    source1 <- getContent(url1).right
    source2 <- getContent(url2).right
    lines1 <- Right(source1.getLines().size).right
    lines2 <- Right(source2.getLines().size).right
  } yield (lines1 + lines2) / 2

It’s important to be aware of these design flaws. They don’t make Either unusable, but can lead to serious headaches if you don’t have a clue what’s going on.

Other methods

The projection types have some other useful methods:

You can convert your Either instance to an Option by calling toOption on one of its projections. For example, if you have an e of type Either[A, B], e.right.toOption will return an Option[B]. If your Either[A, B] instance is a Right, that Option[B] will be a Some. If it’s a Left, it will be None. The reverse behaviour can be achieved, of course, when calling toOption on the LeftProjection of your Either[A, B]. If you need a sequence of either one value or none, use toSeq instead.

Folding

If you want to transform an Either value regardless of whether it is a Left or a Right, you can do so by means of the fold method that is defined on Either, expecting two transform functions with the same result type, the first one being called if the Either is a Left, the second one if it’s a Right.

To demonstrate this, let’s combine the two mapping operations we implemented on the LeftProjection and the RightProjection above:

val content: Iterator[String] =
  getContent(new URL("http://danielwestheide.com")).fold(Iterator(_), _.getLines())
val moreContent: Iterator[String] =
  getContent(new URL("http://google.com")).fold(Iterator(_), _.getLines())

In this example, we are transforming our Either[String, Source] into an Iterator[String], no matter if it’s a Left or a Right. You could just as well return a new Either again or execute side-effects and return Unit from your two functions. As such, calling fold provides a nice alternative to pattern matching.

When to use Either

Now that you have seen how to work with Either values and what you have to take care of, let’s move on to some specific use cases.

Error handling

You can use Either for exception handling very much like Try. Either has one advantage over Try: you can have more specific error types at compile time, while Try uses Throwable all the time. This means that Either can be a good choice for expected errors.

You’d have to implement a method like this, delegating to the very useful Exception object from the scala.util.control package:

import scala.util.control.Exception.catching
def handling[Ex <: Throwable, T](exType: Class[Ex])(block: => T): Either[Ex, T] =
  catching(exType).either(block).asInstanceOf[Either[Ex, T]]

The reason you might want to do that is because while the methods provided by scala.util.Exception allow you to catch only certain types of exceptions, the resulting compile-time error type is always Throwable.

With this method at hand, you can pass along expected exceptions in an Either:

import java.net.MalformedURLException
def parseURL(url: String): Either[MalformedURLException, URL] =
  handling(classOf[MalformedURLException])(new URL(url))

You will have other expected error conditions, and not all of them result in third-party code throwing an exception you need to handle, as in the example above. In these cases, there is really no need to throw an exception yourself, only to catch it and wrap it in a Left. Instead, simply define your own error type, preferably as a case class, and return a Left wrapping an instance of that error type in your expected error condition occurs.

Here is an example:

case class Customer(age: Int)
class Cigarettes
case class UnderAgeFailure(age: Int, required: Int)
def buyCigarettes(customer: Customer): Either[UnderAgeFailure, Cigarettes] =
  if (customer.age < 16) Left(UnderAgeFailure(customer.age, 16))
  else Right(new Cigarettes)

You should avoid using Either for wrapping unexpected exceptions. Try does that better, without all the flaws you have to deal with when working with Either.

Processing collections

Generally, Either is a pretty good fit if you want to process a collection, where for some items in that collection, this might result in a condition that is problematic, but should not directly result in an exception, which would result in aborting the processing of the rest of the collection.

Let’s assume that for our industry-standard web censorship system, we are using some kind of black list:

type Citizen = String
case class BlackListedResource(url: URL, visitors: Set[Citizen])

val blacklist = List(
  BlackListedResource(new URL("https://google.com"), Set("John Doe", "Johanna Doe")),
  BlackListedResource(new URL("http://yahoo.com"), Set.empty),
  BlackListedResource(new URL("https://maps.google.com"), Set("John Doe")),
  BlackListedResource(new URL("http://plus.google.com"), Set.empty)
)

A BlackListedResource represents the URL of a black-listed web page plus the citizens who have tried to visit that page.

Now we want to process this black list, where our main purpose is to identify problematic citizens, i.e. those that have tried to visit blocked pages. At the same time, we want to identify suspicous web pages – if not a single citizen has tried to visit a black-listed page, we must assume that our subjects are bypassing our filter somehow, and we need to investigate that.

Here is how we can process our black list:

val checkedBlacklist: List[Either[URL, Set[Citizen]]] =
  blacklist.map(resource =>
    if (resource.visitors.isEmpty) Left(resource.url)
    else Right(resource.visitors))

We have created a sequence of Either values, with the Left instances representing suspicious URLs and the Right ones containing sets of problem citizens. This makes it almost a breeze to identify both our problem citizens and our suspicious web pages:

val suspiciousResources = checkedBlacklist.flatMap(_.left.toOption)
val problemCitizens = checkedBlacklist.flatMap(_.right.toOption).flatten.toSet

These more general use cases beyond exception handling are where Either really shines.

Summary

You have learned how to make use of Either, what its pitfalls are, and when to put it to use in your code. It’s a type that is not without flaws, and whether you want to have to deal with them and incorporate it in your own code is ultimately up to you.

In practice, you will notice that, now that we have Try at our hands, there won’t be terribly many use cases for it. Nevertheless, it’s good to know about it, both for those situations where it will be your perfect tool and to understand pre-2.10 Scala code you come across where it’s used for error handling.

The Neophyte’s Guide to Scala Part 6: Error Handling With Try

Dec 26th, 2012

When just playing around with a new language, you might get away with simply ignoring the fact that something might go wrong. As soon you want to create anything serious, though, you can no longer run away from handling errors and exceptions in your code. The importance of how well a language supports you in doing so is often underestimated, for some reason or another.

Scala, as it turns out, is pretty well positioned when it comes to dealing with error conditions in an elegant way. In this article, I’m going to present Scala’s approach to dealing with errors, based on the Try type, and the rationale behind it. I’m using features introduced with Scala 2.10 and ported back to Scala 2.9.3, so make sure your Scala version in SBT is at 2.9.3 or later.

Throwing and catching exceptions

Before going straight to Scala’s idiomatic approach at error handling, let’s first have a look at an approach that is more akin to how you are used to working with error conditions if you come from languages like Java or Ruby. Like these languages, Scala allows you to throw an exception:

case class Customer(age: Int)
class Cigarettes
case class UnderAgeException(message: String) extends Exception(message)
def buyCigarettes(customer: Customer): Cigarettes =
  if (customer.age < 16)
    throw UnderAgeException(s"Customer must be older than 16 but was ${customer.age}")
  else new Cigarettes

Thrown exceptions can be caught and dealt with very similarly to Java, albeit using a partial function to specify the exceptions we want to deal with. Also, Scala’s try/catch is an expression, so the following code returns the message of the exception:

val youngCustomer = Customer(15)
try {
  buyCigarettes(youngCustomer)
  "Yo, here are your cancer sticks! Happy smokin'!"
} catch {
    case UnderAgeException(msg) => msg
}

Error handling, the functional way

Now, having this kind of exception handling code all over your code base can become ugly very quickly and doesn’t really go well with functional programming. It’s also a rather bad solution for applications with a lot of concurrency. For instance, if you need to deal with an exception thrown by an Actor that is executed on some other thread, you obviously cannot do that by catching that exception – you will want a possibility to receive a message denoting the error condition.

Hence, in Scala, it’s usually preferred to signify that an error has occurred by returning an appropriate value from your function.

Don’t worry, we are not going back to C-style error handling, using error codes that we need to check for by convention. Rather, in Scala, we are using a specific type that represents computations that may result in an exception.

In this article, we are confining ourselves to the Try type that was introduced in Scala 2.10 and later backported to Scala 2.9.3. There is also a similar type, called Either, which, even after the introduction of Try, can still be very useful, but is more general.

The semantics of Try

The semantics of Try are best explained by comparing them to those of the Option type that was the topic of the previous part of this series.

Where Option[A] is a container for a value of type A that may be present or not, Try[A] represents a computation that may result in a value of type A, if it is successful, or in some Throwable if something has gone wrong. Instances of such a container type for possible errors can easily be passed around between concurrently executing parts of your application.

There are two different types of Try: If an instance of Try[A] represents a successful computation, it is an instance of Success[A], simply wrapping a value of type A. If, on the other hand, it represents a computation in which an error has occurred, it is an instance of Failure[A], wrapping a Throwable, i.e. an exception or other kind of error.

If we know that a computation may result in an error, we can simply use Try[A] as the return type of our function. This makes the possibility explicit and forces clients of our function to deal with the possibility of an error in some way.

For example, let’s assume we want to write a simple web page fetcher. The user will be able to enter the URL of the web page they want to fetch. One part of our application will be a function that parses the entered URL and creates a java.net.URL from it:

import scala.util.Try
import java.net.URL
def parseURL(url: String): Try[URL] = Try(new URL(url))

As you can see, we return a value of type Try[URL]. If the given url is syntactically correct, this will be a Success[URL]. If the URL constructor throws a MalformedURLException, however, it will be a Failure[URL].

To achieve this, we are using the apply factory method on the Try companion object. This method expects a by-name parameter of type A (here, URL). For our example, this means that the new URL(url) is executed inside the apply method of the Try object. Inside that method, non-fatal exceptions are caught, returning a Failure containing the respective exception.

Hence, parseURL("http://danielwestheide.com") will result in a Success[URL] containing the created URL, whereas parseURL("garbage") will result in a Failure[URL] containing a MalformedURLException.

Working with Try values

Working with Try instances is actually very similar to working with Option values, so you won’t see many surprises here.

You can check if a Try is a success by calling isSuccess on it and then conditionally retrieve the wrapped value by calling get on it. But believe me, there aren’t many situations where you will want to do that.

It’s also possible to use getOrElse to pass in a default value to be returned if the Try is a Failure:

val url = parseURL(Console.readLine("URL: ")) getOrElse new URL("http://duckduckgo.com")

If the URL given by the user is malformed, we use the URL of DuckDuckGo as a fallback.

Chaining operations

One of the most important characteristics of the Try type is that, like Option, it supports all the higher-order methods you know from other types of collections. As you will see in the examples to follow, this allows you to chain operations on Try values and catch any exceptions that might occur, and all that in a very readable manner.

Mapping and flat mapping

Mapping a Try[A] that is a Success[A] to a Try[B] results in a Success[B]. If it’s a Failure[A], the resulting Try[B] will be a Failure[B], on the other hand, containing the same exception as the Failure[A]:

parseURL("http://danielwestheide.com").map(_.getProtocol)
// results in Success("http")
parseURL("garbage").map(_.getProtocol)
// results in Failure(java.net.MalformedURLException: no protocol: garbage)

If you chain multiple map operations, this will result in a nested Try structure, which is usually not what you want. Consider this method that returns an input stream for a given URL:

import java.io.InputStream
def inputStreamForURL(url: String): Try[Try[Try[InputStream]]] = parseURL(url).map { u =>
  Try(u.openConnection()).map(conn => Try(conn.getInputStream))
}

Since the anonymous functions passed to the two map calls each return a Try, the return type is a Try[Try[Try[InputStream]]].

This is where the fact that you can flatMap a Try comes in handy. The flatMap method on a Try[A] expects to be passed a function that receives an A and returns a Try[B]. If our Try[A] instance is already a Failure[A], that failure is returned as a Failure[B], simply passing along the wrapped exception along the chain. If our Try[A] is a Success[A], flatMap unpacks the A value in it and maps it to a Try[B] by passing this value to the mapping function.

This means that we can basically create a pipeline of operations that require the values carried over in Success instances by chaining an arbitrary number of flatMap calls. Any exceptions that happen along the way are wrapped in a Failure, which means that the end result of the chain of operations is a Failure, too.

Let’s rewrite the inputStreamForURL method from the previous example, this time resorting to flatMap:

def inputStreamForURL(url: String): Try[InputStream] = parseURL(url).flatMap { u =>
  Try(u.openConnection()).flatMap(conn => Try(conn.getInputStream))
}

Now we get a Try[InputStream], which can be a Failure wrapping an exception from any of the stages in which one may be thrown, or a Success that directly wraps the InputStream, the final result of our chain of operations.

Filter and foreach

Of course, you can also filter a Try or call foreach on it. Both work exactly as you would expect after having learned about Option.

The filter method returns a Failure if the Try on which it is called is already a Failure or if the predicate passed to it returns false (in which case the wrapped exception is a NoSuchElementException). If the Try on which it is called is a Success and the predicate returns true, that Succcess instance is returned unchanged:

def parseHttpURL(url: String) = parseURL(url).filter(_.getProtocol == "http")
parseHttpURL("http://apache.openmirror.de") // results in a Success[URL]
parseHttpURL("ftp://mirror.netcologne.de/apache.org") // results in a Failure[URL]

The function passed to foreach is executed only if the Try is a Success, which allows you to execute a side-effect. The function passed to foreach is executed exactly once in that case, being passed the value wrapped by the Success:

parseHttpURL("http://danielwestheide.com").foreach(println)

For comprehensions

The support for flatMap, map and filter means that you can also use for comprehensions in order to chain operations on Try instances. Usually, this results in more readable code. To demonstrate this, let’s implement a method that returns the content of a web page with a given URL using for comprehensions.

import scala.io.Source
def getURLContent(url: String): Try[Iterator[String]] =
  for {
    url <- parseURL(url)
    connection <- Try(url.openConnection())
    is <- Try(connection.getInputStream)
    source = Source.fromInputStream(is)
  } yield source.getLines()

There are three places where things can go wrong, all of them covered by usage of the Try type. First, the already implemented parseURL method returns a Try[URL]. Only if this is a Success[URL], we will try to open a connection and create a new input stream from it. If opening the connection and creating the input stream succeeds, we continue, finally yielding the lines of the web page. Since we effectively chain multiple flatMap calls in this for comprehension, the result type is a flat Try[Iterator[String]].

Please note that this could be simplified using Source#fromURL and that we fail to close our input stream at the end, both of which are due to my decision to keep the example focussed on getting across the subject matter at hand.

Pattern Matching

At some point in your code, you will often want to know whether a Try instance you have received as the result of some computation represents a success or not and execute different code branches depending on the result. Usually, this is where you will make use of pattern matching. This is easily possible because both Success and Failure are case classes.

We want to render the requested page if it could be retrieved, or print an error message if that was not possible:

import scala.util.Success
import scala.util.Failure
getURLContent("http://danielwestheide.com/foobar") match {
  case Success(lines) => lines.foreach(println)
  case Failure(ex) => println(s"Problem rendering URL content: ${ex.getMessage}")
}

Recovering from a Failure

If you want to establish some kind of default behaviour in the case of a Failure, you don’t have to use getOrElse. An alternative is recover, which expects a partial function and returns another Try. If recover is called on a Success instance, that instance is returned as is. Otherwise, if the partial function is defined for the given Failure instance, its result is returned as a Success.

Let’s put this to use in order to print a different message depending on the type of the wrapped exception:

import java.net.MalformedURLException
import java.io.FileNotFoundException
val content = getURLContent("garbage") recover {
  case e: FileNotFoundException => Iterator("Requested page does not exist")
  case e: MalformedURLException => Iterator("Please make sure to enter a valid URL")
  case _ => Iterator("An unexpected error has occurred. We are so sorry!")
}

We could now safely get the wrapped value on the Try[Iterator[String]] that we assigned to content, because we know that it must be a Success. Calling content.get.foreach(println) would result in Please make sure to enter a valid URL being printed to the console.

Conclusion

Idiomatic error handling in Scala is quite different from the paradigm known from languages like Java or Ruby. The Try type allows you to encapsulate computations that result in errors in a container and to chain operations on the computed values in a very elegant way. You can transfer what you know from working with collections and with Option values to how you deal with code that may result in errors – all in a uniform way.

To keep this article at a reasonable length, I haven’t explained all of the methods available on Try. Like Option, Try supports the orElse method. The transform and recoverWith methods are also worth having a look at, and I encourage you to do so.

In the next part we are going to deal with Either, an alternative type for representing computations that may result in errors, but with a wider scope of application that goes beyond error handling.

The Neophyte’s Guide to Scala Part 5: The Option Type

Dec 19th, 2012

For the last couple of weeks, we have pressed ahead and covered a lot of ground concerning some rather advanced techniques, particularly ones related to pattern matching and extractors. Time to shift down a gear and look at one of the more fundamental idiosyncrasies of Scala: the Option type.

If you have participated in the Scala course at Coursera, you have already received a brief introduction to this type and seen it in use in the Map API. In this series, we have also used it when implementing our own extractors.

And yet, there is still a lot left to be explained about it. You may have wondered what all the fuss is about, what is so much better about options than other ways of dealing with absent values. You might also be at a loss how to actually work with the Option type in your own code. The goal of this part of the series is to do away with all these question marks and teach you all you really need to know about Option as an aspiring Scala novice.

The basic idea

If you have worked with Java at all in the past, it is very likely that you have come across a NullPointerException at some time (other languages will throw similarly named errors in such a case). Usually this happens because some method returns null when you were not expecting it and thus not dealing with that possibility in your client code. A value of null is often abused to represent an absent optional value.

Some languages treat null values in a special way or allow you to work safely with values that might be null. For instance, Groovy has the null-safe operator for accessing properties, so that foo?.bar?.baz will not throw an exception if either foo or its bar property is null, instead directly returning null. However, you are screwed if you forget to use this operator, and nothing forces you to do so.

Clojure basically treats its nil value like an empty thing, i.e. like an empty list if accessed like a list, or like an empty map if accessed like a map. This means that the nil value is bubbling up the call hierarchy. Very often this is okay, but sometimes this just leads to an exception much higher in the call hierchary, where some piece of code isn’t that nil-friendly after all.

Scala tries to solve the problem by getting rid of null values altogether and providing its own type for representing optional values, i.e. values that may be present or not: the Option[A] trait.

Option[A] is a container for an optional value of type A. If the value of type A is present, the Option[A] is an instance of Some[A], containing the present value of type A. If the value is absent, the Option[A] is the object None.

By stating that a value may or may not be present on the type level, you and any other developers who work with your code are forced by the compiler to deal with this possibility. There is no way you may accidentally rely on the presence of a value that is really optional.

Option is mandatory! Do not use null to denote that an optional value is absent.

Creating an option

Usually, you can simply create an Option[A] for a present value by directly instantiating the Some case class:

val greeting: Option[String] = Some("Hello world")

Or, if you know that the value is absent, you simply assign or return the None object:

val greeting: Option[String] = None

However, time and again you will need to interoperate with Java libraries or code in other JVM languages that happily make use of null to denote absent values. For this reason, the Option companion object provides a factory method that creates None if the given parameter is null, otherwise the parameter wrapped in a Some:

val absentGreeting: Option[String] = Option(null) // absentGreeting will be None
val presentGreeting: Option[String] = Option("Hello!") // presentGreeting will be Some("Hello!")

Working with optional values

This is all pretty neat, but how do you actually work with optional values? It’s time for an example. Let’s do something boring, so we can focus on the important stuff.

Imagine you are working for one of those hipsterrific startups, and one of the first things you need to implement is a repository of users. We need to be able to find a user by their unique id. Sometimes, requests come in with bogus ids. This calls for a return type of Option[User] for our finder method. A dummy implementation of our user repository might look like this:

case class User(
  id: Int,
  firstName: String,
  lastName: String,
  age: Int,
  gender: Option[String])

object UserRepository {
  private val users = Map(1 -> User(1, "John", "Doe", 32, Some("male")),
                          2 -> User(2, "Johanna", "Doe", 30, None))
  def findById(id: Int): Option[User] = users.get(id)
  def findAll = users.values
}

Now, if you received an instance of Option[User] from the UserRepository and need to do something with it, how do you do that?

One way would be to check if a value is present by means of the isDefined method of your option, and, if that is the case, get that value via its get method:

val user1 = UserRepository.findById(1)
if (user1.isDefined) {
  println(user1.get.firstName)
} // will print "John"

This is very similar to how the Optional type in the Guava library is used in Java. If you think this is clunky and expect something more elegant from Scala, you’re on the right track. More importantly, if you use get, you might forget about checking with isDefined before, leading to an exception at runtime, so you haven’t gained a lot over using null.

You should stay away from this way of accessing options whenever possible!

Providing a default value

Very often, you want to work with a fallback or default value in case an optional value is absent. This use case is covered pretty well by the getOrElse method defined on Option:

val user = User(2, "Johanna", "Doe", 30, None)
println("Gender: " + user.gender.getOrElse("not specified")) // will print "not specified"

Please note that the default value you can specify as a parameter to the getOrElse method is a by-name parameter, which means that it is only evaluated if the option on which you invoke getOrElse is indeed None. Hence, there is no need to worry if creating the default value is costly for some reason or another – this will only happen if the default value is actually required.

Pattern matching

Some is a case class, so it is perfectly possible to use it in a pattern, be it in a regular pattern matching expression or in some other place where patterns are allowed. Let’s rewrite the example above using pattern matching:

val user = User(2, "Johanna", "Doe", 30, None)
user.gender match {
  case Some(gender) => println("Gender: " + gender)
  case None => println("Gender: not specified")
}

Or, if you want to remove the duplicated println statement and make use of the fact that you are working with a pattern matching expression:

val user = User(2, "Johanna", "Doe", 30, None)
val gender = user.gender match {
  case Some(gender) => gender
  case None => "not specified"
}
println("Gender: " + gender)

You will hopefully have noticed that pattern matching on an Option instance is rather verbose, which is also why it is usually not idiomatic to process options this way. So, even if you are all excited about pattern matching, try to use the alternatives when working with options.

There is one quite elegant way of using patterns with options, which you will learn about in the section on for comprehensions, below.

Options can be viewed as collections

So far you haven’t seen a lot of elegant or idiomatic ways of working with options. We are coming to that now.

I already mentioned that Option[A] is a container for a value of type A. More precisely, you may think of it as some kind of collection – some special snowflake of a collection that contains either zero elements or exactly one element of type A. This is a very powerful idea!

Even though on the type level, Option is not a collection type in Scala, options come with all the goodness you have come to appreciate about Scala collections like List, Set etc – and if you really need to, you can even transform an option into a List, for instance.

So what does this allow you to do?

Performing a side-effect if a value is present

If you need to perform some side-effect only if a specific optional value is present, the foreach method you know from Scala’s collections comes in handy:

UserRepository.findById(2).foreach(user => println(user.firstName)) // prints "Johanna"

The function passed to foreach will be called exactly once, if the Option is a Some, or never, if it is None.

Mapping an option

The really good thing about options behaving like a collection is that you can work with them in a very functional way, and the way you do that is exactly the same as for lists, sets etc.

Just as you can map a List[A] to a List[B], you can map an Option[A] to an Option[B]. This means that if your instance of Option[A] is defined, i.e. it is Some[A], the result is Some[B], otherwise it is None.

If you compare Option to List, None is the equivalent of an empty list: when you map an empty List[A], you get an empty List[B], and when you map an Option[A] that is None, you get an Option[B] that is None.

Let’s get the age of an optional user:

val age = UserRepository.findById(1).map(_.age) // age is Some(32)

flatMap and options

Let’s do the same for the gender:

val gender = UserRepository.findById(1).map(_.gender) // gender is an Option[Option[String]]

The type of the resulting gender is Option[Option[String]]. Why is that?

Think of it like this: You have an Option container for a User, and inside that container you are mapping the User instance to an Option[String], since that is the type of the gender property on our User class.

These nested options are a nuisance? Why, no problem, like all collections, Option also provides a flatMap method. Just like you can flatMap a List[List[A]] to a List[B], you can do the same for an Option[Option[A]]:

val gender1 = UserRepository.findById(1).flatMap(_.gender) // gender is Some("male")
val gender2 = UserRepository.findById(2).flatMap(_.gender) // gender is None
val gender3 = UserRepository.findById(3).flatMap(_.gender) // gender is None

The result type is now Option[String]. If the user is defined and its gender is defined, we get it as a flattened Some. If either the use or its gender is undefined, we get a None.

To understand how this works, let’s have a look at what happens when flat mapping a list of lists of strings, always keeping in mind that an Option is just a collection, too, like a List:

val names: List[List[String]] =
  List(List("John", "Johanna", "Daniel"), List(), List("Doe", "Westheide"))
names.map(_.map(_.toUpperCase))
// results in List(List("JOHN", "JOHANNA", "DANIEL"), List(), List("DOE", "WESTHEIDE"))
names.flatMap(_.map(_.toUpperCase))
// results in List("JOHN", "JOHANNA", "DANIEL", "DOE", "WESTHEIDE")

If we use flatMap, the mapped elements of the inner lists are converted into a single flat list of strings. Obviously, nothing will remain of any empty inner lists.

To lead us back to the Option type, consider what happens if you map a list of options of strings:

val names: List[Option[String]] = List(Some("Johanna"), None, Some("Daniel"))
names.map(_.map(_.toUpperCase)) // List(Some("JOHANNA"), None, Some("DANIEL"))
names.flatMap(xs => xs.map(_.toUpperCase)) // List("JOHANNA", "DANIEL")

If you just map over the list of options, the result type stays List[Option[String]]. Using flatMap, all elements of the inner collections are put into a flat list: The one element of any Some[String] in the original list is unwrapped and put into the result list, whereas any None value in the original list does not contain any element to be unwrapped. Hence, None values are effectively filtered out.

With this in mind, have a look again at what flatMap does on the Option type.

Filtering an option

You can filter an option just like you can filter a list. If the instance of Option[A] is defined, i.e. it is a Some[A], and the predicate passed to filter returns true for the wrapped value of type A, the Some instance is returned. If the Option instance is already None or the predicate returns false for the value inside the Some, the result is None:

UserRepository.findById(1).filter(_.age > 30) // Some(user), because age is > 30
UserRepository.findById(2).filter(_.age > 30) // None, because age is <= 30
UserRepository.findById(3).filter(_.age > 30) // None, because user is already None

For comprehensions

Now that you know that an Option can be treated as a collection and provides map, flatMap, filter and other methods you know from collections, you will probably already suspect that options can be used in for comprehensions. Often, this is the most readable way of working with options, especially if you have to chain a lot of map, flatMap and filter invocations. If it’s just a single map, that may often be preferrable, as it is a little less verbose.

If we want to get the gender for a single user, we can apply the following for comprehension:

for {
  user <- UserRepository.findById(1)
  gender <- user.gender
} yield gender // results in Some("male")

As you may know from working with lists, this is equivalent to nested invocations of flatMap. If the UserRepository already returns None or the Gender is None, the result of the for comprehension is None. For the user in the example, a gender is defined, so it is returned in a Some.

If we wanted to retrieve the genders of all users that have specified it, we could iterate all users, and for each of them yield a gender, if it is defined:

for {
  user <- UserRepository.findAll
  gender <- user.gender
} yield gender

Since we are effectively flat mapping, the result type is List[String], and the resulting list is List("male"), because gender is only defined for the first user.

Usage in the left side of a generator

Maybe you remember from part three of this series that the left side of a generator in a for comprehension is a pattern. This means that you can also patterns involving options in for comprehensions.

We could rewrite the previous example as follows:

for {
  User(_, _, _, _, Some(gender)) <- UserRepository.findAll
} yield gender

Using a Some pattern in the left side of a generator has the effect of removing all elements from the result collection for which the respective value is None.

Chaining options

Options can also be chained, which is a little similar to chaining partial functions. To do this, you call orElse on an Option instance, and pass in another Option instance as a by-name parameter. If the former is None, orElse returns the option passed to it, otherwise it returns the one on which it was called.

A good use case for this is finding a resource, when you have several different locations to search for it and an order of preference. In our example, we prefer the resource to be found in the config dir, so we call orElse on it, passing in an alternative option:

case class Resource(content: String)
val resourceFromConfigDir: Option[Resource] = None
val resourceFromClasspath: Option[Resource] = Some(Resource("I was found on the classpath"))
val resource = resourceFromConfigDir orElse resourceFromClasspath

This is usually a good fit if you want to chain more than just two options – if you simply want to provide a default value in case a given option is absent, the getOrElse method may be a better idea.

Summary

In this article, I hope to have given you everything you need to know about the Option type in order to use it for your benefit, to understand other people’s Scala code and write more readable, functional code. The most important insight to take away from this post is that there is a very basic idea that is common to lists, sets, maps, options, and, as you will see in a future post, other data types, and that there is a uniform way of using these types, which is both elegant and very powerful.

In the following part of this series I am going to deal with idiomatic, functional error handling in Scala.

The Neophyte’s Guide to Scala Part 4: Pattern Matching Anonymous Functions

Dec 12th, 2012

In the previous part of this series, I gave an overview of the various ways in which patterns can be used in Scala, concluding with a brief mention of anonymous functions as another place in which patterns can be put to use. In this post, we are going to take a detailed look at the possibilities opened up by being able to define anonymous functions in this way.

If you have participated in the Scala course at Coursera, or have coded in Scala for a while, you will likely have written anonymous functions on a regular basis. For example, given a list of song titles which you want to transform to lower case for your search index, you might make want to define an anonymous function that you pass to the map method, like this:

val songTitles = List("The White Hare", "Childe the Hunter", "Take no Rogues")
songTitles.map(t => t.toLowerCase)

Or, if you like it even shorter, of course, you will probably write the normalize function like this, making use of Scala’s placeholder syntax:

songTitles.map(_.toLowerCase)

So far so good. However, let’s see how this syntax performs for a slightly different example: We have a sequence of pairs, each representing a word and its frequency in some text. Our goal is to filter out those pairs whose frequency is below or above a certain threshold, and then only return the remaining words, without their respective frequencies. We need to write a function wordsWithoutOutliers(wordFrequencies: Seq[(String, Int)]): Seq[String].

Our initial solution makes use of the filter and map methods, passing anonymous functions to them using our familiar syntax:

val wordFrequencies = ("habitual", 6) :: ("and", 56) :: ("consuetudinary", 2) ::
  ("additionally", 27) :: ("homely", 5) :: ("society", 13) :: Nil
def wordsWithoutOutliers(wordFrequencies: Seq[(String, Int)]): Seq[String] =
  wordFrequencies.filter(wf => wf._2 > 3 && wf._2 < 25).map(_._1)
wordsWithoutOutliers(wordFrequencies) // List("habitual", "homely", "society")

This solution has several problems. The first one is only an aesthetic one – accessing the fields of the tuple looks pretty ugly to me. If only we could destructure the pair, we could make this code a little more pleasant and probably also more readable.

Thankfully, Scala provides an alternative way of writing anonymous functions: A pattern matching anonymous function is an anonymous function that is defined as a block consisting of a sequence of cases, surrounded as usual by curly braces, but without a match keyword before the block. Let’s rewrite our function, making use of this notation:

def wordsWithoutOutliers(wordFrequencies: Seq[(String, Int)]): Seq[String] =
  wordFrequencies.filter { case (_, f) => f > 3 && f < 25 } map { case (w, _) => w }

In this example, we have only used a single case in each of our anonymous functions, because we know that this case always matches – we are simply decomposing a data structure whose type we already know at compile time, so nothing can go wrong here. This is a very common way of using pattern matching anonymous functions.

If you try to assign these anonymous functions to values, you will see that they have the expected type:

val predicate: ((String, Int)) => Boolean = { case (_, f) => f > 3 && f < 25 }
val transformFn: ((String, Int)) => String = { case (w, _) => w }

Please note that you have to specify the type of the value here, the Scala compiler cannot infer it for pattern matching anonymous functions.

Nothing prevents you from defining a more complex sequence of cases, of course. However, if you define an anonymous function this way and want to pass it to some other function, such as the ones in our example, you have to make sure that for all possible inputs, one of your cases matches so that your anonymous function always returns a value. Otherwise, you will risk a MatchError at runtime.

Partial functions

Sometimes, however, a function that is only defined for specific input values is exactly what you want. In fact, such a function can help us get rid of another problem that we haven’t solved yet with our current implementation of the wordsWithoutOutliers function: We first filter the given sequence and then map the remaining elements. If we can boil this down to a solution that only has to iterate over the given sequence once, this would not only need fewer CPU cycles but would also make our code shorter and, ultimately, more readable.

If you browse through Scala’s collections API, you will notice a method called collect, which, for a Seq[A], has the following signature:

def collect[B](pf: PartialFunction[A, B])

This method returns a new sequence by applying the given partial function to all of its elements – the partial function both filters and maps the sequence.

So what is a partial function? In short, it’s a unary function that is known to be defined only for certain input values and that allows clients to check whether it is defined for a specific input value.

To this end, the PartialFunction trait provides an isDefinedAt method. As a matter of fact, the PartialFunction[-A, +B] type extends the type (A) => B (which can also be written as Function1[A, B]), and a pattern matching anonymous function is always of type PartialFunction.

Due to this inheritance hierarchy, passing a pattern matching anonymous function to a method that expects a Function1, like map or filter, is perfectly fine, as long as that function is defined for all input values, i.e. there is always a matching case.

The collect method, however, specifically expects a PartialFunction[A, B] that may not be defined for all input values and knows exactly how to deal with that case. For each element in the sequence, it first checks if the partial function is defined for it by calling isDefinedAt on the partial function. If this returns false, the element is ignored. Otherwise, the result of applying the partial function to the element is added to the result sequence.

Let’s first define a partial function that we want to use for refactoring our wordsWithoutOutliers function to make use of collect:

val pf: PartialFunction[(String, Int), String] = {
  case (word, freq) if freq > 3 && freq < 25 => word
}

We added a guard clause to our case, so that this function will not be defined for word/frequency pairs whose frequency is not within the required range.

Instead of using the syntax for pattern matching anonymous functions, we could have defined this partial function by explicitly extending the PartialFunction trait:

val pf = new PartialFunction[(String, Int), String] {
  def apply(wordFrequency: (String, Int)) = wordFrequency match {
    case (word, freq) if freq > 3 && freq < 25 => word
  }
  def isDefinedAt(wordFrequency: (String, Int)) = wordFrequency match {
    case (word, freq) if freq > 3 && freq < 25 => true
    case _ => false
  }
}

Usually, however, you will want to use the much more concise anonymous function syntax.

Now, if we passed our partial function to the map method, this would compile just fine, but result in a MatchError at runtime, because our partial function is not defined for all possible input values, thanks to the added guard clause:

  wordFrequencies.map(pf) // will throw a MatchError

However, we can pass this partial function to the collect method, and it will filter and map the sequence as expected:

  wordFrequencies.collect(pf) // List("habitual", "homely", "society")

The result of this is the same as that of our current implementation of wordsWithoutOutliers when passing our dummy wordFrequencies sequence to it. So let’s rewrite that function:

def wordsWithoutOutliers(wordFrequencies: Seq[(String, Int)]): Seq[String] =
  wordFrequencies.collect { case (word, freq) if freq > 3 && freq < 25 => word }

Partial functions have some other very useful properties. For example, they provide the means to be chained, allowing for a neat functional alternative to the chain of responsibility pattern known from object-oriented programming. This, however, will have to be the subject of a future post in this series, when I am going to address the issue of functional composability.

Partial functions are also a crucial element of many Scala libraries and APIs. For example, the way an Akka actor processes messages sent to it is defined in terms of a partial function. Hence, it’s quite important to know and understand this concept.

Summary

In this part, we examined an alternative way of defining anonymous functions, namely as a sequence of cases, which opens up some nice destructuring possibilities in a rather concise way. Moreover, we delved into the topic of partial functions, demonstrating their usefulness by means of a simple use case.

In the next article, I am going to dig deeper into the ever-present Option type, explaining the reasoning behind its existence and how best to make use of it.

Please let me know if you have any questions or feedback. Is there any particular topic you would like to see covered in an article?

The Neophyte’s Guide to Scala Part 3: Patterns Everywhere

Dec 5th, 2012

In the first two parts of this series, I spent quite some time explaining what’s actually happening when you destructure an instance of a case class in a pattern, and how to write your own extractors, allowing you to destructure any types of objects in any way you desire.

Now it is time to get an overview of where patterns can actually be used in your Scala code, because so far you have only seen one of the different possible ways to make use of patterns. Here we go!

Pattern matching expressions

One place in which patterns can appear is inside of a pattern matching expression. This way of using patterns should be very familiar to you after attending the Scala course at Coursera and following along in this series. You have some expression e, followed by the match keyword and a block, which can contain any number of cases. A case, in turn, consists of the case keyword followed by a pattern and, optionally, a guard clause on the left side, plus a block on the right side, to be executed if this pattern matches.

Here is a simple example, making use of patterns and, in one of the cases, a guard clause:

case class Player(name: String, score: Int)

def printMessage(player: Player) = player match {
  case Player(_, score) if score > 100000 => println("Get a job, dude!")
  case Player(name, _) => println("Hey " + name + ", nice to see you again!")
}

The printMessage method has a return type of Unit, its sole purpose is to perform a side effect, namely printing a message. It is important to remember that you don’t have to use pattern matching as you would use switch statements in languages like Java. What we are using here is called a pattern matching expression for a reason. Their return value is what is returned by the block belonging to the first matched pattern.

Usually, it’s a good idea to take advantage of this, as it allows you to decouple two things that do not really belong together, making it easier to test your code, too. We could rewrite the example above as follows:

def message(player: Player) = player match {
  case Player(_, score) if score > 100000 => "Get a job, dude!"
  case Player(name, _) => "Hey " + name + ", nice to see you again!"
}
def printMessage(player: Player) = println(message(player))

Now, we have a separate message method whose return type is String. This is essentialy a pure function, returning the result of a pattern matching expression. You could also store the result of such a pattern matching expression as a value or assign it to a variable, of course.

Patterns in value definitions

Another place in which a pattern can occur in Scala is in the left side of a value definition (and in a variable definition, for that matter, but we want write our Scala code in a functional style, so you won’t see a lot of usage of variables in this series). Let’s assume we have a method that returns our current player. We will use a dummy implementation that always returns the same player:

def currentPlayer(): Player = Player("Daniel", 3500)

Your usual value definition looks like this:

val player = currentPlayer()
doSomethingWithTheName(player.name)

If you know Python, you are probably familiar with a feature called sequence unpacking. The fact that you can use any pattern in the left side of a value definition or variable definition lets you write your Scala code in a similar style. We could change our above code and destructure the given current player while assigning it to the left side:

val Player(name, _) = currentPlayer()
doSomethingWithTheName(name)

You can do this with any pattern, but generally, it is a good idea to make sure that your pattern always matches. Otherwise, you will be the witness of an exception at runtime. For instance, the following code is problematic. scores is a method returning a list of scores. In our code below, this method simply returns an empty list to illustrate the problem.

def scores: List[Int] = List()
val best :: rest = scores
println("The score of our champion is " + best)

Oops, we’ve got a MatchError. It seems like our game is not that successful after all, having no scores whatsoever.

A safe and very handy way of using patterns in this way is for destructuring case classes whose type you know at compile time. Also, when working with tuples, this makes your code a lot more readable. Let’s say we have a function that returns the name of a player and their score as a tuple, not using the Player class we have used so far:

def gameResult(): (String, Int) = ("Daniel", 3500)

Accessing the fields of a tuple always feels very awkward:

val result = gameResult()
println(result._1 + ": " + result._2)

It’s safe to destructure our tuple in the value definition, as we know we are dealing with a Tuple2:

val (name, score) = gameResult()
println(name + ": " + score)

This is much more readable, isn’t it?

Patterns in for comprehensions

Patterns also have a very valuable place in for comprehensions. For one, a for comprehension can also contain value definitions. And everything you learnt about the usage of patterns in the left side of value definitions holds true for value definitions in for comprehensions. So if we have a collection of results and want to determine the hall of fame, which in our game is simply a collection of the names of players that have trespassed a certain score threshold, we could do that in a very readable way with a for comprehension:

def gameResults(): Seq[(String, Int)] =
  ("Daniel", 3500) :: ("Melissa", 13000) :: ("John", 7000) :: Nil

def hallOfFame = for {
  result <- gameResults()
  (name, score) = result
  if (score > 5000)
} yield name

The result is List("Melissa", "John"), since the first player does not meet the condition of the guard clause.

This can be written even more concisely, because in for comprehensions, the left side of a generator is also a pattern. So, instead of first assigning each game result to result, we can directly destructure the result in the left side of the generator:

def hallOfFame = for {
  (name, score) <- gameResults()
  if (score > 5000)
} yield name

In this example, the pattern (name, score) always matches, so if it were not for the guard clause, if (score > 5000), the for comprehension would be equivalent to simply mapping from the tuples to the player names, without filtering anything.

It is important to know that patterns in the left side of generators can already be used for filtering purposes – if a pattern on the left side of a generator does not match, the respective element is filtered out.

To illustrate, let’s say we have a sequence of lists, and we want to return the sizes of all non-empty lists. This means we have to filter out all empty lists and then return the sizes of the ones remaining. Here is one solution:

val lists = List(1, 2, 3) :: List.empty :: List(5, 3) :: Nil

for {
  list @ head :: _ <- lists
} yield list.size

The pattern on the left side of the generator does not match for empty lists. This will not throw a MatchError, but result in any empty list being removed. Hence, we get back List(3, 2).

Patterns and for comprehensions are a very natural and powerful combination, and if you work with Scala for some time, you will see that you’ll be using them a lot.

Anonymous functions

Finally, patterns can be used for defining anonymous functions. If you have ever used a catch block in order to deal with an exception in Scala, then you have made used of this feature. Pattern matching anonymous functions is a subject that warrants its own blog post, because there is a lot to be said about it. Hence, I will refrain from delving into this usage of patterns in this article, instead leaving you with the promise of dealing with it in the next part of the series.

Update: Fixed a mistake in the expected result of the hallOfFame for comprehension. Thanks to Rajiv for pointing it out.

The Neophyte’s Guide to Scala Part 2: Extracting Sequences

Nov 28th, 2012

In the first part of this series, we learned how to implement our own extractors and how these extractors can be used for pattern matching. However, we only discussed extractors that allow you to destructure a given object into a fixed number of parameters. Yet, for certain kinds of data structures, Scala allows you to do pattern matching expecting an arbitrary number of extracted parameters.

For example, you can use a pattern that only matches a list of exactly two elements, or a list of exactly three elements:

val xs = 3 :: 6 :: 12 :: Nil
xs match {
  case List(a, b) => a * b
  case List(a, b, c) => a + b + c
  case _ => 0
}

What’s more, if you want to match lists the exact length of which you don’t care about, you can use a wildcard operator, _*:

val xs = 3 :: 6 :: 12 :: 24 :: Nil
xs match {
  case List(a, b, _*) => a * b
  case _ => 0
}

Here, the first pattern matches, binding the first two elements to the variables a and b, while simply ignoring the rest of the list, regardless how many remaining elements there are.

Clearly, extractors for these kinds of patterns cannot be implemented with the means I introduced in the first article. We need a way to specify that an extractor takes an object of a certain type and destructures it into a sequence of extracted values, where the length of that sequence is unknown at compile time.

Enter unapplySeq, an extractor method that allows for doing exactly that. Let’s take a look at one of its possible method signatures:

def unapplySeq(object: S): Option[Seq[T]]

It expects an object of type S and returns either None, if the object does not match at all, or a sequence of extracted values of type T, wrapped in a Some.

Example: Extracting given names

Let’s make use of this kind of extractor method in an admittedly contrived example. Let’s say that in some piece of our application, we are receiving a person’s given name as a String. This string can contain the person’s second or third name, if that person has more than one given name. Hence, possible values could be "Daniel", or "Catherina Johanna", or "Matthew John Michael". We want to be able to match against these names, extracting and binding the individual given names.

Here is a very simple extractor implementation by means of the unapplySeq method that will allow us to do that:

object GivenNames {
  def unapplySeq(name: String): Option[Seq[String]] = {
    val names = name.trim.split(" ")
    if (names.forall(_.isEmpty)) None else Some(names)
  }
}

Given a String containing one or more given names, it will extract those as a sequence. If the input name does not contain at least one given name, this extractor will return None, and thus, a pattern in which this extractor is used will not match such a string.

We can now put our new extractor to test:

def greetWithFirstName(name: String) = name match {
  case GivenNames(firstName, _*) => "Good morning, " + firstName + "!"
  case _ => "Welcome! Please make sure to fill in your name!"
}

This nifty little method returns a greeting for a given name, ignoring everything but the first name. greetWithFirstName("Daniel") will return "Good morning, Daniel!", while greetWithFirstName("Catherina Johanna") will return "Good morning, Catherina!"

Combining fixed and variable parameter extraction

Sometimes, you have certain fixed values to be extracted that you know about at compile time, plus an additional optional sequence of values.

Let’s assume that in our example, the input name contains the person’s complete name, not only the given name. Possible values might be "John Doe" or "Catherina Johanna Peterson". We want to be able to match against such strings using a pattern that always binds the person’s last name to the first variable in the pattern and the first name to the second variable, followed by an arbitrary number of additional given names.

This can be achieved by means of a slight modification of our unapplySeq method, using a different method signature:

def unapplySeq(object: S): Option[(T1, .., Tn-1, Seq[T])]

As you can see, unapplySeq can also return an Option of a TupleN, where the last element of the tuple must be the sequence containing the variable parts of the extracted values. This method signature should be somewhat familiar, as it is similar to one of the possible signatures of the unapply method that I introduced last week.

Here is an extractor making use of this:

object Names {
  def unapplySeq(name: String): Option[(String, String, Seq[String])] = {
    val names = name.trim.split(" ")
    if (names.size < 2) None
    else Some((names.last, names.head, names.drop(1).dropRight(1)))
  }
}

Have a close look at the return type and the construction of the Some. Our method returns an Option of Tuple3. That tuple is created with Scala’s syntax for tuple literals by just putting the three elements – the last name, the first name, and the sequence of additional given names – in a pair of parentheses.

If this extractor is used in a pattern, the pattern will only match if at least a first and last name is contained in the given input string. The sequence of additional given names is created by dropping the first and the last element from the sequence of names.

We can use this extractor to implement an alternative greeting method:

def greet(fullName: String) = fullName match {
  case Names(lastName, firstName, _*) => "Good morning, " + firstName + " " + lastName + "!"
  case _ => "Welcome! Please make sure to fill in your name!"
}

Feel free to play around with this in the REPL or a worksheet.

Summary

In this article, we learned how to implement and use extractors that return variable-length sequences of extracted values. Extractors are a pretty powerful mechanism. They can often be re-used in flexible ways and provide a powerful way to extend the kinds of patterns you can match against.

We will revisit extractors in a case study towards the end of this series. In the next part, however, I will give an overview of the different ways in which patterns can be applied in Scala code – there is more to it than just the pattern matching you have seen in the examples so far.

Update, 24.01.2013: I updated the code example implementing the GivenNames extractor. Thanks to Christophe Bliard for pointing out a mistake in there.

The Neophyte’s Guide to Scala Part 1: Extractors

Nov 21st, 2012

More than 50,000 people signed up for Martin Odersky’s course “Functional Programming Principles in Scala” at Coursera. That’s a huge number of developers for whom this might have been the first contact with Scala, functional programming, or both.

If you are reading this, maybe you are one of them, or maybe you have started to learn Scala by some other means. In any case, if you have started to learn Scala, you are excited to delve deeper into this beautiful language, but it all still feels a little exotic or foggy to you, then the series of articles that is beginning with this one is for you.

Even though the Coursera course covered quite a lot of what you need to know about Scala, the given time constraints made it impossible to explain everything in detail. As a result, some Scala features might seem like magic to you if you are new to the language. You are able to use them somehow, but you haven’t fully grasped how they work and, more importantly, why they work as they do.

In this article and the ones following in the coming weeks, I would like to clear things up and remove those question marks. I will also explain some of the features of the Scala language and library that I had trouble with when I started learning the language, partially because I didn’t find any good explanations for them, but instead just stumbled upon them in the wild. Where appropriate, I will also try to give guidance on how to use these features in an idiomatic™ way.

Enough of the introductions. Before I begin, keep in mind that, while having attended the Coursera course is not a prerequisite for following this series, having roughly the knowledge of Scala as can be acquired in that course is definitely helpful, and I will sometimes refer to the course.

So how does this pattern matching thingie actually work?

In the Coursera course, you came across one very powerful language feature of Scala: Pattern matching. It allows you to decompose a given data structure, binding the values it was constructed from to variables. It’s not an idea that is unique to Scala, though. Other prominent languages in which pattern matching plays an important role are Haskell and Erlang, for instance.

If you followed the video lectures, you saw that you can decompose various kinds of data structures using pattern matching, among them lists, streams, and any instances of case classes. So is this list of data structures that can be destructured fixed, or can you extend it somehow? And first of all, how does this actually work? Is there some kind of magic involved that allows you to write things like the following?

case class User(firstName: String, lastName: String, score: Int)
def advance(xs: List[User]) = xs match {
  case User(_, _, score1) :: User(_, _, score2) :: _ => score1 - score2
  case _ => 0
}

As it turns out, there isn’t. At least not much. The reason why you are able to write the above code (no matter how little sense this particular example makes) is the existence of so-called extractors.

In its most widely applied form, an extractor has the opposite role of a constructor: While the latter creates an object from a given list of parameters, an extractor extracts the parameters from which an object passed to it was created.

The Scala library contains some predefined extractors, and we will have a look at one of them shortly. Case classes are special because Scala automatically creates a companion object for them: a singleton object that contains not only an apply method for creating new instances of the case class, but also an unapply method – the method that needs to be implemented by an object in order for it to be an extractor.

Our first extractor, yay!

There is more than one possible signature for a valid unapply method, but we will start with the ones that are most widely used. Let’s pretend that our User class is not a case class after all, but instead a trait, with two classes extending it, and for the moment, it only contains a single field:

trait User {
  def name: String
}
class FreeUser(val name: String) extends User
class PremiumUser(val name: String) extends User

We want to implement extractors for the FreeUser and PremiumUser classes in respective companion objects, just as Scala would have done were these case classes. If your extractor is supposed to only extract a single parameter from a given object, the signature of an unapply method looks like this:

def unapply(object: S): Option[T]

The method expects some object of type S and returns an Option of type T, which is the type of the parameter it extracts. Remember that Option is Scala’s safe alternative to the existence of null values. There will be a separate article about it, but for now, it’s enough to know that the unapply method returns either Some[T] (if it could successfully extract the parameter from the given object) or None, which means that the parameters could not be extracted, as per the rules determined by the extractor implementation.

Here are our extractors:

trait User {
  def name: String
}
class FreeUser(val name: String) extends User
class PremiumUser(val name: String) extends User

object FreeUser {
  def unapply(user: FreeUser): Option[String] = Some(user.name)
}
object PremiumUser {
  def unapply(user: PremiumUser): Option[String] = Some(user.name)
}

We can now use this in the REPL:

scala> FreeUser.unapply(new FreeUser("Daniel"))
res0: Option[String] = Some(Daniel)

But you wouldn’t usually call this method directly. Scala calls an extractor’s unapply method if the extractor is used as an extractor pattern.

If the result of calling unapply is Some[T], this means that the pattern matches, and the extracted value is bound to the variable declared in the pattern. If it is None, this means that the pattern doesn’t match and the next case statement is tested.

Let’s use our extractors for pattern matching:

val user: User = new PremiumUser("Daniel")
user match {
  case FreeUser(name) => "Hello " + name
  case PremiumUser(name) => "Welcome back, dear " + name
}

As you will already have noticed, our two extractors never return None. The example shows that this makes more sense than it might seem at first. If you have an object that could be of some type or another, you can check its type and destructure it at the same time.

In the example, the FreeUser pattern will not match because it expects an object of a different type than we pass it. Since it wants an object of type FreeUser, not one of type PremiumUser, this extractor is never even called. Hence, the user value is now passed to the unapply method of the PremiumUser companion object, as that extractor is used in the second pattern. This pattern will match, and the returned value is bound to the name parameter.

Later in this article, we will see an example of an extractor that does not always return Some[T].

Extracting several values

Now, let’s assume that our classes against which we want to match have some more fields:

trait User {
  def name: String
  def score: Int
}
class FreeUser(val name: String, val score: Int, val upgradeProbability: Double)
  extends User
class PremiumUser(val name: String, val score: Int) extends User

If an extractor pattern is supposed to decompose a given data structure into more than one parameter, the signature of the extractor’s unapply method looks like this:

def unapply(object: S): Option[(T1, ..., Tn)]

The method expects some object of type S and returns an Option of type TupleN, where N is the number of parameters to extract.

Let’s adapt our extractors to the modified classes:

trait User {
  def name: String
  def score: Int
}
class FreeUser(val name: String, val score: Int, val upgradeProbability: Double)
  extends User
class PremiumUser(val name: String, val score: Int) extends User

object FreeUser {
  def unapply(user: FreeUser): Option[(String, Int, Double)] =
    Some((user.name, user.score, user.upgradeProbability))
}
object PremiumUser {
  def unapply(user: PremiumUser): Option[(String, Int)] = Some((user.name, user.score))
}

We can now use this extractor for pattern matching, just like we did with the previous version:

val user: User = new FreeUser("Daniel", 3000, 0.7d)
user match {
  case FreeUser(name, _, p) =>
    if (p > 0.75) name + ", what can we do for you today?" else "Hello " + name
  case PremiumUser(name, _) => "Welcome back, dear " + name
}

A Boolean extractor

Sometimes, you don’t really have the need to extract parameters from a data structure against which you want to match – instead, you just want to do a simple boolean check. In this case, the third and last of the available unapply method signatures comes in handy, which expects a value of type S and returns a Boolean:

def unapply(object: S): Boolean

Used in a pattern, the pattern will match if the extractor returns true. Otherwise the next case, if available, is tried.

In the previous example, we had some logic that checks whether a free user is likely to be susceptible to being persuaded to upgrade their account. Let’s place this logic in its own boolean extractor:

object premiumCandidate {
  def unapply(user: FreeUser): Boolean = user.upgradeProbability > 0.75
}

As you can see here, it is not necessary for an extractor to reside in the companion object of the class for which it is applicable. Using such a boolean extractor is as simple as this:

val user: User = new FreeUser("Daniel", 2500, 0.8d)
user match {
  case freeUser @ premiumCandidate() => initiateSpamProgram(freeUser)
  case _ => sendRegularNewsletter(user)
}

This example shows that a boolean extractor is used by just passing it an empty parameter list, which makes sense because it doesn’t really extract any parameters to be bound to variables.

There is one other peculiarity in this example: I am pretending that our fictional initiateSpamProgram function expects an instance of FreeUser because premium users are never to be spammed. Our pattern matching is against any type of User, though, so I cannot pass user to the initiateSpamProgram function – not without ugly type casting anyway.

Luckily, Scala’s pattern matching allows to bind the value that is matched to a variable, too, using the type that the used extractor expects. This is done using the @ operator. Since our premiumCandidate extractor expects an instance of FreeUser, we have therefore bound the matched value to a variable freeUser of type FreeUser.

Personally, I haven’t used boolean extractors that much, but it’s good to know they exist, as sooner or later you will probably find yourself in a situation where they come in handy.

Infix operation patterns

If you followed the Scala course at Coursera, you learned that you can destructure lists and streams in a way that is akin to one of the ways you can create them, using the cons operator, :: or #::, respectively:

val xs = 58 #:: 43 #:: 93 #:: Stream.empty
xs match {
  case first #:: second #:: _ => first - second
  case _ => -1
}

Maybe you have wondered why that is possible. The answer is that as an alternative to the extractor pattern notation we have seen so far, Scala also allows extractors to be used in an infix notation. So, instead of writing e(p1, p2), where e is the extractor and p1 and p2 are the parameters to be extracted from a given data structure, it’s always possible to write p1 e p2.

Hence, the infix operation pattern head #:: tail could also be written as #::(head, tail), and our PremiumUser extractor could also be used in a pattern that reads name PremiumUser score. However, this is not something you would do in practice. Usage of infix operation patterns is only recommended for extractors that indeed are supposed to read like operators, which is true for the cons operators of List and Stream, but certainly not for our PremiumUser extractor.

A closer look at the Stream extractor

Even though there is nothing special about how the #:: extractor can be used in pattern matching, let’s take a look at it, to better understand what is going on in our pattern matching code above. Also, this is a good example of an extractor that, depending on the state of the passed in data structure, may return None and thus not match.

Here is the complete extractor, taken from the sources of Scala 2.9.2:

object #:: {
  def unapply[A](xs: Stream[A]): Option[(A, Stream[A])] =
    if (xs.isEmpty) None
    else Some((xs.head, xs.tail))
}

If the given Stream instance is empty, it just returns None. Thus, case head #:: tail will not match for an empty stream. Otherwise, a Tuple2 is returned, the first element of which is the head of the stream, while the second element of the tuple is the tail, which is itself a Stream again. Hence, case head #:: tail will match for a stream of one or more elements. If it has only one element, tail will be bound to the empty stream.

To understand how this extractor works for our pattern matching example, let’s rewrite that example, going from infix operation patterns to the usual extractor pattern notation:

val xs = 58 #:: 43 #:: 93 #:: Stream.empty
xs match {
  case #::(first, #::(second, _)) => first - second
  case _ => -1
}

First, the extractor is called for the intitial stream xs that is passed to the pattern matching block. The extractor returns Some((xs.head, xs.tail)), so first is bound to 58, while the tail of xs is passed to the extractor again, which is used again inside of the first one. Again, it returns the head and and tail as a Tuple2 wrapped in a Some, so that second is bound to the value 43, while the tail is bound to the wildcard _ and thus thrown away.

Using extractors

So when and how should you actually make use of custom extractors, especially considering that you can get some useful extractors for free if you make use of case classes?

While some people point out that using case classes and pattern matching against them breaks encapsulation, coupling the way you match against data with its concrete representation, this criticism usually stems from an object-oriented point of view. It’s a good idea, if you want to do functional programming in Scala, to use case classes as algebraic data types (ADTs) that contain pure data and no behaviour whatsoever.

Usually, implementing your own extractors is only necessary if you want to extract something from a type you have no control over, or if you need additional ways of pattern matching against certain data. For example, a common usage of extractors is to extract meaningful values from some string. As an exercise, think about how you would implement and use a URLExtractor that takes String representations of URLs.

Conclusion

In this first part of the series, we have examined extractors, the workhorse behind pattern matching in Scala. You have learned how to implement your own extractors and how the implementation of an extractor relates to its usage in a pattern.

We haven’t covered all there is to say about extractors, because this article is already long enough as it is. In the next part of this series, I am going to revisit extractors, covering how to implement them if you you want to bind a variable number of extracted parameters in a pattern.

Please do let me know if this article was helpful to you or if something is not clear to you.

← Older Blog Archives Newer →

About me

I’m Daniel Westheide, a software developer from Berlin, Germany.

I am working as a senior consultant at INNOQ.

On this blog, I’m discussing functional programming, usability, as well as anything related to the software development process.

You can get in touch via email. I can also be found on Twitter as @kaffeecoder.

More content published by me can be found on the INNOQ website.

My book

Get The Neophyte’s Guide to Scala e-book at LeanPub.