The point is missed

Museum of public mistakes and unfinished projects

Scala Nugget – Pattern matching and Lists

with 7 comments

I was whining recently about how my scala-code is java in poor disguise. I started reading the scala by example pdf that also comes with the scala installation. I just read some interesting things about lists and pattern-matching that gave me an idea how to “scalafy” the following scjava-code:

  def importResource(name:String, resource:Resource):Unit = {
    log.debug("importing " + name + " into " + root)
    val path = pathOf(name)
    createResource(path.toList)(resource)
  }

  private def createResource(nodes:List[String])(resource:Resource) =  {
    val directory = nodes.dropRight(1).foldLeft(root)((directory, name) => {
      if(name.equals("")) {
        directory
      }
      else {
        val directoryOption:Option[AbstractDirectory] = directory.getDirectory(name)
        directoryOption.getOrElse({
          val subDir = directory.createDirectory(name) match {
                         case result:AbstractDirectory => result
                       }
          directory.add(subDir)
          subDir
        })
      }
    })
    directory.createIfNewer(nodes.last, _:Resource)
  }

If it's not crystal clear to you what the code does. Here is an overview:

  • The first method takes a name and a resource
  • The name is split up into it's path-elements
  • The path is passed to the createResource method that will create all directories that does not yet exist on the way to the resource and finally return a function that takes a resource as input and creates it in the already given directory.

A big issue I have with the code is the createResource-method. It simply is very hard to name it. createAllDirectoriesAndReturnResourceCreatingMethod would illustrate better how smelly that method really is. I have an idea on how to refactor this with scalas pattern matching. This is the new version (after some red/green bar cycles):

  def importResource(name:String, resource:Resource):Unit = {
    log.debug("importing " + name + " into " + root)
    val path = pathOf(name)
    createResource(path.toList, root, resource)
  }

  private def createResource(nodes:List[String],
                             directory:AbstractDirectory,
                             resource:Resource) {
    nodes match {
      case head :: Nil => directory.createIfNewer(head, resource)
      case "" :: tail => createResource(tail, directory, resource)
      case head :: tail => {
        val subDir = directory.getDirectory(head).getOrElse {
          val result = directory.createDirectory(head) match {
                                   case dir:AbstractDirectory => dir
                                 }
          directory.add(result)
          result}
        createResource(tail, subDir, resource)
      }
      case nil =>
    }
  }

The first thing that strikes me about the second version is the unbalance between the different cases. The head :: tail case is not exactly a one-liner... but it should be. This smells like feature envy. We could ask the directory to getOrCreateDirectory and it would read much better. But that is another refactoring. First let's go through this one:
One thing I did not first get in scala is how you can iterate through lists using pattern-matching. In order to get how that works it is important to realise one thing about scalas lists:

List("a", "b", "c", "d")

is equivalent to

"a" ::("b" ::("c" ::("d" ::(Nil))))

which due to the right associativityness of :: is equivalent to:

"a" :: "b" :: "c" :: "d" :: Nil

Or plain english: Lists are not flat structures, lists are recursive structures. What that means in practical terms is that given any element in the list it is very easy to split the list at that element into head (the current element) and tail (the rest of the elements.

For instance, given the element "b" above, head will be "b" and tail will be "c" :: "d" :: Nil.

Ok, not that hard. Now to the cool part: The :: operator is a case class which in short means that it can be used in pattern matching:

"a" :: "b" :: "c" :: Nil match { case h :: t => println(head +" -> " +tail); ...}

will assign "a" to the variable h and "b" :: "c" :: nil to the variable t

We can now understand the cases above. In pseudocode:

  private def createResource(nodes:List[String],
                             directory:AbstractDirectory,
                             resource:Resource) {
    nodes match {
      case head :: Nil =>  //the last element of the list, create the resource using head as a name
      case ""   :: tail => // special case, an empty directory name. Skip to the next name:
                           // createResource(tail, directory, resource)
      case head :: tail => //head will be a directory-name that we use to get or create the 
                           //nextDirectory that is used with tail to call ourselves recursively:
                           // createResource(tail, nextDirectory, resource)
      case nil => //no more elements to traverse
    }
  }

I'm far from done with the refactoring. I want to get rid of the special case of "" to begin with and as mentioned above remove some feature-envy but I think that the cases are more readable than the original code. At least in terms of where the problems lie in terms of special cases and bloated cases. Of course, it's just an opinion and I reserve the right to change my mind tomorrow :)

About these ads

Written by johlrogge

October 6, 2008 at 10:27 pm

Posted in scala

Tagged with , , , , ,

7 Responses

Subscribe to comments with RSS.

  1. If you write:

    createResource(path.toList.filter(_ != “”), root, resource)

    Then you can get rid of the (“” :: tail) case in createResource.

    Jorge Ortiz

    October 7, 2008 at 8:37 am

  2. import java.io.File

    var nodes = List(“foo”,”bar”,”baz”)
    var root = new File(“/”)
    var dir = (List(root) /: nodes) ((fs, n)=>{ if (n == “”) fs else new File(fs.head, n) :: fs}) head
    if (dir.mkdirs()) createResourceUnder(dir, resource)

    Julian Morrison

    October 7, 2008 at 10:46 am

  3. Jorge: Thanks for the pointer. I think that it moves the problem rather than fixes it but I agree that it is better to deal with it earlier than as a special case so it is indeed a better solution.

    I expect the “” case to dissapear entirely when I fix other crappy code elsewhere. I need to get to the bottom with that.

    johlrogge

    October 7, 2008 at 11:44 am

  4. Julian: Now we’re talking! :) A bit out of my comfort-zone at the moment but I will give it try and see how it goes to use your ideas in my code.
    To me it seems like a form of the first example minus a poor abstraction of a filesystem and also not adressing the fact that the last element is the name of the resource?

    I must confess that I’m still a bit uneasy with one letter variable-names etc. While making the code compact it is harder to follow for me.

    I find that I have to use more of my “brain-stack” when reading a one liner like yours rather than writing it one step at the time… (though the elegance of code like yours impresses me).

    I have similar issues with /: instead of foldLeft. I realize that this is probably a matter of what you’re used to and that I may grow into it. An advantage of /: compared to foldLeft is that it is more recognizable in a formula while foldLeft has to be read… and after all you do have to understand what foldLeft does anyway, it’s not self explainatory to a java guy like me with no background in FP.

    I love that I get suggestions like these, keep them coming! Reflection amplifies learning!

    johlrogge

    October 7, 2008 at 12:12 pm

  5. I’m really cheating by using a lot of the Haskell conventions, and a dirty Scala trick of using head without braces or a dot (it’s not meant to be line-wrapped!). Also, I goofed by putting var when I meant val.

    Anyway, the Haskell conventions here are single character variables (they actually make functional code easier to follow – variables are glue, watch the functions), and pluralized variable names (here: fs) indicating the tail of a list (so you often find consing or pattern matching that looks like x :: xs).

    So we’re folding a list of File out of a list of String nodes (going forward through the nodes and building the result list by consing on the front – so the end result looks reversed) so as to wrap each dir level (starting from “/” or wherever else) in the two argument constructor of File, using the head of the accumulator list as the parent dir (it was what we built last iteration for the previous node). Then when our list is built (it’s not wasteful, there is only one of each File and they point to each other) you snip off the top one, and it’s the whole directory hierarchy. Then just ask it to mkdir itself.

    The (as /: bs)((as, b)=>…) notation in Scala is a visual hint of what it does. The “as” are the beginning of a list to be folded out of “bs”, which will be done by starting with the provided “as” and then using the earlier “as” and the current “b”. Some people complain it’s terse, but I personally find the syntactic sugar more instructive than just the words “fold left”.

    Julian Morrison

    October 8, 2008 at 12:56 am

  6. Also, the mixing up of creating the resource and the directory is half of what makes your imperative code hard (otherwise could be barely longer than the FP code). It’s simpler to create the directory *then* create the resource in it ;-)

    Julian Morrison

    October 8, 2008 at 1:06 am

  7. Thanks for the clarification, I actually was able to figure out how the code works but it was out of my comfort zone :) The reason the directories are mixed with the resource-name is how it was easier to use the original method (way back) which was basically looking like:

    import(“/adirectory/afile.bin”, inputStream)

    Now the code is evolving towards:

    import (“/adirectory”, namedResource)

    And you’re right, indeed mixing directories and resources was the hard part. The original java-code was more object oriented and used a tell don’t ask style that I managed to mess up totally when converting to scala due to my lack of tools in the language (not that the language does not have tools, just that I have not been able to pick them up).

    Yesterday I moved some of the uglyness into the directory class and the code now looks like this:

    private def createResource(path:List[String],
    directory:AbstractDirectory,
    resource:Resource) {
    path match {
    case head :: Nil => directory.createIfNewer(head, resource)
    case “” :: tail => createResource(tail, directory, resource)
    case head :: tail => createResource(tail, directory.getOrElseCreateDirectory(head), resource)
    case nil =>
    }
    }

    Which is a bit better but if as you said, the named resource and the directory-path were separate pretty much all of this would go away (and it would be more intuitive to use).

    I think this is a cool thing about scala, even if I code java in scala, the ugliness that was hidden in the redundant java explicit type soup really comes out and stares me in the face. It’s like a mist is clearing (and I’m not always happy about what I find).

    A see your point about /: \: foldLeft does seem to work better as an operator. (After all, what does “foldLeft” mean anyway :)) but it makes the code look mighty strange to us java-folks :)

    Thanks for your comment!

    johlrogge

    October 8, 2008 at 6:06 am


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: