Nothing New

Code blocks and closures are nothing new to programming - you can find them in most OO languages (and Perl :p). This has been sort of doable with C# using delegates (and more recently with LINQ, predicates, yield and the Tinker-toy syntax that is LINQ) but it is, in a word, obtuse. Ruby does this pretty easily - all without carpal tunnel and 12 Google searches.

The Problem: XML Parsing

I was recording the latest episode in Tekpub's Sinatra production today when a golden opportunity came to show how you can use a block in Ruby to reduce repetitive code. It wasn't that big of a deal to do it - the thing that hit me was how very simple it all looked.

To be specific: I was pulling in the data from the Chinook XML file (the same used for the MVC Music Store), parsing it into a MongoMapper-enabled class (using Nokogiri), and shoving it into MongoDB. Pretty simple stuff really - but there was some repetition in there.

This is my first go:

It works - but it's hardly pretty. There's a good amount of repetition and Nokogiri API noise going on ("css" and "at_css" are element selectors - you can query using XPath too, but the CSS stuff is easier to read) - we can do a lot better.

Using Blocks

The first thing I wanted to do was to get rid of the parsing and looping noise - it's overly repetitive. These blocks are doing 3 things:

* Parsing the XML document for a parent node (like "Artist")

* Looping over that node, and parsing out the various Elements (like "Name")

* Assigning those values to the Mongo class (Artist) and saving to the DB (create!)

This can be separated using a method whose sole purpose is to parse the XML document and hand each element, in turn, to a block:

This method takes 2 arguments: the name of the key (like "Artist") and the associated block (noted in the argument list with the "&" - a special prefix for a block).

In Ruby, you can think of a block as a chunk of code, or process, that at some point will be acted on or given control. In the method here - "yield" is called - which means that control will be given back to the calling code. By passing "item" to yield - not only are we handing off control, we're handing a value with it.
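Here's a minimal sketch of that shape, just to make the mechanics concrete. It is not the code from the episode: the SAMPLE_DOCUMENT hash is a made-up, in-memory stand-in for the parsed Chinook XML (no Nokogiri involved), and the sample rows are only there for illustration.

```ruby
# Hypothetical stand-in for the parsed XML document: a plain hash keyed by
# node name. The real thing would come from Nokogiri and the Chinook file.
SAMPLE_DOCUMENT = {
  "Artist" => [
    { "ArtistId" => "1", "Name" => "AC/DC" },
    { "ArtistId" => "2", "Name" => "Accept" }
  ]
}.freeze

# Takes the key to look up plus a block. "&block" captures the block in the
# argument list; "yield" hands each item (and control) to that block.
def parse_key(key, &block)
  SAMPLE_DOCUMENT.fetch(key, []).each do |item|
    yield item
  end
end

# The caller only says what to do with each item.
parse_key("Artist") do |item|
  puts item["Name"]
end
```

The looping and lookup live in one place now; every caller just supplies the "what to do with it" part.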

To consume this code we can rewrite our initial method like this:

This can look a bit funky until you stare at it a few times. We're calling "parse_key" and passing in the name of the key we want (Artist), and then passing the code between "do" and "end". When the method is called, it will do its parsing thing then hand control back to the calling code - but this time it will pass back the parsed "item" - which the calling code can now use.

You know it will do this because the block declares "item" as its parameter (the thing between the goal posts) - whatever the method passes to yield shows up there, ready for the calling code to use.

It's control ping-pong, and if you've used Ruby before, you've used blocks.
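If you want to watch the ping-pong happen, here's a small sketch that logs each handoff. The each_item method and the log array are made up for this illustration; nothing here comes from the episode's code.

```ruby
# Hypothetical helper: logs before and after each yield so the order of the
# handoffs is visible.
def each_item(items, log)
  items.each do |item|
    log << "method: yielding #{item}"
    yield item                          # control jumps to the caller's block
    log << "method: resumed after #{item}"
  end
end

log = []
each_item(%w[Artist Album], log) do |item|   # "item" is the block parameter
  log << "block: received #{item}"           # the block can also see "log" (it's a closure)
end

puts log
```

The log alternates between "method" and "block" lines: the method yields, the block runs, and control comes back to the method right after the yield.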

Method Missing in Action

The code is getting better in that we've cleaned up a lot of repetitive noise - but the caller has to know way too much about "item" - the value passed to the block by yield. In our case it has to work against the Nokogiri API stuff, and if we ever swap out Nokogiri we'll be in trouble (in that all of the calling code will break).

Ideally we should use a very basic construct - returning an array or hash of some kind (this is by far the preferable solution). We could also get tricky, if we wanted to, and use method_missing with some on-the-fly method declaration. Again - this is hackish trickery, but I think it shows what Ruby is capable of - and moreover it's fun... so what the hell.
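For the record, the boring hash version might look something like this. It's a sketch under assumptions: SampleElement and SAMPLE_ROWS are hypothetical stand-ins for parsed XML elements (so this runs without Nokogiri), and parse_key_as_hash is a name I made up for the illustration.

```ruby
# Hypothetical stand-in for a parsed XML element: a node name plus its child
# elements as [name, text] pairs.
SampleElement = Struct.new(:name, :children)

SAMPLE_ROWS = [
  SampleElement.new("Artist", [["ArtistId", "1"], ["Name", "AC/DC"]]),
  SampleElement.new("Album",  [["AlbumId", "1"], ["ArtistId", "1"]]),
  SampleElement.new("Artist", [["ArtistId", "2"], ["Name", "Accept"]])
].freeze

# Yields a plain Hash per matching element, so the caller never touches the
# parser's own API.
def parse_key_as_hash(rows, key)
  rows.select { |row| row.name == key }.each do |row|
    yield row.children.to_h
  end
end

parse_key_as_hash(SAMPLE_ROWS, "Artist") do |item|
  puts "#{item["ArtistId"]}: #{item["Name"]}"
end
```

If the parser gets swapped out later, only the inside of the method changes - the callers keep working with hashes.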

Let's set it up so that our parse_key method defines method_missing so that we can access each element explicitly:

By resetting method_missing, I'm able to shove the parsing logic back into "parse_key" - which is where it should be. Now I can reduce the noise in my calling code even more:
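To give a feel for the trick, here's a rough sketch of per-object method_missing. It is not the episode's code: SampleNode and its text_of method are hypothetical stand-ins for a Nokogiri node and its element lookup, and parse_key_dynamic is a name invented for this example.

```ruby
# Hypothetical stand-in for a parsed XML node; text_of plays the role of an
# element lookup against the real parser.
class SampleNode
  def initialize(fields)
    @fields = fields
  end

  def text_of(element_name)
    @fields[element_name]
  end
end

SAMPLE_NODES = {
  "Artist" => [
    SampleNode.new({ "ArtistId" => "1", "Name" => "AC/DC" }),
    SampleNode.new({ "ArtistId" => "2", "Name" => "Accept" })
  ]
}.freeze

def parse_key_dynamic(key)
  SAMPLE_NODES.fetch(key, []).each do |item|
    # Define method_missing on this one object only, so item.Name looks up
    # the "Name" element. Unknown names fall through to NoMethodError.
    def item.method_missing(name, *args)
      value = text_of(name.to_s)
      value.nil? ? super : value
    end

    # Keep respond_to? honest about the dynamic methods.
    def item.respond_to_missing?(name, include_private = false)
      !text_of(name.to_s).nil? || super
    end

    yield item
  end
end

parse_key_dynamic("Artist") do |item|
  puts "#{item.ArtistId}: #{item.Name}"
end
```

The caller gets to write item.Name instead of digging through the parser API, and the lookup logic stays inside the parsing method.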

The overall effect isn't terribly exciting visually (in fact I'm sure there's a lot more I can do here) - but it makes the code a lot nicer to work with. In fact I've been able to use that parsing code in 3 other Rake tasks that I'm doing for this episode - fun!

Here's the final outcome:

It's worth noting a final time that returning a hash would be better here - but I couldn't resist showing a fun way to use method_missing and working an object definition on the fly.