January 2009

When in Doubt, Turn to _why

Rack can’t parse nested hash params. That’s pretty annoying. So while we still can’t decide how to implement it in the best way, every other framework is cleaning up after Rack. My attempt to do it in Camping is clearly a bad solution, so I decided to have a look at other implementations and steal some ideas.

Let’s use Sinatra as an example

Most of them are very similar to Sinatra’s (this is slightly modified for readability, but the idea still applies):

params.inject({}) do |hash, (key, value)|
  if key =~ /\[.*\]/
    parts = key.scan(/(^[^\[]+)|\[([^\]]+)\]/).flatten.compact
    head, last = parts[0..-2], parts[-1]
    head.inject(hash){ |s,v| s[v] ||= {} }[last] = value
  else
    hash[key] = value
  end
  hash
end

Update: As Ryan has written in the comments, I'd better give some credz to the real author of the snippet above: Sinatra’s nested params implementation was taken from an example posted to the Rack ML by Michael Fellinger (of Ramaze fame). For the background on the patch that went into Sinatra, take a look at the ticket over at their bug tracker.

We’re looping through each of the params and doing some stuff if the key includes [something] and therefore is nested. If it’s nested, we need to find the parts:

parts = key.scan(/(^[^\[]+)|\[([^\]]+)\]/).flatten.compact

The first part of the regex matches from the beginning to the first [, and the second matches each of the [parts]. Some cleanup is needed to flatten and remove nils (just try the line in IRB and you’ll see it quickly).
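If you don't have IRB at hand, here is roughly what that line does step by step. This is only a small sketch: the helper name key_parts_raw is made up for the illustration, the regex is the one from the snippet.

```ruby
# Hypothetical helper (not part of Sinatra): runs only the scan step.
def key_parts_raw(key)
  key.scan(/(^[^\[]+)|\[([^\]]+)\]/)
end

raw = key_parts_raw("first[second][third]")
# The regex has two capture groups, so scan returns one pair per match,
# with nil for the group that did not take part in that match:
#   raw == [["first", nil], [nil, "second"], [nil, "third"]]

parts = raw.flatten.compact
# flatten joins the pairs into one array, compact drops the nils:
#   parts == ["first", "second", "third"]
```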

Next up, we’re splitting out the last part, and then comes the inject:

head.inject(hash){ |s,v| s[v] ||= {} }[last] = value  

Here, we’re building a Hash based on the params we’ve already cleaned up and the current param we’re working on. Then, finally, we’re setting the value.
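To see the inject at work in isolation, here is a minimal sketch. The helper store_nested is a name I made up for the example; its body is just the two lines from the snippet, spread out a bit.

```ruby
# Hypothetical helper: walks (and creates, where missing) the nested hashes
# named by all parts except the last, then stores value under the last part.
def store_nested(hash, parts, value)
  head, last = parts[0..-2], parts[-1]
  # Each step returns the inner hash for v, creating an empty one if needed.
  innermost = head.inject(hash) { |s, v| s[v] ||= {} }
  innermost[last] = value
  hash
end

params = {}
store_nested(params, ["first", "second", "third"], 123)
store_nested(params, ["first", "another"], 456)
# The second call reuses the existing "first" hash instead of replacing it:
#   params == { "first" => { "second" => { "third" => 123 }, "another" => 456 } }
```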

The _why way

All of this makes sense. I wrote approximately the same when trying to clean up my broken version. However, while I was looking for the way Ramaze did it, I found an excellent link to RedHanded: Injecting a Hash Backwards and the Merge Block. That version is just awesome (this is also slightly modified):

m = proc {|_,o,n|o.merge(n,&m)}
params.inject({}) do |hash, (key, value)|
  parts = key.split(/[\]\[]+/)
  hash.merge(parts.reverse.inject(value) { |x, i| {i => x} }, &m) 
end    

Notice the sweet, micro way to split out the parts: If we get a key like “first[second][third]”, we can simply split on any number of [ and ]. Of course, we’re going to fail on stuff like “this]is]really]one]key”, but if you’re writing like that, you deserve it!
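A quick sketch of what the split gives us, including the case where it goes wrong. The wrapper name split_key is only there for the example.

```ruby
# Hypothetical wrapper around the micro splitter from the snippet.
def split_key(key)
  key.split(/[\]\[]+/)
end

split_key("first[second][third]")
# The trailing "]" would leave an empty string at the end,
# but split drops trailing empty strings:
#   ["first", "second", "third"]

split_key("plain")
# No brackets at all, so we get a single part: ["plain"]

split_key("this]is]really]one]key")
# Every stray "]" is treated as a separator:
#   ["this", "is", "really", "one", "key"]
```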

The next bit is what makes this so different and awesome. Let’s look at the first part of it:

parts.reverse.inject(value) { |x, i| {i => x} }

We’re reversing it and building it backwards. The inject starts with our original value and then we build our way out by creating Hashes:

parts = ["first", "second", "third"]
value = 123

# first run of inject:
x = 123
i = "third"
return { "third" => 123 }

# second run of inject:
x = { "third" => 123 }
i = "second"
return { "second" => { "third" => 123 } }

# third run of inject:
x = { "second" => { "third" => 123 } }
i = "first"
return { "first" => { "second" => { "third" => 123 } } }
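The trace above is pseudo-Ruby, so here is the same thing as something you can actually run. The method name wrap_backwards is mine, not _why’s.

```ruby
# Hypothetical helper: wraps value in one Hash per part, innermost part first.
def wrap_backwards(parts, value)
  parts.reverse.inject(value) { |x, i| { i => x } }
end

wrap_backwards(["first", "second", "third"], 123)
# => a Hash equal to { "first" => { "second" => { "third" => 123 } } }

wrap_backwards(["plain"], "x")
# A single part just gets one wrapper: { "plain" => "x" }
```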

Now that we have this little nested Hash, we need to merge it with the rest. Most of you, including me, would say that this is a hard task, since we also need to merge it recursively:

params = { "first" => { "second" => { "third" => 123 } } }
current_param = { "first" => { "another" => 456 } }

params.merge(current_param) # fail!
current_param.merge(params) # fail!
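To be precise about what "fail" means here: nothing raises, a plain merge just lets one side win for the conflicting key and silently throws the other side away. A small sketch with the same two Hashes:

```ruby
params = { "first" => { "second" => { "third" => 123 } } }
current_param = { "first" => { "another" => 456 } }

a = params.merge(current_param)
# The argument wins for the conflicting key, so "second" is gone:
#   a == { "first" => { "another" => 456 } }

b = current_param.merge(params)
# Same problem the other way around, now "another" is gone:
#   b == { "first" => { "second" => { "third" => 123 } } }
```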

Meet the Merge Block

This was the first time I’d ever heard of the merge block. It’s not even properly documented! But it’s a very simple and very powerful feature: If we get a merge conflict (the same key in both Hashes), merge calls that block and uses whatever it returns as the value. Easy peasy!

params = { "first" => { "second" => { "third" => 123 } } }
current_param = { "first" => { "another" => 456 } }

params.merge(current_param) do |key, value_from_params, value_from_current_param|
  # key is defined in both params and current_param
  value_from_params.merge(value_from_current_param)
end

# =>
{ "first" => { "second" => { "third" => 123 }, "another" => 456 } }

And in order to merge it recursively, we build the block in advance and pass the block into the inner merge too:

m = proc {|_,o,n|o.merge(n,&m)}
# ...
hash.merge(recursive_hash, &m) 
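Why bother with the recursion? A merge block that only does a plain inner merge handles a conflict one level down, but loses data again as soon as the conflict sits deeper. Here is a minimal sketch of the difference; deep_merge is a hypothetical wrapper name, and it assumes that conflicting values are always Hashes (a scalar on either side would raise).

```ruby
# Hypothetical wrapper around the recursive merge block.
# Assumption: whenever a key exists on both sides, both values are Hashes.
def deep_merge(left, right)
  merger = proc { |_key, old, new| old.merge(new, &merger) }
  left.merge(right, &merger)
end

left  = { "a" => { "b" => { "c" => 1 } } }
right = { "a" => { "b" => { "d" => 2 } } }

one_level = left.merge(right) { |_key, old, new| old.merge(new) }
# The inner merge is a plain one, so "c" is lost at the second level:
#   one_level == { "a" => { "b" => { "d" => 2 } } }

recursive = deep_merge(left, right)
# The block is passed down, so conflicts are resolved at every level:
#   recursive == { "a" => { "b" => { "c" => 1, "d" => 2 } } }
```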

Shorter? Faster?

I believe the first, natural way is both faster and shorter (if we use the micro splitter), but I still think I’m going to put _why’s version in Camping. Not (only) because it’s _why’s, but also because it’s so wicked awesome and fits perfectly together with the other weird stuff in Camping. Some bytes have to be sacrificed for style; Camping is still far away from exceeding the limit.
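I haven’t measured any of this, so if you want to check the "faster" part yourself, something like the following rough sketch should do. The method names, the sample params and the number of runs are all made up for the example, and it needs a Ruby that has Process.clock_gettime.

```ruby
# The "natural" way, using the micro splitter.
def nest_in_place(params)
  params.inject({}) do |hash, (key, value)|
    parts = key.split(/[\]\[]+/)
    head, last = parts[0..-2], parts[-1]
    head.inject(hash) { |s, v| s[v] ||= {} }[last] = value
    hash
  end
end

# The _why way: build each param backwards, then merge recursively.
def nest_with_merge(params)
  m = proc { |_, o, n| o.merge(n, &m) }
  params.inject({}) do |hash, (key, value)|
    parts = key.split(/[\]\[]+/)
    hash.merge(parts.reverse.inject(value) { |x, i| { i => x } }, &m)
  end
end

# Returns the seconds it took to run the given block `runs` times.
def time_it(runs)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  runs.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

sample = {
  "first[second][third]" => 123,
  "first[another]"       => 456,
  "plain"                => "x"
}

puts "in place:   #{time_it(10_000) { nest_in_place(sample) }}s"
puts "with merge: #{time_it(10_000) { nest_with_merge(sample) }}s"
```

Both methods should give the same nested Hash for the sample, so the only thing to compare is the time.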