The Why (or How or Something) of Y – Coffeescript Edition – As a Personal Reminder

Ah, the Y Combinator. Made famous to laymen through Paul Graham’s incubator, long a rallying point amoung functional language enthusiasts, it’s one of those many, many slippery things in Computer Science that you think you understand until you sit down to explain it to someone and find yourself at a loss for words. Indeed, that’s kinda what’s fun about CS: so many things in it that are deceptively simple are also actually simple, just difficult to explain. So, you know, you start by getting a high-level, conceptual grasp of something, and you think you’ve "got it," only when it comes time to actually use it (or explain it), you get bogged down in details that convince you you didn’t "have it" at all. So then you take a refresher, and you approach it from several angles, and you find yourself re-deriving the high-level conceptual grasp you started with. Which is to say, you understood it all along after all. It seems like heaps of stuff in CS is like that. You understand it until you don’t, and then you do again.

I find myself revisiting the Y Combinator every so often, because as obvious as it seems, it has a way of slipping away every couple of years, and I have to relearn it.

In that spirit, I recently read Richard P. Gabriel’s excellent Why of Y paper. 4 times, in fact. And it occurred to me while I was doing it that getting away with that name depends a lot on what the meaning of "is" is. Well, what "why" is, anyway. It spends most of its time showing you how someone might have stumbled on the idea of Y, and how he might have derived it. Which, to me, seems more like a "how" of Y than a "why" of Y. But I guess it’s meant to be a "why" of Y in the sense of showing why Y is the way it is.

To me, though, just knowing why Y is the way it is is a shallower thing – you know, a useful tool for remembering it when you’ve already understood it once, so that you don’t have to go through the exercise of rederiving it every couple of years.

For me, that goes something like this.

QUESTION What is Y?

ANSWER Y is a combinator that implements recursion.

QUESTION What is a combinator?

ANSWER A combinator is a function with only bound arguments (i.e. not refering to anything outside of its definition – including itself).

QUESTION How does it implement recursion?

ANSWER It’s dead simple, actually. Just think of the single simplest recursive application you can.

(Y f) = (f (Y f))

So, it’s recursive in that it’s defined in terms of itself. Check!

It’s also pretty clearly infinite:

(Y f) = (f (Y f))
      = (f (f (Y f)))
      = (f (f (f (Y f))))

You know, since any (Y f) can be replaced with (f (Y f))

Seen that way, all Y does is provide a locus for recursion – I mean, literally just by being there in the definition.

So we’re done, right?

Well, no – for two reasons. One is that this obviously isn’t a combinator, since it refers to itself in its definition. But let’s come back to that. The second is because I still don’t understand how it would actually work. I mean, clearly this doesn’t work for any function, as is frequently advertised. Looking at this, f would seem to need to be a function that takes (Y f) as an argument, which is to say:

  1. It’s a function of one argument

AND

  1. That argument IS the recursive application of f.

So, really, f has to have a form something like this (Coffeescript):

f = (f) ->
  (n) ->
    body

That is, it’s a function that takes the function it recurs on as an argument, and then returns a function that takes the actual input as an argument.

So, if f were the beloved factorial function, it would be something like this:

factorial = (f) ->
  (n) ->
    if n is 0
      1
    else
      n * f(n-1)

You satisfy yourself on this point by defining a traditional factorial function and passing it in:

fact = (n) ->
  if n is 0
    1
  else
    n * fact(n-1)

console.log (factorial fact) 5

You’ll get 120, as expected.

But of course that’s cheating, because the goal is to make factorial recursive without having to define (a separately-named and explicitly recursive) fact to pass to it to make it work.

So, does Y solve our problem? That is, if I pass factorial to Y, is it recursive, even though its recursive call is called f instead of factorial?

Let’s check.

(Y factorial) = (factorial (Y factorial))

Expand it, and we get:

(n) ->
  if n is 0
    1
  else
    n * ((Y factorial) (n - 1))

So, yeah, that seems to work, because if we then expand (Y factorial) we just get another recursive call of factorial:

(n) ->
  if n is 0
    1
  else
    n * ((factorial (Y factorial)) (n - 1))

And we’ve seen from the previous expansion that (factorial (Y factorial)) returns a function that takes a single number as an argument, which is the continuation of factorial we’re looking for. Great!

Of course, it won’t actually work – for two reasons. (1) I didn’t actually define Y in Coffeescript and (2) Coffeescript isn’t a lazy language, so it will try to expand (Y factorial) before passing it as an argument to factorial, and we know that there lies madness (infinity, whatever). It’ll never stop expanding that definition, and hence never execute our function at all.

So, that’s no good.

Also, as mentioned earlier, it’s still not a combinator anyway, since Y still refers to itself in its own definition. We’re looking for something that doesn’t refer to anything that’s not in its list of arguments. (Because hell, if we’re allowed to refer to things that aren’t in our list of arguments, we don’t NEED no stinkin’ combinator to be recursive, amirite?)

This is the part of Y that always throws me for a loop (yuk, yuk!). If someone weren’t guiding me, here’s where I would get lost. In fact, here’s where I DO get lost every couple of years when I have to remind myself how this works.

The next step is to notice that the goal of that last line is really just to apply factorial to factorial. That’s it’s whole point, its whole reason for being. Because remember, we’ve defined factorial in such a way that it takes its continuation function as its first argument, and returns a function built from that which is ready to take a number.

So, what we’re really gunning for is something like this:

(n) ->
  if n is 0
    1
  else
    n * ((factorial factorial) (n - 1))

You know, pass the recursive function to itself, and then apply that to the next step content argument. Because, as we’ve seen, passing factorial to factorial results in a function that takes a number and recurrs on factorial if the number is greater than 0.

Does it work?

It does!

factorial = (f) ->
  (n) ->
    if n is 0
      1
    else
      n * (f f)(n - 1)

console.log (factorial factorial) 5

It gives us 120.

And that’s really all there is to it!

So what need Y?

Well, again, you don’t really, if you can just remember to always pass your continuation call to itself in the body of the function, you’re good. Y is just a name for that. And of course it’s a much more useful name if it actually does something – that is, if we assign it to a function that does the thing we’re doing manually. Because if you can name the thing and say exactly what it is, you might as well pull it out and have it be a real – not just conceptual – thing, right?

That is, we’d like to do something like this:

shallowfactorial = (f) ->
  (n) ->
    if n is 0
      1
    else
      n * f(n - 1)

factorial = Y shallowfactorial

So, we need something that guarantees that whatever gets passed in as the f argument to shallowfactorial (1) is already recursive before getting passed in and (2) will get passed to itself on each recursive call. It’s really (2) that we were cheating on in our definition of factorial above – because we defined the recursive call to be:

n * (f f)(n - 1)

That (f f) part is the part that we’d specifically like to pull out.

You’ve probably seen Y defined like this (e.g. on the Wikipedia page for it)

Y = λf.(λx.f(x x))(λx.f(x x))

The reason why there have to be two components to the body is exactly that – one of them (the outer one) assures us that we’re passing a recursive function in to our function, and the inner one actually passes in the thing that will call itself on itself at the right time.

So, we’re done, right?

Again, wrong – because Coffeescript isn’t a lazy language. If you try to do this:

y = (f) ->
  ((x) -> f(x x))((x) -> f(x x))

factorial = y shallowfactorial
console.log factorial 5

You’ll get an error telling you that the maximum call stack size is exceeded. That’s because Coffeescript is not a lazy language … i.e. it will try to evaluate the (x x) part before passing it to f, rather than suspending evaluation until it’s needed.

So, we have to somehow suspend evaluation until it’s time. Which is to say, what we need to pass in to f at each of those calls had better be a function itself, not something that constructs a function. So, let’s do that:

y = (f) ->
  ((g) -> f (x) -> g(g)(x))((g) -> f (x) -> g(g)(x))

factorial = y shallowfactorial
console.log factorial 5

And now it works.

Now, this is nothing like the definitive blog entry on the Y Combinator. I’m not entirely sure what that is, but my favorite current candidate (based on reading I did over the weekend), is the well-cited Mike’s World-O-Programming article on it. It’s simply excellent. It covers everything here in greater detail and a whole lot more besides, building up slowly from small steps. It’s also what got me over the lazy eval step. I should also add a link to this gist – which got me out of a Coffeescript syntax error that I just couldn’t see for about 30min. while I was writing this. It also contains a good explanation of how to get Y in Coffeescript.

No, this is more here as a personal reminder. Because, like I said, there’s some kind of decay cycle in my brain where if I don’t think about the Y Combinator for a while, I lull myself into a false sense of security of thinking I understand it, but then not being able to derive it without a lot of thought when put on the spot. So, this is here as my shoving off point the next time that happens. It’s tailored to the two points where I, personally, have difficulty reconstructing it. Again, those are:

  1. The fact that each recusive call inside the target function needs to appy the recursed-on function call argument to itself AND
  2. That we need to pass this whole operation to itself as well, because generalized Y is actually responsible for TWO things: making sure the function that gets passed in is recursive AND making sure that it gets passed to itself before being applied to the actual continuation argument.

So There.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>