A Muggle's Guide to
Tail-Call Optimization
in Ruby

A magic-free tour of tail-call optimization in MRI Ruby

Or...

Or...

Imagine if you dare

A world without

MINASWAN...

^-- Holy $#!%, is that Matz with a Matz puppet? --^

From the blog of

Guido van Rossum...

But the good news is...

No matter what you think...

You're right!

But Siriusly...

What the hell is a Muggle?

Muggle

In the fictional world of J. K. Rowling's Harry Potter book series, a muggle is a person who lacks any sort of magical ability and was not born into the magical world.

But wait...

You must decide where the line between knowledge and magic should fall.

On with the tail calls!

Mommy, what is a tail call?

WRONG

tail call
/ˈtāl ˌkôl/

noun

In computer science, a tail call is a subroutine call performed as the final action of a procedure.

Canonical Example


def a_call_in_tail_position!
  other_method # <<< Tail call
end
    

Canonical Counter-Example


def not_a_call_in_tail_position!
  other_method + 1 # <<< Not a tail call
end
    

Still no good


def still_not_a_call_in_tail_position!
  1 + other_method # <<< Clever, but not a tail call
end
      
Neither of these qualify because there's still additional work (+1) that needs to be performed in each method before the method can pass on its result after the call to other_method is complete.

Other proper tail calls


def iffy_mid_method_tc
  i = rand(2)
  return some_call if i > 0
  i
end
        

def recursive_count(i = 5)
  puts i
  return if i.zero?
  recursive_count(i - 1)
end
        

def or_right_hand_side
  false || other_call
end

def and_right_hand_side
  true && other_call
end
        

def arg_expression(my_int)
  other_call(my_int + 1)
end
        

And more...

Let's take a moment to talk about...

Recursion

Revisiting our recursive example...

Example:


def recursive_count(i = 5)
  puts i
  return if i.zero?
  recursive_count(i - 1)
end
          

Output:


5
4
3
2
1
0
          

Unwound:


recursive_count(5)
puts 5
return if 5.zero? # false
recursive_count(5 - 1)
puts 4
return if 4.zero? # false
recursive_count(4 - 1)
puts 3
return if 3.zero? # false
recursive_count(3 - 1)
puts 2
return if 2.zero? # false
recursive_count(2 - 1)
puts 1
return if 1.zero? # false
recursive_count(1 - 1)
puts 0
return if 0.zero? # true
# Fin
          

Recursion + Tail Call
= Tail Recursion

If a tail call might lead to the same subroutine being called again later in the call chain, the subroutine is said to be tail-recursive, which is a special case of recursion. Tail recursion is particularly useful, especially when combined with tail call optimization.

Who needs loops?

Some functional languages take this to an extreme and use recursion, in paticular, tail recursion, in place of more typical loop constructs.


def recursive_count(i = 5)
  if i >= 0
    puts i
    recursive_count(i - 1)
  end
end
        

def iterative_count(i = 5)
  while i >= 0
    puts i
    i = i - 1
  end
end
        

These two approaches are equivalent, but there is a catch.

The catch...

The link between tail recursion and tail call optimization is so important that a lack of tail call optimization means that recursive solutions aren't really viable for large inputs, which makes many classic functional approaches impossible.

Chill out

What is tail call optimization?

And while we're at it, what is tail call elimination?

Tail call optimization

vs.

Tail call elimination

Kind of the same thing

Tail call elimination

Not some kind of tail call destroying Terminator


(I wish)

Tail call elimination

Tail call optimization

But first, a confession

Remember Obi-wan?

This is only true... from a certain point of view

Example:


def recursive_count(i = 5)
  puts i
  return if i.zero?
  recursive_count(i - 1)
end
          

Output:


5
4
3
2
1
0
          

Unwound:


recursive_count(5)
puts 5
return if 5.zero? # false
recursive_count(5 - 1)
puts 4
return if 4.zero? # false
recursive_count(4 - 1)
puts 3
return if 3.zero? # false
recursive_count(3 - 1)
puts 2
return if 2.zero? # false
recursive_count(2 - 1)
puts 1
return if 1.zero? # false
recursive_count(1 - 1)
puts 0
return if 0.zero? # true
# Fin
          

The truth

The unwound stack from the example is only accurate in an environment with tail call optimization.

I did mention that Ruby isn't tail call optimized, didn't I?

Oops.


recursive_count(5)
puts 5
return if 5.zero? # false
recursive_count(5 - 1)
puts 4
return if 4.zero? # false
recursive_count(4 - 1)
puts 3
return if 3.zero? # false
recursive_count(3 - 1)
puts 2
return if 2.zero? # false
recursive_count(2 - 1)
puts 1
return if 1.zero? # false
recursive_count(1 - 1)
puts 0
return if 0.zero? # true
# Fin
        

Okay, so what?

What's really happening?

In reality, each time recursive_count is called, Ruby allocates a new frame on the stack to execute the method.

So, if you imagine each level of indentation represents a new stack frame, the unwound sequence of instructions is really closer to this:


recursive_count(5)
  puts 5
  return if 5.zero?
  recursive_count(4)
    puts 4
    return if 4.zero?
    recursive_count(3)
      puts 3
      return if 3.zero?
      recursive_count(2)
        puts 2
        return if 2.zero?
        recursive_count(1)
          puts 1
          return if 1.zero?
          recursive_count(0)
            puts 0
            return if 0.zero?
            # Fin
        

This isn't a big deal if we're counting down from five.

Ruby can handle it.

But what if we want to count down from a million?

Or even ten thousand?

Really?


recursive_count(10_000)
  puts 10_000
  return if 10_000.zero?
  recursive_count(9_999)
    puts 9_999
    return if 9_999.zero?
    recursive_count(9_998)
      puts 9_998
      return if 9_998.zero?
      recursive_count(9_997)
        puts 9_997
        return if 9_997.zero?
        recursive_count(9_996)
          puts 9_996
          return if 9_996.zero?
          recursive_count(9_995)
            puts 9_994
            return if 9_994.zero?
            recursive_count(9_994)
              puts 9_994
              return if 9_994.zero?
              recursive_count(9_993)
                puts 9_993
                return if 9_993.zero?
                recursive_count(9_992)
                  puts 9_992
                  return if 9_992.zero?
                  recursive_count(9_991)
                  # ...
        

$ ruby rcount.rb 10000
10000
9999
...
645
644
rcount.rb:2: stack level too deep (SystemStackError)
    from rcount.rb:2:in `recursive_count'
    from rcount.rb:4:in `recursive_count'
     ... 9354 levels...
    from rcount.rb:4:in `recursive_count'
    from rcount.rb:4:in `recursive_count'
    from rcount.rb:7:in `<main>'
    

A lack of tail call optimization means that recursive solutions aren't really viable for large inputs, which makes many classic functional approaches impossible.

The secret magic of tail calls

The secret magic of tail calls

Tail calls are by definition the final action of a procedure

which means we can safely assume that after a tail call is made

nothing in the calling method is needed anymore.

If nothing from the calling method is needed after the tail call,
why keep it around?

While we may prefer to think of a procedure call as meaning "go do this other module and come back", this is only one possble interpretation. We must recognize that there are situations where coming back doesn't matter and these situations may be exploited.
- Guy L. Steele Jr.

Maintaining a reference to the parent stack frame serves two main functions:

Keep a record of the path that YARV took through your program
(i.e. puts caller)
Keep a record of where the result of each method call needs to go

Keeping a record of the execution stack is nice,
but it's really just a convenience.

Let's pretend we don't care about the complete call stack

Is there anyway to avoid keeping the parent stack frame around for the purposes of knowing where to return the result of the tail call?

As it turns out, no.

Thanks for coming!

Enjoy the rest of RubyConf

If the stack frame of the tail call needs to know to return its result to its parent anyway, why not just circumvent the parent entirely and return the result of the tail call to the parent of the parent?

Boom

This is the magic of tail call elimination.

Questions?  Tylenol?

But why stop there?

If you're going to circumvent the parent entirely, why go through the effort of creating a new stack frame for the tail call only to immediately discard the stack frame of the parent?

In practice, tail call elimination is more a kin to tail call avoidance. There is no "extra" stack frame that is eliminated, rather the stack frame of the parent is recycled and reused to execute the tail call.

Without Tail Call Optimization
With Tail Call Optimization

That's all well and good
but if Ruby isn't tail call optimized then...

About that...

Ruby is tail call optimized!

Sort of...

Ruby: optionally tail call optimized!

Support for tail call optimization since Ruby 1.9.2

But it's never been enabled by default.

There was talk of enabling it by default around the time that Ruby 2.0 was released, but nothing ever came of it.

Here be dragons!

Enabling tail call optimization in Ruby

Enabling tail call optimization in Ruby

Tail call optimization can be enabled in a few different ways:

  • Enable universally with a flag at compile time
  • Enable univerally at runtime by tweaking the configuration of RubyVM::InstructionSequence
  • Enable selectively at runtime by creating a specially configured RubyVM::InstructionSequence instance
  • Enable selectively at runtime with helper method annotations using the tco_method gem
    (Full disclosure: I am the author of this gem)

Pause for awe

Patched Ruby with TCO


rvm install 2.2.3 --patch http://blog.tdg5.com/tco_diff
    

--- a/vm_opts.h
+++ b/vm_opts.h
@@ -18,8 +18,8 @@

-#define OPT_TRACE_INSTRUCTION        1
-#define OPT_TAILCALL_OPTIMIZATION    0
+#define OPT_TRACE_INSTRUCTION        0
+#define OPT_TAILCALL_OPTIMIZATION    1
 #define OPT_PEEPHOLE_OPTIMIZATION    1
 #define OPT_SPECIALISED_INSTRUCTION  1
    

RubyVM::InstructionSequence

Since, the tco_method gem uses RubyVM::InstructionSequence under the hood, let's start there.

The RubyVM::InstructionSequence class represents a compiled sequence of instructions for the Ruby Virtual Machine.

With it, you can get a handle to the instructions that make up a method or a proc, compile strings of Ruby code down to VM instructions, and disassemble instruction sequences to strings for easy inspection. It is mostly useful if you want to learn how the Ruby VM works, but it also lets you control various settings for the Ruby iseq compiler.

Selective tail call optimization at runtime with RubyVM::InstructionSequence


ISEQ_OPTIONS = {
  tailcall_optimization: true,
  trace_instruction: false,
}

iseq = RubyVM::InstructionSequence.new(
         code_to_compile, file_path, dir_path,
         line_no, ISEQ_OPTIONS)

iseq.eval
    

Kind of clunky

Recursive factorial using RubyVM::InstructionSequence


code = <<-CODE
  def recursive_factorial(n, accumulator = 1)
    return accumulator if n <= 1
    recursive_factorial(n - 1, accumulator * n)
  end
CODE

iseq = RubyVM::InstructionSequence.new(
         code, nil, nil, nil, ISEQ_OPTIONS)
iseq.eval

recursive_factorial(5000)
    

Clunky town!

Recursive factorial using TCOMethod.tco_eval


require "tco_method"

TCOMethod.tco_eval <<-CODE
  def recursive_factorial(n, accumulator = 1)
    return accumulator if n <= 1
    recursive_factorial(n - 1, accumulator * n)
  end
CODE

recursive_factorial(5000)
    

Still clunky. Code strings are always going to be clunky.

Recursive factorial using TCOMethod::Mixin


require "tco_method"

module Factorial
  extend TCOMethod::Mixin

  def self.recursive_factorial(n, accumulator = 1)
    return accumulator if n <= 1
    recursive_factorial(n - 1, accumulator * n)
  end
  tco_module_method :recursive_factorial
end

Factorial.recursive_factorial(5000)
    

Definitely prettier than a SystemStackError

Recursive factorial using TCOMethod.with_tco


require "tco_method"

TCOMethod.with_tco do
  module Factorial
    def self.recursive_factorial(n, accumulator = 1)
      return accumulator if n <= 1
      recursive_factorial(n - 1, accumulator * n)
    end
  end
end

Factorial.recursive_factorial(5000)
    

Approaching idiomatic.

Benefits of tail call optimization in Ruby

Ruby is a multi-paradigm language

What?

Ruby is a language that borrows the best ideas from many different kinds of languages to bring you the awesomeness that is

Ruby

Multifaceted Ruby

  • Tail call optimization enables a wider ranger of functional solutions by better enabling recursive solutions, including tail recursion.
  • Tail call optimization also benefits strong object oriented designs by enabling better abstractions.
  • Tail call optimization also benefits general performance by more efficiently reusing stack frames and enabling earlier garbage collection of objects in earlier stack frames.

Language Crossover

ECMAScript 6 brings tail call optimization to JavaScript

Why Not Tail Call Optimization?

  • Stack frames ending with tail calls are eliminated from the stack.
    • Makes debugging harder
  • The little known, sometimes used set_trace_func isn't supported when tail call optimization is enabled
  • As currently implemented, can be cumbersome to use
  • Relatively unused/untested despite availability
  • Still some weirdness in what constitutes a tail call

About Me

https://github.com/tdg5
http://blog.tdg5.com
@dannyguinther

Thank you!

This is a test slide!


def this_is_some_test_code
  "testing".to_i + 3
end
  

^--- test animation ---^