A magic-free tour of tail-call optimization in MRI Ruby
In the fictional world of J. K. Rowling's Harry Potter book series, a muggle is a person who lacks any sort of magical ability and was not born into the magical world.
noun
In computer science, a tail call is a subroutine call performed as the final action of a procedure.
def a_call_in_tail_position!
other_method # <<< Tail call
end
def not_a_call_in_tail_position!
other_method + 1 # <<< Not a tail call
end
def still_not_a_call_in_tail_position!
1 + other_method # <<< Clever, but not a tail call
end
other_method
is complete.
def iffy_mid_method_tc
i = rand(2)
return some_call if i > 0
i
end
def recursive_count(i = 5)
puts i
return if i.zero?
recursive_count(i - 1)
end
def or_right_hand_side
false || other_call
end
def and_right_hand_side
true && other_call
end
def arg_expression(my_int)
other_call(my_int + 1)
end
And more...
Example:
def recursive_count(i = 5)
puts i
return if i.zero?
recursive_count(i - 1)
end
Output:
5
4
3
2
1
0
Unwound:
recursive_count(5)
puts 5
return if 5.zero? # false
recursive_count(5 - 1)
puts 4
return if 4.zero? # false
recursive_count(4 - 1)
puts 3
return if 3.zero? # false
recursive_count(3 - 1)
puts 2
return if 2.zero? # false
recursive_count(2 - 1)
puts 1
return if 1.zero? # false
recursive_count(1 - 1)
puts 0
return if 0.zero? # true
# Fin
If a tail call might lead to the same subroutine being called again later in the call chain, the subroutine is said to be tail-recursive, which is a special case of recursion. Tail recursion is particularly useful, especially when combined with tail call optimization.
Some functional languages take this to an extreme and use recursion, in paticular, tail recursion, in place of more typical loop constructs.
def recursive_count(i = 5)
if i >= 0
puts i
recursive_count(i - 1)
end
end
def iterative_count(i = 5)
while i >= 0
puts i
i = i - 1
end
end
These two approaches are equivalent, but there is a catch.
The link between tail recursion and tail call optimization is so important that a lack of tail call optimization means that recursive solutions aren't really viable for large inputs, which makes many classic functional approaches impossible.
And while we're at it, what is tail call elimination?
Not some kind of tail call destroying Terminator
Example:
def recursive_count(i = 5)
puts i
return if i.zero?
recursive_count(i - 1)
end
Output:
5
4
3
2
1
0
Unwound:
recursive_count(5)
puts 5
return if 5.zero? # false
recursive_count(5 - 1)
puts 4
return if 4.zero? # false
recursive_count(4 - 1)
puts 3
return if 3.zero? # false
recursive_count(3 - 1)
puts 2
return if 2.zero? # false
recursive_count(2 - 1)
puts 1
return if 1.zero? # false
recursive_count(1 - 1)
puts 0
return if 0.zero? # true
# Fin
The unwound stack from the example is only accurate in an environment with tail call optimization.
I did mention that Ruby isn't tail call optimized, didn't I?
recursive_count(5)
puts 5
return if 5.zero? # false
recursive_count(5 - 1)
puts 4
return if 4.zero? # false
recursive_count(4 - 1)
puts 3
return if 3.zero? # false
recursive_count(3 - 1)
puts 2
return if 2.zero? # false
recursive_count(2 - 1)
puts 1
return if 1.zero? # false
recursive_count(1 - 1)
puts 0
return if 0.zero? # true
# Fin
In reality, each time recursive_count
is called,
Ruby allocates a new frame on the stack to execute the method.
So, if you imagine each level of indentation represents a new stack frame, the unwound sequence of instructions is really closer to this:
recursive_count(5)
puts 5
return if 5.zero?
recursive_count(4)
puts 4
return if 4.zero?
recursive_count(3)
puts 3
return if 3.zero?
recursive_count(2)
puts 2
return if 2.zero?
recursive_count(1)
puts 1
return if 1.zero?
recursive_count(0)
puts 0
return if 0.zero?
# Fin
This isn't a big deal if we're counting down from five.
Ruby can handle it.
But what if we want to count down from a million?
Or even ten thousand?
recursive_count(10_000)
puts 10_000
return if 10_000.zero?
recursive_count(9_999)
puts 9_999
return if 9_999.zero?
recursive_count(9_998)
puts 9_998
return if 9_998.zero?
recursive_count(9_997)
puts 9_997
return if 9_997.zero?
recursive_count(9_996)
puts 9_996
return if 9_996.zero?
recursive_count(9_995)
puts 9_994
return if 9_994.zero?
recursive_count(9_994)
puts 9_994
return if 9_994.zero?
recursive_count(9_993)
puts 9_993
return if 9_993.zero?
recursive_count(9_992)
puts 9_992
return if 9_992.zero?
recursive_count(9_991)
# ...
$ ruby rcount.rb 10000
10000
9999
...
645
644
rcount.rb:2: stack level too deep (SystemStackError)
from rcount.rb:2:in `recursive_count'
from rcount.rb:4:in `recursive_count'
... 9354 levels...
from rcount.rb:4:in `recursive_count'
from rcount.rb:4:in `recursive_count'
from rcount.rb:7:in `<main>'
A lack of tail call optimization means that recursive solutions aren't really viable for large inputs, which makes many classic functional approaches impossible.
Tail calls are by definition the final action of a procedure
which means we can safely assume that after a tail call is made
nothing in the calling method is needed anymore.
If nothing from the calling method is needed after the tail call,
why keep
it around?
While we may prefer to think of a procedure call as meaning "go do this other module and come back", this is only one possble interpretation. We must recognize that there are situations where coming back doesn't matter and these situations may be exploited.- Guy L. Steele Jr.
Maintaining a reference to the parent stack frame serves two main functions:
puts caller
)
Keeping a record of the execution stack is nice,
but it's really just
a convenience.
Is there anyway to avoid keeping the parent stack frame around for the purposes of knowing where to return the result of the tail call?
If the stack frame of the tail call needs to know to return its result to its parent anyway, why not just circumvent the parent entirely and return the result of the tail call to the parent of the parent?
This is the magic of tail call elimination.
If you're going to circumvent the parent entirely, why go through the effort of creating a new stack frame for the tail call only to immediately discard the stack frame of the parent?
In practice, tail call elimination is more a kin to tail call avoidance. There is no "extra" stack frame that is eliminated, rather the stack frame of the parent is recycled and reused to execute the tail call.
Support for tail call optimization since Ruby 1.9.2
But it's never been enabled by default.
There was talk of enabling it by default around the time that Ruby 2.0 was released, but nothing ever came of it.
Tail call optimization can be enabled in a few different ways:
RubyVM::InstructionSequence
RubyVM::InstructionSequence
instance
tco_method
gem
rvm install 2.2.3 --patch http://blog.tdg5.com/tco_diff
--- a/vm_opts.h
+++ b/vm_opts.h
@@ -18,8 +18,8 @@
-#define OPT_TRACE_INSTRUCTION 1
-#define OPT_TAILCALL_OPTIMIZATION 0
+#define OPT_TRACE_INSTRUCTION 0
+#define OPT_TAILCALL_OPTIMIZATION 1
#define OPT_PEEPHOLE_OPTIMIZATION 1
#define OPT_SPECIALISED_INSTRUCTION 1
RubyVM::InstructionSequence
Since, the tco_method
gem uses
RubyVM::InstructionSequence
under the hood, let's start
there.
The RubyVM::InstructionSequence
class represents a compiled
sequence of instructions for the Ruby Virtual Machine.
With it, you can get a handle to the instructions that make up a method or a proc, compile strings of Ruby code down to VM instructions, and disassemble instruction sequences to strings for easy inspection. It is mostly useful if you want to learn how the Ruby VM works, but it also lets you control various settings for the Ruby iseq compiler.
Selective tail call optimization at runtime with
RubyVM::InstructionSequence
ISEQ_OPTIONS = {
tailcall_optimization: true,
trace_instruction: false,
}
iseq = RubyVM::InstructionSequence.new(
code_to_compile, file_path, dir_path,
line_no, ISEQ_OPTIONS)
iseq.eval
Kind of clunky
Recursive factorial using RubyVM::InstructionSequence
code = <<-CODE
def recursive_factorial(n, accumulator = 1)
return accumulator if n <= 1
recursive_factorial(n - 1, accumulator * n)
end
CODE
iseq = RubyVM::InstructionSequence.new(
code, nil, nil, nil, ISEQ_OPTIONS)
iseq.eval
recursive_factorial(5000)
Clunky town!
Recursive factorial using TCOMethod.tco_eval
require "tco_method"
TCOMethod.tco_eval <<-CODE
def recursive_factorial(n, accumulator = 1)
return accumulator if n <= 1
recursive_factorial(n - 1, accumulator * n)
end
CODE
recursive_factorial(5000)
Still clunky. Code strings are always going to be clunky.
Recursive factorial using TCOMethod::Mixin
require "tco_method"
module Factorial
extend TCOMethod::Mixin
def self.recursive_factorial(n, accumulator = 1)
return accumulator if n <= 1
recursive_factorial(n - 1, accumulator * n)
end
tco_module_method :recursive_factorial
end
Factorial.recursive_factorial(5000)
Definitely prettier than a SystemStackError
Recursive factorial using TCOMethod.with_tco
require "tco_method"
TCOMethod.with_tco do
module Factorial
def self.recursive_factorial(n, accumulator = 1)
return accumulator if n <= 1
recursive_factorial(n - 1, accumulator * n)
end
end
end
Factorial.recursive_factorial(5000)
Approaching idiomatic.
Ruby is a multi-paradigm language
What?
Ruby is a language that borrows the best ideas from many different kinds of languages to bring you the awesomeness that is
ECMAScript 6 brings tail call optimization to JavaScript
set_trace_func
isn't
supported when tail call optimization is enabled
def this_is_some_test_code
"testing".to_i + 3
end