Ruby (1.9.x) has a class named Fiber. You create an instance by passing a block of code, and on the new instance you call the resume method, which executes the given block. The code can yield control back at any point; calling resume again resumes the block at the point where it last yielded.
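A minimal example (with arbitrary values, just to show the mechanics) of this resume/yield ping-pong:

```ruby
require 'fiber'  # needed in Ruby 1.9 for Fiber#alive?

# Each resume runs the block until the next Fiber.yield;
# the yielded value becomes the return value of resume.
counter = Fiber.new do
  Fiber.yield 1   # first resume stops here
  Fiber.yield 2   # second resume stops here
  3               # third resume returns the block's final value
end

puts counter.resume  # prints 1
puts counter.resume  # prints 2
puts counter.resume  # prints 3
puts counter.alive?  # prints false -- the block has finished
```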
As an example, we use a Fiber to implement a tokenizer (a tokenizer analyzes an input stream, searches for sequences that match certain patterns, and produces tokens).
Postfix notation, also known as Reverse Polish Notation (RPN), writes the operands of an arithmetic expression before the operator. The expression 3 + 4 becomes 3 4 + in postfix notation. Such expressions are rather easy to evaluate with a program.
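For instance, a postfix expression can be evaluated with a simple stack: operands are pushed, and each operator pops the top two values and pushes the result. The sketch below is only a hypothetical illustration, not the evaluator presented in the later entry:

```ruby
# Minimal stack-based evaluation of a postfix expression
# (a sketch, not the evaluator from the later entry).
def eval_postfix(expression)
  stack = []
  expression.split.each do |token|
    if token =~ /^\d+$/
      stack.push(token.to_i)                      # operand: push it
    else
      right = stack.pop                           # operator: pop two operands...
      left  = stack.pop
      stack.push(left.send(token.to_sym, right))  # ...and push the result
    end
  end
  stack.pop
end

eval_postfix('3 4 +')      # => 7
eval_postfix('7 4 - 2 *')  # => 6
```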
Our implementation requires two components: the tokenizer and the evaluator (presented in a later entry – Ruby: postfix expressions).
The tokenizer produces tokens that the evaluator consumes to perform calculations.
There are two obvious approaches for the tokenizer: read the whole input up front and build an array of tokens, or produce tokens on demand. Using a fiber, we can write the tokenizer as if it first read all the input, while still delivering tokens one at a time.
```ruby
require 'fiber'

class Tokenizer

  def initialize input_stream
    @stream = input_stream
    @fiber = create_fiber
  end

  # returns next token or nil
  def next
    return nil unless @fiber.alive?
    @fiber.resume
  end

  # returns current line
  def lineno
    @stream.lineno
  end

end
```
The constructor takes the input stream to analyze as a parameter and creates a fiber with the create_fiber method. The fiber is the actual tokenizer.
The next method returns the next token, or nil when the tokenizer reaches the end of the stream. It wraps the resume method so that we write tokenizer.next instead of tokenizer.resume, which makes more sense to the reader.
The next method also checks whether the fiber is still alive, so that it keeps returning nil even when called several times after the end of input has been reached. Without this guard, calling resume after the fiber's block has exited would raise a FiberError exception, caused by the dead fiber.
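A small demonstration of why the guard matters: once a fiber's block has run to completion, the fiber is dead, and resuming it raises.

```ruby
require 'fiber'  # needed in Ruby 1.9 for Fiber#alive?

fiber = Fiber.new { :done }
fiber.resume               # runs the block to completion, returns :done

begin
  fiber.resume             # the fiber is dead now
rescue FiberError => e
  puts "resume failed: #{e.message}"
end

puts fiber.alive?          # prints false -- this is what next checks
```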
The lineno method returns the current line number, which is useful for error messages generated by the token consumer.
Here is the fiber creation method.
```ruby
def create_fiber
  Fiber.new do
    @stream.each do |line|
      tokens = line.split
      tokens.each do |token|
        if token =~ /^\d+$/
          Fiber.yield :operand, token.to_i
        elsif token =~ /^[\+\-\*\/]$/
          Fiber.yield :operator, token.to_sym
        else
          Fiber.yield :error,
            "Invalid token #{token.inspect} at line #{@stream.lineno}"
        end
      end
    end
    Fiber.yield nil
  end
end
```
Fiber.new creates a new Fiber object from the given block. The outer loop iterates through all the input lines; each line is split into words (sequences separated by spaces). The inner loop then tests each word against the supported patterns and yields the token type (:operator or :operand) together with its value. This is how code returns values from a fiber: Fiber.yield value {, value}.
When the end of the input stream is reached, the code yields nil (the final Fiber.yield). If some unsupported input appears, the code yields an :error token with a message.
Without fiber support this code could not work: we are in the middle of two nested loops, and an ordinary return would break out of them. Fiber.yield suspends the loops instead, preserving their state until the next resume.
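Ruby 1.9's Enumerator offers a similar inversion of control (in MRI, external enumerators are themselves built on fibers). As a point of comparison, here is a hypothetical each_token sketch of the same tokenizer using an Enumerator yielder instead of Fiber.yield:

```ruby
require 'stringio'

# A hypothetical Enumerator-based variant: the yielder plays the role
# that Fiber.yield plays in the fiber version, suspending the nested
# loops between calls to next.
def each_token(stream)
  Enumerator.new do |yielder|
    stream.each do |line|
      line.split.each do |token|
        case token
        when /^\d+$/        then yielder << [:operand, token.to_i]
        when /^[\+\-\*\/]$/ then yielder << [:operator, token.to_sym]
        else yielder << [:error, "Invalid token #{token.inspect}"]
        end
      end
    end
  end
end

tokens = each_token(StringIO.new('3 4 +'))
tokens.next  # => [:operand, 3]
tokens.next  # => [:operand, 4]
tokens.next  # => [:operator, :+]
```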
Does it work? The following listing presents the unit tests.
```ruby
require 'test/unit'
require 'stringio'
require 'tokenizer'

class TokenizerTests < Test::Unit::TestCase

  def test_one_operator
    tokenizer = tokenizer_for('3 4 +')
    assert_equal [:operand, 3], tokenizer.next
    assert_equal [:operand, 4], tokenizer.next
    assert_equal [:operator, :+], tokenizer.next
    assert_nil tokenizer.next
  end

  def test_two_operators
    tokenizer = tokenizer_for('7 4 - 2 *')
    assert_equal [:operand, 7], tokenizer.next
    assert_equal [:operand, 4], tokenizer.next
    assert_equal [:operator, :-], tokenizer.next
    assert_equal [:operand, 2], tokenizer.next
    assert_equal [:operator, :*], tokenizer.next
    assert_nil tokenizer.next
  end

  def test_invalid_semantic
    tokenizer = tokenizer_for('3 +')
    assert_equal [:operand, 3], tokenizer.next
    assert_equal [:operator, :+], tokenizer.next
    assert_nil tokenizer.next
  end

  def test_non_integer_error
    tokenizer = tokenizer_for('3.5 2 +')
    assert_equal :error, tokenizer.next[0]
  end

  def test_invalid_symbol_error
    tokenizer = tokenizer_for('22 a +')
    assert_equal [:operand, 22], tokenizer.next
    assert_equal :error, tokenizer.next[0]
  end

  def tokenizer_for string
    Tokenizer.new(StringIO.new(string))
  end

end
```
In the helper method (tokenizer_for) that builds the tokenizer under test, we wrap the String object in a StringIO, because the tokenizer needs an IO stream as input.
```
$> ruby test_tokenizer.rb
Loaded suite /tmp/release/fiber/test_tokenizer
Started
.....
Finished in 0.00139421 seconds.
------
5 tests, 16 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
------
3586.26 tests/s, 11476.03 assertions/s
```
The implementation of the tokenizer is essentially the same as one that builds an array of tokens, with Fiber.yield replacing the pushes into the array. Yet the tokenizer never needs to hold the whole token stream in memory, and the client code does not have to wait for the whole file to be processed.
The tokenizer code focuses solely on reading the input; the interaction with the client code is handled by the fiber.
Are Fiber objects actually used in the wild? A quick check in a project using 51 gems reveals that only one file in one gem uses a fiber (action_view/renderer/streaming_template_renderer.rb from actionpack 3.2.8). The same check on the standard library classes, in the Ruby sources, shows that they make no use of it at all.
We did not care about performance here; the focus was on the fiber concept itself.