Ruby (1.9.x) has a class named Fiber. You create an instance by passing a block of code, and on the new instance you call the resume method, which executes the given block. The code can yield control back at any point; calling resume again resumes the block at the point where it last yielded.
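A minimal example (with arbitrary values, just to show the mechanics) of this resume/yield ping-pong:

```ruby
require 'fiber'  # needed in Ruby 1.9 for Fiber#alive?

# Each resume runs the block until the next Fiber.yield;
# the yielded value becomes the return value of resume.
counter = Fiber.new do
  Fiber.yield 1   # first resume stops here
  Fiber.yield 2   # second resume stops here
  3               # third resume returns the block's final value
end

puts counter.resume  # prints 1
puts counter.resume  # prints 2
puts counter.resume  # prints 3
puts counter.alive?  # prints false -- the block has finished
```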
As an example, we use a Fiber to implement a tokenizer (a tokenizer analyzes an input stream, searches for sequences that match certain patterns, and produces tokens).
Postfix notation, also known as Reverse Polish Notation (RPN), writes the operands of an arithmetic expression before the operator. The expression 3 + 4 becomes 3 4 + in postfix notation. Such expressions are rather easy to evaluate with a program.
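For instance, a postfix expression can be evaluated with a simple stack: operands are pushed, and each operator pops the top two values and pushes the result. The sketch below is only a hypothetical illustration, not the evaluator presented in the later entry:

```ruby
# Minimal stack-based evaluation of a postfix expression
# (a sketch, not the evaluator from the later entry).
def eval_postfix(expression)
  stack = []
  expression.split.each do |token|
    if token =~ /^\d+$/
      stack.push(token.to_i)                      # operand: push it
    else
      right = stack.pop                           # operator: pop two operands...
      left  = stack.pop
      stack.push(left.send(token.to_sym, right))  # ...and push the result
    end
  end
  stack.pop
end

eval_postfix('3 4 +')      # => 7
eval_postfix('7 4 - 2 *')  # => 6
```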
Our implementation requires two components: the tokenizer and the evaluator (presented in a later entry – Ruby: postfix expressions).
The tokenizer produces tokens that the evaluator consumes to perform calculations.
There are two obvious approaches for the tokenizer: read the whole input up front and build an array of tokens, or produce tokens on demand. Using a fiber, we can write the tokenizer as if it first read all the input, while still delivering tokens one at a time.
```ruby
require 'fiber'

class Tokenizer

  def initialize input_stream
    @stream = input_stream
    @fiber = create_fiber
  end

  # returns next token or nil
  def next
    return nil unless @fiber.alive?
    @fiber.resume
  end

  # returns current line
  def lineno
    @stream.lineno
  end

end
```
The constructor takes the input stream to analyze as a parameter and creates a fiber with the create_fiber method. The fiber is the actual tokenizer.
The next method returns the next token, or nil when the tokenizer reaches the end of the stream. It wraps the resume method so that we write tokenizer.next instead of tokenizer.resume, which makes more sense to the reader.
The next method also checks whether the fiber is still alive, so that it keeps returning nil even when called several times after the end of input has been reached. Without this guard, calling resume after the fiber's block has exited would raise a FiberError exception, caused by the dead fiber.
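A small demonstration of why the guard matters: once a fiber's block has run to completion, the fiber is dead, and resuming it raises.

```ruby
require 'fiber'  # needed in Ruby 1.9 for Fiber#alive?

fiber = Fiber.new { :done }
fiber.resume               # runs the block to completion, returns :done

begin
  fiber.resume             # the fiber is dead now
rescue FiberError => e
  puts "resume failed: #{e.message}"
end

puts fiber.alive?          # prints false -- this is what next checks
```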
The lineno method returns the current line number, which is useful for error messages generated by the token consumer.
Here is the fiber creation method.
```ruby
def create_fiber
  Fiber.new do
    @stream.each do |line|
      tokens = line.split
      tokens.each do |token|
        if token =~ /^\d+$/
          Fiber.yield :operand, token.to_i
        elsif token =~ /^[\+\-\*\/]$/
          Fiber.yield :operator, token.to_sym
        else
          Fiber.yield :error,
            "Invalid token #{token.inspect} at line #{@stream.lineno}"
        end
      end
    end
    Fiber.yield nil
  end
end
```
Fiber.new creates a new Fiber object from the given block. The outer loop iterates through all the input lines; each line is split into words (sequences separated by spaces). The inner loop then tests each word against the supported patterns and yields the token type (:operator or :operand) together with its value. This is how code returns values from a fiber: Fiber.yield value {, value}.
When the end of the input stream is reached, the code yields nil (the final Fiber.yield). If some unsupported input appears, the code yields an :error token with a message.
Without fiber support this code could not work: we are in the middle of two nested loops, and an ordinary return would break out of them. Fiber.yield suspends the loops instead, preserving their state until the next resume.
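Ruby 1.9's Enumerator offers a similar inversion of control (in MRI, external enumerators are themselves built on fibers). As a point of comparison, here is a hypothetical each_token sketch of the same tokenizer using an Enumerator yielder instead of Fiber.yield:

```ruby
require 'stringio'

# A hypothetical Enumerator-based variant: the yielder plays the role
# that Fiber.yield plays in the fiber version, suspending the nested
# loops between calls to next.
def each_token(stream)
  Enumerator.new do |yielder|
    stream.each do |line|
      line.split.each do |token|
        case token
        when /^\d+$/        then yielder << [:operand, token.to_i]
        when /^[\+\-\*\/]$/ then yielder << [:operator, token.to_sym]
        else yielder << [:error, "Invalid token #{token.inspect}"]
        end
      end
    end
  end
end

tokens = each_token(StringIO.new('3 4 +'))
tokens.next  # => [:operand, 3]
tokens.next  # => [:operand, 4]
tokens.next  # => [:operator, :+]
```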
Does it work? The following listing presents the unit tests.
```ruby
require 'test/unit'
require 'stringio'
require 'tokenizer'

class TokenizerTests < Test::Unit::TestCase

  def test_one_operator
    tokenizer = tokenizer_for('3 4 +')
    assert_equal [:operand, 3], tokenizer.next
    assert_equal [:operand, 4], tokenizer.next
    assert_equal [:operator, :+], tokenizer.next
    assert_nil tokenizer.next
  end

  def test_two_operators
    tokenizer = tokenizer_for('7 4 - 2 *')
    assert_equal [:operand, 7], tokenizer.next
    assert_equal [:operand, 4], tokenizer.next
    assert_equal [:operator, :-], tokenizer.next
    assert_equal [:operand, 2], tokenizer.next
    assert_equal [:operator, :*], tokenizer.next
    assert_nil tokenizer.next
  end

  def test_invalid_semantic
    tokenizer = tokenizer_for('3 +')
    assert_equal [:operand, 3], tokenizer.next
    assert_equal [:operator, :+], tokenizer.next
    assert_nil tokenizer.next
  end

  def test_non_integer_error
    tokenizer = tokenizer_for('3.5 2 +')
    assert_equal :error, tokenizer.next[0]
  end

  def test_invalid_symbol_error
    tokenizer = tokenizer_for('22 a +')
    assert_equal [:operand, 22], tokenizer.next
    assert_equal :error, tokenizer.next[0]
  end

  def tokenizer_for string
    Tokenizer.new(StringIO.new(string))
  end

end
```
In the helper method (tokenizer_for) that builds the tokenizer under test, we wrap the String object in a StringIO, because the tokenizer needs an IO stream as input.
```
$> ruby test_tokenizer.rb
Loaded suite /tmp/release/fiber/test_tokenizer
Started
.....
Finished in 0.00139421 seconds.
------
5 tests, 16 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
------
3586.26 tests/s, 11476.03 assertions/s
```
The implementation of the tokenizer is essentially the same as one that builds an array of tokens, with Fiber.yield replacing the pushes into the array. Yet the tokenizer never needs to hold the whole token stream in memory, and the client code does not have to wait for the whole file to be processed.
The tokenizer code focuses solely on reading the input; the interaction with the client code is handled by the fiber.
Are Fiber objects actually used in the wild? A quick check in a project using 51 gems reveals that only one file in one gem uses a fiber (action_view/renderer/streaming_template_renderer.rb from actionpack 3.2.8). The same check on the standard library classes, in the Ruby sources, shows that they make no use of it at all.
We did not care about performance here; the focus was on the fiber concept itself.