Method chaining and lazy evaluation in Ruby


Method chaining is a convenient way to build up complex queries, which are then lazily executed when needed. Within the chain, a single object is updated and passed from one method to the next, until it’s finally transformed into its output value.

Object-relational mappers like Rails’ ActiveRecord use method chaining in their query interfaces to build up a database query by accepting multiple finder methods like #where, #order and #limit.

To illustrate the method chaining concept, this article builds a Collection class: a query interface that mimics an ORM by querying an in-memory array of hashes instead of connecting to a database.

The data set is a list of animals, with their common names, classes and types:

data = [
  {
    name: "Blue whale",
    class: "mammalia",
    type: "aquatic"
  },
  {
    name: "European lobster",
    class: "malacostraca",
    type: "aquatic"
  },
  {
    name: "South-American tapir",
    class: "mammalia",
    type: "terrestrial"
  }
]

The query interface exposes the #where method to query the data set. It can be called multiple times to combine clauses. For example, to get all aquatic mammals from the data set, call #where twice:

require "./collection"

Collection
  .new(data)
  .where(class: "mammalia")
  .where(type: "aquatic")
  .to_a
[{:name=>"Blue whale", :class=>"mammalia", :type=>"aquatic"}]

A note on Enumerator::Lazy

This article explains how method chaining and lazy evaluation work internally and uses an in-memory array as a minimal example. Since arrays are enumerable, filtering methods can be chained on them through Enumerator::Lazy instead of writing a custom implementation:

data
  .lazy
  .select { |item| item[:class] == "mammalia" }
  .select { |item| item[:type] == "terrestrial" }
  .to_a
[{:name=>"South-American tapir", :class=>"mammalia", :type=>"terrestrial"}]

For most real-world use cases, Ruby’s built-in lazy enumeration is the best solution. However, implementing a custom method chain is useful for writing query interfaces on objects that do not implement Enumerable, or when the query needs to be executed as a single combined check per item (unlike the example above, which runs a separate filter block for each call to Enumerator::Lazy#select).
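To make the evaluation order of a lazy chain visible, the sketch below logs every block call. The numbers array and the log variable are made up for illustration and are not part of the article's data set. Each element travels through the whole chain before the next one is read, and the second block only runs for elements that passed the first.

```ruby
# Illustrative data, not the animal data set used elsewhere in the article.
numbers = [1, 2, 3]
log = []

result = numbers
  .lazy
  .select { |n| log << "first: #{n}"; n.odd? }  # runs for every element
  .select { |n| log << "second: #{n}"; n > 1 }  # runs only for odd elements
  .to_a                                         # forces the evaluation

p result
# => [3]
p log
# => ["first: 1", "second: 1", "first: 2", "first: 3", "second: 3"]
```

Nothing is logged until #to_a is called, which is the same "resolve at the last possible moment" behavior the custom implementation aims for.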

The Collection class

For a custom implementation of method chaining, first implement the #where method on a new class named Collection. A test to verify the behavior of selecting an item from a collection looks like this:

require 'minitest/autorun'
require './collection'

class TestCollection < Minitest::Test
  def test_where
    whale = {
      name: "Blue whale",
      class: "mammalia",
      type: "aquatic"
    }

    lobster = {
      name: "European lobster",
      class: "malacostraca",
      type: "aquatic"
    }

    tapir = {
      name: "South-American tapir",
      class: "mammalia",
      type: "terrestrial"
    }

    collection = Collection.new([whale, lobster, tapir])

    assert_equal([whale], collection.where(name: "Blue whale"))
  end
end

The test asserts that, given a collection with three items, the one item that matches the #where clause is selected.

Running the test at this point produces errors, because neither the Collection class nor the methods the test calls exist yet. A minimal class with stubs for both methods looks like this:

class Collection
  def initialize(data)
  end

  def where(where)
  end
end

With the stubs in place, the test errors turn into an assertion failure, because the #where method returns nil instead of the expected result:

ruby collection_test.rb
Run options: --seed 45181

# Running:

F

Finished in 0.010833s, 92.3105 runs/s, 92.3105 assertions/s.

  1) Failure:
TestCollection#test_where [collection_test.rb:26]:
--- expected
+++ actual
@@ -1 +1 @@
-[{:name=>"Blue whale", :class=>"mammalia", :type=>"aquatic"}]
+nil


1 runs, 1 assertions, 1 failures, 0 errors, 0 skips

An implementation for #where

An implementation that satisfies the test looks like this:

class Collection
  def initialize(data)
    @data = data
  end

  def where(where)
    @data.select { |item| where <= item }
  end
end

The #initialize method takes the data from its passed argument, and stores it in an instance variable named @data to be used later.

Because @data is an array, the #where method calls Array#select on it to retrieve a selection of its contents. It passes a block that checks each item with Hash#<=, which checks if the where variable is a subset of the item hash.

ruby collection_test.rb
Run options: --seed 13425

# Running:

.

Finished in 0.000539s, 1855.2876 runs/s, 1855.2876 assertions/s.

1 runs, 1 assertions, 0 failures, 0 errors, 0 skips
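The subset check that #where relies on can be tried in isolation. This is a small sketch with one item from the data set; the variable names are chosen for illustration only. Hash#<= returns true only when every key-value pair on the left also appears on the right.

```ruby
whale = { name: "Blue whale", class: "mammalia", type: "aquatic" }

# Every pair in the clause is present in the item.
matching = ({ class: "mammalia", type: "aquatic" } <= whale)

# The :type value differs, so the clause is not a subset.
wrong_value = ({ class: "mammalia", type: "terrestrial" } <= whale)

# An empty hash is a subset of every hash.
empty = ({} <= whale)

p matching    # => true
p wrong_value # => false
p empty       # => true
```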

Chaining #where calls

The Collection class needs to chain methods to join multiple where clauses. The updated test suite adds a test that asserts that the #where method can be chained. It also moves the collection to a setup block, as it’s now used by both tests:

require 'minitest/autorun'
require './collection'

class TestCollection < Minitest::Test
  def setup
    @whale = {
      name: "Blue whale",
      class: "mammalia",
      type: "aquatic"
    }

    @lobster = {
      name: "European lobster",
      class: "malacostraca",
      type: "aquatic"
    }

    @tapir = {
      name: "South-American tapir",
      class: "mammalia",
      type: "terrestrial"
    }

    @collection = Collection.new([@whale, @lobster, @tapir])
  end

  def test_where
    assert_equal([@whale], @collection.where(name: "Blue whale"))
  end

  def test_where_multiple
    assert_equal(
      [@tapir],
      @collection.where(class: "mammalia").where(type: "terrestrial")
    )
  end
end

The new test fails because the second call to #where is executed on the result of the first, which is a plain Array that does not respond to #where:

ruby collection_test.rb
Run options: --seed 33677

# Running:

.E

Finished in 0.000555s, 3603.6036 runs/s, 1801.8018 assertions/s.

  1) Error:
TestCollection#test_where_multiple:
NoMethodError: undefined method `where' for [{:name=>"Blue whale", :class=>"mammalia", :type=>"aquatic"}, {:name=>"South-American tapir", :class=>"mammalia", :type=>"terrestrial"}]:Array
    collection_test.rb:34:in `test_where_multiple'

2 runs, 1 assertions, 0 failures, 1 errors, 0 skips

In the current implementation, the selection is immediately executed when the first #where is called on the collection. The implementation needs to combine the received #where clauses, and introduce laziness to execute the selection at the last possible moment.

This example combines the received where clauses by merging them into an internal @where instance variable:

class Collection
  def initialize(data)
    @data = data
    @where = {}
  end

  def where(where)
    @where.merge!(where)
    self
  end
end

After merging a received where clause into the internal variable, the #where method returns self. This allows further calls to #where to be chained onto the result, but it also removes the filtering, so the method no longer returns filtered data.

ruby collection_test.rb
Run options: --seed 35630

# Running:

FF

Finished in 0.015616s, 128.0738 runs/s, 128.0738 assertions/s.

  1) Failure:
TestCollection#test_where [collection_test.rb:28]:
--- expected
+++ actual
@@ -1 +1 @@
-[{:name=>"Blue whale", :class=>"mammalia", :type=>"aquatic"}]
+#<Collection:0xXXXXXX @data=[{:name=>"Blue whale", :class=>"mammalia", :type=>"aquatic"}, {:name=>"European lobster", :class=>"malacostraca", :type=>"aquatic"}, {:name=>"South-American tapir", :class=>"mammalia", :type=>"terrestrial"}], @where={:name=>"Blue whale"}>


  2) Failure:
TestCollection#test_where_multiple [collection_test.rb:32]:
--- expected
+++ actual
@@ -1 +1 @@
-[{:name=>"South-American tapir", :class=>"mammalia", :type=>"terrestrial"}]
+#<Collection:0xXXXXXX @data=[{:name=>"Blue whale", :class=>"mammalia", :type=>"aquatic"}, {:name=>"European lobster", :class=>"malacostraca", :type=>"aquatic"}, {:name=>"South-American tapir", :class=>"mammalia", :type=>"terrestrial"}], @where={:class=>"mammalia", :type=>"terrestrial"}>


2 runs, 2 assertions, 2 failures, 0 errors, 0 skips

Both tests fail because the #where method now returns the Collection itself instead of the asserted array.
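Although the assertions fail, the chain itself now works. A quick way to see this is to check that every call hands back the very same object and that the clauses pile up inside it. The sketch repeats the class from above so it runs on its own; peeking at @where with instance_variable_get is only done here for demonstration.

```ruby
class Collection
  def initialize(data)
    @data = data
    @where = {}
  end

  def where(where)
    @where.merge!(where)
    self
  end
end

collection = Collection.new([])
chained = collection.where(class: "mammalia").where(type: "aquatic")

# Both #where calls returned the original object.
same_object = chained.equal?(collection)

# The clauses from both calls were merged into one hash.
clauses = collection.instance_variable_get(:@where)

p same_object # => true
p clauses == { class: "mammalia", type: "aquatic" } # => true
```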

Resolving the selection

Being lazy, the method chain needs to eventually be resolved to produce results instead of returning self.

This version of the tests forces the results into an array by calling the #to_a method at the end of each #where chain:

require 'minitest/autorun'
require './collection'

class TestCollection < Minitest::Test
  def setup
    @whale = {
      name: "Blue whale",
      class: "mammalia",
      type: "aquatic"
    }

    @lobster = {
      name: "European lobster",
      class: "malacostraca",
      type: "aquatic"
    }

    @tapir = {
      name: "South-American tapir",
      class: "mammalia",
      type: "terrestrial"
    }

    @collection = Collection.new([@whale, @lobster, @tapir])
  end

  def test_where
    assert_equal([@whale], @collection.where(name: "Blue whale").to_a)
  end

  def test_where_multiple
    assert_equal(
      [@tapir],
      @collection
        .where(class: "mammalia")
        .where(type: "terrestrial")
        .to_a
    )
  end
end

Running the tests now produces errors, because Collection does not implement the #to_a method yet.

By reusing the code from a previous version of the #where method, this implementation of #to_a uses the @where instance variable to execute the query:

class Collection
  def initialize(data)
    @data = data
    @where = {}
  end

  def where(where)
    @where.merge!(where)
    self
  end

  def to_a
    @data.select { |item| @where <= item }
  end
end

Running the test again shows both tests passing:

ruby collection_test.rb
Run options: --seed 16140

# Running:

..

Finished in 0.000800s, 2500.0000 runs/s, 2500.0000 assertions/s.

2 runs, 2 assertions, 0 failures, 0 errors, 0 skips
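One consequence of merging clauses with #merge! is that every call to #where changes the receiver, so clauses keep accumulating when the same Collection object is queried more than once. The sketch below is a hypothetical variant, not part of the implementation above: the class name ImmutableCollection and the optional second constructor argument are assumptions made for illustration. Its #where returns a new object with the merged clauses and leaves the receiver untouched.

```ruby
# Hypothetical variant: #where builds a new object instead of mutating self.
class ImmutableCollection
  def initialize(data, where = {})
    @data = data
    @where = where
  end

  def where(clauses)
    # Hash#merge returns a new hash, so @where on the receiver stays as it was.
    self.class.new(@data, @where.merge(clauses))
  end

  def to_a
    @data.select { |item| @where <= item }
  end
end

animals = [
  { name: "Blue whale", class: "mammalia", type: "aquatic" },
  { name: "European lobster", class: "malacostraca", type: "aquatic" },
  { name: "South-American tapir", class: "mammalia", type: "terrestrial" }
]

mammals = ImmutableCollection.new(animals).where(class: "mammalia")
aquatic_mammals = mammals.where(type: "aquatic")

# The narrower query did not change the broader one.
p mammals.to_a.map { |item| item[:name] }
# => ["Blue whale", "South-American tapir"]
p aquatic_mammals.to_a.map { |item| item[:name] }
# => ["Blue whale"]
```

The trade-off is one extra object and hash per call, which is usually negligible next to the cost of running the query itself.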