Method chaining and lazy evaluation in Ruby

By Jeff Kreeftmeijer on 2021-08-07 (originally published on 2011-11-28)

Method chaining is a convenient way to build up complex queries, which are then lazily executed when needed. Within the chain, a single object is updated and passed from one method to the next, until it’s finally transformed into its output value.

Object-relational mappers like Rails’ ActiveRecord use method chaining in their query interfaces to build up a database query by accepting multiple finder methods like #where, #order and #limit.

To illustrate the method chaining concept, this article builds a Collection class: a query interface that mimics an ORM by querying data from an in-memory array of hashes instead of connecting to a database.

The data set is a list of animals, with their common names, classes and types:

data = [
  {
    name: "Blue whale",
    class: "mammalia",
    type: "aquatic"
  },
  {
    name: "European lobster",
    class: "malacostraca",
    type: "aquatic"
  },
  {
    name: "South-American tapir",
    class: "mammalia",
    type: "terrestrial"
  }
]

The query interface exposes the #where method to query the dataset. It can be called multiple times to combine clauses. For example, to get all aquatic mammals from the dataset, call #where twice:

require "./collection"

Collection
  .new(data)
  .where(class: "mammalia")
  .where(type: "aquatic")
  .to_a

[{:name=>"Blue whale", :class=>"mammalia", :type=>"aquatic"}]

A note on `Enumerator::Lazy`

This article explains how method chaining and lazy evaluation work internally and uses an in-memory array as a minimal example. Since arrays are enumerable, filtering methods can be chained on them through Enumerator::Lazy instead of writing a custom implementation:

data
  .lazy
  .select { |item| item[:class] == "mammalia" }
  .select { |item| item[:type] == "terrestrial" }
  .to_a

[{:name=>"South-American tapir", :class=>"mammalia", :type=>"terrestrial"}]

For most real-world use cases, Ruby’s built-in lazy enumeration is the best solution. However, implementing a custom method chain is useful for writing query interfaces on objects that do not implement Enumerable, or when all clauses should be checked with a single comparison per item (unlike the example above, which runs a separate block for each call to Enumerator::Lazy#select).
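To make the deferred evaluation visible, the following sketch records which items the filter block inspects. The `seen` array and the `seen_before` snapshot are illustrative additions, not part of the article's code. The block only runs once the chain is forced, and #first stops as soon as it has enough results.

```ruby
data = [
  { name: "Blue whale", class: "mammalia", type: "aquatic" },
  { name: "European lobster", class: "malacostraca", type: "aquatic" },
  { name: "South-American tapir", class: "mammalia", type: "terrestrial" }
]

# Illustrative bookkeeping: names of the items the block has inspected.
seen = []

# Building the lazy chain does not call the block yet.
query = data.lazy.select { |item|
  seen << item[:name]
  item[:class] == "mammalia"
}

# Snapshot taken before the chain is forced; still empty at this point.
seen_before = seen.dup

# Forcing the chain: iteration stops after the first match.
result = query.first(1)

p seen_before # => []
p seen        # => ["Blue whale"]
```

Only the first item is inspected here, because it already satisfies the request for a single result.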

The `Collection` class

For a custom implementation of method chaining, first implement the #where method on a new class named Collection. A test to verify the behavior of selecting an item from a collection looks like this:

Listing 1: collection_test.rb, with a test for the #where method

require 'minitest/autorun'
require './collection'

class TestCollection < Minitest::Test
  def test_where
    whale = {
      name: "Blue whale",
      class: "mammalia",
      type: "aquatic"
    }

    lobster = {
      name: "European lobster",
      class: "malacostraca",
      type: "aquatic"
    }

    tapir = {
      name: "South-American tapir",
      class: "mammalia",
      type: "terrestrial"
    }

    collection = Collection.new([whale, lobster, tapir])

    assert_equal([whale], collection.where(name: "Blue whale"))
  end
end

The test asserts that, given a collection with three items, only the one item that matches the #where clause is selected.

Running the test produces errors because the called methods don’t exist yet. A minimal Collection class with empty stubs for both methods gets rid of them:

Listing 2: collection.rb, a minimal example of the Collection class that doesn’t throw errors when running the test suite

class Collection
  def initialize(data)
  end

  def where(where)
  end
end

After adding a Collection class that has a stub for both of the called methods, the test errors turn into an assertion failure because the #where method returns nil instead of the asserted result:

ruby collection_test.rb

Run options: --seed 45181

# Running:

F

Finished in 0.010833s, 92.3105 runs/s, 92.3105 assertions/s.

  1) Failure:
TestCollection#test_where [collection_test.rb:26]:
--- expected
+++ actual
@@ -1 +1 @@
-[{:name=>"Blue whale", :class=>"mammalia", :type=>"aquatic"}]
+nil


1 runs, 1 assertions, 1 failures, 0 errors, 0 skips

An implementation for `#where`

An implementation that satisfies the test looks like this:

Listing 3: collection.rb, with a #where method that selects entries from the list

class Collection
  def initialize(data)
    @data = data
  end

  def where(where)
    @data.select { |item| where <= item }
  end
end

The #initialize method takes the data from its passed argument, and stores it in an instance variable named @data to be used later.

Because @data is an array, the #where method calls Array#select on it to retrieve a selection of its contents. It passes a block that checks each item with Hash#<=, which checks if the where variable is a subset of the item hash.
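As a quick illustration of the subset check, the following sketch compares a few clauses against a single item. The variable names are made up for this example. Hash#<= is available from Ruby 2.3 onward.

```ruby
item = { name: "Blue whale", class: "mammalia", type: "aquatic" }

# A clause matches when every one of its key-value pairs occurs in the item.
one_clause = ({ class: "mammalia" } <= item)
two_clauses = ({ class: "mammalia", type: "aquatic" } <= item)

# A single differing value is enough to make the check fail.
mismatch = ({ class: "mammalia", type: "terrestrial" } <= item)

# The empty hash is a subset of every hash, so an empty clause matches all items.
empty_clause = ({} <= item)

p one_clause   # => true
p two_clauses  # => true
p mismatch     # => false
p empty_clause # => true
```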

ruby collection_test.rb

Run options: --seed 13425

# Running:

.

Finished in 0.000539s, 1855.2876 runs/s, 1855.2876 assertions/s.

1 runs, 1 assertions, 0 failures, 0 errors, 0 skips

Chaining `#where` calls

The Collection class needs to chain methods to join multiple where clauses. The updated test suite adds a test that asserts that the #where method can be chained. It also moves the collection to a setup block, as it’s now used by both tests:

Listing 4: collection_test.rb, with a test for multiple #where clauses

require 'minitest/autorun'
require './collection'

class TestCollection < Minitest::Test
  def setup
    @whale = {
      name: "Blue whale",
      class: "mammalia",
      type: "aquatic"
    }

    @lobster = {
      name: "European lobster",
      class: "malacostraca",
      type: "aquatic"
    }

    @tapir = {
      name: "South-American tapir",
      class: "mammalia",
      type: "terrestrial"
    }

    @collection = Collection.new([@whale, @lobster, @tapir])
  end

  def test_where
    assert_equal([@whale], @collection.where(name: "Blue whale"))
  end

  def test_where_multiple
    assert_equal(
      [@tapir],
      @collection.where(class: "mammalia").where(type: "terrestrial")
    )
  end
end

The new test fails because the second call to #where is executed on the result of the first, which is a plain Array that has no #where method:

ruby collection_test.rb

Run options: --seed 33677

# Running:

.E

Finished in 0.000555s, 3603.6036 runs/s, 1801.8018 assertions/s.

  1) Error:
TestCollection#test_where_multiple:
NoMethodError: undefined method `where' for [{:name=>"Blue whale", :class=>"mammalia", :type=>"aquatic"}, {:name=>"South-American tapir", :class=>"mammalia", :type=>"terrestrial"}]:Array
    collection_test.rb:34:in `test_where_multiple'

2 runs, 1 assertions, 0 failures, 1 errors, 0 skips

In the current implementation, the selection is immediately executed when the first #where is called on the collection. The implementation needs to combine the received #where clauses, and introduce laziness to execute the selection at the last possible moment.

This example combines the received where clauses by merging them into an internal @where instance variable:

Listing 5: collection.rb, with merging #where clauses

class Collection
  def initialize(data)
    @data = data
    @where = {}
  end

  def where(where)
    @where.merge!(where)
    self
  end
end

After merging a received where clause into the internal variable, the #where method returns self. This allows more calls to #where to be chained onto the result, but it also removes the filtering, meaning the method no longer returns filtered data.
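The effect of returning self can be checked directly. The sketch below repeats the class from Listing 5 so that it runs on its own, and peeks at the internal state with instance_variable_get purely for demonstration.

```ruby
class Collection
  def initialize(data)
    @data = data
    @where = {}
  end

  def where(where)
    @where.merge!(where)
    self
  end
end

collection = Collection.new([])
chained = collection.where(class: "mammalia").where(type: "aquatic")

# Every call returns the receiver, so the chain ends on the same object.
same_object = chained.equal?(collection)

# Both clauses have been collected, but nothing has been filtered yet.
clauses = chained.instance_variable_get(:@where)

p same_object # => true
```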

ruby collection_test.rb

Run options: --seed 35630

# Running:

FF

Finished in 0.015616s, 128.0738 runs/s, 128.0738 assertions/s.

  1) Failure:
TestCollection#test_where [collection_test.rb:28]:
--- expected
+++ actual
@@ -1 +1 @@
-[{:name=>"Blue whale", :class=>"mammalia", :type=>"aquatic"}]
+#<Collection:0xXXXXXX @data=[{:name=>"Blue whale", :class=>"mammalia", :type=>"aquatic"}, {:name=>"European lobster", :class=>"malacostraca", :type=>"aquatic"}, {:name=>"South-American tapir", :class=>"mammalia", :type=>"terrestrial"}], @where={:name=>"Blue whale"}>


  2) Failure:
TestCollection#test_where_multiple [collection_test.rb:32]:
--- expected
+++ actual
@@ -1 +1 @@
-[{:name=>"South-American tapir", :class=>"mammalia", :type=>"terrestrial"}]
+#<Collection:0xXXXXXX @data=[{:name=>"Blue whale", :class=>"mammalia", :type=>"aquatic"}, {:name=>"European lobster", :class=>"malacostraca", :type=>"aquatic"}, {:name=>"South-American tapir", :class=>"mammalia", :type=>"terrestrial"}], @where={:class=>"mammalia", :type=>"terrestrial"}>


2 runs, 2 assertions, 2 failures, 0 errors, 0 skips

Both tests fail because the #where method now returns the Collection instance instead of the asserted array.

Resolving the selection

Being lazy, the method chain needs to eventually be resolved to produce results instead of returning self.

This version of the tests forces the results into an array by calling the #to_a method at the end of each #where chain:

Listing 6: collection_test.rb, with calls to #to_a to force evaluation

require 'minitest/autorun'
require './collection'

class TestCollection < Minitest::Test
  def setup
    @whale = {
      name: "Blue whale",
      class: "mammalia",
      type: "aquatic"
    }

    @lobster = {
      name: "European lobster",
      class: "malacostraca",
      type: "aquatic"
    }

    @tapir = {
      name: "South-American tapir",
      class: "mammalia",
      type: "terrestrial"
    }

    @collection = Collection.new([@whale, @lobster, @tapir])
  end

  def test_where
    assert_equal([@whale], @collection.where(name: "Blue whale").to_a)
  end

  def test_where_multiple
    assert_equal(
      [@tapir],
      @collection
        .where(class: "mammalia")
        .where(type: "terrestrial")
        .to_a
    )
  end
end

Running the tests now produces more errors because Collection does not implement the #to_a method.

By reusing the code from a previous version of the #where method, this implementation of #to_a uses the @where instance variable to execute the query:

Listing 7: collection.rb, with an implementation of #to_a

class Collection
  def initialize(data)
    @data = data
    @where = {}
  end

  def where(where)
    @where.merge!(where)
    self
  end

  def to_a
    @data.select { |item| @where <= item }
  end
end

Running the tests again shows both passing:

ruby collection_test.rb

Run options: --seed 16140

# Running:

..

Finished in 0.000800s, 2500.0000 runs/s, 2500.0000 assertions/s.

2 runs, 2 assertions, 0 failures, 0 errors, 0 skips
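One property of this implementation worth knowing: because #where merges into @where and returns self, every query derived from the same Collection shares one set of clauses. If that is not desired, a possible variation is to return a new object from each call. The sketch below is an assumption-laden alternative, not the article's implementation; the ImmutableCollection name and the optional second constructor argument are hypothetical.

```ruby
# Hypothetical variant: each #where call returns a new object,
# so the receiver's clauses are left untouched.
class ImmutableCollection
  def initialize(data, where = {})
    @data = data
    @where = where
  end

  def where(clause)
    # Hash#merge (without the bang) returns a new hash.
    self.class.new(@data, @where.merge(clause))
  end

  def to_a
    @data.select { |item| @where <= item }
  end
end

data = [
  { name: "Blue whale", class: "mammalia", type: "aquatic" },
  { name: "European lobster", class: "malacostraca", type: "aquatic" },
  { name: "South-American tapir", class: "mammalia", type: "terrestrial" }
]

mammals = ImmutableCollection.new(data).where(class: "mammalia")

# Two independent queries derived from the same base query.
aquatic = mammals.where(type: "aquatic").to_a
terrestrial = mammals.where(type: "terrestrial").to_a

p aquatic.map { |item| item[:name] }      # => ["Blue whale"]
p terrestrial.map { |item| item[:name] }  # => ["South-American tapir"]
p mammals.to_a.map { |item| item[:name] } # => ["Blue whale", "South-American tapir"]
```

With the self-returning version, the second #where call would overwrite the type clause on the shared object, so both derived queries would end up describing terrestrial mammals.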
