4 Data abstractions

This chapter covers

Abstracting with modules
Working with hierarchical data
Polymorphism with protocols

This chapter discusses building higher-level data structures. In any complex system, there will be a need for abstractions such as Money, Date, Employee, and OrderItem. These are all textbook examples of higher-level abstractions that usually aren’t directly supported by the language and are instead written on top of built-in types.

In Elixir, such abstractions are implemented with pure, stateless modules. In this chapter, you’ll learn how to create and work with your own abstractions.

In a typical object-oriented (OO) language, the basic abstraction building blocks are classes and objects. For example, there may be a String class that implements various string operations. Each string is then an instance of that class and can be manipulated by calling methods, as the following Ruby snippet illustrates:

"a string".upcase

This approach generally isn’t used in Elixir. Being a functional language, Elixir promotes decoupling of data from the code. Instead of classes, you use modules, which are collections of functions. Instead of calling methods on objects, you explicitly call module functions and provide input data via arguments. The following snippet shows the Elixir way of uppercasing a string:

String.upcase("a string")

Another big difference from OO languages is that data is immutable. To modify data, you must call some function and take its result into a variable; the original data is left intact. The following examples demonstrate this technique:

iex(1)> list = []
[]
 
iex(2)> list = List.insert_at(list, -1, :a)
[:a]
 
iex(3)> list = List.insert_at(list, -1, :b)
[:a, :b]
 
iex(4)> list = List.insert_at(list, -1, :c)
[:a, :b, :c]

In these examples, you’re constantly keeping the result of the last operation and feeding it to the next one.

The important thing to notice in both Elixir snippets is that the module is used as the abstraction over the data type. When you need to work with strings, you reach for the String module. When you need to work with lists, you use the List module.

String and List are examples of modules that are dedicated to a specific data type. They’re implemented in pure Elixir, and their functions rely on the predefined format of the input data. String functions expect a binary string as the first argument, whereas List functions expect a list.

Additionally, modifier functions (the ones that transform the data) return data of the same type. The function String.upcase/1 returns a binary string, whereas List.insert_at/3 returns a list.

Finally, a module also contains query functions that return some piece of information from the data, such as String.length/1 and List.first/1. Such functions still expect an instance of the abstraction as the first argument, but they return another type of information.

The basic principles of abstraction in Elixir can be summarized as follows:

A module is in charge of abstracting some behavior.
The module’s functions usually expect an instance of the abstraction as the first argument.
Modifier functions return a modified version of the abstraction.
Query functions return some other type of data.

Given these principles, it’s fairly straightforward to create your own higher-level abstractions, as you’ll see in the next section.

4.1 Abstracting with modules

Lists and strings are lower-level types, but higher-level abstractions are based on the previously stated principles. In fact, you already saw examples of a higher-level abstraction in chapter 2. For example, a MapSet module implements a set. MapSet is implemented in pure Elixir and can serve as a good template for designing an abstraction in Elixir.

Let’s look at an example that uses MapSet:

iex(1)> days =
          MapSet.new() |>                ❶
          MapSet.put(:monday) |>         ❷
          MapSet.put(:tuesday)           ❷
 
iex(2)> MapSet.member?(days, :monday)    ❸
true

❶ Creates an abstraction instance

❷ Modifies the instance

❸ Queries the instance

This approach more or less follows the principles stated earlier. The code is slightly simplified by using the pipe operator to chain operations together. This is possible because all the functions from the MapSet module take a set as the first argument. Such functions are pipe friendly and can be chained with the |> operator.

Notice the new/0 function that creates a new instance of the abstraction. There’s nothing special about this function, and it could have been given any name. Its only purpose is to create a new data structure you can work with.

Because MapSet is an abstraction, you, as a client of this module, don’t concern yourself with its internal workings or its data structure. You call MapSet functions, keeping whatever result you get and passing that result back to functions from the same module.

Note You may think that abstractions like MapSet are something like user-defined types. Although there are many similarities, module-based abstractions aren’t proper data types, like the ones explained in chapter 2. Instead, they’re implemented by composing built-in data types. For example, a MapSet instance is also a map, which you can verify by invoking is_map(MapSet.new()).

Given this template, let’s try to build a simple abstraction.

4.1.1 Basic abstraction

The example in this section is a simple to-do list. The problem is, admittedly, not spectacular, but it’s complex enough to give you something to work with while not being overly complicated. This will allow you to focus on techniques without spending too much time trying to grasp the problem itself.

The basic version of the to-do list will support the following features:

Creating a new to-do list
Adding new entries to the list
Querying the list

Here’s an example of the desired usage:

$ iex simple_todo.ex
 
iex(1)> todo_list =
          TodoList.new() |>
          TodoList.add_entry(~D[2023-12-19], "Dentist") |>
          TodoList.add_entry(~D[2023-12-20], "Shopping") |>
          TodoList.add_entry(~D[2023-12-19], "Movies")
 
iex(2)> TodoList.entries(todo_list, ~D[2023-12-19])
["Movies", "Dentist"]
 
iex(3)> TodoList.entries(todo_list, ~D[2023-12-18])
[]

This is fairly self-explanatory. You create an instance by calling TodoList.new/0, and then you add some entries. Finally, you execute some queries. The expression ~D[2023-12-19], as explained in section 2.4.11, creates a date (December 19, 2023), powered by the Date module.

As the chapter progresses, you’ll add additional features and modify the interface slightly. You’ll continue adding features throughout this book, and by the end, you’ll have a fully working distributed web server that can manage a large number of to-do lists.

For now, let’s start with this simple interface. First, you must decide on the internal data representation. In the preceding snippet, you can see that the primary use case is finding all entries for a single date. Therefore, using a map seems like a reasonable initial approach. You’ll use dates as keys, with values being lists of entries for given dates. With this in mind, the implementation of the new/0 function is straightforward.

Listing 4.1 Initializing a to-do list (simple_todo.ex)

defmodule TodoList do
  def new(), do: %{}
  ...
end

Next, you must implement the add_entry/3 function. This function expects a to-do list (which you know is a map) and must add the entry to the list under a given key (date). Of course, it’s possible that no entries for that date exist, so you need to cover that case as well. As it turns out, this can be done with a single call to the Map.update/4 function.

Listing 4.2 Adding an entry (simple_todo.ex)

defmodule TodoList do
  ...
  def add_entry(todo_list, date, title) do
    Map.update(
      todo_list,
      date,
      [title],                            ❶
      fn titles -> [title | titles] end   ❷
    )
  end
  ...
end

❶ Initial value

❷ Updater function

The Map.update/4 function receives a map, a key, an initial value, and an updater function. If no value exists for the given key, the initial value is used; otherwise, the updater function is called. The function receives the existing value and returns the new value for that key. In this case, you push the new entry to the top of the list. You may remember from chapter 2 that lists are most efficient when pushing new elements to the top. Therefore, you opt for a fast insertion operation but sacrifice ordering—more recently added entries are placed before the older ones in the list.

Finally, you need to implement the entries/2 function, which returns all entries for a given date, or an empty list, if no task exists for that date. This is fairly straightforward, as you can see in the following listing.

Listing 4.3 Querying the to-do list (simple_todo.ex)

defmodule TodoList do
  ...
  def entries(todo_list, date) do
    Map.get(todo_list, date, [])
  end
end

You fetch a value for the given date from todo_list, which must be a map. The third argument to Map.get/3 is a default value that’s returned if a given key isn’t present in the map.

4.1.2 Composing abstractions

Nothing stops you from creating one abstraction on top of another. In our initial take on the to-do list, there’s an opportunity to move some of the code into a separate abstraction.

Examine the way you operate on a map, allowing multiple values to be stored under a single key and retrieving all values for that key. This code could be moved to a separate abstraction. Let’s call this MultiDict, which is implemented in the next listing.

Listing 4.4 Implementing the MultiDict abstraction (todo_multi_dict.ex)

defmodule MultiDict do
  def new(), do: %{}
 
  def add(dict, key, value) do
    Map.update(dict, key, [value], &[value | &1])
  end
 
  def get(dict, key) do
    Map.get(dict, key, [])
  end
end

This is more or less a copy-and-paste of the initial to-do list implementation. The names are changed a bit, and you use the capture operator to shorten the updater lambda definition: &[value | &1].

With this abstraction in place, the TodoList module becomes much simpler.

Listing 4.5 TodoList relying on a MultiDict (todo_multi_dict.ex)

defmodule TodoList do
  def new(), do: MultiDict.new()
 
  def add_entry(todo_list, date, title) do
    MultiDict.add(todo_list, date, title)
  end
 
  def entries(todo_list, date) do
    MultiDict.get(todo_list, date)
  end
end

This is a classical separation of concerns, where you extract a distinct responsibility into a separate abstraction, and then create another abstraction on top of it. A distinct MultiDict abstraction is now readily available to be used in other places in code, if needed. Furthermore, you can extend TodoList with additional functions that are specific to to-do lists and which, therefore, don’t belong to MultiDict.

The point of this refactoring is to illustrate that the code organization isn’t radically different from an OO approach. You use different tools to create abstractions (modules and functions instead of classes and methods), but the general idea is the same.

4.1.3 Structuring data with maps

TodoList now supports the basic features. You can insert entries into the structure and get all entries for a given date, but the interface is somewhat clumsy. When adding a new entry, you must specify each field as a separate argument:

TodoList.add_entry(todo_list, ~D[2023-12-19], "Dentist")

If you want to extend an entry with another attribute—such as time—you must change the signature of the function, which will, in turn, break all the clients. Moreover, you must change every place in the implementation where this data is being propagated. An obvious solution to this problem is to, somehow, combine all entry fields as a single abstraction.

As explained in section 2.4.6, the most common way of doing this in Elixir is to use maps, with field names stored as keys of the atom type. The following snippet demonstrates how you can create and use an entry instance:

iex(1)> entry = %{date: ~D[2023-12-19], title: "Dentist"}
 
iex(2)> entry.date
~D[2023-12-19]
 
iex(3)> entry.title
"Dentist"

You can immediately adapt your code to represent entries with maps. As it turns out, this change is extremely simple. All you need to do is change the code of the TodoList.add_entry function to accept two arguments: a to-do list instance and a map that describes an entry. The new version is presented in the following listing.

Listing 4.6 Representing entries with maps (todo_entry_map.ex)

defmodule TodoList do
  ...
  def add_entry(todo_list, entry) do
    MultiDict.add(todo_list, entry.date, entry)
  end
  ...
end

That was simple! You assume an entry is a map and add it to MultiDict, using its date field as a key.

Let’s see this in action. To add a new entry, clients now must provide a map:

iex(1)> todo_list = TodoList.new() |>
TodoList.add_entry(%{date: ~D[2023-12-19], title: "Dentist"})

The client code is obviously more verbose because it must provide field names. But because entries are now structured in a map, data retrieval is improved. The TodoList.entries/2 function now returns complete entries, not just their titles:

iex(2)> TodoList.entries(todo_list, ~D[2023-12-19])
[%{date: ~D[2023-12-19], title: "Dentist"}]

The current implementation of TodoList relies on a map. This means that at run time, it’s impossible to make a distinction between a map and a TodoList instance. In some situations, you may want to define and enforce a more precise structure definition. For such cases, Elixir provides a feature called structs.

4.1.4 Abstracting with structs

Let’s say you need to deal with fractions in your program. A fraction is a part of a whole, represented in the form of a/b, where a and b are integers called a numerator and a denominator. Passing around these two values separately is noisy and error prone. Therefore, it makes sense to introduce a small abstraction to help working with fractions. The following snippet demonstrates how such an abstraction could be used:

$ iex fraction.ex
 
iex(1)> Fraction.new(1, 2)
        |> Fraction.add(Fraction.new(1, 4))
        |> Fraction.value()
0.75

Here, you sum one-half (1/2) and one-quarter (1/4) and return the numerical value of the resulting fraction. A fraction is created using Fraction.new/2 and is then passed to various other functions that know how to work with it.

How can you implement this? There are many approaches, such as relying on plain tuples or using maps. In addition, Elixir provides a facility called structs, which allows you to specify the abstraction structure up front and bind it to a module. Each module can define only one struct, which can then be used to create new instances and pattern-match on them.

A fraction has a well-defined structure, so you can use structs to specify and enforce data shape. Let’s see this in action.

To define a struct, use the defstruct macro (https://hexdocs.pm/elixir/Kernel.xhtml#defstruct/1).

Listing 4.7 Defining a structure (fraction.ex)

defmodule Fraction do
  defstruct a: nil, b: nil
  ...
end

A keyword list provided to defstruct defines the struct’s fields along with their initial values. You can now instantiate a struct using this special syntax:

iex(1)> one_half = %Fraction{a: 1, b: 2}
%Fraction{a: 1, b: 2}

Notice how a struct bears the name of the module it’s defined in. There’s a close relation between structs and modules. A struct may exist only in a module, and a single module can define only one struct.

Internally, a struct is a special kind of map. Therefore, individual fields are accessed just like maps:

iex(2)> one_half.a
1
 
iex(3)> one_half.b
2

The nice thing about structs is that you can pattern-match on them:

iex(4)> %Fraction{a: a, b: b} = one_half
%Fraction{a: 1, b: 2}
 
iex(5)> a
1
 
iex(6)> b
2

This makes it possible to assert that some variable is really a struct:

iex(6)> %Fraction{} = one_half                                    ❶
%Fraction{a: 1, b: 2}                                             ❶
 
iex(7)> %Fraction{} = %{a: 1, b: 2}                               ❷
** (MatchError) no match of right hand side value: %{a: 1, b: 2}  ❷

❶ Successful match

❷ A struct pattern doesn’t match a map.

Here, you use a %Fraction{} pattern that matches any Fraction struct, regardless of its contents. Pattern matching with structs works much like it does with maps. This means in a pattern match, you need to specify only the fields you’re interested in, ignoring all other fields.

Updating a struct works similarly to the way it works with maps:

iex(8)> one_quarter = %Fraction{one_half | b: 4}
%Fraction{a: 1, b: 4}

This code creates a new struct instance based on the original one (one_half), changing the value of the field b to 4.

The shape of the struct is defined at compile time. As a result, some errors can be caught by the Elixir compiler. For example, suppose we make a typo in the field name:

iex(9)> %Fraction{a: 1, d: 2}
** (KeyError) key :d not found

The struct doesn’t specify the field :d, so the error is reported. In contrast, if you used a regular map, this code would succeed. However, the program would fail in a distant place with a nonobvious reason, which would make the error more difficult to debug.

It’s worth noting that this error is reported at compile time. If you make the same mistake in a source file, the code won’t even compile.

Armed with this knowledge, let’s add some functionality to the Fraction abstraction. First, you need to provide the creation function.

Listing 4.8 Instantiating a fraction (fraction.ex)

defmodule Fraction do
  ...
  def new(a, b) do
    %Fraction{a: a, b: b}
  end
  ...
end

This is a simple wrapper around the %Fraction{} syntax. It makes the client code clearer and less coupled with the fact that structs are used.

Next, implement a Fraction.value/1 function that returns a float representation of the fraction.

Listing 4.9 Calculating the fraction value (fraction.ex)

defmodule Fraction do
  ...
 
  def value(%Fraction{a: a, b: b}) do     ❶
    a / b
  end
 
  ...
end

❶ Matches a fraction

The value/1 function matches a fraction, taking its fields into individual variables and using them to compute the final result. The benefit of pattern matching is that the input type is enforced. If you pass anything that isn’t a fraction instance, you’ll get a match error.

Instead of decomposing fields into variables, you could also use dot notation:

def value(fraction) do
  fraction.a / fraction.b
end

This version is arguably clearer, but on the flip side, it accepts any map, not just Fraction structs, which might lead to subtle bugs. For example, suppose there’s a Rectangle struct with the same fields. You could accidentally pass such a struct to this function, and instead of failing, the function would return some meaningless result.

One final thing to do is to implement the add function.

Listing 4.10 Adding two fractions (fraction.ex)

defmodule Fraction
  ...
 
  def add(%Fraction{a: a1, b: b1}, %Fraction{a: a2, b: b2}) do
    new(
      a1 * b2 + a2 * b1,
      b2 * b1
    )
  end
 
  ...
end

You can now test your fraction:

iex(1)> Fraction.new(1, 2)
        |> Fraction.add(Fraction.new(1, 4))
        |> Fraction.value()
0.75

The code works as expected. By representing fractions with a struct, you can provide the definition of your type, listing all fields and their default values. Furthermore, it’s possible to distinguish struct instances from any other data type. This allows you to place %Fraction{} matches in function arguments, thus asserting that you only accept fraction instances.

Structs vs. maps

You should always be aware that structs are simply maps, so they have the same characteristics with respect to performance and memory usage. But a struct instance receives special treatment. Some things that can be done with maps don’t work with structs. For example, you can’t call Enum functions on a struct:

iex(1)> one_half = Fraction.new(1, 2)
 
iex(2)> Enum.to_list(one_half)
** (Protocol.UndefinedError) protocol Enumerable not implemented for
  %Fraction{a: 1, b: 2}

Remember that a struct is a functional abstraction and should, therefore, behave according to the implementation of the module where it’s defined. In the case of the Fraction abstraction, you must define whether Fraction is enumerable and, if so, in what way. If this isn’t done, Fraction isn’t an enumerable, so you can’t call Enum functions on it.

In contrast, a plain map is an enumerable, so you can convert it to a list:

iex(3)> Enum.to_list(%{a: 1, b: 2})
[a: 1, b: 2]

On the other hand, because structs are maps, directly calling the Map functions works:

iex(4)> Map.to_list(one_half)
[__struct__: Fraction, a: 1, b: 2]

Notice the __struct__: Fraction bit. This key-value pair is automatically included in each struct. It helps Elixir distinguish structs from plain maps and perform proper runtime dispatches from within polymorphic generic code. You’ll learn more about this later, when we describe protocols.

The __struct__ field has an important consequence for pattern matching. A struct pattern can’t match a plain map:

iex(5)> %Fraction{a: a, b: b} = %{a: 1, b: 2}
** (MatchError) no match of right hand side value: %{a: 1, b: 2}

However, a plain map pattern can match a struct:

iex(5)> %{a: a, b: b} = %Fraction{a: 1, b: 2}
%Fraction{a: 1, b: 2}
 
iex(6)> a
1
 
iex(7)> b
2

This is due to the way pattern matching works with maps. Remember that all fields from the pattern must exist in the matched term. When matching a map to a struct pattern, this isn’t the case because %Fraction{} contains the field struct, which isn’t present in the map being matched.

The opposite works because you match a struct to the %{a: a, b: b} pattern. Because all these fields exist in the Fraction struct, the match is successful.

Records

In addition to maps and structs, there’s another way to structure data: records. This is a feature that lets you use tuples and still be able to access individual elements by name. Records can be defined using the defrecord and defrecordp macros from the Record module (https://hexdocs.pm/elixir/Record.xhtml).

Given that they’re essentially tuples, records should be faster than maps (although the difference usually isn’t significant in the grand scheme of things). On the flip side, the usage is more verbose, and it’s not possible to access fields by name dynamically.

Records are present mostly for historical reasons. Before maps were used, records were one of the main tools for structuring data. In fact, many libraries from the Erlang ecosystem use records as their interface. If you need to interface an Erlang library using a record defined in that library, you must import that record into Elixir and define it as a record. This can be done using the Record.extract/2 function in conjunction with the defrecord macro. This idiom isn’t required often, so records won’t be demonstrated here. Still, it may be useful to keep this information in the back of your mind and research it if the need arises.

4.1.5 Data transparency

The modules you’ve devised so far are abstractions because clients aren’t aware of their implementation details. For example, as a client, you call Fraction.new/2 to create an instance of the abstraction and then send that instance back to some other function from the same module.

But the entire data structure is always visible. As a client, you can obtain individual fraction values, even if this was not intended by the library developer.

It’s important to be aware that data in Elixir is always transparent. Clients can read any information from your structs (and any other data type), and there’s no easy way of preventing that. In that sense, encapsulation works differently than in typical OO languages. In Elixir, modules are in charge of abstracting the data and providing operations to manipulate and query that data, but the data is never hidden.

Let’s verify this in a shell session:

$ iex todo_entry_map.ex
 
iex(1)> todo_list = TodoList.new() |>
          TodoList.add_entry(%{date: ~D[2023-12-19], title: "Dentist"})
 
%{~D[2023-12-19] => [%{date: ~D[2023-12-19], title: "Dentist"}]}     ❶

❶ To-do list with a single element

Looking at the return value, you can see the entire structure of the to-do list. From the output, you can immediately tell that the to-do list is powered by a map, and you can also find out details about how individual entries are kept.

Let’s look at another example. A MapSet instance is also an abstraction, powered by the MapSet module and a corresponding struct. At first glance, this isn’t visible:

iex(1)> mapset = MapSet.new([:monday, :tuesday])
MapSet.new([:monday, :tuesday])

Notice how the result of the expression is printed in a special way, using the MapSet.new(...) output. This is due to the inspection mechanism in Elixir: whenever a result is printed in the shell, the function Kernel.inspect/1 is called to transform the structure into an inspected string. For each abstraction you build, you can override the default behavior and provide your own inspected format. This is exactly what MapSet does, and you’ll learn how to do this for a type later in this chapter, when we discuss protocols.

Occasionally, you may want to see the pure data structure, without this decorated output. This can be useful when you’re debugging, analyzing, or reverse engineering code. To do so, you can provide a special option to the inspect function:

iex(2)> IO.puts(inspect(mapset, structs: false))
%{__struct__: MapSet, map: %{monday: [], tuesday: []}, version: 2}

The output now reveals the complete structure of a date, and you can “see through” the MapSet abstraction. This demonstrates that data privacy can’t be fully enforced in Elixir. Remember from chapter 2 that the only complex types are tuples, lists, and maps. Any other abstraction, such as MapSet or your own TodoList, will ultimately be built on top of these types.

The benefit of data transparency is that the data can be easily inspected, which can be useful for debugging purposes. But as a client of an abstraction, you shouldn’t rely on its internal representation, even though it’s visible to you. You shouldn’t pattern-match on the internal structure or try to extract or modify individual parts of it because a proper abstraction, such as MapSet, doesn’t guarantee what the data will look like. The only guarantee is that the module’s functions will work if you send them a properly structured instance that you already received from that same module.

Sometimes, a module will publicly document some parts of its internal structure. Good examples of this are the date and time modules, such as Date, Time, and DateTime. Looking at the documentation, you’ll see explicit mention that the corresponding data is represented as a structure using fields such as year, month, hour, and so on. In this case, the structure of the data is publicly documented, and you can freely rely on it.

One final thing you should know, related to data inspection, is the IO.inspect/1 function. This function prints the inspected representation of a structure to the screen and returns the structure itself. This is particularly useful when debugging a piece of code. Look at the following example:

iex(1)> Fraction.new(1, 4) |>
          Fraction.add(Fraction.new(1, 4)) |>
          Fraction.add(Fraction.new(1, 2)) |>
          Fraction.value()
1.0

This code relies on the pipe operator to perform a series of fraction operations. Let’s say you want to inspect the entire structure after each step. You can easily insert the call to IO.inspect/1 after every line:

iex(2)> Fraction.new(1, 4) |>
          IO.inspect() |>
          Fraction.add(Fraction.new(1, 4)) |>
          IO.inspect() |>
          Fraction.add(Fraction.new(1, 2)) |>
          IO.inspect() |>
          Fraction.value()
 
%Fraction{a: 1, b: 4}     ❶
%Fraction{a: 8, b: 16}    ❶
%Fraction{a: 32, b: 32}   ❶

❶ Output of each IO.inspect call

This works because IO.inspect/1 prints the data structure and then returns that same data structure, unchanged. You can also take a look at the dbg macro (https://hexdocs.pm/elixir/Kernel.xhtml#dbg/2), which is somewhat similar but provides more debugging features.

We’re now done with the basic theory behind functional abstractions, but you’ll practice some more by extending the to-do list.

4.2 Working with hierarchical data

In this section, you’ll extend the TodoList abstraction to provide basic CRUD support. You already have the C and R parts resolved with the add_entry/2 and entries/2 functions, respectively. Now, you need to add support for updating and deleting entries. To do this, you must be able to uniquely identify each entry in the to-do list, so you’ll begin by adding unique ID values to each entry.

4.2.1 Generating IDs

When adding a new entry to the list, you’ll autogenerate its ID value, using incremental integers for IDs. To implement this, you have to do a couple of things:

Represent the to-do list as a struct. You need to do this because the to-do list now has to keep two pieces of information: the entries collection and the ID value for the next entry.
Use the entry’s ID as the key. So far, when storing entries in a collection, you used the entry’s date as the key. You’ll change this and use the entry’s ID instead. This will make it possible to quickly insert, update, and delete individual entries. You’ll now have exactly one value per each key, so you won’t need the MultiDict abstraction anymore.

Let’s start implementing this. The code in the following listing contains the module and struct definitions.

Listing 4.11 TodoList struct (todo_crud.ex)

defmodule TodoList do
  defstruct next_id: 1, entries: %{}   ❶
 
  def new(), do: %TodoList{}           ❷
  ...
end

❶ Struct that describes the to-do list

❷ Creates a new instance

The to-do list will now be represented as a struct with two fields. The field next_id contains the ID value that will be assigned to the new entry while it’s being added to the structure. The field entries is the collection of entries. As has been mentioned, you’re now using a map, and the keys are entry ID values.

During the struct definition, the default values for the next_id and entries fields are immediately specified. Therefore, you don’t have to provide these when creating a new instance. The new/0 function creates and returns an instance of the struct.

Next, it’s time to reimplement the add_entry/2 function. It has to do more work:

Set the ID for the entry being added.
Add the new entry to the collection.
Increment the next_id field.

Here’s the code.

Listing 4.12 Autogenerating ID values for new entries (todo_crud.ex)

defmodule TodoList do
  ...
 
  def add_entry(todo_list, entry) do
    entry = Map.put(entry, :id, todo_list.next_id)   ❶
 
    new_entries = Map.put(                           ❷
      todo_list.entries,                             ❷
      todo_list.next_id,                             ❷
      entry                                          ❷
    )
 
    %TodoList{todo_list |                            ❸
      entries: new_entries,                          ❸
      next_id: todo_list.next_id + 1                 ❸
    }
  end
 
  ...
end

❶ Sets the new entry’s ID

❷ Adds the new entry to the entries list

❸ Updates the struct

A lot happens here, so let’s take each step, one at a time.

In the function body, you first update the entry’s id value with the value stored in the next_id field. Notice how you use Map.put/3 to update the entry map. The input map may not contain the id field, so you can’t use the standard %{entry | id: next_id} technique, which only works if the id field is already present in the map. Once the entry is updated, you add it to the entries collection, keeping the result in the new_entries variable.

Finally, you must update the TodoList struct instance, setting its entries field to the new_entries collection and incrementing the next_id field. Essentially, you made a complex change in the struct, modifying multiple fields as well as the input entry (because you set its id field).

To the external caller, the entire operation will be atomic. Either everything will happen or, in case of an error, nothing at all. This is the consequence of immutability. The effect of adding an entry is visible to others only when the add_entry/2 function finishes and its result is taken into a variable. If something goes wrong and you raise an error, the effect of any transformations won’t be visible.

It’s also worth repeating, as mentioned in chapter 2, that the new to-do list (the one returned by the add_entry/2 function) will share as much memory as possible with the input to-do list.

With the add_entry/2 function finished, you need to adjust the entries/2 function. This will be more complicated because you changed the internal structure. Earlier, you kept a date-to-entries mapping. Now, entries are stored using id as the key, so you have to iterate through all the entries and return the ones that fall on a given date.

Listing 4.13 Filtering entries for a given date (todo_crud.ex)

defmodule TodoList do
  ...
 
  def entries(todo_list, date) do
    todo_list.entries
    |> Map.values()                                      ❶
    |> Enum.filter(fn entry -> entry.date == date end)   ❷
  end
 
  ...
end

❶ Takes the values

❷ Filters entries for a given date

This function first uses Map.values/1 to take the entries from the entries map. Then, only the entries that fall on the given date are taken using Enum.filter/2.

You can check whether your new version of the to-do list works:

$ iex todo_crud.ex
 
iex(1)> todo_list = TodoList.new() |>
          TodoList.add_entry(%{date: ~D[2023-12-19], title: "Dentist"}) |>
          TodoList.add_entry(%{date: ~D[2023-12-20], title: "Shopping"}) |>
          TodoList.add_entry(%{date: ~D[2023-12-19], title: "Movies"})
 
iex(2)> TodoList.entries(todo_list, ~D[2023-12-19])
[
  %{date: ~D[2023-12-19], id: 1, title: "Dentist"},
  %{date: ~D[2023-12-19], id: 3, title: "Movies"}
]

This works as expected, and you can even see the ID value for each entry. Also note that the interface of the TodoList module is the same as the previous version. You’ve made a number of internal modifications, changed the data representation, and practically rewritten the entire module. And yet the module’s clients don’t need to be altered because you kept the same interface for your functions.

This is nothing revolutionary—it’s a classical benefit of wrapping the behavior behind a properly chosen interface. However, it demonstrates how you can construct and reason about higher-level types when working with stateless modules and immutable data.

4.2.2 Updating entries

Now that your entries have ID values, you can add additional modifier operations. Let’s implement the update_entry operation, which can be used to modify a single entry in the to-do list.

This function will accept an entry ID, and an updater lambda, which will be invoked to update the entry. This will work similarly to Map.update. The lambda will receive the original entry and return its modified version. To keep things simple, the function won’t raise an error if the entry with a given ID doesn’t exist.

The following snippet illustrates the usage. Here, you modify the date of an entry that has an ID value of 1:

iex(1)> TodoList.update_entry(
          todo_list,
          1,                                     ❶
          &Map.put(&1, :date, ~D[2023-12-20])    ❷
        )

❶ ID of the entry to be modified

❷ Modifies an entry date

The implementation is presented in the following listing.

Listing 4.14 Updating an entry (todo_crud.ex)

defmodule TodoList do
  ...
 
  def update_entry(todo_list, entry_id, updater_fun) do
    case Map.fetch(todo_list.entries, entry_id) do
      :error ->                              ❶
        todo_list
 
      {:ok, old_entry} ->                    ❷
        new_entry = updater_fun.(old_entry)
        new_entries = Map.put(todo_list.entries, new_entry.id, new_entry)
        %TodoList{todo_list | entries: new_entries}
    end
  end
 
  ...
end

❶ No entry—returns the unchanged list

❷ Entry exists—performs the update and returns the modified list

Let’s break down what happens here. First, you look up the entry with the given ID, using Map.fetch/2. The function will return :error if the entry doesn’t exist and {:ok, value} otherwise.

In the first case, if the entry doesn’t exist, you return the original version of the list. Otherwise, you have to call the updater lambda to get the modified entry. Then, you store the modified entry into the entries collection. Finally, you store the modified entries collection in the TodoList instance and return that instance.

4.2.3 Immutable hierarchical updates

You may not have noticed, but in the previous example, you performed a deep update of an immutable hierarchy. Let’s break down what happens when you call TodoList .update_entry(todo_list, id, updater_lambda):

You take the target entry into a separate variable.
You call the updater that returns the modified version of the entry to you.
You call Map.put to put the modified entry into the entries collection.
You return the new version of the to-do list, which contains the new entries collection.

Notice that steps 2, 3, and 4 are the ones in which you transform data. Each of these steps creates a new variable that contains the transformed data. In each subsequent step, you take this data and update its container, again by creating a transformed version of it.

This is how you work with immutable data structures. If you have hierarchical data, you can’t directly modify part of it that resides deep in its tree. Instead, you must walk down the tree to the particular part that needs to be modified and then transform it and all of its ancestors. The result is a copy of the entire model (in this case, the to-do list). As mentioned, the two versions—new and previous—will share as much memory as possible.

Provided helpers

Although the technique presented works, it may become cumbersome for deeper hierarchies. Remember that to update an element deep in the hierarchy, you must walk to that element and then update all of its parents. To simplify this, Elixir offers support for more elegant, deep, hierarchical updates.

Let’s look at a basic example. Suppose the to-do list is represented as a simple map, where keys are IDs and values are plain maps consisting of fields. Let’s create one instance of such a to-do list:

iex(1)> todo_list = %{
  1 => %{date: ~D[2023-12-19], title: "Dentist"},
  2 => %{date: ~D[2023-12-20], title: "Shopping"},
  3 => %{date: ~D[2023-12-19], title: "Movies"}
}

Now, let’s say you change your mind and want to go to the theater instead of a movie. The original structure can be modified elegantly using the Kernel.put_in/2 macro:

iex(2)> put_in(todo_list[3].title, "Theater")       ❶
 
%{
  1 => %{date: ~D[2023-12-19], title: "Dentist"},
  2 => %{date: ~D[2023-12-20], title: "Shopping"},
  3 => %{date: ~D[2023-12-19], title: "Theater"}    ❷
}

❶ Hierarchical update

❷ The entry title is updated.

What happened here? Internally, put_in/2 does something similar to what you did. It walks recursively to the desired element, transforms it, and then updates all the parents. Notice that this is still an immutable operation, meaning the original structure is left intact, and you must assign the result to a variable.

To be able to do a recursive walk, put_in/2 needs to receive source data and a path to the target element. In the preceding example, the source is provided as todo_list and the path is specified as [3].title. The macro put_in/2 then walks down that path, rebuilding the new hierarchy on the way up.

It’s also worth noting that Elixir provides similar alternatives for data retrieval and updates in the form of the get_in/2, update_in/2, and get_and_update_in/2 macros. The fact that these are macros means the path you provide is evaluated at compile time and can’t be built dynamically.

If you need to construct paths at run time, there are equivalent functions that accept the data and path as separate arguments. For example, Elixir also includes the put_in/3 macro, which can be used as follows:

iex(3)> path = [3, :title]
 
iex(4)> put_in(todo_list, path, "Theater")   ❶

❶ Using a path constructed at runtime

Functions and macros, such as put_in, rely on the Access module, which allows you to work with key-value structures, such as maps. You can also make your own abstraction work with Access. You need to implement a couple of functions required by the Access contract, and then put_in and related macros and functions will know how to work with your own abstraction. Refer to the official Access documentation (https://hexdocs.pm/elixir/Access.xhtml) for more details.

Exercise: Deleting an entry

Your TodoList module is almost complete. You’ve already implemented create (add_entry/2), retrieve (entries/2), and update (update_entry/3) operations. The last thing to implement is the delete_entry/2 operation. This is straightforward, and it’s left for you to do as an exercise. If you get stuck, the solution is provided in the source file todo_crud.ex.

4.2.4 Iterative updates

So far, you’ve been doing updates manually, one at a time. Now, it’s time to implement iterative updates. Imagine you have a raw list describing the entries:

$ iex todo_builder.ex
 
iex(1)> entries = [
          %{date: ~D[2023-12-19], title: "Dentist"},
          %{date: ~D[2023-12-20], title: "Shopping"},
          %{date: ~D[2023-12-19], title: "Movies"}
        ]

Now, you want to create an instance of the to-do list that contains all of these entries:

iex(2)> todo_list = TodoList.new(entries)

It’s obvious that the new/1 function performs an iterative build of the to-do list. How can you implement such a function? As it turns out, this is simple.

Listing 4.15 Iteratively building the to-do list (todo_builder.ex)

defmodule TodoList do
  ...
 
  def new(entries \\ []) do
    Enum.reduce(
      entries,
      %TodoList{},                       ❶
      fn entry, todo_list_acc ->         ❷
        add_entry(todo_list_acc, entry)
      end
    )
  end
 
  ...
end

❶ Initial accumulator value

❷ Iteratively updates the accumulator

To build the to-do list iteratively, you’re relying on Enum.reduce/3. Recall from chapter 3 that reduce is used to transform something enumerable to anything else. In this case, you’re transforming a raw list of Entry instances into an instance of the TodoList struct. Therefore, you call Enum.reduce/3, passing the input list as the first argument, the new structure instance as the second argument (the initial accumulator value), and the lambda that’s called in each step.

The lambda is called for each entry in the input list. Its task is to add the entry to the current accumulator (TodoList struct) and return the new accumulator value. To do this, the lambda delegates to the already-present add_entry/2 function, reversing the argument order. The arguments need to be reversed because Enum.reduce/3 calls the lambda, passing the iterated element (entry) and accumulator (TodoList struct). In contrast, add_entry accepts a struct and an entry.

Notice that you can make the lambda definition more compact with the help of the capture operator:

def new(entries \\ []) do
  Enum.reduce(
    entries,
    %TodoList{},
    &add_entry(&2, &1)    ❶
  )
end

❶ Reverses the order of arguments and delegates to add_entry/2

Whether you use this version or the previous one is entirely based on your personal taste.

4.2.5 Exercise: Importing from a file

Now, it’s time for you to practice a bit. In this exercise, you’ll create a TodoList instance from a comma-separated file.

Assume you have a todos.csv file in the current folder. Each line in the file describes a single to-do entry:

2023-12-19,Dentist
2023-12-20,Shopping
2023-12-19,Movies

Your task is to create an additional module, TodoList.CsvImporter, that can be used to create a TodoList instance from the file contents:

iex(1)> todo_list = TodoList.CsvImporter.import("todos.csv")

To simplify the task, assume the file is always available and in the correct format. Also assume that the comma character doesn’t appear in the entry title.

This is generally not hard to do, but it might require some cracking and experimenting. The following are a couple of hints that will lead you in the right direction.

First, create a single file with the following layout:

defmodule TodoList do
  ...
end
 
defmodule TodoList.CsvImporter do
  ...
end

Always work in small steps. Implement part of the calculation, and then print the result to the screen using IO.inspect/1. I can’t stress enough how important this is. This task requires some data pipelining. Working in small steps will allow you to move gradually and verify that you’re on the right track.

The general steps you should undertake are as follows:

Open a file and go through it, removing \n from each line. Hint: Use File.stream!/1, Stream.map/2, and String.trim_trailing/2. You did this in chapter 3, when we talked about streams, in the example where you filtered lines longer than 80 characters.
Using Stream.map, transform each line obtained from the previous step into a to-do list entry.
1. Convert the line into a [date_string, title] list, using String.split/2.
2. Convert the date string into a date, using Date.from_iso8601! (https://hexdocs.pm/elixir/Date.xhtml#from_iso8601!/2).
3. Create the to-do list entry (a map in the shape of %{date: date, title: title}).

The output of step 2 is an enumerable that consists of to-do entries. Pass that enumerable to the TodoList.new/1 function you recently implemented.

In each of these steps, you’ll receive an enumerable as an input, transform each element, and pass the resulting enumerable forward to the next step. In the final step, the resulting enumerable is passed to the already implemented TodoList.new/1, and the to-do list is created.

If you work in small steps, it’s harder to get lost. For example, you can start by opening a file and printing each line to the screen. Then, try to remove the trailing newline from each line and print them to the screen, and so on.

While transforming the data in each step, you can work with Enum functions or functions from the Stream module. It will probably be simpler to start with eager functions from the Enum module and get the entire thing to work. Then, try to replace as many of the Enum functions as possible with their Stream counterparts. Recall from chapter 3 that the Stream functions are lazy and composable, which can reduce the amount of intermediate memory required for the operation. If you get lost, the solution is provided in the todo_import.ex file.

In the meantime, we’re almost done with our exploration of higher-level abstractions. The final topic we’ll briefly discuss is the Elixir way of doing polymorphism.

4.3 Polymorphism with protocols

Polymorphism is a runtime decision about which code to execute, based on the nature of the input data. In Elixir, the basic (but not the only) way of doing this is by using the language feature called protocols.

Before discussing protocols, let’s see them in action. You’ve already seen polymorphic code. For example, the entire Enum module is generic code that works on anything enumerable, as the following snippet illustrates:

Enum.each([1, 2, 3], &IO.inspect/1)
Enum.each(1..3, &IO.inspect/1)
Enum.each(%{a: 1, b: 2}, &IO.inspect/1)

Notice how you use the same Enum.each/2 function, sending it different data structures: a list, range, and map. How does Enum.each/2 know how to walk each structure? It doesn’t. The code in Enum.each/2 is generic and relies on a contract. This contract, called a protocol, must be implemented for each data type you wish to use with Enum functions. Next, let’s learn how to define and use protocols.

4.3.1 Protocol basics

A protocol is a module in which you declare functions without implementing them. Consider it a rough equivalent of an OO interface. The generic logic relies on the protocol and calls its functions. Then, you can provide a concrete implementation of the protocol for different data types.

Let’s look at an example. The protocol String.Chars is provided by the Elixir standard library and is used to convert data to a binary string. This is how the protocol is defined in the Elixir source:

defprotocol String.Chars do    ❶
  def to_string(term)          ❷
end

❶ Definition of the protocol

❷ Declaration of protocol functions

This resembles the module definition, with the notable difference that functions are declared but not implemented.

Notice the first argument of the function (the term). At runtime, the type of this argument determines the implementation that’s called. Let’s see this in action. Elixir already implements the protocol for atoms, numbers, and some other data types, so you can issue the following calls:

iex(1)> String.Chars.to_string(1)
"1"
 
iex(2)> String.Chars.to_string(:an_atom)
"an_atom"

If the protocol isn’t implemented for the given data type, an error is raised:

iex(3)> String.Chars.to_string(TodoList.new())
** (Protocol.UndefinedError) protocol String.Chars not implemented

Usually, you don’t need to call the protocol function directly. More often, there’s generic code that relies on the protocol. In the case of String.Chars, this is the auto-imported function Kernel.to_string/1:

iex(4)> to_string(1)
"1"
 
iex(5)> to_string(:an_atom)
"an_atom"
 
iex(6)> to_string(TodoList.new())
** (Protocol.UndefinedError) protocol String.Chars not implemented

As you can see, the behavior of to_string/1 is exactly the same as that of String.Chars.to_string/1. This is because Kernel.to_string/1 delegates to the String.Chars implementation.

In addition, you can send anything that implements String.Chars to IO.puts/1:

iex(7)> IO.puts(1)
1
 
iex(8)> IO.puts(:an_atom)
an_atom
 
iex(9)> IO.puts(TodoList.new())
** (Protocol.UndefinedError) protocol String.Chars not implemented

As you can see, an instance of the TodoList isn’t printable because String.Chars isn’t implemented for the corresponding type.

4.3.2 Implementing a protocol

How do you implement a protocol for a specific type? Let’s refer to the Elixir source again. The following snippet implements String.Chars for integers:

defimpl String.Chars, for: Integer do
  def to_string(term) do
    Integer.to_string(term)
  end
end

You start the implementation by calling the defimpl macro. Then, you specify which protocol to implement and the corresponding data type. Finally, the do/end block contains the implementation of each protocol function. In the example, the implementation delegates to the existing standard library function Integer.to_string/1.

The for: Type part deserves some explanation. The type is an atom and can be any of following aliases: Tuple, Atom, List, Map, BitString, Integer, Float, Function, PID, Port, or Reference. These values correspond to built-in Elixir types.

In addition, the alias Any is allowed, which makes it possible to specify a fallback implementation. If a protocol isn’t defined for a given type, an error will be raised, unless a fallback to Any is specified in the protocol definition and an Any implementation exists. Refer to the protocol documentation (https://hexdocs.pm/elixir/Protocol.xhtml) for details.

Finally, and most importantly, the type can be any other arbitrary alias (but not a regular, simple atom):

defimpl String.Chars, for: SomeAlias do
  ...
end

This implementation will be called if the first argument of the protocol function is a struct defined in the corresponding module. For example, you can implement String.Chars for TodoList as follows:

iex(1)> defimpl String.Chars, for: TodoList do
          def to_string(_) do
            "#TodoList"
          end
        end

Now, you can pass a to-do list instance to IO.puts/1:

iex(2)> IO.puts(TodoList.new())
#TodoList

It’s important to notice that the protocol implementation doesn’t need to be part of any module. This has powerful consequences: you can implement a protocol for a type, even if you can’t modify the type’s source code. You can place the protocol implementation anywhere in your own code, and the runtime will be able to take advantage of it.

4.3.3 Built-in protocols

Elixir comes with some predefined protocols. It’s best to consult the online documentation for the complete reference (https://hexdocs.pm/elixir), but let’s mention some of the most important ones.

You’ve already seen String.Chars, which specifies the contract for converting data into a binary string. There’s also the List.Chars protocol, which converts input data to a character string (a list of characters).

If you want to control how your structure is printed in the debug output (via the inspect function), you can implement the Inspect protocol.

Arguably, the most important protocol is Enumerable. By implementing it, you can make your data structure enumerable. This means you can use all the functions from the Enum and Stream modules for free! This is probably the best demonstration of protocol usefulness. Both Enum and Stream are generic modules that offer many useful functions, which can work on your custom data structures as soon as you implement the Enumerable protocol.

Closely related to enumeration is the Collectable protocol. Recall from chapter 3 that a collectable structure is one that you can repeatedly add elements to. A collectable can be used with comprehensions to collect results or with Enum.into/2 to transfer elements of one structure (enumerable) to another (collectable).

And, of course, you can define your own protocols and implement them for any available data structure (your own or someone else’s). See the Kernel.defprotocol/2 documentation for more information.

Collectable to-do list

Let’s look at a more involved example. You’ll make your to-do list collectable so that you can use it as a comprehension target. This is a slightly more advanced example, so don’t worry if you don’t understand each detail in the first go.

To make the abstraction collectable, you must implement the corresponding protocol:

defimpl Collectable, for: TodoList do                   ❶
  def into(original) do                                 ❶
    {original, &into_callback/2}                        ❶
  end
 
  defp into_callback(todo_list, {:cont, entry}) do      ❷
    TodoList.add_entry(todo_list, entry)                ❷
  end                                                   ❷
                                                        ❷
  defp into_callback(todo_list, :done), do: todo_list   ❷
  defp into_callback(_todo_list, :halt), do: :ok        ❷
end

❶ Returns the appender lambda

❷ Appender implementation

The exported function into/1 is called by the generic code (e.g., comprehensions). Here, you provide the implementation that returns the appender lambda. This appender lambda is then repeatedly invoked by the generic code to append each element to your data structure.

The appender function receives a to-do list and an instruction hint. If you receive {:cont, entry}, you must add a new entry. If you receive :done, you return the list, which, at this point, contains all appended elements. Finally, :halt indicates that the operation has been canceled, and the return value is ignored.

Let’s see this in action. Copy and paste the previous code into the shell, and then try the following:

iex(1)> entries = [
          %{date: ~D[2023-12-19], title: "Dentist"},
          %{date: ~D[2023-12-20], title: "Shopping"},
          %{date: ~D[2023-12-19], title: "Movies"}
        ]
 
iex(2)> Enum.into(entries, TodoList.new())    ❶
%TodoList{...}

❶ Collecting into a TodoList

By implementing the Collectable protocol, you essentially adapt the TodoList abstraction to any generic code that relies on that protocol, such as Enum.into/2 or for comprehension.

Summary

A module is used to create an abstraction. A module’s functions create, manipulate, and query data. Clients can inspect the entire structure but shouldn’t rely on its shape.
Maps can be used to group different fields together in a single structure.
Structs are special kinds of maps that allow you to define data abstractions related to a module.
Polymorphism can be implemented with protocols. A protocol defines an interface that is used by the generic logic. You can then provide specific protocol implementations for a data type.