This chapter discusses building higher-level data structures. In any complex system, there will be a need for abstractions such as Money
, Date
, Employee
, and OrderItem
. These are all textbook examples of higher-level abstractions that usually aren’t directly supported by the language and are instead written on top of built-in types.
In Elixir, such abstractions are implemented with pure, stateless modules. In this chapter, you’ll learn how to create and work with your own abstractions.
In a typical object-oriented (OO) language, the basic abstraction building blocks are classes and objects. For example, there may be a String
class that implements various string operations. Each string is then an instance of that class and can be manipulated by calling methods, as the following Ruby snippet illustrates:
"a string".upcase
This approach generally isn’t used in Elixir. Being a functional language, Elixir promotes decoupling of data from the code. Instead of classes, you use modules, which are collections of functions. Instead of calling methods on objects, you explicitly call module functions and provide input data via arguments. The following snippet shows the Elixir way of uppercasing a string:
String.upcase("a string")
Another big difference from OO languages is that data is immutable. To modify data, you must call some function and take its result into a variable; the original data is left intact. The following examples demonstrate this technique:
iex(1)> list = [] [] iex(2)> list = List.insert_at(list, -1, :a) [:a] iex(3)> list = List.insert_at(list, -1, :b) [:a, :b] iex(4)> list = List.insert_at(list, -1, :c) [:a, :b, :c]
In these examples, you’re constantly keeping the result of the last operation and feeding it to the next one.
The important thing to notice in both Elixir snippets is that the module is used as the abstraction over the data type. When you need to work with strings, you reach for the String
module. When you need to work with lists, you use the List
module.
String
and List
are examples of modules that are dedicated to a specific data type. They’re implemented in pure Elixir, and their functions rely on the predefined format of the input data. String
functions expect a binary string as the first argument, whereas List
functions expect a list.
Additionally, modifier functions (the ones that transform the data) return data of the same type. The function String.upcase/1
returns a binary string, whereas List.insert_at/3
returns a list.
Finally, a module also contains query functions that return some piece of information from the data, such as String.length/1
and List.first/1
. Such functions still expect an instance of the abstraction as the first argument, but they return another type of information.
The basic principles of abstraction in Elixir can be summarized as follows:
The module’s functions usually expect an instance of the abstraction as the first argument.
Modifier functions return a modified version of the abstraction.
Given these principles, it’s fairly straightforward to create your own higher-level abstractions, as you’ll see in the next section.
Lists and strings are lower-level types, but higher-level abstractions are based on the previously stated principles. In fact, you already saw examples of a higher-level abstraction in chapter 2. For example, a MapSet
module implements a set. MapSet
is implemented in pure Elixir and can serve as a good template for designing an abstraction in Elixir.
Let’s look at an example that uses MapSet
:
iex(1)> days = MapSet.new() |> ❶ MapSet.put(:monday) |> ❷ MapSet.put(:tuesday) ❷ iex(2)> MapSet.member?(days, :monday) ❸ true
❶ Creates an abstraction instance
This approach more or less follows the principles stated earlier. The code is slightly simplified by using the pipe operator to chain operations together. This is possible because all the functions from the MapSet
module take a set as the first argument. Such functions are pipe friendly and can be chained with the |>
operator.
Notice the new/0
function that creates a new instance of the abstraction. There’s nothing special about this function, and it could have been given any name. Its only purpose is to create a new data structure you can work with.
Because MapSet
is an abstraction, you, as a client of this module, don’t concern yourself with its internal workings or its data structure. You call MapSet
functions, keeping whatever result you get and passing that result back to functions from the same module.
Note You may think that abstractions like MapSet
are something like user-defined types. Although there are many similarities, module-based abstractions aren’t proper data types, like the ones explained in chapter 2. Instead, they’re implemented by composing built-in data types. For example, a MapSet
instance is also a map, which you can verify by invoking is_map(MapSet.new())
.
Given this template, let’s try to build a simple abstraction.
The example in this section is a simple to-do list. The problem is, admittedly, not spectacular, but it’s complex enough to give you something to work with while not being overly complicated. This will allow you to focus on techniques without spending too much time trying to grasp the problem itself.
The basic version of the to-do list will support the following features:
Here’s an example of the desired usage:
$ iex simple_todo.ex iex(1)> todo_list = TodoList.new() |> TodoList.add_entry(~D[2023-12-19], "Dentist") |> TodoList.add_entry(~D[2023-12-20], "Shopping") |> TodoList.add_entry(~D[2023-12-19], "Movies") iex(2)> TodoList.entries(todo_list, ~D[2023-12-19]) ["Movies", "Dentist"] iex(3)> TodoList.entries(todo_list, ~D[2023-12-18]) []
This is fairly self-explanatory. You create an instance by calling TodoList.new/0
, and then you add some entries. Finally, you execute some queries. The expression ~D[2023-12-19]
, as explained in section 2.4.11, creates a date (December 19, 2023), powered by the Date
module.
As the chapter progresses, you’ll add additional features and modify the interface slightly. You’ll continue adding features throughout this book, and by the end, you’ll have a fully working distributed web server that can manage a large number of to-do lists.
For now, let’s start with this simple interface. First, you must decide on the internal data representation. In the preceding snippet, you can see that the primary use case is finding all entries for a single date. Therefore, using a map seems like a reasonable initial approach. You’ll use dates as keys, with values being lists of entries for given dates. With this in mind, the implementation of the new/0
function is straightforward.
Listing 4.1 Initializing a to-do list (simple_todo.ex)
defmodule TodoList do def new(), do: %{} ... end
Next, you must implement the add_entry/3
function. This function expects a to-do list (which you know is a map) and must add the entry to the list under a given key (date). Of course, it’s possible that no entries for that date exist, so you need to cover that case as well. As it turns out, this can be done with a single call to the Map.update/4
function.
Listing 4.2 Adding an entry (simple_todo.ex)
defmodule TodoList do ... def add_entry(todo_list, date, title) do Map.update( todo_list, date, [title], ❶ fn titles -> [title | titles] end ❷ ) end ... end
The Map.update/4
function receives a map, a key, an initial value, and an updater function. If no value exists for the given key, the initial value is used; otherwise, the updater function is called. The function receives the existing value and returns the new value for that key. In this case, you push the new entry to the top of the list. You may remember from chapter 2 that lists are most efficient when pushing new elements to the top. Therefore, you opt for a fast insertion operation but sacrifice ordering—more recently added entries are placed before the older ones in the list.
Finally, you need to implement the entries/2
function, which returns all entries for a given date, or an empty list, if no task exists for that date. This is fairly straightforward, as you can see in the following listing.
Listing 4.3 Querying the to-do list (simple_todo.ex)
defmodule TodoList do ... def entries(todo_list, date) do Map.get(todo_list, date, []) end end
You fetch a value for the given date from todo_list
, which must be a map. The third argument to Map.get/3
is a default value that’s returned if a given key isn’t present in the map.
Nothing stops you from creating one abstraction on top of another. In our initial take on the to-do list, there’s an opportunity to move some of the code into a separate abstraction.
Examine the way you operate on a map, allowing multiple values to be stored under a single key and retrieving all values for that key. This code could be moved to a separate abstraction. Let’s call this MultiDict
, which is implemented in the next listing.
Listing 4.4 Implementing the MultiDict
abstraction (todo_multi_dict.ex)
defmodule MultiDict do def new(), do: %{} def add(dict, key, value) do Map.update(dict, key, [value], &[value | &1]) end def get(dict, key) do Map.get(dict, key, []) end end
This is more or less a copy-and-paste of the initial to-do list implementation. The names are changed a bit, and you use the capture operator to shorten the updater lambda definition: &[value | &1]
.
With this abstraction in place, the TodoList
module becomes much simpler.
Listing 4.5 TodoList
relying on a MultiDict
(todo_multi_dict.ex)
defmodule TodoList do def new(), do: MultiDict.new() def add_entry(todo_list, date, title) do MultiDict.add(todo_list, date, title) end def entries(todo_list, date) do MultiDict.get(todo_list, date) end end
This is a classical separation of concerns, where you extract a distinct responsibility into a separate abstraction, and then create another abstraction on top of it. A distinct MultiDict
abstraction is now readily available to be used in other places in code, if needed. Furthermore, you can extend TodoList
with additional functions that are specific to to-do lists and which, therefore, don’t belong to MultiDict
.
The point of this refactoring is to illustrate that the code organization isn’t radically different from an OO approach. You use different tools to create abstractions (modules and functions instead of classes and methods), but the general idea is the same.
TodoList
now supports the basic features. You can insert entries into the structure and get all entries for a given date, but the interface is somewhat clumsy. When adding a new entry, you must specify each field as a separate argument:
TodoList.add_entry(todo_list, ~D[2023-12-19], "Dentist")
If you want to extend an entry with another attribute—such as time—you must change the signature of the function, which will, in turn, break all the clients. Moreover, you must change every place in the implementation where this data is being propagated. An obvious solution to this problem is to, somehow, combine all entry fields as a single abstraction.
As explained in section 2.4.6, the most common way of doing this in Elixir is to use maps, with field names stored as keys of the atom type. The following snippet demonstrates how you can create and use an entry
instance:
iex(1)> entry = %{date: ~D[2023-12-19], title: "Dentist"} iex(2)> entry.date ~D[2023-12-19] iex(3)> entry.title "Dentist"
You can immediately adapt your code to represent entries with maps. As it turns out, this change is extremely simple. All you need to do is change the code of the TodoList.add_entry
function to accept two arguments: a to-do list instance and a map that describes an entry. The new version is presented in the following listing.
Listing 4.6 Representing entries with maps (todo_entry_map.ex)
defmodule TodoList do ... def add_entry(todo_list, entry) do MultiDict.add(todo_list, entry.date, entry) end ... end
That was simple! You assume an entry is a map and add it to MultiDict
, using its date field as a key.
Let’s see this in action. To add a new entry, clients now must provide a map:
iex(1)> todo_list = TodoList.new() |> TodoList.add_entry(%{date: ~D[2023-12-19], title: "Dentist"})
The client code is obviously more verbose because it must provide field names. But because entries are now structured in a map, data retrieval is improved. The TodoList.entries/2
function now returns complete entries, not just their titles:
iex(2)> TodoList.entries(todo_list, ~D[2023-12-19]) [%{date: ~D[2023-12-19], title: "Dentist"}]
The current implementation of TodoList
relies on a map. This means that at run time, it’s impossible to make a distinction between a map and a TodoList
instance. In some situations, you may want to define and enforce a more precise structure definition. For such cases, Elixir provides a feature called structs.
Let’s say you need to deal with fractions in your program. A fraction is a part of a whole, represented in the form of a/b
, where a
and b
are integers called a numerator and a denominator. Passing around these two values separately is noisy and error prone. Therefore, it makes sense to introduce a small abstraction to help working with fractions. The following snippet demonstrates how such an abstraction could be used:
$ iex fraction.ex iex(1)> Fraction.new(1, 2) |> Fraction.add(Fraction.new(1, 4)) |> Fraction.value() 0.75
Here, you sum one-half (1/2) and one-quarter (1/4) and return the numerical value of the resulting fraction. A fraction is created using Fraction.new/2
and is then passed to various other functions that know how to work with it.
How can you implement this? There are many approaches, such as relying on plain tuples or using maps. In addition, Elixir provides a facility called structs, which allows you to specify the abstraction structure up front and bind it to a module. Each module can define only one struct, which can then be used to create new instances and pattern-match on them.
A fraction has a well-defined structure, so you can use structs to specify and enforce data shape. Let’s see this in action.
To define a struct, use the defstruct
macro (https://hexdocs.pm/elixir/Kernel.xhtml#defstruct/1).
Listing 4.7 Defining a structure (fraction.ex)
defmodule Fraction do defstruct a: nil, b: nil ... end
A keyword list provided to defstruct
defines the struct’s fields along with their initial values. You can now instantiate a struct using this special syntax:
iex(1)> one_half = %Fraction{a: 1, b: 2} %Fraction{a: 1, b: 2}
Notice how a struct bears the name of the module it’s defined in. There’s a close relation between structs and modules. A struct may exist only in a module, and a single module can define only one struct.
Internally, a struct is a special kind of map. Therefore, individual fields are accessed just like maps:
iex(2)> one_half.a 1 iex(3)> one_half.b 2
The nice thing about structs is that you can pattern-match on them:
iex(4)> %Fraction{a: a, b: b} = one_half %Fraction{a: 1, b: 2} iex(5)> a 1 iex(6)> b 2
This makes it possible to assert that some variable is really a struct:
iex(6)> %Fraction{} = one_half ❶ %Fraction{a: 1, b: 2} ❶ iex(7)> %Fraction{} = %{a: 1, b: 2} ❷ ** (MatchError) no match of right hand side value: %{a: 1, b: 2} ❷
❷ A struct pattern doesn’t match a map.
Here, you use a %Fraction{}
pattern that matches any Fraction
struct, regardless of its contents. Pattern matching with structs works much like it does with maps. This means in a pattern match, you need to specify only the fields you’re interested in, ignoring all other fields.
Updating a struct works similarly to the way it works with maps:
iex(8)> one_quarter = %Fraction{one_half | b: 4} %Fraction{a: 1, b: 4}
This code creates a new struct instance based on the original one (one_half
), changing the value of the field b
to 4
.
The shape of the struct is defined at compile time. As a result, some errors can be caught by the Elixir compiler. For example, suppose we make a typo in the field name:
iex(9)> %Fraction{a: 1, d: 2} ** (KeyError) key :d not found
The struct doesn’t specify the field :d
, so the error is reported. In contrast, if you used a regular map, this code would succeed. However, the program would fail in a distant place with a nonobvious reason, which would make the error more difficult to debug.
It’s worth noting that this error is reported at compile time. If you make the same mistake in a source file, the code won’t even compile.
Armed with this knowledge, let’s add some functionality to the Fraction
abstraction. First, you need to provide the creation function.
Listing 4.8 Instantiating a fraction (fraction.ex)
defmodule Fraction do ... def new(a, b) do %Fraction{a: a, b: b} end ... end
This is a simple wrapper around the %Fraction{}
syntax. It makes the client code clearer and less coupled with the fact that structs are used.
Next, implement a Fraction.value/1
function that returns a float representation of the fraction.
Listing 4.9 Calculating the fraction value (fraction.ex)
defmodule Fraction do
...
def value(%Fraction{a: a, b: b}) do ❶
a / b
end
...
end
The value/1
function matches a fraction, taking its fields into individual variables and using them to compute the final result. The benefit of pattern matching is that the input type is enforced. If you pass anything that isn’t a fraction instance, you’ll get a match error.
Instead of decomposing fields into variables, you could also use dot notation:
def value(fraction) do fraction.a / fraction.b end
This version is arguably clearer, but on the flip side, it accepts any map, not just Fraction
structs, which might lead to subtle bugs. For example, suppose there’s a Rectangle
struct with the same fields. You could accidentally pass such a struct to this function, and instead of failing, the function would return some meaningless result.
One final thing to do is to implement the add
function.
Listing 4.10 Adding two fractions (fraction.ex)
defmodule Fraction ... def add(%Fraction{a: a1, b: b1}, %Fraction{a: a2, b: b2}) do new( a1 * b2 + a2 * b1, b2 * b1 ) end ... end
You can now test your fraction:
iex(1)> Fraction.new(1, 2) |> Fraction.add(Fraction.new(1, 4)) |> Fraction.value() 0.75
The code works as expected. By representing fractions with a struct, you can provide the definition of your type, listing all fields and their default values. Furthermore, it’s possible to distinguish struct instances from any other data type. This allows you to place %Fraction{}
matches in function arguments, thus asserting that you only accept fraction instances.
You should always be aware that structs are simply maps, so they have the same characteristics with respect to performance and memory usage. But a struct instance receives special treatment. Some things that can be done with maps don’t work with structs. For example, you can’t call Enum
functions on a struct:
iex(1)> one_half = Fraction.new(1, 2) iex(2)> Enum.to_list(one_half) ** (Protocol.UndefinedError) protocol Enumerable not implemented for %Fraction{a: 1, b: 2}
Remember that a struct is a functional abstraction and should, therefore, behave according to the implementation of the module where it’s defined. In the case of the Fraction
abstraction, you must define whether Fraction
is enumerable and, if so, in what way. If this isn’t done, Fraction
isn’t an enumerable, so you can’t call Enum
functions on it.
In contrast, a plain map is an enumerable, so you can convert it to a list:
iex(3)> Enum.to_list(%{a: 1, b: 2}) [a: 1, b: 2]
On the other hand, because structs are maps, directly calling the Map
functions works:
iex(4)> Map.to_list(one_half) [__struct__: Fraction, a: 1, b: 2]
Notice the __struct__: Fraction
bit. This key-value pair is automatically included in each struct. It helps Elixir distinguish structs from plain maps and perform proper runtime dispatches from within polymorphic generic code. You’ll learn more about this later, when we describe protocols.
The __struct__
field has an important consequence for pattern matching. A struct pattern can’t match a plain map:
iex(5)> %Fraction{a: a, b: b} = %{a: 1, b: 2} ** (MatchError) no match of right hand side value: %{a: 1, b: 2}
However, a plain map pattern can match a struct:
iex(5)> %{a: a, b: b} = %Fraction{a: 1, b: 2} %Fraction{a: 1, b: 2} iex(6)> a 1 iex(7)> b 2
This is due to the way pattern matching works with maps. Remember that all fields from the pattern must exist in the matched term. When matching a map to a struct pattern, this isn’t the case because %Fraction{}
contains the field struct
, which isn’t present in the map being matched.
The opposite works because you match a struct to the %{a: a, b: b}
pattern. Because all these fields exist in the Fraction
struct, the match is successful.
In addition to maps and structs, there’s another way to structure data: records. This is a feature that lets you use tuples and still be able to access individual elements by name. Records can be defined using the defrecord
and defrecordp
macros from the Record
module (https://hexdocs.pm/elixir/Record.xhtml).
Given that they’re essentially tuples, records should be faster than maps (although the difference usually isn’t significant in the grand scheme of things). On the flip side, the usage is more verbose, and it’s not possible to access fields by name dynamically.
Records are present mostly for historical reasons. Before maps were used, records were one of the main tools for structuring data. In fact, many libraries from the Erlang ecosystem use records as their interface. If you need to interface an Erlang library using a record defined in that library, you must import that record into Elixir and define it as a record. This can be done using the Record.extract/2
function in conjunction with the defrecord
macro. This idiom isn’t required often, so records won’t be demonstrated here. Still, it may be useful to keep this information in the back of your mind and research it if the need arises.
The modules you’ve devised so far are abstractions because clients aren’t aware of their implementation details. For example, as a client, you call Fraction.new/2
to create an instance of the abstraction and then send that instance back to some other function from the same module.
But the entire data structure is always visible. As a client, you can obtain individual fraction values, even if this was not intended by the library developer.
It’s important to be aware that data in Elixir is always transparent. Clients can read any information from your structs (and any other data type), and there’s no easy way of preventing that. In that sense, encapsulation works differently than in typical OO languages. In Elixir, modules are in charge of abstracting the data and providing operations to manipulate and query that data, but the data is never hidden.
Let’s verify this in a shell session:
$ iex todo_entry_map.ex
iex(1)> todo_list = TodoList.new() |>
TodoList.add_entry(%{date: ~D[2023-12-19], title: "Dentist"})
%{~D[2023-12-19] => [%{date: ~D[2023-12-19], title: "Dentist"}]} ❶
❶ To-do list with a single element
Looking at the return value, you can see the entire structure of the to-do list. From the output, you can immediately tell that the to-do list is powered by a map, and you can also find out details about how individual entries are kept.
Let’s look at another example. A MapSet
instance is also an abstraction, powered by the MapSet
module and a corresponding struct. At first glance, this isn’t visible:
iex(1)> mapset = MapSet.new([:monday, :tuesday]) MapSet.new([:monday, :tuesday])
Notice how the result of the expression is printed in a special way, using the MapSet.new(...)
output. This is due to the inspection mechanism in Elixir: whenever a result is printed in the shell, the function Kernel.inspect/1
is called to transform the structure into an inspected string. For each abstraction you build, you can override the default behavior and provide your own inspected format. This is exactly what MapSet
does, and you’ll learn how to do this for a type later in this chapter, when we discuss protocols.
Occasionally, you may want to see the pure data structure, without this decorated output. This can be useful when you’re debugging, analyzing, or reverse engineering code. To do so, you can provide a special option to the inspect
function:
iex(2)> IO.puts(inspect(mapset, structs: false)) %{__struct__: MapSet, map: %{monday: [], tuesday: []}, version: 2}
The output now reveals the complete structure of a date, and you can “see through” the MapSet
abstraction. This demonstrates that data privacy can’t be fully enforced in Elixir. Remember from chapter 2 that the only complex types are tuples, lists, and maps. Any other abstraction, such as MapSet
or your own TodoList
, will ultimately be built on top of these types.
The benefit of data transparency is that the data can be easily inspected, which can be useful for debugging purposes. But as a client of an abstraction, you shouldn’t rely on its internal representation, even though it’s visible to you. You shouldn’t pattern-match on the internal structure or try to extract or modify individual parts of it because a proper abstraction, such as MapSet
, doesn’t guarantee what the data will look like. The only guarantee is that the module’s functions will work if you send them a properly structured instance that you already received from that same module.
Sometimes, a module will publicly document some parts of its internal structure. Good examples of this are the date and time modules, such as Date
, Time
, and DateTime
. Looking at the documentation, you’ll see explicit mention that the corresponding data is represented as a structure using fields such as year, month, hour, and so on. In this case, the structure of the data is publicly documented, and you can freely rely on it.
One final thing you should know, related to data inspection, is the IO.inspect/1
function. This function prints the inspected representation of a structure to the screen and returns the structure itself. This is particularly useful when debugging a piece of code. Look at the following example:
iex(1)> Fraction.new(1, 4) |> Fraction.add(Fraction.new(1, 4)) |> Fraction.add(Fraction.new(1, 2)) |> Fraction.value() 1.0
This code relies on the pipe operator to perform a series of fraction operations. Let’s say you want to inspect the entire structure after each step. You can easily insert the call to IO.inspect/1
after every line:
iex(2)> Fraction.new(1, 4) |> IO.inspect() |> Fraction.add(Fraction.new(1, 4)) |> IO.inspect() |> Fraction.add(Fraction.new(1, 2)) |> IO.inspect() |> Fraction.value() %Fraction{a: 1, b: 4} ❶ %Fraction{a: 8, b: 16} ❶ %Fraction{a: 32, b: 32} ❶
❶ Output of each IO.inspect call
This works because IO.inspect/1
prints the data structure and then returns that same data structure, unchanged. You can also take a look at the dbg
macro (https://hexdocs.pm/elixir/Kernel.xhtml#dbg/2), which is somewhat similar but provides more debugging features.
We’re now done with the basic theory behind functional abstractions, but you’ll practice some more by extending the to-do list.
In this section, you’ll extend the TodoList
abstraction to provide basic CRUD support. You already have the C and R parts resolved with the add_entry/2
and entries/2
functions, respectively. Now, you need to add support for updating and deleting entries. To do this, you must be able to uniquely identify each entry in the to-do list, so you’ll begin by adding unique ID values to each entry.
When adding a new entry to the list, you’ll autogenerate its ID value, using incremental integers for IDs. To implement this, you have to do a couple of things:
Represent the to-do list as a struct. You need to do this because the to-do list now has to keep two pieces of information: the entries collection and the ID value for the next entry.
Use the entry’s ID as the key. So far, when storing entries in a collection, you used the entry’s date as the key. You’ll change this and use the entry’s ID instead. This will make it possible to quickly insert, update, and delete individual entries. You’ll now have exactly one value per each key, so you won’t need the MultiDict
abstraction anymore.
Let’s start implementing this. The code in the following listing contains the module and struct definitions.
Listing 4.11 TodoList
struct (todo_crud.ex)
defmodule TodoList do defstruct next_id: 1, entries: %{} ❶ def new(), do: %TodoList{} ❷ ... end
❶ Struct that describes the to-do list
The to-do list will now be represented as a struct with two fields. The field next_id
contains the ID value that will be assigned to the new entry while it’s being added to the structure. The field entries
is the collection of entries. As has been mentioned, you’re now using a map, and the keys are entry ID values.
During the struct definition, the default values for the next_id
and entries
fields are immediately specified. Therefore, you don’t have to provide these when creating a new instance. The new/0
function creates and returns an instance of the struct.
Next, it’s time to reimplement the add_entry/2
function. It has to do more work:
Listing 4.12 Autogenerating ID values for new entries (todo_crud.ex)
defmodule TodoList do ... def add_entry(todo_list, entry) do entry = Map.put(entry, :id, todo_list.next_id) ❶ new_entries = Map.put( ❷ todo_list.entries, ❷ todo_list.next_id, ❷ entry ❷ ) %TodoList{todo_list | ❸ entries: new_entries, ❸ next_id: todo_list.next_id + 1 ❸ } end ... end
❷ Adds the new entry to the entries list
A lot happens here, so let’s take each step, one at a time.
In the function body, you first update the entry’s id
value with the value stored in the next_id
field. Notice how you use Map.put/3
to update the entry map. The input map may not contain the id
field, so you can’t use the standard %{entry | id: next_id}
technique, which only works if the id
field is already present in the map. Once the entry is updated, you add it to the entries collection, keeping the result in the new_entries
variable.
Finally, you must update the TodoList
struct instance, setting its entries
field to the new_entries
collection and incrementing the next_id
field. Essentially, you made a complex change in the struct, modifying multiple fields as well as the input entry (because you set its id
field).
To the external caller, the entire operation will be atomic. Either everything will happen or, in case of an error, nothing at all. This is the consequence of immutability. The effect of adding an entry is visible to others only when the add_entry/2
function finishes and its result is taken into a variable. If something goes wrong and you raise an error, the effect of any transformations won’t be visible.
It’s also worth repeating, as mentioned in chapter 2, that the new to-do list (the one returned by the add_entry/2
function) will share as much memory as possible with the input to-do list.
With the add_entry/2
function finished, you need to adjust the entries/2
function. This will be more complicated because you changed the internal structure. Earlier, you kept a date-to-entries mapping. Now, entries are stored using id
as the key, so you have to iterate through all the entries and return the ones that fall on a given date.
Listing 4.13 Filtering entries for a given date (todo_crud.ex)
defmodule TodoList do ... def entries(todo_list, date) do todo_list.entries |> Map.values() ❶ |> Enum.filter(fn entry -> entry.date == date end) ❷ end ... end
❷ Filters entries for a given date
This function first uses Map.values/1
to take the entries from the entries
map. Then, only the entries that fall on the given date are taken using Enum.filter/2
.
You can check whether your new version of the to-do list works:
$ iex todo_crud.ex iex(1)> todo_list = TodoList.new() |> TodoList.add_entry(%{date: ~D[2023-12-19], title: "Dentist"}) |> TodoList.add_entry(%{date: ~D[2023-12-20], title: "Shopping"}) |> TodoList.add_entry(%{date: ~D[2023-12-19], title: "Movies"}) iex(2)> TodoList.entries(todo_list, ~D[2023-12-19]) [ %{date: ~D[2023-12-19], id: 1, title: "Dentist"}, %{date: ~D[2023-12-19], id: 3, title: "Movies"} ]
This works as expected, and you can even see the ID value for each entry. Also note that the interface of the TodoList
module is the same as the previous version. You’ve made a number of internal modifications, changed the data representation, and practically rewritten the entire module. And yet the module’s clients don’t need to be altered because you kept the same interface for your functions.
This is nothing revolutionary—it’s a classical benefit of wrapping the behavior behind a properly chosen interface. However, it demonstrates how you can construct and reason about higher-level types when working with stateless modules and immutable data.
Now that your entries have ID values, you can add additional modifier operations. Let’s implement the update_entry
operation, which can be used to modify a single entry in the to-do list.
This function will accept an entry ID, and an updater lambda, which will be invoked to update the entry. This will work similarly to Map.update
. The lambda will receive the original entry and return its modified version. To keep things simple, the function won’t raise an error if the entry with a given ID doesn’t exist.
The following snippet illustrates the usage. Here, you modify the date of an entry that has an ID value of 1
:
iex(1)> TodoList.update_entry( todo_list, 1, ❶ &Map.put(&1, :date, ~D[2023-12-20]) ❷ )
❶ ID of the entry to be modified
The implementation is presented in the following listing.
Listing 4.14 Updating an entry (todo_crud.ex)
defmodule TodoList do ... def update_entry(todo_list, entry_id, updater_fun) do case Map.fetch(todo_list.entries, entry_id) do :error -> ❶ todo_list {:ok, old_entry} -> ❷ new_entry = updater_fun.(old_entry) new_entries = Map.put(todo_list.entries, new_entry.id, new_entry) %TodoList{todo_list | entries: new_entries} end end ... end
❶ No entry—returns the unchanged list
❷ Entry exists—performs the update and returns the modified list
Let’s break down what happens here. First, you look up the entry with the given ID, using Map.fetch/2
. The function will return :error
if the entry doesn’t exist and {:ok, value}
otherwise.
In the first case, if the entry doesn’t exist, you return the original version of the list. Otherwise, you have to call the updater lambda to get the modified entry. Then, you store the modified entry into the entries collection. Finally, you store the modified entries collection in the TodoList
instance and return that instance.
You may not have noticed, but in the previous example, you performed a deep update of an immutable hierarchy. Let’s break down what happens when you call TodoList .update_entry(todo_list, id, updater_lambda)
:
You call the updater that returns the modified version of the entry to you.
You call Map.put
to put the modified entry into the entries collection.
You return the new version of the to-do list, which contains the new entries collection.
Notice that steps 2, 3, and 4 are the ones in which you transform data. Each of these steps creates a new variable that contains the transformed data. In each subsequent step, you take this data and update its container, again by creating a transformed version of it.
This is how you work with immutable data structures. If you have hierarchical data, you can’t directly modify part of it that resides deep in its tree. Instead, you must walk down the tree to the particular part that needs to be modified and then transform it and all of its ancestors. The result is a copy of the entire model (in this case, the to-do list). As mentioned, the two versions—new and previous—will share as much memory as possible.
Although the technique presented works, it may become cumbersome for deeper hierarchies. Remember that to update an element deep in the hierarchy, you must walk to that element and then update all of its parents. To simplify this, Elixir offers support for more elegant, deep, hierarchical updates.
Let’s look at a basic example. Suppose the to-do list is represented as a simple map, where keys are IDs and values are plain maps consisting of fields. Let’s create one instance of such a to-do list:
iex(1)> todo_list = %{ 1 => %{date: ~D[2023-12-19], title: "Dentist"}, 2 => %{date: ~D[2023-12-20], title: "Shopping"}, 3 => %{date: ~D[2023-12-19], title: "Movies"} }
Now, let’s say you change your mind and want to go to the theater instead of a movie. The original structure can be modified elegantly using the Kernel.put_in/2
macro:
iex(2)> put_in(todo_list[3].title, "Theater") ❶ %{ 1 => %{date: ~D[2023-12-19], title: "Dentist"}, 2 => %{date: ~D[2023-12-20], title: "Shopping"}, 3 => %{date: ~D[2023-12-19], title: "Theater"} ❷ }
What happened here? Internally, put_in/2
does something similar to what you did. It walks recursively to the desired element, transforms it, and then updates all the parents. Notice that this is still an immutable operation, meaning the original structure is left intact, and you must assign the result to a variable.
To be able to do a recursive walk, put_in/2
needs to receive source data and a path to the target element. In the preceding example, the source is provided as todo_list
and the path is specified as [3].title
. The macro put_in/2
then walks down that path, rebuilding the new hierarchy on the way up.
It’s also worth noting that Elixir provides similar alternatives for data retrieval and updates in the form of the get_in/2
, update_in/2
, and get_and_update_in/2
macros. The fact that these are macros means the path you provide is evaluated at compile time and can’t be built dynamically.
If you need to construct paths at run time, there are equivalent functions that accept the data and path as separate arguments. For example, Elixir also includes the put_in/3
macro, which can be used as follows:
iex(3)> path = [3, :title]
iex(4)> put_in(todo_list, path, "Theater") ❶
❶ Using a path constructed at runtime
Functions and macros, such as put_in
, rely on the Access
module, which allows you to work with key-value structures, such as maps. You can also make your own abstraction work with Access
. You need to implement a couple of functions required by the Access
contract, and then put_in
and related macros and functions will know how to work with your own abstraction. Refer to the official Access
documentation (https://hexdocs.pm/elixir/Access.xhtml) for more details.
Your TodoList
module is almost complete. You’ve already implemented create (add_entry/2
), retrieve (entries/2
), and update (update_entry/3
) operations. The last thing to implement is the delete_entry/2
operation. This is straightforward, and it’s left for you to do as an exercise. If you get stuck, the solution is provided in the source file todo_crud.ex.
So far, you’ve been doing updates manually, one at a time. Now, it’s time to implement iterative updates. Imagine you have a raw list describing the entries:
$ iex todo_builder.ex iex(1)> entries = [ %{date: ~D[2023-12-19], title: "Dentist"}, %{date: ~D[2023-12-20], title: "Shopping"}, %{date: ~D[2023-12-19], title: "Movies"} ]
Now, you want to create an instance of the to-do list that contains all of these entries:
iex(2)> todo_list = TodoList.new(entries)
It’s obvious that the new/1
function performs an iterative build of the to-do list. How can you implement such a function? As it turns out, this is simple.
Listing 4.15 Iteratively building the to-do list (todo_builder.ex)
defmodule TodoList do ... def new(entries \\ []) do Enum.reduce( entries, %TodoList{}, ❶ fn entry, todo_list_acc -> ❷ add_entry(todo_list_acc, entry) end ) end ... end
❷ Iteratively updates the accumulator
To build the to-do list iteratively, you’re relying on Enum.reduce/3
. Recall from chapter 3 that reduce
is used to transform something enumerable to anything else. In this case, you’re transforming a raw list of Entry
instances into an instance of the TodoList
struct. Therefore, you call Enum.reduce/3
, passing the input list as the first argument, the new structure instance as the second argument (the initial accumulator value), and the lambda that’s called in each step.
The lambda is called for each entry in the input list. Its task is to add the entry to the current accumulator (TodoList
struct) and return the new accumulator value. To do this, the lambda delegates to the already-present add_entry/2
function, reversing the argument order. The arguments need to be reversed because Enum.reduce/3
calls the lambda, passing the iterated element (entry) and accumulator (TodoList
struct). In contrast, add_entry
accepts a struct and an entry.
Notice that you can make the lambda definition more compact with the help of the capture operator:
def new(entries \\ []) do
Enum.reduce(
entries,
%TodoList{},
&add_entry(&2, &1) ❶
)
end
❶ Reverses the order of arguments and delegates to add_entry/2
Whether you use this version or the previous one is entirely based on your personal taste.
Now, it’s time for you to practice a bit. In this exercise, you’ll create a TodoList
instance from a comma-separated file.
Assume you have a todos.csv file in the current folder. Each line in the file describes a single to-do entry:
2023-12-19,Dentist 2023-12-20,Shopping 2023-12-19,Movies
Your task is to create an additional module, TodoList.CsvImporter
, that can be used to create a TodoList
instance from the file contents:
iex(1)> todo_list = TodoList.CsvImporter.import("todos.csv")
To simplify the task, assume the file is always available and in the correct format. Also assume that the comma character doesn’t appear in the entry title.
This is generally not hard to do, but it might require some cracking and experimenting. The following are a couple of hints that will lead you in the right direction.
First, create a single file with the following layout:
defmodule TodoList do ... end defmodule TodoList.CsvImporter do ... end
Always work in small steps. Implement part of the calculation, and then print the result to the screen using IO.inspect/1
. I can’t stress enough how important this is. This task requires some data pipelining. Working in small steps will allow you to move gradually and verify that you’re on the right track.
The general steps you should undertake are as follows:
Open a file and go through it, removing \n
from each line. Hint: Use File.stream!/1
, Stream.map/2
, and String.trim_trailing/2
. You did this in chapter 3, when we talked about streams, in the example where you filtered lines longer than 80 characters.
Using Stream.map
, transform each line obtained from the previous step into a to-do list entry.
Convert the line into a [date_string, title]
list, using String.split/2
.
Convert the date string into a date, using Date.from_iso8601!
(https://hexdocs.pm/elixir/Date.xhtml#from_iso8601!/2).
Create the to-do list entry (a map in the shape of %{date: date, title: title}
).
The output of step 2 is an enumerable that consists of to-do entries. Pass that enumerable to the TodoList.new/1
function you recently implemented.
In each of these steps, you’ll receive an enumerable as an input, transform each element, and pass the resulting enumerable forward to the next step. In the final step, the resulting enumerable is passed to the already implemented TodoList.new/1
, and the to-do list is created.
If you work in small steps, it’s harder to get lost. For example, you can start by opening a file and printing each line to the screen. Then, try to remove the trailing newline from each line and print them to the screen, and so on.
While transforming the data in each step, you can work with Enum
functions or functions from the Stream
module. It will probably be simpler to start with eager functions from the Enum
module and get the entire thing to work. Then, try to replace as many of the Enum
functions as possible with their Stream
counterparts. Recall from chapter 3 that the Stream
functions are lazy and composable, which can reduce the amount of intermediate memory required for the operation. If you get lost, the solution is provided in the todo_import.ex file.
In the meantime, we’re almost done with our exploration of higher-level abstractions. The final topic we’ll briefly discuss is the Elixir way of doing polymorphism.
Polymorphism is a runtime decision about which code to execute, based on the nature of the input data. In Elixir, the basic (but not the only) way of doing this is by using the language feature called protocols.
Before discussing protocols, let’s see them in action. You’ve already seen polymorphic code. For example, the entire Enum
module is generic code that works on anything enumerable, as the following snippet illustrates:
Enum.each([1, 2, 3], &IO.inspect/1) Enum.each(1..3, &IO.inspect/1) Enum.each(%{a: 1, b: 2}, &IO.inspect/1)
Notice how you use the same Enum.each/2
function, sending it different data structures: a list, range, and map. How does Enum.each/2
know how to walk each structure? It doesn’t. The code in Enum.each/2
is generic and relies on a contract. This contract, called a protocol, must be implemented for each data type you wish to use with Enum
functions. Next, let’s learn how to define and use protocols.
A protocol is a module in which you declare functions without implementing them. Consider it a rough equivalent of an OO interface. The generic logic relies on the protocol and calls its functions. Then, you can provide a concrete implementation of the protocol for different data types.
Let’s look at an example. The protocol String.Chars
is provided by the Elixir standard library and is used to convert data to a binary string. This is how the protocol is defined in the Elixir source:
defprotocol String.Chars do ❶ def to_string(term) ❷ end
❷ Declaration of protocol functions
This resembles the module definition, with the notable difference that functions are declared but not implemented.
Notice the first argument of the function (the term
). At runtime, the type of this argument determines the implementation that’s called. Let’s see this in action. Elixir already implements the protocol for atoms, numbers, and some other data types, so you can issue the following calls:
iex(1)> String.Chars.to_string(1) "1" iex(2)> String.Chars.to_string(:an_atom) "an_atom"
If the protocol isn’t implemented for the given data type, an error is raised:
iex(3)> String.Chars.to_string(TodoList.new()) ** (Protocol.UndefinedError) protocol String.Chars not implemented
Usually, you don’t need to call the protocol function directly. More often, there’s generic code that relies on the protocol. In the case of String.Chars
, this is the auto-imported function Kernel.to_string/1
:
iex(4)> to_string(1) "1" iex(5)> to_string(:an_atom) "an_atom" iex(6)> to_string(TodoList.new()) ** (Protocol.UndefinedError) protocol String.Chars not implemented
As you can see, the behavior of to_string/1
is exactly the same as that of String.Chars.to_string/1
. This is because Kernel.to_string/1
delegates to the String.Chars
implementation.
In addition, you can send anything that implements String.Chars
to IO.puts/1
:
iex(7)> IO.puts(1) 1 iex(8)> IO.puts(:an_atom) an_atom iex(9)> IO.puts(TodoList.new()) ** (Protocol.UndefinedError) protocol String.Chars not implemented
As you can see, an instance of the TodoList
isn’t printable because String.Chars
isn’t implemented for the corresponding type.
How do you implement a protocol for a specific type? Let’s refer to the Elixir source again. The following snippet implements String.Chars
for integers:
defimpl String.Chars, for: Integer do def to_string(term) do Integer.to_string(term) end end
You start the implementation by calling the defimpl
macro. Then, you specify which protocol to implement and the corresponding data type. Finally, the do/end
block contains the implementation of each protocol function. In the example, the implementation delegates to the existing standard library function Integer.to_string/1
.
The for: Type
part deserves some explanation. The type is an atom and can be any of following aliases: Tuple
, Atom
, List
, Map
, BitString
, Integer
, Float
, Function
, PID
, Port
, or Reference
. These values correspond to built-in Elixir types.
In addition, the alias Any
is allowed, which makes it possible to specify a fallback implementation. If a protocol isn’t defined for a given type, an error will be raised, unless a fallback to Any
is specified in the protocol definition and an Any
implementation exists. Refer to the protocol documentation (https://hexdocs.pm/elixir/Protocol.xhtml) for details.
Finally, and most importantly, the type can be any other arbitrary alias (but not a regular, simple atom):
defimpl String.Chars, for: SomeAlias do ... end
This implementation will be called if the first argument of the protocol function is a struct defined in the corresponding module. For example, you can implement String.Chars
for TodoList
as follows:
iex(1)> defimpl String.Chars, for: TodoList do def to_string(_) do "#TodoList" end end
Now, you can pass a to-do list instance to IO.puts/1
:
iex(2)> IO.puts(TodoList.new()) #TodoList
It’s important to notice that the protocol implementation doesn’t need to be part of any module. This has powerful consequences: you can implement a protocol for a type, even if you can’t modify the type’s source code. You can place the protocol implementation anywhere in your own code, and the runtime will be able to take advantage of it.
Elixir comes with some predefined protocols. It’s best to consult the online documentation for the complete reference (https://hexdocs.pm/elixir), but let’s mention some of the most important ones.
You’ve already seen String.Chars
, which specifies the contract for converting data into a binary string. There’s also the List.Chars
protocol, which converts input data to a character string (a list of characters).
If you want to control how your structure is printed in the debug output (via the inspect
function), you can implement the Inspect
protocol.
Arguably, the most important protocol is Enumerable
. By implementing it, you can make your data structure enumerable. This means you can use all the functions from the Enum
and Stream
modules for free! This is probably the best demonstration of protocol usefulness. Both Enum
and Stream
are generic modules that offer many useful functions, which can work on your custom data structures as soon as you implement the Enumerable
protocol.
Closely related to enumeration is the Collectable
protocol. Recall from chapter 3 that a collectable structure is one that you can repeatedly add elements to. A collectable can be used with comprehensions to collect results or with Enum.into/2
to transfer elements of one structure (enumerable) to another (collectable).
And, of course, you can define your own protocols and implement them for any available data structure (your own or someone else’s). See the Kernel.defprotocol/2
documentation for more information.
Let’s look at a more involved example. You’ll make your to-do list collectable so that you can use it as a comprehension target. This is a slightly more advanced example, so don’t worry if you don’t understand each detail in the first go.
To make the abstraction collectable, you must implement the corresponding protocol:
defimpl Collectable, for: TodoList do ❶ def into(original) do ❶ {original, &into_callback/2} ❶ end defp into_callback(todo_list, {:cont, entry}) do ❷ TodoList.add_entry(todo_list, entry) ❷ end ❷ ❷ defp into_callback(todo_list, :done), do: todo_list ❷ defp into_callback(_todo_list, :halt), do: :ok ❷ end
The exported function into/1
is called by the generic code (e.g., comprehensions). Here, you provide the implementation that returns the appender lambda. This appender lambda is then repeatedly invoked by the generic code to append each element to your data structure.
The appender function receives a to-do list and an instruction hint. If you receive {:cont, entry}
, you must add a new entry. If you receive :done
, you return the list, which, at this point, contains all appended elements. Finally, :halt
indicates that the operation has been canceled, and the return value is ignored.
Let’s see this in action. Copy and paste the previous code into the shell, and then try the following:
iex(1)> entries = [
%{date: ~D[2023-12-19], title: "Dentist"},
%{date: ~D[2023-12-20], title: "Shopping"},
%{date: ~D[2023-12-19], title: "Movies"}
]
iex(2)> Enum.into(entries, TodoList.new()) ❶
%TodoList{...}
By implementing the Collectable
protocol, you essentially adapt the TodoList
abstraction to any generic code that relies on that protocol, such as Enum.into/2
or for
comprehension.
A module is used to create an abstraction. A module’s functions create, manipulate, and query data. Clients can inspect the entire structure but shouldn’t rely on its shape.
Maps can be used to group different fields together in a single structure.
Structs are special kinds of maps that allow you to define data abstractions related to a module.
Polymorphism can be implemented with protocols. A protocol defines an interface that is used by the generic logic. You can then provide specific protocol implementations for a data type.