Tables in Lua are not a data structure; they are the data structure. We can represent all structures that other languages offer —arrays, records, lists, queues, sets— with tables in Lua. Moreover, Lua tables implement all these structures efficiently.
In more traditional languages, such as C and Pascal, we implement most data structures with arrays and lists (where lists = records + pointers). Although we can implement arrays and lists using Lua tables (and sometimes we do this), tables are more powerful than arrays and lists; many algorithms are simplified to the point of triviality with the use of tables. For instance, we seldom write a search in Lua, because tables offer direct access to any type.
It takes a while to learn how to use tables efficiently. Here, we will see how to implement typical data structures with tables and cover some examples of their uses. We will start with arrays and lists, not because we need them for the other structures, but because most programmers are already familiar with them. (We have already seen the basics of this material in Chapter 5, Tables, but I will repeat it here for completeness.) Then we will continue with more advanced examples, such as sets, bags, and graphs.
We implement arrays in Lua simply by indexing tables with integers. Therefore, arrays do not have a fixed size, but grow as needed. Usually, when we initialize the array we define its size indirectly. For instance, after the following code, any attempt to access a field outside the range 1–1000 will return nil, instead of zero:
local a = {} -- new array for i = 1, 1000 do a[i] = 0 end
The length operator (#
)
uses this fact to find the size of an array:
print(#a) --> 1000
We can start an array at index zero, one, or any other value:
-- create an array with indices from -5 to 5 a = {} for i = -5, 5 do a[i] = 0 end
However, it is customary in Lua to start arrays with index one. The Lua libraries adhere to this convention; so does the length operator. If our arrays do not start with one, we will not be able to use these facilities.
We can use a constructor to create and initialize arrays in a single expression:
squares = {1, 4, 9, 16, 25, 36, 49, 64, 81}
Such constructors can be as large as we need. In Lua, it is not uncommon data-description files with constructors with a few million elements.
There are two main ways to represent matrices in Lua.
The first one is with a jagged array (an array of arrays),
that is, a table wherein each element is another table.
For instance, we can create a matrix of zeros with
dimensions N
by M
with the following code:
local mt = {} -- create the matrix for i = 1, N do local row = {} -- create a new row mt[i] = row for j = 1, M do row[j] = 0 end end
Because tables are objects in Lua,
we have to create each row explicitly to build a matrix.
On the one hand, this is certainly more verbose than simply
declaring a matrix, as we do in C.
On the other hand, it gives us more flexibility.
For instance, we can create a triangular matrix
by changing the inner loop in the previous example to
for j=1,i do ... end
.
With this code, the triangular matrix uses only half the memory
of the original one.
The second way to represent a matrix is by
composing the two indices into a single one.
Typically,
we do this by multiplying the first index
by a suitable constant and then
adding the second index.
With this approach,
the following code would create
our matrix of zeros with dimensions N
by M
:
local mt = {} -- create the matrix for i = 1, N do local aux = (i - 1) * M for j = 1, M do mt[aux + j] = 0 end end
Quite often, applications use a sparse matrix, a matrix wherein most elements are zero or nil. For instance, we can represent a graph by its adjacency matrix, which has the value x in position (m,n) when there is a connection with cost x between nodes m and n. When these nodes are not connected, the value in position (m,n) is nil. To represent a graph with ten thousand nodes, where each node has about five neighbors, we will need a matrix with a hundred million entries (a square matrix with 10000 columns and 10000 rows), but approximately only fifty thousand of them will not be nil (five non-nil columns for each row, corresponding to the five neighbors of each node). Many books on data structures discuss at length how to implement such sparse matrices without wasting 800 MB of memory, but we seldom need these techniques when programming in Lua. Because we represent arrays with tables, they are naturally sparse. With our first representation (tables of tables), we will need ten thousand tables, each one with about five elements, with a grand total of fifty thousand entries. With the second representation, we will have a single table, with fifty thousand entries in it. Whatever the representation, we need space only for the non-nil elements.
We cannot use the length operator over sparse matrices,
because of the holes (nil values) between active entries.
This is not a big loss;
even if we could use it, we probably would not.
For most operations,
it would be quite inefficient to traverse all those empty entries.
Instead,
we can use pairs
to traverse only the non-nil elements.
As an example,
let us see how to do matrix multiplication
with sparse matrices represented by jagged arrays.
Suppose we want to multiply a matrix a[M,K]
by a matrix b[K,N]
,
producing the matrix c[M,N]
.
The usual matrix-multiplication algorithm goes like this:
for i = 1, M do for j = 1, N do c[i][j] = 0 for k = 1, K do c[i][j] = c[i][j] + a[i][k] * b[k][j] end end end
The two outer loops traverse the entire resulting matrix, and for each element, the inner loop computes its value.
For sparse matrices with jagged arrays,
this inner loop is a problem.
Because it traverses a column of b
, instead of a row,
we cannot use something like pairs
here:
the loop has to visit each row looking whether that row
has an element in that column.
Instead of visiting only a few non-zero elements,
the loop visits all zero elements, too.
(Traversing a column can be an issue in other contexts, too,
because of its loss of spatial locality.)
The following algorithm is quite similar to the previous one, but it reverses the order of the two inner loops. With this simple change, it avoids traversing columns:
-- assumes 'c' has zeros in all elements for i = 1, M do for k = 1, K do for j = 1, N do c[i][j] = c[i][j] + a[i][k] * b[k][j] end end end
Now, the middle loop traverses the row a[i]
,
and the inner loop traverses the row b[k]
.
Both can use pairs
,
visiting only the non-zero elements.
The initialization of the resulting matrix c
is not an issue here,
because an empty sparse matrix is naturally filled with zeros.
Figure 14.1. Multiplication of sparse matrices
function mult (a, b) local c = {} -- resulting matrix for i = 1, #a do local resultline = {} -- will be 'c[i]' for k, va in pairs(a[i]) do -- 'va' is a[i][k] for j, vb in pairs(b[k]) do -- 'vb' is b[k][j] local res = (resultline[j] or 0) + va * vb resultline[j] = (res ~= 0) and res or nil end end c[i] = resultline end return c end
Figure 14.1, “Multiplication of sparse matrices” shows the complete implementation
of the above algorithm,
using pairs
and taking care of sparse entries.
This implementation visits only the non-nil elements,
and the result is naturally sparse.
Moreover, the code deletes resulting entries that
by chance evaluate to zero.
Because tables are dynamic entities,
it is easy to implement linked lists in Lua.
We represent each node with a table (what else?);
links are simply table fields that contain references
to other tables.
For instance,
let us implement a singly-linked list,
where each node has two fields, value
and next
.
A simple variable is the list root:
list = nil
To insert an element at the beginning of the list,
with a value v
, we do this:
list = {next = list, value = v}
To traverse the list, we write this:
local l = list
while l do
visit l.value
l = l.next
end
We can also implement easily other kinds of lists, such as doubly-linked lists or circular lists. However, we seldom need those structures in Lua, because usually there is a simpler way to represent our data without using linked lists. For instance, we can represent a stack with an (unbounded) array.
A simple way to implement queues in Lua is with functions
insert
and remove
from the table
library.
As we saw in the section called “The Table Library”,
these functions insert and remove elements
in any position of an array,
moving other elements to accommodate the operation.
However, these moves can be expensive for large structures.
A more efficient implementation uses two indices,
one for the first element and another for the last.
With that representation,
we can insert or remove an element at both ends in constant time,
as shown in Figure 14.2, “A double-ended queue”.
Figure 14.2. A double-ended queue
function listNew () return {first = 0, last = -1} end function pushFirst (list, value) local first = list.first - 1 list.first = first list[first] = value end function pushLast (list, value) local last = list.last + 1 list.last = last list[last] = value end function popFirst (list) local first = list.first if first > list.last then error("list is empty") end local value = list[first] list[first] = nil -- to allow garbage collection list.first = first + 1 return value end function popLast (list) local last = list.last if list.first > last then error("list is empty") end local value = list[last] list[last] = nil -- to allow garbage collection list.last = last - 1 return value end
If we use this structure in a strict queue discipline,
calling only pushLast
and popFirst
,
both first
and last
will increase continually.
However, because we represent arrays in Lua with tables,
we can index them either from 1 to 20 or
from 16777201 to 16777220.
With 64-bit integers,
such a queue can run for thirty thousand years,
doing ten million insertions per second,
before it has problems with overflows.
As I said, before, we seldom do searches in Lua. Instead, we use what we call an index table or a reverse table.
Suppose we have a table with the names of the days of the week:
days = {"Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"}
Now we want to translate a name into its position in the week.
We can search the table, looking for the given name.
A more efficient approach, however,
is to build a reverse table,
say revDays
,
which has the names as indices and the numbers as values.
This table would look like this:
revDays = {["Sunday"] = 1, ["Monday"] = 2, ["Tuesday"] = 3, ["Wednesday"] = 4, ["Thursday"] = 5, ["Friday"] = 6, ["Saturday"] = 7}
Then, all we have to do to find the order of a name is to index this reverse table:
x = "Tuesday" print(revDays[x]) --> 3
Of course, we do not need to declare the reverse table manually. We can build it automatically from the original one:
revDays = {} for k,v in pairs(days) do revDays[v] = k end
The loop will do the assignment for each element of days
,
with the variable k
getting the keys (1, 2, ...) and
v
the values ("Sunday"
, "Monday"
, ...).
Suppose we want to list all identifiers used in a program source; for that, we need to filter the reserved words out of our listing. Some C programmers could be tempted to represent the set of reserved words as an array of strings and search this array to know whether a given word is in the set. To speed up the search, they could even use a binary tree to represent the set.
In Lua, an efficient and simple way to represent such sets is to put the set elements as indices in a table. Then, instead of searching the table for a given element, we just index the table and test whether the result is nil. In our example, we could write the following code:
reserved = {
["while"] = true, ["if"] = true,
["else"] = true, ["do"] = true,
}
for w in string.gmatch(s, "[%a_][%w_]*") do
if not reserved[w] then
do something with 'w'
-- 'w' is not a reserved word
end
end
(In the definition of reserved
,
we cannot write while = true
,
because while
is not a valid name in Lua.
Instead, we use the notation ["while"] = true
.)
We can have a clearer initialization using an auxiliary function to build the set:
function Set (list) local set = {} for _, l in ipairs(list) do set[l] = true end return set end reserved = Set{"while", "end", "function", "local", }
We can also use another set to collect the identifiers:
local ids = {} for w in string.gmatch(s, "[%a_][%w_]*") do if not reserved[w] then ids[w] = true end end -- print each identifier once for w in pairs(ids) do print(w) end
Bags, also called multisets, differ from regular sets in that each element can appear multiple times. An easy representation for bags in Lua is similar to the previous representation for sets, but with a counter associated with each key.[14] To insert an element, we increment its counter:
function insert (bag, element) bag[element] = (bag[element] or 0) + 1 end
To remove an element, we decrement its counter:
function remove (bag, element) local count = bag[element] bag[element] = (count and count > 1) and count - 1 or nil end
We only keep the counter if it already exists and it is still greater than zero.
Suppose we are building a string piecemeal, for instance reading a file line by line. Our typical code could look like this:
local buff = "" for line in io.lines() do buff = buff .. line .. "\n" end
Despite its innocent look, this code in Lua can cause a huge performance penalty for large files: for instance, it takes more than 30 seconds to read a 4.5 MB file on my new machine.
Why is that?
To understand what happens,
let us imagine that we are in the middle of the read loop;
each line has 20 bytes and we have already read some 2500 lines,
so buff
is a string with 50 kB.
When Lua concatenates buff..line.."\n"
,
it allocates a new string with 50020 bytes
and copies the 50000 bytes from buff
into this new string.
That is, for each new line,
Lua moves around 50 kB of memory, and growing.
The algorithm is quadratic.
After reading 100 new lines (only 2 kB),
Lua has already moved more than 5 MB of memory.
When Lua finishes reading 350 kB,
it has moved around more than 50 GB.
(This problem is not peculiar to Lua:
other languages wherein strings are immutable values
present a similar behavior,
Java being a famous example.)
Before we continue, we should remark that,
despite all I said, this situation is not a common problem.
For small strings, the above loop is fine.
To read an entire file,
Lua provides the io.read("a")
option,
which reads the file at once.
However, sometimes we must face this problem.
Java offers the StringBuffer
class to
ameliorate the problem.
In Lua, we can use a table as the string buffer.
The key to this approach is the function table.concat
,
which returns the concatenation of all the strings of a given list.
Using concat
, we can write our previous loop as follows:
local t = {} for line in io.lines() do t[#t + 1] = line .. "\n" end local s = table.concat(t)
This algorithm takes less than 0.05 seconds to read the same file
that took more than half a minute to read with the original code.
(Nevertheless,
for reading a whole file it is still better to use io.read
with the "a"
option.)
We can do even better.
The function concat
accepts an optional second argument,
which is a separator to be inserted between the strings.
Using this separator,
we do not need to insert a newline after each line:
local t = {} for line in io.lines() do t[#t + 1] = line end s = table.concat(t, "\n") .. "\n"
The function inserts the separator between the strings,
but we still have to add the last newline.
This last concatenation creates a new copy of the resulting string,
which can be quite long.
There is no option to make concat
insert this extra separator,
but we can deceive it,
inserting an extra empty string in t
:
t[#t + 1] = "" s = table.concat(t, "\n")
Now, the extra newline that concat
adds before this empty string
is at the end of the resulting string, as we wanted.
Like any decent language, Lua allows multiple implementations for graphs, each one better adapted to some particular algorithms. Here we will see a simple object-oriented implementation, where we represent nodes as objects (actually tables, of course) and arcs as references between nodes.
We will represent each node as a table with two fields:
name
, with the node’s name;
and adj
, with the set of nodes adjacent to this one.
Because we will read the graph from a text file,
we need a way to find a node given its name.
So, we will use an extra table mapping names to nodes.
Given a name, function name2node
returns the
corresponding node:
local function name2node (graph, name) local node = graph[name] if not node then -- node does not exist; create a new one node = {name = name, adj = {}} graph[name] = node end return node end
Figure 14.3, “Reading a graph from a file” shows the function that builds a graph.
Figure 14.3. Reading a graph from a file
function readgraph () local graph = {} for line in io.lines() do -- split line in two names local namefrom, nameto = string.match(line, "(%S+)%s+(%S+)") -- find corresponding nodes local from = name2node(graph, namefrom) local to = name2node(graph, nameto) -- adds 'to' to the adjacent set of 'from' from.adj[to] = true end return graph end
It reads a file where each line has two node names,
meaning that there is an arc from the first node to the second.
For each line,
the function uses string.match
to split the line in two names,
finds the nodes corresponding to these names
(creating the nodes if needed),
and connects the nodes.
Figure 14.4, “Finding a path between two nodes” illustrates an algorithm using such graphs.
Figure 14.4. Finding a path between two nodes
function findpath (curr, to, path, visited) path = path or {} visited = visited or {} if visited[curr] then -- node already visited? return nil -- no path here end visited[curr] = true -- mark node as visited path[#path + 1] = curr -- add it to path if curr == to then -- final node? return path end -- try all adjacent nodes for node in pairs(curr.adj) do local p = findpath(node, to, path, visited) if p then return p end end table.remove(path) -- remove node from path end
The function findpath
searches for a path between two
nodes using a depth-first traversal.
Its first parameter is the current node;
the second is its goal;
the third parameter keeps the path from the origin
to the current node;
the last parameter is a set with all the nodes
already visited, to avoid loops.
Note how the algorithm manipulates nodes directly,
without using their names.
For instance, visited
is a set of nodes,
not of node names.
Similarly, path
is a list of nodes.
To test this code, we add a function to print a path and some code to put it all to work:
function printpath (path) for i = 1, #path do print(path[i].name) end end g = readgraph() a = name2node(g, "a") b = name2node(g, "b") p = findpath(a, b) if p then printpath(p) end
Exercise 14.1: Write a function to add two sparse matrices.
Exercise 14.2: Modify the queue implementation in Figure 14.2, “A double-ended queue” so that both indices return to zero when the queue is empty.
Exercise 14.3: Modify the graph structure so that it can keep a label for each arc. The structure should represent each arc by an object, too, with two fields: its label and the node it points to. Instead of an adjacent set, each node keeps an incident set that contains the arcs that originate at that node.
Adapt the function readgraph
to read two node names
plus a label from each line in the input file.
(Assume that the label is a number.)
Exercise 14.4: Assume the graph representation of the previous exercise, where the label of each arc represents the distance between its end nodes. Write a function to find the shortest path between two given nodes, using Dijkstra’s algorithm.
[14] We already used this representation for the most-frequent-words program in Chapter 11, Interlude: Most Frequent Words.
Personal copy of Eric Taylor <jdslkgjf.iapgjflksfg@yandex.com>