5 Tables

Tables are the main (in fact, the only) data structuring mechanism in Lua, and a powerful one. We use tables to represent arrays, sets, records, and many other data structures in a simple, uniform, and efficient way. Lua uses tables to represent packages and objects as well. When we write math.sin, we think about “the function sin from the math library”. For Lua, this expression means “index the table math using the string "sin" as the key”.

A table in Lua is essentially an associative array. A table is an array that accepts not only numbers as indices, but also strings or any other value of the language (except nil).

Tables in Lua are neither values nor variables; they are objects. If you are familiar with arrays in Java or Scheme, then you have a fair idea of what I mean. You may think of a table as a dynamically-allocated object; programs manipulate only references (or pointers) to them. Lua never does hidden copies or creation of new tables behind the scenes.

We create tables by means of a constructor expression, which in its simplest form is written as {}:

      > a = {}           -- create a table and assign its reference
      > k = "x"
      > a[k] = 10        -- new entry, with key="x" and value=10
      > a[20] = "great"  -- new entry, with key=20 and value="great"
      > a["x"]                  --> 10
      > k = 20
      > a[k]                    --> "great"
      > a["x"] = a["x"] + 1     -- increments entry "x"
      > a["x"]                  --> 11

A table is always anonymous. There is no fixed relationship between a variable that holds a table and the table itself:

      > a = {}
      > a["x"] = 10
      > b = a            -- 'b' refers to the same table as 'a'
      > b["x"]           --> 10
      > b["x"] = 20
      > a["x"]           --> 20
      > a = nil          -- only 'b' still refers to the table
      > b = nil          -- no references left to the table

When a program has no more references to a table, the garbage collector will eventually delete the table and reuse its memory.

Table Indices

Each table can store values with different types of indices, and it grows as needed to accommodate new entries:

      > a = {}     -- empty table
      > -- create 1000 new entries
      > for i = 1, 1000 do a[i] = i*2 end
      > a[9]               --> 18
      > a["x"] = 10
      > a["x"]             --> 10
      > a["y"]             --> nil

Note the last line: like global variables, table fields evaluate to nil when not initialized. Also like global variables, we can assign nil to a table field to delete it. This is not a coincidence: Lua stores global variables in ordinary tables. (We will discuss this subject further in Chapter 22, The Environment.)

To represent structures, we use the field name as an index. Lua supports this representation by providing a.name as syntactic sugar for a["name"]. Therefore, we could write the last lines of the previous example in a cleaner manner as follows:

      > a = {}                      -- empty table
      > a.x = 10                    -- same as a["x"] = 10
      > a.x       --> 10            -- same as a["x"]
      > a.y       --> nil           -- same as a["y"]

For Lua, the two forms are equivalent and can be intermixed freely. For a human reader, however, each form may signal a different intention. The dot notation clearly shows that we are using the table as a structure, where we have some set of fixed, predefined keys. The string notation gives the idea that the table can have any string as a key, and that for some reason we are manipulating that specific key.

A common mistake for beginners is to confuse a.x with a[x]. The first form represents a["x"], that is, a table indexed by the string "x". The second form is a table indexed by the value of the variable x. See the difference:

      > a = {}
      > x = "y"
      > a[x] = 10                 -- put 10 in field "y"
      > a[x]          --> 10      -- value of field "y"
      > a.x           --> nil     -- value of field "x" (undefined)
      > a.y           --> 10      -- value of field "y"

Because we can index a table with any type, when indexing a table we have the same subtleties that arise in equality. Although we can index a table both with the number 0 and with the string "0", these two values are different and therefore denote different entries in a table. Similarly, the strings "+1", "01", and "1" all denote different entries. When in doubt about the actual types of your indices, use an explicit conversion to be sure:

      > i = 10; j = "10"; k = "+10"
      > a = {}
      > a[i] = "number key"
      > a[j] = "string key"
      > a[k] = "another string key"
      > a[i]               --> number key
      > a[j]               --> string key
      > a[k]               --> another string key
      > a[tonumber(j)]     --> number key
      > a[tonumber(k)]     --> number key

You can introduce subtle bugs in your program if you do not pay attention to this point.

Integers and floats do not have the above problem. In the same way that 2 compares equal to 2.0, both values refer to the same table entry, when used as keys:

      > a = {}
      > a[2.0] = 10
      > a[2.1] = 20
      > a[2]             --> 10
      > a[2.1]           --> 20

More specifically, when used as a key, any float value that can be converted to an integer is converted. For instance, when Lua executes a[2.0] = 10, it converts the key 2.0 to 2. Float values that cannot be converted to integers remain unaltered.

Table Constructors

Constructors are expressions that create and initialize tables. They are a distinctive feature of Lua and one of its most useful and versatile mechanisms.

The simplest constructor is the empty constructor, {}, as we have seen. Constructors also initialize lists. For instance, the following statement will initialize days[1] with the string "Sunday" (the first element of the constructor has index 1, not 0), days[2] with "Monday", and so on:

      days = {"Sunday", "Monday", "Tuesday", "Wednesday",
              "Thursday", "Friday", "Saturday"}
      
      print(days[4])  --> Wednesday

Lua also offers a special syntax to initialize a record-like table, as in the next example:

      a = {x = 10, y = 20}

This previous line is equivalent to these commands:

      a = {}; a.x = 10; a.y = 20

The original expression, however, is faster, because Lua creates the table already with the right size.

No matter what constructor we use to create a table, we can always add fields to and remove fields from the result:

      w = {x = 0, y = 0, label = "console"}
      x = {math.sin(0), math.sin(1), math.sin(2)}
      w[1] = "another field"   -- add key 1 to table 'w'
      x.f = w                  -- add key "f" to table 'x'
      print(w["x"])            --> 0
      print(w[1])              --> another field
      print(x.f[1])            --> another field
      w.x = nil                -- remove field "x"

However, as I just mentioned, creating a table with a proper constructor is more efficient, besides being cleaner.

We can mix record-style and list-style initializations in the same constructor:

      polyline = {color="blue",
                  thickness=2,
                  npoints=4,
                  {x=0,   y=0},    -- polyline[1]
                  {x=-10, y=0},    -- polyline[2]
                  {x=-10, y=1},    -- polyline[3]
                  {x=0,   y=1}     -- polyline[4]
                 }

The above example also illustrates how we can nest tables (and constructors) to represent more complex data structures. Each of the elements polyline[i] is a table representing a record:

      print(polyline[2].x)    --> -10
      print(polyline[4].y)    --> 1

Those two constructor forms have their limitations. For instance, we cannot initialize fields with negative indices, nor with string indices that are not proper identifiers. For such needs, there is another, more general, format. In this format, we explicitly write each index as an expression, between square brackets:

      opnames = {["+"] = "add", ["-"] = "sub",
                 ["*"] = "mul", ["/"] = "div"}
      
      i = 20; s = "-"
      a = {[i+0] = s, [i+1] = s..s, [i+2] = s..s..s}
      
      print(opnames[s])    --> sub
      print(a[22])         --> ---

This syntax is more cumbersome, but more flexible too: both the list-style and the record-style forms are special cases of this more general syntax, as we show in the following equivalences:

      {x = 0, y = 0}    <-->    {["x"] = 0, ["y"] = 0}
      {"r", "g", "b"}   <-->    {[1] = "r", [2] = "g", [3] = "b"}

We can always put a comma after the last entry. These trailing commas are optional, but are always valid:

      a = {[1] = "red", [2] = "green", [3] = "blue",}

This flexibility frees programs that generate Lua constructors from the need to handle the last element as a special case.

Finally, we can always use a semicolon instead of a comma in a constructor. This facility is a leftover from older Lua versions and I guess it is seldom used nowadays.

Arrays, Lists, and Sequences

To represent a conventional array or a list, we simply use a table with integer keys. There is neither a way nor a need to declare a size; we just initialize the elements we need:

      -- read 10 lines, storing them in a table
      a = {}
      for i = 1, 10 do
        a[i] = io.read()
      end

Given that we can index a table with any value, we can start the indices of an array with any number that pleases us. However, it is customary in Lua to start arrays with one (and not with zero, as in C) and many facilities in Lua stick to this convention.

Usually, when we manipulate a list we must know its length. It can be a constant or it can be stored somewhere. Often we store the length of a list in a non-numeric field of the table; for historical reasons, several programs use the field "n" for this purpose. Often, however, the length is implicit. Remember that any non-initialized index results in nil; we can use this value as a sentinel to mark the end of the list. For instance, after we read 10 lines into a list, it is easy to know that its length is 10, because its numeric keys are 1, 2, ..., 10. This technique only works when the list does not have holes, which are nil elements inside it. We call such a list without holes a sequence.

For sequences, Lua offers the length operator (#). As we have seen, on strings it gives the number of bytes in the string. On tables, it gives the length of the sequence represented by the table. For instance, we could print the lines read in the last example with the following code:

      -- print the lines, from 1 to #a
      for i = 1, #a do
        print(a[i])
      end

The length operator also provides a useful idiom for manipulating sequences:

      a[#a + 1] = v        -- appends 'v' to the end of the sequence

The length operator is unreliable for lists with holes (nils). It only works for sequences, which we defined as lists without holes. More precisely, a sequence is a table where the positive numeric keys comprise a set {1,...,n} for some n. (Remember that any key with value nil is actually not in the table.) In particular, a table with no numeric keys is a sequence with length zero.

The behavior of the length operator for lists with holes is one of the most contentious features of Lua. Over the years, there have been many proposals either to raise an error when we apply the length operator to a list with holes, or to extend its meaning to those lists. However, these proposals are easier said than done. The problem is that, because a list is actually a table, the concept of “length” is somewhat fuzzy. For instance, consider the list resulting from the following code:

      a = {}
      a[1] = 1
      a[2] = nil    -- does nothing, as a[2] is already nil
      a[3] = 1
      a[4] = 1

It is easy to say that the length of this list is four, and that is has a hole at index 2. However, what can we say about the next similar example?

      a = {}
      a[1] = 1
      a[10000] = 1

Should we consider a as a list with 10000 elements, with 9998 holes? Now, the program does this:

      a[10000] = nil

What is the list length now? Should it be 9999, because the program deleted the last element? Or maybe still 10000, as the program only changed the last element to nil? Or should the length collapse to one?

Another common proposal is to make the # operator return the total number of elements in the table. This semantics is clear and well defined, but not very useful or intuitive. Consider all the examples we are discussing here and think how useful would be such operator for them.

Yet more troubling are nils at the end of the list. What should be the length of the following list?

      a = {10, 20, 30, nil, nil}

Remember that, for Lua, a field with nil is indistinct from an absent field. Therefore, the previous table is equal to {10, 20, 30}; its length is 3, not 5.

You may consider that a nil at the end of a list is a very special case. However, many lists are built by adding elements one by one. Any list with holes that was built that way must have had nils at its end along the way.

Despite all these discussions, most lists we use in our programs are sequences (e.g., a file line cannot be nil) and, therefore, most of the time the use of the length operator is safe. If you really need to handle lists with holes, you should store the length explicitly somewhere.

Table Traversal

We can traverse all key–value pairs in a table with the pairs iterator:

      t = {10, print, x = 12, k = "hi"}
      for k, v in pairs(t) do
        print(k, v)
      end
        --> 1    10
        --> k    hi
        --> 2    function: 0x420610
        --> x    12

Due to the way that Lua implements tables, the order that elements appear in a traversal is undefined. The same program can produce different orders each time it runs. The only certainty is that each element will appear once during the traversal.

For lists, we can use the ipairs iterator:

      t = {10, print, 12, "hi"}
      for k, v in ipairs(t) do
        print(k, v)
      end
        --> 1    10
        --> 2    function: 0x420610
        --> 3    12
        --> 4    hi

In this case, Lua trivially ensures the order.

Another way to traverse a sequence is with a numerical for:

      t = {10, print, 12, "hi"}
      for k = 1, #t do
        print(k, t[k])
      end
        --> 1    10
        --> 2    function: 0x420610
        --> 3    12
        --> 4    hi

Safe Navigation

Suppose the following situation: we want to know whether a given function from a given library is present. If we know for sure that the library itself exists, we can write something like if lib.foo then .... Otherwise, we have to write something like if lib and lib.foo then ....

When the level of nested tables gets deeper, this notation becomes problematic, as the next example illustrates:

      zip = company and company.director and
              company.director.address and
                company.director.address.zipcode

This notation is not only cumbersome, but inefficient, too. It performs six table accesses in a successful access, instead of three.

Some programming languages, such as C#, offer a safe navigation operator (written as ?. in C#) for this task. When we write a ?. b and a is nil, the result is also nil, instead of an error. Using that operator, we could write our previous example like this:

      zip = company?.director?.address?.zipcode

If any component in the path were nil, the safe operator would propagate that nil until the final result.

Lua does not offer a safe navigation operator, and we do not think it should. Lua is minimalistic. Moreover, this operator is quite controversial, with many people arguing —not without some reason— that it promotes careless programming. However, we can emulate it in Lua with a bit of extra notation.

If we execute a or {} when a is nil, the result is the empty table. So, if we execute (a or {}).b when a is nil, the result will be also nil. Using this idea, we can rewrite our original expression like this:

      zip = (((company or {}).director or {}).address or {}).zipcode

Still better, we can make it a little shorter and slightly more efficient:

      E = {}     -- can be reused in other similar expressions
      ...
      zip = (((company or E).director or E).address or E).zipcode

Granted, this syntax is more complex than the one with the safe navigation operator. Nevertheless, we write each field name only once, it performs the minimum required number of table accesses (three, in this example), and it requires no new operators in the language. In my personal opinion, it is a good enough substitute.

The Table Library

The table library offers several useful functions to operate over lists and sequences.^[7]

The function table.insert inserts an element in a given position of a sequence, moving up other elements to open space. For instance, if t is the list {10, 20, 30}, after the call table.insert(t, 1, 15) it will become {15, 10, 20, 30}. As a special and frequent case, if we call insert without a position, it inserts the element in the last position of the sequence, moving no elements. As an example, the following code reads the input stream line by line, storing all lines in a sequence:

      t = {}
      for line in io.lines() do
        table.insert(t, line)
      end
      print(#t)         --> (number of lines read)

The function table.remove removes and returns an element from the given position in a sequence, moving subsequent elements down to fill the gap. When called without a position, it removes the last element of the sequence.

With these two functions, it is straightforward to implement stacks, queues, and double queues. We can initialize such structures as t = {}. A push operation is equivalent to table.insert(t, x); a pop operation is equivalent to table.remove(t). The call table.insert(t, 1, x) inserts at the other end of the structure (its beginning, actually), and table.remove(t, 1) removes from this end. The last two operations are not particularly efficient, as they must move elements up and down. However, because the table library implements these functions in C, these loops are not too expensive, so that this implementation is good enough for small arrays (up to a few hundred elements, say).

Lua 5.3 has introduced a more general function for moving elements in a table. The call table.move(a, f, e, t) moves the elements in table a from index f until e (both inclusive) to position t. For instance, to insert an element in the beginning of a list a, we can do the following:

      table.move(a, 1, #a, 2)
      a[1] = newElement

The next code removes the first element:

      table.move(a, 2, #a, 1)
      a[#a] = nil

Note that, as is common in computing, a move actually copies values from one place to another. In this last example, we must explicitly erase the last element after the move.

We can call table.move with an extra optional parameter, a table. In that case, the function moves the elements from the first table into the second one. For instance, the call table.move(a, 1, #a, 1, {}) returns a clone of list a (by copying all its elements into a new list), while table.move(a, 1, #a, #b + 1, b) appends all elements from list a to the end of list b.

Exercises

Exercise 5.1: What will the following script print? Explain.

      sunday = "monday"; monday = "sunday"
      t = {sunday = "monday", [sunday] = monday}
      print(t.sunday, t[sunday], t[t.sunday])

Exercise 5.2: Assume the following code:

      a = {};  a.a = a

What would be the value of a.a.a.a? Is any a in that sequence somehow different from the others?

Now, add the next line to the previous code:

      a.a.a.a = 3

What would be the value of a.a.a.a now?

Exercise 5.3: Suppose that you want to create a table that maps each escape sequence for strings (the section called “Literal strings”) to its meaning. How could you write a constructor for that table?

Exercise 5.4: We can represent a polynomial a_nxⁿ + a_n-1x^n-1 + ... + a₁x¹ + a₀ in Lua as a list of its coefficients, such as {a₀, a₁, ..., a_n}.

Write a function that takes a polynomial (represented as a table) and a value for x and returns the polynomial value.

Exercise 5.5: Can you write the function from the previous item so that it uses at most n additions and n multiplications (and no exponentiations)?

Exercise 5.6: Write a function to test whether a given table is a valid sequence.

Exercise 5.7: Write a function that inserts all elements of a given list into a given position of another given list.

Exercise 5.8: The table library offers a function table.concat, which receives a list of strings and returns their concatenation:

      print(table.concat({"hello", " ", "world"}))    --> hello world

Write your own version for this function.

Compare the performance of your implementation against the built-in version for large lists, with hundreds of thousands of entries. (You can use a for loop to create those large lists.)

^[7] You can think of it as “The Sequence Library” or “The List Library”; we have kept the original name for compatibility with old versions.