31 User-Defined Types in C

In the previous chapter, we saw how to extend Lua with new functions written in C. Now, we will see how to extend Lua with new types written in C. We will start with a small example; through the chapter, we will extend it with metamethods and other goodies.

Our running example in this chapter will be a quite simple type: Boolean arrays. The main motivation for this example is that it does not involve complex algorithms, so we can concentrate on API issues. Nevertheless, the example is useful by itself. Of course, we can use tables to implement arrays of Booleans in Lua. But a C implementation, where we store each entry in one single bit, uses less than 3% of the memory used by a table.

Our implementation will need the following definitions:

      #include <limits.h>
      
      #define BITS_PER_WORD (CHAR_BIT * sizeof(unsigned int))
      #define I_WORD(i)     ((unsigned int)(i) / BITS_PER_WORD)
      #define I_BIT(i)      (1 << ((unsigned int)(i) % BITS_PER_WORD))

BITS_PER_WORD is the number of bits in an unsigned integer. The macro I_WORD computes the word that stores the bit corresponding to a given index, and I_BIT computes a mask to access the correct bit inside this word.

We will represent our arrays with the following structure:

      typedef struct BitArray {
        int size;
        unsigned int values[1];  /* variable part */
      } BitArray;

We declare the array values with size 1 only as a placeholder, because C 89 does not allow an array with size 0; we will set the actual size when we allocate the array. The next expression computes the total size for an array with n elements:

      sizeof(BitArray) + I_WORD(n - 1) * sizeof(unsigned int)

(We subtract one from n because the original structure already includes space for one element.)

In this first version, we will use explicit calls to set and get values, as in the next example:

      a = array.new(1000)
      for i = 1, 1000 do
        array.set(a, i, i % 2 == 0)     -- a[i] = (i % 2 == 0)
      end
      print(array.get(a, 10))           --> true
      print(array.get(a, 11))           --> false
      print(array.size(a))              --> 1000

Later we will see how to support both an object-oriented style, like a:get(i), and a conventional syntax, like a[i]. For all versions, the underlying functions are the same, defined in Figure 31.1, “Manipulating a Boolean array”.

Let us see them, bit by bit.

Our first concern is how to represent a C structure in Lua. Lua provides a basic type specifically for this task, called userdata. A userdata offers a raw memory area, with no predefined operations in Lua, which we can use to store anything.

The function lua_newuserdata allocates a block of memory with a given size, pushes the corresponding userdata on the stack, and returns the block address:

      void *lua_newuserdata (lua_State *L, size_t size);

If for some reason we need to allocate memory by other means, it is very easy to create a userdata with the size of a pointer and to store there a pointer to the real memory block. We will see examples of this technique in Chapter 32, Managing Resources.

Our first function in Figure 31.1, “Manipulating a Boolean array”, newarray, uses lua_newuserdata to create new arrays. Its code is straightforward. It checks its sole parameter (the array size, in bits), computes the array size in bytes, creates a userdata with the appropriate size, initializes its fields, and returns the userdata to Lua.

The next function is setarray, which receives three arguments: the array, the index, and the new value. It assumes that indices start at one, as usual in Lua. Because Lua accepts any value for a Boolean, we use luaL_checkany for the third parameter: it ensures only that there is a value (any value) for this parameter. If we call setarray with bad arguments, we get explanatory error messages, as in the following examples:

      array.set(0, 11, 0)
        --> stdin:1: bad argument #1 to 'set' ('array' expected)
      array.set(a, 1)
        --> stdin:1: bad argument #3 to 'set' (value expected)

The last function in Figure 31.1, “Manipulating a Boolean array” is getarray, the function to retrieve an entry. It is similar to setarray.

We will also define a function to retrieve the size of an array and some extra code to initialize our library; see Figure 31.2, “Extra code for the Boolean array library”.

Again, we use luaL_newlib, from the auxiliary library. It creates a table and fills it with the pairs name–function specified by the array arraylib.

Our current implementation has a major vulnerability. Suppose the user writes something like array.set(io.stdin, 1, false). The value of io.stdin is a userdata with a pointer to a stream (FILE *). Because it is a userdata, array.set will gladly accept it as a valid argument; the probable result will be a memory corruption (with luck we will get an index-out-of-range error instead). Such behavior is unacceptable for any Lua library. No matter how we use a library, it should neither corrupt C data nor cause the Lua system to crash.

The usual method to distinguish one type of userdata from another is to create a unique metatable for that type. Every time we create a userdata, we mark it with the corresponding metatable; every time we get a userdata, we check whether it has the right metatable. Because Lua code cannot change the metatable of a userdata, it cannot deceive these checks.

We also need a place to store this new metatable, so that we can access it to create new userdata and to check whether a given userdata has the correct type. As we saw earlier, there are two options for storing the metatable: in the registry or as an upvalue for the functions in the library. It is customary, in Lua, to register any new C type into the registry, using a type name as the index and the metatable as the value. As with any other registry index, we must choose a type name with care, to avoid clashes. Our example will use the name "LuaBook.array" for its new type.

As usual, the auxiliary library offers some functions to help us here. The new auxiliary functions we will use are these:

      int   luaL_newmetatable (lua_State *L, const char *tname);
      void  luaL_getmetatable (lua_State *L, const char *tname);
      void *luaL_checkudata   (lua_State *L, int index,
                                             const char *tname);

The function luaL_newmetatable creates a new table (to be used as a metatable), leaves the new table on the top of the stack, and maps the table to the given name in the registry. The function luaL_getmetatable retrieves the metatable associated with tname from the registry. Finally, luaL_checkudata checks whether the object at the given stack position is a userdata with a metatable that matches the given name. It raises an error if the object is not a userdata or if it does not have the correct metatable; otherwise, it returns the userdata address.

Now we can start our modifications. The first step is to change the function that opens the library so that it creates the metatable for arrays:

      int luaopen_array (lua_State *L) {
        luaL_newmetatable(L, "LuaBook.array");
        luaL_newlib(L, arraylib);
        return 1;
      }

The next step is to change newarray so that it sets this metatable in all arrays that it creates:

      static int newarray (lua_State *L) {
      
        as before
      
        luaL_getmetatable(L, "LuaBook.array");
        lua_setmetatable(L, -2);
      
        return 1;  /* new userdata is already on the stack */
      }

The function lua_setmetatable pops a table from the stack and sets it as the metatable of the object at the given index. In our case, this object is the new userdata.

Finally, setarray, getarray, and getsize have to check whether they have got a valid array as their first argument. To simplify their tasks, we define the following macro:

      #define checkarray(L) \
                (BitArray *)luaL_checkudata(L, 1, "LuaBook.array")

Using this macro, the new definition for getsize is straightforward:

      static int getsize (lua_State *L) {
        BitArray *a = checkarray(L);
        lua_pushinteger(L, a->size);
        return 1;
      }

Because setarray and getarray also share code to read and check the index as their second argument, we factor out their common parts in a new auxiliary function (getparams).

With this new function, setarray and getarray are straightforward, see Figure 31.3, “New versions for setarray/getarray. Now, if we call them with an invalid userdata, we will get a proper error message:

      a = array.get(io.stdin, 10)
      --> bad argument #1 to 'get' (LuaBook.array expected, got FILE*)

Our next step is to transform our new type into an object, so that we can operate on its instances using the usual object-oriented syntax, like this:

      a = array.new(1000)
      print(a:size())     --> 1000
      a:set(10, true)
      print(a:get(10))    --> true

Remember that a:size() is equivalent to a.size(a). Therefore, we have to arrange for the expression a.size to return our function getsize. The key mechanism here is the __index metamethod. For tables, Lua calls this metamethod whenever it cannot find a value for a given key. For userdata, Lua calls it in every access, because userdata have no keys at all.

Assume that we run the following code:

      do
        local metaarray = getmetatable(array.new(1))
        metaarray.__index = metaarray
        metaarray.set = array.set
        metaarray.get = array.get
        metaarray.size = array.size
      end

In the first line, we create an array only to get its metatable, which we assign to metaarray. (We cannot set the metatable of a userdata from Lua, but we can get it.) Then we set metaarray.__index to metaarray. When we evaluate a.size, Lua cannot find the key "size" in the object a, because the object is a userdata. Therefore, Lua tries to get this value from the field __index of the metatable of a, which happens to be metaarray itself. But metaarray.size is array.size, so a.size(a) results in array.size(a), as we wanted.

Of course, we can write the same thing in C. We can do even better: now that arrays are objects, with their own operations, we do not need to have these operations in the table array anymore. The only function that our library still has to export is new, to create new arrays. All other operations come only as methods. The C code can register them directly as such.

The operations getsize, getarray, and setarray do not change from our previous approach. What will change is how we register them. That is, we have to change the code that opens the library. First, we need two separate function lists: one for regular functions and one for methods.

      static const struct luaL_Reg arraylib_f [] = {
        {"new", newarray},
        {NULL, NULL}
      };
      
      static const struct luaL_Reg arraylib_m [] = {
        {"set", setarray},
        {"get", getarray},
        {"size", getsize},
        {NULL, NULL}
      };

The new version of the open function luaopen_array has to create the metatable, assign it to its own __index field, register all the methods there, and create and fill the array table:

      int luaopen_array (lua_State *L) {
        luaL_newmetatable(L, "LuaBook.array");  /* create metatable */
        lua_pushvalue(L, -1);  /* duplicate the metatable */
        lua_setfield(L, -2, "__index");  /* mt.__index = mt */
        luaL_setfuncs(L, arraylib_m, 0);  /* register metamethods */
        luaL_newlib(L, arraylib_f);  /* create lib table */
        return 1;
      }

Here we use luaL_setfuncs again, to set the functions from the list arraylib_m into the metatable, which is on the top of the stack. Then we call luaL_newlib to create a new table and register the functions from the list arraylib_f there.

As a final touch, we will add a __tostring method to our new type, so that print(a) prints "array" plus the size of the array inside parentheses. The function itself is here:

      int array2string (lua_State *L) {
        BitArray *a = checkarray(L);
        lua_pushfstring(L, "array(%d)", a->size);
        return 1;
      }

The call to lua_pushfstring formats the string and leaves it on the top of the stack. We also have to add array2string to the list arraylib_m, to include it in the metatable of array objects:

      static const struct luaL_Reg arraylib_m [] = {
        {"__tostring", array2string},
        other methods
      };

A better alternative to the object-oriented notation is to use a regular array notation to access our arrays. Instead of writing a:get(i), we could simply write a[i]. For our example, this is easy to do, because our functions setarray and getarray already receive their arguments in the order that they are given to the corresponding metamethods. A quick solution is to define these metamethods directly in Lua:

      local metaarray = getmetatable(array.new(1))
      metaarray.__index = array.get
      metaarray.__newindex = array.set
      metaarray.__len = array.size

(We must run this code on the original implementation for arrays, without the modifications for object-oriented access.) That is all we need to use the standard syntax:

      a = array.new(1000)
      a[10] = true         -- 'setarray'
      print(a[10])         -- 'getarray'   --> true
      print(#a)            -- 'getsize'    --> 1000

If we prefer, we can register these metamethods in our C code. For this, we again modify our initialization function; see Figure 31.4, “New initialization code for the Bit Array library”.

In this new version, again we have only one public function, new. All other functions are available only as metamethods for specific operations.

The kind of userdata that we have been using until now is called full userdata. Lua offers another kind of userdata, called light userdata.

A light userdata is a value that represents a C pointer, that is, a void * value. A light userdata is a value, not an object; we do not create them (in the same way that we do not create numbers). To put a light userdata onto the stack, we call lua_pushlightuserdata:

      void lua_pushlightuserdata (lua_State *L, void *p);

Despite their common name, light userdata and full userdata are quite different things. Light userdata are not buffers, but bare pointers. They have no metatables. Like numbers, light userdata are not managed by the garbage collector.

Sometimes, people use light userdata as a cheap alternative to full userdata. This is not a typical use, however. First, light userdata do not have metatables, so there is no way to know their types. Second, despite the name, full userdata are inexpensive, too. They add little overhead compared to a malloc for the given memory size.

The real use of light userdata comes from equality. As a full userdata is an object, it is only equal to itself. A light userdata, on the other hand, represents a C pointer value. As such, it is equal to any userdata that represents the same pointer. Therefore, we can use light userdata to find C objects inside Lua.

We have already seen a typical use of light userdata, as keys in the registry (the section called “The registry”). There, the equality of light userdata was fundamental. Every time we push the same address with lua_pushlightuserdata, we get the same Lua value and, therefore, the same entry in the registry.

Another typical scenario in Lua is to have Lua objects acting as proxies to corresponding C objects. For instance, the I/O library uses Lua userdata to represent C streams inside Lua. When the action goes from Lua to C, the mapping from the Lua object to the C object is easy. Again using the example of the I/O library, each Lua stream keeps a pointer to its corresponding C stream. However, when the action goes from C to Lua, the mapping can be tricky. As an example, suppose we have some kind of callback in our I/O system (e.g., to tell that there is data to be read). The callback receives the C stream where is should operate. From that, how can we find its corresponding Lua object? Because the C stream is defined by the C standard library, not by us, we cannot store anything there.

Light userdata provide a nice solution for this mapping. We keep a table where the indices are light userdata with the stream addresses, and the values are the full userdata that represent the streams in Lua. In a callback, once we have a stream address, we use it —as a light userdata— as an index into that table to retrieve its corresponding Lua object. (The table should probably have weak values; otherwise, those full userdata would never be collected.)

Exercise 31.1: Modify the implementation of setarray so that it accepts only Boolean values.

Exercise 31.2: We can see a Boolean array as a set of integers (the indices with true values in the array). Add to the implementation of Boolean arrays functions to compute the union and intersection of two arrays. These functions should receive two arrays and return a new one, without modifying its parameters.

Exercise 31.3: Extend the previous exercise so that we can use addition to get the union of two arrays and multiplication for the intersection.

Exercise 31.4: Modify the implementation of the __tostring metamethod so that it shows the full contents of the array in an appropriate way. Use the buffer facility (the section called “String Manipulation”) to create the resulting string.

Exercise 31.5: Based on the example for Boolean arrays, implement a small C library for integer arrays.

Personal copy of Eric Taylor <jdslkgjf.iapgjflksfg@yandex.com>