32 Managing Resources

In our implementation of Boolean arrays in the previous chapter, we did not need to worry about managing resources. Those arrays need only memory. Each userdata representing an array has its own memory, which is managed by Lua. When an array becomes garbage (that is, inaccessible by the program), Lua eventually collects it and frees its memory.

Life is not always that easy. Sometimes, an object needs other resources besides raw memory, such as file descriptors, window handles, and the like. (Often these resources are just memory too, but managed by some other part of the system.) In such cases, when the object becomes garbage and is collected, somehow these other resources must be released too.

As we saw in the section called “Finalizers”, Lua provides finalizers in the form of the __gc metamethod. To illustrate the use of this metamethod in C and of the API as a whole, in this chapter we will develop two Lua bindings for external facilities. The first example is another implementation for a function to traverse a directory. The second (and more substantial) example is a binding to Expat, an open source XML parser.

In the section called “C Functions”, we implemented a function dir to traverse directories that returned a table with all files from a given directory. Our new implementation will return an iterator that returns a new entry each time it is called. With this new implementation, we will be able to traverse a directory with a loop like this:

      for fname in dir.open(".") do
        print(fname)
      end

To iterate over a directory, in C, we need a DIR structure. Instances of DIR are created by opendir and must be explicitly released with a call to closedir. Our previous implementation kept its DIR instance as a local variable and closed this instance after retrieving the last file name. Our new implementation cannot keep this DIR instance in a local variable, because it must query this value over several calls. Moreover, it cannot close the directory only after retrieving the last name; if the program breaks the loop, the iterator will never retrieve this last name. Therefore, to make sure that the DIR instance is always released, we will store its address in a userdata and use the __gc metamethod of this userdata to release the directory structure.

Despite its central role in our implementation, this userdata representing a directory does not need to be visible to Lua. The function dir.open returns an iterator function, and this function is what Lua sees. The directory can be an upvalue of the iterator function. As such, the iterator function has direct access to this structure, but Lua code does not (and does not need to).

In all, we need three C functions. First, we need the function dir.open, a factory function that Lua calls to create iterators; it must open a DIR structure and create a closure of the iterator function with this structure as an upvalue. Second, we need the iterator function. Third, we need the __gc metamethod, which closes a DIR structure. As usual, we also need an extra function to make initial arrangements, such as to create and initialize a metatable for directories.

Let us start our code with the function dir.open, shown in Figure 32.1, “The dir.open factory function”.

A subtle point in this function is that it must create the userdata before opening the directory. If it first opens the directory, and then the call to lua_newuserdata raises a memory error, the function loses and leaks the DIR structure. With the correct order, the DIR structure, once created, is immediately associated with the userdata; whatever happens after that, the __gc metamethod will eventually release the structure.

Another subtle point is the consistency of the userdata. Once we set its metatable, the __gc metamethod will be called no matter what. So, before setting the metatable, we pre-initialize the userdata with NULL to ensure that it has some well-defined value.
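The ordering described above can be illustrated in plain C, without the Lua API. The following is only a sketch under stated assumptions: the hypothetical `dir_box` structure stands in for the userdata, `dir_box_finalize` plays the role of the __gc metamethod, and `dir_box_open` mimics the order of operations (allocate the holder, pre-initialize it, then open the directory). None of these names come from the library itself.

```c
#include <dirent.h>
#include <stdlib.h>

/* Hypothetical stand-in for the userdata: a block holding a DIR address */
typedef struct dir_box {
  DIR *dir;   /* NULL means "no directory open" */
} dir_box;

/* Plays the role of the __gc metamethod: safe to call on a box whose
   directory was never opened or has already been closed */
void dir_box_finalize (dir_box *box) {
  if (box->dir != NULL) {
    closedir(box->dir);
    box->dir = NULL;
  }
}

/* Allocates the holder first, so that a failed allocation cannot
   leak an already opened directory */
dir_box *dir_box_open (const char *path) {
  dir_box *box = malloc(sizeof(dir_box));
  if (box == NULL) return NULL;   /* nothing opened yet, nothing leaks */

  /* pre-initialize: from here on a finalizer may run at any time
     (in the binding, this is where the metatable would be set) */
  box->dir = NULL;

  /* only now open the directory; on failure the field stays NULL */
  box->dir = opendir(path);
  return box;
}
```

With this order, there is no moment at which an open DIR structure exists without an owner that knows how to release it.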

The next function is dir_iter (in Figure 32.2, “Other functions for the dir library”), the iterator itself.

Its code is straightforward. It gets the DIR structure’s address from its upvalue and calls readdir to read the next entry.
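The core of such an iterator step can be sketched in plain C, leaving the Lua stack handling aside. The names `next_entry_name` and `count_entries` are hypothetical helpers introduced only for this illustration; the sketch assumes a POSIX system.

```c
#include <dirent.h>
#include <stddef.h>

/* One iteration step: returns the name of the next entry, or NULL
   when the traversal is over (or when the directory is not open) */
const char *next_entry_name (DIR *d) {
  struct dirent *entry;
  if (d == NULL) return NULL;
  entry = readdir(d);
  return (entry != NULL) ? entry->d_name : NULL;
}

/* Drives the step function until it signals the end;
   returns the number of entries, or -1 if the directory cannot be opened */
int count_entries (const char *path) {
  DIR *d = opendir(path);
  int n = 0;
  if (d == NULL) return -1;
  while (next_entry_name(d) != NULL)
    n++;
  closedir(d);
  return n;
}
```

In the binding, a non-NULL name would be pushed as the single result of the iterator, and a NULL would translate into returning no values, which ends the generic for loop.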

The function dir_gc (also in Figure 32.2, “Other functions for the dir library”) is the __gc metamethod. This metamethod closes a directory. As we mentioned before, it must take one precaution: in case of errors in the initialization, the directory can be NULL.

The last function in Figure 32.2, “Other functions for the dir library”, luaopen_dir, is the function that opens this one-function library.

This complete example has an interesting subtlety. At first, it may seem that dir_gc should check whether its argument is a directory and whether it has not been closed already. Otherwise, a malicious user could call it with another kind of userdata (a file, for instance) or finalize a directory twice, with disastrous consequences. However, there is no way for a Lua program to access this function: it is stored only in the metatable of directories, which in turn are stored as upvalues of the iteration functions. Lua programs cannot access these directories.

Now we will look at a simplified implementation of a Lua binding for Expat, which we will call lxp. Expat is an open source XML 1.0 parser written in C. It implements SAX, the Simple API for XML. SAX is an event-based API. This means that a SAX parser reads an XML document and, as it goes, reports to the application what it finds, through callbacks. For instance, if we instruct Expat to parse a string like "<tag cap="5">hi</tag>", it will generate three events: a start-element event, when it reads the substring "<tag cap="5">"; a text event (also called a character data event), when it reads "hi"; and an end-element event, when it reads "</tag>". Each of these events calls an appropriate callback handler in the application.

Here we will not cover the entire Expat library. We will concentrate only on those parts that illustrate new techniques for interacting with Lua. Although Expat handles more than a dozen different events, we will consider only the three events that we saw in the previous example (start elements, end elements, and text).[33]

The part of the Expat API that we need for this example is small. First, we need the functions to create and destroy an Expat parser:

      XML_Parser XML_ParserCreate (const char *encoding);
      void XML_ParserFree (XML_Parser p);

The encoding argument is optional; we will use NULL in our binding.

After we have a parser, we must register its callback handlers:

      void XML_SetElementHandler(XML_Parser p,
                                 XML_StartElementHandler start,
                                 XML_EndElementHandler end);
      
      void XML_SetCharacterDataHandler(XML_Parser p,
                                       XML_CharacterDataHandler hndl);

The first function registers handlers for start and end elements. The second function registers handlers for text (character data, in XML parlance).

All callback handlers receive the user data as their first parameter. The start-element handler also receives the tag name and its attributes:

      typedef void (*XML_StartElementHandler)(void *uData,
                                              const char *name,
                                              const char **atts);

The attributes come as a NULL-terminated array of strings, where each pair of consecutive strings holds an attribute name and its value. The end-element handler has only one extra parameter, the tag name:

      typedef void (*XML_EndElementHandler)(void *uData,
                                            const char *name);

Finally, a text handler receives only the text as an extra parameter. This text string is not null-terminated; instead, it has an explicit length:

      typedef void (*XML_CharacterDataHandler)(void *uData,
                                               const char *s,
                                               int len);
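Because the text is not null-terminated, a handler must never treat it as an ordinary C string. As a small illustration, the hypothetical helper `copy_chardata` below makes a null-terminated copy using only the explicit length; it is a sketch, not part of Expat.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a freshly allocated, null-terminated copy of the first 'len'
   bytes of 's', or NULL on failure; the caller must free the result */
char *copy_chardata (const char *s, int len) {
  char *copy;
  if (len < 0) return NULL;
  copy = malloc((size_t)len + 1);
  if (copy == NULL) return NULL;
  memcpy(copy, s, (size_t)len);   /* copy exactly 'len' bytes */
  copy[len] = '\0';               /* add the terminator ourselves */
  return copy;
}
```

A Lua binding usually does not need such a copy, because lua_pushlstring accepts a pointer plus an explicit length.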

To feed text to Expat, we use the following function:

      int XML_Parse (XML_Parser p, const char *s, int len, int isLast);

Expat receives the document to be parsed in pieces, through successive calls to the function XML_Parse. The last argument to XML_Parse, the Boolean isLast, informs Expat whether that piece is the last one of a document. This function returns zero if it detects a parse error. (Expat also provides functions to retrieve error information, but we will ignore them here, for the sake of simplicity.)

The last function we need from Expat allows us to set the user data that will be passed to the handlers:

      void XML_SetUserData (XML_Parser p, void *uData);

Now let us have a look at how we can use this library in Lua. The most direct approach is simply to export all those functions to Lua. A better approach is to adapt the functionality to Lua. For instance, because Lua is dynamically typed, we do not need different functions to set each kind of callback. Better yet, we can avoid the callback-registering functions altogether. Instead, when we create a parser, we give it a callback table that contains all callback handlers, each with an appropriate key related to its corresponding event. For instance, if we want to print a layout of a document, we could use the following callback table:

      local count = 0
      
      callbacks = {
        StartElement = function (parser, tagname)
          io.write("+ ", string.rep("  ", count), tagname, "\n")
          count = count + 1
        end,
      
        EndElement = function (parser, tagname)
          count = count - 1
          io.write("- ", string.rep("  ", count), tagname, "\n")
        end,
      }

Fed with the input "<to> <yes/> </to>", these handlers would print this output:

      + to
      +   yes
      -   yes
      - to

With this API, we do not need functions to manipulate callbacks. We manipulate them directly in the callback table. Thus, the whole API needs only three functions: one to create parsers, one to parse a piece of text, and one to close a parser. Actually, we will implement the last two functions as methods of parser objects. A typical use of the API could be like this:

      local lxp = require "lxp"
      
      p = lxp.new(callbacks)       -- create new parser
      
      for l in io.lines() do       -- iterate over input lines
        assert(p:parse(l))         -- parse the line
        assert(p:parse("\n"))      -- add newline
      end
      
      assert(p:parse())            -- finish document
      p:close()                    -- close parser

Now let us turn our attention to the implementation. The first decision is how to represent a parser in Lua. It is quite natural to use a userdata containing a C structure, but what do we need to put in it? We need at least the actual Expat parser and the callback table. We must also store a Lua state, because these parser objects are all that an Expat callback receives, and the callbacks need to call Lua. We can store the Expat parser and the Lua state (which are C values) directly in a C structure. For the callback table, which is a Lua value, one option is to create a reference to it in the registry and store that reference. (We will explore this option in Exercise 32.2.) Another option is to use a user value. Each userdata can have one single Lua value directly associated with it; this value is called a user value.[34] With this option, the definition for a parser object is as follows:

      #include <stdlib.h>
      #include "expat.h"
      #include "lua.h"
      #include "lauxlib.h"
      
      typedef struct lxp_userdata {
        XML_Parser parser;          /* associated expat parser */
        lua_State *L;
      } lxp_userdata;

The next step is the function that creates parser objects, lxp_make_parser. Figure 32.3, “Function to create XML parser objects” shows its code.

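This function has four main steps. First, it follows a common pattern: it creates the userdata, pre-initializes it with consistent values, and sets its metatable. Second, it creates the Expat parser, stores it in the userdata, and checks for errors. Third, it sets the callback table as the user value of the new userdata. Finally, it initializes the Expat parser: it registers the userdata as the user data to be passed to the handlers and sets the callback handlers.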

The next step is the parse method lxp_parse (Figure 32.4, “Function to parse an XML fragment”), which parses a piece of XML data.

It gets two arguments: the parser object (the self of the method) and an optional piece of XML data. When called without any data, it informs Expat that the document has no more parts.

When lxp_parse calls XML_Parse, the latter function will call the handlers for each relevant element that it finds in the given piece of document. These handlers will need to access the callback table, so lxp_parse puts this table at stack index 3 (right after the parameters). There is one more detail in the call to XML_Parse: remember that the last argument to this function tells Expat whether the given piece of text is the last one. When we call parse without an argument, s will be NULL, so this last argument will be true.

Now let us turn our attention to the functions f_CharData, f_StartElement, and f_EndElement, which handle the callbacks. All three functions have a similar structure: each checks whether the callback table defines a Lua handler for its specific event and, if so, prepares the arguments and then calls this Lua handler.
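The "check, then call" structure shared by the handlers can be sketched in plain C by analogy. In this sketch, the hypothetical `callbacks` structure stands in for the Lua callback table (a NULL field means that no handler was given), and the `fire_*` functions and the two depth-tracking handlers are illustrative names only; the real handlers look up fields in a Lua table instead.

```c
#include <stddef.h>

/* Hypothetical C counterpart of the callback table:
   a NULL field means "no handler for this event" */
typedef struct callbacks {
  void (*start_element)(void *ctx, const char *name);
  void (*end_element)(void *ctx, const char *name);
  void (*char_data)(void *ctx, const char *s, int len);
} callbacks;

/* Each dispatcher checks whether a handler exists and, if so, calls it.
   Returns 1 if a handler ran, 0 if the event was silently ignored. */
int fire_start_element (const callbacks *cb, void *ctx, const char *name) {
  if (cb->start_element == NULL) return 0;
  cb->start_element(ctx, name);
  return 1;
}

int fire_end_element (const callbacks *cb, void *ctx, const char *name) {
  if (cb->end_element == NULL) return 0;
  cb->end_element(ctx, name);
  return 1;
}

int fire_char_data (const callbacks *cb, void *ctx, const char *s, int len) {
  if (cb->char_data == NULL) return 0;
  cb->char_data(ctx, s, len);
  return 1;
}

/* Sample handlers that track the nesting depth through 'ctx' */
void depth_up (void *ctx, const char *name) {
  (void)name;
  ++*(int *)ctx;
}

void depth_down (void *ctx, const char *name) {
  (void)name;
  --*(int *)ctx;
}
```

As with the Lua callback table, an application supplies only the handlers it cares about; events without a handler cost one check and nothing else.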

Let us see first the f_CharData handler, in Figure 32.5, “Handler for character data”.

Its code is quite simple. The handler receives a lxp_userdata structure as its first argument, due to our call to XML_SetUserData when we created the parser. After retrieving the Lua state, the handler can access the callback table at stack index 3, as set by lxp_parse, and the parser itself at stack index 1. Then it calls its corresponding handler in Lua (when present), with two arguments: the parser and the character data (a string).

The f_EndElement handler is quite similar to f_CharData; see Figure 32.6, “Handler for end elements”.

It also calls its corresponding Lua handler with two arguments: the parser and the tag name (again a string, but now null-terminated).

Figure 32.7, “Handler for start elements” shows the last handler, f_StartElement.

It calls the Lua handler with three arguments: the parser, the tag name, and a list of attributes. This handler is a little more complex than the others, because it needs to translate the tag’s list of attributes into Lua. It uses a quite natural translation, building a table that maps attribute names to their values. For instance, a start tag like

      <to method="post" priority="high">

generates the following table of attributes:

      {method = "post", priority = "high"}
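The translation has to walk the attribute array two strings at a time, stopping at the NULL terminator. The plain-C sketch below shows only that walk; `count_attrs` and `find_attr` are hypothetical helpers introduced for illustration. In the binding, each step of the loop would instead store one name/value pair in the Lua table.

```c
#include <stddef.h>
#include <string.h>

/* Number of name/value pairs in a NULL-terminated attribute array */
size_t count_attrs (const char **atts) {
  size_t n = 0;
  for (; *atts != NULL; atts += 2)   /* advance one pair per step */
    n++;
  return n;
}

/* Value of the attribute called 'name', or NULL if it is absent */
const char *find_attr (const char **atts, const char *name) {
  for (; *atts != NULL; atts += 2) {
    if (strcmp(atts[0], name) == 0)  /* atts[0] is the name... */
      return atts[1];                /* ...and atts[1] its value */
  }
  return NULL;
}
```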

The last method for parsers is close, in Figure 32.8, “Method to close an XML parser”.

When we close a parser, we have to free its resources, namely the Expat structure. Remember that, due to occasional errors during its creation, a parser may not have this resource. Notice how we keep the parser in a consistent state as we close it, so there is no problem if we try to close it again or when the garbage collector finalizes it. Actually, we will use exactly this function as the finalizer. This ensures that every parser eventually frees its resources, even if the programmer does not close it.

Figure 32.9, “Initialization code for the lxp library” is the final step: it shows luaopen_lxp, which opens the library, putting all previous parts together.

We use here the same scheme that we used in the object-oriented Boolean-array example from the section called “Object-Oriented Access”: we create a metatable, make its __index field point to itself, and put all the methods inside it. For that, we need a list with the parser methods (lxp_meths). We also need a list with the functions of this library (lxp_funcs). As is common with object-oriented libraries, this list has a single function, which creates new parsers.

Exercise 32.1: Modify the function dir_iter in the directory example so that it closes the DIR structure as soon as it reaches the end of the traversal. With this change, the program does not need to wait for a garbage collection to release a resource that it knows it will not need anymore.

(When you close the directory, you should set the address stored in the userdata to NULL, to signal to the finalizer that the directory is already closed. Also, dir_iter will have to check whether the directory is closed before using it.)

Exercise 32.2: In the lxp example, we used user values to associate the callback table with the userdata that represents a parser. This choice created a small problem, because what the C callbacks receive is the lxp_userdata structure, and that structure does not offer direct access to the table. We solved this problem by storing the callback table at a fixed stack index during the parse of each fragment.

An alternative design would be to associate the callback table with the userdata through references (the section called “The registry”): we create a reference to the callback table and store the reference (an integer) in the lxp_userdata structure. Implement this alternative. Do not forget to release the reference when closing the parser.



[34] In Lua 5.2, this user value must be a table.
