In our implementation of Boolean arrays in the previous chapter, we did not need to worry about managing resources. Those arrays need only memory. Each userdata representing an array has its own memory, which is managed by Lua. When an array becomes garbage (that is, inaccessible by the program), Lua eventually collects it and frees its memory.
Life is not always that easy. Sometimes, an object needs other resources besides raw memory, such as file descriptors, window handles, and the like. (Often these resources are just memory too, but managed by some other part of the system.) In such cases, when the object becomes garbage and is collected, somehow these other resources must be released too.
As we saw in the section called “Finalizers”, Lua provides finalizers in the form of the __gc metamethod. To illustrate the use of this metamethod in C and of the API as a whole, in this chapter we will develop two Lua bindings for external facilities. The first example is another implementation for a function to traverse a directory. The second (and more substantial) example is a binding to Expat, an open source XML parser.
In the section called “C Functions”, we implemented a function dir to traverse directories that returned a table with all files from a given directory. Our new implementation will return an iterator that returns a new entry each time it is called. With this new implementation, we will be able to traverse a directory with a loop like this:
for fname in dir.open(".") do print(fname) end
To iterate over a directory, in C, we need a DIR structure. Instances of DIR are created by opendir and must be explicitly released with a call to closedir. Our previous implementation kept its DIR instance as a local variable and closed this instance after retrieving the last file name. Our new implementation cannot keep this DIR instance in a local variable, because it must query this value over several calls. Moreover, it cannot close the directory only after retrieving the last name; if the program breaks the loop, the iterator will never retrieve this last name. Therefore, to make sure that the DIR instance is always released, we will store its address in a userdata and use the __gc metamethod of this userdata to release the directory structure.
Despite its central role in our implementation, this userdata representing a directory does not need to be visible to Lua. The function dir.open returns an iterator function, and this function is what Lua sees. The directory can be an upvalue of the iterator function. As such, the iterator function has direct access to this structure, but Lua code does not (and does not need to).
In all, we need three C functions. First, we need the function dir.open, a factory function that Lua calls to create iterators; it must open a DIR structure and create a closure of the iterator function with this structure as an upvalue. Second, we need the iterator function. Third, we need the __gc metamethod, which closes a DIR structure. As usual, we also need an extra function to make initial arrangements, such as to create and initialize a metatable for directories.
Let us start our code with the function dir.open, shown in Figure 32.1, “The dir.open factory function”.
Figure 32.1. The dir.open factory function
#include <dirent.h>
#include <errno.h>
#include <string.h>

#include "lua.h"
#include "lauxlib.h"

/* forward declaration for the iterator function */
static int dir_iter (lua_State *L);

static int l_dir (lua_State *L) {
  const char *path = luaL_checkstring(L, 1);

  /* create a userdata to store a DIR address */
  DIR **d = (DIR **)lua_newuserdata(L, sizeof(DIR *));

  /* pre-initialize it */
  *d = NULL;

  /* set its metatable */
  luaL_getmetatable(L, "LuaBook.dir");
  lua_setmetatable(L, -2);

  /* try to open the given directory */
  *d = opendir(path);
  if (*d == NULL)  /* error opening the directory? */
    luaL_error(L, "cannot open %s: %s", path, strerror(errno));

  /* creates and returns the iterator function;
     its sole upvalue, the directory userdata,
     is already on the top of the stack */
  lua_pushcclosure(L, dir_iter, 1);
  return 1;
}
A subtle point in this function is that it must create the userdata before opening the directory. If it first opens the directory, and then the call to lua_newuserdata raises a memory error, the function loses and leaks the DIR structure. With the correct order, the DIR structure, once created, is immediately associated with the userdata; whatever happens after that, the __gc metamethod will eventually release the structure.
Another subtle point is the consistency of the userdata. Once we set its metatable, the __gc metamethod will be called no matter what. So, before setting the metatable, we pre-initialize the userdata with NULL to ensure that it has some well-defined value.
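The ordering argument does not depend on Lua, so it can be sketched in plain C. In the sketch below, the names dir_box, dir_box_new, and dir_box_free are hypothetical; malloc stands in for lua_newuserdata and dir_box_free stands in for the __gc metamethod. It is only an analogue of the pattern, not the binding itself.

```c
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>

/* Hypothetical stand-in for the userdata block that Lua would allocate. */
typedef struct dir_box {
  DIR *d;  /* NULL means "never opened" */
} dir_box;

/* Finalizer analogue: tolerates a box whose directory was never opened. */
static void dir_box_free (dir_box *box) {
  if (box == NULL) return;
  if (box->d != NULL)
    closedir(box->d);
  free(box);
}

/* Allocate first, pre-initialize, and only then acquire the resource.
   If malloc fails, no directory has been opened yet, so nothing leaks.
   If opendir fails, the box still holds a well-defined NULL. */
static dir_box *dir_box_new (const char *path) {
  dir_box *box;
  assert(path != NULL);
  box = malloc(sizeof *box);
  if (box == NULL) return NULL;  /* nothing else to undo */
  box->d = NULL;                 /* consistent state before anything can fail */
  box->d = opendir(path);        /* stays NULL on error */
  return box;
}
```

With the reverse order (opendir first, allocation second), a failed allocation would leave an open DIR that no object owns.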
The next function is dir_iter (in Figure 32.2, “Other functions for the dir library”), the iterator itself.
Figure 32.2. Other functions for the dir library
static int dir_iter (lua_State *L) {
  DIR *d = *(DIR **)lua_touserdata(L, lua_upvalueindex(1));
  struct dirent *entry = readdir(d);
  if (entry != NULL) {
    lua_pushstring(L, entry->d_name);
    return 1;
  }
  else return 0;  /* no more values to return */
}

static int dir_gc (lua_State *L) {
  DIR *d = *(DIR **)lua_touserdata(L, 1);
  if (d) closedir(d);
  return 0;
}

static const struct luaL_Reg dirlib [] = {
  {"open", l_dir},
  {NULL, NULL}
};

int luaopen_dir (lua_State *L) {
  luaL_newmetatable(L, "LuaBook.dir");

  /* set its __gc field */
  lua_pushcfunction(L, dir_gc);
  lua_setfield(L, -2, "__gc");

  /* create the library */
  luaL_newlib(L, dirlib);
  return 1;
}
Its code is straightforward. It gets the DIR structure’s address from its upvalue and calls readdir to read the next entry.
The function dir_gc (also in Figure 32.2, “Other functions for the dir library”) is the __gc metamethod. This metamethod closes a directory. As we mentioned before, it must take one precaution: in case of errors in the initialization, the directory can be NULL.
The last function in Figure 32.2, “Other functions for the dir library”, luaopen_dir, is the function that opens this one-function library.
This complete example has an interesting subtlety. At first, it may seem that dir_gc should check whether its argument is a directory and whether it has not been closed already. Otherwise, a malicious user could call it with another kind of userdata (a file, for instance) or finalize a directory twice, with disastrous consequences. However, there is no way for a Lua program to access this function: it is stored only in the metatable of directories, which in turn are stored as upvalues of the iteration functions. Lua programs cannot access these directories.
Now we will look at a simplified implementation of a Lua binding for Expat, which we will call lxp.
Expat is an open source XML 1.0 parser written in C. It implements SAX, the Simple API for XML. SAX is an event-based API. This means that a SAX parser reads an XML document and, as it goes, reports to the application what it finds, through callbacks. For instance, if we instruct Expat to parse a string like "<tag cap="5">hi</tag>", it will generate three events: a start-element event, when it reads the substring "<tag cap="5">"; a text event (also called a character data event), when it reads "hi"; and an end-element event, when it reads "</tag>". Each of these events calls an appropriate callback handler in the application.
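To make the event-based style concrete, here is a minimal sketch of a toy scanner in plain C. It is not Expat and does not use Expat's API: the names toy_handlers, toy_scan, and the rec_* recorders are hypothetical, and the scanner deliberately ignores comments, entities, self-closing tags, and a '>' inside attribute values. It only shows how a reader can report start, text, and end events through callbacks as it goes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback set; the names are illustrative, not Expat's. */
typedef struct toy_handlers {
  void (*start) (void *ud, const char *tag, size_t len);
  void (*text)  (void *ud, const char *s, size_t len);
  void (*end)   (void *ud, const char *tag, size_t len);
} toy_handlers;

/* Scans 'doc' and reports each piece through a callback as soon as it is
   recognized. Returns the number of events, or -1 on an unterminated tag. */
static int toy_scan (const char *doc, const toy_handlers *h, void *ud) {
  int events = 0;
  const char *p = doc;
  assert(doc != NULL && h != NULL);
  while (*p != '\0') {
    if (*p == '<') {
      const char *close = strchr(p, '>');
      if (close == NULL) return -1;
      if (p[1] == '/') {  /* end tag: the name runs up to '>' */
        if (h->end) h->end(ud, p + 2, (size_t)(close - (p + 2)));
      }
      else {  /* start tag: the name runs up to the first space or '>' */
        size_t n = strcspn(p + 1, " \t\n>");
        if (h->start) h->start(ud, p + 1, n);
      }
      events++;
      p = close + 1;
    }
    else {  /* character data runs up to the next '<' or the end of input */
      size_t n = strcspn(p, "<");
      if (h->text) h->text(ud, p, n);
      events++;
      p += n;
    }
  }
  return events;
}

/* Example handlers: each appends one letter to the buffer passed as 'ud',
   so the buffer ends up recording the order of the events. */
static void rec_start (void *ud, const char *tag, size_t len) {
  (void)tag; (void)len;
  strcat((char *)ud, "S");
}

static void rec_text (void *ud, const char *s, size_t len) {
  (void)s; (void)len;
  strcat((char *)ud, "T");
}

static void rec_end (void *ud, const char *tag, size_t len) {
  (void)tag; (void)len;
  strcat((char *)ud, "E");
}
```

Note that the scanner passes names and text as pointer-plus-length pairs into the original document, so handlers must not assume null termination.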
Here we will not cover the entire Expat library. We will concentrate only on those parts that illustrate new techniques for interacting with Lua. Although Expat handles more than a dozen different events, we will consider only the three events that we saw in the previous example (start elements, end elements, and text).[33]
The part of the Expat API that we need for this example is small. First, we need the functions to create and destroy an Expat parser:
XML_Parser XML_ParserCreate (const char *encoding);
void XML_ParserFree (XML_Parser p);
The encoding argument is optional; we will use NULL in our binding.
After we have a parser, we must register its callback handlers:
void XML_SetElementHandler(XML_Parser p,
                           XML_StartElementHandler start,
                           XML_EndElementHandler end);

void XML_SetCharacterDataHandler(XML_Parser p,
                                 XML_CharacterDataHandler hndl);
The first function registers handlers for start and end elements. The second function registers handlers for text (character data, in XML parlance).
All callback handlers take a user data as their first parameter. The start-element handler receives also the tag name and its attributes:
typedef void (*XML_StartElementHandler)(void *uData, const char *name, const char **atts);
The attributes come as a NULL-terminated array of strings, where each pair of consecutive strings holds an attribute name and its value. The end-element handler has only one extra parameter, the tag name:
typedef void (*XML_EndElementHandler)(void *uData, const char *name);
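The attribute layout described above (a NULL-terminated array in which names and values alternate) can be walked two entries at a time. The following is a minimal sketch in plain C; the helper names count_attrs and find_attr are hypothetical and are not part of Expat.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts the attributes in an array laid out as
   {name0, value0, name1, value1, ..., NULL}. */
static size_t count_attrs (const char **atts) {
  size_t n = 0;
  assert(atts != NULL);
  for (; *atts != NULL; atts += 2)  /* step over one name/value pair */
    n++;
  return n;
}

/* Returns the value of attribute 'name', or NULL if it is absent. */
static const char *find_attr (const char **atts, const char *name) {
  assert(atts != NULL && name != NULL);
  for (; *atts != NULL; atts += 2) {
    if (strcmp(atts[0], name) == 0)
      return atts[1];  /* the value follows its name */
  }
  return NULL;
}
```

Only the name slots are tested against NULL, which relies on the array always holding complete pairs.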
Finally, a text handler receives only the text as an extra parameter. This text string is not null-terminated; instead, it has an explicit length:
typedef void (*XML_CharacterDataHandler)(void *uData, const char *s, int len);
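Because the text arrives as a pointer plus a length, a handler that wants a C string must copy exactly len bytes and add the terminator itself. The sketch below assumes a hypothetical helper, copy_chardata, which is not part of Expat; it truncates when the destination buffer is too small.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies at most cap-1 bytes of the (not null-terminated) run s[0..len-1]
   into buf and terminates it. Returns the number of bytes copied. */
static size_t copy_chardata (char *buf, size_t cap, const char *s, int len) {
  size_t n;
  assert(buf != NULL && s != NULL && cap > 0 && len >= 0);
  n = (size_t)len;
  if (n > cap - 1) n = cap - 1;  /* leave room for the terminator */
  memcpy(buf, s, n);
  buf[n] = '\0';
  return n;
}
```

Calling strlen on the pointer received by the handler would instead read past the text into whatever follows it in the parser's buffer.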
To feed text to Expat, we use the following function:
int XML_Parse (XML_Parser p, const char *s, int len, int isLast);
Expat receives the document to be parsed in pieces, through successive calls to the function XML_Parse. The last argument to XML_Parse, the Boolean isLast, informs Expat whether that piece is the last one of a document. This function returns zero if it detects a parse error. (Expat also provides functions to retrieve error information, but we will ignore them here, for the sake of simplicity.)
The last function we need from Expat allows us to set the user data that will be passed to the handlers:
void XML_SetUserData (XML_Parser p, void *uData);
Now let us have a look at how we can use this library in Lua. A first approach is a direct approach: simply export all those functions to Lua. A better approach is to adapt the functionality to Lua. For instance, because Lua is dynamically typed, we do not need different functions to set each kind of callback. Better yet, we can avoid the callback registering functions altogether. Instead, when we create a parser, we give a callback table that contains all callback handlers, each with an appropriate key related to its corresponding event. For instance, if we want to print a layout of a document, we could use the following callback table:
local count = 0

callbacks = {
  StartElement = function (parser, tagname)
    io.write("+ ", string.rep(" ", count), tagname, "\n")
    count = count + 1
  end,

  EndElement = function (parser, tagname)
    count = count - 1
    io.write("- ", string.rep(" ", count), tagname, "\n")
  end,
}
Fed with the input "<to> <yes/> </to>", these handlers would print this output:
+ to
+  yes
-  yes
- to
With this API, we do not need functions to manipulate callbacks. We manipulate them directly in the callback table. Thus, the whole API needs only three functions: one to create parsers, one to parse a piece of text, and one to close a parser. Actually, we will implement the last two functions as methods of parser objects. A typical use of the API could be like this:
local lxp = require "lxp"

p = lxp.new(callbacks)        -- create new parser

for l in io.lines() do        -- iterate over input lines
  assert(p:parse(l))          -- parse the line
  assert(p:parse("\n"))       -- add newline
end

assert(p:parse())             -- finish document
p:close()                     -- close parser
Now let us turn our attention to the implementation. The first decision is how to represent a parser in Lua. It is quite natural to use a userdata containing a C structure, but what do we need to put in it? We need at least the actual Expat parser and the callback table. We must also store a Lua state, because these parser objects are all that an Expat callback receives, and the callbacks need to call Lua. We can store the Expat parser and the Lua state (which are C values) directly in a C structure. For the callback table, which is a Lua value, one option is to create a reference to it in the registry and store that reference. (We will explore this option in Exercise 32.2.) Another option is to use a user value. Each userdata can have one single Lua value directly associated with it; this value is called a user value.[34] With this option, the definition for a parser object is as follows:
#include <stdlib.h>
#include "expat.h"
#include "lua.h"
#include "lauxlib.h"

typedef struct lxp_userdata {
  XML_Parser parser;  /* associated expat parser */
  lua_State *L;
} lxp_userdata;
The next step is the function that creates parser objects, lxp_make_parser. Figure 32.3, “Function to create XML parser objects” shows its code.
Figure 32.3. Function to create XML parser objects
/* forward declarations for callback functions */
static void f_StartElement (void *ud,
                            const char *name,
                            const char **atts);
static void f_CharData (void *ud, const char *s, int len);
static void f_EndElement (void *ud, const char *name);

static int lxp_make_parser (lua_State *L) {
  XML_Parser p;

  /* (1) create a parser object */
  lxp_userdata *xpu = (lxp_userdata *)lua_newuserdata(L,
                                         sizeof(lxp_userdata));

  /* pre-initialize it, in case of error */
  xpu->parser = NULL;

  /* set its metatable */
  luaL_getmetatable(L, "Expat");
  lua_setmetatable(L, -2);

  /* (2) create the Expat parser */
  p = xpu->parser = XML_ParserCreate(NULL);
  if (!p)
    luaL_error(L, "XML_ParserCreate failed");

  /* (3) check and store the callback table */
  luaL_checktype(L, 1, LUA_TTABLE);
  lua_pushvalue(L, 1);  /* push table */
  lua_setuservalue(L, -2);  /* set it as the user value */

  /* (4) configure Expat parser */
  XML_SetUserData(p, xpu);
  XML_SetElementHandler(p, f_StartElement, f_EndElement);
  XML_SetCharacterDataHandler(p, f_CharData);
  return 1;
}
This function has four main steps:
Its first step follows a common pattern: it first creates a userdata; then it pre-initializes the userdata with consistent values; and finally it sets its metatable. (The pre-initialization ensures that if there is any error during the initialization, the finalizer will find the userdata in a consistent state.)
In step 2, the function creates an Expat parser, stores it in the userdata, and checks for errors.
Step 3 ensures that the first argument to the function is actually a table (the callback table), and sets it as the user value for the new userdata.
The last step initializes the Expat parser. It sets the userdata as the object to be passed to the callback functions and it sets the callback functions. Notice that these callback functions are the same for all parsers; after all, it is impossible to dynamically create new functions in C. Instead, those fixed C functions will use the callback table to decide which Lua functions they should call each time.
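The idea in the last step, one fixed C function that consults a per-object table at call time, can be sketched in plain C without Lua. Everything here is a hypothetical analogue: callback_entry plays the role of the Lua callback table, toy_parser plays the role of the userdata, and dispatch plays the role of the fixed handlers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*event_fn) (const char *arg);

/* Hypothetical stand-in for the callback table: event name -> handler. */
typedef struct callback_entry {
  const char *event;
  event_fn fn;
} callback_entry;

/* Hypothetical stand-in for the parser object passed as user data. */
typedef struct toy_parser {
  const callback_entry *callbacks;  /* ends with an entry whose event is NULL */
} toy_parser;

/* The single, fixed C function shared by every parser. It looks the event
   up in the parser's own table each time it is called. Returns the
   handler's result, or -1 when the table has no handler for the event. */
static int dispatch (void *ud, const char *event, const char *arg) {
  const toy_parser *p = (const toy_parser *)ud;
  const callback_entry *e;
  assert(p != NULL && p->callbacks != NULL);
  for (e = p->callbacks; e->event != NULL; e++) {
    if (strcmp(e->event, event) == 0)
      return e->fn(arg);
  }
  return -1;  /* no handler: the event is ignored */
}

/* Two example handlers, used only to tell two tables apart. */
static int name_length (const char *arg) { return (int)strlen(arg); }
static int always_zero (const char *arg) { (void)arg; return 0; }
```

Two parsers with different tables behave differently even though both go through the same dispatch function, which is the property the binding relies on.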
The next step is the parse method lxp_parse (Figure 32.4, “Function to parse an XML fragment”), which parses a piece of XML data.
Figure 32.4. Function to parse an XML fragment
static int lxp_parse (lua_State *L) {
  int status;
  size_t len;
  const char *s;
  lxp_userdata *xpu;

  /* get and check first argument (should be a parser) */
  xpu = (lxp_userdata *)luaL_checkudata(L, 1, "Expat");

  /* check if it is not closed */
  luaL_argcheck(L, xpu->parser != NULL, 1, "parser is closed");

  /* get second argument (a string) */
  s = luaL_optlstring(L, 2, NULL, &len);

  /* put callback table at stack index 3 */
  lua_settop(L, 2);
  lua_getuservalue(L, 1);

  xpu->L = L;  /* set Lua state */

  /* call Expat to parse string */
  status = XML_Parse(xpu->parser, s, (int)len, s == NULL);

  /* return error code */
  lua_pushboolean(L, status);
  return 1;
}
It gets two arguments: the parser object (the self of the method) and an optional piece of XML data. When called without any data, it informs Expat that the document has no more parts.
When lxp_parse calls XML_Parse, the latter function will call the handlers for each relevant element that it finds in the given piece of document. These handlers will need to access the callback table, so lxp_parse puts this table at stack index three (right after the parameters). There is one more detail in the call to XML_Parse: remember that the last argument to this function tells Expat whether the given piece of text is the last one. When we call parse without an argument, s will be NULL, so this last argument will be true.
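The convention that a missing piece means "end of document" can be sketched in plain C with a mock parser. The names toy_feed, toy_feed_piece, and parse_piece are hypothetical; toy_feed_piece merely has the same parameter shape as XML_Parse and records what it receives, and its behavior on late data is a choice made for this sketch, not a statement about Expat.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a chunked parser; it only records its input. */
typedef struct toy_feed {
  size_t total;  /* bytes received so far */
  int finished;  /* nonzero once the final piece has arrived */
} toy_feed;

/* Same shape as XML_Parse: a piece, its length, and an "is last" flag.
   In this sketch it returns 0 if data arrives after the final piece,
   and 1 otherwise. */
static int toy_feed_piece (toy_feed *f, const char *s, int len, int is_last) {
  assert(f != NULL && len >= 0);
  if (f->finished) return 0;
  if (s != NULL) f->total += (size_t)len;
  if (is_last) f->finished = 1;
  return 1;
}

/* Wrapper following the convention in the text: a NULL piece means
   "no more data", so the flag is simply (s == NULL). */
static int parse_piece (toy_feed *f, const char *s) {
  size_t len = (s != NULL) ? strlen(s) : 0;
  return toy_feed_piece(f, s, (int)len, s == NULL);
}
```

Deriving the flag from the argument keeps the caller's interface to a single optional parameter, at the cost of requiring one extra call to finish the document.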
Now let us turn our attention to the functions f_CharData, f_StartElement, and f_EndElement, which handle the callbacks. All three functions have a similar structure: each checks whether the callback table defines a Lua handler for its specific event and, if so, prepares the arguments and then calls this Lua handler.
Let us see first the f_CharData handler, in Figure 32.5, “Handler for character data”.
Figure 32.5. Handler for character data
static void f_CharData (void *ud, const char *s, int len) {
  lxp_userdata *xpu = (lxp_userdata *)ud;
  lua_State *L = xpu->L;

  /* get handler from callback table */
  lua_getfield(L, 3, "CharacterData");
  if (lua_isnil(L, -1)) {  /* no handler? */
    lua_pop(L, 1);
    return;
  }

  lua_pushvalue(L, 1);  /* push the parser ('self') */
  lua_pushlstring(L, s, len);  /* push Char data */
  lua_call(L, 2, 0);  /* call the handler */
}
Its code is quite simple. The handler receives an lxp_userdata structure as its first argument, due to our call to XML_SetUserData when we created the parser. After retrieving the Lua state, the handler can access the callback table at stack index 3, as set by lxp_parse, and the parser itself at stack index 1. Then it calls its corresponding handler in Lua (when present), with two arguments: the parser and the character data (a string).
The f_EndElement handler is quite similar to f_CharData; see Figure 32.6, “Handler for end elements”.
Figure 32.6. Handler for end elements
static void f_EndElement (void *ud, const char *name) {
  lxp_userdata *xpu = (lxp_userdata *)ud;
  lua_State *L = xpu->L;

  lua_getfield(L, 3, "EndElement");
  if (lua_isnil(L, -1)) {  /* no handler? */
    lua_pop(L, 1);
    return;
  }

  lua_pushvalue(L, 1);  /* push the parser ('self') */
  lua_pushstring(L, name);  /* push tag name */
  lua_call(L, 2, 0);  /* call the handler */
}
It also calls its corresponding Lua handler with two arguments: the parser and the tag name (again a string, but now null-terminated).
Figure 32.7, “Handler for start elements” shows the last handler, f_StartElement.
Figure 32.7. Handler for start elements
static void f_StartElement (void *ud,
                            const char *name,
                            const char **atts) {
  lxp_userdata *xpu = (lxp_userdata *)ud;
  lua_State *L = xpu->L;

  lua_getfield(L, 3, "StartElement");
  if (lua_isnil(L, -1)) {  /* no handler? */
    lua_pop(L, 1);
    return;
  }

  lua_pushvalue(L, 1);  /* push the parser ('self') */
  lua_pushstring(L, name);  /* push tag name */

  /* create and fill the attribute table */
  lua_newtable(L);
  for (; *atts; atts += 2) {
    lua_pushstring(L, *(atts + 1));
    lua_setfield(L, -2, *atts);  /* table[*atts] = *(atts+1) */
  }

  lua_call(L, 3, 0);  /* call the handler */
}
It calls the Lua handler with three arguments: the parser, the tag name, and a list of attributes. This handler is a little more complex than the others, because it needs to translate the tag’s list of attributes into Lua. It uses a quite natural translation, building a table that maps attribute names to their values. For instance, a start tag like
<to method="post" priority="high">
generates the following table of attributes:
{method = "post", priority = "high"}
The last method for parsers is close, in Figure 32.8, “Method to close an XML parser”.
When we close a parser, we have to free its resources, namely the Expat structure. Remember that, due to occasional errors during its creation, a parser may not have this resource. Notice how we keep the parser in a consistent state as we close it, so there is no problem if we try to close it again or when the garbage collector finalizes it. Actually, we will use exactly this function as the finalizer. This ensures that every parser eventually frees its resources, even if the programmer does not close it.
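The property described above, a close operation that is safe to call more than once and on an object that never acquired its resource, can be sketched in plain C. This is not the code of Figure 32.8: toy_object, toy_close, and release_count are hypothetical, and heap memory stands in for the Expat parser.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the parser userdata; 'resource' plays the role
   of the Expat parser and is plain heap memory here. */
typedef struct toy_object {
  void *resource;  /* NULL when never created or already released */
} toy_object;

static int release_count = 0;  /* how many times memory was actually freed */

/* Usable both as an explicit close method and as a finalizer: it frees the
   resource only if present and then marks the object as closed. */
static void toy_close (toy_object *obj) {
  assert(obj != NULL);
  if (obj->resource != NULL) {
    free(obj->resource);
    release_count++;
  }
  obj->resource = NULL;  /* a second call finds nothing to free */
}
```

Resetting the pointer after the release is what lets the same function serve as both the close method and the finalizer without risking a double free.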
Figure 32.9, “Initialization code for the lxp library” is the final step: it shows luaopen_lxp, which opens the library, putting all previous parts together.
Figure 32.9. Initialization code for the lxp library
static const struct luaL_Reg lxp_meths[] = {
  {"parse", lxp_parse},
  {"close", lxp_close},
  {"__gc", lxp_close},
  {NULL, NULL}
};

static const struct luaL_Reg lxp_funcs[] = {
  {"new", lxp_make_parser},
  {NULL, NULL}
};

int luaopen_lxp (lua_State *L) {
  /* create metatable */
  luaL_newmetatable(L, "Expat");

  /* metatable.__index = metatable */
  lua_pushvalue(L, -1);
  lua_setfield(L, -2, "__index");

  /* register methods */
  luaL_setfuncs(L, lxp_meths, 0);

  /* register functions (only lxp.new) */
  luaL_newlib(L, lxp_funcs);
  return 1;
}
We use here the same scheme that we used in the object-oriented Boolean-array example from the section called “Object-Oriented Access”: we create a metatable, make its __index field point to itself, and put all the methods inside it. For that, we need a list with the parser methods (lxp_meths). We also need a list with the functions of this library (lxp_funcs). As is common with object-oriented libraries, this list has a single function, which creates new parsers.
Exercise 32.1: Modify the function dir_iter in the directory example so that it closes the DIR structure as soon as it reaches the end of the traversal. With this change, the program does not need to wait for a garbage collection to release a resource that it knows it will not need anymore. (When you close the directory, you should set the address stored in the userdata to NULL, to signal to the finalizer that the directory is already closed. Also, dir_iter will have to check whether the directory is closed before using it.)
Exercise 32.2: In the lxp example, we used user values to associate the callback table with the userdata that represents a parser. This choice created a small problem, because what the C callbacks receive is the lxp_userdata structure, and that structure does not offer direct access to the table. We solved this problem by storing the callback table at a fixed stack index during the parse of each fragment. An alternative design would be to associate the callback table with the userdata through references (the section called “The registry”): we create a reference to the callback table and store the reference (an integer) in the lxp_userdata structure. Implement this alternative. Do not forget to release the reference when closing the parser.