33 Threads and States

Lua does not support true multithreading, that is, preemptive threads sharing memory. There are two reasons for this lack of support. The first reason is that ISO C does not offer it, and so there is no portable way to implement this mechanism in Lua. The second and stronger reason is that we do not think multithreading is a good idea for Lua.

Multithreading was developed for low-level programming. Synchronization mechanisms like semaphores and monitors were proposed in the context of operating systems (and seasoned programmers), not application programs. It is very hard to find and correct bugs related to multithreading, and several of these bugs can lead to security breaches. Moreover, multithreading can lead to performance penalties related to the need of synchronization in some critical parts of a program, such as the memory allocator.

The problems with multithreading arise from the combination of preemption with shared memory, so we can avoid them either using non-preemptive threads or not sharing memory. Lua offers support for both. Lua threads (also known as coroutines) are collaborative, and therefore avoid the problems created by unpredictable thread switching. Lua states share no memory, and therefore form a good base for parallelism in Lua. We will cover both options in this chapter.

A thread is the essence of a coroutine in Lua. We can think of a coroutine as a thread plus a nice interface, or we can think of a thread as a coroutine with a lower-level API.

From the C API perspective, you may find it useful to think of a thread as a stack—which is what a thread actually is, from an implementation point of view. Each stack keeps information about the pending calls of a thread, plus the parameters and local variables of each call. In other words, a stack has all the information that a thread needs to continue running. So, multiple threads mean multiple independent stacks.

Most functions in Lua’s C API operate on a specific stack. How does Lua know which stack to use? When calling lua_pushnumber, how do we say where to push the number? The secret is that the type lua_State, the first argument to these functions, represents not only a Lua state, but also a thread within that state. (Many people argue that this type should be called lua_Thread. Maybe they are right.)

Whenever we create a Lua state, Lua automatically creates a main thread within this state and returns a lua_State representing this thread. This main thread is never collected. It is released together with the state, when we close the state with lua_close. Programs that do not bother with threads run everything in this main thread.

We can create other threads in a state calling lua_newthread:

      lua_State *lua_newthread (lua_State *L);

This function pushes the new thread on the stack, as a value of type "thread", and returns a lua_State pointer representing this new thread. For instance, consider the following statement:

      L1 = lua_newthread(L);

After running it, we will have two threads, L1 and L, both referring internally to the same Lua state. Each thread has its own stack. The new thread L1 starts with an empty stack; the old thread L has a reference to the new thread on top of its stack:

      printf("%d\n", lua_gettop(L1));          --> 0
      printf("%s\n", luaL_typename(L, -1));    --> thread

Except for the main thread, threads are subject to garbage collection, like any other Lua object. When we create a new thread, the value pushed on the stack ensures that the thread is not collected. We should never use a thread that is not properly anchored in the state. (The main thread is internally anchored, so we do not have to worry about it.) Any call to the Lua API may collect a non-anchored thread, even a call using this thread. For instance, consider the following fragment:

        lua_State *L1 = lua_newthread (L);
        lua_pop(L, 1);         /* L1 now is garbage for Lua */
        lua_pushstring(L1, "hello");

The call to lua_pushstring may trigger the garbage collector and collect L1, crashing the application, despite the fact that L1 is in use. To avoid this, always keep a reference to the threads you are using, for instance on the stack of an anchored thread, in the registry, or in a Lua variable.

Once we have a new thread, we can use it like the main thread. We can push to and pop elements from its stack, we can use it to call functions, and the like. For instance, the following code does the call f(5) in the new thread and then moves the result to the old thread:

      lua_getglobal(L1, "f");   /* assume a global function 'f' */
      lua_pushinteger(L1, 5);
      lua_call(L1, 1, 1);
      lua_xmove(L1, L, 1);

The function lua_xmove moves Lua values between two stacks in the same state. A call like lua_xmove(F, T, n) pops n elements from the stack F and pushes them on T.

For these uses, however, we do not need a new thread; we could just use the main thread as well. The main point of using multiple threads is to implement coroutines, so that we can suspend their executions and resume them later. For that, we need the function lua_resume:

      int lua_resume (lua_State *L, lua_State *from, int narg);

To start running a coroutine, we use lua_resume as we use lua_pcall: we push the function to be called (which is the coroutine body), push its arguments, and call lua_resume passing in narg the number of arguments. (The from parameter is the thread that is doing the call or NULL.) The behavior is also much like lua_pcall, with three differences. First, lua_resume does not have a parameter for the number of wanted results; it always returns all results from the called function. Second, it does not have a parameter for a message handler; an error does not unwind the stack, so we can inspect the stack after the error. Third, if the running function yields, lua_resume returns the code LUA_YIELD and leaves the thread in a state that can be resumed later.

When lua_resume returns LUA_YIELD, the visible part of the thread’s stack contains only the values passed to yield. A call to lua_gettop will return the number of yielded values. To move these values to another thread, we can use lua_xmove.

To resume a suspended thread, we call lua_resume again. In such calls, Lua assumes that all values on the stack are to be returned by the call to yield. For instance, if we do not touch the thread’s stack between a return from lua_resume and the next resume, yield will return exactly the values it yielded.

Typically, we start a coroutine with a Lua function as its body. This Lua function can call other functions, and any of these functions can occasionally yield, terminating the call to lua_resume. For instance, assume the following definitions:

      function foo (x)  coroutine.yield(10, x)  end
      
      function foo1 (x)  foo(x + 1); return 3  end

Now, we run this C code:

      lua_State *L1 = lua_newthread(L);
      lua_getglobal(L1, "foo1");
      lua_pushinteger(L1, 20);
      lua_resume(L1, L, 1);

The call to lua_resume will return LUA_YIELD, to signal that the thread yielded. At this point, the L1 stack has the values given to yield:

      printf("%d\n", lua_gettop(L1));             --> 2
      printf("%lld\n", lua_tointeger(L1, 1));     --> 10
      printf("%lld\n", lua_tointeger(L1, 2));     --> 21

When we resume the thread again, it continues from where it stopped (the call to yield). From there, foo returns to foo1, which in turn returns to lua_resume:

      lua_resume(L1, L, 0);
      printf("%d\n", lua_gettop(L1));             --> 1
      printf("%lld\n", lua_tointeger(L1, 1));     --> 3

This second call to lua_resume will return LUA_OK, which means a normal return.

A coroutine can also call C functions, which can call back other Lua functions. We have already discussed how to use continuations to allow those Lua functions to yield (the section called “Continuations”). A C function can yield, too. In that case, it also must provide a continuation function to be called when the thread resumes. To yield, a C function must call the following function:

      int lua_yieldk (lua_State *L, int nresults, int ctx,
                                    lua_CFunction k);

We should use this function always in a return statement, such as here:

      static inf myCfunction (lua_State *L) {
        ...
        return lua_yieldk(L, nresults, ctx, k);
      }

This call immediately suspends the running coroutine. The nresults parameter is the number of values on the stack to be returned to the respective lua_resume; ctx is the context information to be passed to the continuation; and k is the continuation function. When the coroutine resumes, the control goes directly to the continuation function k. After yielding, myCfunction cannot do anything else; it must delegate any further work to its continuation.

Let us see a typical example. Suppose we want to write a function that reads some data, yielding if the data is not available. We may write the function in C like this:[35]

      int readK (lua_State *L, int status, lua_KContext ctx) {
        (void)status;  (void)ctx;  /* unused parameters */
        if (something_to_read()) {
          lua_pushstring(L, read_some_data());
          return 1;
        }
        else
          return lua_yieldk(L, 0, 0, &readK);
      }
      
      int prim_read (lua_State *L) {
        return readK(L, 0, 0);
      }

In this example, prim_read does not need to do any initialization, so it calls directly the continuation function (readK). If there is data to read, readK reads and returns this data. Otherwise, it yields. When the thread resumes, it calls the continuation function again, which will try again to read some data.

If a C function has nothing else to do after yielding, it can call lua_yieldk without a continuation function or use the macro lua_yield:

      return lua_yield(L, nres);

After this call, when the thread resumes, control returns to the function that called myCfunction.

Each call to luaL_newstate (or to lua_newstate) creates a new Lua state. Different Lua states are completely independent of each other. They share no data at all. This means that no matter what happens inside a Lua state, it cannot corrupt another Lua state. This also means that Lua states cannot communicate directly; we have to use some intervening C code. For instance, given two states L1 and L2, the following command pushes in L2 the string on the top of the stack in L1:

        lua_pushstring(L2, lua_tostring(L1, -1));

Because data must pass through C, Lua states can exchange only types that are representable in C, like strings and numbers. Other types, such as tables, must be serialized to be transferred.

In systems that offer multithreading, an interesting design is to create an independent Lua state for each thread. This design results in threads similar to POSIX processes, where we have concurrency without shared memory. In this section, we will develop a prototype implementation for multithreading following this approach. I will use POSIX threads (pthreads) for this implementation. It should not be difficult to port the code to other thread systems, as it uses only basic facilities.

The system we are going to develop is very simple. Its main purpose is to show the use of multiple Lua states in a multithreading context. After we have it up and running, we can add several advanced features on top of it. We will call our library lproc. It offers only four functions:

The library identifies channels by strings and uses them to match senders and receivers. A send operation can send any number of string values, which are returned by the matching receive operation. All communication is synchronous: a process sending a message to a channel blocks until there is a process receiving from this channel, while a process receiving from a channel blocks until there is a process sending to it.

Like its interface, the implementation of lproc is also simple. It uses two circular double-linked lists, one for processes waiting to send a message and another for processes waiting to receive a message. It uses a single mutex to control access to these lists. Each process has an associated condition variable. When a process wants to send a message to a channel, it traverses the receiving list looking for a process waiting for that channel. If it finds one, it removes the process from the waiting list, moves the message’s values from itself to the found process, and signals the other process. Otherwise, it inserts itself into the sending list and waits on its condition variable. To receive a message, it does a symmetrical operation.

A main element in the implementation is the structure that represents a process:

      #include <pthread.h>
      #include "lua.h"
      #include "lauxlib.h"
      
      typedef struct Proc {
        lua_State *L;
        pthread_t thread;
        pthread_cond_t cond;
        const char *channel;
        struct Proc *previous, *next;
      } Proc;

The first two fields represent the Lua state used by the process and the C thread that runs the process. The third field, cond, is the condition variable that the thread uses to block itself when it has to wait for a matching send/receive. The fourth field stores the channel that the process is waiting, if any. The last two fields, previous and next, are used to link the process structure into a waiting list.

The following code declares the two waiting lists and the associated mutex:

      static Proc *waitsend = NULL;
      static Proc *waitreceive = NULL;
      
      static pthread_mutex_t kernel_access = PTHREAD_MUTEX_INITIALIZER;

Each process needs a Proc structure, and it needs to access this structure whenever its script calls send or receive. The only parameter that these functions receive is the process’s Lua state; therefore, each process should store its Proc structure inside its Lua state. In our implementation, each state keeps its corresponding Proc structure as a full userdata in the registry, associated with the key "_SELF". The auxiliary function getself retrieves the Proc structure associated with a given state:

      static Proc *getself (lua_State *L) {
        Proc *p;
        lua_getfield(L, LUA_REGISTRYINDEX, "_SELF");
        p = (Proc *)lua_touserdata(L, -1);
        lua_pop(L, 1);
        return p;
      }

The next function, movevalues, moves values from a sender process to a receiver process:

      static void movevalues (lua_State *send, lua_State *rec) {
        int n = lua_gettop(send);
        int i;
        luaL_checkstack(rec, n, "too many results");
        for (i = 2; i <= n; i++)  /* move values to receiver */
          lua_pushstring(rec, lua_tostring(send, i));
      }

It moves to the receiver all values in the sender stack but the first, which will be the channel. Note that, as we are pushing an arbitrary number of elements, we have to check for stack space.

Figure 33.1, “Function to search for a process waiting for a channel” defines the function searchmatch, which traverses a list looking for a process that is waiting for a given channel.

If it finds one, it removes the process from the list and returns it; otherwise, the function returns NULL.

The last auxiliary function, in Figure 33.2, “Function to add a process to a waiting list”, is called when a process cannot find a match.

In this case, the process links itself at the end of the appropriate waiting list and waits until another process matches with it and wakes it up. (The loop around pthread_cond_wait handles the spurious wakeups allowed in POSIX threads.) When a process wakes up another, it sets the other process’s field channel to NULL. So, if p->channel is not NULL, it means that nobody matched process p, so it has to keep waiting.

With these auxiliary functions in place, we can write send and receive (Figure 33.3, “Functions to send and receive messages”).

The function ll_send starts getting the channel. Then it locks the mutex and searches for a matching receiver. If it finds one, it moves its values to this receiver, marks the receiver as ready, and wakes it up. Otherwise, it puts itself on wait. When it finishes the operation, it unlocks the mutex and returns with no values to Lua. The function ll_receive is similar, but it has to return all received values.

Now let us see how to create new processes. A new process needs a new POSIX thread, and a new thread needs a body to run. We will define this body later; here is its prototype, dictated by pthreads:

      static void *ll_thread (void *arg);

To create and run a new process, the system must create a new Lua state, start a new thread, compile the given chunk, call the chunk, and finally free its resources. The original thread does the first three tasks, and the new thread does the rest. (To simplify error handling, the system only starts the new thread after it has successfully compiled the given chunk.)

The function ll_start creates a new process (Figure 33.4, “Function to create new processes”).

This function creates a new Lua state L1 and compiles the given chunk in this new state. In case of error, it signals the error to the original state L. Then it creates a new thread (using pthread_create) with body ll_thread, passing the new state L1 as the argument to the body. The call to pthread_detach tells the system that we will not want any final answer from this thread.

The body of each new thread is the function ll_thread (Figure 33.5, “Body for new threads”), which receives its corresponding Lua state (created by ll_start) with only the precompiled main chunk on the stack.

First, it opens the standard Lua libraries and the lproc library. Second, it creates and initializes its own control block. Then, it calls its main chunk. Finally, it destroys its condition variable and closes its Lua state.

Note the use of luaL_requiref to open the lproc library.[36] This function is somewhat equivalent to require but, instead of searching for a loader, it uses the given function (luaopen_lproc, in our case) to open the library. After calling the open function, luaL_requiref registers the result in the package.loaded table, so that future calls to require the library will not try to open it again. With true as its last parameter, it also registers the library in the corresponding global variable (lproc, in our case).

Figure 33.6, “Extra functions for the lproc module” presents the last functions for the module.

Both are quite simple. The function ll_exit should be called only by the main process, when it finishes, to avoid the immediate end of the whole program. The function luaopen_lproc is a standard function for opening the module.

As I said earlier, this implementation of processes in Lua is a very simple one. There are endless improvements we can make. Here I will briefly discuss some of them.

A first obvious improvement is to change the linear search for a matching channel. A nice alternative is to use a hash table to find a channel and to use independent waiting lists for each channel.

Another improvement relates to the efficiency of process creation. The creation of new Lua states is a lightweight operation. However, the opening of all standard libraries is not that lightweight, and most processes probably will not need all standard libraries. We can avoid the cost of opening a library by using the pre-registration of libraries, as we discussed in the section called “The Function require. With this approach, instead of calling luaL_requiref for each standard library, we just put the library opening function into the package.preload table. If the process calls require "lib", then—and only then—require will call the associated function to open the library. The function registerlib, in Figure 33.7, “Registering libraries to be opened on demand”, does this registration.

It is always a good idea to open the basic library. We also need the package library; otherwise, we will not have require available to open the other libraries. All other libraries can be optional. Therefore, instead of calling luaL_openlibs, we can call our own function openlibs (shown also in Figure 33.7, “Registering libraries to be opened on demand”) when opening new states. Whenever a process needs one of these libraries, it requires the library explicitly, and require will call the corresponding luaopen_* function.

Other improvements involve the communication primitives. For instance, it would be useful to provide limits on how long lproc.send and lproc.receive should wait for a match. In particular, a zero limit would make these functions non-blocking. With POSIX threads, we could use pthread_cond_timedwait to implement this feature.

Exercise 33.1: As we saw, if a function calls lua_yield (the version with no continuation), control returns to the function that called it when the thread resumes. What values does the calling function receive as results from that call?

Exercise 33.2: Modify the lproc library so that it can send and receive other primitive types such as Booleans and numbers without converting them to strings. (Hint: you only have to modify the function movevalues.)

Exercise 33.3: Modify the lproc library so that it can send and receive tables. (Hint: you can traverse the original table building a copy in the receiving state.)

Exercise 33.4: Implement in the lproc library a non-blocking send operation.



[35] As I already mentioned, the API for continuations prior to Lua 5.3 is a little different. In particular, the continuation function has only one parameter, the Lua state.

[36] This function was introduced in Lua 5.2.

Personal copy of Eric Taylor <jdslkgjf.iapgjflksfg@yandex.com>