22 The Environment

Global variables are a necessary evil of most programming languages. On one hand, the use of global variables can easily lead to complex code, entangling apparently unrelated parts of a program. On the other hand, the judicious use of global variables can better express truly global aspects of a program; moreover, global constants are innocuous, but dynamic languages like Lua have no way to distinguish constants from variables. An embedded language like Lua adds another ingredient to this mix: a global variable is a variable that is visible in the whole program, but Lua has no clear concept of a program, having instead pieces of code (chunks) called by the host application.

Lua solves this conundrum by not having global variables, but going to great lengths to pretend it has. In a first approximation, we can think that Lua keeps all its global variables in a regular table, called the global environment. Later in this chapter, we will see that Lua can keep its global variables in several environments. For now, we will stick to that first approximation.

The use of a table to store global variables simplifies the internal implementation of Lua, because there is no need for a different data structure for global variables. Another advantage is that we can manipulate this table like any other table. To help such manipulations, Lua stores the global environment itself in the global variable _G. (As a result, _G._G is equal to _G.) For instance, the following code prints the names of all the variables defined in the global environment:

      for n in pairs(_G) do print(n) end

Usually, assignment is enough for accessing and setting global variables. However, sometimes we need some form of meta-programming, such as when we need to manipulate a global variable whose name is stored in another variable or is somehow computed at run time. To get the value of such a variable, some programmers are tempted to write something like this:

      value = load("return " .. varname)()

If varname is x, for example, the concatenation will result in "return x", which when run achieves the desired result. However, this code involves the creation and compilation of a new chunk, which is somewhat expensive. We can accomplish the same effect with the following code, which is more than an order of magnitude more efficient than the previous one:

      value = _G[varname]

Because the environment is a regular table, we can simply index it with the desired key (the variable name).

In a similar way, we can assign a value to a global variable whose name is computed dynamically by writing _G[varname] = value. Beware, however: some programmers get a little excited with these facilities and end up writing code like _G["a"] = _G["b"], which is just a complicated way to write a = b.
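
As a small illustration of names computed at run time, the following sketch creates and reads a few globals through _G. The prefix sensor_ and the stored values are made up for the example, and the code assumes that the chunk runs with the standard global environment:

```lua
-- Hypothetical names: 'sensor_1' .. 'sensor_3' exist only for this example.
local prefix = "sensor_"
for i = 1, 3 do
  _G[prefix .. i] = i * 10      -- same effect as sensor_1 = 10, and so on
end

local varname = prefix .. 2     -- name computed at run time
print(_G[varname])              --> 20
print(sensor_3)                 --> 30
```

Here the indexing is justified, because the names are not known when we write the code.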

A generalization of the previous problem is to allow fields in the dynamic name, such as "io.read" or "a.b.c.d". If we write _G["io.read"], clearly we will not get the field read from the table io. But we can write a function getfield such that getfield("io.read") returns the expected result. This function is mainly a loop, which starts at _G and evolves field by field:

      function getfield (f)
        local v = _G    -- start with the table of globals
        for w in string.gmatch(f, "[%a_][%w_]*") do
          v = v[w]
        end
        return v
      end

We rely on gmatch to iterate over all identifiers in f.

The corresponding function to set fields is a little more complex. An assignment like a.b.c.d = v is equivalent to the following code:

      local temp = a.b.c
      temp.d = v

That is, we must retrieve up to the last name and then handle this last name separately. The function setfield, in Figure 22.1, “The function setfield”, does the task and also creates intermediate tables in a path when they do not exist.

The pattern there captures the field name in the variable w and an optional following dot in the variable d. If a field name is not followed by a dot, then it is the last name.
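
The figure itself is not reproduced in this text. A minimal sketch that matches the description (a name captured in w, an optional dot captured in d) could look like the following; it may differ in detail from the figure:

```lua
-- Sketch of setfield, written from the description in the text.
function setfield (f, v)
  local t = _G                      -- start with the table of globals
  for w, d in string.gmatch(f, "([%a_][%w_]*)(%.?)") do
    if d == "." then                -- not the last name?
      t[w] = t[w] or {}             -- create the table if absent
      t = t[w]                      -- descend into it
    else                            -- last name
      t[w] = v                      -- do the assignment
    end
  end
end
```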

With the previous functions in place, the next call creates a global table t, another table t.x, and assigns 10 to t.x.y:

      setfield("t.x.y", 10)
      
      print(t.x.y)               --> 10
      print(getfield("t.x.y"))   --> 10

Global variables in Lua do not need declarations. Although this behavior is handy for small programs, in larger programs a simple typo can cause bugs that are difficult to find. However, we can change this behavior if we like. Because Lua keeps its global variables in a regular table, we can use metatables to detect when Lua accesses non-existent variables.

A first approach simply detects any access to absent keys in the global table:

      setmetatable(_G, {
        __newindex = function (_, n)
          error("attempt to write to undeclared variable " .. n, 2)
        end,
        __index = function (_, n)
          error("attempt to read undeclared variable " .. n, 2)
        end,
      })

After this code, any attempt to access a non-existent global variable will trigger an error:

      > print(a)
      stdin:1: attempt to read undeclared variable a

But how do we declare new variables? One option is to use rawset, which bypasses the metamethod:

      function declare (name, initval)
        rawset(_G, name, initval or false)
      end

(The or with false ensures that the new global always gets a value different from nil.)
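
Putting the pieces together, the next sketch shows the first approach at work. The names counter, countr, and readok are hypothetical, chosen only for the demonstration. Note that declare itself must be defined before we install the metatable, and that the last line removes the check again so that the example leaves the global table as it found it:

```lua
-- 'declare' must exist before the check is installed, because defining it
-- is itself an assignment to a new global.
function declare (name, initval)
  rawset(_G, name, initval or false)
end

setmetatable(_G, {
  __newindex = function (_, n)
    error("attempt to write to undeclared variable " .. n, 2)
  end,
  __index = function (_, n)
    error("attempt to read undeclared variable " .. n, 2)
  end,
})

declare("counter", 0)
counter = counter + 1                  -- fine: 'counter' already exists
declare("readok")
readok = pcall(function () return countr end)   -- typo in the name
print(counter, readok)                 --> 1    false

setmetatable(_G, nil)                  -- remove the check (demo only)
```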

A simpler option is to restrict assignments to new global variables only inside functions, allowing free assignments in the outer level of a chunk.

To check whether an assignment is in the main chunk, we must use the debug library. The call debug.getinfo(2, "S") returns a table whose field what tells whether the function that called the metamethod is a main chunk, a regular Lua function, or a C function. (We will see debug.getinfo in more detail in the section called “Introspective Facilities”.) Using this function, we can rewrite the __newindex metamethod like this:

        __newindex = function (t, n, v)
          local w = debug.getinfo(2, "S").what
          if w ~= "main" and w ~= "C" then
            error("attempt to write to undeclared variable " .. n, 2)
          end
          rawset(t, n, v)
        end

This new version also accepts assignments from C code, as this kind of code usually knows what it is doing.

If we need to test whether a variable exists, we cannot simply compare it to nil because, if it is nil, the access will raise an error. Instead, we use rawget, which avoids the metamethod:

      if rawget(_G, var) == nil then
        -- 'var' is undeclared
        ...
      end

As it is, our scheme does not allow global variables with nil values, as they would be automatically considered undeclared. But it is not difficult to correct this problem. All we need is an auxiliary table that keeps the names of declared variables. Whenever a metamethod is called, it checks in this table whether the variable is undeclared. The code can be like the one in Figure 22.2, “Checking global-variable declaration”.

Now, even an assignment like x = nil is enough to declare a global variable.
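
The figure is not reproduced in this text. The following is a sketch of the scheme as described, not necessarily the code of the figure: an auxiliary table (here called declaredNames, an assumed name) records which names have been declared. The demonstration lines at the end use hypothetical variables and assume that the code runs as the main chunk of a file:

```lua
local declaredNames = {}    -- assumed name for the auxiliary table

setmetatable(_G, {
  __newindex = function (t, n, v)
    if not declaredNames[n] then
      local w = debug.getinfo(2, "S").what
      if w ~= "main" and w ~= "C" then
        error("attempt to write to undeclared variable " .. n, 2)
      end
      declaredNames[n] = true
    end
    rawset(t, n, v)         -- do the actual set
  end,

  __index = function (_, n)
    if not declaredNames[n] then
      error("attempt to read undeclared variable " .. n, 2)
    else
      return nil            -- declared, but currently holding nil
    end
  end,
})

x = nil                                    -- main chunk: declares 'x'
readx = pcall(function () return x end)    -- true: 'x' is declared
ready = pcall(function () return y end)    -- false: 'y' is not
writez = pcall(function () z = 1 end)      -- false: not in the main chunk
print(readx, ready, writez)                --> true    false   false

setmetatable(_G, nil)                      -- remove the check (demo only)
```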

The overhead for both solutions is negligible. With the first solution, the metamethods are never called during normal operation. In the second, they can be called, but only when the program accesses a variable holding a nil.

The Lua distribution comes with a module strict.lua that implements a global-variable check that uses essentially the code in Figure 22.2, “Checking global-variable declaration”. It is a good habit to use it when developing Lua code.

In Lua, global variables do not need to be truly global. As I already hinted, Lua does not even have global variables. That may sound strange at first, as we have been using global variables throughout this text. As I said, Lua goes to great lengths to give the programmer an illusion of global variables. Now we will see how Lua builds this illusion.[19]

First, let us forget about global variables. Instead, we will start with the concept of free names. A free name is a name that is not bound to an explicit declaration, that is, it does not occur inside the scope of a corresponding local variable. For instance, both x and y are free names in the following chunk, but z is not:

      local z = 10
      x = y + z

Now comes the important part: The Lua compiler translates any free name x in the chunk to _ENV.x. So, the previous chunk is fully equivalent to this one:

      local z = 10
      _ENV.x = _ENV.y + z

But what is this new _ENV variable?

_ENV cannot be a global variable; we just said that Lua has no global variables. Again, the compiler does the trick. I already mentioned that Lua treats any chunk as an anonymous function. Actually, Lua compiles our original chunk as the following code:

      local _ENV = some value
      return function (...)
        local z = 10
        _ENV.x = _ENV.y + z
      end

That is, Lua compiles any chunk in the presence of a predefined upvalue (an external local variable) called _ENV. So, any variable is either local, if it is a bound name, or a field in _ENV, which itself is a local variable (an upvalue).

The initial value for _ENV can be any table. (Actually, it does not need to be a table; more about that later.) Any such table is called an environment. To preserve the illusion of global variables, Lua keeps internally a table that it uses as a global environment. Usually, when we load a chunk, the function load initializes this predefined upvalue with that global environment. So, our original chunk becomes equivalent to this one:

      local _ENV = the global environment
      return function (...)
        local z = 10
        _ENV.x = _ENV.y + z
      end

The result of all these arrangements is that the x field of the global environment gets the value of the y field plus 10.
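
We can observe this arrangement from the outside. The following sketch loads the original chunk from a string and inspects its first upvalue with debug.getupvalue, which returns the name and the value of an upvalue. It assumes that the code runs with the standard global environment; the value 5 for y is arbitrary:

```lua
local chunk = load("local z = 10; x = y + z")

local name, value = debug.getupvalue(chunk, 1)
print(name)             --> _ENV
print(value == _G)      --> true

y = 5                   -- arbitrary value for the free name 'y'
chunk()
print(x, _G.x)          --> 15    15
```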

At first sight, this may seem a rather convoluted way to manipulate global variables. I will not argue that it is the simplest way, but it offers a flexibility that is difficult to achieve with a simpler implementation.

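Before we go on, let us summarize the handling of global variables in Lua:

• The compiler creates a local variable _ENV outside any chunk that it compiles.

• The compiler translates any free name var to _ENV.var.

• The function load (or loadfile) initializes the first upvalue of a chunk with the global environment, which is a regular table kept internally by Lua.

After all, it is not that complicated.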

Some people get confused because they try to infer extra magic from these rules. There is no extra magic. In particular, the first two rules are done entirely by the compiler. Except for being predefined by the compiler, _ENV is a plain regular variable. Outside the compiler, the name _ENV has no special meaning at all to Lua.[20] Similarly, the translation from x to _ENV.x is a plain syntactic translation, with no hidden meanings. In particular, after the translation, _ENV will refer to whatever _ENV variable is visible at that point in the code, following the standard visibility rules.

In this section, we will see some ways to explore the flexibility brought by _ENV. Keep in mind that we must run most examples in this section as a single chunk. If we enter code line by line in interactive mode, each line becomes a different chunk and therefore each will have a distinct _ENV variable. To run a piece of code as a single chunk, we can either run it from a file or enclose it in a do-end block.

Because _ENV is a regular variable, we can assign to and access it as any other variable. The assignment _ENV = nil will invalidate any direct access to global variables in the rest of the chunk. This can be useful to control what variables our code uses:

      local print, sin = print, math.sin
      _ENV = nil
      print(13)                 --> 13
      print(sin(13))            --> 0.42016703682664
      print(math.cos(13))       -- error!

Any assignment to a free name (a global variable) will raise a similar error.

We can write the _ENV explicitly to bypass a local declaration:

      a = 13          -- global
      local a = 12
      print(a)        --> 12  (local)
      print(_ENV.a)   --> 13  (global)

We can do the same with _G:

      a = 13          -- global
      local a = 12
      print(a)        --> 12  (local)
      print(_G.a)   --> 13  (global)

Usually, _G and _ENV refer to the same table but, despite that, they are quite different entities. _ENV is a local variable, and all accesses to global variables in reality are accesses to it. _G is a global variable with no special status whatsoever. By definition, _ENV always refers to the current environment; _G usually refers to the global environment, provided it is visible and no one changed its value.

The main use for _ENV is to change the environment used by a piece of code. Once we change the environment, all global accesses will use the new table:

      -- change current environment to a new empty table
      _ENV = {}
      a = 1       -- create a field in _ENV
      print(a)
        --> stdin:4: attempt to call global 'print' (a nil value)

If the new environment is empty, we have lost all our global variables, including print. So, we should first populate it with some useful values, for instance with the global environment:

      a = 15                      -- create a global variable
      _ENV = {g = _G}             -- change current environment
      a = 1                       -- create a field in _ENV
      g.print(_ENV.a, g.a)        --> 1    15

Now, when we access the global g (which lives in _ENV, not in the global environment) we get the global environment, wherein Lua will find the function print.

We can rewrite the previous example using the name _G instead of g:

      a = 15                      -- create a global variable
      _ENV = {_G = _G}            -- change current environment
      a = 1                       -- create a field in _ENV
      _G.print(_ENV.a, _G.a)     --> 1    15

The only special status of _G happens when Lua creates the initial global table and makes its field _G point to itself. Lua does not care about the current value of this variable. Nevertheless, it is customary to use this same name whenever we have a reference to the global environment, as we did in the rewritten example.

Another way to populate our new environment is with inheritance:

      a = 1
      local newgt = {}        -- create new environment
      setmetatable(newgt, {__index = _G})
      _ENV = newgt            -- set it
      print(a)                --> 1

In this code, the new environment inherits both print and a from the global one. However, any assignment goes to the new table. There is no danger of changing a variable in the global environment by mistake, although we still can change them through _G:

      -- continuing the previous chunk
      a = 10
      print(a, _G.a)          --> 10    1
      _G.a = 20
      print(_G.a)             --> 20

Being a regular variable, _ENV follows the usual scoping rules. In particular, functions defined inside a chunk access _ENV as they access any other external variable:

      _ENV = {_G = _G}
      local function foo ()
        _G.print(a)     -- compiled as '_ENV._G.print(_ENV.a)'
      end
      a = 10
      foo()             --> 10
      _ENV = {_G = _G, a = 20}
      foo()             --> 20

If we define a new local variable called _ENV, references to free names will bind to that new variable:

      a = 2
      do
        local _ENV = {print = print, a = 14}
        print(a)     --> 14
      end
      print(a)       --> 2   (back to the original _ENV)

Therefore, it is not difficult to define a function with a private environment:

      function factory (_ENV)
        return function () return a end
      end
      
      f1 = factory{a = 6}
      f2 = factory{a = 7}
      print(f1())        --> 6
      print(f2())        --> 7

The factory function creates simple closures that return the value of their global a. When the closure is created, its visible _ENV variable is the parameter _ENV of the enclosing factory function; therefore, each closure will use its own external variable (as an upvalue) to access its free names.

Using the usual scoping rules, we can manipulate environments in several other ways. For instance, we may have several functions sharing a common environment, or a function that changes the environment that it shares with other functions.
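
As an illustration of both cases, in the following sketch three closures share one private environment, and one of them replaces it. The names makecounter, inc, get, reset, and count are hypothetical:

```lua
local function makecounter ()
  local _ENV = {count = 0}     -- environment private to the closures below
  local function inc () count = count + 1 end   -- updates _ENV.count
  local function get () return count end
  local function reset () _ENV = {count = 100} end  -- replaces the shared _ENV
  return inc, get, reset
end

inc, get, reset = makecounter()
inc(); inc()
print(get())       --> 2
reset()
print(get())       --> 100
print(count)       --> nil  (the global environment was never touched)
```

Because the three closures refer to the same local variable _ENV, the assignment inside reset is visible to the other two.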

In the section called “The Basic Approach for Writing Modules in Lua”, when we discussed how to write modules, I mentioned that one drawback of those methods was that it was all too easy to pollute the global space, for instance by forgetting a local in a private declaration. Environments offer an interesting technique for solving that problem. Once the module main chunk has an exclusive environment, not only do all its functions share this table, but all its global variables also go to this table. We can declare all public functions as global variables and they will go to a separate table automatically. All the module has to do is to assign this table to the _ENV variable. After that, when we declare a function add, it goes to M.add:

      local M = {}
      _ENV = M
      function add (c1, c2)
        return new(c1.r + c2.r, c1.i + c2.i)
      end

Moreover, we can call other functions from the same module without any prefix. In the previous code, add gets new from its environment, that is, it calls M.new.

This method offers a good support for modules, with little extra work for the programmer. It needs no prefixes at all. There is no difference between calling an exported function and a private one. If the programmer forgets a local, he does not pollute the global namespace; instead, a private function simply becomes public.
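
To try this method in a single file, we can confine the module environment to a block, so that the rest of the chunk keeps its normal environment. The following is only a sketch: it uses a local _ENV inside a do-end block instead of the plain assignment, and the function new and the global name complex are assumptions made for the example:

```lua
local M = {}
do
  local _ENV = M               -- inside this block, free names are fields of M
  function new (r, i) return {r = r, i = i} end
  function add (c1, c2)
    return new(c1.r + c2.r, c1.i + c2.i)   -- 'new' resolves to M.new
  end
end

complex = M                    -- hypothetical global name for the module
local c = complex.add(complex.new(1, 2), complex.new(3, 4))
print(c.r, c.i)                --> 4    6
print(add, new)                --> nil  nil
```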

Nevertheless, currently I still prefer the original basic method. It may need more work, but the resulting code states clearly what it does. To avoid creating a global by mistake, I use the simple method of assigning nil to _ENV. After that, any assignment to a global name will raise an error. This approach has the extra advantage that it works without changes in older versions of Lua. (In Lua 5.1, the assignment to _ENV will not prevent errors, but it will not cause any harm, either.)

To access other modules, we can use one of the methods we discussed in the previous section. For instance, we can declare a local variable that holds the global environment:

      local M = {}
      local _G = _G
      _ENV = nil

We then prefix global names with _G and module names with M.

A more disciplined approach is to declare as locals only the functions we need or, at most, the modules we need:

      -- module setup
      local M = {}
      
      -- Import Section:
      -- declare everything this module needs from outside
      local sqrt = math.sqrt
      local io = io
      
      -- no more external access after this point
      _ENV = nil

This technique demands more work, but it documents the module dependencies better.
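
The next sketch shows what this discipline buys us. The module source is loaded from a string only so that the assignment _ENV = nil does not disable the rest of the example; in practice it would be a file of its own. The names geom, norm, broken, and total are hypothetical:

```lua
local source = [[
  local M = {}
  local sqrt = math.sqrt          -- import section
  _ENV = nil                      -- no more external access after this point

  function M.norm (x, y) return sqrt(x * x + y * y) end
  function M.broken () total = 0 end    -- forgotten 'local'
  return M
]]

geom = assert(load(source, "geom"))()   -- hypothetical module name
print(geom.norm(3, 4) == 5)             --> true
print(pcall(geom.broken))               -- false, plus an error message
```

The forgotten local in broken raises an error when the function runs, instead of silently creating a global.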

As I mentioned earlier, load usually initializes the _ENV upvalue of a loaded chunk with the global environment. However, load has an optional fourth parameter that allows us to give a different initial value for _ENV. (The function loadfile has a similar parameter.)

For an initial example, consider that we have a typical configuration file, defining several constants and functions to be used by a program; it can be something like this:

      -- file 'config.lua'
      width = 200
      height = 300
      ...

We can load it with the following code:

      env = {}
      loadfile("config.lua", "t", env)()

The whole code in the configuration file will run in the empty environment env, which works as a kind of sandbox. In particular, all definitions will go into this environment. The configuration file has no way to affect anything else, even by mistake. Even malicious code cannot do much damage. It can do a denial of service (DoS) attack, by wasting CPU time and memory, but nothing else.
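
To experiment without creating a file, we can give the same extra argument to load. In the following sketch, the configuration text is a string, and the chunk names and the second chunk are made up for the example:

```lua
local source = [[
  width = 200
  height = 300
]]

env = {}
assert(load(source, "config", "t", env))()
print(env.width, env.height)     --> 200    300
print(width)                     --> nil  (nothing leaked into the globals)

-- A chunk that tries to use 'print' fails: its environment is empty.
sandboxed = pcall(assert(load("print('hi')", "config", "t", {})))
print(sandboxed)                 --> false
```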

Sometimes, we may want to run a chunk several times, each time with a different environment table. In that case, the extra argument to load is not useful. Instead, we have two other options.

The first option is to use the function debug.setupvalue, from the debug library. As its name implies, setupvalue allows us to change any upvalue of a given function. The next fragment illustrates its use:

      f = load("b = 10; return a")
      env = {a = 20}
      debug.setupvalue(f, 1, env)
      print(f())              --> 20
      print(env.b)            --> 10

The first argument in the call to setupvalue is the function, the second is the upvalue index, and the third is the new value for the upvalue. For this kind of use, the second argument is always one: when a function represents a chunk, Lua ensures that it has only one upvalue and that this upvalue is _ENV.
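
The same call lets us run one chunk in several environments, which was our goal. In the following sketch, the chunk text and the tables in envs are hypothetical:

```lua
local f = load("total = (total or 0) + step; return total")

envs = { {step = 1}, {step = 10} }     -- hypothetical environments
for _, env in ipairs(envs) do
  debug.setupvalue(f, 1, env)          -- upvalue 1 of a chunk is _ENV
  f(); f()                             -- run the chunk twice in this environment
end
print(envs[1].total, envs[2].total)    --> 2    20
```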

A small drawback of this option is its dependence on the debug library. This library breaks some usual assumptions about programs. For instance, debug.setupvalue breaks Lua’s visibility rules, which ensure that we cannot access a local variable from outside its lexical scope.

Another option to run a chunk with several different environments is to twist the chunk a little when loading it. Imagine that we add the following line just before the chunk:

      _ENV = ...;

Remember that Lua compiles any chunk as a variadic function. So, that extra line of code will assign to the _ENV variable the first argument passed to the chunk, thereby setting that argument as the environment. The following code snippet illustrates the idea, using the function loadwithprefix that you implemented in Exercise 16.1:

      prefix = "_ENV = ...;"
      f = loadwithprefix(prefix, io.lines(filename, "*L"))
      ...
      env1 = {}
      f(env1)
      env2 = {}
      f(env2)
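
For a self-contained variant of the same idea, the following sketch replaces loadwithprefix by a plain string concatenation, which is enough when the chunk is already in a string. The chunk text and the field hits are made up for the example:

```lua
-- Plain concatenation stands in for 'loadwithprefix', which is not shown here.
local code = "hits = (hits or 0) + 1"
local f = assert(load("_ENV = ...; " .. code))

env1, env2 = {}, {}
f(env1); f(env1)              -- two runs in the first environment
f(env2)                       -- one run in the second
print(env1.hits, env2.hits)   --> 2    1
print(hits)                   --> nil
```

Each call sets the _ENV upvalue of f only; the environment of the calling chunk is not affected.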

Exercise 22.1: The function getfield that we defined in the beginning of this chapter is too forgiving, as it accepts fields like math?sin or string!!!gsub. Rewrite it so that it accepts only single dots as name separators.

Exercise 22.2: Explain in detail what happens in the following program and what it will print.

      local foo
      do
        local _ENV = _ENV
        function foo ()  print(X) end
      end
      X = 13
      _ENV = nil
      foo()
      X = 0

Exercise 22.3: Explain in detail what happens in the following program and what it will print.

      local print = print
      function foo (_ENV, a)
        print(a + b)
      end
      
      foo({b = 14}, 12)
      foo({b = 10}, 1)



[19] This mechanism was one of the parts of Lua that changed most from version 5.1 to 5.2. Very little of the following discussion applies to Lua 5.1.

[20] To be completely honest, Lua uses that name for error messages, so that it reports an error involving a variable _ENV.x as being about global x.
