7 The External World

Because of its emphasis on portability and embeddability, Lua itself does not offer much in terms of facilities to communicate with the external world. Most I/O in real Lua programs is done either by the host application or through external libraries not included in the main distribution, from graphics to databases and network access. Pure Lua offers only the functionalities that the ISO C standard offers —namely, basic file manipulation plus some extras. In this chapter, we will see how the standard libraries cover these functionalities.

The I/O library offers two different models for file manipulation. The simple model assumes a current input stream and a current output stream, and its I/O operations operate on these streams. The library initializes the current input stream to the process’s standard input (stdin) and the current output stream to the process’s standard output (stdout). Therefore, when we execute something like io.read(), we read a line from the standard input.

We can change these current streams with the functions io.input and io.output. A call like io.input(filename) opens a stream over the given file in read mode and sets it as the current input stream. From this point on, all input will come from this file, until another call to io.input. The function io.output does a similar job for output. In case of error, both functions raise the error. If you want to handle errors directly, you should use the complete I/O model.
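Because io.input and io.output raise errors instead of returning them, a script that stays within the simple model can observe a failure only by catching the error. The following is a minimal sketch using the standard function pcall; the file name is hypothetical and assumed not to exist:

```lua
-- "no-such-file.txt" is a hypothetical name, assumed not to exist
local ok, err = pcall(io.input, "no-such-file.txt")
if not ok then
  -- the exact wording of the message is system dependent
  io.stderr:write("could not open input: ", tostring(err), "\n")
end
```

When the call fails, the current input stream stays as it was, so the script can go on reading from it.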

As write is simpler than read, we will look at it first. The function io.write simply takes an arbitrary number of strings (or numbers) and writes them to the current output stream. Because we can call it with multiple arguments, we should avoid calls like io.write(a..b..c); the call io.write(a, b, c) accomplishes the same effect with fewer resources, as it avoids the concatenations.

As a rule, you should use print only for quick-and-dirty programs or debugging; always use io.write when you need full control over your output. Unlike print, write adds no extra characters to the output, such as tabs or newlines. Moreover, io.write allows you to redirect your output, whereas print always uses the standard output. Finally, print automatically applies tostring to its arguments; this is handy for debugging, but it also can hide subtle bugs.

The function io.write converts numbers to strings following the usual conversion rules; for full control over this conversion, we should use string.format:

      > io.write("sin(3) = ", math.sin(3), "\n")
        --> sin(3) = 0.14112000805987
      > io.write(string.format("sin(3) = %.4f\n", math.sin(3)))
        --> sin(3) = 0.1411

The function io.read reads strings from the current input stream. Its arguments control what to read:[9]

      "a"     reads the whole file
      "l"     reads the next line (dropping the newline)
      "L"     reads the next line (keeping the newline)
      "n"     reads a number
      num     reads num characters as a string

The call io.read("a") reads the whole current input file, starting at its current position. If we are at the end of the file, or if the file is empty, the call returns an empty string.

Because Lua handles long strings efficiently, a simple technique for writing filters in Lua is to read the whole file into a string, process the string, and then write the string to the output:

      t = io.read("a")                      -- read the whole file
      t = string.gsub(t, "bad", "good")     -- do the job
      io.write(t)                           -- write the file

As a more concrete example, the following chunk is a complete program to code a file’s content using the MIME quoted-printable encoding. This encoding codes each non-ASCII byte as =xx, where xx is the value of the byte in hexadecimal. To keep the consistency of the encoding, it must encode the equals sign as well:

      t = io.read("a")
      t = string.gsub(t, "([\128-\255=])", function (c)
            return string.format("=%02X", string.byte(c))
          end)
      io.write(t)

The function string.gsub will match all non-ASCII bytes (codes from 128 to 255), plus the equals sign, and call the given function to provide a replacement. (We will discuss pattern matching in detail in Chapter 10, Pattern Matching.)
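To see the substitution at work on a known input, we can wrap it in a function and apply it to literal strings. This is only a sketch: the name qpEncode is hypothetical, and the function covers just the byte substitution described above, not the other rules of the quoted-printable encoding.

```lua
-- qpEncode is a hypothetical name for the substitution described above
function qpEncode (s)
  -- the extra parentheses discard the second result of gsub (the match count)
  return (string.gsub(s, "([\128-\255=])", function (c)
    return string.format("=%02X", string.byte(c))
  end))
end

io.write(qpEncode("1+1=2"), "\n")      --> 1+1=3D2
io.write(qpEncode("caf\233"), "\n")    --> caf=E9
```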

The call io.read("l") returns the next line from the current input stream, without the newline character; the call io.read("L") is similar, but it keeps the newline (if present in the file). When we reach the end of file, the call returns nil, as there is no next line to return. Option "l" is the default for read. Usually, I use this option only when the algorithm naturally handles the data line by line; otherwise, I favor reading the whole file at once, with option "a", or in blocks, as we will see later.

As a simple example of the use of line-oriented input, the following program copies its current input to the current output, numbering each line:

      for count = 1, math.huge do
        local line = io.read("L")
        if line == nil then break end
        io.write(string.format("%6d  ", count), line)
      end

However, to iterate on a whole file line by line, the io.lines iterator allows simpler code:

      local count = 0
      for line in io.lines() do
        count = count + 1
        io.write(string.format("%6d  ", count), line, "\n")
      end

As another example of line-oriented input, Figure 7.1, “A program to sort a file” shows a complete program to sort the lines of a file.

The call io.read("n") reads a number from the current input stream. This is the only case where read returns a number (integer or float, following the same rules of the Lua scanner) instead of a string. If, after skipping spaces, io.read cannot find a numeral at the current file position (because of bad format or end of file), it returns nil.

Besides the basic read patterns, we can call read with a number n as an argument: in this case, it tries to read n characters from the input stream. If it cannot read any character (end of file), the call returns nil; otherwise, it returns a string with at most n characters from the stream. As an example of this read pattern, the following program is an efficient way to copy a file from stdin to stdout:

      while true do
        local block = io.read(2^13)          -- block size is 8K
        if not block then break end
        io.write(block)
      end

As a special case, io.read(0) works as a test for end of file: it returns an empty string if there is more to be read or nil otherwise.

We can call read with multiple options; for each argument, the function will return the respective result. Suppose we have a file with three numbers per line:

      6.0       -3.23     15e12
      4.3       234       1000001
      ...

Now we want to print the maximum value of each line. We can read all three numbers of each line with a single call to read:

      while true do
        local n1, n2, n3 = io.read("n", "n", "n")
        if not n1 then break end
        print(math.max(n1, n2, n3))
      end

The simple I/O model is convenient for simple things, but it is not enough for more advanced file manipulation, such as reading from or writing to several files simultaneously. For these manipulations, we need the complete model.

To open a file, we use the function io.open, which mimics the C function fopen. It takes as arguments the name of the file to open plus a mode string. This mode string can contain an r for reading, a w for writing (which also erases any previous content of the file), or an a for appending, plus an optional b to open binary files. The function open returns a new stream over the file. In case of error, open returns nil, plus an error message and a system-dependent error number:

      print(io.open("non-existent-file", "r"))
        --> nil     non-existent-file: No such file or directory    2
      
      print(io.open("/etc/passwd", "w"))
        --> nil     /etc/passwd: Permission denied  13

A typical idiom to check for errors is to use the function assert:

      local f = assert(io.open(filename, mode))

If the open fails, io.open returns nil plus the error message; the message then goes as the second argument to assert, which raises an error with that message.
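When aborting the program is not what we want, we can test the results of io.open ourselves instead of passing them to assert. A minimal sketch, reusing the file name from the earlier example (assumed not to exist):

```lua
local f, msg, code = io.open("non-existent-file", "r")
if not f then
  -- both the message text and the code are system dependent
  print("open failed: " .. msg)
  print("error code: " .. code)
else
  f:close()
end
```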

After we open a file, we can read from or write to the resulting stream with the methods read and write. They are similar to the functions read and write, but we call them as methods on the stream object, using the colon operator. For instance, to open a file and read it all, we can use a fragment like this:

      local f = assert(io.open(filename, "r"))
      local t = f:read("a")
      f:close()

(We will discuss the colon operator in detail in Chapter 21, Object-Oriented Programming.)

The I/O library offers handles for the three predefined C streams, called io.stdin, io.stdout, and io.stderr. For instance, we can send a message directly to the error stream with code like this:

      io.stderr:write(message)

The functions io.input and io.output allow us to mix the complete model with the simple model. We get the current input stream by calling io.input(), without arguments. We set this stream with the call io.input(handle). (Similar calls are also valid for io.output.) For instance, if we want to change the current input stream temporarily, we can write something like this:

      local temp = io.input()     -- save current stream
      io.input("newinput")        -- open a new current stream
      -- do something with new input
      io.input():close()          -- close current stream
      io.input(temp)              -- restore previous current stream

Note that io.read(args) is actually a shorthand for io.input():read(args), that is, the read method applied over the current input stream. Similarly, io.write(args) is a shorthand for io.output():write(args).
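As a sketch of how the two models combine, the following function makes a given stream the current input, reuses the simple-model loop that reads three numbers per line, and then restores the previous stream. The name maxPerLine is hypothetical, and the code assumes Lua 5.3, where the "n" option needs no asterisk.

```lua
-- maxPerLine is a hypothetical helper: it reads triples of numbers from
-- 'stream' through the simple model and returns a list with their maxima
function maxPerLine (stream)
  local saved = io.input()       -- save current input stream
  io.input(stream)               -- make 'stream' the current input
  local result = {}
  while true do
    local n1, n2, n3 = io.read("n", "n", "n")
    if not n1 then break end
    result[#result + 1] = math.max(n1, n2, n3)
  end
  io.input(saved)                -- restore previous current stream
  return result
end
```

The caller remains responsible for closing the stream it passed in.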

Instead of io.read, we can also use io.lines to read from a stream. As we saw in previous examples, io.lines gives an iterator that repeatedly reads from a stream. Given a file name, io.lines will open a stream over the file in read mode and will close it after reaching end of file. When called with no arguments, io.lines will read from the current input stream. We can also use lines as a method over handles. Moreover, since Lua 5.2, io.lines accepts the same options that io.read accepts. As an example, the next fragment copies the current input to the current output, iterating over blocks of 8 KB:

      for block in io.input():lines(2^13) do
        io.write(block)
      end
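The form that takes a file name can be sketched in the same way. In the following function, whose name countLines is hypothetical, io.lines opens the file itself and closes it when the loop reaches end of file; if the file cannot be opened, io.lines raises an error.

```lua
-- countLines is a hypothetical helper
function countLines (filename)
  local count = 0
  for _ in io.lines(filename) do   -- opens the file; closes it at end of file
    count = count + 1
  end
  return count
end
```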

The function io.tmpfile returns a stream over a temporary file, open in read/write mode. This file is automatically removed (deleted) when the program ends.
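A temporary file is handy as a scratch stream. The following sketch writes a string to one, rewinds it with the seek method (described below), and reads the content back; it also revisits two end-of-file behaviors seen earlier, namely that read(0) returns nil at end of file while option "a" returns an empty string. The name roundTrip is hypothetical.

```lua
-- roundTrip is a hypothetical helper
function roundTrip (text)
  local f = assert(io.tmpfile())
  f:write(text)
  f:seek("set")                    -- rewind before reading what we wrote
  local back = f:read("a")         -- the whole content
  local atEof = (f:read(0) == nil) -- read(0) gives nil at end of file
  local again = f:read("a")        -- "a" gives "" at end of file
  f:close()
  return back, atEof, again
end
```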

The function flush executes all pending writes to a file. Like the function write, we can call it as a function —io.flush()— to flush the current output stream, or as a method —f:flush()— to flush the stream f.

The setvbuf method sets the buffering mode of a stream. Its first argument is a string: "no" means no buffering; "full" means that the stream data is only written out when the buffer is full or when we explicitly flush the file; and "line" means that the output is buffered until a newline is output or there is any input from some special files (such as a terminal device). For the last two options, setvbuf accepts an optional second argument with the buffer size.

In most systems, the standard error stream (io.stderr) is not buffered, while the standard output stream (io.stdout) is buffered in line mode. So, if we write incomplete lines to the standard output (e.g., a progress indicator), we may need to flush the stream to see that output.
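A progress indicator of that kind could look like the following sketch, where the name progress is hypothetical. Each dot is an incomplete line, so we flush after writing it:

```lua
-- progress is a hypothetical helper that prints one dot per step
function progress (steps)
  for i = 1, steps do
    io.write(".")
    io.flush()        -- make the dot visible immediately
  end
  io.write("\n")
end

progress(3)
```

An alternative, under the same assumptions, is to call io.stdout:setvbuf("no") once, before the program writes anything to the standard output.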

The seek method can both get and set the current position of a stream in a file. Its general form is f:seek(whence, offset), where the whence parameter is a string that specifies how to interpret the offset. Its valid values are "set", for offsets relative to the beginning of the file; "cur", for offsets relative to the current position in the file; and "end", for offsets relative to the end of the file. Independently of the value of whence, the call returns the new current position of the stream, measured in bytes from the beginning of the file.

The default value for whence is "cur" and for offset is zero. Therefore, the call file:seek() returns the current stream position, without changing it; the call file:seek("set") resets the position to the beginning of the file (and returns zero); and the call file:seek("end") sets the position to the end of the file and returns its size. The following function gets the file size without changing its current position:

      function fsize (file)
        local current = file:seek()      -- save current position
        local size = file:seek("end")    -- get file size
        file:seek("set", current)        -- restore position
        return size
      end
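To make the results of seek concrete, the next sketch applies several calls to a temporary file holding ten bytes. The values in the comments follow from the rules above, assuming each call succeeds:

```lua
local f = assert(io.tmpfile())
f:write("0123456789")
print(f:seek())             --> 10   (current position, after the write)
print(f:seek("set"))        --> 0    (back to the beginning)
print(f:seek("cur", 3))     --> 3    (three bytes forward)
print(f:read(2))            --> 34
print(f:seek("end", -1))    --> 9    (one byte before the end)
print(f:seek("end"))        --> 10   (the file size)
f:close()
```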

To complete the set, os.rename changes the name of a file and os.remove removes (deletes) a file. Note that these functions come from the os library, not the io library, because they manipulate real files, not streams.

All these functions return nil plus an error message and an error code in case of errors.
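As an illustration of os.rename and os.remove, the following sketch creates a scratch file, renames it, and then removes it. It assumes that os.tmpname returns a name we are allowed to write to; the ".bak" suffix is arbitrary.

```lua
local name = os.tmpname()              -- a fresh temporary file name
local f = assert(io.open(name, "w"))
f:write("scratch data\n")
f:close()

local backup = name .. ".bak"          -- arbitrary name for the renamed file
assert(os.rename(name, backup))        -- true on success
print(os.remove(name))                 -- nil, message, code: 'name' no longer exists
assert(os.remove(backup))              -- true on success
```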

The function os.exit terminates the execution of a program. Its optional first argument is the return status of the program. It can be a number (zero means a successful execution) or a Boolean (true means a successful execution). An optional second argument, if true, closes the Lua state, calling all finalizers and releasing all memory used by that state. (Usually this finalization is not necessary, because most operating systems release all resources used by a process when it exits.)

The function os.getenv gets the value of an environment variable. It takes the name of the variable and returns a string with its value:

      print(os.getenv("HOME"))    --> /home/lua

The call returns nil for undefined variables.
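Because an undefined variable gives nil, we can supply fallbacks with the or operator. In the following sketch, HOME is the usual variable on POSIX systems and USERPROFILE on Windows; the final "." is an arbitrary default chosen for the example:

```lua
-- the fallback chain is an assumption for this example
local home = os.getenv("HOME") or os.getenv("USERPROFILE") or "."
print(home)
```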

The function os.execute runs a system command; it is equivalent to the C function system. It takes a string with the command and returns information regarding how the command terminated. The first result tells whether the command succeeded: it is true if the program exited with no errors and a false value (nil) otherwise. The second result is a string: "exit" if the program terminated normally or "signal" if it was interrupted by a signal. A third result is the return status (if the program terminated normally) or the number of the signal that terminated the program. As an example, both in POSIX and Windows we can use the following function to create new directories:

      function createDir (dirname)
        os.execute("mkdir " .. dirname)
      end
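To inspect the three results of os.execute, we can run a command whose exit status we know in advance. The sketch below assumes Lua 5.2 or later and a system shell that understands exit, which is the case for both POSIX shells and the Windows command interpreter:

```lua
-- "exit 3" asks the shell itself to terminate with status 3
local ok, how, code = os.execute("exit 3")
if not ok then
  print("command failed: " .. how .. " " .. code)   --> command failed: exit 3
end
```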

Another quite useful function is io.popen.[10] Like os.execute, it runs a system command, but it also connects the command output (or input) to a new local stream and returns that stream, so that our script can read data from (or write to) the command. For instance, the following script builds a table with the entries in the current directory:

      -- for POSIX systems, use 'ls' instead of 'dir'
      local f = io.popen("dir /B", "r")
      local dir = {}
      for entry in f:lines() do
        dir[#dir + 1] = entry
      end
      f:close()                   -- close the stream after reading

The second parameter ("r") to io.popen means that we intend to read from the command. The default is to read, so this parameter is optional in the example.
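When we want the whole output of a command as a single string, we can combine io.popen with the "a" option. The following is a sketch, assuming an installation where io.popen is available; the name capture is hypothetical, and the echo command exists in both POSIX and Windows shells.

```lua
-- capture is a hypothetical helper
function capture (cmd)
  local f = assert(io.popen(cmd, "r"))
  local out = f:read("a")     -- everything the command writes to its output
  f:close()
  return out
end

io.write(capture("echo hello"))    --> hello
```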

The next example sends an email message:

      local subject = "some news"
      local address = "someone@somewhere.org"
      
      local cmd = string.format("mail -s '%s' '%s'", subject, address)
      local f = io.popen(cmd, "w")
      f:write([[
      Nothing important to say.
      -- me
      ]])
      f:close()

(This script only works on POSIX systems, with the appropriate packages installed.) The second parameter to io.popen now is "w", meaning that we intend to write to the command.

As we can see from those two previous examples, both os.execute and io.popen are powerful functions, but they are also highly system dependent.

For extended OS access, your best option is to use an external Lua library, such as LuaFileSystem, for basic manipulation of directories and file attributes, or luaposix, which offers much of the functionality of the POSIX.1 standard.

Exercise 7.1: Write a program that reads a text file and rewrites it with its lines sorted in alphabetical order. When called with no arguments, it should read from standard input and write to standard output. When called with one file-name argument, it should read from that file and write to standard output. When called with two file-name arguments, it should read from the first file and write to the second.

Exercise 7.2: Change the previous program so that it asks for confirmation if the user gives the name of an existing file for its output.

Exercise 7.3: Compare the performance of Lua programs that copy the standard input stream to the standard output stream in the following ways:

  • byte by byte;

  • line by line;

  • in chunks of 8 KB;

  • the whole file at once.

For the last option, how large can the input file be?

Exercise 7.4: Write a program that prints the last line of a text file. Try to avoid reading the entire file when the file is large and seekable.

Exercise 7.5: Generalize the previous program so that it prints the last n lines of a text file. Again, try to avoid reading the entire file when the file is large and seekable.

Exercise 7.6: Using os.execute and io.popen, write functions to create a directory, to remove a directory, and to collect the entries in a directory.

Exercise 7.7: Can you use os.execute to change the current directory of your Lua script? Why?



[9] In Lua 5.2 and before, all string options should be preceded by an asterisk. Lua 5.3 still accepts the asterisk for compatibility.

[10] This function is not available in all Lua installations, because the corresponding functionality is not part of ISO C. Despite not being standard in C, we included it in the standard libraries due to its generality and presence in major operating systems.
