Our next complete program is an implementation of the Markov chain algorithm, described by Kernighan & Pike in their book The Practice of Programming (Addison-Wesley, 1999).
The program generates pseudo-random text based on what words can follow a sequence of n previous words in a base text. For this implementation, we will assume that n is two.
The first part of the program reads the base text and builds a table that, for each prefix of two words, gives a list of the words that follow that prefix in the text. After building the table, the program uses it to generate random text, wherein each word follows two previous words with the same probability as in the base text. As a result, we have text that is very, but not quite, random. For instance, when applied to this book, the output of the program has pieces like this: “Constructors can also traverse a table constructor, then the parentheses in the following line does the whole file in a field n to store the contents of each function, but to show its only argument. If you want to find the maximum element in an array can return both the maximum value and continues showing the prompt and running the code. The following words are reserved and cannot be used to convert between degrees and radians.”
To use a two-word prefix as a key in tables, we will represent it by the two words concatenated with a space in between:
function prefix (w1, w2) return w1 .. " " .. w2 end
We use the string NOWORD
(a newline)
to initialize the prefix words and
to mark the end of the text.
For instance, for the text
"the more we try the more we do"
the table of following words would be like this:
{ ["\n \n"] = {"the"}, ["\n the"] = {"more"}, ["the more"] = {"we", "we"}, ["more we"] = {"try", "do"}, ["we try"] = {"the"}, ["try the"] = {"more"}, ["we do"] = {"\n"}, }
The program keeps its table in the variable statetab
.
To insert a new word in a list in this table,
we use the following function:
function insert (prefix, value) local list = statetab[prefix] if list == nil then statetab[prefix] = {value} else list[#list + 1] = value end end
It first checks whether that prefix already has a list; if not, it creates a new one with the new value. Otherwise, it inserts the new value at the end of the existing list.
To build the statetab
table, we keep two variables,
w1
and w2
, with the last two words read.
We read the words using the iterator allwords
,
from the section called “Iterators and Closures”,
but we adapted the definition of “word” to include
optional punctuation marks,
such as commas and periods (see Figure 19.1, “Auxiliary definitions for the Markov program”).
For each new word read,
we add it to the list associated with w1
–w2
and then update w1
and w2
.
After building the table,
the program starts to generate a text with MAXGEN
words.
First, it re-initializes variables w1
and w2
.
Then, for each prefix,
it chooses a next word randomly from the list of valid next words,
prints this word, and updates w1
and w2
.
Figure 19.1, “Auxiliary definitions for the Markov program” and Figure 19.2, “The Markov program” show the complete program.
Figure 19.1. Auxiliary definitions for the Markov program
function allwords () local line = io.read() -- current line local pos = 1 -- current position in the line return function () -- iterator function while line do -- repeat while there are lines local w, e = string.match(line, "(%w+[,;.:]?)()", pos) if w then -- found a word? pos = e -- update next position return w -- return the word else line = io.read() -- word not found; try next line pos = 1 -- restart from first position end end return nil -- no more lines: end of traversal end end function prefix (w1, w2) return w1 .. " " .. w2 end local statetab = {} function insert (prefix, value) local list = statetab[prefix] if list == nil then statetab[prefix] = {value} else list[#list + 1] = value end end
Figure 19.2. The Markov program
local MAXGEN = 200 local NOWORD = "\n" -- build table local w1, w2 = NOWORD, NOWORD for nextword in allwords() do insert(prefix(w1, w2), nextword) w1 = w2; w2 = nextword; end insert(prefix(w1, w2), NOWORD) -- generate text w1 = NOWORD; w2 = NOWORD -- reinitialize for i = 1, MAXGEN do local list = statetab[prefix(w1, w2)] -- choose a random item from list local r = math.random(#list) local nextword = list[r] if nextword == NOWORD then return end io.write(nextword, " ") w1 = w2; w2 = nextword end
Personal copy of Eric Taylor <jdslkgjf.iapgjflksfg@yandex.com>