Chapter 12. Fortunate Son

Now I laugh and make a fortune / Off the same ones that I tortured

They Might Be Giants, “Kiss Me, Son of God” (1988)

In this chapter, you will create a Rust version of the fortune program that will print a randomly selected aphorism or bit of trivia or interesting ASCII art¹ from a database of text files. The program gets its name from a fortune cookie, a crisp cookie that contains a small piece of paper printed with a short bit of text that might be a fortune like “You will take a trip soon” or that might be a short joke or saying. When I was first learning to use a Unix terminal in my undergraduate days,² a successful login would often include the output from fortune.

You will learn how to do the following:

Use the Path and PathBuf structs to represent system paths
Parse records of text spanning multiple lines from a file
Use randomness and control it with seeds
Use the OsStr and OsString types to represent filenames

How fortune Works

I will start by describing how fortune works so you will have an idea of what your version will need to do. You may first need to install the program,³ as it is often not present by default. Here’s a bit of the manual page, which you can read with man fortune:

NAME
       fortune - print a random, hopefully interesting, adage

SYNOPSIS
       fortune [-acefilosuw] [-n length] [ -m pattern] [[n%] file/dir/all]

DESCRIPTION
       When  fortune  is run with no arguments it prints out a random epigram.
       Epigrams are divided into several categories, where  each  category  is
       sub-divided  into those which are potentially offensive and those which
       are not.

The original program has many options, but the challenge program will be concerned only with the following:

  -m pattern
         Print out all fortunes which match the basic regular  expression
         pattern.   The  syntax  of these expressions depends on how your
         system defines re_comp(3) or regcomp(3), but it should neverthe-
         less be similar to the syntax used in grep(1).

         The  fortunes  are output to standard output, while the names of
         the file from which each fortune comes are printed  to  standard
         error.   Either or both can be redirected; if standard output is
         redirected to a file, the result is a  valid  fortunes  database
         file.   If  standard  error is also redirected to this file, the
         result is still valid, but there  will  be  ''bogus''  fortunes,
         i.e. the filenames themselves, in parentheses.  This can be use-
         ful if you wish to remove the gathered matches from their origi-
         nal  files,  since each filename-record will precede the records
         from the file it names.

  -i     Ignore case for -m patterns.

When the fortune program is run with no arguments, it will randomly choose and print some text:

$ fortune
Laughter is the closest distance between two people.
		-- Victor Borge

Whence does this text originate? The manual page notes that you can supply one or more files or directories of the text sources. If no files are given, then the program will read from some default location. On my laptop, this is what the manual page says:

FILES
       Note: these are the defaults as defined at compile time.

       /opt/homebrew/Cellar/fortune/9708/share/games/fortunes
              Directory for inoffensive fortunes.
       /opt/homebrew/Cellar/fortune/9708/share/games/fortunes/off
              Directory for offensive fortunes.

I created a few representative files in the 12_fortuner/tests/inputs directory for testing purposes, along with an empty directory:

$ cd 12_fortuner
$ ls tests/inputs/
ascii-art   empty/      jokes       literature  quotes

Use head to look at the structure of a file. A fortune record can span multiple lines and is terminated with a percent sign (%) on a line by itself:

$ head -n 9 tests/inputs/jokes
Q. What do you call a head of lettuce in a shirt and tie?
A. Collared greens.
%
Q: Why did the gardener quit his job?
A: His celery wasn't high enough.
%
Q. Why did the honeydew couple get married in a church?
A. Their parents told them they cantaloupe.
%
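As a minimal sketch of this record format, the following function splits a string into records on lines that contain only a percent sign. The name split_records is hypothetical and is not part of the chapter's solution, which reads from files rather than from a string. The sketch also assumes that empty records are skipped and that trailing text with no terminating % is discarded:

```rust
/// Split text into records, each terminated by a line containing only `%`.
/// Empty records (two `%` lines in a row) are skipped, and any trailing
/// lines that are never terminated by `%` are dropped.
fn split_records(text: &str) -> Vec<String> {
    let mut records = Vec::new();
    let mut buffer: Vec<&str> = Vec::new();

    for line in text.lines() {
        if line == "%" {
            // A terminator closes the current record, if there is one
            if !buffer.is_empty() {
                records.push(buffer.join("\n"));
                buffer.clear();
            }
        } else {
            // Any other line belongs to the record being collected
            buffer.push(line);
        }
    }

    records
}
```

Given the first two jokes shown above, this would return two strings, each holding a question and its answer joined by a newline.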

You can tell fortune to read a particular file like tests/inputs/ascii-art, but first you will need to use the program strfile to create index files for randomly selecting the text records. I have provided a bash script called mk-dat.sh in the 12_fortuner directory that will index the files in the tests/inputs directory. After running this program, each input file should have a companion file ending in .dat:

$ ls -1 tests/inputs/
ascii-art
ascii-art.dat
empty/
jokes
jokes.dat
literature
literature.dat
quotes
quotes.dat

Now you should be able to run the following command to, for instance, randomly select a bit of ASCII art. You may or may not see a cute frog:

$ fortune tests/inputs/ascii-art
           .--._.--.
          ( O     O )
          /   . .   \
         .`._______.'.
        /(           )\
      _/  \  \   /  /  \_
   .~   `  \  \ /  /  '   ~.
  {    -.   \  V  /   .-    }
_ _`.    \  |  |  |  /    .'_ _
>_       _} |  |  | {_       _<
 /. - ~ ,_-'  .^.  `-_, ~ - .\
         '-'|/   \|`-`

You can also supply the tests/inputs directory to tell fortune to select a record from any of the files therein:

$ fortune tests/inputs
A classic is something that everyone wants to have read
and nobody wants to read.
		-- Mark Twain, "The Disappearance of Literature"

If a provided path does not exist, fortune will immediately halt with an error. Here I’ll use blargh for a nonexistent file:

$ fortune tests/inputs/jokes blargh tests/inputs/ascii-art
blargh: No such file or directory

Oddly, if the input source exists but is not readable, one version of fortune will complain that the file does not exist and produce no further output:

$ touch hammer && chmod 000 hammer
$ fortune hammer
hammer: No such file or directory

Another version explains that the file is not readable and informs the user that no fortunes were available for choosing:

$ fortune hammer
/home/u20/kyclark/hammer: Permission denied
No fortunes found

Using the -m option, I can search for all the text records matching a given string. The output will include a header printed to STDERR listing the filename that contains the records followed by the records printed to STDOUT. For instance, here are all the quotes by Yogi Berra:

$ fortune -m 'Yogi Berra' tests/inputs/
(quotes)
%
It's like deja vu all over again.
-- Yogi Berra
%
You can observe a lot just by watching.
-- Yogi Berra
%

If I search for Mark Twain and redirect both STDERR and STDOUT to files, I find that quotes of his are found in the literature and quotes files. Note that the headers printed to STDERR include only the basename of the file, like literature, and not the full path, like tests/inputs/literature:

$ fortune -m 'Mark Twain' tests/inputs/ 1>out 2>err
$ cat err
(literature)
%
(quotes)
%

Searching is case-sensitive by default, so searching for lowercase yogi berra will return no results. I must use the -i flag to perform case-insensitive matching:

$ fortune -i -m 'yogi berra' tests/inputs/
(quotes)
%
It's like deja vu all over again.
-- Yogi Berra
%
You can observe a lot just by watching.
-- Yogi Berra
%

While fortune can do a few more things, this is as much as the challenge program will re-create.

Getting Started

The challenge program for this chapter will be called fortuner (pronounced for-chu-ner) for a Rust version of fortune. You should begin with cargo new fortuner, and then add the following dependencies to your Cargo.toml:

[dependencies]
clap = "2.33"
rand = "0.8"
walkdir = "2"
regex = "1"

[dev-dependencies]
assert_cmd = "2"
predicates = "2"

Copy the book’s 12_fortuner/tests directory into your project. Run cargo test to build the program and run the tests, all of which should fail.

Defining the Arguments

Update your src/main.rs to the following:

fn main() {
    if let Err(e) = fortuner::get_args().and_then(fortuner::run) {
        eprintln!("{}", e);
        std::process::exit(1);
    }
}



Start your src/lib.rs with the following code to define the program’s arguments:

use clap::{App, Arg};
use std::error::Error;
use regex::{Regex, RegexBuilder};

type MyResult<T> = Result<T, Box<dyn Error>>;

#[derive(Debug)]
pub struct Config {
    sources: Vec<String>, 
    pattern: Option<Regex>, 
    seed: Option<u64>, 
}


The sources argument is a list of files or directories.

The pattern to filter fortunes is an optional regular expression.

The seed is an optional u64 value to control random selections.

Note
As in Chapter 9, I use the -i|--insensitive flag with RegexBuilder, so you’ll note that my Config does not have a place for this flag.


Seeding Random Number Generators
The challenge program will randomly choose some text to show, but computers don’t usually make completely random choices.
As Robert R. Coveyou stated, “Random number generation is too important to be left to chance.”⁴
The challenge program will use a pseudorandom number generator (PRNG) that will always make the same selection following from some starting value, often called a seed.
That is, for any given seed, the same “random” choices will follow.
This makes it possible to test pseudorandom programs because we can use a known seed to verify that it produces some expected output.
I’ll be using the rand crate to create a PRNG, optionally using the config.seed value when present.
When no seed is present, then the program will make a different pseudorandom choice based on some other random input and so will appear to actually be random.
For more information, consult “The Rust Rand Book”.
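To make the idea of a seed concrete, here is a toy generator written with only the standard library. TinyRng is a hypothetical name, and this linear congruential generator is an assumption chosen for brevity; it is not the algorithm the rand crate uses and is not suitable for real programs. It only illustrates that the same seed always yields the same sequence of choices:

```rust
/// A toy linear congruential generator (LCG) for illustration only.
/// This is NOT how the rand crate generates numbers.
struct TinyRng {
    state: u64,
}

impl TinyRng {
    /// Start the generator from a known seed.
    fn new(seed: u64) -> Self {
        TinyRng { state: seed }
    }

    /// Advance the state and return it as the next value.
    /// The constants are the widely cited ones from Knuth's MMIX LCG.
    fn next_u64(&mut self) -> u64 {
        self.state = self
            .state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.state
    }

    /// Pick an index in 0..len, or None when there is nothing to pick from.
    /// The high bits are used because the low bits of an LCG are weak.
    fn pick_index(&mut self, len: usize) -> Option<usize> {
        if len == 0 {
            None
        } else {
            Some(((self.next_u64() >> 33) as usize) % len)
        }
    }
}
```

Two TinyRng values created with the same seed will return identical results from every call, which is the property that makes a seeded program testable.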


You can start your get_args with the following:

pub fn get_args() -> MyResult<Config> {
    let matches = App::new("fortuner")
        .version("0.1.0")
        .author("Ken Youens-Clark <kyclark@gmail.com>")
        .about("Rust fortune")
        // What goes here?
        .get_matches();

    Ok(Config {
        sources: ...,
        seed: ...,
        pattern: ...,
    })
}


I suggest you start your run by printing the config:

pub fn run(config: Config) -> MyResult<()> {
    println!("{:#?}", config);
    Ok(())
}


Your program should be able to print a usage statement like the following:


$ cargo run -- -h
fortuner 0.1.0
Ken Youens-Clark <kyclark@gmail.com>
Rust fortune

USAGE:
    fortuner [FLAGS] [OPTIONS] <FILE>...

FLAGS:
    -h, --help           Prints help information
    -i, --insensitive    Case-insensitive pattern matching
    -V, --version        Prints version information

OPTIONS:
    -m, --pattern <PATTERN>    Pattern
    -s, --seed <SEED>          Random seed

ARGS:
    <FILE>...    Input files or directories

Unlike the original fortune, the challenge program will require one or more input files or directories.
When run with no arguments, it should halt and print the usage:


$ cargo run
error: The following required arguments were not provided:
    <FILE>...

USAGE:
    fortuner [FLAGS] [OPTIONS] <FILE>...

Verify that the arguments are parsed correctly:

$ cargo run -- ./tests/inputs -m 'Yogi Berra' -s 1
Config {
    sources: [
        "./tests/inputs", 
    ],
    pattern: Some( 
        Yogi Berra,
    ),
    seed: Some( 
        1,
    ),
}


Positional arguments should be interpreted as sources.


The -m option should be parsed as a regular expression for the pattern.


The -s option should be parsed as a u64, if present.



An invalid regular expression should be rejected at this point.
As noted in Chapter 9, for instance, a lone asterisk is not a valid regex:

$ cargo run -- ./tests/inputs -m "*"
Invalid --pattern "*"

Likewise, any value for the --seed that cannot be parsed as a u64 should also be rejected:


$ cargo run -- ./tests/inputs -s blargh
"blargh" not a valid integer

This means you will once again need some way to parse and validate a command-line argument as an integer.
You’ve written functions like this in several previous chapters, but parse_positive_int from Chapter 4 is probably most similar to what you need.
In this case, however, 0 is an acceptable value.
You might start with this:


fn parse_u64(val: &str) -> MyResult<u64> {
    unimplemented!();
}


Add the following unit test to src/lib.rs:


#[cfg(test)]
mod tests {
    use super::parse_u64;

    #[test]
    fn test_parse_u64() {
        let res = parse_u64("a");
        assert!(res.is_err());
        assert_eq!(res.unwrap_err().to_string(), "\"a\" not a valid integer");

        let res = parse_u64("0");
        assert!(res.is_ok());
        assert_eq!(res.unwrap(), 0);

        let res = parse_u64("4");
        assert!(res.is_ok());
        assert_eq!(res.unwrap(), 4);
    }
}

Note
Stop here and get your code working to this point. Be sure your program can pass cargo test parse_u64.


Here is how I wrote the parse_u64 function:


fn parse_u64(val: &str) -> MyResult<u64> {
    val.parse() 
        .map_err(|_| format!("\"{}\" not a valid integer", val).into()) 
}


Parse the value as a u64, which Rust infers from the return type.

In the event of an error, create a useful error message using the given value.


Following is how I define the arguments in my get_args:


pub fn get_args() -> MyResult<Config> {
    let matches = App::new("fortuner")
        .version("0.1.0")
        .author("Ken Youens-Clark <kyclark@gmail.com>")
        .about("Rust fortune")
        .arg(
            Arg::with_name("sources")
                .value_name("FILE")
                .multiple(true)
                .required(true)
                .help("Input files or directories"),
        )
        .arg(
            Arg::with_name("pattern")
                .value_name("PATTERN")
                .short("m")
                .long("pattern")
                .help("Pattern"),
        )
        .arg(
            Arg::with_name("insensitive")
                .short("i")
                .long("insensitive")
                .help("Case-insensitive pattern matching")
                .takes_value(false),
        )
        .arg(
            Arg::with_name("seed")
                .value_name("SEED")
                .short("s")
                .long("seed")
                .help("Random seed"),
        )
        .get_matches();


I use the --insensitive flag with regex::RegexBuilder to create a regular expression that might be case-insensitive before returning the Config:


    let pattern = matches
        .value_of("pattern") 
        .map(|val| { 
            RegexBuilder::new(val) 
                .case_insensitive(matches.is_present("insensitive")) 
                .build() 
                .map_err(|_| format!("Invalid --pattern \"{}\"", val)) 
        })
        .transpose()?; 


ArgMatches::value_of will return Option<&str>.


Use Option::map to handle Some(val).

Call RegexBuilder::new with the given value.


The RegexBuilder::case_insensitive method will cause the regex to disregard case in comparisons when the insensitive flag is present.


The RegexBuilder::build method will compile the regex.

If build returns an error, use Result::map_err to create an error message stating that the given pattern is invalid.


The result of Option::map will be an Option<Result>, and Option::transpose will turn this into a Result<Option>. Use ? to fail on an invalid regex.



Finally, I return the Config:

    Ok(Config {
        sources: matches.values_of_lossy("sources").unwrap(), 
        seed: matches.value_of("seed").map(parse_u64).transpose()?, 
        pattern,
    })
}


There should be at least one value in sources, so it is safe to call Option::unwrap.

Attempt to parse the seed value as a u64. Transpose the result and use ? to bail on a bad input.
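The map-then-transpose idiom used for both the pattern and the seed can be tried in isolation. The following sketch uses only the standard library; parse_optional_u64 is a hypothetical helper, and it returns the standard ParseIntError rather than the custom message produced by parse_u64:

```rust
use std::num::ParseIntError;

/// Parse an optional string as a u64.
/// A missing value is fine (Ok(None)); a present but invalid value is an error.
fn parse_optional_u64(val: Option<&str>) -> Result<Option<u64>, ParseIntError> {
    val.map(|s| s.parse::<u64>()) // Option<Result<u64, ParseIntError>>
        .transpose() // Result<Option<u64>, ParseIntError>
}
```

Because the outer type is now a Result, a caller can append ? to bail on bad input and still keep an Option for the value that may not have been supplied.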



Finding the Input Sources

You are free to write your solution however you see fit so long as it passes the integration tests.
This is a rather complicated program, so I’m going to break it into many small, testable functions to help you arrive at a solution.
If you want to follow my lead, then the next order of business is finding the input files from the given sources, which might be filenames or directories.
When a source is a directory, all the files in the directory will be used.
To read the fortune files, the fortune program requires the *.dat files created by strfile.
These are binary files that contain data for randomly accessing the records.
The challenge program will not use these and so should skip them, if present.
If you ran the mk-dat.sh program, you can either remove the *.dat files from tests/inputs or include logic in your program to skip them.


I decided to write a function to find all the files in a list of paths provided by the user.
While I could return the files as strings, I want to introduce you to a couple of useful structs Rust has for representing paths.
The first is Path, which, according to the documentation, “supports a number of operations for inspecting a path, including breaking the path into its components (separated by / on Unix and by either / or \ on Windows), extracting the file name, determining whether the path is absolute, and so on.”
That sounds really useful, so you might think my function should return the results as Path objects, but the documentation notes: “This is an unsized type, meaning that it must always be used behind a pointer like & or Box. For an owned version of this type, see PathBuf.”


This leads us to PathBuf, the second useful struct for representing paths.
Just as String is an owned, modifiable version of &str, PathBuf is an owned, modifiable version of Path.
Returning a Path from my function would lead to compiler errors, as my code would be trying to reference dropped values, but there will be no such problem returning a PathBuf.
You are not required to use either of these structs, but they will make your program portable across operating systems and will save you a lot of work that’s been done to parse paths correctly.
Following is the signature of my find_files function, which you are welcome to use.
Be sure to add use std::path::PathBuf to your imports:


fn find_files(paths: &[String]) -> MyResult<Vec<PathBuf>> {
    unimplemented!();
}
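To see how the borrowed and owned types relate, consider this small sketch. The helpers basename and in_dir are hypothetical names used only for illustration and are not part of my solution. The first borrows a Path to inspect it, and the second builds a new, owned PathBuf:

```rust
use std::path::{Path, PathBuf};

/// Borrow a Path to inspect it, returning the final component as a String.
/// Returns None for paths such as "/" that have no filename.
fn basename(path: &Path) -> Option<String> {
    path.file_name()
        .map(|name| name.to_string_lossy().into_owned())
}

/// Build a new, owned PathBuf by appending a filename to a directory.
fn in_dir(dir: &Path, name: &str) -> PathBuf {
    let mut buf = dir.to_path_buf(); // owned, modifiable copy of the Path
    buf.push(name);
    buf
}
```

A function can hand back the PathBuf from in_dir freely because it owns its data, whereas a &Path is only valid for as long as the value it borrows from.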


Here is a unit test called test_find_files that you can add to your tests module:


#[cfg(test)]
mod tests {
    use super::{find_files, parse_u64}; 

    #[test]
    fn test_parse_u64() {} // Same as before

    #[test]
    fn test_find_files() {
        // Verify that the function finds a file known to exist
        let res = find_files(&["./tests/inputs/jokes".to_string()]);
        assert!(res.is_ok());

        let files = res.unwrap();
        assert_eq!(files.len(), 1);
        assert_eq!(
            files.get(0).unwrap().to_string_lossy(),
            "./tests/inputs/jokes"
        );

        // Fails to find a bad file
        let res = find_files(&["/path/does/not/exist".to_string()]);
        assert!(res.is_err());

        // Finds all the input files, excludes ".dat"
        let res = find_files(&["./tests/inputs".to_string()]);
        assert!(res.is_ok());

        // Check number and order of files
        let files = res.unwrap();
        assert_eq!(files.len(), 5); 
        let first = files.get(0).unwrap().display().to_string();
        assert!(first.contains("ascii-art"));
        let last = files.last().unwrap().display().to_string();
        assert!(last.contains("quotes"));

        // Test for multiple sources, path must be unique and sorted
        let res = find_files(&[
            "./tests/inputs/jokes".to_string(),
            "./tests/inputs/ascii-art".to_string(),
            "./tests/inputs/jokes".to_string(),
        ]);
        assert!(res.is_ok());
        let files = res.unwrap();
        assert_eq!(files.len(), 2);
        if let Some(filename) = files.first().unwrap().file_name() {
            assert_eq!(filename.to_string_lossy(), "ascii-art".to_string())
        }
        if let Some(filename) = files.last().unwrap().file_name() {
            assert_eq!(filename.to_string_lossy(), "jokes".to_string())
        }
    }
}


Add find_files to the imports.

The tests/inputs/empty directory contains the empty, hidden file .gitkeep so that Git will track this directory. If you choose to ignore empty files, you can change the expected number of files from five to four.


Note that the find_files function must return the paths in sorted order.
Different operating systems may return directory entries in different orders, which changes the order of the fortunes and makes the output difficult to test.
You will nip the problem in the bud if you return the files in a consistent, sorted order.
Furthermore, the returned paths should be unique, and you can use a combination of Vec::sort and Vec::dedup for this.

Note
Stop reading and write the function that will satisfy cargo test find_files.
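The combination of Vec::sort and Vec::dedup mentioned above can be sketched on its own. The function name sorted_unique is hypothetical. Note that Vec::dedup removes only consecutive duplicates, which is why the sort must come first:

```rust
use std::path::PathBuf;

/// Return the given paths in sorted order with duplicates removed.
fn sorted_unique(mut files: Vec<PathBuf>) -> Vec<PathBuf> {
    files.sort(); // bring equal paths next to each other
    files.dedup(); // drop consecutive repeats
    files
}
```

Passing the jokes file twice along with the ascii-art file would yield just two paths, with ascii-art first.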


Next, update your run function to print the found files:

pub fn run(config: Config) -> MyResult<()> {
    let files = find_files(&config.sources)?;
    println!("{:#?}", files);
    Ok(())
}


When given a list of existing, readable files, it should print them in order:

$ cargo run tests/inputs/jokes tests/inputs/ascii-art
[
    "tests/inputs/ascii-art",
    "tests/inputs/jokes",
]

Test your program to see if it will find the files (that don’t end with .dat) in the tests/inputs directory:

$ cargo run tests/inputs/
[
    "tests/inputs/ascii-art",
    "tests/inputs/empty/.gitkeep",
    "tests/inputs/jokes",
    "tests/inputs/literature",
    "tests/inputs/quotes",
]

Previous challenge programs in this book would note unreadable or nonexistent files and move on, but fortune dies immediately when given even one file it can’t use.
Be sure your program does the same if you provide an invalid file, such as the nonexistent blargh:


$ cargo run tests/inputs/jokes blargh tests/inputs/ascii-art
blargh: No such file or directory (os error 2)

Note that my version of find_files tries only to find files and does not try to open them, which means an unreadable file does not trigger a failure at this point:

$ touch hammer && chmod 000 hammer
$ cargo run -- hammer
[
    "hammer",
]


Reading the Fortune Files

Once you have found the input files, the next step is to read the records of text from them.
I wrote a function that accepts the list of found files and possibly returns a list of the contained fortunes.
When the program is run with the -m option to find all the matching fortunes for a given pattern, I will need both the fortune text and the source filename, so I decided to create a struct called Fortune to contain these.
If you want to use this idea, add the following to src/lib.rs, perhaps just after the Config struct:


#[derive(Debug)]
struct Fortune {
    source: String, 
    text: String, 
}


The source is the filename containing the record.

The text is the contents of the record up to but not including the terminating percent sign (%).


My read_fortunes function accepts a list of input paths and possibly returns a vector of Fortune structs.
In the event of a problem such as an unreadable file, the function will return an error.
If you would like to write this function, here is the signature you can use:


fn read_fortunes(paths: &[PathBuf]) -> MyResult<Vec<Fortune>> {
    unimplemented!();
}


Following is a test_read_fortunes unit test you can add to the tests module:


#[cfg(test)]
mod tests {
    use super::{find_files, parse_u64, read_fortunes, Fortune}; 
    use std::path::PathBuf;

    #[test]
    fn test_parse_u64() {} // Same as before

    #[test]
    fn test_find_files() {} // Same as before

    #[test]
    fn test_read_fortunes() {
        // One input file
        let res = read_fortunes(&[PathBuf::from("./tests/inputs/jokes")]);
        assert!(res.is_ok());

        if let Ok(fortunes) = res {
            // Correct number and sorting
            assert_eq!(fortunes.len(), 6); 
            assert_eq!(
                fortunes.first().unwrap().text,
                "Q. What do you call a head of lettuce in a shirt and tie?\n\
                A. Collared greens."
            );
            assert_eq!(
                fortunes.last().unwrap().text,
                "Q: What do you call a deer wearing an eye patch?\n\
                A: A bad idea (bad-eye deer)."
            );
        }

        // Multiple input files
        let res = read_fortunes(&[
            PathBuf::from("./tests/inputs/jokes"),
            PathBuf::from("./tests/inputs/quotes"),
        ]);
        assert!(res.is_ok());
        assert_eq!(res.unwrap().len(), 11);
    }
}


Import read_fortunes, Fortune, and PathBuf for testing.

The tests/inputs/jokes file contains an empty fortune that is expected to be removed.

Note
Stop here and implement a version of the function that passes cargo test read_fortunes.


Update run to print, for instance, one of the found records:

pub fn run(config: Config) -> MyResult<()> {
    let files = find_files(&config.sources)?;
    let fortunes = read_fortunes(&files)?;
    println!("{:#?}", fortunes.last());
    Ok(())
}


When passed good input sources, the program should print a fortune like so:

$ cargo run tests/inputs
Some(
    Fortune {
        source: "quotes",
        text: "You can observe a lot just by watching.\n-- Yogi Berra",
    },
)

When provided an unreadable file, such as the previously created hammer file, the program should die with a useful error message:

$ cargo run hammer
hammer: Permission denied (os error 13)


Randomly Selecting a Fortune

The program will have two possible outputs.
When the user supplies a pattern, the program should print all the fortunes matching the pattern; otherwise, the program should randomly select one fortune to print.
For the latter option, I wrote a pick_fortune function that takes some fortunes and an optional seed and returns an optional string:


fn pick_fortune(fortunes: &[Fortune], seed: Option<u64>) -> Option<String> {
    unimplemented!();
}


My function uses the rand crate to select the fortune using a random number generator (RNG), as described earlier in the chapter.
When there is no seed value, I use rand::thread_rng to create an RNG that is seeded by the system.
When there is a seed value, I use rand::rngs::StdRng::seed_from_u64.
Finally, I use SliceRandom::choose with the RNG to select a fortune.


Following is how you can expand your tests module to include the test_pick_fortune unit test:


#[cfg(test)]
mod tests {
    use super::{
        find_files, parse_u64, pick_fortune, read_fortunes, Fortune, 
    };
    use std::path::PathBuf;

    #[test]
    fn test_parse_u64() {} // Same as before

    #[test]
    fn test_find_files() {} // Same as before

    #[test]
    fn test_read_fortunes() {} // Same as before

    #[test]
    fn test_pick_fortune() {
        // Create a slice of fortunes
        let fortunes = &[
            Fortune {
                source: "fortunes".to_string(),
                text: "You cannot achieve the impossible without \
                      attempting the absurd."
                    .to_string(),
            },
            Fortune {
                source: "fortunes".to_string(),
                text: "Assumption is the mother of all screw-ups."
                    .to_string(),
            },
            Fortune {
                source: "fortunes".to_string(),
                text: "Neckties strangle clear thinking.".to_string(),
            },
        ];

        // Pick a fortune with a seed
        assert_eq!(
            pick_fortune(fortunes, Some(1)).unwrap(), 
            "Neckties strangle clear thinking.".to_string()
        );
    }
}


Import the pick_fortune function for testing.

Supply a seed in order to verify that the pseudorandom selection is reproducible.

Note
Stop reading and write the function that will pass cargo test pick_fortune.


You can integrate this function into your run like so:

pub fn run(config: Config) -> MyResult<()> {
    let files = find_files(&config.sources)?;
    let fortunes = read_fortunes(&files)?;
    println!("{:#?}", pick_fortune(&fortunes, config.seed));
    Ok(())
}


Run your program with no seed and revel in the ensuing chaos of randomness:

$ cargo run tests/inputs/
Some(
    "Q: Why did the gardener quit his job?\nA: His celery wasn't high enough.",
)

When provided a seed, the program should always select the same fortune:

$ cargo run tests/inputs/ -s 1
Some(
    "You can observe a lot just by watching.\n-- Yogi Berra",
)

Tip
The tests I wrote are predicated on the fortunes being in a particular order. I wrote find_files to return the files in sorted order, which means the list of fortunes passed to pick_fortune is ordered first by source filename and then by position inside the file. If you use a different data structure to represent the fortunes or parse them in a different order, then you’ll need to change the tests to reflect your decisions. The key is to find a way to make your pseudorandom choices predictable and testable.


Printing Records Matching a Pattern

You now have all the pieces for finishing the program.
The last step is to decide whether to print all the fortunes that match a given regular expression or to randomly select one fortune.
You can expand your run function like so:


pub fn run(config: Config) -> MyResult<()> {
    let files = find_files(&config.sources)?;
    let fortunes = read_fortunes(&files)?;

    if let Some(pattern) = config.pattern {
        for fortune in fortunes {
            // Print all the fortunes matching the pattern
        }
    } else {
        // Select and print one fortune
    }

    Ok(())
}
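One detail of the matching branch is that a header should appear only when the source changes, not before every record. The following sketch shows one way to track that with only the standard library. It makes several assumptions: Record is a hypothetical stand-in for the Fortune struct, plain substring matching with str::contains stands in for a compiled regular expression, and the output goes to generic writers rather than directly to STDOUT and STDERR so that the behavior can be checked:

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for the chapter's Fortune struct.
struct Record {
    source: String,
    text: String,
}

/// Write each matching record to `out`, and write a "(source)" header to
/// `err` whenever the source differs from that of the previous match.
/// Substring matching is an assumption used here in place of a regex.
fn print_matches<O: Write, E: Write>(
    records: &[Record],
    needle: &str,
    out: &mut O,
    err: &mut E,
) -> io::Result<()> {
    // Remember the source of the last header that was printed
    let mut prev_source: Option<&str> = None;

    for record in records.iter().filter(|r| r.text.contains(needle)) {
        if prev_source != Some(record.source.as_str()) {
            writeln!(err, "({})\n%", record.source)?;
            prev_source = Some(record.source.as_str());
        }
        writeln!(out, "{}\n%", record.text)?;
    }

    Ok(())
}
```

In a real program, you could pass io::stdout() and io::stderr() as the two writers; in a test, a pair of Vec<u8> buffers will capture the two streams separately.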


Remember that the program should let the user know when there are no fortunes, such as when using the tests/inputs/empty directory:

$ cargo run tests/inputs/empty
No fortunes found

Note
That should be enough information for you to finish this program using the provided tests. This is a tough problem, but don’t give up.


Solution

For the following code, you will need to expand your src/lib.rs with the following imports and definitions:


use clap::{App, Arg};
use rand::prelude::SliceRandom;
use rand::{rngs::StdRng, SeedableRng};
use regex::{Regex, RegexBuilder};
use std::{
    error::Error,
    ffi::OsStr,
    fs::{self, File},
    io::{BufRead, BufReader},
    path::PathBuf,
};
use walkdir::WalkDir;

type MyResult<T> = Result<T, Box<dyn Error>>;

#[derive(Debug)]
pub struct Config {
    sources: Vec<String>,
    pattern: Option<Regex>,
    seed: Option<u64>,
}

#[derive(Debug)]
pub struct Fortune {
    source: String,
    text: String,
}


I’ll show you how I wrote each of the functions I described in the previous section, starting with the find_files function.
You will notice that it filters out files that have the extension .dat using the type OsStr, which is a Rust type for an operating system’s preferred representation of a string that might not be a valid UTF-8 string.
The type OsStr is borrowed, and the owned version is OsString.
This parallels the distinction between Path and PathBuf.
Both versions encapsulate the complexities of dealing with filenames on both Windows and Unix platforms.
In the following code, you’ll see that I use Path::extension, which returns Option<&OsStr>:

fn find_files(paths: &[String]) -> MyResult<Vec<PathBuf>> {
    let dat = OsStr::new("dat"); 
    let mut files = vec![]; 

    for path in paths {
        match fs::metadata(path) {
            Err(e) => return Err(format!("{}: {}", path, e).into()), 
            Ok(_) => files.extend( 
                WalkDir::new(path)  
                    .into_iter()
                    .filter_map(Result::ok) 
                    .filter(|e| {
                        e.file_type().is_file() 
                            && e.path().extension() != Some(dat)
                    })
                    .map(|e| e.path().into()), 
            ),
        }
    }

    files.sort(); 
    files.dedup(); 
    Ok(files) 
}


Create an OsStr value for the string dat.

Create a mutable vector for the results.

If fs::metadata fails, return a useful error message.


Use Vec::extend to add the results from WalkDir to the results.


Use walkdir::WalkDir to find all the entries from the starting path.


This will ignore any errors for unreadable files or directories, which is the behavior of the original program.

Take only regular files that do not have the .dat extension.

The walkdir::DirEntry::path function returns a Path, so convert it into a PathBuf.


Use Vec::sort to sort the entries in place.

Use Vec::dedup to remove consecutive repeated values.


Return the sorted, unique files.
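
To see the extension check and the sort-then-dedup step in isolation, here is a minimal sketch that works on an in-memory list of paths rather than walking the filesystem. The function name keep_non_dat is hypothetical, and the sketch skips the is_file test because it never touches the disk:

```rust
use std::ffi::OsStr;
use std::path::PathBuf;

/// Keep paths whose extension is not `dat`, then sort and deduplicate them.
/// (Hypothetical helper for illustration; it does not check the filesystem.)
fn keep_non_dat(paths: Vec<PathBuf>) -> Vec<PathBuf> {
    let dat = OsStr::new("dat");
    let mut files: Vec<PathBuf> = paths
        .into_iter()
        // Path::extension returns Option<&OsStr>; None means no extension
        .filter(|p| p.extension() != Some(dat))
        .collect();
    // Vec::dedup removes only consecutive duplicates, so sort first
    files.sort();
    files.dedup();
    files
}

fn main() {
    let paths = vec![
        PathBuf::from("quotes"),
        PathBuf::from("jokes.dat"),
        PathBuf::from("jokes"),
        PathBuf::from("quotes"),
    ];
    for file in keep_non_dat(paths) {
        println!("{}", file.display());
    }
}
```

Sorting before calling Vec::dedup matters: without the sort, the two quotes entries would not be adjacent and both would survive.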


The files found by the preceding function are the inputs to the read_fortunes function:


fn read_fortunes(paths: &[PathBuf]) -> MyResult<Vec<Fortune>> {
    let mut fortunes = vec![]; 
    let mut buffer = vec![];

    for path in paths { 
        let basename = 
            path.file_name().unwrap().to_string_lossy().into_owned();
        let file = File::open(path).map_err(|e| {
            format!("{}: {}", path.display(), e)
        })?; 

        for line in BufReader::new(file).lines().filter_map(Result::ok) 
        {
            if line == "%" { 
                if !buffer.is_empty() { 
                    fortunes.push(Fortune {
                        source: basename.clone(),
                        text: buffer.join("\n"),
                    });
                    buffer.clear();
                }
            } else {
                buffer.push(line);
            }
        }
    }

    Ok(fortunes)
}


Create mutable vectors for the fortunes and a record buffer.

Iterate through the given filenames.


Convert Path::file_name from OsStr to String, using the lossy version in case this is not valid UTF-8. The result is a clone-on-write smart pointer, so use Cow::into_owned to clone the data if it is not already owned.


Open the file or return an error message.

Iterate through the lines of the file.


A sole percent sign (%) indicates the end of a record.

If the buffer is not empty, set the text to the buffer lines joined on newlines and then clear the buffer.

Otherwise, add the current line to the buffer.
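
The buffering logic can be tried without any files. The following is a minimal sketch that parses records from a string; the function name parse_records is hypothetical, and dropping any lines that follow the last percent sign is a choice made for this sketch only:

```rust
/// Split fortune-style text into records terminated by a lone `%` line.
/// (Hypothetical helper for illustration.)
fn parse_records(text: &str) -> Vec<String> {
    let mut records = vec![];
    let mut buffer: Vec<&str> = vec![];

    for line in text.lines() {
        if line == "%" {
            // A lone percent sign ends the record, if there is one
            if !buffer.is_empty() {
                records.push(buffer.join("\n"));
                buffer.clear();
            }
        } else {
            buffer.push(line);
        }
    }

    // Assumption for this sketch: lines after the final `%` are dropped
    records
}

fn main() {
    let text = "You will take\na trip soon.\n%\nA short joke.\n%\n";
    for record in parse_records(text) {
        println!("{}\n%", record);
    }
}
```

Because the buffer is local to each call, leftover lines from one input cannot carry over into the first record of the next.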


Here is how I wrote the pick_fortune function:


fn pick_fortune(fortunes: &[Fortune], seed: Option<u64>) -> Option<String> {
    if let Some(val) = seed { 
        let mut rng = StdRng::seed_from_u64(val); 
        fortunes.choose(&mut rng).map(|f| f.text.to_string()) 
    } else {
        let mut rng = rand::thread_rng(); 
        fortunes.choose(&mut rng).map(|f| f.text.to_string())
    }
}


Check if the user has supplied a seed.

If so, create a PRNG using the provided seed.

Use the PRNG to select one of the fortunes.

Otherwise, use a PRNG seeded by the system.
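
To illustrate why a seed makes the selection repeatable, here is a sketch that uses only the standard library. The SplitMix64 generator is a stand-in chosen for brevity, not the algorithm behind StdRng, and the name pick_with_seed is hypothetical:

```rust
/// A tiny deterministic generator (SplitMix64), used here only as a
/// stand-in for a seedable PRNG. It is not what the rand crate uses.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Pick one item using the given seed; the same seed and the same
/// slice always yield the same item. (Hypothetical helper.)
fn pick_with_seed<T>(items: &[T], seed: u64) -> Option<&T> {
    if items.is_empty() {
        return None;
    }
    let mut rng = SplitMix64(seed);
    // The modulo introduces a slight bias, acceptable for an illustration
    let index = (rng.next_u64() % items.len() as u64) as usize;
    items.get(index)
}

fn main() {
    let fortunes = ["You will take a trip soon.", "A short joke.", "An old saying."];
    // Two calls with the same seed agree with each other
    let first = pick_with_seed(&fortunes, 10);
    let second = pick_with_seed(&fortunes, 10);
    println!("{:?} {:?}", first, second);
}
```

The state of the generator depends only on the seed, which is what makes a seeded run testable.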


I can bring all these ideas together in my run function like so:

pub fn run(config: Config) -> MyResult<()> {
    let files = find_files(&config.sources)?;
    let fortunes = read_fortunes(&files)?;
    if let Some(pattern) = config.pattern { 
        let mut prev_source = None; 
        for fortune in fortunes 
            .iter()
            .filter(|fortune| pattern.is_match(&fortune.text))
        {
            if prev_source.as_ref().map_or(true, |s| s != &fortune.source) { 
                eprintln!("({})\n%", fortune.source);
                prev_source = Some(fortune.source.clone()); 
            }
            println!("{}\n%", fortune.text); 
        }
    } else {
        println!( 
            "{}",
            pick_fortune(&fortunes, config.seed)
                .unwrap_or_else(|| "No fortunes found".to_string())
        );
    }
    Ok(())
}


Check if the user has provided a pattern option.


Initialize a mutable variable to remember the last fortune source.

Iterate over the found fortunes and filter for those matching the provided regular expression.

Print the source header if the current source is not the same as the previous one seen.

Store the current fortune source.

Print the text of the fortune.

Print a random fortune or a message that states that there are no fortunes to be found.
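
The header logic can be checked on its own. This minimal sketch collects the output lines into a vector instead of printing the header to STDERR and the text to STDOUT as run does. The function name with_headers and the (source, text) tuples are hypothetical:

```rust
/// Build the output lines for matching fortunes, adding a source header
/// whenever the source differs from the previous one seen.
/// (Hypothetical helper; each tuple is (source, text).)
fn with_headers(matches: &[(&str, &str)]) -> Vec<String> {
    let mut lines = vec![];
    let mut prev_source: Option<&str> = None;

    for &(source, text) in matches {
        if prev_source != Some(source) {
            lines.push(format!("({})\n%", source));
            prev_source = Some(source);
        }
        lines.push(format!("{}\n%", text));
    }

    lines
}

fn main() {
    let matches = [
        ("jokes", "A short joke."),
        ("jokes", "Another joke."),
        ("quotes", "An old saying."),
    ];
    for line in with_headers(&matches) {
        println!("{}", line);
    }
}
```

The header appears once per run of fortunes from the same source, which relies on the fortunes arriving grouped by file.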

Note
The fortunes are stored with embedded newlines that may cause the regular expression matching to fail if the sought-after phrase spans multiple lines. This mimics how the original fortune works but may not match the expectations of the user.
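
One possible way to make a phrase match across line breaks is to collapse whitespace before searching. The sketch below uses a plain substring search as a stand-in for the regular expression, and both function names are hypothetical. Note that this would depart from the behavior of the original fortune:

```rust
/// Collapse every run of whitespace, including newlines, into one space.
/// (Hypothetical helper for illustration.)
fn flatten_whitespace(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Stand-in for the regex match: a substring search over flattened text.
fn matches_phrase(text: &str, phrase: &str) -> bool {
    flatten_whitespace(text).contains(phrase)
}

fn main() {
    let fortune = "You will take\na trip soon.";
    // The raw text does not contain the phrase because of the newline
    println!("raw: {}", fortune.contains("take a trip"));
    // The flattened text does
    println!("flattened: {}", matches_phrase(fortune, "take a trip"));
}
```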


At this point, the program passes all the provided tests.
I provided more guidance on this challenge because of the many steps involved in finding and reading files and then printing all the matching records or using a PRNG to randomly select one.
I hope you enjoyed that as much as I did.

Going Further

Read the fortune manual page to learn about other options your program can implement.
For instance, you could add the -n length option, which sets the longest length at which a fortune is still considered short.
Knowing the lengths of the fortunes would then let you implement the -s option, which picks only short fortunes.
As noted in the final solution, the regular expression matching may fail because of the embedded newlines in the fortunes.
Can you find a way around this limitation?
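
As a starting point for the length-based options, a filter might look like the following sketch. The function name short_fortunes and the max_len parameter are hypothetical, and treating length as a count of characters no greater than max_len is an assumption you may want to revisit:

```rust
/// Keep only the fortunes whose length in characters is at most `max_len`.
/// (Hypothetical helper; what "length" means is an assumption.)
fn short_fortunes<'a>(fortunes: &'a [String], max_len: usize) -> Vec<&'a str> {
    fortunes
        .iter()
        .map(String::as_str)
        // chars().count() counts Unicode scalar values, not bytes
        .filter(|text| text.chars().count() <= max_len)
        .collect()
}

fn main() {
    let fortunes = vec![
        "A short joke.".to_string(),
        "You will take a trip soon, and it will be a long one.".to_string(),
    ];
    for text in short_fortunes(&fortunes, 20) {
        println!("{}", text);
    }
}
```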


Randomness is a key aspect to many games that you could try to write.
Perhaps start with a game where the user must guess a randomly selected number in a range; then you could move on to a more difficult game like “Wheel of Fortune,” where the user guesses letters in a randomly selected word or phrase.
Many systems have the file /usr/share/dict/words that contains many thousands of English words; you could use that as a source, or you could create your own input file of words and phrases.
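
For the letter-guessing game, the display logic is independent of how the phrase is chosen. Here is a minimal sketch; the function name reveal is hypothetical, and it assumes the guessed letters are stored in lowercase:

```rust
/// Show the letters that have been guessed and hide the rest with
/// underscores. Non-alphabetic characters such as spaces are always shown.
/// (Hypothetical helper; `guessed` is assumed to hold lowercase letters.)
fn reveal(phrase: &str, guessed: &[char]) -> String {
    phrase
        .chars()
        .map(|c| {
            if !c.is_alphabetic() || guessed.contains(&c.to_ascii_lowercase()) {
                c
            } else {
                '_'
            }
        })
        .collect()
}

fn main() {
    println!("{}", reveal("Fortunate Son", &['o', 'n']));
}
```

The secret phrase itself could be selected with a PRNG in the same way that pick_fortune selects a fortune.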

Summary

Programs that incorporate randomness are some of my favorites.
Random events are very useful for creating games as well as machine learning programs, so it’s important to understand how to control and test randomness.
Here’s some of what you learned in this chapter:



The fortune records span multiple lines and use a lone percent sign to indicate the end of the record. You learned to read the lines into a buffer and dump the buffer when the record or file terminator is found.


You can use the rand crate to make pseudorandom choices that can be controlled using a seed value.


The Path (borrowed) and PathBuf (owned) types are useful abstractions for dealing with system paths on both Windows and Unix. They are similar to the &str and String types for dealing with borrowed and owned strings.


The names of files and directories may be invalid UTF-8, so Rust uses the types OsStr (borrowed) and OsString (owned) to represent these strings.


Using abstractions like Path and OsStr makes your Rust code more portable across operating systems.
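
The borrowed and owned pairs can be seen side by side in a short sketch that uses only the standard library. The helper name basename and the example path are hypothetical:

```rust
use std::borrow::Cow;
use std::ffi::{OsStr, OsString};
use std::path::{Path, PathBuf};

/// Return the final path component as a String, replacing any invalid
/// UTF-8 with the Unicode replacement character. (Hypothetical helper.)
fn basename(path: &Path) -> Option<String> {
    path.file_name()
        .map(|name| name.to_string_lossy().into_owned())
}

fn main() {
    // PathBuf is owned; Path is the borrowed view, like String and &str
    let owned: PathBuf = PathBuf::from("fortunes/jokes");
    let borrowed: &Path = owned.as_path();

    // OsStr is borrowed; OsString is the owned version
    let name: &OsStr = borrowed.file_name().expect("path has a final component");
    let owned_name: OsString = name.to_os_string();

    // to_string_lossy returns a Cow, which borrows when the name is
    // already valid UTF-8 and allocates only when it must replace bytes
    let lossy: Cow<'_, str> = name.to_string_lossy();

    println!("{:?} {:?} {}", borrowed, owned_name, lossy);
    println!("{:?}", basename(borrowed));
}
```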




In the next chapter, you’ll learn to manipulate dates as you create a terminal-based calendar program.








¹ ASCII art is a term for graphics that use only ASCII text values.
² This was in the 1990s, which I believe the kids nowadays refer to as “the late 1900s.”
³ On Ubuntu, sudo apt install fortune-mod; on macOS, brew install fortune.
⁴ Robert R. Coveyou, “Random Number Generation Is Too Important to Be Left to Chance,” Studies in Applied Mathematics 3 (1969): 70–111.