Chapter 6. Den of Uniquity

There’s only one everything

They Might Be Giants, “One Everything” (2008)

In this chapter, you will write a Rust version of the uniq program (pronounced unique), which will find the distinct lines of text from either a file or STDIN. Among its many uses, it is often employed to count how many times each unique string is found.

Along the way, you will learn how to do the following:

  • Write to a file or STDOUT

  • Use a closure to capture a variable

  • Apply the don’t repeat yourself (DRY) concept

  • Use the Write trait and the write! and writeln! macros

  • Use temporary files

  • Indicate the lifetime of a variable

How uniq Works

As usual, I’ll start by explaining how uniq works so that you understand what is expected of your program. Following is part of the manual page for the BSD version of uniq. The challenge program in this chapter will only implement the reading of a file or STDIN, writing to a file or STDOUT, and counting the lines for the -c flag, but I include more of the documentation so that you can see the full scope of the program:

UNIQ(1)                   BSD General Commands Manual                  UNIQ(1)

NAME
     uniq -- report or filter out repeated lines in a file

SYNOPSIS
     uniq [-c | -d | -u] [-i] [-f num] [-s chars] [input_file [output_file]]

DESCRIPTION
     The uniq utility reads the specified input_file comparing adjacent lines,
     and writes a copy of each unique input line to the output_file.  If
     input_file is a single dash ('-') or absent, the standard input is read.
     If output_file is absent, standard output is used for output.  The second
     and succeeding copies of identical adjacent input lines are not written.
     Repeated lines in the input will not be detected if they are not adja-
     cent, so it may be necessary to sort the files first.

     The following options are available:

     -c      Precede each output line with the count of the number of times
             the line occurred in the input, followed by a single space.

     -d      Only output lines that are repeated in the input.

     -f num  Ignore the first num fields in each input line when doing compar-
             isons.  A field is a string of non-blank characters separated
             from adjacent fields by blanks.  Field numbers are one based,
             i.e., the first field is field one.

     -s chars
             Ignore the first chars characters in each input line when doing
             comparisons.  If specified in conjunction with the -f option, the
             first chars characters after the first num fields will be
             ignored.  Character numbers are one based, i.e., the first char-
             acter is character one.

     -u      Only output lines that are not repeated in the input.

     -i      Case insensitive comparison of lines.

In the 06_uniqr/tests/inputs directory of the book’s Git repository, you will find the following input files I’ll use for testing:

  • empty.txt: an empty file

  • one.txt: a file with one line of text

  • two.txt: a file with two lines of the same text

  • three.txt: a file with 13 lines of 4 unique values

  • skip.txt: a file with four lines of two unique values plus an empty line

The other files t[1–6].txt are examples from a Perl program used to test the GNU version. These are generated by the mk-outs.sh file:

$ cat mk-outs.sh
#!/usr/bin/env bash

ROOT="tests/inputs"
OUT_DIR="tests/expected"

[[ ! -d "$OUT_DIR" ]] && mkdir -p "$OUT_DIR"

# Cf https://github.com/coreutils/coreutils/blob/master/tests/misc/uniq.pl
echo -ne "a\na\n"    > $ROOT/t1.txt  # Two lines each ending with a newline
echo -ne "a\na"      > $ROOT/t2.txt  # No trailing newline on last line
echo -ne "a\nb"      > $ROOT/t3.txt  # Two different lines, no trailing newline
echo -ne "a\na\nb"   > $ROOT/t4.txt  # Two lines the same; last is different with no trailing newline
echo -ne "b\na\na\n" > $ROOT/t5.txt  # Two different values with newlines on each
echo -ne "a\nb\nc\n" > $ROOT/t6.txt  # Three different values with newlines on each

for FILE in $ROOT/*.txt; do
    BASENAME=$(basename "$FILE")
    uniq      $FILE > ${OUT_DIR}/${BASENAME}.out
    uniq -c   $FILE > ${OUT_DIR}/${BASENAME}.c.out
    uniq    < $FILE > ${OUT_DIR}/${BASENAME}.stdin.out
    uniq -c < $FILE > ${OUT_DIR}/${BASENAME}.stdin.c.out
done

To demonstrate uniq, note that it will print nothing when given an empty file:

$ uniq tests/inputs/empty.txt

Given a file with just one line, the one line will be printed:

$ uniq tests/inputs/one.txt
a

It will also print the number of times a line occurs before the line when run with the -c option. The count is right-justified in a field four characters wide and is followed by a single space and then the line of text:

$ uniq -c tests/inputs/one.txt
   1 a
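
The layout just described can be sketched in Rust with a format width. This is only an illustration of the output shape, not code from uniq itself, and the helper name format_count is hypothetical:

```rust
/// Format a count as shown above: the count right-justified in a field
/// four characters wide, a single space, then the text.
/// (`format_count` is a hypothetical helper used only for illustration.)
fn format_count(count: u64, text: &str) -> String {
    // `{:>4}` sets a minimum width of 4; wider numbers are not truncated.
    format!("{:>4} {}", count, text)
}
```

The width is a minimum, so a count with more than four digits simply pushes the text to the right.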

The file tests/inputs/two.txt contains two duplicate lines:

$ cat tests/inputs/two.txt
a
a

Given this input, uniq will emit one line:

$ uniq tests/inputs/two.txt
a

With the -c option, uniq will also show how many times the line occurred:

$ uniq -c tests/inputs/two.txt
   2 a

A longer input file shows that uniq only considers the lines in order and not globally. For example, the value a appears four times in this input file:

$ cat tests/inputs/three.txt
a
a
b
b
a
c
c
c
a
d
d
d
d

When counting, uniq starts over at 1 each time it sees a new string. Since a occurs in three different places in the input file, it will also appear three times in the output:

$ uniq -c tests/inputs/three.txt
   2 a
   2 b
   1 a
   3 c
   1 a
   4 d

If you want the actual unique values, you must first sort the input, which can be done with the aptly named sort command. In the following output, you’ll finally see that a occurs a total of four times in the input file:

$ sort tests/inputs/three.txt | uniq -c
   4 a
   2 b
   3 c
   4 d
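
The difference between counting adjacent runs and counting globally can be sketched in a few lines of Rust. This is a minimal illustration that works on an in-memory slice rather than a file, and the function name count_runs is hypothetical:

```rust
/// Collapse adjacent equal lines into (count, line) pairs, which is the
/// behavior `uniq -c` shows above. (`count_runs` is a hypothetical name.)
fn count_runs(lines: &[&str]) -> Vec<(usize, String)> {
    let mut runs: Vec<(usize, String)> = Vec::new();
    for &line in lines {
        // Extend the current run only if this line equals the previous one.
        if let Some((count, prev)) = runs.last_mut() {
            if prev.as_str() == line {
                *count += 1;
                continue;
            }
        }
        // Otherwise start a new run, even if the value was seen earlier.
        runs.push((1, line.to_string()));
    }
    runs
}
```

Passing the lines in their original order yields one pair per run, while sorting the slice first groups every a together so that a single pair carries the total.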

The file tests/inputs/skip.txt contains a blank line:

$ cat tests/inputs/skip.txt
a

a
b

The blank line acts just like any other value, and so it will reset the counter:

$ uniq -c tests/inputs/skip.txt
   1 a
   1
   1 a
   1 b

If you study the Synopsis of the usage closely, you’ll see a very subtle indication of how to write the output to a file. Notice how input_file and output_file in the following are grouped inside square brackets to indicate that they are optional as a pair. That is, if you provide input_file, you may also optionally provide output_file:

uniq [-c | -d | -u] [-i] [-f num] [-s chars] [input_file [output_file]]

For example, I can count tests/inputs/two.txt and place the output into out:

$ uniq -c tests/inputs/two.txt out
$ cat out
      2 a

With no positional arguments, uniq will read from STDIN by default:

$ cat tests/inputs/two.txt | uniq -c
      2 a

If you want to read from STDIN and indicate the output filename, you must use a dash (-) for the input filename:

$ cat tests/inputs/two.txt | uniq -c - out
$ cat out
      2 a
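
The rule for the two optional positional arguments can be sketched as a small function. This is an illustration under my own assumptions, not how uniq parses its arguments, and resolve_files is a hypothetical name:

```rust
/// Resolve the optional `[input_file [output_file]]` positionals.
/// Returns the input name, defaulting to "-" for STDIN, and an optional
/// output name, where None means STDOUT.
/// (`resolve_files` is a hypothetical helper used only for illustration.)
fn resolve_files(positionals: &[&str]) -> (String, Option<String>) {
    // No first positional means "read STDIN", spelled as a dash.
    let input = positionals.first().copied().unwrap_or("-").to_string();
    // The second positional, if present, names the output file.
    let output = positionals.get(1).copied().map(String::from);
    (input, output)
}
```

Because the output file can only be the second positional, the only way to name one while reading STDIN is to pass the dash explicitly as the first.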

The GNU version works basically the same while also providing many more options:

$ uniq --help
Usage: uniq [OPTION]... [INPUT [OUTPUT]]
Filter adjacent matching lines from INPUT (or standard input),
writing to OUTPUT (or standard output).

With no options, matching lines are merged to the first occurrence.

Mandatory arguments to long options are mandatory for short options too.
  -c, --count           prefix lines by the number of occurrences
  -d, --repeated        only print duplicate lines, one for each group
  -D, --all-repeated[=METHOD]  print all duplicate lines
                          groups can be delimited with an empty line
                          METHOD={none(default),prepend,separate}
  -f, --skip-fields=N   avoid comparing the first N fields
      --group[=METHOD]  show all items, separating groups with an empty line
                          METHOD={separate(default),prepend,append,both}
  -i, --ignore-case     ignore differences in case when comparing
  -s, --skip-chars=N    avoid comparing the first N characters
  -u, --unique          only print unique lines
  -z, --zero-terminated  end lines with 0 byte, not newline
  -w, --check-chars=N   compare no more than N characters in lines
      --help     display this help and exit
      --version  output version information and exit

A field is a run of blanks (usually spaces and/or TABs), then nonblank
characters.  Fields are skipped before chars.

Note: 'uniq' does not detect repeated lines unless they are adjacent.
You may want to sort the input first, or use 'sort -u' without 'uniq'.
Also, comparisons honor the rules specified by 'LC_COLLATE'.

As you can see, both the BSD and GNU versions have many more options, but the challenge program is only expected to read a file or STDIN, write to a file or STDOUT, and count the lines for the -c flag.

Getting Started

This chapter’s challenge program should be called uniqr (pronounced you-neek-er) for a Rust version of uniq. Start by running cargo new uniqr, then modify your Cargo.toml to add the following dependencies:

[dependencies]
clap = "2.33"

[dev-dependencies]
assert_cmd = "2"
predicates = "2"
tempfile = "3" # The tests will create temporary files using the tempfile crate.
rand = "0.8"

Copy the book’s 06_uniqr/tests directory into your project, and then run cargo test to ensure that the program compiles and the tests run and fail.

Defining the Arguments

Update your src/main.rs to the following:

fn main() {
    if let Err(e) = uniqr::get_args().and_then(uniqr::run) {
        eprintln!("{}", e);
        std::process::exit(1);
    }
}

I suggest you start src/lib.rs with the following:

use clap::{App, Arg};
use std::error::Error;

type MyResult<T> = Result<T, Box<dyn Error>>;

#[derive(Debug)]
pub struct Config {
    in_file: String,          // Input filename to read; a dash means STDIN
    out_file: Option<String>, // Optional output filename; None means STDOUT
    count: bool,              // Whether or not to print the counts of each line
}

Here is an outline for get_args:

pub fn get_args() -> MyResult<Config> {
    let matches = App::new("uniqr")
        .version("0.1.0")
        .author("Ken Youens-Clark <kyclark@gmail.com>")
        .about("Rust uniq")
        // What goes here?
        .get_matches();

    Ok(Config {
        in_file: ...
        out_file: ...
        count: ...
    })
}

I suggest you start your run function by printing the config:

pub fn run(config: Config) -> MyResult<()> {
    println!("{:?}", config);
    Ok(())
}

Your program should be able to produce the following usage:

$ cargo run -- -h
uniqr 0.1.0
Ken Youens-Clark <kyclark@gmail.com>
Rust uniq

USAGE:
    uniqr [FLAGS] [ARGS]

FLAGS:
    -c, --count      Show counts
    -h, --help       Prints help information
    -V, --version    Prints version information

ARGS:
    <IN_FILE>     Input file [default: -]
    <OUT_FILE>    Output file

The -c|--count flag is optional. The input file is the first positional argument and defaults to a dash (-). The output file is the second positional argument and is optional.

By default the program will read from STDIN, which can be represented using a dash:

$ cargo run
Config { in_file: "-", out_file: None, count: false }

The first positional argument should be interpreted as the input file and the second positional argument as the output file.1 Note that clap can handle options either before or after positional arguments:

$ cargo run -- tests/inputs/one.txt out --count
Config { in_file: "tests/inputs/one.txt", out_file: Some("out"), count: true }
Note

Take a moment to finish get_args before reading further.

I assume you are an upright and moral person who figured out the preceding function on your own, so I will now share my solution:

pub fn get_args() -> MyResult<Config> {
    let matches = App::new("uniqr")
        .version("0.1.0")
        .author("Ken Youens-Clark <kyclark@gmail.com>")
        .about("Rust uniq")
        .arg(
            Arg::with_name("in_file")
                .value_name("IN_FILE")
                .help("Input file")
                .default_value("-"),
        )
        .arg(
            Arg::with_name("out_file")
                .value_name("OUT_FILE")
                .help("Output file"),
        )
        .arg(
            Arg::with_name("count")
                .short("c")
                .help("Show counts")
                .long("count")
                .takes_value(false),
        )
        .get_matches();

    Ok(Config {
        // Convert the in_file argument to a String
        in_file: matches.value_of_lossy("in_file").unwrap().to_string(),
        // Convert the out_file argument to an Option<String>
        out_file: matches.value_of("out_file").map(String::from),
        // The count is either present or not, so convert this to a bool
        count: matches.is_present("count"),
    })
}

Because the in_file argument has a default value, it is safe to call Option::unwrap and convert the value to a String. There are several other ways to get the same result, none of which is necessarily superior. You could use Option::map to feed the value to String::from and then unwrap it:

    in_file: matches.value_of_lossy("in_file").map(String::from).unwrap(),

You could also use a closure that calls Into::into to convert the value into a String because Rust can infer the type:

    in_file: matches.value_of_lossy("in_file").map(|v| v.into()).unwrap(),

The preceding can also be expressed using the Into::into function directly because functions are first-class values that can be passed as arguments:

    in_file: matches.value_of_lossy("in_file").map(Into::into).unwrap(),

The out_file is optional, but if there is an option, you can use Option::map to convert a Some value to a String:

    out_file: matches.value_of("out_file").map(|v| v.to_string()),
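
To see that these variations really are interchangeable, here is a small sketch that uses only the standard library. It stands in an Option<Cow<str>> for the value returned by value_of_lossy and an Option<&str> for the value returned by value_of; the function names are hypothetical:

```rust
use std::borrow::Cow;

/// Option::map with the String::from function.
fn via_string_from(value: Option<Cow<'_, str>>) -> String {
    value.map(String::from).unwrap()
}

/// Option::map with a closure that calls Into::into; the target type is
/// inferred from the function's return type.
fn via_closure(value: Option<Cow<'_, str>>) -> String {
    value.map(|v| v.into()).unwrap()
}

/// Option::map with the Into::into function passed directly.
fn via_into(value: Option<Cow<'_, str>>) -> String {
    value.map(Into::into).unwrap()
}

/// An optional value stays optional: Some(&str) becomes Some(String),
/// and None stays None.
fn optional(value: Option<&str>) -> Option<String> {
    value.map(|v| v.to_string())
}
```

The unwrap calls are only safe here for the same reason as in get_args: the caller is assumed to always supply a value.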

Testing the Program

The test suite in tests/cli.rs is fairly large, containing 78 tests that check the program with input from a file or STDIN, with output to STDOUT or a named file, and both with and without the --count option.

Given how large and complicated the tests became, you may be interested to see how I structured tests/cli.rs, which starts with the following:

use assert_cmd::Command;
use predicates::prelude::*;
use rand::{distributions::Alphanumeric, Rng};
use std::fs;
use tempfile::NamedTempFile; // Used to create temporary output files

type TestResult = Result<(), Box<dyn std::error::Error>>;

// Defines the input file and the expected output files,
// with and without the counts
struct Test {
    input: &'static str,
    out: &'static str,
    out_count: &'static str,
}

Note the use of 'static to denote the lifetime of the values. I want to define structs with &str values, and the Rust compiler would like to know exactly how long the values are expected to stick around relative to one another. The 'static annotation shows that this data will live for the entire lifetime of the program. If you remove it and run the tests, you’ll see similar errors from the compiler, as shown in the previous section, along with a suggestion of how to fix it:

error[E0106]: missing lifetime specifier
 --> tests/cli.rs:8:12
  |
8 |     input: &str,
  |            ^ expected named lifetime parameter
  |
help: consider introducing a named lifetime parameter
  |
7 | struct Test<'a> {
8 |     input: &'a str,
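
The compiler’s suggestion can be sketched as follows. This is a trimmed-down illustration rather than the struct from tests/cli.rs: the filenames are placeholders, and the describe and describe_owned helpers are hypothetical:

```rust
/// A struct with a named lifetime parameter, as the compiler suggests.
/// The references must live at least as long as the struct value.
struct Test<'a> {
    input: &'a str,
    out: &'a str,
}

/// String literals are `&'static str`, so a constant can use 'static.
const LITERAL: Test<'static> = Test {
    input: "in.txt",
    out: "in.txt.out",
};

fn describe(test: &Test<'_>) -> String {
    format!("{} -> {}", test.input, test.out)
}

/// With 'a, the struct can also borrow from values that do not live for
/// the whole program, such as these owned Strings.
fn describe_owned(input: String, out: String) -> String {
    let test = Test { input: &input, out: &out };
    describe(&test)
}
```

For constants built from string literals, 'static is the simpler choice, which is why the test module uses it.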

Next, I define some constant values I need for testing:

const PRG: &str = "uniqr"; // The name of the program being tested

const EMPTY: Test = Test {
    // The location of the input file for this test
    input: "tests/inputs/empty.txt",
    // The location of the output file without the counts
    out: "tests/expected/empty.txt.out",
    // The location of the output file with the counts
    out_count: "tests/expected/empty.txt.c.out",
};

After the declaration of EMPTY, there are many more Test structures followed by several helper functions. The run function will use Test.input as an input file and will compare STDOUT to the contents of the Test.out file:

// The function accepts a Test and returns a TestResult
fn run(test: &Test) -> TestResult {
    // Try to read the expected output file
    let expected = fs::read_to_string(test.out)?;
    // Try to run the program with the input file as an argument, verify it
    // ran successfully, and compare STDOUT to the expected value
    Command::cargo_bin(PRG)?
        .arg(test.input)
        .assert()
        .success()
        .stdout(expected);
    Ok(())
}

The run_count helper function works very similarly, but this time it tests for the counting:

fn run_count(test: &Test) -> TestResult {
    // Read the Test.out_count file for the expected output
    let expected = fs::read_to_string(test.out_count)?;
    Command::cargo_bin(PRG)?
        // Pass both the Test.input value and the flag -c to count the lines
        .args(&[test.input, "-c"])
        .assert()
        .success()
        .stdout(expected);
    Ok(())
}

The run_stdin function will supply the input to the program through STDIN:

fn run_stdin(test: &Test) -> TestResult {
    let input = fs::read_to_string(test.input)?; // Try to read the Test.input file
    let expected = fs::read_to_string(test.out)?; // Try to read the Test.out file
    // Pass the input through STDIN and verify that STDOUT is the expected value
    Command::cargo_bin(PRG)?
        .write_stdin(input)
        .assert()
        .success()
        .stdout(expected);
    Ok(())
}

The run_stdin_count function tests both reading from STDIN and counting the lines:

fn run_stdin_count(test: &Test) -> TestResult {
    let input = fs::read_to_string(test.input)?;
    let expected = fs::read_to_string(test.out_count)?;
    // Run the program with the long --count flag, feed the input to STDIN,
    // and verify that STDOUT is correct
    Command::cargo_bin(PRG)?
        .arg("--count")
        .write_stdin(input)
        .assert()
        .success()
        .stdout(expected);
    Ok(())
}

The run_outfile function checks that the program accepts both the input and output files as positional arguments. This is somewhat more interesting as I needed to use temporary files in the testing because, as you have seen repeatedly, Rust will run the tests in parallel. If I were to use the same dummy filename like blargh to write all the output files, the tests would overwrite one another’s output. To get around this, I use the tempfile::NamedTempFile to get a dynamically generated temporary filename that will automatically be removed when I finish:

fn run_outfile(test: &Test) -> TestResult {
    let expected = fs::read_to_string(test.out)?;
    let outfile = NamedTempFile::new()?; // Try to get a named temporary file
    let outpath = &outfile.path().to_str().unwrap(); // Get the path to the file

    // Run the program with the input and output filenames as arguments,
    // then verify there is nothing in STDOUT
    Command::cargo_bin(PRG)?
        .args(&[test.input, outpath])
        .assert()
        .success()
        .stdout("");
    let contents = fs::read_to_string(&outpath)?; // Try to read the output file
    assert_eq!(&expected, &contents); // The contents must match the expected value

    Ok(())
}

The next two functions are variations on what I’ve already shown, adding in the --count flag and finally asking the program to read from STDIN when the input filename is a dash. The rest of the module calls these helpers using the various structs to run all the tests.
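
To make the collision problem concrete, here is a standard-library-only sketch of handing out a distinct path on every call. It is not how the tempfile crate works internally, and unlike NamedTempFile it neither creates the file nor removes it for you; the unique_temp_path helper and its naming scheme are my own assumptions:

```rust
use std::env;
use std::path::PathBuf;
use std::process;
use std::sync::atomic::{AtomicUsize, Ordering};

// A process-wide counter that is safe to bump from parallel test threads.
static COUNTER: AtomicUsize = AtomicUsize::new(0);

/// Build a path in the system temporary directory that differs on every
/// call within this process. (`unique_temp_path` is a hypothetical helper.)
fn unique_temp_path(prefix: &str) -> PathBuf {
    let n = COUNTER.fetch_add(1, Ordering::SeqCst);
    env::temp_dir().join(format!("{}-{}-{}.tmp", prefix, process::id(), n))
}
```

Two tests that each call unique_temp_path get different filenames and so cannot overwrite one another, but the caller must still delete the file afterward, which is the chore NamedTempFile takes care of.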

Processing the Input Files

I would suggest you start in src/lib.rs by reading the input file, so it makes sense to use the open function from previous chapters:

fn open(filename: &str) -> MyResult<Box<dyn BufRead>> {
    match filename {
        "-" => Ok(Box::new(BufReader::new(io::stdin()))),
        _ => Ok(Box::new(BufReader::new(File::open(filename)?))),
    }
}

Be sure you expand your imports to include the following:

use clap::{App, Arg};
// This syntax groups imports by common prefixes, so all the following come from std.
use std::{
    error::Error,
    fs::File,
    io::{self, BufRead, BufReader},
};

You can borrow quite a bit of code from Chapter 3 that reads lines of text from an input file or STDIN while preserving the line endings:

pub fn run(config: Config) -> MyResult<()> {
    // Either read STDIN if the input file is a dash or open the given
    // filename. Create an informative error message when this fails.
    let mut file = open(&config.in_file)
        .map_err(|e| format!("{}: {}", config.in_file, e))?;
    let mut line = String::new(); // A new, empty buffer to hold each line
    // Loop until the input is exhausted
    loop {
        // Read a line of text while preserving the line endings
        let bytes = file.read_line(&mut line)?;
        if bytes == 0 {
            break; // No bytes were read, so break out of the loop
        }
        print!("{}", line); // Print the line buffer
        line.clear(); // Clear the line buffer
    }
    Ok(())
}

Run your program with an input file to ensure it works:

$ cargo run -- tests/inputs/one.txt
a

It should also work for reading STDIN:

$ cargo run -- - < tests/inputs/one.txt
a
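
Because BufRead is a trait, the same reading loop can be exercised without any files. The following sketch collects the lines from any reader so that the preserved line endings are visible; the function name lines_with_endings is hypothetical:

```rust
use std::io::{self, BufRead};

/// Read every line from `reader`, keeping each line's original ending.
/// (`lines_with_endings` is a hypothetical helper used for illustration.)
fn lines_with_endings<R: BufRead>(mut reader: R) -> io::Result<Vec<String>> {
    let mut lines = Vec::new();
    let mut line = String::new();
    loop {
        // read_line appends to the buffer and returns the bytes read.
        if reader.read_line(&mut line)? == 0 {
            break; // Zero bytes means the input is exhausted.
        }
        lines.push(line.clone());
        // Without this, the next read would append to the same text.
        line.clear();
    }
    Ok(lines)
}
```

Given a std::io::Cursor over the text "a\r\nb\nc", this returns three strings whose endings differ, including a final line with no newline at all, which is exactly the case the t2, t3, and t4 test files exercise.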

Next, make your program iterate the lines of input and count each unique run of lines, then print the lines with and without the counts. Once you are able to create the correct output, you will need to handle printing it either to STDOUT or a given filename. I suggest that you copy ideas from the open function and use File::create.

Solution

I’ll step you through how I arrived at a solution. Your version may be different, but it’s fine as long as it passes the test suite. I decided to create two additional mutable variables to hold the previous line of text and the running count. For now, I will always print the count to make sure it’s working correctly:

pub fn run(config: Config) -> MyResult<()> {
    let mut file = open(&config.in_file)
        .map_err(|e| format!("{}: {}", config.in_file, e))?;
    let mut line = String::new();
    let mut previous = String::new(); // Holds the previous line of text
    let mut count: u64 = 0; // Holds the running count

    loop {
        let bytes = file.read_line(&mut line)?;
        if bytes == 0 {
            break;
        }

        // Compare the current line to the previous line, both trimmed of
        // any possible trailing whitespace
        if line.trim_end() != previous.trim_end() {
            // Print the output only when count is greater than 0
            if count > 0 {
                // Print the count right-justified in a column four characters
                // wide followed by a space and the previous value
                print!("{:>4} {}", count, previous);
            }
            previous = line.clone(); // Copy the current line
            count = 0; // Reset the counter to 0
        }

        count += 1; // Increment the counter by 1
        line.clear();
    }

    // Handle the last line of the file
    if count > 0 {
        print!("{:>4} {}", count, previous);
    }

    Ok(())
}

Note

I didn’t have to indicate the type u64 for the count variable. Rust will happily infer a type. Without the annotation, Rust would fall back to its default integer type, i32, on every platform, which would limit the maximum number of duplicates to i32::MAX, or 2,147,483,647. That’s a big number that’s likely to be adequate, but I think it’s better to give the counter more headroom by specifying u64.

If I run cargo test, this will pass a fair number of tests. This code is clunky, though. I don’t like having to check if count > 0 twice, as it violates the don’t repeat yourself (DRY) principle, where you isolate a common idea into a single abstraction like a function rather than copying and pasting the same lines of code throughout a program. Also, my code always prints the count, but it should print the count only when config.count is true. I can put all of this logic into a function, and I will specifically use a closure to close around the config.count value:

// The print closure will accept count and text values
let print = |count: u64, text: &str| {
    // Print only if count is greater than 0
    if count > 0 {
        // Check if the captured config.count value is true
        if config.count {
            // Print the count and text to STDOUT
            print!("{:>4} {}", count, text);
        } else {
            // Otherwise, print only the text to STDOUT
            print!("{}", text);
        }
    };
};

I can update the rest of the function to use this closure:

loop {
    let bytes = file.read_line(&mut line)?;
    if bytes == 0 {
        break;
    }

    if line.trim_end() != previous.trim_end() {
        print(count, &previous);
        previous = line.clone();
        count = 0;
    }

    count += 1;
    line.clear();
}

print(count, &previous);
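
The capturing behavior is easier to test when the closure returns a String instead of printing. In this sketch, the show_count parameter plays the role of config.count, and the format_runs function and its input of precomputed (count, text) pairs are my own assumptions:

```rust
/// Format precomputed (count, text) runs, optionally with the counts.
/// (`format_runs` is a hypothetical helper used only for illustration.)
fn format_runs(runs: &[(u64, &str)], show_count: bool) -> String {
    // The closure captures `show_count` from the enclosing scope, so each
    // call only passes the values that change: the count and the text.
    let format_run = |count: u64, text: &str| -> String {
        if count == 0 {
            String::new() // Nothing has been seen yet
        } else if show_count {
            format!("{:>4} {}", count, text)
        } else {
            text.to_string()
        }
    };

    runs.iter()
        .map(|&(count, text)| format_run(count, text))
        .collect()
}
```

A plain function could not see show_count unless it were passed as an extra argument on every call, which is the duplication the closure avoids.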

At this point, the program will pass several more tests. All the failed test names have the string outfile because the program fails to write a named output file. To add this last feature, you can open the output file in the same way as the input file, either by creating a named output file using File::create or by using std::io::stdout. Be sure to add use std::io::Write for the following code, which you can place just after the file variable:

// The mutable out_file will be a boxed value that implements the
// std::io::Write trait
let mut out_file: Box<dyn Write> = match &config.out_file {
    // When config.out_file is Some filename, try to create the file
    Some(out_name) => Box::new(File::create(out_name)?),
    // Otherwise, use std::io::stdout
    _ => Box::new(io::stdout()),
};

If you look at the documentation for the values returned by File::create and io::stdout (a File and a Stdout, respectively), you’ll see that each has a section listing the traits it implements. Both implement Write, so they satisfy the type requirement Box<dyn Write>, which says that the value inside the Box must implement this trait.
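
The same idea can be pulled into a helper that mirrors the open function for input. This is a sketch rather than code from the book’s solution; the names open_output and write_line are hypothetical:

```rust
use std::fs::File;
use std::io::{self, Write};

/// Choose the destination: create the named file if one is given,
/// otherwise fall back to STDOUT. (`open_output` is a hypothetical name.)
fn open_output(out_file: Option<&str>) -> io::Result<Box<dyn Write>> {
    match out_file {
        Some(name) => Ok(Box::new(File::create(name)?)),
        None => Ok(Box::new(io::stdout())),
    }
}

/// Write one counted line to anything that implements Write.
fn write_line(out: &mut dyn Write, count: u64, text: &str) -> io::Result<()> {
    writeln!(out, "{:>4} {}", count, text)
}
```

Because write_line only asks for the Write trait, it accepts the boxed value from open_output as well as an in-memory Vec<u8>, which is convenient for testing without touching the filesystem.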

The second change I need to make is to use out_file for the output. I will replace the print! macro with write! to write the output to a stream like a filehandle or STDOUT. The first argument to write! must be a mutable value that implements the Write trait. The documentation shows that write! will return a std::io::Result because it might fail. As such, I changed my print closure to return MyResult. Here is the final version of my run function that passes all the tests:

pub fn run(config: Config) -> MyResult<()> {
    // Open either STDIN or the given input filename
    let mut file = open(&config.in_file)
        .map_err(|e| format!("{}: {}", config.in_file, e))?;

    // Open either STDOUT or the given output filename
    let mut out_file: Box<dyn Write> = match &config.out_file {
        Some(out_name) => Box::new(File::create(out_name)?),
        _ => Box::new(io::stdout()),
    };

    // Create a mutable print closure to format the output
    let mut print = |count: u64, text: &str| -> MyResult<()> {
        if count > 0 {
            if config.count {
                write!(out_file, "{:>4} {}", count, text)?;
            } else {
                write!(out_file, "{}", text)?;
            }
        };
        Ok(())
    };

    let mut line = String::new();
    let mut previous = String::new();
    let mut count: u64 = 0;
    loop {
        let bytes = file.read_line(&mut line)?;
        if bytes == 0 {
            break;
        }

        if line.trim_end() != previous.trim_end() {
            // Use the print closure to possibly print output, and use ?
            // to propagate potential errors
            print(count, &previous)?;
            previous = line.clone();
            count = 0;
        }

        count += 1;
        line.clear();
    }
    print(count, &previous)?; // Handle the last line of the file

    Ok(())
}

Note that the print closure must be declared with the mut keyword because it mutably borrows the out_file filehandle, so calling the closure requires a mutable binding. Without this, the compiler will show the following error:

error[E0596]: cannot borrow `print` as mutable, as it is not declared as mutable
  --> src/lib.rs:84:13
   |
63 |     let print = |count: u64, text: &str| -> MyResult<()> {
   |         ----- help: consider changing this to be mutable: `mut print`
...
66 |                 write!(out_file, "{:>4} {}", count, text)?;
   |                        -------- calling `print` requires mutable binding
   |                                 due to mutable borrow of `out_file`
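
The same requirement shows up in a version of the logic that is generic over the reader and writer, which also makes it testable in memory. This is a sketch under my own assumptions rather than the book’s run function: the uniq function name and its show_count parameter are hypothetical, and it returns io::Result instead of MyResult:

```rust
use std::io::{self, BufRead, Write};

/// Read lines from `input` and write each run of adjacent duplicates once
/// to `out`, optionally prefixed by the count.
fn uniq<R: BufRead, W: Write>(
    mut input: R,
    out: &mut W,
    show_count: bool,
) -> io::Result<()> {
    // `mut` is required: each call writes through the captured `out`,
    // so the closure holds a mutable borrow.
    let mut print = |count: u64, text: &str| -> io::Result<()> {
        if count > 0 {
            if show_count {
                write!(out, "{:>4} {}", count, text)?;
            } else {
                write!(out, "{}", text)?;
            }
        }
        Ok(())
    };

    let mut line = String::new();
    let mut previous = String::new();
    let mut count: u64 = 0;
    loop {
        if input.read_line(&mut line)? == 0 {
            break;
        }
        if line.trim_end() != previous.trim_end() {
            print(count, &previous)?;
            previous = line.clone();
            count = 0;
        }
        count += 1;
        line.clear();
    }
    print(count, &previous) // Handle the last run
}
```

Removing mut from the print binding in this sketch produces the same E0596 error, because the closure still needs to mutate the writer it captured.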

Again, it’s okay if your solution is different from mine, as long as it passes the tests. Part of what I like about writing with tests is that there is an objective determination of when a program meets some level of specifications. As Louis Srygley once said, “Without requirements or design, programming is the art of adding bugs to an empty text file.”2 I would say that tests are the requirements made incarnate. Without tests, you simply have no way to know when a change to your program strays from the requirements or breaks the design.

Summary

In about 100 lines of Rust, the uniqr program manages to replicate a reasonable subset of features from the original uniq program. Compare this to the GNU C source code, which has more than 600 lines of code. I would feel much more confident extending uniqr than I would using C due to the Rust compiler’s use of types and useful error messages.

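Let’s review some of the things you learned in this chapter:

  • You can open a new file for writing with File::create or write to STDOUT with io::stdout.

  • The DRY principle says to move duplicated code into a single abstraction like a function or a closure.

  • A closure can capture values such as config.count from the enclosing scope.

  • A value that implements the Write trait can be used with the write! and writeln! macros.

  • The tempfile crate creates temporary files that are removed automatically.

  • The Rust compiler may require you to indicate a lifetime, such as 'static, for references stored in a struct.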

In the next chapter, I’ll introduce Rust’s enum (enumerated) type and how to use regular expressions.

1 While the goal is to mimic the original versions as much as possible, I would note that I do not like optional positional parameters. In my opinion, it would be better to have an -o|--output option that defaults to STDOUT and have only one optional positional argument for the input file that defaults to STDIN.

2 Programming Wisdom (@CodeWisdom), “‘Without requirements or design, programming is the art of adding bugs to an empty text file.’ - Louis Srygley,” Twitter, January 24, 2018, 1:00 p.m., https://oreil.ly/FC6aS.