All hail the dirt bike / Philosopher dirt bike /
Silence as we gathered round / We saw the word and were on our way
They Might Be Giants, “Dirt Bike” (1994)
For this chapter’s challenge, you will create a version of the venerable wc (word count) program, which dates back to version 1 of AT&T Unix.
This program will display the number of lines, words, and bytes found in text from STDIN or one or more files.
I often use it to count the number of lines returned by some other process.
In this chapter, you will learn how to do the following:
Use the Iterator::all function
Create a module for tests
Fake a filehandle for testing
Conditionally format and print a value
Conditionally compile a module when testing
Break a line of text into words, bytes, and characters
Use Iterator::collect to turn an iterator into a vector
I’ll start by showing how wc works so you know what is expected by the tests.
Following is an excerpt from the BSD wc manual page that describes the elements that the challenge program will implement:
WC(1)                     BSD General Commands Manual                    WC(1)

NAME
     wc -- word, line, character, and byte count

SYNOPSIS
     wc [-clmw] [file ...]

DESCRIPTION
     The wc utility displays the number of lines, words, and bytes contained
     in each input file, or standard input (if no file is specified) to the
     standard output.  A line is defined as a string of characters delimited
     by a <newline> character.  Characters beyond the final <newline>
     character will not be included in the line count.

     A word is defined as a string of characters delimited by white space
     characters.  White space characters are the set of characters for which
     the iswspace(3) function returns true.  If more than one input file is
     specified, a line of cumulative counts for all the files is displayed on
     a separate line after the output for the last file.

     The following options are available:

     -c      The number of bytes in each input file is written to the standard
             output.  This will cancel out any prior usage of the -m option.

     -l      The number of lines in each input file is written to the standard
             output.

     -m      The number of characters in each input file is written to the
             standard output.  If the current locale does not support
             multibyte characters, this is equivalent to the -c option.  This
             will cancel out any prior usage of the -c option.

     -w      The number of words in each input file is written to the standard
             output.

     When an option is specified, wc only reports the information requested by
     that option.  The order of output always takes the form of line, word,
     byte, and file name.  The default action is equivalent to specifying the
     -c, -l and -w options.

     If no files are specified, the standard input is used and no file name is
     displayed.  The prompt will accept input until receiving EOF, or [^D] in
     most environments.
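The definitions in the manual page translate fairly directly into Rust. The following is a minimal sketch of those definitions, not the solution developed later in this chapter. The function name wc_counts is hypothetical, and the sketch assumes that Rust's Unicode-aware str::split_whitespace is an acceptable stand-in for iswspace(3), which may not hold in every locale.

```rust
/// Hypothetical helper: returns (lines, words, bytes) for a string,
/// following the definitions in the manual page excerpt.
fn wc_counts(text: &str) -> (usize, usize, usize) {
    // A line is delimited by a newline, so count the newline bytes.
    let lines = text.bytes().filter(|&b| b == b'\n').count();
    // A word is a run of characters delimited by whitespace.
    let words = text.split_whitespace().count();
    // str::len reports the length in bytes, not in characters.
    let bytes = text.len();
    (lines, words, bytes)
}

fn main() {
    // "three" follows the final newline, so it adds a word but not a line.
    println!("{:?}", wc_counts("one two\nthree"));
}
```

Under these definitions, text after the final newline contributes words and bytes but does not add to the line count.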
A picture is worth a kilobyte of words, so I’ll show you some examples using the following test files in the 05_wcr/tests/inputs directory:
empty.txt: an empty file
fox.txt: a file with one line of text
atlamal.txt: a file with the first stanza from “Atlamál hin groenlenzku” or “The Greenland Ballad of Atli,” an Old Norse poem
When run with an empty file, the program reports zero lines, words, and bytes in three right-justified columns eight characters wide:
$ cd 05_wcr
$ wc tests/inputs/empty.txt
       0       0       0 tests/inputs/empty.txt
Next, consider a file with one line of text with varying spaces between words and a tab character.
Let’s take a look at it before running wc on it.
Here I’m using cat with the flag -t to display the tab character as ^I and -e to display $ for the end of the line:
$ cat -te tests/inputs/fox.txt
The quick brown fox^Ijumps over the lazy dog.$
This example is short enough that I can manually count all the lines, words, and bytes as shown in Figure 5-1, where spaces are noted with raised dots, the tab character with \t, and the end of the line as $.
I find that wc is in agreement:
$ wc tests/inputs/fox.txt
       1       9      48 tests/inputs/fox.txt
As mentioned in Chapter 3, bytes may equate to characters for ASCII, but Unicode characters may require multiple bytes. The file tests/inputs/atlamal.txt contains many such examples:1
$ cat tests/inputs/atlamal.txt
Frétt hefir öld óvu, þá er endr of gerðu
seggir samkundu, sú var nýt fæstum,
æxtu einmæli, yggr var þeim síðan
ok it sama sonum Gjúka, er váru sannráðnir.
According to wc, this file contains 4 lines, 29 words, and 177 bytes:
$ wc tests/inputs/atlamal.txt
       4      29     177 tests/inputs/atlamal.txt
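The gap between bytes and characters is easy to see on a few words from the poem. This is a small sketch, assuming the text is UTF-8 with precomposed characters; the helper name bytes_and_chars is hypothetical.

```rust
/// Hypothetical helper: returns (bytes, chars) for a string slice.
fn bytes_and_chars(text: &str) -> (usize, usize) {
    // str::len counts UTF-8 bytes; str::chars yields Unicode scalar values.
    (text.len(), text.chars().count())
}

fn main() {
    for word in ["fox", "öld", "þá"] {
        let (bytes, chars) = bytes_and_chars(word);
        println!("{}: {} bytes, {} chars", word, bytes, chars);
    }
}
```

For pure ASCII input the two numbers agree, which is why the byte and character counts for fox.txt are the same.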
If I want only the number of lines, I can use the -l flag and only that column will be shown:
$ wc -l tests/inputs/atlamal.txt
       4 tests/inputs/atlamal.txt
I can similarly request only the number of bytes with -c and words with -w, and only those two columns will be shown:
$ wc -w -c tests/inputs/atlamal.txt
      29     177 tests/inputs/atlamal.txt
I can request the number of characters using the -m flag:
$ wc -m tests/inputs/atlamal.txt
     159 tests/inputs/atlamal.txt
The GNU version of wc will show both character and byte counts if you provide both the flags -m and -c, but the BSD version will show only one or the other, with the latter flag taking precedence:
$ wc -cm tests/inputs/atlamal.txt
     159 tests/inputs/atlamal.txt
$ wc -mc tests/inputs/atlamal.txt
     177 tests/inputs/atlamal.txt
Note that no matter the order of the flags, like -wc or -cw, the output columns are always ordered by lines, words, and bytes/characters:
$ wc -cw tests/inputs/atlamal.txt
      29     177 tests/inputs/atlamal.txt
If no positional arguments are provided, wc will read from STDIN and will not print a filename:
$ cat tests/inputs/atlamal.txt | wc -lc
       4     177
The GNU version of wc will understand a filename consisting of a dash (-) to mean STDIN, and it also provides long flag names as well as some other options:
$ wc --help
Usage: wc [OPTION]... [FILE]...
  or:  wc [OPTION]... --files0-from=F
Print newline, word, and byte counts for each FILE, and a total line if
more than one FILE is specified.  With no FILE, or when FILE is -,
read standard input.  A word is a non-zero-length sequence of characters
delimited by white space.
The options below may be used to select which counts are printed, always in
the following order: newline, word, character, byte, maximum line length.
  -c, --bytes            print the byte counts
  -m, --chars            print the character counts
  -l, --lines            print the newline counts
      --files0-from=F    read input from the files specified by
                           NUL-terminated names in file F;
                           If F is - then read names from standard input
  -L, --max-line-length  print the length of the longest line
  -w, --words            print the word counts
      --help     display this help and exit
      --version  output version information and exit
If processing more than one file, both versions will finish with a total line showing the number of lines, words, and bytes for all the inputs:
$ wc tests/inputs/*.txt
       4      29     177 tests/inputs/atlamal.txt
       0       0       0 tests/inputs/empty.txt
       1       9      48 tests/inputs/fox.txt
       5      38     225 total
Nonexistent files are noted with a warning to STDERR as the files are being processed.
In the following example, blargh represents a nonexistent file:
$ wc tests/inputs/fox.txt blargh tests/inputs/atlamal.txt
       1       9      48 tests/inputs/fox.txt
wc: blargh: open: No such file or directory
       4      29     177 tests/inputs/atlamal.txt
       5      38     225 total
As I first showed in Chapter 2, I can redirect the STDERR filehandle 2 in bash to verify that wc prints the warnings to that channel:
$ wc tests/inputs/fox.txt blargh tests/inputs/atlamal.txt 2>err
       1       9      48 tests/inputs/fox.txt
       4      29     177 tests/inputs/atlamal.txt
       5      38     225 total
$ cat err
wc: blargh: open: No such file or directory
There is an extensive test suite to verify that your program implements all these options.
The challenge program should be called wcr (pronounced wick-er) for our Rust version of wc.
Use cargo new wcr to start, then modify your Cargo.toml to add the following dependencies:
[dependencies]
clap = "2.33"

[dev-dependencies]
assert_cmd = "2"
predicates = "2"
rand = "0.8"
Copy the 05_wcr/tests directory into your new project and run cargo test to perform an initial build and run the tests, all of which should fail.
Use the same structure for src/main.rs from previous programs:
fn main() {
    if let Err(e) = wcr::get_args().and_then(wcr::run) {
        eprintln!("{}", e);
        std::process::exit(1);
    }
}
Following is a skeleton for src/lib.rs you can copy.
First, here is how I would define the Config to represent the command-line parameters:
use clap::{App, Arg};
use std::error::Error;

type MyResult<T> = Result<T, Box<dyn Error>>;

#[derive(Debug)]
pub struct Config {
    files: Vec<String>,
    lines: bool,
    words: bool,
    bytes: bool,
    chars: bool,
}
The files parameter will be a vector of strings.
The lines parameter is a Boolean for whether or not to print the line count.
The words parameter is a Boolean for whether or not to print the word count.
The bytes parameter is a Boolean for whether or not to print the byte count.
The chars parameter is a Boolean for whether or not to print the character count.
The main function assumes you will create a get_args function to process the command-line arguments.
Here is an outline you can use:
pub fn get_args() -> MyResult<Config> {
    let matches = App::new("wcr")
        .version("0.1.0")
        .author("Ken Youens-Clark <kyclark@gmail.com>")
        .about("Rust wc")
        // What goes here?
        .get_matches();

    Ok(Config {
        files: ...
        lines: ...
        words: ...
        bytes: ...
        chars: ...
    })
}
You will also need a run function, and you can start by printing the configuration:
pub fn run(config: Config) -> MyResult<()> {
    println!("{:#?}", config);
    Ok(())
}
Try to get your program to generate --help output similar to the following:
$ cargo run -- --help
wcr 0.1.0
Ken Youens-Clark <kyclark@gmail.com>
Rust wc

USAGE:
    wcr [FLAGS] [FILE]...

FLAGS:
    -c, --bytes      Show byte count
    -m, --chars      Show character count
    -h, --help       Prints help information
    -l, --lines      Show line count
    -V, --version    Prints version information
    -w, --words      Show word count

ARGS:
    <FILE>...    Input file(s) [default: -]
The challenge program will mimic the BSD wc in never showing both counts at once, so it disallows using the -m (character) and -c (bytes) flags together:
$ cargo run -- -cm tests/inputs/fox.txt
error: The argument '--bytes' cannot be used with '--chars'

USAGE:
    wcr --bytes --chars
The default behavior will be to print lines, words, and bytes from STDIN, which means those values in the configuration should be true when none have been explicitly requested by the user:
$ cargo run
Config {
    files: [
        "-",
    ],
    lines: true,
    words: true,
    bytes: true,
    chars: false,
}
The default value for files should be a dash (-) for STDIN.
The chars value should be false unless the -m|--chars flag is present.
If any single flag is present, then all the other flags not mentioned should be false:
$ cargo run -- -l tests/inputs/*.txt
Config {
    files: [
        "tests/inputs/atlamal.txt",
        "tests/inputs/empty.txt",
        "tests/inputs/fox.txt",
    ],
    lines: true,
    words: false,
    bytes: false,
    chars: false,
}
The -l flag indicates only the line count is wanted, and bash will expand the file glob tests/inputs/*.txt into all the filenames in that directory.
Because the -l flag is present, the lines value is the only one that is true.
Stop here and get this much working. My dog needs a bath, so I’ll be right back.
Following is the first part of my get_args.
There’s nothing new to how I declare the parameters, so I’ll not comment on this:
pub fn get_args() -> MyResult<Config> {
    let matches = App::new("wcr")
        .version("0.1.0")
        .author("Ken Youens-Clark <kyclark@gmail.com>")
        .about("Rust wc")
        .arg(
            Arg::with_name("files")
                .value_name("FILE")
                .help("Input file(s)")
                .default_value("-")
                .multiple(true),
        )
        .arg(
            Arg::with_name("words")
                .short("w")
                .long("words")
                .help("Show word count")
                .takes_value(false),
        )
        .arg(
            Arg::with_name("bytes")
                .short("c")
                .long("bytes")
                .help("Show byte count")
                .takes_value(false),
        )
        .arg(
            Arg::with_name("chars")
                .short("m")
                .long("chars")
                .help("Show character count")
                .takes_value(false)
                .conflicts_with("bytes"),
        )
        .arg(
            Arg::with_name("lines")
                .short("l")
                .long("lines")
                .help("Show line count")
                .takes_value(false),
        )
        .get_matches();
After clap parses the arguments, I unpack them and try to figure out the default values:
    let mut lines = matches.is_present("lines");
    let mut words = matches.is_present("words");
    let mut bytes = matches.is_present("bytes");
    let chars = matches.is_present("chars");

    if [lines, words, bytes, chars].iter().all(|v| v == &false) {
        lines = true;
        words = true;
        bytes = true;
    }

    Ok(Config {
        files: matches.values_of_lossy("files").unwrap(),
        lines,
        words,
        bytes,
        chars,
    })
}
Unpack all the flags.
If all the flags are false, then set lines, words, and bytes to true.
Use the struct field initialization shorthand to set the values.
I want to highlight how I create a temporary list using a slice with all the flags.
I then call the slice::iter method to create an iterator so I can use the Iterator::all function to find if all the values are false.
This method expects a closure, which is an anonymous function that can be passed as an argument to another function.
Here, the closure is a predicate or a test that figures out if an element is false.
The values are references, so I compare each value to &false, which is a reference to a Boolean value.
If all the evaluations are true, then Iterator::all will return true.2
A slightly shorter but possibly less obvious way to write this would be:
    if [lines, words, bytes, chars].iter().all(|v| !v) {
Negate each Boolean value v using std::ops::Not, which is written using a prefix exclamation point (!).
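The default-flag logic can be tried in isolation, without clap. The following is only a sketch: the function apply_defaults and its array-of-four layout (lines, words, bytes, chars) are hypothetical and exist just to make the Iterator::all check easy to test.

```rust
/// Hypothetical helper mirroring the default logic. The array holds the
/// flags in the order lines, words, bytes, chars.
fn apply_defaults(flags: [bool; 4]) -> [bool; 4] {
    // Iterator::all returns true only if the closure is true for every item.
    if flags.iter().all(|v| !v) {
        // Nothing was requested: show lines, words, and bytes, but not chars.
        [true, true, true, false]
    } else {
        // At least one flag was given, so leave the selection untouched.
        flags
    }
}

fn main() {
    println!("{:?}", apply_defaults([false, false, false, false]));
    println!("{:?}", apply_defaults([true, false, false, false]));
}
```

One detail worth knowing is that Iterator::all returns true for an empty iterator, since no element fails the predicate.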
Now to work on the counting part of the program.
This will require iterating over the file arguments and trying to open them, and I suggest you use the open function from Chapter 2 for this:
fn open(filename: &str) -> MyResult<Box<dyn BufRead>> {
    match filename {
        "-" => Ok(Box::new(BufReader::new(io::stdin()))),
        _ => Ok(Box::new(BufReader::new(File::open(filename)?))),
    }
}
Be sure to expand your imports to the following:
use clap::{App, Arg};
use std::error::Error;
use std::fs::File;
use std::io::{self, BufRead, BufReader};
Here is a run function to get you going:
pub fn run(config: Config) -> MyResult<()> {
    for filename in &config.files {
        match open(filename) {
            Err(err) => eprintln!("{}: {}", filename, err),
            Ok(_) => println!("Opened {}", filename),
        }
    }

    Ok(())
}
You are welcome to write your solution however you like, but I decided to create a function called count that would take a filehandle and possibly return a struct called FileInfo containing the number of lines, words, bytes, and characters, each represented as a usize.
I say that the function will possibly return this struct because the function will involve I/O, which could go sideways.
I put the following definition in src/lib.rs just after the Config struct.
For reasons I will explain shortly, this must derive the PartialEq trait in addition to Debug:
#[derive(Debug, PartialEq)]
pub struct FileInfo {
    num_lines: usize,
    num_words: usize,
    num_bytes: usize,
    num_chars: usize,
}
My count function might succeed or fail, so it will return a MyResult<FileInfo>, meaning that on success it will have a FileInfo in the Ok variant or else will have an Err.
To start this function, I will initialize some mutable variables to count all the elements and will return a FileInfo struct:
pub fn count(mut file: impl BufRead) -> MyResult<FileInfo> {
    let mut num_lines = 0;
    let mut num_words = 0;
    let mut num_bytes = 0;
    let mut num_chars = 0;

    Ok(FileInfo {
        num_lines,
        num_words,
        num_bytes,
        num_chars,
    })
}
The count function will accept a mutable file value, and it might return a FileInfo struct.
Initialize mutable variables to count the lines, words, bytes, and characters.
For now, return a FileInfo with all zeros.
I’m introducing the impl keyword to indicate that the file value must implement the BufRead trait. Recall that open returns a value that meets this criterion. You’ll shortly see how this makes the function flexible.
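To illustrate that flexibility, here is a small sketch of a function that accepts impl BufRead and is then called with three different kinds of readers. The function first_line is hypothetical and is not part of the challenge program; it only shows that any type implementing BufRead is accepted.

```rust
use std::io::{self, BufRead, BufReader, Cursor};

/// Hypothetical helper: reads and returns the first line of any buffered
/// reader, including its line ending.
fn first_line(mut input: impl BufRead) -> io::Result<String> {
    let mut line = String::new();
    input.read_line(&mut line)?;
    Ok(line)
}

fn main() -> io::Result<()> {
    // An in-memory cursor over a string.
    print!("{}", first_line(Cursor::new("one\ntwo\n"))?);
    // A byte slice also implements BufRead.
    print!("{}", first_line(&b"three\nfour\n"[..])?);
    // A boxed trait object, like the value returned by open.
    let boxed: Box<dyn BufRead> = Box::new(BufReader::new(Cursor::new("five\nsix\n")));
    print!("{}", first_line(boxed)?);
    Ok(())
}
```

Because the parameter is written in terms of the trait rather than a concrete type, the same function works for real files, STDIN, and in-memory test data.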
In Chapter 4, I showed you how to write a unit test, placing it just after the function it was testing.
I’m going to create a unit test for the count function, but this time I’m going to place it inside a module called tests.
This is a tidy way to group unit tests, and I can use the #[cfg(test)] configuration option to tell Rust to compile the module only during testing.
This is especially useful because I want to use std::io::Cursor in my test to fake a filehandle for the count function.
According to the documentation, a Cursor is “used with in-memory buffers, anything implementing AsRef<[u8]>, to allow them to implement Read and/or Write, allowing these buffers to be used anywhere you might use a reader or writer that does actual I/O.”
Placing this dependency inside the tests module ensures that it will be included only when I test the program.
The following is how I create the tests module and then import and test the count function:
#[cfg(test)]
mod tests {
    use super::{count, FileInfo};
    use std::io::Cursor;

    #[test]
    fn test_count() {
        let text = "I don't want the world. I just want your half.\r\n";
        let info = count(Cursor::new(text));
        assert!(info.is_ok());
        let expected = FileInfo {
            num_lines: 1,
            num_words: 10,
            num_chars: 48,
            num_bytes: 48,
        };
        assert_eq!(info.unwrap(), expected);
    }
}
The cfg enables conditional compilation, so this module will be compiled only when testing.
Define a new module (mod) called tests to contain test code.
Import the count function and FileInfo struct from the parent module super, meaning next above and referring to the module above tests that contains it.
Import std::io::Cursor.
Run count with the Cursor.
Ensure the result is Ok.
Compare the result to the expected value. This comparison requires FileInfo to implement the PartialEq trait, which is why I added derive(PartialEq) earlier.
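The role of the derived traits can be seen in a smaller setting. In this sketch, the struct Counts and the function tally are hypothetical stand-ins for FileInfo and count; the point is only that assert_eq! needs PartialEq to compare the two values and Debug to print them if they differ.

```rust
// Hypothetical struct standing in for FileInfo.
#[derive(Debug, PartialEq)]
struct Counts {
    lines: usize,
    words: usize,
}

/// Hypothetical helper: a simplified count over an in-memory string.
fn tally(text: &str) -> Counts {
    Counts {
        lines: text.lines().count(),
        words: text.split_whitespace().count(),
    }
}

fn main() {
    let got = tally("a b\nc\n");
    // Without derive(PartialEq), this comparison would not compile.
    assert_eq!(got, Counts { lines: 2, words: 3 });
    println!("{:?}", got);
}
```

Removing PartialEq from the derive list makes the assert_eq! line fail to compile, which is a quick way to confirm why the trait is required.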
Run this test using cargo test test_count.
You will see lots of warnings from the Rust compiler about unused variables or variables that do not need to be mutable.
The most important result is that the test fails:
failures:

---- tests::test_count stdout ----
thread 'tests::test_count' panicked at 'assertion failed: `(left == right)`
  left: `FileInfo { num_lines: 0, num_words: 0, num_bytes: 0, num_chars: 0 }`,
 right: `FileInfo { num_lines: 1, num_words: 10, num_bytes: 48, num_chars: 48 }`', src/lib.rs:146:9
This is an example of test-driven development, where you write a test to define the expected behavior of your function and then write the function that passes the unit test.
Once you have some reasonable assurance that the function is correct, use the returned FileInfo to print the expected output.
Start as simply as possible using the empty file, and make sure your program prints zeros for the three columns of lines, words, and bytes:
$ cargo run -- tests/inputs/empty.txt
       0       0       0 tests/inputs/empty.txt
Next, use tests/inputs/fox.txt and make sure you get the following counts. I specifically added various kinds and numbers of whitespace to challenge you on how to split the text into words:
$ cargo run -- tests/inputs/fox.txt
       1       9      48 tests/inputs/fox.txt
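To see why mixed whitespace matters, the sketch below compares two ways of splitting a line. The sample line is an assumption: it has the same words as fox.txt with runs of spaces and a tab, but it is not claimed to match the file byte for byte. The helper count_words is hypothetical.

```rust
/// Hypothetical helper: counts whitespace-delimited words.
fn count_words(line: &str) -> usize {
    // split_whitespace treats any run of Unicode whitespace as one separator.
    line.split_whitespace().count()
}

fn main() {
    // Assumed sample text with double spaces, a tab, and a newline.
    let line = "The  quick brown fox\tjumps over   the lazy dog.\n";
    println!("split_whitespace: {}", count_words(line));
    // Splitting on a single space yields empty pieces for repeated spaces
    // and does not split on the tab, so the number of pieces is misleading.
    println!("split(' '):       {}", line.split(' ').count());
}
```

Whichever approach you choose, check it against input that mixes spaces, tabs, and repeated separators.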
Be sure your program can handle the Unicode in tests/inputs/atlamal.txt correctly:
$ cargo run -- tests/inputs/atlamal.txt
       4      29     177 tests/inputs/atlamal.txt
And that you correctly count the characters:
$ cargo run -- tests/inputs/atlamal.txt -wml
       4      29     159 tests/inputs/atlamal.txt
Next, use multiple input files to check that your program prints the correct total line:
$ cargo run -- tests/inputs/*.txt
       4      29     177 tests/inputs/atlamal.txt
       0       0       0 tests/inputs/empty.txt
       1       9      48 tests/inputs/fox.txt
       5      38     225 total
When all that works correctly, try reading from STDIN:
$ cat tests/inputs/atlamal.txt | cargo run
       4      29     177
Now, I’ll walk you through how I went about writing the wcr program.
Bear in mind that you could have solved this many different ways.
As long as your code passes the tests and produces the same output as the BSD version of wc, then it works well and you should be proud of your accomplishments.
I left you with an unfinished count function, so I’ll start there.
As we discussed in Chapter 3, BufRead::lines will remove the line endings, and I don’t want that because newlines in Windows files are two bytes (\r\n) but Unix newlines are just one byte (\n).
I can copy some code from Chapter 3 that uses BufRead::read_line to read each line into a buffer.
Conveniently, this function tells me how many bytes have been read from the file:
pub fn count(mut file: impl BufRead) -> MyResult<FileInfo> {
    let mut num_lines = 0;
    let mut num_words = 0;
    let mut num_bytes = 0;
    let mut num_chars = 0;
    let mut line = String::new();

    loop {
        let line_bytes = file.read_line(&mut line)?;
        if line_bytes == 0 {
            break;
        }
        num_bytes += line_bytes;
        num_lines += 1;
        num_words += line.split_whitespace().count();
        num_chars += line.chars().count();
        line.clear();
    }

    Ok(FileInfo {
        num_lines,
        num_words,
        num_bytes,
        num_chars,
    })
}
Create a mutable buffer to hold each line of text.
Create an infinite loop for reading the filehandle.
Try to read a line from the filehandle.
End of file (EOF) has been reached when zero bytes are read, so break out of the loop.
Add the number of bytes from this line to the num_bytes variable.
Each time through the loop is a line, so increment num_lines.
Use the str::split_whitespace method to break the string on whitespace and use Iterator::count to find the number of words.
Use the str::chars method to break the string into Unicode characters and use Iterator::count to count the characters.
With these changes, the test_count test will pass.
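The difference between BufRead::lines and BufRead::read_line is easy to measure on input with mixed line endings. The two helper functions below are hypothetical and exist only for this comparison.

```rust
use std::io::{self, BufRead, Cursor};

/// Sums the lengths of the strings produced by BufRead::lines,
/// which strips the trailing "\n" or "\r\n" from each line.
fn bytes_via_lines(input: impl BufRead) -> io::Result<usize> {
    let mut total = 0;
    for line in input.lines() {
        total += line?.len();
    }
    Ok(total)
}

/// Sums the byte counts returned by BufRead::read_line,
/// which keeps the line ending in the buffer.
fn bytes_via_read_line(mut input: impl BufRead) -> io::Result<usize> {
    let mut buf = String::new();
    let mut total = 0;
    loop {
        let n = input.read_line(&mut buf)?;
        if n == 0 {
            break; // zero bytes read means end of input
        }
        total += n;
        buf.clear();
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    // One Windows-style line and one Unix-style line: five bytes in total.
    let text = "a\r\nb\n";
    println!("lines:     {}", bytes_via_lines(Cursor::new(text))?);
    println!("read_line: {}", bytes_via_read_line(Cursor::new(text))?);
    Ok(())
}
```

Only the read_line version accounts for every byte in the input, which is what a byte count that agrees with wc requires.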
To integrate this into my code, I will first change run to simply print the FileInfo struct or print a warning to STDERR when the file can’t be opened:
pub fn run(config: Config) -> MyResult<()> {
    for filename in &config.files {
        match open(filename) {
            Err(err) => eprintln!("{}: {}", filename, err),
            Ok(file) => {
                if let Ok(info) = count(file) {
                    println!("{:?}", info);
                }
            }
        }
    }

    Ok(())
}
When I run it on one of the test inputs, it appears to work for a valid file:
$ cargo run -- tests/inputs/fox.txt
FileInfo { num_lines: 1, num_words: 9, num_bytes: 48, num_chars: 48 }
It even handles reading from STDIN:
$ cat tests/inputs/fox.txt | cargo run
FileInfo { num_lines: 1, num_words: 9, num_bytes: 48, num_chars: 48 }
Next, I need to format the output to meet the specifications.
To create the expected output, I can start by changing run to always print the lines, words, and bytes followed by the filename:
pub fn run(config: Config) -> MyResult<()> {
    for filename in &config.files {
        match open(filename) {
            Err(err) => eprintln!("{}: {}", filename, err),
            Ok(file) => {
                if let Ok(info) = count(file) {
                    println!(
                        "{:>8}{:>8}{:>8} {}",
                        info.num_lines,
                        info.num_words,
                        info.num_bytes,
                        filename
                    );
                }
            }
        }
    }

    Ok(())
}
If I run it with one input file, it’s already looking pretty sweet:
$ cargo run -- tests/inputs/fox.txt
       1       9      48 tests/inputs/fox.txt
If I run cargo test fox to run all the tests with the word fox in the name, I pass one out of eight tests.
Huzzah!
running 8 tests
test fox ... ok
test fox_bytes ... FAILED
test fox_chars ... FAILED
test fox_bytes_lines ... FAILED
test fox_words_bytes ... FAILED
test fox_words ... FAILED
test fox_words_lines ... FAILED
test fox_lines ... FAILED
I can inspect tests/cli.rs to see what the passing test looks like. Note that the tests reference constant values declared at the top of the module:
const PRG: &str = "wcr";
const EMPTY: &str = "tests/inputs/empty.txt";
const FOX: &str = "tests/inputs/fox.txt";
const ATLAMAL: &str = "tests/inputs/atlamal.txt";
Again I have a run helper function to run my tests:
fn run(args: &[&str], expected_file: &str) -> TestResult {
    let expected = fs::read_to_string(expected_file)?;
    Command::cargo_bin(PRG)?
        .args(args)
        .assert()
        .success()
        .stdout(expected);
    Ok(())
}
Try to read the expected output for this command.
Run the wcr program with the given arguments. Assert that the program succeeds and that STDOUT matches the expected value.
The fox test is running wcr with the FOX input file and no options, comparing it to the contents of the expected output file that was generated using 05_wcr/mk-outs.sh:
#[test]
fn fox() -> TestResult {
    run(&[FOX], "tests/expected/fox.txt.out")
}
Look at the next function in the file to see a failing test:
#[test]
fn fox_bytes() -> TestResult {
    run(&["--bytes", FOX], "tests/expected/fox.txt.c.out")
}
When run with --bytes, my program should print only that column of output, but it always prints lines, words, and bytes.
So I decided to write a function called format_field in src/lib.rs that would conditionally return a formatted string or the empty string depending on a Boolean value:
fn format_field(value: usize, show: bool) -> String {
    if show {
        format!("{:>8}", value)
    } else {
        "".to_string()
    }
}
The function accepts a usize value and a Boolean and returns a String.
Check if the show value is true.
Return a new string by formatting the number into a string eight characters wide.
Otherwise, return the empty string.
Why does this function return a String and not a str? They’re both strings, but a str is an immutable, fixed-length string. The value that will be returned from the function is dynamically generated at runtime, so I must use String, which is a growable, heap-allocated structure.
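The width in the format string does not have to be hardcoded. As a sketch of one possible generalization, the hypothetical function right_justify below takes the width as an argument and uses the {:>width$} form, which reads the field width from a named argument. It is not part of the chapter's solution.

```rust
/// Hypothetical generalization of format_field with a caller-chosen width.
fn right_justify(value: usize, width: usize) -> String {
    // {:>width$} right-aligns the value in a field `width` characters wide.
    format!("{:>width$}", value, width = width)
}

fn main() {
    // Brackets make the padding visible.
    println!("[{}]", right_justify(3, 8));
    println!("[{}]", right_justify(10, 8));
    // A value wider than the field is printed in full, not truncated.
    println!("[{}]", right_justify(123456789, 8));
}
```

Each call builds a new String at runtime, which is the same reason format_field returns String rather than a borrowed str.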
I can expand my tests module to add a unit test for this:
#[cfg(test)]
mod tests {
    use super::{count, format_field, FileInfo};
    use std::io::Cursor;

    #[test]
    fn test_count() {} // Same as before

    #[test]
    fn test_format_field() {
        assert_eq!(format_field(1, false), "");
        assert_eq!(format_field(3, true), "       3");
        assert_eq!(format_field(10, true), "      10");
    }
}
Add format_field to the imports.
The function should return the empty string when show is false.
Check width for a single-digit number.
Check width for a double-digit number.
Here is how I use the format_field function in context, where I also handle printing the empty string when reading from STDIN:
pub fn run(config: Config) -> MyResult<()> {
    for filename in &config.files {
        match open(filename) {
            Err(err) => eprintln!("{}: {}", filename, err),
            Ok(file) => {
                if let Ok(info) = count(file) {
                    println!(
                        "{}{}{}{}{}",
                        format_field(info.num_lines, config.lines),
                        format_field(info.num_words, config.words),
                        format_field(info.num_bytes, config.bytes),
                        format_field(info.num_chars, config.chars),
                        if filename == "-" {
                            "".to_string()
                        } else {
                            format!(" {}", filename)
                        }
                    );
                }
            }
        }
    }

    Ok(())
}
Format the output for each of the columns using the format_field function.
When the filename is a dash, print the empty string; otherwise, print a space and the filename.
With these changes, all the tests for cargo test fox pass.
But if I run the entire test suite, I see that my program is still failing the tests with names that include the word all:
failures:
    test_all
    test_all_bytes
    test_all_bytes_lines
    test_all_lines
    test_all_words
    test_all_words_bytes
    test_all_words_lines
Looking at the test_all function in tests/cli.rs confirms that the test is using all the input files as arguments:
#[test]
fn test_all() -> TestResult {
    run(&[EMPTY, FOX, ATLAMAL], "tests/expected/all.out")
}
If I run my current program with all the input files, I can see that I’m missing the total line:
$ cargo run -- tests/inputs/*.txt
       4      29     177 tests/inputs/atlamal.txt
       0       0       0 tests/inputs/empty.txt
       1       9      48 tests/inputs/fox.txt
Here is my final run function that keeps a running total and prints those values when there is more than one input:
pub fn run(config: Config) -> MyResult<()> {
    let mut total_lines = 0;
    let mut total_words = 0;
    let mut total_bytes = 0;
    let mut total_chars = 0;

    for filename in &config.files {
        match open(filename) {
            Err(err) => eprintln!("{}: {}", filename, err),
            Ok(file) => {
                if let Ok(info) = count(file) {
                    println!(
                        "{}{}{}{}{}",
                        format_field(info.num_lines, config.lines),
                        format_field(info.num_words, config.words),
                        format_field(info.num_bytes, config.bytes),
                        format_field(info.num_chars, config.chars),
                        if filename.as_str() == "-" {
                            "".to_string()
                        } else {
                            format!(" {}", filename)
                        }
                    );

                    total_lines += info.num_lines;
                    total_words += info.num_words;
                    total_bytes += info.num_bytes;
                    total_chars += info.num_chars;
                }
            }
        }
    }

    if config.files.len() > 1 {
        println!(
            "{}{}{}{} total",
            format_field(total_lines, config.lines),
            format_field(total_words, config.words),
            format_field(total_bytes, config.bytes),
            format_field(total_chars, config.chars)
        );
    }

    Ok(())
}
Create mutable variables to track the total number of lines, words, bytes, and characters.
Update the totals using the values from this file.
Print the totals if there is more than one input.
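Four separate running totals work fine, but the bookkeeping can also live on the struct itself. The following sketch is one alternative, not the approach used in this chapter: Counts is a hypothetical stand-in for FileInfo, and implementing std::ops::AddAssign for it is my own assumption about how you might structure the code.

```rust
use std::ops::AddAssign;

// Hypothetical stand-in for the chapter's FileInfo struct.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct Counts {
    lines: usize,
    words: usize,
    bytes: usize,
}

impl AddAssign for Counts {
    // Allows `total += counts` to add every field at once.
    fn add_assign(&mut self, other: Self) {
        self.lines += other.lines;
        self.words += other.words;
        self.bytes += other.bytes;
    }
}

/// Sums a slice of per-file counts into a single total.
fn total(all: &[Counts]) -> Counts {
    let mut sum = Counts::default();
    for counts in all {
        sum += *counts;
    }
    sum
}

fn main() {
    // Values taken from the three test inputs shown in this chapter.
    let per_file = [
        Counts { lines: 4, words: 29, bytes: 177 },
        Counts { lines: 0, words: 0, bytes: 0 },
        Counts { lines: 1, words: 9, bytes: 48 },
    ];
    println!("{:?}", total(&per_file));
}
```

The trade-off is a little more code up front in exchange for a single accumulator that cannot forget a field when a new count is added.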
This appears to work well:
$ cargo run -- tests/inputs/*.txt
       4      29     177 tests/inputs/atlamal.txt
       0       0       0 tests/inputs/empty.txt
       1       9      48 tests/inputs/fox.txt
       5      38     225 total
I can count characters instead of bytes:
$ cargo run -- -m tests/inputs/atlamal.txt
     159 tests/inputs/atlamal.txt
And I can show and hide any columns I want:
$ cargo run -- -wc tests/inputs/atlamal.txt
      29     177 tests/inputs/atlamal.txt
Write a version that mimics the output from the GNU wc instead of the BSD version.
If your system already has the GNU version, run the mk-outs.sh program to generate the expected outputs for the given input files.
Modify the program to create the correct output according to the tests.
Then expand the program to handle the additional options like --files0-from for reading the input filenames from a file and --max-line-length to print the length of the longest line.
Add tests for the new functionality.
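As a possible starting point for the longest-line option, here is a rough sketch. The function max_line_length is hypothetical, and it simply counts Unicode characters per line without the line ending. That is an assumption on my part: the GNU program may measure lines differently (for example, in how it treats tabs or wide characters), so compare against the real tool before relying on it.

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical helper: the length in characters of the longest line,
/// not counting the line ending.
fn max_line_length(input: impl BufRead) -> io::Result<usize> {
    let mut longest: usize = 0;
    // BufRead::lines strips the line ending, which suits this measurement.
    for line in input.lines() {
        longest = longest.max(line?.chars().count());
    }
    Ok(longest)
}

fn main() -> io::Result<()> {
    let text = "ab\nabcd\nabc\n";
    println!("{}", max_line_length(Cursor::new(text))?);
    Ok(())
}
```

A Cursor makes this easy to unit test in the same way as the count function.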
Next, ponder the mysteries of the iswspace function mentioned in the BSD manual page noted at the beginning of the chapter.
What if you ran the program on the spiders.txt file of the Issa haiku from Chapter 2, but it used Japanese characters?3
隅の蜘案じな煤はとらぬぞよ
What would the output be? If I place this into a file called spiders.txt, BSD wc thinks there are three words:
$ wc spiders.txt
       1       3      40 spiders.txt
The GNU version says there is only one word:
$ wc spiders.txt
1 1 40 spiders.txt
I didn’t want to open that can of worms (or spiders?), but if you were creating a version of this program to release to the public, how could you replicate the BSD and GNU versions?
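One way to begin exploring the question is to check what Rust itself considers whitespace. The sketch below does not reproduce either the BSD or the GNU behavior; it only shows how str::split_whitespace treats the haiku as printed above, and how the count changes if U+3000 (ideographic space) is inserted between phrases. The separator placement and the helper name unicode_words are my own assumptions.

```rust
/// Hypothetical helper: counts words using Rust's Unicode-aware splitting.
fn unicode_words(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    // The haiku as shown above, with no separators.
    println!("{}", unicode_words("隅の蜘案じな煤はとらぬぞよ"));
    // The same text with an ideographic space as an assumed phrase separator.
    println!("{}", unicode_words("隅の蜘\u{3000}案じな\u{3000}煤はとらぬぞよ"));
    // U+3000 has the Unicode White_Space property.
    println!("{}", '\u{3000}'.is_whitespace());
}
```

Matching a particular wc implementation would mean finding out which characters or bytes its locale treats as whitespace, which can differ from the Unicode definition Rust uses.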
Well, that was certainly fun. In about 200 lines of Rust, we wrote a pretty passable replacement for one of the most widely used Unix programs. Compare your version to the 1,000 lines of C in the GNU source code. Reflect upon your progress in this chapter:
You learned that the Iterator::all function will return true if all the elements evaluate to true for the given predicate, which is a closure accepting an element. Many similar Iterator methods accept a closure as an argument for testing, selecting, and transforming the elements.
You used the str::split_whitespace and str::chars methods to break text into words and characters.
You used the Iterator::count method to count the number of items.
You wrote a function to conditionally format a value or the empty string to support the printing or omission of information according to the flag arguments.
You organized your unit tests into a tests module and imported functions from the parent module, called super.
You used the #[cfg(test)] configuration option to tell Rust to compile the tests module only when testing.
You saw how to use std::io::Cursor to create a fake filehandle for testing a function that expects something that implements BufRead.
You’ve learned quite a bit about reading files with Rust, and in the next chapter, you’ll learn how to write files.
1 The text shown in this example translates to: “There are many who know how of old did men, in counsel gather / little good did they get / in secret they plotted, it was sore for them later / and for Gjuki’s sons, whose trust they deceived.”
2 When my youngest first started brushing his own teeth before bed, I would ask if he’d brushed and flossed. The problem was that he was prone to fibbing, so it was hard to trust him. In an actual exchange one night, I asked, “Did you brush and floss your teeth?” Yes, he replied. “Did you brush your teeth?” Yes, he replied. “Did you floss your teeth?” No, he replied. So clearly he failed to properly combine Boolean values because a true statement and a false statement should result in a false outcome.
3 A more literal translation might be “Corner spider, rest easy, my soot-broom is idle.”