There’s only one everything
They Might Be Giants, “One Everything” (2008)
In this chapter, you will write a Rust version of the uniq program (pronounced unique), which will find the distinct lines of text from either a file or STDIN.
Among its many uses, it is often employed to count how many times each unique string is found.
Along the way, you will learn how to do the following:
Write to a file or STDOUT
Use a closure to capture a variable
Apply the don’t repeat yourself (DRY) concept
Use the Write trait and the write! and writeln! macros
Use temporary files
Indicate the lifetime of a variable
As usual, I’ll start by explaining how uniq works so that you understand what is expected of your program.
Following is part of the manual page for the BSD version of uniq.
The challenge program in this chapter will only implement the reading of a file or STDIN, writing to a file or STDOUT, and counting the lines for the -c flag, but I include more of the documentation so that you can see the full scope of the program:
UNIQ(1)                   BSD General Commands Manual                  UNIQ(1)

NAME
     uniq -- report or filter out repeated lines in a file

SYNOPSIS
     uniq [-c | -d | -u] [-i] [-f num] [-s chars] [input_file [output_file]]

DESCRIPTION
     The uniq utility reads the specified input_file comparing adjacent lines,
     and writes a copy of each unique input line to the output_file.  If
     input_file is a single dash ('-') or absent, the standard input is read.
     If output_file is absent, standard output is used for output.  The second
     and succeeding copies of identical adjacent input lines are not written.
     Repeated lines in the input will not be detected if they are not
     adjacent, so it may be necessary to sort the files first.

     The following options are available:

     -c      Precede each output line with the count of the number of times
             the line occurred in the input, followed by a single space.

     -d      Only output lines that are repeated in the input.

     -f num  Ignore the first num fields in each input line when doing
             comparisons.  A field is a string of non-blank characters
             separated from adjacent fields by blanks.  Field numbers are
             one based, i.e., the first field is field one.

     -s chars
             Ignore the first chars characters in each input line when doing
             comparisons.  If specified in conjunction with the -f option,
             the first chars characters after the first num fields will be
             ignored.  Character numbers are one based, i.e., the first
             character is character one.

     -u      Only output lines that are not repeated in the input.

     -i      Case insensitive comparison of lines.
In the 06_uniqr/tests/inputs directory of the book’s Git repository, you will find the following input files I’ll use for testing:
empty.txt: an empty file
one.txt: a file with one line of text
two.txt: a file with two lines of the same text
three.txt: a file with 13 lines of 4 unique values
skip.txt: a file with four lines of two unique values plus an empty line
The other files t[1–6].txt are examples from a Perl program used to test the GNU version. These are generated by the mk-outs.sh file:
$ cat mk-outs.sh
#!/usr/bin/env bash

ROOT="tests/inputs"
OUT_DIR="tests/expected"

[[ ! -d "$OUT_DIR" ]] && mkdir -p "$OUT_DIR"

# Cf https://github.com/coreutils/coreutils/blob/master/tests/misc/uniq.pl
echo -ne "a\na\n" > $ROOT/t1.txt
echo -ne "a\na" > $ROOT/t2.txt
echo -ne "a\nb" > $ROOT/t3.txt
echo -ne "a\na\nb" > $ROOT/t4.txt
echo -ne "b\na\na\n" > $ROOT/t5.txt
echo -ne "a\nb\nc\n" > $ROOT/t6.txt

for FILE in $ROOT/*.txt; do
    BASENAME=$(basename "$FILE")
    uniq $FILE > ${OUT_DIR}/${BASENAME}.out
    uniq -c $FILE > ${OUT_DIR}/${BASENAME}.c.out
    uniq < $FILE > ${OUT_DIR}/${BASENAME}.stdin.out
    uniq -c < $FILE > ${OUT_DIR}/${BASENAME}.stdin.c.out
done
t1.txt: Two lines each ending with a newline
t2.txt: No trailing newline on last line
t3.txt: Two different lines, no trailing newline
t4.txt: Two lines the same; last is different with no trailing newline
t5.txt: Two different values with newlines on each
t6.txt: Three different values with newlines on each
To demonstrate uniq, note that it will print nothing when given an empty file:
$ uniq tests/inputs/empty.txt
Given a file with just one line, the one line will be printed:
$ uniq tests/inputs/one.txt
a
It will also print the number of times a line occurs before the line when run with the -c option.
The count is right-justified in a field four characters wide and is followed by a single space and then the line of text:
$ uniq -c tests/inputs/one.txt
   1 a
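The right-justified count maps directly onto Rust's format width specifiers. As a minimal sketch (the helper name format_count is hypothetical and not part of the chapter's code), {:>4} pads the count on the left to at least four characters:

```rust
/// Format a count the way the `uniq -c` examples show it:
/// right-justified in a field four characters wide, then a single
/// space, then the text. `format_count` is an illustrative name only.
fn format_count(count: u64, text: &str) -> String {
    format!("{:>4} {}", count, text)
}
```

The width is a minimum rather than a limit, so a count with more than four digits simply makes the field wider instead of being truncated.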
The file tests/inputs/two.txt contains two duplicate lines:
$ cat tests/inputs/two.txt
a
a
Given this input, uniq will emit one line:
$ uniq tests/inputs/two.txt
a
With the -c option, uniq will also include the count of unique lines:
$ uniq -c tests/inputs/two.txt
   2 a
A longer input file shows that uniq only considers the lines in order and not globally.
For example, the value a appears four times in this input file:
$ cat tests/inputs/three.txt
a
a
b
b
a
c
c
c
a
d
d
d
d
When counting, uniq starts over at 1 each time it sees a new string.
Since a occurs in three different places in the input file, it will also appear three times in the output:
$ uniq -c tests/inputs/three.txt
   2 a
   2 b
   1 a
   3 c
   1 a
   4 d
If you want the actual unique values, you must first sort the input, which can be done with the aptly named sort command.
In the following output, you’ll finally see that a occurs a total of four times in the input file:
$ sort tests/inputs/three.txt | uniq -c
   4 a
   2 b
   3 c
   4 d
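The difference between counting adjacent runs and counting globally can be sketched in a few lines of Rust. This is only an illustration of the behavior shown above, working on an in-memory slice; the names count_runs and count_sorted are hypothetical, and the challenge program itself reads lines from a file instead:

```rust
/// Count runs of identical adjacent lines, as `uniq -c` does.
/// `count_runs` is a hypothetical helper for illustration.
fn count_runs(lines: &[&str]) -> Vec<(usize, String)> {
    let mut runs: Vec<(usize, String)> = Vec::new();
    for line in lines {
        // If this line matches the run in progress, extend that run.
        if let Some((count, text)) = runs.last_mut() {
            if text.as_str() == *line {
                *count += 1;
                continue;
            }
        }
        // Otherwise a new run starts over at 1.
        runs.push((1, line.to_string()));
    }
    runs
}

/// Sort first, then count, like `sort | uniq -c`.
fn count_sorted(lines: &[&str]) -> Vec<(usize, String)> {
    let mut sorted = lines.to_vec();
    sorted.sort();
    count_runs(&sorted)
}
```

Given the 13 lines of three.txt, count_runs reports a three separate times, while count_sorted reports it once with a count of 4.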
The file tests/inputs/skip.txt contains a blank line:
$ cat tests/inputs/skip.txt
a

a
b
The blank line acts just like any other value, and so it will reset the counter:
$ uniq -c tests/inputs/skip.txt
   1 a
   1 
   1 a
   1 b
If you study the Synopsis of the usage closely, you’ll see a very subtle indication of how to write the output to a file.
Notice how input_file and output_file in the following are grouped inside square brackets to indicate that they are optional as a pair.
That is, if you provide input_file, you may also optionally provide output_file:
uniq [-c | -d | -u] [-i] [-f num] [-s chars] [input_file [output_file]]
For example, I can count tests/inputs/two.txt and place the output into out:
$ uniq -c tests/inputs/two.txt out
$ cat out
   2 a
With no positional arguments, uniq will read from STDIN by default:
$ cat tests/inputs/two.txt | uniq -c
   2 a
If you want to read from STDIN and indicate the output filename, you must use a dash (-) for the input filename:
$ cat tests/inputs/two.txt | uniq -c - out
$ cat out
   2 a
The GNU version works basically the same while also providing many more options:
$ uniq --help
Usage: uniq [OPTION]... [INPUT [OUTPUT]]
Filter adjacent matching lines from INPUT (or standard input),
writing to OUTPUT (or standard output).

With no options, matching lines are merged to the first occurrence.

Mandatory arguments to long options are mandatory for short options too.
  -c, --count           prefix lines by the number of occurrences
  -d, --repeated        only print duplicate lines, one for each group
  -D, --all-repeated[=METHOD]  print all duplicate lines
                          groups can be delimited with an empty line
                          METHOD={none(default),prepend,separate}
  -f, --skip-fields=N   avoid comparing the first N fields
      --group[=METHOD]  show all items, separating groups with an empty line
                          METHOD={separate(default),prepend,append,both}
  -i, --ignore-case     ignore differences in case when comparing
  -s, --skip-chars=N    avoid comparing the first N characters
  -u, --unique          only print unique lines
  -z, --zero-terminated     end lines with 0 byte, not newline
  -w, --check-chars=N   compare no more than N characters in lines
      --help     display this help and exit
      --version  output version information and exit

A field is a run of blanks (usually spaces and/or TABs), then nonblank
characters.  Fields are skipped before chars.

Note: 'uniq' does not detect repeated lines unless they are adjacent.
You may want to sort the input first, or use 'sort -u' without 'uniq'.
Also, comparisons honor the rules specified by 'LC_COLLATE'.
As you can see, both the BSD and GNU versions have many more options, but this is as much as the challenge program is expected to implement.
This chapter’s challenge program should be called uniqr (pronounced you-neek-er) for a Rust version of uniq.
Start by running cargo new uniqr, then modify your Cargo.toml to add the following dependencies:
[dependencies]
clap = "2.33"

[dev-dependencies]
assert_cmd = "2"
predicates = "2"
tempfile = "3"
rand = "0.8"
The tests will create temporary files using the tempfile crate.
Copy the book’s 06_uniqr/tests directory into your project, and then run cargo test to ensure that the program compiles and the tests run and fail.
Update your src/main.rs to the following:
fn main() {
    if let Err(e) = uniqr::get_args().and_then(uniqr::run) {
        eprintln!("{}", e);
        std::process::exit(1);
    }
}
I suggest you start src/lib.rs with the following:
use clap::{App, Arg};
use std::error::Error;

type MyResult<T> = Result<T, Box<dyn Error>>;

#[derive(Debug)]
pub struct Config {
    // This is the input filename to read, which may be STDIN if the
    // filename is a dash.
    in_file: String,
    // The output will be written either to an optional output filename
    // or STDOUT.
    out_file: Option<String>,
    // count is a Boolean for whether or not to print the counts of each line.
    count: bool,
}
Here is an outline for get_args:
pub fn get_args() -> MyResult<Config> {
    let matches = App::new("uniqr")
        .version("0.1.0")
        .author("Ken Youens-Clark <kyclark@gmail.com>")
        .about("Rust uniq")
        // What goes here?
        .get_matches();

    Ok(Config {
        in_file: ...
        out_file: ...
        count: ...
    })
}
I suggest you start your run by printing the config:
pub fn run(config: Config) -> MyResult<()> {
    println!("{:?}", config);
    Ok(())
}
Your program should be able to produce the following usage:
$ cargo run -- -h
uniqr 0.1.0
Ken Youens-Clark <kyclark@gmail.com>
Rust uniq

USAGE:
    uniqr [FLAGS] [ARGS]

FLAGS:
    -c, --count      Show counts
    -h, --help       Prints help information
    -V, --version    Prints version information

ARGS:
    <IN_FILE>     Input file [default: -]
    <OUT_FILE>    Output file
The -c|--count flag is optional.
The input file is the first positional argument and defaults to a dash (-).
The output file is the second positional argument and is optional.
By default the program will read from STDIN, which can be represented using a dash:
$ cargo run
Config { in_file: "-", out_file: None, count: false }
The first positional argument should be interpreted as the input file and the second positional argument as the output file.1
Note that clap can handle options either before or after positional arguments:
$ cargo run -- tests/inputs/one.txt out --count
Config { in_file: "tests/inputs/one.txt", out_file: Some("out"), count: true }
Take a moment to finish get_args before reading further.
I assume you are an upright and moral person who figured out the preceding function on your own, so I will now share my solution:
pub fn get_args() -> MyResult<Config> {
    let matches = App::new("uniqr")
        .version("0.1.0")
        .author("Ken Youens-Clark <kyclark@gmail.com>")
        .about("Rust uniq")
        .arg(
            Arg::with_name("in_file")
                .value_name("IN_FILE")
                .help("Input file")
                .default_value("-"),
        )
        .arg(
            Arg::with_name("out_file")
                .value_name("OUT_FILE")
                .help("Output file"),
        )
        .arg(
            Arg::with_name("count")
                .short("c")
                .help("Show counts")
                .long("count")
                .takes_value(false),
        )
        .get_matches();

    Ok(Config {
        // Convert the in_file argument to a String.
        in_file: matches.value_of_lossy("in_file").unwrap().to_string(),
        // Convert the out_file argument to an Option<String>.
        out_file: matches.value_of("out_file").map(String::from),
        // The count is either present or not, so convert this to a bool.
        count: matches.is_present("count"),
    })
}
Because the in_file argument has a default value, it is safe to call Option::unwrap and convert the value to a String.
There are several other ways to get the same result, none of which is necessarily superior.
You could use Option::map to feed the value to String::from and then unwrap it:
in_file: matches.value_of_lossy("in_file").map(String::from).unwrap(),
You could also use a closure that calls Into::into to convert the value into a String because Rust can infer the type:
in_file: matches.value_of_lossy("in_file").map(|v| v.into()).unwrap(),
The preceding can also be expressed using the Into::into function directly because functions are first-class values that can be passed as arguments:
in_file: matches.value_of_lossy("in_file").map(Into::into).unwrap(),
The out_file is optional, but if a value is present, you can use Option::map to convert a Some value to a String:
out_file: matches.value_of("out_file").map(|v| v.to_string()),
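These conversions can be tried without clap. The following is a minimal sketch using only the standard library; the function names are hypothetical, and the Option<Cow<str>> and Option<&str> parameters are stand-ins I chose to resemble the values the lookups above hand back:

```rust
use std::borrow::Cow;

/// A required argument with a default is always Some, so unwrap is safe.
/// Passing String::from as a function converts the Cow<str> to a String.
fn required_arg(value: Option<Cow<'_, str>>) -> String {
    value.map(String::from).unwrap()
}

/// The same conversion using Into::into; the String return type tells
/// the compiler which target type to convert into.
fn required_arg_into(value: Option<Cow<'_, str>>) -> String {
    value.map(Into::into).unwrap()
}

/// An optional argument: None stays None, Some(&str) becomes Some(String).
fn optional_arg(value: Option<&str>) -> Option<String> {
    value.map(|v| v.to_string())
}
```

All three rely on Option::map applying the conversion only when a value is present, which is why the optional argument needs no explicit match.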
The test suite in tests/cli.rs is fairly large, containing 78 tests that check the program under the following conditions:
Input file as the only positional argument, check STDOUT
Input file as a positional argument with --count option, check STDOUT
Input from STDIN with no positional arguments, check STDOUT
Input from STDIN with --count and no positional arguments, check STDOUT
Input and output files as positional arguments, check output file
Input and output files as positional arguments with --count, check output file
Input from STDIN and output files as positional arguments with --count, check output file
Given how large and complicated the tests became, you may be interested to see how I structured tests/cli.rs, which starts with the following:
use assert_cmd::Command;
use predicates::prelude::*;
use rand::{distributions::Alphanumeric, Rng};
use std::fs;
// This is used to create temporary output files.
use tempfile::NamedTempFile;

type TestResult = Result<(), Box<dyn std::error::Error>>;

// A struct to define the input files and expected output values
// with and without the counts.
struct Test {
    input: &'static str,
    out: &'static str,
    out_count: &'static str,
}
Note the use of 'static to denote the lifetime of the values.
I want to define structs with &str values, and the Rust compiler would like to know exactly how long the values are expected to stick around relative to one another.
The 'static annotation shows that this data will live for the entire lifetime of the program.
If you remove it and run the tests, you’ll see errors from the compiler similar to those shown in the previous section, along with a suggestion of how to fix it:
error[E0106]: missing lifetime specifier
 --> tests/cli.rs:8:12
  |
8 |     input: &str,
  |            ^ expected named lifetime parameter
  |
help: consider introducing a named lifetime parameter
  |
7 | struct Test<'a> {
8 |     input: &'a str,
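To make the two options concrete, here is a small sketch contrasting a struct that demands 'static data with one that follows the compiler's suggestion of a named lifetime. The type and function names are hypothetical and only loosely modeled on the Test struct:

```rust
/// With 'static, every field must point at data that lives for the
/// whole program, such as a string literal.
struct StaticTest {
    input: &'static str,
    out: &'static str,
}

/// The compiler's suggested alternative: the named lifetime 'a ties the
/// struct to however long the borrowed strings actually live.
struct BorrowedTest<'a> {
    input: &'a str,
    out: &'a str,
}

/// String literals are 'static, so they can populate a constant.
const ONE: StaticTest = StaticTest {
    input: "tests/inputs/one.txt",
    out: "tests/inputs/one.txt.out",
};

/// Strings built at runtime are not 'static, so they can only be
/// borrowed by the struct that carries a lifetime parameter.
fn describe_runtime(dir: &str) -> String {
    let input = format!("{}/one.txt", dir);
    let out = format!("{}.out", input);
    let test = BorrowedTest { input: &input, out: &out };
    format!("{} -> {}", test.input, test.out)
}
```

For constants built entirely from string literals, as in the test suite, 'static is the simpler choice because no lifetime parameter has to be threaded through the code.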
Next, I define some constant values I need for testing:
// The name of the program being tested
const PRG: &str = "uniqr";

const EMPTY: Test = Test {
    // The location of the input file for this test
    input: "tests/inputs/empty.txt",
    // The location of the output file without the counts
    out: "tests/inputs/empty.txt.out",
    // The location of the output file with the counts
    out_count: "tests/inputs/empty.txt.c.out",
};
After the declaration of EMPTY, there are many more Test structures followed by several helper functions.
The run function will use Test.input as an input file and will compare STDOUT to the contents of the Test.out file:
// The function accepts a Test and returns a TestResult.
fn run(test: &Test) -> TestResult {
    // Try to read the expected output file.
    let expected = fs::read_to_string(test.out)?;
    // Try to run the program with the input file as an argument, verify
    // it ran successfully, and compare STDOUT to the expected value.
    Command::cargo_bin(PRG)?
        .arg(test.input)
        .assert()
        .success()
        .stdout(expected);
    Ok(())
}
The run_count helper function works very similarly, but this time it tests for the counting:
fn run_count(test: &Test) -> TestResult {
    // Read the Test.out_count file for the expected output.
    let expected = fs::read_to_string(test.out_count)?;
    // Pass both the Test.input value and the flag -c to count the lines.
    Command::cargo_bin(PRG)?
        .args(&[test.input, "-c"])
        .assert()
        .success()
        .stdout(expected);
    Ok(())
}
The run_stdin function will supply the input to the program through STDIN:
fn run_stdin(test: &Test) -> TestResult {
    // Try to read the Test.input file.
    let input = fs::read_to_string(test.input)?;
    // Try to read the Test.out file.
    let expected = fs::read_to_string(test.out)?;
    // Pass the input through STDIN and verify that STDOUT is the
    // expected value.
    Command::cargo_bin(PRG)?
        .write_stdin(input)
        .assert()
        .success()
        .stdout(expected);
    Ok(())
}
The run_stdin_count function tests both reading from STDIN and counting the lines:
fn run_stdin_count(test: &Test) -> TestResult {
    let input = fs::read_to_string(test.input)?;
    let expected = fs::read_to_string(test.out_count)?;
    // Run the program with the long --count flag, feed the input to
    // STDIN, and verify that STDOUT is correct.
    Command::cargo_bin(PRG)?
        .arg("--count")
        .write_stdin(input)
        .assert()
        .success()
        .stdout(expected);
    Ok(())
}
The run_outfile function checks that the program accepts both the input and output files as positional arguments.
This is somewhat more interesting as I needed to use temporary files in the testing because, as you have seen repeatedly, Rust will run the tests in parallel.
If I were to use the same dummy filename like blargh to write all the output files, the tests would overwrite one another’s output.
To get around this, I use the tempfile::NamedTempFile to get a dynamically generated temporary filename that will automatically be removed when I finish:
fn run_outfile(test: &Test) -> TestResult {
    let expected = fs::read_to_string(test.out)?;
    // Try to get a named temporary file.
    let outfile = NamedTempFile::new()?;
    // Get the path to the file.
    let outpath = &outfile.path().to_str().unwrap();

    // Run the program with the input and output filenames as arguments,
    // then verify there is nothing in STDOUT.
    Command::cargo_bin(PRG)?
        .args(&[test.input, outpath])
        .assert()
        .success()
        .stdout("");
    // Try to read the output file.
    let contents = fs::read_to_string(&outpath)?;
    // Check that the contents of the output file match the expected value.
    assert_eq!(&expected, &contents);

    Ok(())
}
The next two functions are variations on what I’ve already shown, adding in the --count flag and finally asking the program to read from STDIN when the input filename is a dash.
The rest of the module calls these helpers using the various structs to run all the tests.
I would suggest you start in src/lib.rs by reading the input file, so it makes sense to use the open function from previous chapters:
fn open(filename: &str) -> MyResult<Box<dyn BufRead>> {
    match filename {
        "-" => Ok(Box::new(BufReader::new(io::stdin()))),
        _ => Ok(Box::new(BufReader::new(File::open(filename)?))),
    }
}
Be sure you expand your imports to include the following:
use clap::{App, Arg};
use std::{
    error::Error,
    fs::File,
    io::{self, BufRead, BufReader},
};
You can borrow quite a bit of code from Chapter 3 that reads lines of text from an input file or STDIN while preserving the line endings:
pub fn run(config: Config) -> MyResult<()> {
    // Either read STDIN if the input file is a dash or open the given
    // filename. Create an informative error message when this fails.
    let mut file = open(&config.in_file)
        .map_err(|e| format!("{}: {}", config.in_file, e))?;
    // Create a new, empty mutable String buffer to hold each line.
    let mut line = String::new();
    // Create an infinite loop.
    loop {
        // Read a line of text while preserving the line endings.
        let bytes = file.read_line(&mut line)?;
        // If no bytes were read, break out of the loop.
        if bytes == 0 {
            break;
        }
        // Print the line buffer.
        print!("{}", line);
        // Clear the line buffer.
        line.clear();
    }
    Ok(())
}
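To see why read_line matters here, the following sketch contrasts it with the BufRead::lines iterator, which strips the line endings. The two function names are hypothetical; both accept any BufRead, so a byte slice can stand in for a file or STDIN:

```rust
use std::io::{self, BufRead};

/// Read every line with BufRead::read_line, which keeps the trailing
/// newline (if the input has one) on each line.
fn read_keeping_endings<R: BufRead>(mut reader: R) -> io::Result<Vec<String>> {
    let mut lines = Vec::new();
    let mut line = String::new();
    loop {
        // Zero bytes read means the end of the input.
        if reader.read_line(&mut line)? == 0 {
            break;
        }
        lines.push(line.clone());
        // read_line appends, so the buffer must be cleared each time.
        line.clear();
    }
    Ok(lines)
}

/// Read every line with BufRead::lines, which removes the line endings.
fn read_dropping_endings<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    reader.lines().collect()
}
```

Only the first version can tell an input that ends with a newline from one that does not, which the t1.txt through t6.txt test files are designed to exercise.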
Run your program with an input file to ensure it works:
$ cargo run -- tests/inputs/one.txt
a
It should also work for reading STDIN:
$ cargo run -- - < tests/inputs/one.txt
a
Next, make your program iterate the lines of input and count each unique run of lines, then print the lines with and without the counts.
Once you are able to create the correct output, you will need to handle printing it either to STDOUT or a given filename.
I suggest that you copy ideas from the open function and use File::create.
I’ll step you through how I arrived at a solution. Your version may be different, but it’s fine as long as it passes the test suite. I decided to create two additional mutable variables to hold the previous line of text and the running count. For now, I will always print the count to make sure it’s working correctly:
pub fn run(config: Config) -> MyResult<()> {
    let mut file = open(&config.in_file)
        .map_err(|e| format!("{}: {}", config.in_file, e))?;
    let mut line = String::new();
    // Create a mutable variable to hold the previous line of text.
    let mut previous = String::new();
    // Create a mutable variable to hold the count.
    let mut count: u64 = 0;

    loop {
        let bytes = file.read_line(&mut line)?;
        if bytes == 0 {
            break;
        }

        // Compare the current line to the previous line, both trimmed of
        // any possible trailing whitespace.
        if line.trim_end() != previous.trim_end() {
            // Print the output only when count is greater than 0.
            if count > 0 {
                // Print the count right-justified in a column four
                // characters wide followed by a space and the previous value.
                print!("{:>4} {}", count, previous);
            }
            // Set the previous variable to a copy of the current line.
            previous = line.clone();
            // Reset the counter to 0.
            count = 0;
        }

        // Increment the counter by 1.
        count += 1;
        line.clear();
    }

    // Handle the last line of the file.
    if count > 0 {
        print!("{:>4} {}", count, previous);
    }

    Ok(())
}
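The reason for trimming before comparing is easiest to see with a final line that lacks a newline. As a small sketch (same_line is a hypothetical helper, not part of the solution), the comparison can be isolated like this:

```rust
/// Compare two lines while ignoring trailing whitespace, so that "a\n"
/// and a final "a" with no newline count as the same text.
fn same_line(current: &str, previous: &str) -> bool {
    current.trim_end() == previous.trim_end()
}
```

One side effect worth knowing is that trim_end removes all trailing whitespace, not just the newline, so two lines that differ only in trailing spaces also compare as equal. That may not match the original uniq, which I would expect to compare the full line.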
I didn’t have to indicate the type u64 for the count variable. Rust will happily infer a type. With nothing else to constrain it, Rust falls back to i32 for an integer literal on every platform, which would limit the maximum number of duplicates to i32::MAX, or 2,147,483,647. That’s a big number that’s likely to be adequate, but I think it’s better to state the type explicitly and use u64, which has a far larger range and can never be negative.
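A quick way to check what the compiler infers is to ask for the size of the variable. This is only a sketch with hypothetical function names; it relies on Rust's documented fallback to i32 for integer literals that nothing else constrains:

```rust
/// Size in bytes of an integer variable with no type annotation.
/// With nothing else to constrain it, the literal falls back to i32.
fn inferred_size() -> usize {
    let count = 0;
    std::mem::size_of_val(&count)
}

/// Size in bytes of the explicitly annotated u64 used in the chapter.
fn annotated_size() -> usize {
    let count: u64 = 0;
    std::mem::size_of_val(&count)
}
```

The unannotated variable occupies 4 bytes and the u64 occupies 8, which is why the explicit annotation raises the ceiling on the count.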
If I run cargo test, this will pass a fair number of tests.
This code is clunky, though.
I don’t like having to check if count > 0 twice, as it violates the don’t repeat yourself (DRY) principle, where you isolate a common idea into a single abstraction like a function rather than copying and pasting the same lines of code throughout a program.
Also, my code always prints the count, but it should print the count only when config.count is true.
I can put all of this logic into a function, and I will specifically use a closure to close around the config.count value:
// The print closure will accept count and text values.
let print = |count: u64, text: &str| {
    // Print only if count is greater than 0.
    if count > 0 {
        // Check if the config.count value is true.
        if config.count {
            // Use the print! macro to print the count and text to STDOUT.
            print!("{:>4} {}", count, text);
        } else {
            // Otherwise, print the text to STDOUT.
            print!("{}", text);
        }
    };
};
I can update the rest of the function to use this closure:
loop {
    let bytes = file.read_line(&mut line)?;
    if bytes == 0 {
        break;
    }

    if line.trim_end() != previous.trim_end() {
        print(count, &previous);
        previous = line.clone();
        count = 0;
    }

    count += 1;
    line.clear();
}

print(count, &previous);
At this point, the program will pass several more tests.
All the failed test names have the string outfile because the program fails to write a named output file.
To add this last feature, you can open the output file in the same way as the input file, either by creating a named output file using File::create or by using std::io::stdout.
Be sure to add use std::io::Write for the following code, which you can place just after the file variable:
// The mutable out_file will be a boxed value that implements the
// std::io::Write trait.
let mut out_file: Box<dyn Write> = match &config.out_file {
    // When config.out_file is Some filename, use File::create to try to
    // create the file.
    Some(out_name) => Box::new(File::create(out_name)?),
    // Otherwise, use std::io::stdout.
    _ => Box::new(io::stdout()),
};
If you look at the documentation for File::create and io::stdout, you’ll see both have a “Traits” section showing the various traits they implement.
Both show that they implement Write, so they satisfy the type requirement Box<dyn Write>, which says that the value inside the Box must implement this trait.
The second change I need to make is to use out_file for the output.
I will replace the print! macro with write! to write the output to a stream like a filehandle or STDOUT.
The first argument to write! must be a mutable value that implements the Write trait.
The documentation shows that write! will return a std::io::Result because it might fail.
As such, I changed my print closure to return MyResult.
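Because write! only requires something that implements Write, the same code can target a file, STDOUT, or an in-memory buffer. The following sketch illustrates this with two hypothetical helpers, open_out and emit, which are not part of the chapter's solution:

```rust
use std::fs::File;
use std::io::{self, Write};

/// Choose the output: a named file when one is given, otherwise STDOUT.
/// Both concrete types are hidden behind the same Box<dyn Write>.
fn open_out(out_file: Option<&str>) -> io::Result<Box<dyn Write>> {
    let out: Box<dyn Write> = match out_file {
        Some(name) => Box::new(File::create(name)?),
        None => Box::new(io::stdout()),
    };
    Ok(out)
}

/// Write a counted line to anything that implements Write. The write!
/// macro returns an io::Result, which is passed back to the caller.
fn emit(out: &mut dyn Write, count: u64, text: &str) -> io::Result<()> {
    write!(out, "{:>4} {}", count, text)
}
```

Vec<u8> also implements Write, so a function like emit can be checked against an in-memory buffer without touching the filesystem.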
Here is the final version of my run function that passes all the tests:
pub fn run(config: Config) -> MyResult<()> {
    // Open either STDIN or the given input filename.
    let mut file = open(&config.in_file)
        .map_err(|e| format!("{}: {}", config.in_file, e))?;

    // Open either STDOUT or the given output filename.
    let mut out_file: Box<dyn Write> = match &config.out_file {
        Some(out_name) => Box::new(File::create(out_name)?),
        _ => Box::new(io::stdout()),
    };

    // Create a mutable print closure to format the output.
    let mut print = |count: u64, text: &str| -> MyResult<()> {
        if count > 0 {
            if config.count {
                write!(out_file, "{:>4} {}", count, text)?;
            } else {
                write!(out_file, "{}", text)?;
            }
        };
        Ok(())
    };

    let mut line = String::new();
    let mut previous = String::new();
    let mut count: u64 = 0;
    loop {
        let bytes = file.read_line(&mut line)?;
        if bytes == 0 {
            break;
        }

        if line.trim_end() != previous.trim_end() {
            // Use the print closure to possibly print output.
            // Use ? to propagate potential errors.
            print(count, &previous)?;
            previous = line.clone();
            count = 0;
        }

        count += 1;
        line.clear();
    }
    // Handle the last line of the file.
    print(count, &previous)?;

    Ok(())
}
Note that the print closure must be declared with the mut keyword to make it mutable because the out_file filehandle is mutably borrowed.
Without this, the compiler will show the following error:
error[E0596]: cannot borrow `print` as mutable, as it is not declared as mutable
  --> src/lib.rs:84:13
   |
63 |     let print = |count: u64, text: &str| -> MyResult<()> {
   |         ----- help: consider changing this to be mutable: `mut print`
...
66 |             write!(out_file, "{:>4} {}", count, text)?;
   |                    -------- calling `print` requires mutable binding
   |                             due to mutable borrow of `out_file`
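The same requirement shows up in any closure that writes to a captured value. Here is a self-contained sketch, assuming an in-memory Vec<u8> in place of the filehandle; the render function and its parameters are hypothetical and exist only to make the closure testable:

```rust
use std::io::{self, Write};

/// Format runs of (count, text) into a String using a closure that
/// captures both a flag and a mutable output buffer.
fn render(show_count: bool, runs: &[(u64, &str)]) -> io::Result<String> {
    let mut out: Vec<u8> = Vec::new();

    // The closure mutably borrows `out`, so the binding itself must be
    // declared `mut` before it can be called.
    let mut print = |count: u64, text: &str| -> io::Result<()> {
        if count > 0 {
            if show_count {
                write!(out, "{:>4} {}", count, text)?;
            } else {
                write!(out, "{}", text)?;
            }
        }
        Ok(())
    };

    for (count, text) in runs {
        print(*count, *text)?;
    }

    // The closure is no longer used, so `out` can be read again.
    Ok(String::from_utf8_lossy(&out).into_owned())
}
```

Removing mut from the print binding in this sketch should produce the same E0596 error, because every call to the closure needs mutable access to the captured buffer.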
Again, it’s okay if your solution is different from mine, as long as it passes the tests. Part of what I like about writing with tests is that there is an objective determination of when a program meets some level of specifications. As Louis Srygley once said, “Without requirements or design, programming is the art of adding bugs to an empty text file.”2 I would say that tests are the requirements made incarnate. Without tests, you simply have no way to know when a change to your program strays from the requirements or breaks the design.
Can you find other ways to write this algorithm?
For instance, I tried another method that read all the lines of the input file into a vector and used Vec::windows to look at pairs of lines.
This was interesting but could fail if the size of the input file exceeded the available memory on my machine.
The solution presented here will only ever allocate memory for the current and previous lines and so should scale to any size file.
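For comparison, a windows-based approach might look something like the following. This is my own sketch of the idea rather than the version described above, and count_runs_windows is a hypothetical name; it assumes all the lines are already in memory:

```rust
/// Count runs of identical adjacent lines by looking at overlapping
/// pairs of lines with windows(2).
fn count_runs_windows(lines: &[&str]) -> Vec<(usize, String)> {
    let mut runs = Vec::new();
    if lines.is_empty() {
        return runs;
    }

    let mut count = 1;
    for pair in lines.windows(2) {
        if pair[0] == pair[1] {
            // The run continues into the next line.
            count += 1;
        } else {
            // The run ends at pair[0]; a new one starts at pair[1].
            runs.push((count, pair[0].to_string()));
            count = 1;
        }
    }
    // The final run is never closed by a differing pair.
    runs.push((count, lines[lines.len() - 1].to_string()));
    runs
}
```

The pairwise view reads nicely, but the whole input has to be collected first, which is the memory cost mentioned above.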
As usual, the BSD and GNU versions of uniq both have many more features than I chose to include in the challenge.
I would encourage you to add all the features you would like to have in your version.
Be sure to add tests for each feature, and always run the entire test suite to verify that all previous features still work.
In my mind, uniq is closely tied with sort, as I often use them together.
Consider implementing your own version of sort, at least to the point of sorting values lexicographically (in dictionary order) or numerically.
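As a possible starting point, the two orderings differ only in the sort key. This sketch is built on assumptions of my own: the sort_lines name, the numeric flag, and the choice to treat lines that do not parse as the number 0 are all hypothetical design decisions, not behavior copied from the real sort:

```rust
/// Sort lines in place, either in dictionary order or by numeric value.
/// Lines that fail to parse as an integer are treated as 0, which is
/// an arbitrary choice made for this illustration.
fn sort_lines(lines: &mut [String], numeric: bool) {
    if numeric {
        lines.sort_by_key(|line| line.trim().parse::<i64>().unwrap_or(0));
    } else {
        lines.sort();
    }
}
```

Sorting the strings 10, 9, and 100 lexicographically puts 9 last because comparison goes character by character, while the numeric key puts it first.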
In about 100 lines of Rust, the uniqr program manages to replicate a reasonable subset of features from the original uniq program.
Compare this to the GNU C source code, which has more than 600 lines of code.
I would feel much more confident extending uniqr than I would using C due to the Rust compiler’s use of types and useful error messages.
Let’s review some of the things you learned in this chapter:
You can now open a new file for writing or print to STDOUT.
DRY says that any duplicated code should be moved into a single abstraction like a function or a closure.
A closure must be used to capture values from the enclosing scope.
When a value implements the Write trait, it can be used with the write! and writeln! macros.
The tempfile crate helps you create and remove temporary files.
The Rust compiler may sometimes require you to indicate the lifetime of a variable, which is how long it lives in relation to other variables.
In the next chapter, I’ll introduce Rust’s enumerated enum type and how to use regular expressions.
1 While the goal is to mimic the original versions as much as possible, I would note that I do not like optional positional parameters. In my opinion, it would be better to have an -o|--output option that defaults to STDOUT and have only one optional positional argument for the input file that defaults to STDIN.
2 Programming Wisdom (@CodeWisdom), “‘Without requirements or design, programming is the art of adding bugs to an empty text file.’ - Louis Srygley,” Twitter, January 24, 2018, 1:00 p.m., https://oreil.ly/FC6aS.