Bash practices - Part 2: CQS and return values

Posted on by Matthias Noback

As promised in my previous "Bash practices" post, I'll discuss a query function in this article. Here you have it:

function create_temporary_directory() {
    directory=$(mktemp -d "$1/XXXXX")
}

# Pass the base directory as the first argument (/tmp is just an example)
create_temporary_directory "/tmp"

echo "$directory"

That's a bad query function!

This function is supposed to return the path of a temporary directory. It accepts one argument: an existing directory in which we want to create the temporary directory. It uses the mktemp utility to do the "heavy" lifting. mktemp takes a template for the directory name as its argument (the trailing XXXXX will be replaced by 5 random characters). After creating the directory, mktemp echoes its full path to stdout, meaning we can capture it in a variable by using the $(...) syntax.
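To illustrate how that capture works, here is a minimal sketch that calls mktemp directly. The base directory (TMPDIR, falling back to /tmp) and the variable names are assumptions made for this example only:

```shell
# Assumption: use the system's temporary directory as the base directory.
base_dir="${TMPDIR:-/tmp}"

# mktemp -d creates the directory and prints its full path to stdout;
# $(...) captures that output in a variable.
created_dir=$(mktemp -d "$base_dir/XXXXX")

echo "Created: $created_dir"

# Remove the (still empty) example directory again.
rmdir "$created_dir"
```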

Two major issues with this function:

  1. This isn't a query function after all. It has observable side-effects (namely: a directory will be created). Since we don't know in advance the name of the directory that will be created, we need the return value, but that doesn't make it a query function.
  2. Equally bad is the fact that we don't actually return a value from the function. It just populates a global variable called 'directory'.

Splitting the query function

To fix the code design and at least make it adhere to the Command/query separation principle, we need to split the function into two new functions:

  1. A function that generates only the name for the temporary directory.
  2. A function that creates the directory.

This is okay, just not optimal: the function that generates the name for the directory will be a query function, as it won't change observable state, but it won't be referentially transparent: the result will be different (almost) every time we call it.

Generating a random string can be done in all sorts of ways. For simplicity's sake, let's use Bash's built-in $RANDOM variable, which expands to a different pseudo-random integer between 0 and 32767 every time we reference it. Just because we can, we convert the decimal number we get from $RANDOM into a hexadecimal number (also because it looks nicer, with some letters in it). Using the -v option of the built-in printf command, we can store the result in a variable called RANDOM_NAME instead of printing it. Once the function has been called, this variable will be globally available:

function random_name() {
    printf -v RANDOM_NAME "%x" $RANDOM
}
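To see what printf -v does in isolation, consider this small sketch. The variable names fixed_hex and random_hex are made up for the example:

```shell
# printf -v stores the formatted result in a variable instead of printing it.
printf -v fixed_hex "%x" 255
echo "$fixed_hex"    # prints ff

# $RANDOM is at most 32767 (hex 7fff), so the result has one to four hex digits.
printf -v random_hex "%x" "$RANDOM"
echo "$random_hex"
```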

In the create_temporary_directory function we first call random_name, after which we can use the RANDOM_NAME variable as the last part of the directory path. Note that we don't use mktemp anymore, since the CQS violation originated from this exact command:

function create_temporary_directory() {
    declare -r base_dir="$1"

    random_name

    mkdir -p "$base_dir/$RANDOM_NAME"
}

By the way, just like we did in the previous article, you might want to verify that $1 is a non-empty string and possibly that it's a directory (even though mkdir with the -p option will try to create it for you if it doesn't exist yet).
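A minimal sketch of such a check could look like the following. The helper name require_directory and its error messages are hypothetical; they are not taken from the previous article:

```shell
# Hypothetical helper: succeeds only if its argument is an existing directory.
function require_directory() {
    declare -r candidate="${1:-}"

    if [[ -z "$candidate" ]]; then
        echo "Error: no base directory given" >&2
        return 1
    fi

    if [[ ! -d "$candidate" ]]; then
        echo "Error: '$candidate' is not a directory" >&2
        return 1
    fi
}

# Possible usage as the first line of create_temporary_directory:
#     require_directory "$1" || return 1
```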

HOWTO: Return values

The problem with relying on global variables like RANDOM_NAME is that it introduces temporal coupling in your code base: callers have to know which function to call first before the variable holds a meaningful value. In the long run, as you probably already know, this becomes an unmaintainable mess. So we need a better way to return values.

Bash does have a return statement, but you can only use it to return integers between 0 and 255 (a function's return value is the equivalent of a process's exit code).
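The following sketch shows how the return statement behaves. The function check_answer and the status code 3 are invented for the illustration:

```shell
# Hypothetical function that reports success or failure through its exit status.
function check_answer() {
    if [[ "$1" == "42" ]]; then
        return 0    # 0 means success
    fi

    return 3        # any value from 1 to 255 signals a failure
}

# The status of the last command is available in $?.
status=0
check_answer "41" || status=$?
echo "Exit status: $status"    # prints "Exit status: 3"
```

This works well for signalling success or failure, but there is no way to hand a string such as a directory name back to the caller like this.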

This leaves us with two other options. First, we can make random_name print the random name. Then create_temporary_directory can catch it and put it in a variable:

function random_name() {
    printf "%x" $RANDOM
}

function create_temporary_directory() {
    declare -r base_dir="$1"
    declare -r name=$(random_name)

    mkdir -p "$base_dir/$name"
}

In my opinion, this is the most elegant solution, as it mimics return values as you may be used to in most other programming languages. However, using $(...) makes Bash fork a subshell (a new process) to run the function, so it is a relatively slow operation.
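Besides the cost, running in a separate process has another consequence: whatever the function changes in the shell's state is lost once the substitution finishes. A small sketch, using a hypothetical next_id function and counter variable:

```shell
counter=0

# Hypothetical function that both modifies a variable and prints a value.
function next_id() {
    counter=$((counter + 1))
    printf "%d" "$counter"
}

first=$(next_id)     # runs in a subshell: prints 1, but the increment is lost
second=$(next_id)    # so this prints 1 again

echo "$first $second $counter"    # prints "1 1 0"
```

For a pure query function like random_name this is harmless, since it isn't supposed to change any state anyway.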

A faster alternative would be to instruct random_name to put the return value in a variable with a name that's selected by the calling function, in this case create_temporary_directory. To do so, we use declare again, this time with the -n option (available since Bash 4.3), which turns a variable into a reference to another variable. The caller then needs to provide an extra argument when calling random_name: the name of the variable where random_name should store its return value:

function random_name() {
    declare -n _return_value="$1"

    printf -v _return_value "%x" $RANDOM
}

function create_temporary_directory() {
    declare -r base_dir="$1"
    declare name

    random_name "name"

    mkdir -p "$base_dir/$name"
}

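This brings us a nice bit of encapsulation too: name and _return_value are local to the functions in which they were declared, so neither of them leaks into the global scope. (Bash uses dynamic scoping, so a local variable is also visible to the functions called from the declaring function; that is exactly what allows random_name to write to name.)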
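The scoping can be checked with a short sketch. It reuses random_name from above; show_scoping is a hypothetical caller added only for this demonstration, and the example assumes Bash 4.3 or newer:

```shell
function random_name() {
    declare -n _return_value="$1"    # reference to the caller's variable

    printf -v _return_value "%x" "$RANDOM"
}

# Hypothetical caller, used only to demonstrate the scoping.
function show_scoping() {
    declare name

    random_name "name"

    echo "inside the caller: $name"
}

show_scoping

# Neither variable exists in the global scope afterwards.
echo "global scope: ${name:-<not set>}"    # prints "global scope: <not set>"
```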

Quoting variables

Just a quick note on the need to quote variables. Say you write mkdir -p $base_dir, without the quotes. Everything will be fine, unless the base_dir variable contains a string with a space character in it:

base_dir="build/some build/test"
mkdir -p $base_dir

Bash splits the string and interprets the mkdir command as follows:

mkdir -p "build/some" "build/test"

And this creates both the build/some and build/test directories, which is not what we expected. To make sure that Bash treats the value of base_dir as a single argument, simply quote it: mkdir -p "$base_dir".
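You can observe the word splitting without creating any directories. This sketch uses a hypothetical count_arguments function and assumes the default value of IFS:

```shell
# Hypothetical helper that prints the number of arguments it received.
function count_arguments() {
    echo "$#"
}

base_dir="build/some build/test"

count_arguments $base_dir      # unquoted: prints 2
count_arguments "$base_dir"    # quoted: prints 1
```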

Conclusion

I hope you've learned a few things about Bash and Bash functions. When I'm a bit further down the road, I'll make sure to share more findings here.

Comments
Bruce Weirdan

And how would you *use* those two functions? create* does not return the dir name, and random_name returns a different name on each call. So the caller doesn't have a way to actually get the name of the dir that was created, and cannot use it.

That's probably a bad example to get your point across. Unfortunately, you've taken a pretty reasonable (apart from the global variable part and the missing trap) *resource acquisition* function and made it a bad resource acquisition function, vulnerable to race conditions. And, as you noted yourself, you didn't get referential transparency either.

mktemp is there for a reason; you cannot replicate it that easily. It guarantees that the directory was created (and did not merely exist before), thus preventing race conditions. It also sets permissions appropriate for a temporary dir/file.

Matthias Noback

Thanks for some interesting insights. I was hoping to explain some things I learned about Bash programming, but I totally understand that I overlooked some big issues here. Besides that, it was a plain mistake to let create_temporary_directory ask for a random name itself.