Adventures with Bash

Posted by Matthias Noback

A bit of reminiscing

When I was a kid, MS Windows was still a program called WIN.COM which you needed to start from the MS-DOS command prompt. You could also add it to AUTOEXEC.BAT which was a so-called batch file. You could write these .BAT files yourself. They were basically just command-line scripts. You could make them execute commands, print things, collect input, and make simple decisions. It wasn't much, and I remember that you often needed some helper .COM or .EXE programs to accomplish anything useful. The most advanced thing I ever wrote was a nice little ASCII-art menu program, spread across multiple .BAT files (with GOTOs and all), which allowed me to easily start my favorite games, like Super Tetris, known to me as SUPERTET.EXE, or Prince of Persia.

From .BAT to .php to .sh

Several years later I learned a bit of PHP and immediately felt at home. PHP was a scripting language back then. Of course it still is, but it doesn't feel like one anymore. It shared (and still shares) some basic characteristics with scripting languages like the MS-DOS batch programming "language".

Several more years later, working on a Mac, I encountered something called "shell scripts": files with an .sh extension that you can run if you have the right permissions. These scripts often start with #!/usr/bin/env bash, known as a shebang, which tells the operating system which interpreter should be used to run the script. By the way, if you run this env program without any arguments, you get a list of all the environment variables that are available to you. This can be quite useful.
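To make this concrete, here is a minimal sketch of such a script. The file name hello.sh and the messages are made up for illustration; only the shebang line and the env program come from the text above.

```bash
#!/usr/bin/env bash
# hello.sh (hypothetical file name): a minimal script using the env shebang.

# BASH_VERSION is set by Bash itself, so printing it confirms which interpreter ran the script.
echo "Hello from Bash ${BASH_VERSION}"

# Running env without arguments prints the environment as NAME=value lines.
# To keep the output short, this only counts the lines that contain an equals sign.
variable_count=$(env | grep -c '=')
echo "Environment variables available: ${variable_count}"
```

After saving the file, chmod +x hello.sh gives you the permission to run it as ./hello.sh.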

Learning Bash

Until recently I hadn't felt the need to learn more about the Bash programming language. When I started researching Docker though, I encountered a lot of examples written in Bash. Most of those examples I didn't understand completely. Bash has a lot of crazy syntax, and people don't often put very informative comments in their scripts. I hate it when I don't understand what's going on in a piece of code that I use in a project, so one day I decided to dive into Bash and learn enough about it to let myself get away with what I don't know yet. I started reading the Bash Academy guide, but unfortunately it's an unfinished project. Next up was "Pro Bash Programming: Scripting the GNU/Linux Shell, Second Edition", by Jayant Varma and Chris F. A. Johnson. A very interesting book, which I never finished, but keep open as a reference. In terms of reference material, I sometimes find Google a useful source (which often leads me to Stack Overflow). This to me demonstrates that I don't really know what I'm doing, as I simply try out several of the answers I find. A better reference is "Bash Pocket Reference, 2nd Edition" by Arnold Robbins.

About Bash

Bash is everywhere. It's pre-installed on most Linux distributions and on Mac OS X, and since last year it's even possible to use Bash on Windows. Because there's no need to compile your code, you can run a Bash script on many machines as it is. There are two potential problems though:

  • There are differences between Bash versions (this isn't any different from code written in any other programming language by the way).
  • The power of a Bash script usually lies in the programs it runs. Not every runtime environment comes with the same programs installed (like git, mktemp, etc.).
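Both concerns can be checked at the top of a script. The following is only a sketch: the minimum version (4) and the function name missing_programs are assumptions chosen for illustration, and the script warns instead of exiting so that you can try it anywhere.

```bash
#!/usr/bin/env bash

# Hypothetical minimum: this sketch assumes the script needs Bash 4 or newer.
required_major=4

# BASH_VERSINFO is an array maintained by Bash; element 0 is the major version.
if (( BASH_VERSINFO[0] < required_major )); then
    echo "Warning: Bash ${required_major}+ expected, found ${BASH_VERSION}" >&2
fi

# Prints the name of every given program that cannot be found.
# command -v succeeds for executables on the PATH, and also for built-ins and functions.
function missing_programs() {
    local program
    for program in "$@"; do
        if ! command -v "$program" > /dev/null 2>&1; then
            echo "$program"
        fi
    done
}

missing=$(missing_programs git mktemp)
if [ -n "$missing" ]; then
    # Replace the newlines between names with spaces to get a one-line message.
    echo "Missing required programs: ${missing//$'\n'/ }" >&2
fi
```

In a real script you would probably exit with a non-zero code instead of only printing a warning.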

Both of these potentially problematic situations could make your script fail, or—maybe worse—behave in subtly different ways. With simple scripts it's certainly possible to navigate safely around these problems, but in most cases I recommend running a Bash script inside a known-stable environment, with pinned dependency versions, for example inside a carefully prepared Docker container.

Bash script characteristics

As a programming language, Bash isn't a strictly typed language. With regard to the types of values, this most often results in pretty sloppy programming. Apparently, that's how it's supposed to be, but it might make you feel a bit insecure from time to time.
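A small sketch of what this looseness looks like in practice (the variable names are made up): a value that looks like a number is still just a string, unless you explicitly ask for arithmetic.

```bash
#!/usr/bin/env bash

count=5          # looks like a number, but is stored as the string "5"
count+=1         # += on a plain variable appends text
echo "$count"    # prints 51

total=5
total=$(( total + 1 ))   # arithmetic expansion treats the value as an integer
echo "$total"            # prints 6
```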

Another reason to feel insecure is the fact that functions have no predefined parameters. In fact, running a program, running a built-in command, or calling a function all have the same syntax, allowing a variable number of arguments (and options, if applicable). It's up to the function or program to verify that any required argument has been provided.
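As an illustration, here is a sketch of a function that verifies its own arguments. The function greet and its usage message are hypothetical; nothing in its definition tells the caller that it needs exactly one argument.

```bash
#!/usr/bin/env bash

# greet is a hypothetical function that requires one argument: a name.
function greet() {
    # $# holds the number of arguments that were actually passed.
    if [ "$#" -lt 1 ]; then
        echo "Usage: greet NAME" >&2
        return 1
    fi

    local name="$1"
    echo "Hello, ${name}!"
}

# Calling a function looks exactly like running a program.
greet "world"    # prints Hello, world!

# Without the argument, the function itself has to notice and complain.
greet 2> /dev/null || echo "greet failed: the required argument was missing"
```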

Just like programs produce output and exit codes, functions can have a numeric return value and optionally print something to stdout or stderr. This is a very different approach to functions than most programmers might expect, but it actually makes a lot of sense in the environment in which these scripts run.
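The sketch below shows the three channels side by side. The function describe_dir is made up for this example: it prints its result to stdout, its diagnostics to stderr, and signals success or failure through its return value.

```bash
#!/usr/bin/env bash

# describe_dir is a hypothetical function used only to demonstrate the mechanics.
function describe_dir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "Not a directory: ${dir}" >&2   # diagnostics go to stderr
        return 1                             # non-zero signals failure
    fi
    echo "Directory: ${dir}"                 # the "result" goes to stdout
    return 0
}

# Command substitution captures stdout; $? holds the return value.
description=$(describe_dir /)
status=$?
echo "status=${status}, output=${description}"   # prints status=0, output=Directory: /

# The return value can also drive an if statement directly.
if ! describe_dir /no/such/dir 2> /dev/null; then
    echo "describe_dir reported a failure"
fi
```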

Many Bash functions will have side-effects, like changing the current working directory, creating directories, copying files, exiting the process, etc. When designing these functions, I often feel like I'm doing something unnatural, sometimes even dirty.
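To see such a side-effect in action, consider this sketch. The function enter_root is hypothetical; the second half shows one possible way to contain the effect, by calling the function in a subshell.

```bash
#!/usr/bin/env bash

# enter_root is a hypothetical function whose only job is a side-effect.
function enter_root() {
    cd /
}

start_dir=$(pwd)

# Called directly, the function changes the working directory of the whole script.
enter_root
after_call=$(pwd)
echo "After a direct call: ${after_call}"

cd "$start_dir"

# Called inside a subshell (the parentheses), the change does not leak out.
( enter_root )
after_subshell=$(pwd)
echo "After a subshell call: ${after_subshell}"
```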

A big reason for "feeling dirty" is that besides function arguments, functions in the wild often base their decisions on environment variables. "Environment variables" is another word for "global variables", which I've vowed to never use again in my programs. Still, I find myself writing Bash code like this:

export GIT_CLONE_URL="$(git remote get-url origin)"
export COMMIT_HASH="$(git rev-parse --short --verify HEAD)"

#...

PROJECT_DIR=$(pwd)

function fresh_checkout() {
    cd "$PROJECT_DIR"
    mkdir -p "$PROJECT_DIR/build"
    BUILD_DIR=$(mktemp -d "$PROJECT_DIR/build/$COMMIT_HASH-XXXXXXX")
    git clone "$GIT_CLONE_URL" "$BUILD_DIR"
    cd "$BUILD_DIR"
    git checkout "$COMMIT_HASH"
}

function clean_up() {
    rm -rf "$BUILD_DIR" || true
}

fresh_checkout

This does the trick, but it isn't particularly well-designed code.

Determined to improve this awful situation, I set out to refactor the clean_up() function. Along the way I learned quite a lot of interesting things about Bash programming, which I'll explain to you in my next post.
