Adventures with Bash

Posted on by Matthias Noback

A bit of reminiscing

When I was a kid, MS Windows was still a program called WIN.COM which you needed to start from the MS-DOS command prompt. You could also add it to AUTOEXEC.BAT which was a so-called batch file. You could write these .BAT files yourself. They were basically just command-line scripts. You could make them execute commands, print things, collect input, and make simple decisions. It wasn't much, and I remember that you often needed some helper .COM or .EXE programs to accomplish anything useful. The most advanced thing I ever wrote was a nice little ASCII-art menu program, spread across multiple .BAT files (with GOTOs and all), which allowed me to easily start my favorite games, like Super Tetris, known to me as SUPERTET.EXE, or Prince of Persia.

From .BAT to .php to .sh

Several years later I learned a bit of PHP and immediately felt at home. PHP was a scripting language back then. Of course it still is, but it doesn't feel like one anymore. It shared (and still shares) some basic characteristics with scripting languages like the MS-DOS batch programming "language".

Several more years later, working on a Mac, I encountered something called "shell scripts": files with an .sh extension that you can run, if you have the right permissions. These scripts often start with #!/usr/bin/env bash, known as a shebang, which tells the operating system which interpreter should be used to run the script. By the way, if you run this env program without an argument, you get a list of all the environment variables that are available to you. This can be quite useful.
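To make both points a bit more concrete, here is a minimal sketch of a script that starts with the shebang line and then uses env to look up a single variable. The variable name GREETING_TARGET is made up for this illustration.

```shell
#!/usr/bin/env bash
# The first line asks env to find bash on the PATH and run this file with it.

# GREETING_TARGET is a hypothetical variable, used only for this illustration.
export GREETING_TARGET="world"

# Without arguments, env prints every environment variable as NAME=value.
# grep keeps only the line for the variable we just exported.
greeting_line="$(env | grep '^GREETING_TARGET=')"

echo "$greeting_line"
```

Saved under a name of your choice and made executable with chmod +x, the script can be started directly, without typing bash in front of it.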

Learning Bash

Until recently I hadn't felt the need to learn more about the Bash programming language. When I started researching Docker though, I encountered a lot of examples written in Bash. Most of those examples I didn't understand completely. Bash has a lot of crazy syntax, and people don't often put very informative comments in their scripts. I hate it when I don't understand what's going on in a piece of code that I use in a project, so one day I decided to dive into Bash and learn enough about it to let myself get away with what I don't know yet. I started reading the Bash Academy guide, but unfortunately it's an unfinished project. Next up was "Pro Bash Programming: Scripting the GNU/Linux Shell, Second Edition", by Jayant Varma and Chris F. A. Johnson. A very interesting book, which I never finished, but keep open as a reference. In terms of reference material, I sometimes find Google a useful source (which often leads me to Stack Overflow). This to me demonstrates that I don't really know what I'm doing, as I simply try out several of the answers I find. A better reference book is "Bash Pocket Reference, 2nd Edition" by Arnold Robbins.

About Bash

Bash is everywhere. It's pre-installed on Linux and Mac OS X, and since last year it's even possible to use Bash on Windows. Because there is no need to compile your code, your Bash script can already run on many machines. Two potential problems though:

  • There are differences between Bash versions (this isn't any different from code written in any other programming language by the way).
  • The power of a Bash script usually lies in the programs it runs. Not every runtime environment comes with the same programs installed (such as git or mktemp).

Both of these potentially problematic situations could make your script fail, or—maybe worse—behave in subtly different ways. With simple scripts it's certainly possible to navigate safely around these problems, but in most cases I recommend running a Bash script inside a known-stable environment, with pinned dependency versions, for example inside a carefully prepared Docker container.
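Short of a container, a script can at least check its own assumptions before doing any real work. The following is a minimal sketch, not a complete solution: the function names are made up for this illustration, and the minimum version and the list of programs are placeholders.

```shell
#!/usr/bin/env bash

# Succeeds when the running Bash is at least the given major version.
# BASH_VERSINFO is an array that Bash sets itself; element 0 is the major version.
require_bash_major() {
    local wanted="$1"
    (( BASH_VERSINFO[0] >= wanted ))
}

# Prints the name of every listed command that the shell cannot find.
# Note that command -v also accepts builtins and functions, not only programs.
# Returns 0 when nothing is missing and 1 otherwise.
missing_programs() {
    local program status=0
    for program in "$@"; do
        if ! command -v "$program" > /dev/null 2>&1; then
            echo "$program"
            status=1
        fi
    done
    return "$status"
}

if ! require_bash_major 3; then
    echo "This script needs Bash 3 or newer" >&2
fi

if ! missing="$(missing_programs mktemp rm)"; then
    echo "Missing programs: $missing" >&2
fi
```

This only tells you that a program exists, not that it behaves the same everywhere, which is why a pinned environment remains the safer choice.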

Bash script characteristics

As a programming language, Bash isn't a strictly typed language. With regard to the types of values, this most often results in pretty sloppy programming. Apparently, that's how it's supposed to be, but it might make you feel a bit insecure from time to time.

Another reason to feel insecure is the fact that functions have no predefined parameters. In fact, running a program, running a built-in command, or calling a function all have the same syntax, allowing a variable number of arguments (and options, if applicable). It's up to the function or program to verify that any required argument has been provided.
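A small sketch of what that verification can look like. The function greet and its usage message are hypothetical; the point is only that the check on the number of arguments has to be written by hand.

```shell
#!/usr/bin/env bash

# Greets the person named in the first argument.
# Bash will not complain if the argument is missing, so the function checks itself.
greet() {
    if [ "$#" -lt 1 ]; then
        echo "usage: greet NAME" >&2
        return 2
    fi
    local name="$1"
    echo "Hello, $name"
}

# Calling a function looks exactly like running a program: name, then arguments.
greet "Bash"
greet || echo "greet refused to run without a name"
```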

Just like programs produce output and exit codes, functions can have a numeric return value and optionally print something to stdout or stderr. This is a very different approach to functions than most programmers might expect, but it actually makes a lot of sense in the environment in which these scripts run.
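As an illustration, here is a minimal sketch of a function that uses all three channels: stdout for its result, stderr for an error message, and the return value to signal success or failure. The name count_lines is made up for this example.

```shell
#!/usr/bin/env bash

# Prints the number of lines in a file to stdout.
# Returns 0 on success, or 1 (with a message on stderr) if the file is not readable.
count_lines() {
    local file="$1"
    if [ ! -r "$file" ]; then
        echo "cannot read: $file" >&2
        return 1
    fi
    # Some versions of wc pad the number with spaces; tr removes them.
    wc -l < "$file" | tr -d ' '
}

sample_file="$(mktemp)"
printf 'one\ntwo\nthree\n' > "$sample_file"

# Command substitution captures stdout; the if statement looks at the return value.
if line_count="$(count_lines "$sample_file")"; then
    echo "lines: $line_count"
else
    echo "count_lines failed" >&2
fi

rm -f "$sample_file"
```

The caller treats the function just like any other program: it captures what was printed and branches on the exit status.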

Many Bash functions will have side-effects, like changing the current working directory, creating directories, copying files, exit-ing the process, etc. When designing these functions, I often feel like I'm doing something unnatural, sometimes even dirty.
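Some of these side effects can be contained. One common technique, shown here as a sketch with made-up function names, is to write the function body between parentheses instead of braces, so that it runs in a subshell and a cd inside it never reaches the caller.

```shell
#!/usr/bin/env bash

# Changes the working directory of the caller: a visible side effect.
enter_dir() {
    cd "$1" || return 1
}

# Uses parentheses instead of braces, so the body runs in a subshell.
# The cd inside does not affect the caller, and exit only ends the subshell.
list_dir() (
    cd "$1" || exit 1
    ls
)

start_dir="$(pwd)"
scratch_dir="$(mktemp -d)"
touch "$scratch_dir/example.txt"

list_dir "$scratch_dir"       # lists example.txt
echo "still in: $(pwd)"       # same directory as before the call

enter_dir "$scratch_dir"
echo "now in: $(pwd)"         # the scratch directory

cd "$start_dir" && rm -rf "$scratch_dir"
```

Side effects on the file system, such as created directories, are of course not undone by a subshell.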

A big reason for "feeling dirty" is that besides function arguments, functions in-the-wild often use environment variables to base their decisions on. "Environment variables" is another word for "global variables", which I've vowed to never use again in my programs. Still, I find myself writing Bash code like this:

export GIT_CLONE_URL="$(git remote get-url origin)"
export COMMIT_HASH="$(git rev-parse --short --verify HEAD)"

#...

PROJECT_DIR=$(pwd)

function fresh_checkout() {
    cd "$PROJECT_DIR"
    mkdir -p "$PROJECT_DIR/build"
    BUILD_DIR=$(mktemp -d "$PROJECT_DIR/build/$COMMIT_HASH-XXXXXXX")
    git clone "$GIT_CLONE_URL" "$BUILD_DIR"
    cd "$BUILD_DIR"
    git checkout "$COMMIT_HASH"
}

function clean_up() {
    rm -rf "$BUILD_DIR" || true
}

fresh_checkout

This does the trick, but it isn't particularly well-designed code.
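One possible direction is to pass values in as arguments and hand results back through stdout, so that functions stop depending on globals. The following is only a sketch under that assumption: the git steps are left out so that it runs anywhere, and the function names and the "abc1234" prefix are made up for this illustration.

```shell
#!/usr/bin/env bash

# Creates a unique build directory under the given project directory
# and prints its path, instead of storing it in a global variable.
create_build_dir() {
    local project_dir="$1"
    local name_prefix="$2"
    mkdir -p "$project_dir/build" || return 1
    mktemp -d "$project_dir/build/$name_prefix-XXXXXXX"
}

# Removes the directory passed as the first argument.
remove_build_dir() {
    local build_dir="$1"
    # Refuse to run with an empty argument, so rm -rf never sees "".
    [ -n "$build_dir" ] || return 1
    rm -rf "$build_dir"
}

project_dir="$(mktemp -d)"
build_dir="$(create_build_dir "$project_dir" "abc1234")"
echo "created: $build_dir"

remove_build_dir "$build_dir"
rm -rf "$project_dir"
```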

Determined to improve this awful situation, I set out to refactor the clean_up() function. Along the way I learned quite a lot of interesting things about Bash programming, which I'll explain to you in my next post.


Duck-typing in PHP

Posted on by Matthias Noback

For quite some time now the PHP community has been becoming more and more professional. "More professional" in part means that we use more types in our PHP code. Though it took years to introduce more or less decent types in the programming language itself, it took some more time to really appreciate the fact that by adding parameter and return types to our code, we can verify its correctness in better ways than we could before. And although all the type checks still happen at runtime, it feels as if those type checks already happen at compile time, because our editor validates most of our code before actually running it.

To make it perfectly clear: this is all very awesome. In fact, I hope that PHP will change to become more of a static language than a dynamic one. I can very well remember the times when we actually relied on PHP doing the type juggling for us, but I'm happy we've left that phase behind. I think that nowadays many PHP developers agree that silent type conversions are neither very useful nor safe.

But sometimes it's good to remember what's possible with PHP, due to it being a dynamic scripting language. I recently encountered a situation where I wanted to build a generic repository, which would be able to keep track of entities, allowing the user to store and retrieve them by their ID.

class SomeEntity
{
    // The identifier returned by id(); how it gets set is left out here.
    private $id;

    public function id()
    {
        return $this->id;
    }
}

class GenericRepository
{
    public function store($object)
    {
        $id = $object->id();
        ...
    }

    public function getById($id)
    {
        return ...;
    }
}

So, what are the types we should introduce in this scenario? $id might be a simple string, although these days identifier strings will often get wrapped in their own dedicated value object. Maybe we could enforce an interface for Id type of objects? But then people won't be able to use a simple string anymore. Do I want to force that upon them? The same goes for the objects that our repository is going to store. $object might be typed as an Entity interface (since an object with identity is basically what we call "entity"), which has a method id(), which returns an identifier:

interface Id
{
    public function __toString() : string;
}

interface Entity
{
    public function id() : Id;
}

Do we want to force the term Entity onto the user's code? Do we want to force users to implement the Id interface? What if there is no user we can force? What if the "entity" we want to store in our repository is defined in a third-party library?

It doesn't have to be that way. Hey, it's PHP! We only want the user to provide an object which we can use in the following way:

public function store($object) { 
    $id = $object->id();

    /*
     * $id should be a string, or usable as a string (i.e. it has a __toString() method)
     *
     * In fact, we might as well just cast it to a string to be sure:
     */

    $id = (string) $id;

    ...
}

The funny thing is, whatever value the user provides, we can already do this. As long as the method id() exists on the object and PHP can successfully cast its return value to a string, we're fine. As long as we don't define any type at all for the $object parameter, PHP will do no type checking and will just try to do whatever you ask it to do, and throw warnings/errors/exceptions whenever it fails.

The only problem, one that many of us including myself will find a very big problem: our IDE isn't able to help us anymore. It won't be able to verify that methods exist or that passed function argument types are correct. It won't let us click to class definitions, etc. In other words, we lose our ability to do a little bit of the type-checking before runtime.

How to fix this? By helping your IDE to figure it out. PhpStorm for example allows you to define @var or @param annotations to make intended types explicit.

public function store($object) {
    /** @var Entity $object */
    ...
}

// or (this might show some IDE warnings in the user's code):

/**
 * @param Entity $object
 */
public function store($object) {
    ...
}

So, even when $object doesn't actually implement Entity, it will still be treated by the IDE as if it does.

This, by the way, is known as duck typing. Type checks happen at runtime:

With normal typing, suitability is assumed to be determined by an object's type only. In duck typing, an object's suitability is determined by the presence of certain methods and properties (with appropriate meaning), rather than the actual type of the object.

Introducing the php-duck-typing library

The only problem of simply adding a type hint to a value like this is that PHP will simply crash at some point if the passed value doesn't meet our expectations. When we call store() with an object that doesn't really match with the Entity interface, we would like to give the user some insight into what might be wrong. We'd like to know what was wrong about the object we passed to store(), e.g.:

  • The object doesn't implement the Entity interface.
  • It does offer the method id().
  • id() doesn't return an object with a __toString() method though.

In other words: we need some proper validation!

Let me introduce you to my new, highly experimental open source library: php-duck-typing. It allows you to run checks like this:

public function store($object) {
    // this will throw an exception if the object is not usable as Entity:
    Object($object)->shouldBeUsableAs(Entity::class);

    ...
}

Just wanted to let you know that this exists. I had some fun exploring the options. Some open issues:

  • Could an object with a __toString() method be used as an actual string value?
  • What about defining other types which we can use as pseudo-types, e.g. arrays as traversables, arrays as maps, etc.?

I'd be interested to hear your thoughts about this.

For now, this library at least supports the use case I described in this article. I'm not sure if it has a real future, to be honest. Consider it an experiment.


Convincing developers to write tests

Posted on by Matthias Noback

Unbalanced test suites

Having spoken to many developers and development teams so far, I've recognized several patterns when it comes to software testing. For example:

  1. When the developers use a framework that encourages or sometimes even forces them to marry their code to the framework code, they only write functional tests - with a real database, making (pseudo) HTTP requests, etc. using the test tools bundled with the framework (if you're lucky, anyway). For these teams it's often too hard to write proper unit tests. It takes too much time, or is too difficult to set up.
  2. When the developers use a framework that enables or encourages them to write code that is decoupled from the framework, they have all these nice, often pretty abstract, units of code. Those units are easy to test. What often happens is that these teams end up writing only unit tests, and don't supply any tests or "executable specifications" proving the correctness of the behavior of the application at large.
  3. Almost nobody writes proper acceptance tests. That is, most tests that use a BDD framework still focus solely on the technical aspects (verifying URLs, HTML elements, rows in a database, etc.), while to be truly beneficial they should be concerned with application behavior, defined in a ubiquitous language that is used by both the technical staff and the project's stakeholders.

Please note that these are just some very rough conclusions; there's nothing scientific about them. It would be interesting to do some actual research though. And probably someone has already done this.

I think I've never really encountered what you would call a "well-balanced test suite", with a reasonable number of unit tests (mainly used to support development), integration tests (proving that your code integrates well with external dependencies including external services) and acceptance tests (proving that your application does what its stakeholders expect it to do). [Edit: in fact, former co-workers at Ibuildings - Reinier Kip and Scato Eggen - have created a project with a well-balanced test suite. It was a technically sound project, delivered within budget, well on time. If you see them, please ask them to write a blog post about their experiences.]

Psychological impediments

I have also seen many developers who still didn't write any (or a sufficient number of) tests.

When I was still working as CTO at Ibuildings I wrote several articles to explain to developers that you can't get away with not writing tests anymore. Though I think it's sad that a lot of software still gets released without any tests at all, I wrote my articles in a very kind manner so as not to offend anyone. I really don't like scaring people into testing, or pushing feelings of guilt on them. We all need to understand that starting to test should not be a matter of peer pressure. You need the insight that it will make your life and that of your team members easier, and you need to be internally motivated to do it. Also, you need some strong arguments for it, in case your environment (co-workers, managers, customers) tries to convince you to stop "wasting time" by writing tests.

If you recognize yourself or your team in any of the things I've said above, please use the following articles as a way to revive the discussion about testing inside your team or your company. Keep the testing spirit alive. And for now and ever, may this day be remembered as Testing Awareness Day! Just kidding.

1. Why write tests?

The world of "software testing" is quite a confusing one and it takes several years to understand what's going on and how to do things "right". In no particular order:

  • Developers use different words for different types of tests, but also for different types of test doubles.
  • Developers are looking to achieve widely varying goals by writing tests.
  • Developers have divergent views on the cost of writing and maintaining a test suite.
  • Developers don't agree on when to test software (before, during, after writing the code).

If you're a programmer it's your inherent responsibility to prove that the production software you deliver:

  • ... is functioning correctly (now and forever)
  • ... provides the functionality (exactly, and only) that was requested

I don't think anybody would disagree with this. So taking this as a given, I'd like to discuss the above points in this article and its sequel.

Continue reading Why write tests? »

2. Does writing tests makes you go faster or slower?

The price you pay for writing automated tests consists of the following parts (and probably more):

  • You need to become familiar with the test frameworks (test runners as well as tools for generating test doubles). This will initially take some time or make you slow.
  • If your code is not well designed already, or if you're not producing clean code yet, writing the tests will be a pain. This will slow you down and may even lead you to giving it up.
  • If you're not very familiar with writing tests, you'll likely end up with tests that are hard to maintain. This will slow you down during later stages of the project.

Continue reading Does writing tests makes you go faster or slower? »

3. When to write tests?

Now we need to answer one other important question that should guide you in your daily quest to improve the quality of the code you deliver: when should you write a test?

Continue reading When to write tests? »
