Book review: Modernizing Legacy Applications in PHP

Legacy code

I’m happy that I’ve discovered the work of P.M. Jones recently. His and mine interests seem to align at several interesting points. Though I don’t personally enjoy “putting .308 holes in targets at 400 yards” (as quoted by Phil Sturgeon), I do care a great deal about package coupling (and cohesion for that matter) and I’m also lightly inflammable when it comes to the use of service locators. It also appears that Paul, just like myself and many others in this business, has felt the pain of working on a legacy PHP application, trying to add features to it, or change existing behavior. This is a particularly hard thing to do and because so many incompetent developers have been creating PHP applications since the dawn of PHP, chances are that a big part of the job of competent PHP developers nowadays consists of maintaining these dumpsites of include statements and superglobals.

To be completely honest: I myself have been there. Recently I’ve come to the conclusion that this is not bad at all. We all have to start at some point, and the logical sequence of concepts by which you learn the PHP programming language is this: first you learn about statements (echo something, write an if statement), then you learn about include files and you use them as functions, then you learn about functions, but you put them all in one big file and they only have coincidental cohesion (you also use arguments by reference). But then you learn about classes and you put your functions in them. You learn to reduce global state by introducing class variables. Finally, you learn how to modularize things, by combining classes into packages inside your project (this does not automatically mean you are going to open source them, which involves one extra learning step).

In my new book (Principles of PHP Package Design) I discuss a bad-ass “legacy class” I wrote myself once, called Page. It did so many things, it was ridiculous. But,

Although I would never write such code today, when I look at the “Page“ class now, I don’t feel ashamed about what I did back then. It’s clear to me that I wasn’t so much struggling to get things *working* (because everything just worked, I always did a good job in that respect). Rather, I was struggling to get things *organized*.

Modernizing Legacy Applications

This is what Paul Jones knows too: you need not be mad at your predecessors, you need not throw their code away and start all over, you only need to re-organize things. Paul calls this “modernizing” the application and he wrote this book Modernizing Legacy Applications in PHP about it. It’s a good name, because “modernizing” here means getting an application up to todays standards. It’s very well possible that in just a couple of years, “modernizing” would mean something else entirely.

To me the book was a real page-turner actually. I read it in just a couple of days. It was of great interest to me. Some time ago I have refactored a large legacy application and replaced uses of mysql_* functions by PDO (wrapped by Doctrine DBAL), Smarty by Twig, my own crappy Page class by the Symfony Routing and HttpFoundation component. Well, this was quite a lot of work (I wrote some things about it here and here). The big mistake I made was to try and do all these things at once! Looking back it was a really dangerous operation. My salvation was my own knowledge of the application and the business domain. I couldn’t have verified the correctness of my ruthless refactoring otherwise.

Paul warns us for these kinds of endeavors: he has written this book to show us step by step how we can get what we want: a class-based application, with one front controller, no page scripts, no superglobals, no instantiation, no functions, no source code in the document root, a clear separation of concerns (in this case model, view and controller layers), etc. He takes us by the hand and shows us all the little steps toward this idealistic goal. And who would have guessed, looking at the initial spaghetti mess: this goal is entirely reachable!

I did not often have such an emotional response to programming books like I had to this one. The steps that are explained are so simple, yet you instantly feel that they are in the direction of the light. This feeling of hope that you will be saved is very encouraging. After reading the last pages I actually felt happy that things had turned out so well for this ugly little legacy application. You could call this catharsis: reading this book allowed to wash away my negative feelings with regard to legacy applications and the way I failed to turn them into something better in the past.

No need for a framework

The first thing that I’ve taken from this book is: you don’t need all those fancy framework things to improve a legacy application. First, you have to reorganize the code in the application. Once you have decoupled things, introduced controllers, routing and dependency injection, it will be much easier to bring in some cool (framework) libraries. But before that, you will be fine using your own classes for those purposes. After all, in essence you don’t need as much as frameworks tend to offer. When you fix your own problems, you will also learn what routing, templating, controllers and a front controller are actually for, instead of blindly using them as they are provided by your framework vendor.

Some small points of disagreement

Before I continue pointing out some things I would have done differently, please remember: this is a great book, and the things I’m going to mention now are really small issues, which are sometimes even a matter of personal taste and can thus be easily fixed, without being in conflict with the rest of the book.

To me it was a bit strange to see Request and Response objects being injected in controllers as constructor arguments. Request data should be seen as input arguments, response data should be considered return values of the constructor. So it does not make sense to provide them as constructor arguments, instead request data should be provided as an argument of the actual controller method (in this case __invoke()) and response data should simply be returned by the controller method. Creating a new Response object could optionally be left to a ResponseFactory object, which could be injected as a constructor argument of the controller.

Also, when near the end of the book a simple dependency injection container is being introduced, services get the fully qualified class name as their name. I would not choose this convention, since when the class name changes, the service name stays the same, which would be awkward. Funny enough, yesterday Paul himself tweeted a question about this exact same subject.

One last thing I didn’t like that much were some code examples in which arguments were passed by reference. It was like this:

class UserValidator
{
    public function validate($user, array &$messages, &isValid)
    {
        if (...) {
            $isValid = false;
            $messages[] = 'A user is never valid';
        } else {
            $isValid = true;
        }
    }
}

Using reference arguments does make sense when it’s the first step in the refactoring process, but the final step of converting these reference arguments into complex return values was missing in the book. I would suggest something like this:

class ValidationResult
{
    private $isValid = false;
    private $messages = array();

    public function valid()
    {
        $this->isValid = true;
    }

    public function invalid()
    {
        $this->isValid = false;
    }

    public function addMessage($message)
    {
        $this->messages[] = $message;
    }

    public function getMessages()
    {
        return $this->messages;
    }
}

class UserValidator
{
    public function validate($user)
    {
        $result = new ValidationResult();

        if (...) {
            $result->invalid();
            $result->addMessage('A user is never valid');
        } else {
            $result->valid();
        }

        return $result;
    }
}

Finally, at some point we are encouraged to convert all uses of echo and print to string return values. A good thing of course! But legacy code is famous for its complicated mixing of PHP and HTML, like this:

function render_users($users) {
    ?>
    <table>
    <?php
    foreach ($users as $user) {
        ?>
        <tr>
            <td><?php echo $user->getName(); ?></td>
            <td><?php echo $user->isActive() ? 'Yes' : 'No'; ?></td>
        </tr>
        <?php
    }
    ?>
    </table>
    <?php
}

In these situations it’s really difficult to convert all the foreach loops and echo statements into building up a string and returning that string, without introducing errors. In such cases I would recommend to leave everything as it is and instead use output buffering:

function render_users($users) {
    ob_start();
    ?>
    <table>
    ...
    </table>
    <?php

    $html = ob_get_contents();
    ob_end_clean();

    return $html;
}

This technique could be nice to quickly demonstrate in the book.

Conclusion

All in all “Modernizing Legacy Applications in PHP” is a really great book, I highly recommend it to you. So even though I found the price quite high (39 dollars), I think it’s definitely worth it. Next time you face a legacy application, you will be happy to get to work and turn it into a beautiful modern application, which still does what it did before, but is much better organized and can easily be modified.

Posted in Book review, PHP | Tagged , , | 6 Comments


Principles of PHP Package Design – First part of the book is now available

Seven months, two presentations and three blog posts later, I’ve published the first installment of my new book, Principles of PHP Package Design.

From the introduction of the book:

Naturally the biggest part of this book covers package design principles. But before we come to that, we first take a close look at what constitutes packages: classes and interfaces. The way you design them has great consequences for the characteristics of the package in which they will eventually reside. So before considering package design principles themselves, we first need to take a look at the principles that govern class design. These are the so-called SOLID principles. Each letter of this acronym stands for a different principle, each of which we will briefly (re)visit in the first part of this book.

In the second part of the book (the main part) I explain all about the six major principles of package design. The first three are about cohesion. While class cohesion is about which methods belong inside a class, package cohesion is about which classes belong inside a package. The package cohesion principles tell you which classes should be put together in a package, when to split packages, and if a combination of classes may be considered a “package” in the first place.

The second three package design principles are about coupling. Coupling is important at the class level, since most classes are pretty useless on their own. They need other classes to collaborate with. Class design principles like the dependency inversion principle help you write nicely decoupled classes. But when the dependencies of a class lie outside its own package, you need a way to determine if it’s safe to couple the package to another package. The package coupling principles will help you choose the right dependencies. They will also help you recognize and prevent wrong directions in the dependency graph of your packages.

The last part of the book combines the acquired knowledge from the first two more theoretical parts. We create a couple of packages around the Gearman PHP extension, which will fix some of the shortcomings of its API and add some useful functionality on top of it. We pay great attention to which code we put in which package. And we are also very careful about the dependencies we choose for these packages.

When you buy the book now, you can already read the first part of the book (about 95 pages). Over the next months I will gradually release new chapters, covering in great detail the “Principles of Package Design”, tailored to the world of PHP. The biggest part of the text has been written already, but still needs to be reviewed by myself and a couple of proof-readers.

The estimated total number of pages will be between 250 and 300 pages. I guess it will take a couple of months before I’m done, primarily depending on how family life evolves (since as you may have heard, our little girl was born just a couple of weeks ago).

Before I get back to work, let me take the time to thank some people who have helped me get here: Luis Cordova, Matthijs van Rietschoten and Mark Badolato for proof-reading the book so far, and Robert Martin for his encouragements and being a great source of inspiration.

Posted in Book, PHP | Tagged , , | 6 Comments


There’s no such thing as an optional dependency

On several occasions I have tried to explain my opinion about “optional dependencies” (also known as “suggested dependencies” or “dev requirements”) and I’m doing it again:

There’s no such thing as an optional dependency.

I’m talking about PHP packages here and specifically those defined by a composer.json file.

What is a dependency?

First let’s make sure we all agree about what a dependency is. Take a look at the following piece of code:

namespace Gaufrette\Adapter;

use Gaufrette\Adapter;
use \MongoGridFS;

class GridFS implements Adapter
{
    private $gridFS;

    public function __construct(MongoGridFS $gridFS)
    {
        $this->gridFS = $gridFS;
    }

    public function read($key)
    {
        $file = $this->find($key);

        return ($file) ? $file->getBytes() : false;
    }
}

This GridFS class is part of the Gaufrette filesystem abstraction library, though I heavily modified it.

To determine all the dependencies of this code we can ask the following question:

What is needed to run this code?

You need to think of several things:

  1. Which PHP version is needed to run the code without getting a syntax error? Maybe you even need a specific patch version (like 5.3.6) because of a bug in older 5.3 versions that could interfere with your code.
  2. Which PHP extensions should be installed?
  3. Which PEAR libraries should be installed?
  4. Which other packages should be installed?

In the case of the GridFS class the PHP version should be at least PHP 5.3, because of the use of namespace. Also the \MongoGridFS class should be available. This class is part of the mongo PECL extension for PHP. The \MongoGridFS class is only available since version 0.9.0 of that PHP extension, so we have to make sure that we explicitly mention this version constraint. Finally, it appears there are no other packages needed to be able to use the GridFS class. So when we would create a composer.json file for a package that contains the GridFS file, it would look like this:

{
    ...,
    "require": {
        "php": ">=5.3",
        "ext-mongo": ">=0.9.0"
    }
    ..
}

Now this is an exhaustive list of the dependencies of package that contains the GridFS class: when these dependencies are installed, nothing stands in the way of using this class in your application.

The actual list of dependencies of knplabs/gaufrette

As I already mentioned the GridFS class is part of the Gaufrette library which provides a filesystem abstraction layer so you can store files on different types of filesystems without worrying about the details of those filesystems. Let’s take a look at the composer.json file of this library:

{
    "name": "knplabs/gaufrette",
    "require": {
        "php": ">=5.3.2"
    },
    "require-dev": {
        ...
    },
    "suggest": {
        ...
        "amazonwebservices/aws-sdk-for-php": "to use the legacy Amazon S3 adapters",
        "phpseclib/phpseclib": "to use the SFTP",
        "doctrine/dbal": "to use the Doctrine DBAL adapter",
        "microsoft/windowsazure": "to use Microsoft Azure Blob Storage adapter",
        "ext-zip": "to use the Zip adapter",
        "ext-apc": "to use the APC adapter",
        "ext-curl": "*",
        "ext-mbstring": "*",
        "ext-mongo": "*",
        "ext-fileinfo": "*"
    },
    ...
}

After what we’ve discussed above, this is quite a surprise: the library says it has only one actual dependency: a PHP version that is at least 5.3.2. Everything else is either a “dev” requirement or a “suggested” requirement.

Of course people who use Composer and Packagist for some time now (like myself) have become quite used to this way of advertising the dependencies of a package. But it is just wrong. As we concluded earlier, ext-mongo is a true dependency of the GridFS class, yet looking at the composer.json file it is only a suggested dependency.

This means that if I want to use the class in my project, it is not sufficient to require just the knplabs/gaufrette package. I also have to add ext-mongo as a requirement to my own project. Which is semantically wrong: it is not my project that needs the mongo extension, it is the knplabs/gaufrette package that actually needs it. Besides, how do I know which version of ext-mongo I have to choose? Dependencies listed under the suggest key in composer.json don’t come with version constraints, so I have to figure them out myself.

Not just this package

knplabs/gaufrette is not the only package out there that advertises actual, required dependencies as “suggested” dependencies. It is a convenient way for package maintainers to put a lot of different classes in a package that may or may not be needed by users. Since using those classes is optional, their dependencies are made optional too. But package maintainers forget that dependencies never are optional. They are always required, since the code would not be executable without them.

The solution

What package maintainers should do is split their packages. In the case of knplabs/gaufrette this means there should be a knplabs/gaufrette package containing all the generic code for filesystem abstraction. Then each specific adapter, like the GridFS class, should live in its own package, e.g. knplabs/gaufrette-mongo-gridfs. This package itself has the following dependencies:

{
    ...,
    "require": {
        "php": ">=5.3",
        "knplabs/gaufrette": "~0.1"
        "ext-mongo": ">=0.9.0"
    }
}

No hidden dependencies there: everything is truly required for using the code in this package.

On the other hand the knplabs/gaufrette package has almost no dependencies anymore, and the “adapter” packages are listed under the suggested key:

{
    "require": {
        "php": ">=5.3.2"
    },
    "suggested": {
        "knplabs/gaufrette-mongo-gridfs": "For storing files using Mongo GridFS",
        ...
    }
}

This approach has many advantages:

  1. The main package will be very stable. There are almost no reasons for it to change anymore, since all the moving parts are inside the “adapter” packages.
  2. The adapter packages can have different specialists as maintainers, for instance the knplabs/gaufrette-mongo-gridfs can be maintained by someone who knows all about MongoDB.
  3. Users don’t have to keep track of available updates for parts of the library they don’t use.
  4. Users don’t have to manually add extra dependencies to their projects (which means they don’t have to worry about version constraints for them).

So keep in mind, next time you are tempted to add a suggested dependency to your package: is it an actual dependency of (part of) the code in this package? Then split the package and reinstate that dependency as a true requirement. If all the code in the package works perfectly well without that suggested dependency, then you are indeed allowed to advertise it as a suggested dependency.

Want to know more?

bookpage-principles-of-php-package-design

I’m working on a book about package design principles, based on the work of Robert Martin. You may register yourself as an “interested reader” and receive a considerable discount when the first part of the book becomes available next week.

You may also want to read some of the articles about package coupling by Paul Jones (Frameworks are good, components are awesome!, Symfony components: sometimes decoupled, sometimes not). He maintains the Aura framework and components and does a great job when it comes to package coupling.

Posted in Book, PHP | Tagged , , | 14 Comments


Test Symfony2 commands using the Process component and asynchronous assertions

A couple of months ago I wrote about console commands that create a PID file containing the process id (PID) of the PHP process that runs the script. It is very usual to create such a PID file when the process forks itself and thereby creates a daemon which could in theory run forever in the background, with no access to terminal input or output. The process that started the command can take the PID from the PID file and use it to determine whether or not the daemon is still running.

Below you find some sample code of what a daemon console command would look like. It forks itself using pcntl_fork(), a fascinating function which has different return values at the same time (but in different processes!).

use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

class DaemonCommand extends Command
{
    protected function configure()
    {
        $this->setName('daemon');
    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $pid = pcntl_fork();

        if ($pid === -1) {
            throw new \RuntimeException('Could not fork the process');
        } elseif ($pid > 0) {
            // we are the parent process
            $output->writeln('Daemon created with process ID ' . $pid);
        } else {
            file_put_contents(getcwd() . '/daemon.pid', posix_getpid());
            // do something in the background
            sleep(100);
        }
    }
}

Usually you would test a console command using the ConsoleTester class as described in the official Symfony documentation. But we can not use it in this case. We need to isolate the process in order for pcntl_fork() to work correctly, otherwise the PHPUnit test runner itself will be forked too, which will have the effect of running all the following unit tests twice (which can have very strange effects on the test results).

So first we need to start the daemon command in its own process. We can use the Symfony Process Component for this.


use Symfony\Component\Process\Process;

class DaemonCommandTest extends \PHPUnit_Framework_TestCase
{
    /**
     * @test
     */
    public function it_creates_a_pid_file()
    {
        $process = new Process('php app/console daemon');
        $process->start();
    }
}

However, since our console commands crosses the boundaries of one process, this will not suffice. As soon as the daemon child process has been created, it has a life on its own and you can not be sure if and when the daemon has executed its tasks, like creating a PID file. Simply calling assertTrue() like this

public function it_creates_a_pid_file()
{
    $process = new Process('php app/console daemon');
    $process->start();

    $this->assertTrue(file_exists('daemon.pid'));
}

would be nice, but turns out to be unreliable. Tests like these are called “flickering tests”, since they will sometimes fail, sometimes succeed, in a very unpredictable manner. So before we start making assertions about the expected behavior of the daemon we need to wait a reasonable amount of time for the child process to get up and running:

public function it_creates_a_pid_file()
{
    $process = new Process('php app/console daemon');
    $process->start();

    // 5 seconds should be enough to start the daemon, right?
    sleep(5);

    $this->assertTrue(file_exists('daemon.pid'));
}

Better safe than sorry: we wait 5 seconds and make the assertion. This is of course really bad for your test suite. Where the other 1000 unit tests all run in one second, this one test takes at least 5 seconds! Even worse, 5 seconds may not even be enough when, for instance, the daemon somehow triggers the Symfony cache to be rebuilt, which takes much longer than that on most machines. Then again, if the cache does not need to be rebuilt, 5 seconds may be way too much, and 500 milliseconds would be more appropriate.

Polling

Instead of putting the process into a long sleep we need to check regularly if the daemon.pid file has been created, make an assertion and leave the test method as soon as possible. In other words, we need a polling mechanism and a probe. The probe inspects the current system’s condition and lets the polling mechanism know if it is happy about that condition. In case the desired condition is never reached, the polling mechanism keeps track of time and gives up after a number of seconds (I read about this first in Growing Object-Oriented Software, Guided by Tests, an excellent book that taught me many new and interesting things about TDD).

In the case described above, we could implement such a polling/probing/timeout mechanism with a simple loop:

$keepTrying = true;
$startTime = time();
while ($keepTrying) {
    if (time() - $startTime > 5) {
        $this->fail('We waited for 5 seconds but the PID file was not created');
    }

    if (file_exists('daemon.pid')) {
        $keepTrying = false;
    }
}

Well, I don’t like this type of error-prone code in my test methods so I decided to abstract some things into a fully tested library for PHPUnit, which is called PHPUnit Asynchronicity. You can install it in your project using Composer (the package is called matthiasnoback/phpunit-asynchronicity). It allows you to use one simple assertion to accomplish the same thing as described above:

use Matthias\PhpUnitAsynchronicity\Eventually;

class DaemonCommandTest extends \PHPUnit_Framework_TestCase
{
    /**
     * @test
     */
    public function it_creates_a_pid_file()
    {
        $process = new Process('php app/console daemon');
        $process->start();

        $this->assertThat(
            function () {
                return file_exists('daemon.pid');
            },
            new Eventually()
        );
    }
}

You could abstract this a bit further, which would make it better readable:

class DaemonCommandTest extends \PHPUnit_Framework_TestCase
{
    public function it_creates_a_pid_file()
    {
        ...

        $this->assertEventually(
            function () {
                return file_exists('daemon.pid');
            }
        );
    }

    private function assertEventually($probe)
    {
        $this->assertThat($probe, new Eventually());
    }
}

The Eventually class is a PHPUnit constraint that accepts two arguments: a timeout in milliseconds (so 5000 milliseconds would mean a timeout of 5 seconds) and a wait time in milliseconds. After each wait time the probe will again be asked to examine the current state of the system.

By the way, it’s also possible to create a probe that is not a closure, but a stand-alone class:

use Matthias\Polling\ProbeInterface;

class PidFileExists implements ProbeInterface
{
    public function isSatisfied()
    {
        return file_exists('daemon.pid');
    }
}

Then it reads even better:

class DaemonCommandTest extends \PHPUnit_Framework_TestCase
{
    /**
     * @test
     */
    public function it_creates_a_pid_file()
    {
        ...

        $this->assertEventually(new PidFileExists());
    }
}

Posted in PHP, Symfony2, Testing | Tagged , , , | 3 Comments