Book review: Modernizing Legacy Applications in PHP

Posted on by Matthias Noback

Legacy code

I'm happy that I've discovered the work of P.M. Jones recently. His and mine interests seem to align at several interesting points. Though I don't personally enjoy "putting .308 holes in targets at 400 yards" (as quoted by Phil Sturgeon), I do care a great deal about package coupling (and cohesion for that matter) and I'm also lightly inflammable when it comes to the use of service locators. It also appears that Paul, just like myself and many others in this business, has felt the pain of working on a legacy PHP application, trying to add features to it, or change existing behavior. This is a particularly hard thing to do and because so many incompetent developers have been creating PHP applications since the dawn of PHP, chances are that a big part of the job of competent PHP developers nowadays consists of maintaining these dumpsites of include statements and superglobals.

To be completely honest: I myself have been there. Recently I've come to the conclusion that this is not bad at all. We all have to start at some point, and the logical sequence of concepts by which you learn the PHP programming language is this: first you learn about statements (echo something, write an if statement), then you learn about include files and you use them as functions, then you learn about functions, but you put them all in one big file and they only have coincidental cohesion (you also use arguments by reference). But then you learn about classes and you put your functions in them. You learn to reduce global state by introducing class variables. Finally, you learn how to modularize things, by combining classes into packages inside your project (this does not automatically mean you are going to open source them, which involves one extra learning step).

In my new book (Principles of PHP Package Design) I discuss a bad-ass "legacy class" I wrote myself once, called Page. It did so many things, it was ridiculous. But,

Although I would never write such code today, when I look at the Page class now, I don't feel ashamed about what I did back then. It's clear to me that I wasn't so much struggling to get things working (because everything just worked, I always did a good job in that respect). Rather, I was struggling to get things organized.

Modernizing Legacy Applications

This is what Paul Jones knows too: you need not be mad at your predecessors, you need not throw their code away and start all over, you only need to re-organize things. Paul calls this "modernizing" the application and he wrote this book Modernizing Legacy Applications in PHP about it. It's a good name, because "modernizing" here means getting an application up to todays standards. It's very well possible that in just a couple of years, "modernizing" would mean something else entirely.

To me the book was a real page-turner actually. I read it in just a couple of days. It was of great interest to me. Some time ago I have refactored a large legacy application and replaced uses of mysql_* functions by PDO (wrapped by Doctrine DBAL), Smarty by Twig, my own crappy Page class by the Symfony Routing and HttpFoundation component. Well, this was quite a lot of work (I wrote some things about it here and here). The big mistake I made was to try and do all these things at once! Looking back it was a really dangerous operation. My salvation was my own knowledge of the application and the business domain. I couldn't have verified the correctness of my ruthless refactoring otherwise.

Paul warns us for these kinds of endeavors: he has written this book to show us step by step how we can get what we want: a class-based application, with one front controller, no page scripts, no superglobals, no instantiation, no functions, no source code in the document root, a clear separation of concerns (in this case model, view and controller layers), etc. He takes us by the hand and shows us all the little steps toward this idealistic goal. And who would have guessed, looking at the initial spaghetti mess: this goal is entirely reachable!

I did not often have such an emotional response to programming books like I had to this one. The steps that are explained are so simple, yet you instantly feel that they are in the direction of the light. This feeling of hope that you will be saved is very encouraging. After reading the last pages I actually felt happy that things had turned out so well for this ugly little legacy application. You could call this catharsis: reading this book allowed to wash away my negative feelings with regard to legacy applications and the way I failed to turn them into something better in the past.

No need for a framework

The first thing that I've taken from this book is: you don't need all those fancy framework things to improve a legacy application. First, you have to reorganize the code in the application. Once you have decoupled things, introduced controllers, routing and dependency injection, it will be much easier to bring in some cool (framework) libraries. But before that, you will be fine using your own classes for those purposes. After all, in essence you don't need as much as frameworks tend to offer. When you fix your own problems, you will also learn what routing, templating, controllers and a front controller are actually for, instead of blindly using them as they are provided by your framework vendor.

Some small points of disagreement

Before I continue pointing out some things I would have done differently, please remember: this is a great book, and the things I'm going to mention now are really small issues, which are sometimes even a matter of personal taste and can thus be easily fixed, without being in conflict with the rest of the book.

To me it was a bit strange to see Request and Response objects being injected in controllers as constructor arguments. Request data should be seen as input arguments, response data should be considered return values of the constructor. So it does not make sense to provide them as constructor arguments, instead request data should be provided as an argument of the actual controller method (in this case __invoke()) and response data should simply be returned by the controller method. Creating a new Response object could optionally be left to a ResponseFactory object, which could be injected as a constructor argument of the controller.

Also, when near the end of the book a simple dependency injection container is being introduced, services get the fully qualified class name as their name. I would not choose this convention, since when the class name changes, the service name stays the same, which would be awkward. Funny enough, yesterday Paul himself tweeted a question about this exact same subject.

One last thing I didn't like that much were some code examples in which arguments were passed by reference. It was like this:

class UserValidator
{
    public function validate($user, array &$messages, &isValid;)
    {
        if (...) {
            $isValid = false;
            $messages[] = 'A user is never valid';
        } else {
            $isValid = true;
        }
    }
}

Using reference arguments does make sense when it's the first step in the refactoring process, but the final step of converting these reference arguments into complex return values was missing in the book. I would suggest something like this:

class ValidationResult
{
    private $isValid = false;
    private $messages = array();

    public function valid()
    {
        $this->isValid = true;
    }

    public function invalid()
    {
        $this->isValid = false;
    }

    public function addMessage($message)
    {
        $this->messages[] = $message;
    }

    public function getMessages()
    {
        return $this->messages;
    }
}

class UserValidator
{
    public function validate($user)
    {
        $result = new ValidationResult();

        if (...) {
            $result->invalid();
            $result->addMessage('A user is never valid');
        } else {
            $result->valid();
        }

        return $result;
    }
}

Finally, at some point we are encouraged to convert all uses of echo and print to string return values. A good thing of course! But legacy code is famous for its complicated mixing of PHP and HTML, like this:

function render_users($users) {
    ?>
    <table>
    <?php
    foreach ($users as $user) {
        ?>
        <tr>
            <td><?php echo $user->getName(); ?></td>
            <td><?php echo $user->isActive() ? 'Yes' : 'No'; ?></td>
        </tr>
        <?php
    }
    ?>
    </table>
    <?php
}

In these situations it’s really difficult to convert all the foreach loops and echo statements into building up a string and returning that string, without introducing errors. In such cases I would recommend to leave everything as it is and instead use output buffering:

function render_users($users) {
    ob_start();
    ?>
    <table>
    ...
    </table>
    <?php

    $html = ob_get_contents();
    ob_end_clean();

    return $html;
}

Conclusion

All in all "Modernizing Legacy Applications in PHP" is a really great book, I highly recommend it to you. So even though I found the price quite high (39 dollars), I think it's definitely worth it. Next time you face a legacy application, you will be happy to get to work and turn it into a beautiful modern application, which still does what it did before, but is much better organized and can easily be modified.

Book review PHP book review legacy