The "dark" side of PHP

Posted on by Matthias Noback

About the nature of PHP and its misuse in package design

> This text will be part of the introduction of my upcoming book Principles of PHP Package Design. If you'd like to be notified when the book is released, you can subscribe on the book's page. This will entitle you to a 20% discount once the book is available for download.

Before we continue with the "main matter" of this book, I'd like to introduce you to the mindset I had while writing it. This mindset amounts to a constant awareness that PHP has something of a dark side, combined with the hope that it is very well possible to expose this "dark side" to a warm sun of programming principles, coding standards and generally good ideas, collected over the years, carefully nurtured and enhanced by great programmers of different languages and cultures.

PHP is actually a very problematic language. It has somewhat of a bad reputation. This is no surprise to me, given the huge amounts of bad code written in PHP, produced by novice "developers", yet available for a large audience to copy into their projects.

PHP is also a very interesting programming language. I think it deserves to be taken seriously. Many, many web applications are written in PHP and are served by PHP via its server application abstraction layer. There are also many "offline" tools written in PHP, which you can run from the command-line. Stretching the capabilities of PHP a bit further, it is possible to run PHP applications as daemons, or even as desktop applications.

PHP has become such a big player - I guess - because it is so easy to learn. Starting with a simple HTML page it does not take much effort to add some dynamic functionality to it. There is no need to go to school and learn about programming before you can use PHP on your web server.

Dynamic typing

One reason why PHP is so easy to learn and work with is that it does not require you to choose a fixed type for the variables you use: variables can be anything. PHP will automatically convert a variable to the type that is necessary to perform a given operation on it.

I've seen many programmers being very passionately against such "dynamic typing practices". And of course, it has its downsides. But I've never really been in a situation where static typing would have saved me from big trouble. Indeed, dynamic typing is neither memory-efficient nor fast, but most of the time this is not an issue (the biggest danger of dynamic typing in PHP is the fact that null, false, "", 0 and even array() all compare as equal to false when you use the == operator).
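To make the automatic conversion and the loose comparison concrete, here is a minimal sketch. The variable names are only for illustration; the behavior shown should hold on PHP 5.4 and later.

```php
<?php
// PHP converts operands on the fly: the numeric string "5" is treated as a number.
$sum = "5" + 3;

// The "empty" values mentioned above, each compared against false.
$values = [
    'null' => null,
    'false' => false,
    'empty string' => '',
    'zero' => 0,
    'empty array' => array(),
];

$loose = [];
$strict = [];
foreach ($values as $label => $value) {
    $loose[$label] = ($value == false);   // loose comparison converts $value to a boolean first
    $strict[$label] = ($value === false); // strict comparison also compares the type
}

var_dump($sum);
print_r($loose);  // every entry is true
print_r($strict); // only the 'false' entry is true
```

Using `===` wherever the type matters is the usual way to stay clear of this particular trap.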

No compiler errors

Another reason why PHP is easy to learn is that even when you are still in your trial-and-error period, PHP will not throw a hundred compiler errors at you. It is likely that most problems that you introduce in your code will only surface at runtime. The reason for this is that not all code of your application is being loaded at once. That would not even be possible, since PHP is a highly dynamic language. It's possible to load an entirely different set of PHP files for each request. This means that by design the symbols used in your code (like class names, constants, function names, etc.) will only be validated very late in the process of running a PHP application.

This has some dreadful effects: one page of a PHP web application may load without any problem. Another page may crash because of a fatal error "class Foo not found". These fatal errors arise because of invalid symbols, like a method call on something that is not an object and definitely does not have the requested method, or a class that implements an interface that does not exist.

Of course, there is nothing much that can be done. It is the nature of PHP that paves the way for very dynamic execution paths. Actually, when PHP starts handling a request, the shape of a PHP application is still not demarcated. Some files in the project directory may be loaded, others may not. There is nothing the PHP engine can do to prevent invalid code from being loaded. PHP can only prevent invalid code from being executed, once it is loaded (of course, PHP code with syntax errors will fail to be loaded).
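A small sketch of this late validation, assuming PHP 7 or later (where a missing class surfaces as a catchable `Error`; older versions halt with a fatal error instead). The function and class names are hypothetical, and `MissingReportGenerator` is deliberately not defined anywhere.

```php
<?php
// Hypothetical function that refers to a class which does not exist.
function generateReport()
{
    return new MissingReportGenerator();
}

// Defining the function above did not trigger any error.
echo "File loaded without problems\n";

$message = null;
try {
    generateReport(); // only now does PHP try to resolve the class name
} catch (Error $e) {
    // PHP 7+ only; on older versions execution would stop here with a fatal error.
    $message = $e->getMessage();
}

echo "Failed at runtime: {$message}\n";
```

As long as nothing calls `generateReport()`, the invalid symbol goes unnoticed, which is exactly why one page may work while another one crashes.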

The shape of a PHP application depends on what is loaded at runtime

The very late validation of symbols in your code introduces a major fragility into PHP applications. Any change, anywhere in a code-base, can cause problems that may not surface immediately, even though such problems would have been easily catchable by a compiler (static analysis is another way to fix "compile" errors; we will discuss this subject later).

Package maintainers misuse late symbol validation

The fact that symbols are validated only very late in the process brings incredible freedom to developers. It is possible to drag downright invalid code along in a project (as long as it is not being executed).

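Even calling a `doesThisWork()` function that type-hints its first argument as `NonExistingClass`, with an obviously invalid argument that is a simple object, does not yet trigger a 'Class "NonExistingClass" not found' error.

~~~language-php
function doesThisWork(NonExistingClass $argument) {} // NonExistingClass is not defined anywhere
doesThisWork(new stdClass());
~~~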

Instead it would result in this error:

Argument 1 passed to doesThisWork() must be an instance of NonExistingClass, instance of stdClass given

It would not be a good habit to keep such obviously invalid code around in your project. Hopefully someone soon notices that it's dead code - relics of a distant past or antecedents of a future that never made it.

Meanwhile developers who maintain open source PHP packages (for instance those available via packagist.org) often add many classes to their packages which would absolutely not pass a sound compilation process. For instance, the Gaufrette filesystem abstraction library comes with adapter classes for many different filesystems like Dropbox or Amazon S3. Each of those adapter classes refers to other classes that are not available, unless of course you install some other dependencies in your project.

These dependencies are called "suggested dependencies" or "optional dependencies". I don't believe that such dependencies should be allowed to exist, since they are a contradictio in terminis. You either depend on something, or not. If one of the classes in your package depends on some other class that may or may not be available, you should extract that class and put it in a new package which has the dependency as a true dependency, not as an optional one. Code that doesn't work should not be part of your project, and it should not be part of your package either (of course there's more to this than just saying it's wrong - you will read everything about it in the part of the book called "Package design").
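To give a rough idea of what such an extraction could look like, here is a minimal sketch, assuming PHP 7.1 or later. All names (`StorageAdapter`, `SimpleFilesystem`, `InMemoryAdapter`) are hypothetical and not taken from Gaufrette; the in-memory adapter merely stands in for an adapter that would need a third-party library.

```php
<?php
// --- Would live in the core package: only abstractions, no optional dependencies ---
interface StorageAdapter
{
    public function write(string $key, string $content): void;
    public function read(string $key): string;
}

final class SimpleFilesystem
{
    private $adapter;

    public function __construct(StorageAdapter $adapter)
    {
        $this->adapter = $adapter;
    }

    public function copy(string $from, string $to): void
    {
        $this->adapter->write($to, $this->adapter->read($from));
    }
}

// --- Would live in its own package, which declares everything it needs as a true dependency ---
final class InMemoryAdapter implements StorageAdapter
{
    private $files = [];

    public function write(string $key, string $content): void
    {
        $this->files[$key] = $content;
    }

    public function read(string $key): string
    {
        if (!isset($this->files[$key])) {
            throw new RuntimeException("No such file: {$key}");
        }

        return $this->files[$key];
    }
}

$adapter = new InMemoryAdapter();
$adapter->write('a.txt', 'hello');
(new SimpleFilesystem($adapter))->copy('a.txt', 'b.txt');
echo $adapter->read('b.txt'), "\n";
```

With this split, every class in the core package refers only to symbols the package itself provides, and each adapter package contains only code that works once its own dependencies are installed.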

Package design principles

The book you are currently reading contains many design principles that do not originate from the PHP community or from the history of PHP itself. They come from great programmers like Robert C. Martin and Martin Fowler, who are known for their excellent ideas about code structure and software architecture. Those ideas transcend the world of web application scripts. They also transcend our everyday habits of putting PHP code in a package and releasing it to the world.

I've chosen to dive into those language-agnostic package design principles, learn about them, try them in practice and finally write down my conclusions. The result is a book which gives you some guidelines about composing packages of PHP code.

Given the nature of PHP, as described above, the language itself does not force you to follow these principles. In fact, you can easily circumvent the "rules". However, depending on any of the PHP-specific behaviors described above would automatically turn into a violation of one or more of the package design principles. So in order to fully appreciate what follows, I invite you to set aside your assumptions about PHP and package design. While you are reading this book keep asking yourself: would the quality of my packages improve if I were to follow these package design principles?

Well, of course I think it would! But this is for you to judge.

Comments
cordoval

in terms of principles i have 2 retorts:

1. consider https://github.com/fabpot/P..., this is a case where maintainer decides to handicap the PHP language options striving via a simplicity/usability principle. The principle i think should go in the lines to prevent abuse but not to handicap possibilities imo.

2. considering that the package that has a suggested dependency is wrong and it is refactored into 2 packages. Then the package that has the dependency on the second package shows up a suggested dependency still. Is this type of suggested dependency still wrong? Or are you meaning that the first package should have determined some set of interfaces written in the first packages to render this suggested dependency into a hard dependency? An easy example would be a Shoe Business with some factories in Amsterdam and in Lima Peru. The business may or may not choose to use both factories at the same time. The boss is thinking in moving the factory to Asia somewhere so it determines standard procedures (interfaces). Now the business does depend on these interfaces. The factory could be anywhere but must implement these set of interfaces to be able to talk with the main business. Example of what i am talking about is https://github.com/jackalop...

Notice however a rightful use of suggested dependencies to clarify the dependencies, is that valid? Also a rightful use of require-dev dependencies. We would not require our test tools to depend on any particular interface of our own.

Matthias Noback

In the book I explain this a bit better. In the case of Gaufrette: the main package contains the abstract things like FileSystem and Adapter. The other packages contain each one specific adapter. Those packages have explicit dependencies, including relevant version constraints. They also depend on the main package. The main package itself could have suggests for the more specific packages.

The "provide" key is very interesting - I didn't see it used like this. Thanks for your suggestion!