Matthias Noback - Matthias Noback - Blog

New edition for the Rector Book

2024-02-12T10:30:00+01:00

A couple of weeks ago, Tomas Votruba emailed me saying that he just realized that we hadn't published an update of the book we wrote together since December 2021. The book I'm talking about is "Rector - The Power of Automated Refactoring". Two years have passed since we published the current version. Of course, we're all very busy, but no time for excuses - this is a book about keeping projects up-to-date with almost no effort... We are meant to set an example here!

In the meantime, a lot has changed. Tomas has become a very successful legacy-project-saver, with this incredibly powerful tool called Rector. The project has gained a lot of traction in the PHP development community. Tomas and the co-authors of the project keep improving the tool, making it more useful, developer-friendly, faster, and more stable every day. Recently he released version 1.0 on-stage at Laracon Europe, in Amsterdam. In important moment, which indicates that we are dealing with a mature tool.

I know Tomas as a hard worker. He'll do everything to Get Things Done. He keeps simplifying things, even in the Git project that's behind the book's manuscript. Always looking for ways to prevent developer (or writer) mistakes and to automate common tasks, he ruthlessly cuts away unnecessary weight. When he reads a convoluted paragraph that works around some quirk in Rector, he fixes the issue in Rector, so the text is once more easy to understand. If people ask the same questions over and over again, he adds a helpful command to Rector's command-line interface, so the question will disappear. In other words, he has a great idea for feedback. What kind of signal does the code give us? Is this too hard to work with? Can we simplify this? What kind of signal do we get from readers? Let's improve!

With a 1.0 version for Rector also comes a new version of the book about Rector. This 2024 Edition provides an even better start for your static analysis & automated refactoring journey. I can testify personally: once you start, you'll never want to go back.

Read more about the updates on Tomas' blog: Rector Book 2024 Release with Brand new Chapter

And get the new version here: https://leanpub.com/rector-the-power-of-automated-refactoring

If you already bought a previous version, you can download the latest files for free (of course!).

Dealing with technical debt during the sprint

2022-11-11T12:00:00+01:00

It's quite ironic that my most "popular" tweet has been posted while Twitter itself is in such a chaotic phase. It's also quite ironic that I try to provide helpful suggestions for doing a better job as a programmer, yet such a bitter tweet ends up to be so popular.

Twitter and Mastodon are micro-blogging platforms. The problem with micro-blogs, and with short interactions in general, is that everybody can proceed to project onto your words whatever they like. So at some point I often feel the need to explain myself with more words, in an "actual" blog like this one.

Waste in scrum processes

Hypothesis: the moment a team adds the requirement that each PR/commit should be related to a Jira issue, it will start accumulating even more tech debt than before.

We notice that one of the dependencies of our project has been marked as "abandoned" and we need to upgrade/switch to something else. When helping a new co-worker join the team, we find out that some crucial steps aren't documented in the README. The test framework hasn't been updated for some time, and upgrading means we have to rewrite some test setup code.

These are things that just happen to our projects from time to time. Many developers won't go ahead and make the necessary changes. Being part of a "scrum process" they will:

Create an issue in the backlog
Bring the issue to the attention of the decision maker
Wait for the decision maker to assign it to a future sprint

Finally, once the issue has been assigned to a sprint, it needs to be refined. So a group of people that is often too large starts to talk about and describe what needs to be done. Why? Because scrum tradition prescribes that every person on the team should be able to pick up the issue. I don't think that's true at all; we're just wasting time explaining everything to everyone, while often only 2 people are going to pick it up.

In the end we have to vote for the number of story points that we're going to assign. This magic number has no specific meaning. If we try to describe what such a point represents, we get different answers, even within the same team:

"It's definitely not how much time we spend on it" (because we don't want it to be an estimate)
"It's how complex we think this is" (unfortunately we all have different ideas about what "complexity 3" means)
"We can do about 50 points of these in each sprint" (so it is an estimate after all; given we have two weeks, we can do 50 of them, so each point represents 2 weeks divided by 50 of our time)

Don't get me wrong, it's good to think about how much time something will likely take and use a rough estimate to decide if you want to start working on it. It's just that we are guessing, and we can have huge surprises while we are actually doing the work. Or the opposite happens: we are over-complicating things in the refinement stage and it turns out the actual work was so easy, we get 5 points for the price of 1...

In both cases, a big part of the preparation phase is just a waste of time and energy. I'm certain that many scrum teams could do much, much more if they would let go of wasteful practices like those that "official scrum" or similar project management techniques prescribe.

How to deal with technical debt in a sprint

Back to the original point about technical debt. From time to time developers will notice something about the code base that really needs to be improved, something that is not part of any feature anybody is working on, just "general maintenance" or "developer experience", and so on. Developers need to do this work because they have to battle the forces pulling the project downward. If they do not continuously do that, one day the project will be beyond repair. Yet, the scrum process prescribes that no work be done in the sprint that is not on the board. So an issue has to be created, and we jump back into the project management waterfall.

At this point a developer have several options, each of which appeared several times in the comments to my bitter tweet:

They can follow the process because they have to. There are strict requirements, maybe an ISO-standard, that have to be enforced. No way around it. So first they create an issue, and wait for it to be assigned to a sprint.
In some teams they wouldn't have to wait, they can pick it up inside the sprint. But only if it's small, "a 1" or "a 2", because more would "endanger" the sprint.
They can play the process, and do general improvements as part of the current ticket they are working on (but these are now unrelated changes).
They can commit their changes to a dedicated Technical Debt ticket (but the issue itself is quite meaningless).
They can be passive-aggressive and commit their changes to a random ticket.

I can't speak for everyone and every team, but in my experience, developers are even less likely to improve structural issues (technical debt) if they have to deal with this slow and demotivating process. They don't create those tech debt issues anymore, and the project is going to decline faster.

That's really sad, because most developers I've met are well able to do what's good for the project. They know what will help the project survive. Yet the process keeps them from doing it. So I propose just skipping the whole scrum waterfall in these cases.

Objections

Of course, not everything should go "under the radar". We need planning, discussions, exchanging ideas. We need to challenge solutions, adopt a domain-oriented collaboration style, and so on. But, there are also things we just have to do as developers, and if you ask me, we shouldn't always go through the process. These things just need to be done, sooner rather than later. So many projects have already accumulated so much technical debt that now is the time to act.

Some common objections I've heard to skipping the whole scrum process for technical debt-resolving tickets (rephrasing here):

Developers may no longer spend time on "real work", they become renegades, improving code forever, never delivering any value for the business or the users.
Developers may make mistakes, in judgement and code. They should still create an issue.

These objections make sense to me personally:

I once was such a renegade developer who didn't care about the user, the business, only about cool code and the cool framework. Yes, you need some way to make sure that the work these developers do is visible, and relevant. I'm not sure if scrum is the best option.
We should help everyone not to make mistakes, choose the right solutions, and so on. But... creating an issue isn't saving us from those mistakes. Only if somebody thinks about the problem, and talks with you about the right solution. Which means it's not about the issue per se, you just need to collaborate. For this, I recommend pair or team programming, and building a good test suite.

Conclusion

In short, I think most developers know what's right, what needs to be done to keep the project in a good shape. Just let them do it. But not alone, and not in the shadows.

Refactoring without tests should be fine

2022-10-04T13:30:00+02:00

Refactoring without tests should be fine. Why is it not? When could it be safe?

From the cover of "Refactoring" by Martin Fowler:

Refactoring is a controlled technique for improving the design of an existing code base. Its essence is applying a series of small behavior-preserving transformations, each of which "too small to be worth doing". However the cumulative effect of each of these transformations is quite significant.

Although the word "refactoring" is used by programmers in many different ways (often it just means "changing" the code), in this case I'm thinking of those small behavior-preserving transformations. The essence of those transformations is:

The structure of the code changes (e.g. we add a method, a class, an argument, etc.), but
The behavior of the program stays the same.

We perform these refactorings to establish a better design, which in the end will look nothing like the current design. But we only take small, safe steps. Fowler mentions in the introduction of the book that tests are necessary to find out if a refactoring didn't break anything. Then the book goes on to show a large number of small behavior-preserving transformations. Most of those transformations are quite safe, and we can't think of a reason why they would go wrong. So they wouldn't really need tests as a safety net after all. Except, in practice, we do need tests because we run into problems, like:

We make a mistake at syntax-level (e.g. we forget a comma, a bracket, or we write the wrong function name etc.)
We neglect to update all the clients (e.g. we rename a method, but one of the call sites still uses the old name)
An existing test is tied to the old code structure (e.g. it refers to a class that no longer exists after the refactoring)
We change not only the structure, but also the behavior

I find that a static analysis tool like PHPStan will cover you when it comes to the first category of issues. For instance, PHPStan will report errors for incorrect code, or calls to methods that don't exist, etc.

The same goes for the second category, but there's a caveat. In order to make no mistakes here, we have to be aware of all the call sites/clients. This can be hard in applications where a lot of dynamic programming is used: a method is not called explicitly but dynamically, e.g. $controller->{$method}(). In that case, PHPStan won't be able to warn you if you changed the method name that used to be invoked in this way. It's why I don't like methods and classes being dynamically resolved: because it makes refactoring harder and more dangerous. In some cases we may install a PHPStan extension or write our own, so PHPStan can dynamically resolve types that it could never derive from the code itself. But still, dynamic programming endangers your capability to safely refactor.

The third group of failures, where tests become the problem that keeps us from making simple refactorings, can be overcome to a large extent by writing better tests. Many unit tests that I've seen would count as tests that are too close to the current structure of the code. If you want to extract a class, or merge two classes, the unit test for the original class really gets in the way, because it still relies on that old class. We'd have to rewrite, rename, move tests, in order to keep track of the modified structure of the code. This is wasteful, and unnecessary: there are good ways to write so-called higher-level tests that aren't so susceptible to a modified code structure. When the structure changes, they don't immediately become useless or break because of the modified structure.

From my own experiences with refactoring-without-tests, I bet the fourth category is the worst and hardest to tackle. It often happens that you don't just make that structural change, but add some semi-related change to the same commit, one that turns out to change the behavior of the code. I've found that you really need to program in (at least) a pair to reduce the number of mistakes in this category. A navigator will check constantly: are we changing behavior here as well? Is this merely the structural changes we were here for? Examples of mistakes that I made that were more than the small behavior-preserving transformations that refactorings were intended to be:

I removed stuff that seemed unused (correctness of such a change may be really hard to prove)
I changed settings, tweaked stuff hoping for a better performance (this has to be proven with a benchmark)
I tried to replace the design of several classes at once, instead of in the step-by-step fashion that refactoring requires.

I think as developers we've deviated a lot from the original idea behind refactoring. We've been hoping to improve things in many ways at once, trying to pay back all the technical debt. As a result, we don't limit ourselves to making only small structural transformations that would be safe to make, even without tests. And it gets us in this impossible state: we want to refactor but we need tests, but they are so hard to write for this bad code, we can never refactor, we can never test. All is lost.

If only we would have some good, high-level tests, and we'd stick to safe, structural transformations, and we'd all use PHPStan, then nothing could go wrong.

Good design means it's easy-to-change

2022-09-27T11:00:00+02:00

Software development seems to be about change: the business changes and we need to reflect those changes, so the requirements or specifications change, frameworks and libraries change, so we have to change our integrations with them, etc. Changing the code base accordingly is often quite painful, because we made it resistant to change in many ways.

Code that resists change

I find that not every developer notices the "pain level" of a change. As an example, I consider it very painful if I can't rename a class, or change its namespace. One reason could be that some classes aren't auto-loaded with Composer, but are still manually loaded with require statements. Another reason could be that the framework expects the class to have a certain name, be in a certain namespace, and so on. This may be something you personally don't consider painful, since you can avert the pain by simply not considering to rename or move classes.

Still, in the end, you know that a code base that resists change like this is going to be considered "a case of severe legacy code". That's because you can't resist change in the software development world, and eventually it will be time to make that change, and then you can experience the pain that you've been postponing for so long.

Software can resist change in many ways, here are just a few examples that come to mind:

Classes have to be in a certain directory/namespace and methods have to have certain names in order to be picked up by the framework
There's another application using the same database, which can't deal with extra columns that it doesn't know about
Class auto-loading only works if you call session_start() first
And so on... If you have another cool example, please add it as a comment below!

Socially-established change aversion

Change-aversion can also be socially established. As an example, the team may use a rule that says "If you create a class, you also have to create a unit test for it". Which is very bad, because you can use multiple classes in a test and still call it a unit-test, so the every-class-is-a-unit assumption is plain wrong. More importantly, you can't unit-test all types of classes; some will require integrated tests. Anyway, let's not get carried away ;) My point is, if you have such a rule you'll make it harder for developers to add a new class, since they fear the additional (often pointless) work of creating a test for it. In a sense, developers start to resist change. The code base itself will resist change as well, because unit tests are often too close to the implementation, making a change in the design really hard.

Unit tests are my favorite example, but there are other socially-established practices that get in the way of change. Like, "don't change this code, because 5 years ago Leo touched it and we had to work until midnight to fix production". Or "we asked the manager for some time to work on this, but we didn't get it".

Make changing things easy

From these, and many more - indeed - painful experiences, I have come to the conclusion that a very powerful way to judge the quality of code and design is to answer the question: is this easy to change? The change can be about a function name, the location of a file, installing a Composer dependency, injecting an additional constructor dependency, and so on.

However, it's sometimes really hard to perform this evaluation yourself, since as a long-time developer you may already be used to quite some "pain". You may be jumping through hoops to make a change, and not even realize that it's silly and should be much easier. This is where pair or mob/ensemble programming can be really useful: working together on the same computer will expose all the changes that you avoid:

"Hey, let's rename that class!"
"Well, I'm not sure that we can, let's save this for another time."
"Now let's inject that new service as a constructor argument."
"Sorry, we can't use dependency injection in this part of the code base."

That's why I usually go all-in on ensemble programming, so we can have a clear view on all the changes that the team averts. We look the monster in the eyes.

Addendum: when changes break stuff

A partial reason for change aversion in developers is the risk that the change may break other things. If you rename a method, you should rename all the relevant method calls too. Luckily, we have static reflection these days, which will tell you about any call sites that you missed. And of course, the IDE can safely make the change for you in most cases. Unfortunately, this is not always the case. A lot of code in this world has the following issue: if you rename a file/class/method/etc., you won't be able to find all the places that you'd have to update. For instance:

The class name might be used to derive a string name from (e.g. an EmailValidator can be used by dynamically invoking the 'email' validator). Renaming the class breaks the validation.
The controller action has to be called public function [name]Action(), or it can't be invoked. Renaming the class makes an entire route or endpoint unreachable.
A model class has to be in src/App/Entity or it won't be added to the database schema.
And so on!

The danger is, you can't see that you broke something unless you run the full application (or its tests, if you have full coverage). This is very inconvenient. It's why the general rule is "Good code is easy to change", and part of "easy to change" is that a change that wasn't fully propagated through the system will "explode" early. Besides an early warning, preferably a static error (as opposed to a runtime error), it would be great if the error message we get after breaking something is clear and helps us find the problem (the change that we just made). This is by far not always the case in real-world projects!

Building a safety net to prevent breaking changes

One way to make your project safer-to-change is to look for these errors. When you make a change, and you learn (too late) about some issue it causes, make sure it can never happen again. As an example, recently I was working on a Symfony console command. I wanted to use the QuestionHelper which can be retrieved by calling $this->getHelper('question'). Being a dependency for one of my own services, I didn't want to get this helper on the spot, but I wanted to be properly set up as a dependency in the constructor, along with the other service dependencies. Unfortunately, the 'question' helper is not available in the constructor, and you only learn this when you run the command in the terminal. To prevent the issue from happening again I added an integration test that verifies you can run the command and it will at least do something (get past the constructor). That way, the build will let me know when I accidentally broke the command.

Adding a test is a way of preventing a change-related issue at runtime. Maybe a better option is to prevent it analysis time, e.g. by writing a PHPStan rule that triggers an error whenever you call $this->getHelper() in the constructor of a command:

/**
 * @implements Rule
 */
final class GetHelperRule implements Rule
{
    public function getNodeType(): string
    {
        return MethodCall::class;
    }

    /**
     * @param MethodCall $node
     */
    public function processNode(Node $node, Scope $scope): array
    {
        // ...

        if ($node->name->name !== 'getHelper') {
            // This is not a call to getHelper()
            return [];
        }

        // ...

        if (! $scope->getFunction()->getDeclaringClass()
            ->isSubclassOf(Command::class)) {
            // This is not a command class
            return [];
        }

        if ($scope->getFunctionName() !== '__construct') {
            // The call happens outside the constructor
            return [];
        }

        return [
            RuleErrorBuilder::message('getHelper() should not be called ...')
                ->build()
        ];
    }
}

Now whenever someone makes the mistake of calling getHelper() inside the constructor, they'll get this nice and useful error:

 15/15 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓] 100%

 ------ ------------------------------------------------------------- 
  Line   src/PhpAstInspector/Console/InspectCommand.php               
 ------ ------------------------------------------------------------- 
  38     getHelper() should not be called in the constructor because  
         helpers have not been registered at that point               
 ------ -------------------------------------------------------------

By the way, if you want to become a master at writing custom PHPStan rules that will make your code safer to change, get my latest book Recipes for Decoupling.

Conclusion

In short, keep an eye out for the following situations:

You want to make a change but you can't because (legacy) reason X. Don't ignore this, or work around it, do something about it!
You make a change, but the change breaks something in a surprising way. Make sure you can't make the mistake again (add a test, or a static analysis rule).

Can we consider DateTimeImmutable a primitive type?

2022-09-20T10:30:00+02:00

During a workshop we were discussing the concept of a Data Transfer Object (DTO). The main characteristic of a DTO is that it holds only primitive-type values (strings, integers, booleans), lists or maps of these values including "nested" DTOs. Not sure who came up with this idea, but I'm using it because it ensures that the DTO becomes a data structure that only enforces a schema (field names, the expected types, required fields, and optional fields), but doesn't enforce semantics for any value put into it. That way it can be created from any data source, like submitted form values, CLI arguments, JSON, XML, Yaml, and so on. Using primitive values in a DTO makes it clear that the values are not validated. The DTO is just used to transfer or carry data from one layer to the next. A question that popped up during the workshop: can we consider DateTimeImmutable a primitive-type value too? If so, can we use this type inside DTOs?

I thought it was an interesting question to explore. I'd like to say "No" immediately, but why?

Is something deserving of a predicate? To decide, we have to define what the predicate means. Stated in an abstract way like this it makes a lot of sense, but when discussing concrete questions it's not often clear that we should talk about definitions; we often like to jump to an answer immediately! So, in this case, what's a primitive-type value?

First, we could consider "primitive" to mean "can't be further divided into parts". In that sense, when a Money value object would consist of an integer for the number of cents, and a string declaring the currency, Money is not primitive, but the integer and the string inside are, because it doesn't make sense to take them apart. Although you might say that something more primitive than a string is a character. It's just that PHP doesn't distinguish this type. DateTimeImmutable in this sense is not primitive, the value that it contains (the timestamp) is.

Second, we could consider "primitive" to mean "what's not an object", or as PHP calls these values: scalars. Again, "primitive" takes into account what the programming language considers primitive, because Java for instance has strings which are considered primitive values but are nevertheless objects. Java has a weird relationship with primitive values anyway, because strings, integers, etc. look very primitive in Java code (i.e. you don't do new String("a string") but just write "a string")). With PHP there's less confusion around this concept. When "primitive" is used in this sense, DateTimeImmutable could never be considered a primitive-type value, because it's an object, but in Java it could be, because other primitive values are considered to be primitive regardless of them being an object.

Third, we could consider "primitive" to mean "whatever types the language offers out-of-the-box". This is often equivalent to "native". Unfortunately, this isn't a very helpful definition, since what's native is unclear. Is there a "core" part of the language that defines these types? In that case, where does DateTimeImmutable belong? Isn't that part of an extension? Also, would we consider file handles (resources) primitive types? In the end, most of what is part of the "core" or "native" language is quite arbitrary.

Fourth, we could consider "primitive" to mean - regardless of the language - "what types do we need to describe data?" In that sense, we may look back in the history of humanity itself and consider numbers very primitive (e.g. for describing the value of something). Same for strings (e.g. for writing down a name). Arguably a date or a time isn't primitive, because it's built up from strings (or characters), and numbers.

Fifth, we could consider "primitive" to mean "bare values, not necessarily sensible or correct ones". So "2" is a primitive value, it doesn't say 2 of what, so we can't judge if the value is correct. "UKj" is a primitive value, it doesn't say what it describes, so there's no way to judge this value. Using this definition, a DateTimeImmutable value is certainly not a primitive value because when you instantiate it, it processes the provided string constructor argument and throws an error if it is not a sensible one. Or, maybe worse, converts it into a value that does make sense, but may no longer match the intention of the actor that produced the value.

For me, this final point is the most important attribute of primitive-ness, which disqualifies DateTimeImmutable as a primitive-type value. Anyway, we already established that DateTimeImmutable can't be considered primitive according to the other definitions either.

Am I missing any possible definitions of "primitive" here? Just let me know!

Is it a DTO or a Value Object?

2022-09-13T09:30:00+02:00

A common misunderstanding in my workshops (well, whose fault is it then? ;)), is about the distinction between a DTO and a value object. And so I've been looking for a way to categorize these objects without mistake.

What's a DTO and how do you recognize it?

A DTO is an object that holds primitive data (strings, booleans, floats, nulls, arrays of these things). It defines the schema of this data by explicitly declaring the names of the fields and their types. It can only guarantee that all the data is there, simply by relying on the strictness of the programming language: if a constructor has a required parameter of type string, you have to pass a string, or you can't even instantiate the object. However, a DTO does not provide any guarantee that the values actually make sense from a business perspective. Strings could be empty, integers could be negative, etc.

There are different flavours of the class design for DTOs:

/**
 * @object-type DTO
 *
 * Using a constructor and public readonly properties:
 */
final class AnExample
{
    public function __construct(
        public readonly string $field,
        // ...
    ) {
    }
}

/**
 * @object-type DTO
 *
 * Using a constructor with private readonly properties
 *  and public getters:
 */
final class AnotherExample
{
    public function __construct(
        private readonly string $field,
        // ...
    ) {
    }

    public function field(): string
    {
        return $this->field;
    }
}

Regarding the naming of a DTO: I recommend not adding "DTO" to the name itself. If you want to make it clear what the type is, add a comment, or an invented annotation (or attribute) like @object-type. This will be very useful for developers that are not aware of these object types. It may trigger them to look up an article about what it means (this article, maybe :)).

What's a value object and how do you recognize it?

A value object is an object that wraps one or more values or value objects. It guarantees that all the data is there, and also that the values make sense from a domain perspective. Strings will no longer be empty, numbers will be verified to be in the correct range. A value object can offer these guarantees by throwing exceptions inside the constructor, which is private, forcing the client to use one of the static, named constructors. This makes a value object easy to recognize, and clearly distinguishable from a DTO:

final class AnExample
{
    private function __construct(
        private string $value
    ) {
    }

    public static function fromValue(
        string $value
    ): self {
        /*
         * Throw an exception when the value doesn't 
         * match all the expectations.
         */

         return new self($value);
    }
}

While a DTO just holds some data for you and provides a clear schema for this data, a value object also holds some data, but offers evidence that the data matches the expectations. When the value object's class is used as a parameter, property, or return type, you know that you are dealing with a correct value.

How should we use these object types?

Meaning is defined by use. If we are using "DTO" and "value object" in the wrong way, their names will eventually get a different meaning. This might be how the confusion between the two terms arises in the first place.

DTOs

A DTO should only be used in two places: where data enters the application or where it leaves the application. Some examples:

When a controller receives an HTTP POST request, the request data may have any shape. We need to go from shapeless data to data with a schema (verified keys and types). We can use a DTO for this. A form library may be able to populate this DTO based on submitted form data, or we can use a serializer to convert the plain-text request body to a populated DTO.
When we make an HTTP POST request to a web service, we may collect the input data in a DTO first, and then serialize it to a request body that our HTTP client can send to the service.
For queries the situation is similar. Here we can use a DTO to represent the query result. As an example we can pass a DTO to a template to render a view based on it. We can use a DTO, serialize it to JSON and send it back as an API response.
When we send an HTTP GET request to a web service, we may deserialize the API response into a DTO first, so we can apply a known schema to it instead of just accessing array keys and guessing the types. API client packages usually offer DTOs for requests and responses.

Value objects

A value object is used wherever we want to verify that a value matches our expectations, and we don't want to verify it again. We also use it to accumulate behavior related to a particular value. E.g. if we have an EmailAddress value object, we know that the value has been verified to look like a valid email address,so we don't have to check it again in other places. We can also add methods to the object that extract for instance the username, or the hostname, from the email address.

Value objects are often used in domain models because guarantees, or invariants, are an important part of their business. But they can be used anywhere in an application, since every part of the application will need ways to centralize some rules, provide evidence of correctness, and accumulate related behavior.

Conclusion

There's much more to say about value objects, but that was not the point of this article (if you want to read more, check out my book Object Design Style Guide, or Implementing Domain-Driven Design by Vaughn Vernon). The goal was to show most clearly the difference between DTOs and value objects and I hope they will no longer be confused. Here's a summary table:

A DTO:

Declares and enforces a schema for data: names and types
Offers no guarantees about correctness of values

A value object:

Wraps one or more values or value objects
Provides evidence of the correctness of these values

A step-debugger for the PHP AST

2022-09-05T11:30:00+02:00

When you're learning to write custom rules for PHPStan or Rector, you'll have to learn more about the PHP programming language as well. To be more precise, about the way the interpreter parses PHP code. The result of parsing PHP code is a tree of nodes which represents the structure of the code, e.g. you'll have a Class definition node, a Method definition node, and within those method Statement nodes, and so on. Each node can be checked for errors (with PHPStan), or automatically refactored in some way (with Rector).

The tree of nodes is called Abstract Syntax Tree, and a successful PHPStan or Rector rule starts with selecting the right nodes from the tree and "subscribing" your rule to these nodes. A common approach for this is to start var_dump-ing or echo-ing nodes inside your new rule, but I've found this to be quite tedious. Which is why I've created a simple command-line tool that lets you inspect the nodes of any given PHP file.

The tool is called AST Inspector and is available on GitHub.

Install it with Composer:

composer require --dev matthiasnoback/php-ast-inspector

Then run:

vendor/bin/ast-inspect inspect [file.php]

You'll see something similar to this output:

You can navigate through the tree by going to the next or previous node, or jumping into the subnodes of the selected node. Navigation conveniently uses the a,s,d,w keys.

Currently the project uses the PHP-Parser library for parsing. Since PHPStan adds additional virtual nodes to the AST, it will be useful to show them in this tool as well, but that requires some additional work. Another interesting addition would be to show the types that PHPStan derives for variables in the inspected code. That will also require some more work...

For now, please give this program a try, and let me know what you think! I'm happy to add more features to it, as long as it makes the learning curve for these amazing tools less steep. And if you're looking for an in-depth exploration of writing your own PHPStan or Rector rules, check out the documentation linked above or one of my books (Recipes for Decoupling, which shows how to create PHPStan rules, and Rector - The Power of Automated Refactoring, which does the same for Rector).

Simple Solutions 1 - Active Record versus Data Mapper

2022-08-10T00:00:00+02:00

Having discussed different aspects of simplicity in programming solutions, let's start with the first topic that should be scrutinized regarding their simplicity: persisting model objects. As you may know, we have competing solutions which fall into two categories: they will follow either the Active Record (AR) or the Data Mapper pattern (DM) (as described in Martin Fowler's "Patterns of Enterprise Application Architecture", abbrev. PoEAA).

Active record

How do we recognize the AR pattern? It's when you instantiate a model object and then call save() on it:

$user = new User('Matthias');

$user->save();

In terms of simplicity as seen from the client's perspective, this is amazing. We can't imagine anything that would be easier to use. But let's take a look behind the scenes. If we'd create our own AR implementation, then the save() function looks something like this:

final class User
{
    public function __construct(
        private string $name
    ) {
    }

    public function save(): void
    {
        // get the DB connection

        $connection->execute(
            'INSERT INTO users SET name = ?',
            [
                $this->name
            ]
        );
    }
}

In order for save() to be able to do its work, we need to somehow inject the database connection object to the save(), so it can run the necessary INSERT SQL statement. Two options:

One, we let save() fetch the connection:

use ServiceLocator\Database;

final class User
{
    // ...

    public function save(): void
    {
        $connection = Database::getConnection();

        $connection->execute(
            'INSERT INTO users SET name = ?',
            [
                $this->name
            ]
        );
    }
}

The practice of fetching dependencies is called service location, and it's often frowned upon, but for now this does the trick. However, the simplicity score goes down, since we have to import the service locator, and call a method on it (-2 points?).

The second option is to pass the connection somehow to the User object. The wrong approach is this:

use ServiceLocator\Database;

final class User
{
    // ...

    public Connection $connection;

    public function save(): void
    {
        $this->connection->execute(
            // ...
        );
    }
}

That's because the burden of providing the Connection is now on the call site where the User is instantiated:

$user = new User();
$user->connection = /* ... get the connection */;
// ...

This would definitely cost points in the "ease-of-use" category. A better idea is to provide the connection in the framework's bootstrap code somehow:

final class User
{
    // ...

    public static Connection $connection;

    public function save(): void
    {
        self::$connection->execute(
            // ...
        );
    }
}

// Somewhere in the framework boot phase:
User::$connection = /* get the connection from the container */;

Because we don't want to do this setup step for every model class, and because we are likely doing similar things in the save() function of every model, and because we want each of our model classes to have a save() function anyway, every AR implementation will end up with a more generalized, reusable approach. The way to do that is to remove the specifics (e.g. the table and column names) and define a parent class that can do everything. This parent class defines a few abstract methods so the model is forced to fill in the details:

abstract class Model
{
    public static Connection $connection;

    abstract protected function tableName(): string;

    /**
     * @return array
     */
    abstract protected function dataToSave(): array;

    public function save(): void
    {
        $dataToSave = $this->dataToSave();

        $columnsAndValues = /* turn into column = ? */;
        $values = /* values for parameter binding */;

        $this->connection->execute(
            'INSERT INTO ' . $this->tableName() 
                . ' SET ' . $columnsAndValues,
            $values
        );
    }
}

// Pass the connection to all models at once:
Model::$connection = /* get the connection from the container */;

We should award ourselves several simplicity points in the area of reusability! The AR model class is now portable and useful in different contexts. We can use the same simple solution again and again. However, we also get a lot of negative points. Because we are introducing a parent class, and each model class has to provide a number of abstract methods, so the number of elements (functions, classes, etc.) as well as the number of lines of code (LoC) increase dramatically.

A certain risk of this Model class is that it's going to have a lot of additional behavior that is not needed by all of the model classes that extend from Model. The API of each model becomes very big, containing methods like find(), delete(), and for creating or loading related model objects in a dynamic way.

In fact, instead of implementing AR ourselves, it's more likely that we'll be importing a library that solves all of our current and future needs. To be honest, we already had (at least) one dependency for the DB's Connection class, but now we add another one, which itself includes many more, making our solution drop many points on the simplicity scale.

Data mapper

Let's consider the data mapper pattern now. You recognize this pattern by a model object that is instantiated and then handed over to a service that persists the object for you. The service is often called "repository", mixing in the Repository pattern (also from PoEAA, but maybe more famous in the modeling space because of the Domain-Driven Design books by Eric Evans, Vaughn Vernon, and the likes):

final class UserRepository
{
    public function __construct(
        private Connection $connection
    ) {
    }

    public function save(User $user): void
    {
        $this->connection->executeQuery(
            'INSERT INTO users SET name = ?',
            [
                // how does it get the name?
            ]
        );
    }
}

Since UserRepository is a service, we don't have to worry about "getting" the database connection, it's there. So this class has no dependency on the service locator, but of course it does have a dependency on the Connection class itself. So in terms of dependencies UserRepository::save() is simpler than User::save(). However, from the perspective of the client, it's less simple because a client can no longer call $user->save(), but has to pass User to UserRepository:

// assuming $this->userRepository is a `UserRepository`:

$user = new User('Matthias');

$this->userRepository->save($user);

This means every client that wants to save a User requires an additional dependency, so the overall number of points for dependency management may be equal for both AR and DA. However, I think we could make a strong case for putting a higher penalty on resorting to static service location, versus (constructor) dependency injection. We'll save that discussion for another article.

One thing to note is that User does not extend from a base Model class. In fact, it will never have to. It has no special abilities, or any methods at this point. We are free to make what we want of this object, which is why the Data Mapper pattern is naturally a better match for a domain model made with objects.

final class User
{
    public function __construct(
        private string $name
    ) {
    }
}

Earlier I skipped one important step in the UserRepository::save() implementation, which will get us into trouble now: how does the repository get the data out of the User object in order to use in in the SQL INSERT query? I wrote about this problem earlier in ORMless; a Memento-like pattern for object persistence, but let's repeat our options here:

We could add getters to the User for each property that needs to be persisted. This would widen the API way too much, exposing all the internal state to any client of User.
We could use reflection to extract the data from the User's private properties. This is what ORMs implementing DM will do, but it requires dynamic programming, leaving most of the "mapping" logic implicit, not type-safe, while fully breaking the object's encapsulation, allowing the model to keep absolutely nothing to itself.
We could add a single method to User that exposes all of its persistable data at once, e.g. asDatabaseRecord(): array.

I think the last option makes the most sense, at least in our current example. We'll add one method to User:

final class User
{
    public function __construct(
        private string $name
    ) {
    }

    /**
     * @return array
     */
    public function asDatabaseRecord(): array
    {
        return [
            'name' => $this->name
        ];
    }
}

The repository uses this to build up the SQL query:

final class UserRepository
{
    public function __construct(
        private Connection $connection
    ) {
    }

    public function save(User $user): void
    {
        $data = $user->asDatabaseRecord();

        $columnsAndValues = /* turn into column = ? */;
        $values = /* values for parameter binding */;

        $this->connection->executeQuery(
            'INSERT INTO users SET ' . $columnsAndValues,
            $values
        );
    }
}

When you approaching the model object with the DM pattern like this, there is no immediate need to extract common functionality into a package, or to introduce a third-party package to the project, besides the one that contains the Connection class. If you still want to do that, the solution will become less simple again, loosing some simplicity points. Compared to AR, introducing a DM-support package doesn't change the API of the model class itself by adding a lot of methods (or code in general) to it that aren't needed, but it will certainly introduce what's known as accidental complexity. This is complexity you didn't want to deal with, but that you inherit because this DM-support package (ORM) has to solve any potential persistence-related problem, not just your problems.

Conclusion

In this article I've tried to analyze the "simplicity" of two competing patterns for object persistence: Active Record and Data Mapper. With regard to dependency management and ease of use, both had some positive and negative points, resulting in similar scores. However, AR introduces many more code elements than needed, mostly because it relies on inheritance to give the model the powers it needs to persist itself. When using DM, the model objects don't inherit anything. They are plain old objects. A complicating issue for DM is how to get the data out of the object, which is not a problem for AR. You can solve this with a simple state-exposing method as we've seen, but many projects may introduce an additional DM-support package, which complicates the solution a lot. In the end, importing an additonal ORM package for your persistence needs is what complicates both AR and DM solutions.

What's a simple solution?

2022-08-02T14:00:00+02:00

"As I'm becoming a more experienced programmer, I tend to prefer simple solutions." Or something similar. As is the case with many programming-related quotes, this is somewhat of a blanket statement because who doesn't prefer simple solutions? To make it a powerful statement again, you'd have to explain what a simple solution is, and how you distinguish it from not-so-simple solutions. So the million-dollar question is "What is a simple solution?", and I'll answer it now.

Just kidding. But I do have some ideas about this.

There are several aspects of simplicity we'd have to consider. For instance, we could consider simple solutions in programming to mean several things:

Easy to use (e.g. a single method call, a single dependency, a single argument, etc.)
Easy to understand (e.g. the logic can be grasped by looking at a single function definition)
Requires the least amount of code (e.g. 5 lines of code instead of 100)
Requires the least amount of classes, functions, etc. (e.g. 1 class instead of 10)
Independent (e.g. no additional packages need to be installed)
Easy to change (e.g. maintained by ourselves, not dependended upon too heavily)
Allowing reuse of the solution in a somewhat different context (e.g. portable, not necessarily generic)

These properties aren't exclusive. What we consider simple solutions often expose several of them at the same time. There may also be others that I've missed here (of course). Still, this list could help us judge the simplicity of a given solution. We might even be able to quantify this simplicity. For instance, extending from a parent class would subtract 5 points from the simplicity score, because it imports quite a bit of code, adds a class to your solution, makes it less easy to understand because you have to jump to the parent class and figure out how it merges its behavior with the subclass, and on top of that requires an additional library to be installed. Considering all the downsides, the penalty might be a bit higher ;)

In the next few articles I'll cover several areas of programming where there's a debate about the best way to do something. E.g. for a model, should we use active record, or the data mapper pattern? For dependency resolution, should we use a container, or a locator? And so on. We'll look at each of the solutions from the perspective of simplicity, and calculate some kind of score for the competing solutions. If I'm not mistaken, you should then be able to calculate your level of programming experience based on those simplicity scores ;)

My book-writing workflow

2022-07-29T13:00:00+02:00

By request: what's my workflow for writing books? Steps, tools, etc.

Writing with the Leanpub platform

A long time ago I noticed that testing-advocate Chris Hartjes published his books on Leanpub. When I had the idea of writing a book about the Symfony framework, I tried this platform and it was a good match. You can write your book in Markdown, commit the manuscript to a GitHub repository, and push the button to publish a new version. Leanpub will generate EPUB and PDF versions for you. Readers can be updated about releases via email or a notification on the website.

While publishing multiple books like this, I kept running into these limitations:

Even though writing a book like this feels a lot like writing code (make a change, push, deploy), what's missing is a continuous integration step where we can verify that the book is "correct".
With the Leanpub Markdown setup you can include .md chapter files from your main Book.txt file, but you can't include files from those .md files. This makes it hard to create a nested folder structure for chapters, sections, and code samples.
If you include source code in a book, it has to be formatted to be readable and to fit on a page, and parts have to be skipped or abbreviated. Still, you want the code to be correct. For this you want to run tests on the code, but you can't run tests on incomplete or abbreviated code samples. I no longer wanted to copy/paste working code and format it manually for presentation purposes.
The code samples should be tested, but they should also be analyzed with PHPStan, and they should be upgrade-able to newer PHP versions. The go-to tool for this is Rector. I want to have these tools available in my book projects by default.
When I include the output of tools like PHPUnit in the book, I don't want to copy/paste it from the terminal either. When I make a change to the code, the output should be regenerated automatically and included in the book.

A manuscript pre-processor

To overcome all these problems I created a pre-processor tool that allows me to do things that aren't possible with Leanpub's Markdown (or Markua, their variation on Markdown), but that eventually result in a manuscript directory that Leanpub can turn into an actual book. This means that nowadays I write in "Matthias-Flavored Markua" ;) And that I have a continously-integrated book-writing process which uses PHPUnit for tests, PHPStan for static analysis, Rector for automated refactoring, and ECS for the coding standard. The pre-processor tool is hand-written in PHP. Given that it has to process Markua, it contains a Markua parser which leverages the awesome Parsica library.

Some of the things the tool can do:

You can add special comments to code samples that influence the final look when included in the manuscript, e.g. // skip-start and // skip-end to create an ellipsis (// ...) or // crop-start and // crop-end to remove lines from the beginning and end of an included file.
You can create a cover image with Gimp, and the tool will convert the .xcf file to a .png file that matches Leanpub's expectations.
You can include .md files from any other .md file. In the end, everything will be compiled into a single book.md file.
When you regenerate this manuscript, it's easy to see which parts have been changed, which allows you to spot parts that were changed by mistake, or that no longer work because of a different output from PHPUnit, etc.
You can collect all external links and generate a link registry for it, so the book will never have broken links.
You can include code from a package in vendor/ and reformat it according to the style used for the other code samples.
And so on! The project is yet another example of the amazing powers that will be unleashed if you have access to an Abstract Syntax Tree for a language, in this case Markua.

The path towards releasing a new book

For me, a new book always starts with an idea, which I immediately judge for its capability of becoming a book (as opposed to a blog post, or series of blog posts). In a sense, every idea can become a book, but it has to be interesting, I should be able to build on my own experience in the subject area, and at the same time there should be plenty to learn or explore, so the process of writing will be interesting for me as well. The result of this thinking is often a title. Strictly speaking it's a working title; the final title is often (slightly) different:

"More than a month with Symfony" became "A year with Symfony"
"Principles of Package Design" was used as-is
"Microservices for everyone" idem
"A field guide to object design" became "Style Guide for Object Design" and when republished with Manning "Object Design Style Guide"
"Rector book" became "Rector - The power of automated refactoring"
"PHP for the web", "Advanced Web Application Architecture", and "Recipes for decoupling" had these titles from the start.

With a (working) title, you can set up a new project on Leanpub, and a new Git repository on GitHub. I usually start working on a cover first, looking for abstract stock images that somehow match the book's topic. The cover image tends to change later on, which makes sense. The personality of the book only becomes apparent after writing the biggest part of it. To match the personality the image needs to be modified.

For quite some time I write quietly, and I don't announce the book in public. I want to be certain of its viability before making any kind of "promise" about it. There's the 100 page milestone, at which point I like to share the landing page, that shows the cover image and a description, and allows interested readers to sign up to be notified about the release.

When most of the chapters are written (but not yet good enough to publish), I start revising the first chapters. In the case of "Recipes for Decoupling", my approach changed radically during the writing process. Revising the first chapters to match the style of the later chapters took quite a bit of time.

When the first chapters are good enough to be published, I push the button and release the first ~80 pages to the public. My partner still thinks it's weird to release an incomplete book, but I do it for several reasons:

I hope that what I wrote can already be helpful for some readers today, and it would be a shame to hide it from them.
I hope to receive some feedback that helps me improve what has been published, but also what is still to be written.
Knowing that people are buying and reading the book gives me an incredible boost: I really want to finish it and make sure that it doesn't take too long.
Publishing early prevents me from getting stuck in "perfection mode". I want high quality, but also pragmatism (in book writing and in programming).

At this point I also add a banner to my website, so visitors may notice that a new book is available.

To stay in the "release flow" I declare that I'll publish a new chapter every two weeks (or every week sometimes). Steadily releasing new chapters is great, but I guess it only works if there is some kind of linearity to the book. So far, I haven't felt the need to make big changes to chapters that have already been released. I feel like this is taken care of by the big revision phase before the release of the first few chapters. Still, the chapters have to be somewhat modularized, self-contained, and I'm always looking a natural progression from one topic to the next, where the next one seamlessly builds on top of the previous one.

Then, after many hours, the final version is released.

Time investment

I kept track of all the hours that went into writing, revising, project management, etc. (except the hours for "A Year With Symfony" because I wasn't a freelancer back then):

Principles of Package Design: 137 (Leanpub) + 111 (Apress)
Microservices for everyone: 342
Object Design Style Guide: 127 (Leanpub) + 137 (Manning)
Advanced Web Application Architecture: 422 hours
PHP for the web: 98 hours
Rector: 180 hours + 50 hours (book tool)
Recipes for Decoupling: 194 hours + 20 hours (book tool)

This amounts to about 1 hour per page on average. That's pretty efficient, but it's not really hard to achieve this average because code samples take up a lot more space than regular paragraphs. How much time has to be spent also depends on how much research and trail and error is involved. If it's a subject I know a lot about, it won't take that much time. Comparing with the lifetime sales of these books, it amounts to a ~$50/hour income. Which is quite alright if you ask me, although development or training work pays more. Anyway, it doesn't really make sense to value a book by the calculated hourly income. Personally I like to earn a living with other work, thereby earning time to write books and do other things I like to do.

Beyond a certain number of pages/chapters I find that a book becomes almost unmanagable, i.e. too big for my brain to deal with. So keeping it below 300 pages is a good idea. For "Recipes for Decoupling" I even aimed for 150 pages, and it would've been nice to write a smaller book this time, but the topic didn't fit in such a small number of pages, so it's a ~300 page book again...

The print edition

After releasing the e-book some work has to be done to make it ready for print. Preparing a book for print-on-demand via Lulu is quite easy. Leanpub has a built-in option for generating a print-ready PDF version. There used to be some problems with this PDF. Code snippets inside regular paragraphs weren't properly line-wrapped, which would lead to text being printed too far into the page margins. Nowadays, there are no such issues. Lulu accepts the format as-is. I use the common "Crown Quarto" format, which is supported by both Lulu and Leanpub. Most of the work goes into remaking the e-book cover into something that works for a book. Lulu provides a very useful template PDF for the cover that can be imported into Gimp and the likes. This template already takes into account the number of pages and the thickness of the paper.

Before you can release the print edition, you have to buy a copy of the book for yourself, to check if everything is good. After receiving it you can activate the book's landing page on Lulu.

Publicity

I don't do much in the marketing area. Leanpub collects email addresses of interested readers that you can use only for the release notification. Of course, it would be useful if I could notify readers of previous books too, but I also think that there is a large overlap between my book audience and people following me on Twitter, or regularly reading my blog posts, so I'm pretty sure I'm able to reach my existing audience via these channels. This remains guesswork though.

I've heard from other writers that building up a mailing list works really well for them. To make it work you have to collect people's email addresses, offer them a regular email with some truly interesting things in it. As a reward you can notify thousands of people about a new book, which is likely resulting in some sales. Something about this has always felt a bit weird to me. The newsletter is not really a way to share useful information, it's a way to indirectly secure more income. Also, it uses the "spam" concept of targeting many, in the hope that a few will buy. This just isn't a good fit for me, although many have recommended this approach to me, and all books about self-publishing authors advocate it. Also, everybody does it. I hope it's clear that I'm not saying you shouldn't do it, just that I'd rather not do it. Although I know I miss out on some money by making this decision.

What does work well for me is the monthly Leanpub sale, which regularly offers my books with a 10-15% discount. Also setting up bundles of books that can be bought together with a big discount turned out to be a good idea.

Conclusion

I hope this article has given you some useful background information about my book-writing process, the workflow I use, and what tools are involved. To be honest, I don't think it should be taken as a tutorial for future tech book writers. It's just what I do and what works for me, and there are probably many more factors in successful book writing that have not been covered here. However, if you were thinking about writing a book, I can recommend taking a similar lean approach to book-writing. Just get your words out there!

Would you like to have more information? Please leave a comment.