A couple of weeks ago, Tomas Votruba emailed me saying that he just realized that we hadn't published an update of the book we wrote together since December 2021. The book I'm talking about is "Rector - The Power of Automated Refactoring". Two years have passed since we published the current version. Of course, we're all very busy, but there's no time for excuses - this is a book about keeping projects up-to-date with almost no effort... We are meant to set an example here!
In the meantime, a lot has changed. Tomas has become a very successful legacy-project-saver, with this incredibly powerful tool called Rector. The project has gained a lot of traction in the PHP development community. Tomas and the co-authors of the project keep improving the tool, making it more useful, developer-friendly, faster, and more stable every day. Recently he released version 1.0 on stage at Laracon Europe, in Amsterdam. An important moment, which indicates that we are dealing with a mature tool.
I know Tomas as a hard worker. He'll do everything to Get Things Done. He keeps simplifying things, even in the Git project that's behind the book's manuscript. Always looking for ways to prevent developer (or writer) mistakes and to automate common tasks, he ruthlessly cuts away unnecessary weight. When he reads a convoluted paragraph that works around some quirk in Rector, he fixes the issue in Rector, so the text is once more easy to understand. If people ask the same questions over and over again, he adds a helpful command to Rector's command-line interface, so the questions disappear. In other words, he has a great feedback loop. What kind of signal does the code give us? Is this too hard to work with? Can we simplify this? What kind of signal do we get from readers? Let's improve!
With a 1.0 version for Rector also comes a new version of the book about Rector. This 2024 Edition provides an even better start for your static analysis & automated refactoring journey. I can testify personally: once you start, you'll never want to go back.
Read more about the updates on Tomas' blog: Rector Book 2024 Release with Brand new Chapter
And get the new version here: https://leanpub.com/rector-the-power-of-automated-refactoring
If you already bought a previous version, you can download the latest files for free (of course!).
Twitter and Mastodon are micro-blogging platforms. The problem with micro-blogs, and with short interactions in general, is that everybody can proceed to project onto your words whatever they like. So at some point I often feel the need to explain myself with more words, in an "actual" blog like this one.
Hypothesis: the moment a team adds the requirement that each PR/commit should be related to a Jira issue, it will start accumulating even more tech debt than before.
We notice that one of the dependencies of our project has been marked as "abandoned" and we need to upgrade/switch to something else. When helping a new co-worker join the team, we find out that some crucial steps aren't documented in the README. The test framework hasn't been updated for some time, and upgrading means we have to rewrite some test setup code.
These are things that just happen to our projects from time to time. Many developers won't go ahead and make the necessary changes. Being part of a "scrum process" they will:

- Create an issue for it in Jira
- Wait for the issue to be discussed and prioritized
- Wait for the issue to be assigned to a sprint
Finally, once the issue has been assigned to a sprint, it needs to be refined. So a group of people that is often too large starts to talk about and describe what needs to be done. Why? Because scrum tradition prescribes that every person on the team should be able to pick up the issue. I don't think that's true at all; we're just wasting time explaining everything to everyone, while often only 2 people are going to pick it up.
In the end we have to vote for the number of story points that we're going to assign. This magic number has no specific meaning. If we try to describe what such a point represents, we get different answers, even within the same team.
Don't get me wrong, it's good to think about how much time something will likely take and use a rough estimate to decide if you want to start working on it. It's just that we are guessing, and we can have huge surprises while we are actually doing the work. Or the opposite happens: we are over-complicating things in the refinement stage and it turns out the actual work was so easy, we get 5 points for the price of 1...
In both cases, a big part of the preparation phase is just a waste of time and energy. I'm certain that many scrum teams could do much, much more if they would let go of wasteful practices like those that "official scrum" or similar project management techniques prescribe.
Back to the original point about technical debt. From time to time developers will notice something about the code base that really needs to be improved, something that is not part of any feature anybody is working on, just "general maintenance" or "developer experience", and so on. Developers need to do this work because they have to battle the forces pulling the project downward. If they do not continuously do that, one day the project will be beyond repair. Yet, the scrum process prescribes that no work be done in the sprint that is not on the board. So an issue has to be created, and we jump back into the project management waterfall.
At this point a developer has several options, each of which appeared several times in the comments to my bitter tweet.
I can't speak for everyone and every team, but in my experience, developers are even less likely to improve structural issues (technical debt) if they have to deal with this slow and demotivating process. They don't create those tech debt issues anymore, and the project is going to decline faster.
That's really sad, because most developers I've met are well able to do what's good for the project. They know what will help the project survive. Yet the process keeps them from doing it. So I propose just skipping the whole scrum waterfall in these cases.
Of course, not everything should go "under the radar". We need planning, discussions, exchanging ideas. We need to challenge solutions, adopt a domain-oriented collaboration style, and so on. But, there are also things we just have to do as developers, and if you ask me, we shouldn't always go through the process. These things just need to be done, sooner rather than later. So many projects have already accumulated so much technical debt that now is the time to act.
Some common objections I've heard to skipping the whole scrum process for technical debt-resolving tickets boil down to this (rephrasing here): the rest of the team loses sight of what's happening, and a developer may disappear into a rabbit hole for days. These objections make sense to me personally.
In short, I think most developers know what's right, what needs to be done to keep the project in a good shape. Just let them do it. But not alone, and not in the shadows.
From the cover of "Refactoring" by Martin Fowler:
Refactoring is a controlled technique for improving the design of an existing code base. Its essence is applying a series of small behavior-preserving transformations, each of which "too small to be worth doing". However the cumulative effect of each of these transformations is quite significant.
Although the word "refactoring" is used by programmers in many different ways (often it just means "changing" the code), in this case I'm thinking of those small behavior-preserving transformations. The essence of those transformations is:
We perform these refactorings to establish a better design, which in the end will look nothing like the current design. But we only take small, safe steps. Fowler mentions in the introduction of the book that tests are necessary to verify that a refactoring didn't break anything. Then the book goes on to show a large number of small behavior-preserving transformations. Most of those transformations are quite safe, and we can't think of a reason why they would go wrong. So they wouldn't really need tests as a safety net after all. Except, in practice, we do need tests because we run into problems, like:

1. We make a mistake in the transformation itself, producing incorrect code
2. We forget to update one of the call sites/clients of the code we changed
3. The existing tests break, because they are too close to the structure of the code
4. We accidentally change behavior along the way
I find that a static analysis tool like PHPStan will cover you when it comes to the first category of issues. For instance, PHPStan will report errors for incorrect code, or calls to methods that don't exist, etc.
The same goes for the second category, but there's a caveat. In order to make no mistakes here, we have to be aware of all the call sites/clients. This can be hard in applications where a lot of dynamic programming is used: a method is not called explicitly but dynamically, e.g. $controller->{$method}(). In that case, PHPStan won't be able to warn you if you changed the name of a method that used to be invoked in this way. It's why I don't like methods and classes being dynamically resolved: it makes refactoring harder and more dangerous. In some cases we may install a PHPStan extension or write our own, so PHPStan can dynamically resolve types that it could never derive from the code itself. But still, dynamic programming endangers your capability to safely refactor.
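To illustrate, here's a contrived, self-contained sketch (invented class and method names, not from the original post) of a call that no static analyser can connect to its target:

```php
<?php

class Controller
{
    public function indexAction(): string
    {
        return 'rendering index';
    }
}

// The method name is assembled at runtime, so a static analyser
// has no way to connect this call to Controller::indexAction().
$action = 'index';
$method = $action . 'Action';

$controller = new Controller();

// Renaming indexAction() would only fail here, at runtime:
echo $controller->{$method}(), "\n";
```

If you rename indexAction() to showAction(), this script still passes static analysis and only crashes when the line is actually executed.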
The third group of failures, where tests become the problem that keeps us from making simple refactorings, can be overcome to a large extent by writing better tests. Many unit tests that I've seen would count as tests that are too close to the current structure of the code. If you want to extract a class, or merge two classes, the unit test for the original class really gets in the way, because it still relies on that old class. We'd have to rewrite, rename, and move tests in order to keep track of the modified structure of the code. This is wasteful, and unnecessary: there are good ways to write so-called higher-level tests that aren't so susceptible to a modified code structure. When the structure changes, they don't immediately break or become useless.
From my own experiences with refactoring-without-tests, I bet the fourth category is the worst and hardest to tackle. It often happens that you don't just make that structural change, but add some semi-related change to the same commit, one that turns out to change the behavior of the code. I've found that you really need to program in (at least) a pair to reduce the number of mistakes in this category. A navigator will check constantly: are we changing behavior here as well? Are these merely the structural changes we were here for? More than once, the mistakes I made turned out to be bigger than the small behavior-preserving transformations that refactorings were intended to be.
I think as developers we've deviated a lot from the original idea behind refactoring. We've been hoping to improve things in many ways at once, trying to pay back all the technical debt. As a result, we don't limit ourselves to making only small structural transformations that would be safe to make, even without tests. And it gets us in this impossible state: we want to refactor but we need tests, but they are so hard to write for this bad code, we can never refactor, we can never test. All is lost.
If only we had some good, high-level tests, stuck to safe, structural transformations, and all used PHPStan, then nothing could go wrong.
I find that not every developer notices the "pain level" of a change. As an example, I consider it very painful if I can't rename a class, or change its namespace. One reason could be that some classes aren't auto-loaded with Composer, but are still manually loaded with require statements. Another reason could be that the framework expects the class to have a certain name, be in a certain namespace, and so on. This may be something you personally don't consider painful, since you can avert the pain by simply never renaming or moving classes.
Still, in the end, you know that a code base that resists change like this is going to be considered "a case of severe legacy code". That's because change can't be avoided in the software development world; eventually it will be time to make that change, and then you get to experience the pain that you've been postponing for so long.
Software can resist change in many ways. Just one example that comes to mind: a legacy script that calls session_start() first, before doing anything else, which makes it very hard to touch.

Change-aversion can also be socially established. As an example, the team may use a rule that says "If you create a class, you also have to create a unit test for it". Which is very bad, because you can use multiple classes in a test and still call it a unit test, so the every-class-is-a-unit assumption is plain wrong. More importantly, you can't unit-test all types of classes; some will require integrated tests. Anyway, let's not get carried away ;) My point is, if you have such a rule you'll make it harder for developers to add a new class, since they fear the additional (often pointless) work of creating a test for it. In a sense, developers start to resist change. The code base itself will resist change as well, because unit tests are often too close to the implementation, making a change in the design really hard.
Unit tests are my favorite example, but there are other socially-established practices that get in the way of change. Like, "don't change this code, because 5 years ago Leo touched it and we had to work until midnight to fix production". Or "we asked the manager for some time to work on this, but we didn't get it".
From these, and many more - indeed - painful experiences, I have come to the conclusion that a very powerful way to judge the quality of code and design is to answer the question: is this easy to change? The change can be about a function name, the location of a file, installing a Composer dependency, injecting an additional constructor dependency, and so on.
However, it's sometimes really hard to perform this evaluation yourself, since as a long-time developer you may already be used to quite some "pain". You may be jumping through hoops to make a change, and not even realize that it's silly and should be much easier. This is where pair or mob/ensemble programming can be really useful: working together on the same computer will expose all the changes that you avoid:
"Well, I'm not sure that we can, let's save this for another time."
"Now let's inject that new service as a constructor argument."
That's why I usually go all-in on ensemble programming, so we can have a clear view of all the changes that the team averts. We look the monster in the eyes.
Part of the reason for change aversion in developers is the risk that the change may break other things. If you rename a method, you should rename all the relevant method calls too. Luckily, we have static analysis these days, which will tell you about any call sites that you missed. And of course, the IDE can safely make the change for you in most cases. Unfortunately, a lot of code in this world has the following issue: if you rename a file/class/method/etc., you won't be able to find all the places that you'd have to update. For instance:
- An EmailValidator class can be used by dynamically invoking the 'email' validator. Renaming the class breaks the validation.
- A controller action has to be declared as public function [name]Action(), or it can't be invoked. Renaming the class makes an entire route or endpoint unreachable.
- An entity class has to live in src/App/Entity or it won't be added to the database schema.

The danger is, you can't see that you broke something unless you run the full application (or its tests, if you have full coverage). This is very inconvenient. It's why the general rule is "Good code is easy to change", and part of "easy to change" is that a change that wasn't fully propagated through the system will "explode" early. Besides an early warning, preferably a static error (as opposed to a runtime error), it would be great if the error message we get after breaking something is clear and helps us find the problem (the change that we just made). This is by far not always the case in real-world projects!
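To show what such convention-based wiring looks like, here's a minimal, self-contained sketch (the resolver function and its naming convention are invented for this example, not taken from any real framework):

```php
<?php

// Hypothetical framework convention: the validator class name is
// derived from a string, so nothing ties 'email' to the class
// statically.
final class EmailValidator
{
    public function isValid(string $value): bool
    {
        return str_contains($value, '@');
    }
}

function resolveValidator(string $name): object
{
    // 'email' => 'EmailValidator'
    $class = ucfirst($name) . 'Validator';

    if (!class_exists($class)) {
        throw new RuntimeException('No validator for: ' . $name);
    }

    return new $class();
}

// Works today; rename EmailValidator and this only fails at runtime:
var_dump(resolveValidator('email')->isValid('a@b.com'));
```

Renaming EmailValidator produces no static error anywhere; the breakage only surfaces when resolveValidator('email') runs.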
One way to make your project safer to change is to look for these errors. When you make a change, and you learn (too late) about some issue it causes, make sure it can never happen again. As an example, recently I was working on a Symfony console command. I wanted to use the QuestionHelper, which can be retrieved by calling $this->getHelper('question'). Since it was a dependency for one of my own services, I didn't want to get this helper on the spot; I wanted it to be properly set up as a dependency in the constructor, along with the other service dependencies. Unfortunately, the 'question' helper is not available in the constructor, and you only learn this when you run the command in the terminal. To prevent the issue from happening again I added an integration test that verifies you can run the command and it will at least do something (get past the constructor). That way, the build will let me know when I accidentally broke the command.
Adding a test is a way of preventing a change-related issue at runtime. Maybe a better option is to prevent it at analysis time, e.g. by writing a PHPStan rule that triggers an error whenever you call $this->getHelper() in the constructor of a command:
use PhpParser\Node;
use PhpParser\Node\Expr\MethodCall;
use PHPStan\Analyser\Scope;
use PHPStan\Rules\Rule;
use PHPStan\Rules\RuleErrorBuilder;
use Symfony\Component\Console\Command\Command;

/**
 * @implements Rule<MethodCall>
 */
final class GetHelperRule implements Rule
{
    public function getNodeType(): string
    {
        return MethodCall::class;
    }

    /**
     * @param MethodCall $node
     */
    public function processNode(Node $node, Scope $scope): array
    {
        // ...

        if ($node->name->name !== 'getHelper') {
            // This is not a call to getHelper()
            return [];
        }

        // ...

        if (! $scope->getFunction()->getDeclaringClass()
            ->isSubclassOf(Command::class)) {
            // This is not a command class
            return [];
        }

        if ($scope->getFunctionName() !== '__construct') {
            // The call happens outside the constructor
            return [];
        }

        return [
            RuleErrorBuilder::message('getHelper() should not be called ...')
                ->build()
        ];
    }
}
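To activate the rule, register it in your phpstan.neon configuration (the namespace App\PHPStan is an assumption here; use whatever namespace the rule lives in within your project):

```neon
# phpstan.neon
rules:
    - App\PHPStan\GetHelperRule
```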
Now whenever someone makes the mistake of calling getHelper() inside the constructor, they'll get this nice and useful error:
15/15 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓] 100%
------ -------------------------------------------------------------
Line src/PhpAstInspector/Console/InspectCommand.php
------ -------------------------------------------------------------
38 getHelper() should not be called in the constructor because
helpers have not been registered at that point
------ -------------------------------------------------------------
By the way, if you want to become a master at writing custom PHPStan rules that will make your code safer to change, get my latest book Recipes for Decoupling.
In short, keep an eye out for the following situations:

- You make a change in one place, and something breaks in a completely different place, without any static error pointing you to it.
- You only find out about the problem at runtime, when you run the application in the browser or the terminal.
- The error message you get doesn't help you trace the problem back to the change you just made.
Is DateTimeImmutable a primitive-type value too? If so, can we use this type inside DTOs?
I thought it was an interesting question to explore. I'd like to say "No" immediately, but why?
Is something deserving of a predicate? To decide, we have to define what the predicate means. Stated in an abstract way like this it makes a lot of sense, but when discussing concrete questions we often don't realize that we should talk about definitions; we like to jump to an answer immediately! So, in this case: what is a primitive-type value?
First, we could consider "primitive" to mean "can't be further divided into parts". In that sense, if a Money value object consists of an integer for the number of cents and a string declaring the currency, Money is not primitive, but the integer and the string inside it are, because it doesn't make sense to take them apart. Although you might say that something more primitive than a string is a character; it's just that PHP doesn't distinguish this type. DateTimeImmutable in this sense is not primitive, but the value that it contains (the timestamp) is.
Second, we could consider "primitive" to mean "what's not an object", or as PHP calls these values: scalars. Again, "primitive" takes into account what the programming language considers primitive, because Java for instance has strings which are considered primitive values but are nevertheless objects. Java has a weird relationship with primitive values anyway, because strings, integers, etc. look very primitive in Java code (i.e. you don't do new String("a string") but just write "a string"). With PHP there's less confusion around this concept. When "primitive" is used in this sense, DateTimeImmutable could never be considered a primitive-type value, because it's an object; in Java it could be, because there values can be considered primitive regardless of whether they are objects.
Third, we could consider "primitive" to mean "whatever types the language offers out-of-the-box". This is often equivalent to "native". Unfortunately, this isn't a very helpful definition, since what's native is unclear. Is there a "core" part of the language that defines these types? In that case, where does DateTimeImmutable belong? Isn't that part of an extension? Also, would we consider file handles (resources) primitive types? In the end, most of what is part of the "core" or "native" language is quite arbitrary.
Fourth, we could consider "primitive" to mean - regardless of the language - "what types do we need to describe data?" In that sense, we may look back in the history of humanity itself and consider numbers very primitive (e.g. for describing the value of something). Same for strings (e.g. for writing down a name). Arguably a date or a time isn't primitive, because it's built up from strings (or characters), and numbers.
Fifth, we could consider "primitive" to mean "bare values, not necessarily sensible or correct ones". So "2" is a primitive value: it doesn't say 2 of what, so we can't judge if the value is correct. "UKj" is a primitive value: it doesn't say what it describes, so there's no way to judge this value. Using this definition, a DateTimeImmutable value is certainly not a primitive value, because when you instantiate it, it processes the provided string constructor argument and throws an error if it is not a sensible one. Or, maybe worse, it converts the argument into a value that does make sense, but may no longer match the intention of the actor that produced the value.
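A quick demonstration of that last point: PHP silently normalizes an impossible date instead of rejecting it.

```php
<?php

// 2024 is a leap year, so February has 29 days; February 30 does
// not exist. DateTimeImmutable doesn't throw here - it rolls the
// value over to the next sensible date.
$date = new DateTimeImmutable('2024-02-30');

echo $date->format('Y-m-d'), "\n"; // 2024-03-01
```

The actor who produced "2024-02-30" probably made a mistake, but instead of hearing about it, we end up with a different, "corrected" date.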
For me, this final point is the most important attribute of primitive-ness, and it disqualifies DateTimeImmutable as a primitive-type value. Anyway, we already established that DateTimeImmutable can't be considered primitive according to the other definitions either.
Am I missing any possible definitions of "primitive" here? Just let me know!
A DTO is an object that holds primitive data (strings, booleans, floats, nulls, arrays of these things). It defines the schema of this data by explicitly declaring the names of the fields and their types. It can only guarantee that all the data is there, simply by relying on the strictness of the programming language: if a constructor has a required parameter of type string, you have to pass a string, or you can't even instantiate the object. However, a DTO does not provide any guarantee that the values actually make sense from a business perspective. Strings could be empty, integers could be negative, etc.
There are different flavours of the class design for DTOs:
/**
 * @object-type DTO
 *
 * Using a constructor and public readonly properties:
 */
final class AnExample
{
    public function __construct(
        public readonly string $field,
        // ...
    ) {
    }
}

/**
 * @object-type DTO
 *
 * Using a constructor with private readonly properties
 * and public getters:
 */
final class AnotherExample
{
    public function __construct(
        private readonly string $field,
        // ...
    ) {
    }

    public function field(): string
    {
        return $this->field;
    }
}
Regarding the naming of a DTO: I recommend not adding "DTO" to the name itself. If you want to make it clear what the type is, add a comment, or an invented annotation (or attribute) like @object-type. This will be very useful for developers that are not aware of these object types. It may trigger them to look up an article about what it means (this article, maybe :)).
A value object is an object that wraps one or more values or value objects. It guarantees that all the data is there, and also that the values make sense from a domain perspective. Strings will no longer be empty, numbers will be verified to be in the correct range. A value object can offer these guarantees by throwing exceptions inside the constructor, which is private, forcing the client to use one of the static, named constructors. This makes a value object easy to recognize, and clearly distinguishable from a DTO:
final class AnExample
{
    private function __construct(
        private string $value
    ) {
    }

    public static function fromValue(
        string $value
    ): self {
        /*
         * Throw an exception when the value doesn't
         * match all the expectations.
         */

        return new self($value);
    }
}
While a DTO just holds some data for you and provides a clear schema for this data, a value object also holds some data, but offers evidence that the data matches the expectations. When the value object's class is used as a parameter, property, or return type, you know that you are dealing with a correct value.
Meaning is defined by use. If we are using "DTO" and "value object" in the wrong way, their names will eventually get a different meaning. This might be how the confusion between the two terms arises in the first place.
A DTO should only be used in two places: where data enters the application, or where it leaves the application. Some examples: a DTO can represent the deserialized body of an incoming API request, or the data that is about to be serialized into an API response.
A value object is used wherever we want to verify that a value matches our expectations, and we don't want to verify it again. We also use it to accumulate behavior related to a particular value. E.g. if we have an EmailAddress value object, we know that the value has been verified to look like a valid email address, so we don't have to check it again in other places. We can also add methods to the object that extract, for instance, the username or the hostname from the email address.
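A minimal sketch of such an EmailAddress value object (the exact validation rule is an assumption here, kept deliberately simple):

```php
<?php

final class EmailAddress
{
    private function __construct(
        private string $value
    ) {
    }

    public static function fromString(string $value): self
    {
        // Reject anything that doesn't look like an email address
        if (filter_var($value, FILTER_VALIDATE_EMAIL) === false) {
            throw new InvalidArgumentException(
                'Invalid email address: ' . $value
            );
        }

        return new self($value);
    }

    public function username(): string
    {
        // Everything before the '@'
        return substr($this->value, 0, strpos($this->value, '@'));
    }

    public function hostname(): string
    {
        // Everything after the '@'
        return substr($this->value, strpos($this->value, '@') + 1);
    }
}

$email = EmailAddress::fromString('matthias@example.com');
echo $email->username(), "\n"; // matthias
echo $email->hostname(), "\n"; // example.com
```

Once an EmailAddress instance exists, every part of the application can rely on it being valid, and the related behavior (extracting the username or hostname) lives in one place.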
Value objects are often used in domain models because guarantees, or invariants, are an important part of the business. But they can be used anywhere in an application, since every part of the application will need ways to centralize some rules, provide evidence of correctness, and accumulate related behavior.
There's much more to say about value objects, but that was not the point of this article (if you want to read more, check out my book Object Design Style Guide, or Implementing Domain-Driven Design by Vaughn Vernon). The goal was to show as clearly as possible the difference between DTOs and value objects, so they will hopefully no longer be confused. Here's a summary:
A DTO:

- Holds primitive data and defines a schema for it
- Guarantees only that all the data is there, relying on the strictness of the language
- Is used where data enters or leaves the application

A value object:

- Wraps one or more values or value objects
- Guarantees that the values make sense from a domain perspective
- Is used wherever we want to establish evidence of correctness once, and accumulate related behavior
The tree of nodes is called an Abstract Syntax Tree (AST), and a successful PHPStan or Rector rule starts with selecting the right nodes from the tree and "subscribing" your rule to these nodes. A common approach is to start var_dump-ing or echo-ing nodes inside your new rule, but I've found this to be quite tedious. Which is why I've created a simple command-line tool that lets you inspect the nodes of any given PHP file.
The tool is called AST Inspector and is available on GitHub.
Install it with Composer:
composer require --dev matthiasnoback/php-ast-inspector
Then run:
vendor/bin/ast-inspect inspect [file.php]
You'll then see the tree of nodes for the given file. You can navigate through the tree by going to the next or previous node, or by jumping into the subnodes of the selected node. Navigation conveniently uses the a, s, d, w keys.
Currently the project uses the PHP-Parser library for parsing. Since PHPStan adds additional virtual nodes to the AST, it would be useful to show them in this tool as well, but that requires some additional work. Another interesting addition would be to show the types that PHPStan derives for variables in the inspected code. That will also require some more work...
For now, please give this program a try, and let me know what you think! I'm happy to add more features to it, as long as it makes the learning curve for these amazing tools less steep. And if you're looking for an in-depth exploration of writing your own PHPStan or Rector rules, check out the documentation linked above or one of my books (Recipes for Decoupling, which shows how to create PHPStan rules, and Rector - The Power of Automated Refactoring, which does the same for Rector).
How do we recognize the Active Record (AR) pattern? It's when you instantiate a model object and then call save() on it:
$user = new User('Matthias');
$user->save();
In terms of simplicity as seen from the client's perspective, this is amazing. We can't imagine anything that would be easier to use. But let's take a look behind the scenes. If we'd create our own AR implementation, the save() function looks something like this:
final class User
{
    public function __construct(
        private string $name
    ) {
    }

    public function save(): void
    {
        // get the DB connection

        $connection->execute(
            'INSERT INTO users SET name = ?',
            [
                $this->name
            ]
        );
    }
}
In order for save() to be able to do its work, we need to somehow inject the database connection object into save(), so it can run the necessary INSERT SQL statement. Two options:
One, we let save() fetch the connection:
use ServiceLocator\Database;

final class User
{
    // ...

    public function save(): void
    {
        $connection = Database::getConnection();

        $connection->execute(
            'INSERT INTO users SET name = ?',
            [
                $this->name
            ]
        );
    }
}
The practice of fetching dependencies is called service location, and it's often frowned upon, but for now this does the trick. However, the simplicity score goes down, since we have to import the service locator, and call a method on it (-2 points?).
The second option is to pass the connection somehow to the User object. The wrong approach is this:
final class User
{
    // ...

    public Connection $connection;

    public function save(): void
    {
        $this->connection->execute(
            // ...
        );
    }
}
That's because the burden of providing the Connection is now on the call site where the User is instantiated:
$user = new User();
$user->connection = /* ... get the connection */;
// ...
This would definitely cost points in the "ease-of-use" category. A better idea is to provide the connection in the framework's bootstrap code somehow:
final class User
{
    // ...

    public static Connection $connection;

    public function save(): void
    {
        self::$connection->execute(
            // ...
        );
    }
}

// Somewhere in the framework boot phase:

User::$connection = /* get the connection from the container */;
Because we don't want to do this setup step for every model class, and because we are likely doing similar things in the save() function of every model, and because we want each of our model classes to have a save() function anyway, every AR implementation will end up with a more generalized, reusable approach. The way to do that is to remove the specifics (e.g. the table and column names) and define a parent class that can do everything. This parent class defines a few abstract methods so the model is forced to fill in the details:
abstract class Model
{
    public static Connection $connection;

    abstract protected function tableName(): string;

    /**
     * @return array<string, string>
     */
    abstract protected function dataToSave(): array;

    public function save(): void
    {
        $dataToSave = $this->dataToSave();
        $columnsAndValues = /* turn into column = ? */;
        $values = /* values for parameter binding */;

        self::$connection->execute(
            'INSERT INTO ' . $this->tableName()
                . ' SET ' . $columnsAndValues,
            $values
        );
    }
}

// Pass the connection to all models at once:

Model::$connection = /* get the connection from the container */;
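To make this concrete, here's a self-contained, runnable version of the sketch above. The User subclass and the in-memory Connection stand-in are both invented for this example:

```php
<?php

// Invented stand-in for a real database connection, so the sketch
// is self-contained; it just records every executed statement.
final class Connection
{
    /** @var list<array{string, array<string>}> */
    public array $executed = [];

    public function execute(string $sql, array $values): void
    {
        $this->executed[] = [$sql, $values];
    }
}

abstract class Model
{
    public static Connection $connection;

    abstract protected function tableName(): string;

    /** @return array<string, string> */
    abstract protected function dataToSave(): array;

    public function save(): void
    {
        $dataToSave = $this->dataToSave();

        // Turn ['name' => '...'] into "name = ?"
        $columnsAndValues = implode(
            ', ',
            array_map(
                fn (string $column) => $column . ' = ?',
                array_keys($dataToSave)
            )
        );

        self::$connection->execute(
            'INSERT INTO ' . $this->tableName()
                . ' SET ' . $columnsAndValues,
            array_values($dataToSave)
        );
    }
}

final class User extends Model
{
    public function __construct(
        private string $name
    ) {
    }

    protected function tableName(): string
    {
        return 'users';
    }

    protected function dataToSave(): array
    {
        return ['name' => $this->name];
    }
}

Model::$connection = new Connection();

$user = new User('Matthias');
$user->save();

echo Model::$connection->executed[0][0], "\n";
// INSERT INTO users SET name = ?
```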
We should award ourselves several simplicity points in the area of reusability! The AR model class is now portable and useful in different contexts. We can use the same simple solution again and again. However, we also get a lot of negative points, because we are introducing a parent class and each model class has to provide a number of abstract methods, so the number of elements (functions, classes, etc.) as well as the number of lines of code (LoC) increases dramatically.
A certain risk of this Model class is that it's going to accumulate a lot of additional behavior that is not needed by all of the model classes that extend from Model. The API of each model becomes very big, containing methods like find() and delete(), and methods for creating or loading related model objects in a dynamic way.

In fact, instead of implementing AR ourselves, it's more likely that we'll be importing a library that solves all of our current and future needs. To be honest, we already had (at least) one dependency for the DB's Connection class, but now we add another one, which itself includes many more, making our solution drop many points on the simplicity scale.
Let's consider the data mapper pattern now. You recognize this pattern by a model object that is instantiated and then handed over to a service that persists the object for you. The service is often called "repository", mixing in the Repository pattern (also from PoEAA, but maybe more famous in the modeling space because of the Domain-Driven Design books by Eric Evans, Vaughn Vernon, and the likes):
final class UserRepository
{
public function __construct(
private Connection $connection
) {
}
public function save(User $user): void
{
$this->connection->executeQuery(
'INSERT INTO users SET name = ?',
[
// how does it get the name?
]
);
}
}
Since UserRepository is a service, we don't have to worry about "getting" the database connection; it's there. So this class has no dependency on the service locator, but of course it does have a dependency on the Connection class itself. So in terms of dependencies, UserRepository::save() is simpler than User::save(). However, from the perspective of the client it's less simple, because a client can no longer call $user->save(), but has to pass the User to the UserRepository:
// assuming $this->userRepository is a `UserRepository`:
$user = new User('Matthias');
$this->userRepository->save($user);
This means every client that wants to save a User requires an additional dependency, so the overall number of points for dependency management may be equal for both AR and DM. However, I think we could make a strong case for putting a higher penalty on resorting to static service location, versus (constructor) dependency injection. We'll save that discussion for another article.
One thing to note is that User does not extend from a base Model class. In fact, it will never have to. It has no special abilities, and no methods at this point. We are free to make what we want of this object, which is why the Data Mapper pattern is naturally a better match for a domain model made with objects.
final class User
{
public function __construct(
private string $name
) {
}
}
Earlier I skipped one important step in the UserRepository::save() implementation, which will get us into trouble now: how does the repository get the data out of the User object in order to use it in the SQL INSERT query? I wrote about this problem earlier in ORMless; a Memento-like pattern for object persistence, but let's repeat our options here:
- Add getters to User for each property that needs to be persisted. This would widen the API way too much, exposing all the internal state to any client of User.
- Let the mapper read User's private properties. This is what ORMs implementing DM will do, but it requires dynamic programming, leaving most of the "mapping" logic implicit, not type-safe, while fully breaking the object's encapsulation, allowing the model to keep absolutely nothing to itself.
- Add a method to User that exposes all of its persistable data at once, e.g. asDatabaseRecord(): array.
I think the last option makes the most sense, at least in our current example. We'll add one method to User:
final class User
{
public function __construct(
private string $name
) {
}
/**
* @return array<string,string>
*/
public function asDatabaseRecord(): array
{
return [
'name' => $this->name
];
}
}
The repository uses this to build up the SQL query:
final class UserRepository
{
public function __construct(
private Connection $connection
) {
}
public function save(User $user): void
{
$data = $user->asDatabaseRecord();
$columnsAndValues = /* turn into column = ? */;
$values = /* values for parameter binding */;
$this->connection->executeQuery(
'INSERT INTO users SET ' . $columnsAndValues,
$values
);
}
}
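For contrast, the option of letting the mapper read User's private properties dynamically, which is what DM-style ORMs do under the hood, could be sketched with reflection. Note that extractData() is a hypothetical helper, not part of this article's solution, and that it leaves the object nothing to keep to itself:

```php
<?php

final class User
{
    public function __construct(
        private string $name
    ) {
    }
}

// Hypothetical sketch: read an object's private properties via reflection,
// the way a DM-style ORM would. No getters needed, but nothing stays private.
function extractData(object $object): array
{
    $data = [];
    foreach ((new ReflectionObject($object))->getProperties() as $property) {
        // Required before PHP 8.1; a no-op since then
        $property->setAccessible(true);
        $data[$property->getName()] = $property->getValue($object);
    }

    return $data;
}

$data = extractData(new User('Matthias'));
// $data is now: ['name' => 'Matthias']
```
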
When you approach the model object with the DM pattern like this, there is no immediate need to extract common functionality into a package, or to introduce a third-party package to the project, besides the one that contains the Connection class. If you still want to do that, the solution will become less simple again, losing some simplicity points. Compared to AR, introducing a DM-support package doesn't change the API of the model class itself by adding a lot of methods (or code in general) that aren't needed, but it will certainly introduce what's known as accidental complexity. This is complexity you didn't want to deal with, but that you inherit because this DM-support package (ORM) has to solve any potential persistence-related problem, not just your problems.
In this article I've tried to analyze the "simplicity" of two competing patterns for object persistence: Active Record and Data Mapper. With regard to dependency management and ease of use, both had some positive and negative points, resulting in similar scores. However, AR introduces many more code elements than needed, mostly because it relies on inheritance to give the model the powers it needs to persist itself. When using DM, the model objects don't inherit anything. They are plain old objects. A complicating issue for DM is how to get the data out of the object, which is not a problem for AR. You can solve this with a simple state-exposing method as we've seen, but many projects may introduce an additional DM-support package, which complicates the solution a lot. In the end, importing an additional ORM package for your persistence needs is what complicates both AR and DM solutions.
Just kidding. But I do have some ideas about this.
There are several aspects of simplicity we'd have to consider. For instance, we could consider simple solutions in programming to mean several things:
These properties aren't exclusive. What we consider simple solutions often expose several of them at the same time. There may also be others that I've missed here (of course). Still, this list could help us judge the simplicity of a given solution. We might even be able to quantify this simplicity. For instance, extending from a parent class would subtract 5 points from the simplicity score, because it imports quite a bit of code, adds a class to your solution, makes it less easy to understand because you have to jump to the parent class and figure out how it merges its behavior with the subclass, and on top of that requires an additional library to be installed. Considering all the downsides, the penalty might be a bit higher ;)
In the next few articles I'll cover several areas of programming where there's a debate about the best way to do something. E.g. for a model, should we use active record, or the data mapper pattern? For dependency resolution, should we use a container, or a locator? And so on. We'll look at each of the solutions from the perspective of simplicity, and calculate some kind of score for the competing solutions. If I'm not mistaken, you should then be able to calculate your level of programming experience based on those simplicity scores ;)
A long time ago I noticed that testing-advocate Chris Hartjes published his books on Leanpub. When I had the idea of writing a book about the Symfony framework, I tried this platform and it was a good match. You can write your book in Markdown, commit the manuscript to a GitHub repository, and push the button to publish a new version. Leanpub will generate EPUB and PDF versions for you. Readers can be updated about releases via email or a notification on the website.
While publishing multiple books like this, I kept running into these limitations:
- You can include .md chapter files from your main Book.txt file, but you can't include files from those .md files. This makes it hard to create a nested folder structure for chapters, sections, and code samples.
To overcome all these problems I created a pre-processor tool that allows me to do things that aren't possible with Leanpub's Markdown (or Markua, their variation on Markdown), but that eventually result in a manuscript directory that Leanpub can turn into an actual book. This means that nowadays I write in "Matthias-Flavored Markua" ;) And that I have a continuously-integrated book-writing process which uses PHPUnit for tests, PHPStan for static analysis, Rector for automated refactoring, and ECS for the coding standard. The pre-processor tool is hand-written in PHP. Given that it has to process Markua, it contains a Markua parser which leverages the awesome Parsica library.
Some of the things the tool can do:
- Use // skip-start and // skip-end to create an ellipsis (// ...) or // crop-start and // crop-end to remove lines from the beginning and end of an included file.
- Convert an .xcf file to a .png file that matches Leanpub's expectations.
- Include .md files from any other .md file. In the end, everything will be compiled into a single book.md file.
- Take code samples from vendor/ and reformat them according to the style used for the other code samples.
For me, a new book always starts with an idea, which I immediately judge for its capability of becoming a book (as opposed to a blog post, or series of blog posts). In a sense, every idea can become a book, but it has to be interesting, I should be able to build on my own experience in the subject area, and at the same time there should be plenty to learn or explore, so the process of writing will be interesting for me as well. The result of this thinking is often a title. Strictly speaking it's a working title; the final title is often (slightly) different:
With a (working) title, you can set up a new project on Leanpub, and a new Git repository on GitHub. I usually start working on a cover first, looking for abstract stock images that somehow match the book's topic. The cover image tends to change later on, which makes sense. The personality of the book only becomes apparent after writing the biggest part of it. To match the personality the image needs to be modified.
For quite some time I write quietly, and I don't announce the book in public. I want to be certain of its viability before making any kind of "promise" about it. There's the 100 page milestone, at which point I like to share the landing page, that shows the cover image and a description, and allows interested readers to sign up to be notified about the release.
When most of the chapters are written (but not yet good enough to publish), I start revising the first chapters. In the case of "Recipes for Decoupling", my approach changed radically during the writing process. Revising the first chapters to match the style of the later chapters took quite a bit of time.
When the first chapters are good enough to be published, I push the button and release the first ~80 pages to the public. My partner still thinks it's weird to release an incomplete book, but I do it for several reasons:
At this point I also add a banner to my website, so visitors may notice that a new book is available.
To stay in the "release flow" I declare that I'll publish a new chapter every two weeks (or sometimes every week). Steadily releasing new chapters is great, but I guess it only works if there is some kind of linearity to the book. So far, I haven't felt the need to make big changes to chapters that have already been released. I feel like this is taken care of by the big revision phase before the release of the first few chapters. Still, the chapters have to be somewhat modularized and self-contained, and I'm always looking for a natural progression from one topic to the next, where the next one seamlessly builds on top of the previous one.
Then, after many hours, the final version is released.
I kept track of all the hours that went into writing, revising, project management, etc. (except the hours for "A Year With Symfony" because I wasn't a freelancer back then):
This amounts to about 1 hour per page on average. That's pretty efficient, but it's not really hard to achieve this average because code samples take up a lot more space than regular paragraphs. How much time has to be spent also depends on how much research and trial and error is involved. If it's a subject I know a lot about, it won't take that much time. Comparing with the lifetime sales of these books, it amounts to a ~$50/hour income. Which is quite alright if you ask me, although development or training work pays more. Anyway, it doesn't really make sense to value a book by the calculated hourly income. Personally I like to earn a living with other work, thereby earning time to write books and do other things I like to do.
Beyond a certain number of pages/chapters I find that a book becomes almost unmanageable, i.e. too big for my brain to deal with. So keeping it below 300 pages is a good idea. For "Recipes for Decoupling" I even aimed for 150 pages, and it would've been nice to write a smaller book this time, but the topic didn't fit in such a small number of pages, so it's a ~300 page book again...
After releasing the e-book some work has to be done to make it ready for print. Preparing a book for print-on-demand via Lulu is quite easy. Leanpub has a built-in option for generating a print-ready PDF version. There used to be some problems with this PDF. Code snippets inside regular paragraphs weren't properly line-wrapped, which would lead to text being printed too far into the page margins. Nowadays, there are no such issues. Lulu accepts the format as-is. I use the common "Crown Quarto" format, which is supported by both Lulu and Leanpub. Most of the work goes into remaking the e-book cover into something that works for a printed book. Lulu provides a very useful template PDF for the cover that can be imported into Gimp and the likes. This template already takes into account the number of pages and the thickness of the paper.
Before you can release the print edition, you have to buy a copy of the book for yourself, to check if everything is good. After receiving it you can activate the book's landing page on Lulu.
I don't do much in the marketing area. Leanpub collects email addresses of interested readers that you can use only for the release notification. Of course, it would be useful if I could notify readers of previous books too, but I also think that there is a large overlap between my book audience and people following me on Twitter, or regularly reading my blog posts, so I'm pretty sure I'm able to reach my existing audience via these channels. This remains guesswork though.
I've heard from other writers that building up a mailing list works really well for them. To make it work you have to collect people's email addresses and offer them a regular email with some truly interesting things in it. As a reward you can notify thousands of people about a new book, which will likely result in some sales. Something about this has always felt a bit weird to me. The newsletter is not really a way to share useful information; it's a way to indirectly secure more income. Also, it uses the "spam" concept of targeting many, in the hope that a few will buy. This just isn't a good fit for me, although many have recommended this approach to me, and all books about self-publishing advocate it. Also, everybody does it. I hope it's clear that I'm not saying you shouldn't do it, just that I'd rather not do it, although I know I miss out on some money by making this decision.
What does work well for me is the monthly Leanpub sale, which regularly offers my books with a 10-15% discount. Also setting up bundles of books that can be bought together with a big discount turned out to be a good idea.
I hope this article has given you some useful background information about my book-writing process, the workflow I use, and what tools are involved. To be honest, I don't think it should be taken as a tutorial for future tech book writers. It's just what I do and what works for me, and there are probably many more factors in successful book writing that have not been covered here. However, if you were thinking about writing a book, I can recommend taking a similar lean approach to book-writing. Just get your words out there!
Would you like to have more information? Please leave a comment.