Designing a JSON serializer

Posted by Matthias Noback

Workshop utilities

For the workshops that I organize, I often need some "utilities" that will do the job, but are as simple as possible. Examples of such utilities are:

  • Something to dispatch events with.
  • Something to serialize and deserialize objects with.
  • Something to store and load objects with.
  • Something to use with event sourcing (an event store, an event-sourced repository).

I put several of these tools in a code base called php-workshop-tools. These utilities should be small, and very easy to understand and use. They will be different from the frameworks and libraries most workshop participants usually use, but they should offer more or less the same functionality. While designing these tools, I was constantly looking for the golden mean:

  • Make the tool generic, but don't support every imaginable use case.
  • Add support for proper dependency injection, but provide static/singleton/function-based helpers.

I have already described my thoughts about usability versus proper design of utility objects in another post: The case for singleton objects, façades, and helper functions. In this post I'd like to look closer at some of the design considerations for the JSON serializer I eventually came up with.

The serializer utility class hasn't become part of the workshop tools code base. It lives in its own code base, as I thought it deserved its own project (with a few tweaks it might some day become really useful outside of my workshops). I called the serializer "naive serializer", as I believed it to be a bit dumb at first.

Use cases, requirements

In my applications I use serialization mainly to:

  1. Take some plain text structured data (e.g. JSON, XML) and transform it into some object that I can use in my application. Such an object has its own type and predefined properties, but it has no behavior. I use it to let data travel to a deeper layer in my application. In other words, such an object is a Data Transfer Object (DTO).
  2. Take some event objects and serialize them, in order to persist them in an event store or publish them to a queue.
  3. Take some view object (which is also a DTO) and serialize it as a response to some API query.

Considering only these use cases allowed me to keep the naive-serializer pretty simple. It doesn't need to have some of the features that other major serializers have, like:

  1. Using getters or setters to retrieve or update values.

    I don't perform (de)serialization directly from or to entities. That means "domain invariants" won't have to be protected while deserializing, as this will be done properly by the domain objects themselves at a later stage (as if an anemic domain model could perform this kind of protection anyway). Data Transfer Objects don't have to "protect" themselves, so we can just copy the data right into them. I do recommend using a schema validator though, to make sure the plain text data you provide has the right structure.
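    For example, a command DTO like the one below can be populated straight from incoming JSON. The SendMessage class and the JsonSerializer::deserialize() call are made up for this sketch; they show the intended usage, not necessarily the serializer's actual API.

    final class SendMessage
    {
        /**
         * @var string
         */
        public $recipient;

        /**
         * @var string
         */
        public $body;
    }

    // Hypothetical usage: copy (schema-validated) JSON data straight into the DTO
    $command = JsonSerializer::deserialize(
        SendMessage::class,
        '{"recipient": "info@example.com", "body": "Hi there!"}'
    );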

  2. Resolving subclasses using discriminator maps, etc.

    This is a pretty invasive feature that comes with a lot of trouble in terms of code complexity. I think there are valid use cases for it, but I assume that they are not within the most common ones (at least not in my career).

  3. Support for custom "handlers", for \DateTime objects, etc.

    Other serializers have this feature because, for one, PHP's built-in \DateTime[Immutable] objects can be serialized (with json_encode()):

    {"date":"2017-07-11 08:14:31.379653","timezone_type":3,"timezone":"UTC"}
    

    But this string cannot be deserialized by simply instantiating an empty instance of \DateTime[Immutable] and then repopulating the attributes.
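    A quick sketch of the asymmetry: json_encode() happily produces the structure above, but PHP offers no built-in inverse.

    $json = json_encode(
        new \DateTimeImmutable('2017-07-11 08:14:31.379653', new \DateTimeZone('UTC'))
    );
    // {"date":"2017-07-11 08:14:31.379653","timezone_type":3,"timezone":"UTC"}

    // There is no built-in way to turn this JSON back into a \DateTimeImmutable;
    // a serializer needs custom handler code for that.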

    I believe this is a bit weird, but I also believe it can easily be circumvented (and should be, in my opinion) by not using \DateTime[Immutable] as a primitive type in your domain model. I prefer to use a wrapper value object like:

    final class Timestamp
    {
        /**
         * @var string
         */
        private $timestamp;
    
        private function __construct(string $timestamp)
        {
            $this->timestamp = $timestamp;
        }
    
        public static function fromDateTimeImmutable(\DateTimeImmutable $timestamp): Timestamp
        {
            return new self($timestamp->format(\DateTime::ATOM));
        }
    
        public function asDateTimeImmutable(): \DateTimeImmutable
        {
            return \DateTimeImmutable::createFromFormat(\DateTime::ATOM, $this->timestamp);
        }
    
        public function __toString(): string
        {
            return $this->timestamp;
        }
    }
    

    This approach forces you to think of an internal value for the object that uniquely determines the value it represents, which I find beneficial to the design of the object itself.
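    As a usage sketch, round-tripping through the value object looks like this:

    $timestamp = Timestamp::fromDateTimeImmutable(
        new \DateTimeImmutable('2017-07-11T08:14:31+00:00')
    );

    // The serializer only ever sees a single string value
    echo (string)$timestamp; // 2017-07-11T08:14:31+00:00

    $dateTime = $timestamp->asDateTimeImmutable();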

    If you don't allow complicated values like \DateTime[Immutable], the (de)serialization algorithm becomes much simpler as you won't need to write or allow any custom handler code anymore.

  4. Custom configuration (e.g. annotations) to indicate the type of a property, like this:

    /**
     * @Serializer\Type("string")
     * @var string
     */
    private $foo;
    

    After dropping the support for custom handlers (see the previous point), it's now easy to limit the possible types for properties. In fact, we can limit the list of supported types to those already supported and recognized by PHP, or slightly broader, those used in @var and @return annotations recognized by PHPDocumentor. By the way, there is an accompanying library implementing type resolution for @var annotations, which turned out to be very useful for my own project: phpdocumentor/reflection-docblock.
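    For instance, resolving the declared type of a property could look something like this (a sketch based on my understanding of that library's API):

    use phpDocumentor\Reflection\DocBlockFactory;

    final class Foo
    {
        /**
         * @var string
         */
        private $foo;
    }

    $factory = DocBlockFactory::createInstance();
    $docBlock = $factory->create(
        (new \ReflectionProperty(Foo::class, 'foo'))->getDocComment()
    );

    /** @var \phpDocumentor\Reflection\DocBlock\Tags\Var_ $varTag */
    $varTag = $docBlock->getTagsByName('var')[0];

    echo (string)$varTag->getType(); // string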

    This allowed me to drop the need for extra configuration on top of existing @var annotations. In fact, I started relying on those, forcing users to add them, assuming many developers already do. Since we won't be able to serialize a value of type resource for example, this limits the list of supported property types to:

    • null
    • scalar values (string, int, float, bool)
    • user-defined classes
    • arrays where every value is of the same type (maps or lists)
    • and any combination of the above

    If PHP ever comes with support for type declarations for properties, I assume that the need for relying on @var annotations will disappear.

  5. Custom configuration (e.g. annotations) to indicate which values should be included or excluded, like this:

    /**
     * @Serializer\Exclude("ALL")
     */
    class Foo
    {
        /**
         * @Serializer\Include
         */
        private $bar;
    }
    

    In practice this kind of configuration is often used when objects are a bit confused about the roles they play (the same goes for form validation groups, by the way). Most often objects like these are either anemic domain entities, or they fulfill both command and query responsibilities. I recommend using different DTOs for write-related and read-related use cases anyway, so I didn't want to make this part of the specification. If you find yourself wanting to skip some object-internal properties, like caches, you can simply create another object, one that contains none of the tricky stuff, and serialize that.
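    For example, instead of excluding properties of some write-model object, you can introduce a dedicated read-model DTO. The Order entity and all names below are made up for this sketch:

    final class OrderDetails
    {
        /**
         * @var string
         */
        public $orderId;

        /**
         * @var float
         */
        public $totalAmount;

        // Order is a hypothetical domain entity; the DTO copies just the
        // values it needs, so there is nothing left to exclude
        public static function fromOrder(Order $order): OrderDetails
        {
            $details = new self();
            $details->orderId = (string)$order->orderId();
            $details->totalAmount = $order->totalAmount();

            return $details;
        }
    }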

  6. Custom configuration (e.g. annotations) to indicate how the name of a property should be converted to the name of a JSON object identifier, like this:

    /**
     * @Serializer\NamingStrategy("SnakeCase")
     */
    class Foo
    {
        /**
         * @Serializer\SerializedName("bazzzz")
         */
        private $bar;
    }
    

    In practice, such a feature is used for some kind of information hiding, where we don't want to expose the names of our properties. I think it's better to just rename the property, or to create a new object with the right property names. By the way, if you like "snake case", you can always name your properties in that style too. So the "naive serializer" simply uses the real property names and provides no transformation options.

Implementing the serializer

All of the reasoning that went into the project so far allowed me to define the following list of design guidelines:

  • Users shouldn't be forced to add custom configuration to their existing classes.
  • Users shouldn't need to write any supporting code.
  • The solution should take care of as few edge cases as possible.
  • The solution should be as small as possible, without becoming useless (<=100 LOC).
  • The solution should warn the user about its limitations using descriptive exceptions.

This list was pretty helpful, as it kept me focused. Whenever I had to make a decision while writing the code, I could use these guidelines to do the right thing.

I defined a class with all the cases I wanted to support:

final class SupportedCases
{
    /**
     * @var string
     */
    public $a;

    /**
     * @var int
     */
    public $b;

    /**
     * @var SupportedCases[]
     */
    public $c = [];

    /**
     * @var bool
     */
    public $d;

    /**
     * @var float
     */
    public $e;
}

Then I defined that the expected JSON output should be:

{
    "a": "a",
    "b":1,
    "c": [
        {
            "a": "a1",
            "b": 2,
            "c": [],
            "d": null,
            "e": null
        }
    ],
    "d": true,
    "e": 1.23
}

Deserializing the JSON data should result in an object equal to the one we just serialized, and this of course is a perfect starting point for a unit test (maybe more like a component test). After I implemented the "happy path", I added some checks and assertions here and there to make the serializer fail in more explicit and developer-friendly ways.
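A sketch of that round-trip test could look as follows. The JsonSerializer class name and its static serialize()/deserialize() methods are assumed entry points for this sketch, not necessarily the library's exact API:

final class JsonSerializerTest extends \PHPUnit\Framework\TestCase
{
    public function testSerializeAndDeserialize(): void
    {
        $original = new SupportedCases();
        $original->a = 'a';
        $original->b = 1;
        $original->d = true;
        $original->e = 1.23;

        $child = new SupportedCases();
        $child->a = 'a1';
        $child->b = 2;
        $original->c = [$child];

        $json = JsonSerializer::serialize($original);

        // Deserializing the JSON should give us back an equal object
        $this->assertEquals(
            $original,
            JsonSerializer::deserialize(SupportedCases::class, $json)
        );
    }
}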

Conclusion

I wanted to write about this project not because I want you to use the serializer (you probably shouldn't; I didn't do any performance optimization, for example), but because I wanted to describe how a bit of thinking can help drastically limit the scope of a project. Why are frameworks and libraries (for serialization, for forms, for persistence) so big and complicated? Because they want or need to support every imaginable use case. I find it very inspiring that the Doctrine team is currently dropping support for several features, in order to ease maintenance and keep the library more focused. Removing features like detaching and merging entities will make the code much simpler. And removing support for Yaml configuration will prevent future bugs (apparently there has been a lot of trouble with it in the past).

I also wanted to describe some of the ways in which design issues on the user's side can lead to complicated feature requirements. Once you've fixed design issues like confused object roles, anemic domain models, lack of CQRS, etc., you won't need most of the special features that existing serializers offer. You can keep the "happy path" of the code pretty clean, reducing code complexity. In effect, maintenance will be easier, and you won't have too many users complaining about things that don't work. Particularly so if you throw clear exceptions when the library is used in an inappropriate way.

Finally, I experienced again that imposing some limitations can make you more creative. I have encountered this principle in several creative/artistic contexts before. In this case it was: aim for less than 100 lines of code (LOC). Of course, code quality shouldn't be sacrificed for a low line count. Also, you should always keep questioning the limitations themselves. But aiming for a small solution helped me cut a lot of waste from the code. Only after all the tests were green and the edge cases had been covered did I allow myself some breathing space and expand the code a bit to improve readability. The result is - I think - a pretty simple, clean, yet useful serializer.


How to make Sculpin skip certain sources

Posted by Matthias Noback

Whenever I run the Sculpin generate command to generate a new version of the static website that is this blog, I notice there are a lot of useless files being copied from the project's source/ directory to the project's output/ directory. All the files in the output/ directory eventually get copied into a Docker image based on nginx (see also my blog series on Containerizing a static website with Docker). And since I'm on hotel wifi right now, I realized it was time to shave any unnecessary weight off this Docker image.

My biggest mistake was not googling for the quickest way to skip certain sources from getting copied to output/. Instead, I set out to hook into Sculpin's event system. I thought it would be a good idea to create an event subscriber and make it subscribe to the Sculpin::EVENT_BEFORE_RUN event. Event subscribers for this event will receive a so-called SourceSetEvent, allowing them to mark certain sources as "should be skipped".

Sculpin is built on many Symfony components and it turned out to be quite easy to set up a traditional event subscriber, which I called SkipSources:

final class SkipSources implements EventSubscriberInterface
{
    /**
     * @var string[]
     */
    private $patterns = [];

    public function __construct(array $patterns)
    {
        $this->patterns = $patterns;
    }

    public function skipSourcesMatchingPattern(SourceSetEvent $event): void
    {
        // see below
    }

    public static function getSubscribedEvents(): array
    {
        return [
            Sculpin::EVENT_BEFORE_RUN => ['skipSourcesMatchingPattern']
        ];
    }
}

You can create your own Symfony-style bundles for a Sculpin project, but in this case defining a simple service in sculpin_kernel.yml seemed to me like a fine option too:

# in app/config/sculpin_kernel.yml

services:
    skip_sources:
        class: SculpinTools\SkipSources
        arguments:
            # more about this below 
            - ["components/*", "_css/*", "_js/*"]
        tags:
            - { name: kernel.event_subscriber }

Due to the presence of the kernel.event_subscriber tag, Symfony will make sure to register this service for the events returned by its getSubscribedEvents() method.

Looking for a way to use glob-like patterns to filter out certain sources, I stumbled on the fnmatch() function. After that, the code for the skipSourcesMatchingPattern() method ended up being quite simple:

foreach ($event->allSources() as $source) {
    foreach ($this->patterns as $pattern) {
        if (fnmatch($pattern, $source->relativePathname())) {
            $source->setShouldBeSkipped();
        }
    }
}

It matches each source against the patterns based on the source's relative pathname, as nothing outside of the source/ directory is relevant anyway. The patterns themselves are passed in as the event subscriber's first constructor argument: simply a list of glob-like string patterns.
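To illustrate how these patterns behave: without extra flags, fnmatch()'s * wildcard also crosses directory separators, and the pattern has to match the relative pathname from its start:

var_dump(fnmatch('components/*', 'components/jquery/dist/jquery.js')); // bool(true)
var_dump(fnmatch('_css/*', '_css/main.scss'));                         // bool(true)
var_dump(fnmatch('_css/*', 'source/_css/main.scss'));                  // bool(false)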

My solution turned out to be quite an effective way to mark certain files as "should be skipped", which was my goal.

La grande finale

Just like in my previous blog post, I finally ran into another possible solution that's actually built into Sculpin: a simple ignore configuration key allowing you to ignore certain sources using glob-like patterns. It does use a rather elaborate pattern matching utility based on code from Ant. Not sure if this library and fnmatch() have "feature parity" though.
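Assuming I read its documentation correctly, the configuration would have looked something like this (the same patterns I passed to my event subscriber):

# in app/config/sculpin_kernel.yml

sculpin:
    ignore:
        - "components/*"
        - "_css/*"
        - "_js/*"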

Turns out, all my extra work wasn't required after all. A simple Google search would have sufficed!

So I removed all of this code and configuration from my project. But I still wanted to share my journey with you. And who knows, it could just be useful to have an example lying around of how to register an event subscriber and hook into Sculpin's build lifecycle...


Making a Docker image ready for use with Swarm Secrets

Posted by Matthias Noback

Here's what I wanted to do:

  • Run the official redis image as a service in a cluster set up with Docker Swarm.
  • Configure a password to be used by connecting clients. See also Redis's AUTH command. The relevant command-line option when starting the redis server would be --requirepass.

This is just a quick post, sharing what I've figured out while trying to accomplish all of this. I hope it's useful in case you're looking for a way to make a container image (an official one or your own) ready to be used with Docker Secrets.

I started out with this docker-compose.yml configuration file, which I provided as an option when running docker stack deploy:

version: '3.1'

services:
    redis:
        image: redis
        command: redis-server --requirepass $(cat /run/secrets/db_password)
        secrets:
            - db_password

secrets:
    db_password:
        file: db_password.txt

This configuration defines the db_password secret, the (plain text) contents of which should be read from the db_password.txt file on the host machine. The (encrypted) secret will be stored inside the cluster. When the redis service gets launched on any node in the cluster, Docker shares the (decrypted) secret with the container, by means of mounting it as a file (i.e. /run/secrets/db_password) inside that container.
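Once the service is running you can verify this mechanism for yourself; the following check is just an illustration, assuming the service's container runs on the local node and its name contains "redis":

# Print the decrypted secret as the container sees it
docker exec "$(docker ps -q -f name=redis)" cat /run/secrets/db_password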

The naive solution above looked simple and I thought that it might just work. However, I got this error message:

Invalid interpolation format for "command" option in service "redis": "redis-server --requirepass $(cat /run/secrets/db_password)"

Docker Compose does variable substitution on commands and thinks that $(...) is invalid syntax (it's expecting ${...}). I escaped the '$' by adding another '$' in front of it: redis-server --requirepass $$(cat /run/secrets/db_password). New errors:

Reading the configuration file, at line 2
>>> 'requirepass "$(cat" "/run/secrets/db_password)"'
Bad directive or wrong number of arguments

Bad stuff. I thought I'd just have to wrap the value in quotes: redis-server --requirepass "$(cat /run/secrets/db_password)". Now everything seemed to be fine, and the Redis service was up and running, except that the password wasn't set to the contents of the db_password secret. Instead, when I tried to connect to the Redis server, the password turned out to be literally "$(cat /run/secrets/db_password)"...

At this point I decided: let's not try to make this thing work from inside the docker-compose.yml file. Instead, let's define our own ENTRYPOINT script for a Docker image that is built on top of the existing official redis image. In this script we can simply read the contents of the db_password file and use it to build up the command.

The Dockerfile would look something like this:

FROM redis:3.2.9-alpine
COPY override_entrypoint.sh /usr/local/bin/
ENTRYPOINT ["override_entrypoint.sh"]

And the override_entrypoint.sh script mentioned in it could be something like this:

#!/usr/bin/env sh
set -eux

# Read the password from the password file (REDIS_PASSWORD_FILE should point
# at the mounted secret, e.g. /run/secrets/db_password)
PASSWORD=$(cat "${REDIS_PASSWORD_FILE}")

# Forward to the entrypoint script from the official redis image; "exec"
# makes redis-server replace the shell, so it receives signals directly
exec docker-entrypoint.sh redis-server --requirepass "${PASSWORD}"

Building the image, tagging it, pushing it, and using it in my docker-compose.yml file, I could finally make this work.

I was almost about to conclude that it would be smart not to try and fix everything in docker-compose.yml, and simply define a new image that solves my use case perfectly. However, the advantage of being able to pull in an image as-is is quite big: I don't have to rebuild my images whenever a new official image is released. This means I won't have to keep up with changes that might break my own modifications in unexpected ways. Also, by adding my own entrypoint script, I'm bypassing some of the logic in the existing entrypoint script. For example, with my new script it's impossible to run the Redis CLI.

La grande finale

Then I came across some other example of running a command, and I realized, maybe I've been using the wrong syntax for my command. After all, there already appeared to be some kind of problem with chopping up the command and escaping it in unexpected ways. So I tried the alternative, array-based syntax for commands:

command: ["redis-server", "--requirepass", "$$(cat /run/secrets/db_password)"]

No luck: the problem was again that the password was taken literally, instead of being evaluated. Then I remembered there's an option to provide a shell command as an argument (using sh -c), which gets evaluated just like a string passed to eval(). This turned out to be the final solution:

command: ["sh", "-c", "redis-server --requirepass \"$$(cat /run/secrets/db_password)\""]

I hope this saves you some time, some day.
