Refactoring the Cat API client - Part III

In the first and second part of this series we've been doing quite a bit of work to separate concerns that were once all combined into one function.

The major "players" in the field have been identified: there is an HttpClient and a Cache, used by different implementations of CatApi to form a testable, performing client of The Cat Api.

Representing data

We have been looking at behavior, and the overall structure of the code. But we didn't yet look at the data that is being passed around. Currently everything is a string, including the return value of CatApi::getRandomImage(). When calling the method on an instance we are "guaranteed" to retrieve a string. I say "guaranteed" since PHP would allow anything to be returned, an object, a resource, an array, etc. Though in the case of RealCatApi::getRandomImage() we can be sure that it is a string, because we explicitly cast the return value to a string, we can't be sure it will be useful to the caller of this function. It might be an empty string, or a string that doesn't contain a URL, like 'I am not a URL'.

class RealCatApi implements CatAPi
{
    ...

    /**
     * @return string URL of a random image
     */
    public function getRandomImage()
    {
        try {
            $responseXml = ...;
        } catch (HttpRequestFailed $exception) {
            ...
        }

        $responseElement = new \SimpleXMLElement($responseXml);

        return (string) $responseElement->data->images[0]->image->url;
    }
}

To make our code more robust and trustworthy, we can do a better job at making sure we return a proper value.

One thing we could do is verify the post-conditions of our method:

$responseElement = new \SimpleXMLElement($responseXml);

$url = (string) $responseElement->data->images[0]->image->url;

if (filter_var($url, FILTER_VALIDATE_URL) === false) {
    throw new \RuntimeException('The Cat Api did not return a valid URL');
}

return $url;

Though correct, this would be pretty bad for readability. It would become worse if there are multiple functions which all need the same kind of post-condition validations. We'd want to have a way to reuse this validation logic. In any case, the return value is still a meaningless string. It would be nice if we could preserve the knowledge that it's a URL. That way, any other part of the application that uses the return value of CatApi::getRandomImage() would be aware of the fact that it's a URL and would never mistake it for, say, an email address.

A value object representing a URL

Instead of writing post-conditions for CatApi::getRandomImage() implementations, we could write pre-conditions for image URLs. How can we make sure that a value like an image URL can't exist, except when it's valid? By turning it into an object and preventing it from being constructed using something that's not a valid URL:

class Url
{
    private $url;

    public function __construct($url)
    {
        if (!is_string($url)) {
            throw new \InvalidArgumentException('URL was expected to be a string');
        }

        if (filter_var($url, FILTER_VALIDATE_URL) === false) {
            throw new \RuntimeException('The provided URL is invalid');
        }

        $this->url = $url;
    }
}

This type of object is called a value object.

Creating it with invalid data is impossible:

new Url('I am not a valid URL');

// RuntimeException: "The provided URL is invalid"

So every instance of Url that you encounter can be trusted to represent a valid URL. We can now change the code of RealCatApi::getRandomImage() to return a Url instance:

$responseElement = new \SimpleXMLElement($responseXml);

$url = (string) $responseElement->data->images[0]->image->url;

return new Url($url);

Usually value objects have methods to create them from and convert them back to primitive types, in order to prepare them or reload them from persistence, or to reconstruct them based on different value objects. In this case we might end up with a fromString() factory method and a __toString() method. This paves the way for parallel construction methods (e.g. fromParts($scheme, $host, $path, ...)) and specialized "getters" (host(), isSecure(), etc.). Of course, you shouldn't implement these methods before you actually need them.

class Url
{
    private $url;

    private function __construct($url)
    {
        $this->url = $url;
    }

    public static function fromString($url)
    {
        if (!is_string($url)) {
            ...
        }

        ...

        return new self($url);
    }

    public function __toString()
    {
        return $this->url;
    }
}

Finally, we need to modify the contract of getRandomImage() and make sure the default image URL will also be returned as a Url object:

class RealCatApi implements CatAPi
{
    ...

    /**
     * @return Url URL of a random image
     */
    public function getRandomImage()
    {
        try {
            $responseXml = ...;
        } catch (HttpRequestFailed $exception) {
            return Url::fromString('https://cdn.my-cool-website.com/default.jpg');
        }

        $responseElement = new \SimpleXMLElement($responseXml);

        return Url::fromString((string) $responseElement->data->images[0]->image->url);
    }
}

Of course, this change should also be reflected in the Cache interface and any class implementing it, like FileCache, where you should convert from and to Url objects:

class FileCache implements Cache
{
    ...

    public function put(Url $url)
    {
        file_put_contents($this->cacheFilePath, (string) $url);
    }

    public function get()
    {
         return Url::fromString(file_get_contents($this->cacheFilePath));
    }
}

Parsing the XML response

One last part of the code I'd like to improve, is this:

$responseElement = new \SimpleXMLElement($responseXml);

$url = (string) $responseElement->data->images[0]->image->url;

I personally don't like SimpleXML, but that's not the problem here. What's really wrong is the fact that we're assuming we get a valid XML response, containing a root element, which contains one <data> element, which contains at least one <images> element, which contains one <image> element, which contains one <url> element, the value of which is supposed to be a string (the image URL). At any point in this chain, an assumption may be wrong, which might cause a PHP error.

We want to take control of this situation and instead of making PHP throw errors at us, define our own exceptions which we can catch and bypass if we want. Again, we should hide all the knowledge about the exact element names and the hierarchy of the response XML inside an object. That object should also handle any exceptional things. The first step is to introduce a simple DTO (data transfer object) which represents an "image" response from the Cat API.

class Image
{
    private $url;

    public function __construct($url)
    {
        $this->url = $url;
    }

    /**
     * @return string
     */
    public function url()
    {
        return $this->url;
    }
}

As you can see, this DTO hides the fact that there is also are <data>, <images>, <image>, etc. elements in the original response. We are only interested in the URL, and we can make it accessible as a simple getters (url()).

Now the code in getRandomImage() could become:

$responseElement = new \SimpleXMLElement($responseXml);
$image = new Image((string) $responseElement->data->images[0]->image->url);

$url = $image->url();

This doesn't help, since we're still stuck with that fragile piece XML traversal code.

Instead of creating the Image DTO inline, we'd be better off delegating it to a factory which will be given the knowledge about the expected XML hierarchy.

class ImageFromXmlResponseFactory
{
    public function fromResponse($response)
    {
        $responseElement = new \SimpleXMLElement($response);

        $url = (string) $responseElement->data->images[0]->image->url;

        return new Image($url);
    }
}

We only need to make sure to inject an instance of ImageFromXmlResponseFactory into RealCatApi and then we can reduce the code in RealCatApi::getRandomImage() to:

$image = $this->imageFactory->fromResponse($responseXml);

$url = $image->url();

return Url::fromString($url);

Moving code around like this gives us the opportunity to make something better of it. Small classes are easier to test. Like mentioned before, the tests we have in place for our classes at this point in general only test the "happy path". A lot of edge cases aren't covered by it. For example, what happens when:

The response body is empty
It contains invalid XML
It contains valid XML with an unexpected structure
...

Moving the XML processing logic to a different class allows you to focus completely on behavior surrounding that particular subject. It even allows you to use a true TDD style of programming, where you define the situations (like the ones in the previous list of edge cases) and the expected results.

Conclusion

This concludes the "Refactoring the Cat API client" series. I hope you liked it. If you have other suggestions for refactoring, leave a comment at the bottom of this page.

Javier Ferrer González

08/08/2015

Really awesome posts series. It really illustrate the SOLID principles really well with a very concise example. And the fact of seeing how the test changes because of that is another really good point :)

Congrats and thanks for sharing :)

Reply to this comment by email

Matthias Noback

11/08/2015

Thanks!

Robert Clark

20/07/2015

The cat API method provides the different function facility you are describing about it through this blog. Really, you are providing more information with programming codes. I want to learn
PHP development course from Best PHP development course training
institute in Jaipur so this information is more important for me. Nice post.

Sascha Schimke

19/07/2015

Hi Matthias, thanks for this series about refactoring. I like it very much.
But, why do you want to put the Url::fromString-stuff into Cache::get? So your cache is only able to hold urls. Seems a bit strange.

01/08/2015

I personally like to have Cache classes for very particular types of things (not just the generic Cache interfaces from Doctrine and the likes). That way, any specific conversion to and from the "raw" cache can be handled in more specific ways (not just a generic serialize() or something).

Luděk Benedík

17/07/2015

Hi Matthias, nice article. I have one question. Why you move URL validation from constructor to fromString() static method? If I don't use static method then I can put invalid URL to Url object.
Thanks

18/07/2015

There are multiple ways to do this. You can also do it like this:


private function __construct()
{
}

public static function fromString($url)
{
    // validate $url

    $url = new self();
    $url->url = $url;

    return $url;
}

Note that in the post itself, __construct() is private already, so it's impossible to construct it in other ways than using fromString().

mTorres

Little typo Mathias, when checking if valid $url:

if (!filter_var($url, FILTER_VALIDATE_URL) === false)

sure you meant:

if (filter_var($url, FILTER_VALIDATE_URL) === false)

or to be consistent with the previous if:

if (!filter_var($url, FILTER_VALIDATE_URL))

Thanks! I changed it to === false.

Xu Ding

A bit confused about DTO in this case.
I thought DTO is meant to transfer data between layers.

Not necessarily, it also helps to transform data coming from "the
world outside" (often plain text) into objects which conceptually
represent that data.

Matthias Noback - Blog About software development

Refactoring the Cat API client - Part III

Representing data

A value object representing a URL

Parsing the XML response

Conclusion

Related posts