Creating virtual pages with Sculpin

Posted by Matthias Noback

Previously we looked at how to use the static site generator Sculpin to generate in-project documentation. When Sculpin builds the HTML files and assets, it looks at the files in the source/ directory and processes them based on certain rules (e.g. "parse Markdown files", "format HTML files with Twig", etc.). The purpose of my "living documentation" project is to also dynamically generate documentation based on PHP and configuration files from the main project. For example, consider extracting a Composer dependency diagram. I don't want to manually copy the diagram and an HTML template to the source/ directory. I want Sculpin to generate an up-to-date diagram on demand and wrap it inside a template.

The Sculpin documentation mentions several ways to generate dynamic content for a static site, as opposed to the standard way of adding static Markdown or HTML files to source/:

  • You can use generators to "fabricate new virtual sources based on an existing concrete source". Generators are used by Sculpin to generate pages for tags and categories based on the actual posts, so you don't have to manually create those pages. Note that in our use case we don't have a "concrete source" yet (since our source is dynamic), so we can't use generators.
  • You can use custom types to "quickly create content types from any source based on path or other source meta data". However, custom types are about collections of source data, like posts, projects, events, etc. For our use case we have no existing sources to rely on; we just want to dynamically generate an entire page.

So, I dug into the source code of Sculpin to find out what would be a reasonable extension point. It turned out that the best way to approach this was to create a custom data source (which is currently not documented).

Creating a custom data source

To generate the site, you usually run sculpin generate --server. During the build process, Sculpin collects so-called data sources, which are implementations of Sculpin\Core\Source\DataSourceInterface:

interface DataSourceInterface
{
    public function dataSourceId();

    public function refresh(SourceSet $sourceSet);
}

It's a small interface and it's pretty easy to implement. dataSourceId() should just return some unique identifier. Sculpin uses this mainly for logging purposes.
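To make this concrete, here is a minimal sketch of a class implementing the interface. The class name DependencyDiagramDataSource, its constructor argument and the format of the identifier are my own assumptions for illustration. The interface and SourceSet are re-declared as bare stand-ins so the sketch runs without Sculpin installed; in a real project you would import Sculpin\Core\Source\DataSourceInterface and Sculpin\Core\Source\SourceSet instead.

```php
<?php
// Stand-ins for Sculpin's types so this sketch runs on its own.
// In a real project, import the Sculpin classes instead of declaring these.
interface DataSourceInterface
{
    public function dataSourceId();

    public function refresh(SourceSet $sourceSet);
}

class SourceSet
{
}

// Hypothetical data source for a generated dependency diagram.
class DependencyDiagramDataSource implements DataSourceInterface
{
    private $projectDir;

    public function __construct($projectDir)
    {
        $this->projectDir = $projectDir;
    }

    public function dataSourceId()
    {
        // Assumed format: class name plus project directory, which keeps
        // the identifier unique per configured instance.
        return 'DependencyDiagramDataSource:' . $this->projectDir;
    }

    public function refresh(SourceSet $sourceSet)
    {
        // Sources would be added to $sourceSet here.
    }
}

$dataSource = new DependencyDiagramDataSource('/path/to/project');
echo $dataSource->dataSourceId(), PHP_EOL;
```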

The main method is refresh() and it receives a source set. Any source (i.e. file) that will be a part of the generated site must be added to this SourceSet as an instance of Sculpin\Core\Source\SourceInterface. For example:

public function refresh(SourceSet $sourceSet) {
    $source = new FileSource(
        new Analyzer(),
        $this,
        new SplFileInfo(
            // source file path:
            ...,
            // relative path of the target file (from the root of the output dir):
            ...,
            // relative path of the target file including the file name:
            ...
        ),
        true, // $isRaw
        true  // $hasChanged
    );

    $sourceSet->mergeSource($source);
}
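The three path arguments in the example are related to each other, which is easier to see in code. The following sketch uses plain PHP only and a hypothetical helper, prepareGeneratedFile(), that writes generated content to a working directory and derives the two relative paths from the desired target location. The helper name, the working directory and the example target path are assumptions, not part of Sculpin.

```php
<?php
// Hypothetical helper: stores generated content in a working directory and
// returns the source path, the relative path and the relative pathname.
function prepareGeneratedFile($workDir, $targetPathname, $content)
{
    // Physical file that holds the generated content.
    $sourcePath = $workDir . '/' . basename($targetPathname);
    file_put_contents($sourcePath, $content);

    // Directory part of the target, relative to the output root.
    // dirname() returns "." for top-level files; use an empty string instead.
    $relativePath = dirname($targetPathname);
    if ($relativePath === '.') {
        $relativePath = '';
    }

    return [$sourcePath, $relativePath, $targetPathname];
}

$workDir = sys_get_temp_dir() . '/sculpin_generated_' . uniqid();
mkdir($workDir);

list($sourcePath, $relativePath, $relativePathname) = prepareGeneratedFile(
    $workDir,
    'diagrams/dependencies.html',
    '<h1>Dependencies</h1>'
);

echo $relativePath, PHP_EOL;     // diagrams
echo $relativePathname, PHP_EOL; // diagrams/dependencies.html
```

The three returned values line up with the three arguments in the refresh() example above, in the same order.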

Note that the source will be merged, meaning that if a source with the same ID (as returned by SourceInterface::sourceId()) already exists, it will be replaced by the new source. In the example above we add a FileSource, meaning that we have an actual physical file available, but you could imagine other types of sources that can be defined by implementing SourceInterface.
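The merge behaviour can be illustrated with a simplified model. InMemorySource and MiniSourceSet below are hypothetical stand-ins that only mimic the "replace by ID" rule described above; they are not Sculpin's actual classes.

```php
<?php
// Simplified stand-in for a source: just an ID and some content.
class InMemorySource
{
    private $id;
    private $content;

    public function __construct($id, $content)
    {
        $this->id = $id;
        $this->content = $content;
    }

    public function sourceId()
    {
        return $this->id;
    }

    public function content()
    {
        return $this->content;
    }
}

// Simplified stand-in for the source set: sources are keyed by their ID.
class MiniSourceSet
{
    private $sources = [];

    public function mergeSource(InMemorySource $source)
    {
        // A source with the same ID overwrites the existing one.
        $this->sources[$source->sourceId()] = $source;
    }

    public function allSources()
    {
        return $this->sources;
    }
}

$set = new MiniSourceSet();
$set->mergeSource(new InMemorySource('diagram.html', 'first version'));
$set->mergeSource(new InMemorySource('diagram.html', 'second version'));
$set->mergeSource(new InMemorySource('index.md', 'home'));

$sources = $set->allSources();
echo count($sources), PHP_EOL;                     // 2
echo $sources['diagram.html']->content(), PHP_EOL; // second version
```

This is why calling refresh() repeatedly does not pile up duplicates, as long as the source ID stays stable between calls.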

The catch is that it should be possible to call refresh() any number of times. If you run sculpin generate with the --watch flag, the refresh() method will in fact be called every second. So, whatever you do in refresh(), it should be fast and only do the work if necessary.

There are several ways to save Sculpin some work.

  1. Add a guard clause to the refresh() method, preventing it from being executed if not necessary:
public function refresh(SourceSet $sourceSet) {
    if (...) {
        // do nothing
        return;
    }

    // do the hard work
}
  2. Mark sources as "not changed". This will cause Sculpin to simply skip them instead of creating new files in the output directory of the project over and over again. Note that by default a source should be marked as "has changed", or else Sculpin will not copy the file on the first run of the sculpin generate command.
public function refresh(SourceSet $sourceSet) {
    // define the source
    $source = ...;

    if (/* the source has not changed */) {
        $source->setHasNotChanged();
    }

    $sourceSet->mergeSource($source);
}
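Both options need some way to decide whether anything has changed. One possible approach, sketched below, is to remember a checksum of the input file (for the dependency diagram this could be the project's Composer files) and compare it on every call. InputChangeTracker is a hypothetical helper of my own, not something Sculpin provides, and the checksum strategy is just one assumption about what "changed" means.

```php
<?php
// Hypothetical helper: remembers a checksum of the input file so that the
// expensive generation step only runs when the input actually changed.
class InputChangeTracker
{
    private $inputFile;
    private $lastChecksum = null;

    public function __construct($inputFile)
    {
        $this->inputFile = $inputFile;
    }

    // Returns true on the first call and whenever the file content differs
    // from the previous call. Note: calling this updates the stored checksum.
    public function hasChangedSinceLastCheck()
    {
        $checksum = md5_file($this->inputFile);

        if ($checksum === $this->lastChecksum) {
            return false;
        }

        $this->lastChecksum = $checksum;

        return true;
    }
}

// Demonstration with a temporary file standing in for the real input.
$inputFile = tempnam(sys_get_temp_dir(), 'input');
file_put_contents($inputFile, 'version 1');

$tracker = new InputChangeTracker($inputFile);

$first = $tracker->hasChangedSinceLastCheck();  // first run: changed
$second = $tracker->hasChangedSinceLastCheck(); // same content: not changed

file_put_contents($inputFile, 'version 2');
$third = $tracker->hasChangedSinceLastCheck();  // new content: changed

var_dump($first, $second, $third);

unlink($inputFile);
```

Inside refresh(), the result could drive either the guard clause (return early when nothing changed) or the call to setHasNotChanged(). Hashing a file on every call is cheap for small inputs; for large inputs, comparing modification times may be a better fit.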

Any source that you add to the SourceSet will be treated like any other file, so Markdown files will be parsed, HTML files will be formatted using Twig, etc.

Finally: how to make Sculpin aware of your new data source?

  1. If it does not exist yet, create the file app/config/sculpin_services.yml. This is a Symfony service definition file (YAML notation).
  2. Define a new service with a tag sculpin.data_source:
services:
    custom_data_source:
        class: ... # the full class name
        tags:
            - { name: sculpin.data_source }

If you want to make your data source reusable across different Sculpin projects, you should turn it into a Symfony bundle. You would then have to create your own SculpinKernel (which is simple enough) and register your bundle there.
