Background

One of the main goals for Apertis is to provide teams the tools to support their products for the long lifecycles needed in many industries, from civil infrastructure to automotive.

This document discusses some of the challenges related to long-term support and how Apertis addresses them, with particular interest in reliably reproducing builds over a long time span.

Apertis addresses that need by providing stable release channels as a platform for products, with a clear tradeoff between freshness and stability. Apertis encourages products to track these channels closely and to deploy updates on a regular basis, ensuring important fixes reach devices in a timely manner.

Stable release channels are supported for at least two years, and product teams have three quarters of overlap to rebase to the next release before the old one reaches end of life. Depending on the demand, Apertis may extend the support period for specific release channels.

However, for debugging purposes it is useful to be able to reproduce old builds as closely as possible. This document describes the approach chosen by Apertis to address this use case.

For our purposes bit-by-bit reproducibility is not a goal; the aim is to reproduce builds closely enough that one can reasonably expect no regressions to be introduced. For instance, non-essential variations such as timestamps, or items being listed in a different order where order is not significant, cause builds to not be bit-by-bit identical even though the runtime behavior is not affected.

Apertis artifacts and release channels

As described in the Release flow and product lines document, at any given time Apertis has multiple active release channels to both provide a stable foundation for product teams and also give them full visibility on the latest developments.

Each release channel has its own artifacts, the main ones being the deployable images targeting the reference hardware platforms, which get built by mixing:

  • reproducible build environments
  • build recipes
  • packages
  • external artifacts

These inputs are themselves artifacts, related in moderately complex ways:

  • build environments are built by mixing dedicated recipes and packages
  • packages are themselves built using dedicated reproducible build environments

However, the core principle for maintaining multiple concurrent release channels is that each channel should have its own set of inputs, so that changes in a channel do not impact other channels.

Even within a channel it is sometimes desirable to reproduce a past build as closely as possible, for instance to deliver a hotfix to an existing product while minimizing the chance of introducing regressions due to unrelated changes. The Apertis goal of reliable, reproducible builds does not only help developers in their day-to-day activities, it also gives them the tools to address this specific use case.

The first step is to ensure that all the inputs to the build pipeline are version-controlled, from the pipeline definition itself to the package repositories and to any external data.

To track which inputs were used during the build process, the pipeline stores a unique identifier for each of them: all the git commit hashes, Docker image hashes, and package versions are saved in the output metadata.

While the pipeline defaults to the latest version available in a specific channel for each input, specific versions can be pinned to closely reproduce a past build, using the identifiers saved in its metadata.

Reproducible build environments

A key challenge in the long-term maintenance of a complex project is the ability to reproduce its build environment in a consistent way. Failing to do so means that undetected differences across build environments may introduce hard-to-debug issues, or that builds may fail entirely depending on where and when they get triggered.

In some cases, losing access to the build environment effectively means that a project can't be maintained anymore, as no new build can be made.

To avoid these issues as much as possible, Apertis makes heavy use of isolated containers based on Docker images.

All the Apertis build pipelines run in containers with minimal access to external resources to keep the impact of the environment as low as possible.

For the most critical components, even the container images themselves are created using Apertis resources, minimizing the reliance on any external service and artifact.

For instance, the apertis-v2020-image-builder container image provides the reproducible environment to run the pipelines building the reference image artifacts for the v2020 release, and the apertis-v2020-package-source-builder container image is used to convert the source code stored on GitLab in a format suitable for building on OBS.

Each version of each image is identified by a hash, and possibly by some tags: for example, the latest tag points to the image used by default for new builds. However, any older image can be retrieved by specifying its actual hash, providing the ability to reliably reproduce arbitrarily old build environments.

By default the Docker registry where images are published keeps all past versions, so every build environment can be reproduced exactly.
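As a sketch, using the registry layout shown later in this document for the v2021dev1 image, a build environment can be pulled either by tag or by its exact digest; the digest below is a placeholder, to be replaced with the hash recorded in the build metadata:

```shell
# Pull the image currently tagged as "latest" for the channel.
docker pull registry.gitlab.apertis.org/infrastructure/apertis-docker-images/v2020-image-builder:latest

# Pull the exact image used by a past build, addressed by digest
# (placeholder digest, taken from the build metadata in practice).
docker pull registry.gitlab.apertis.org/infrastructure/apertis-docker-images/v2020-image-builder@sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
```

Pulling by digest bypasses tags entirely, so the result is unaffected by later pushes to the same tag.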

Unfortunately this comes with a significant storage cost, so each team needs to evaluate the tradeoff that best fits its goals. The spectrum goes from keeping all Docker images around for the whole lifespan of the product to more aggressive pruning policies that delete old images, on the assumption that changes in the build environment have a limited effect on the build, and that using an image version close to, but not exactly, the original one gives acceptable results.

To make build environments even more reproducible, care can be taken to make their own build process as reproducible as possible. The same concerns affecting the main build recipes apply to the recipes for the Docker images: storing pipelines in git, relying only on snapshotted package archives, and taking extra care with third-party downloads. The following sections address those concerns for both the build environments and the main build process.

Build recipes

The process to build the reference images is described by textual, YAML-based Debos recipes stored in a git repository, with a different branch for each release channel.

The textual, YAML-based GitLab-CI pipeline definitions then control how the recipes are invoked and combined.

Relying on git for the definition of the build pipelines makes preserving old versions and tracking changes over time trivial.

Rebuilding the v2020 artifacts locally is then a matter of checking out the recipes in the apertis/v2020 branch and launching debos from a container based on the apertis-v2020-image-builder container image.
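A hedged sketch of that local rebuild, assuming the image recipes live in the apertis-image-recipes repository referenced later in this document; the recipe file name passed to debos is illustrative:

```shell
# Check out the v2020 branch of the image recipes.
git clone -b apertis/v2020 https://gitlab.apertis.org/infrastructure/apertis-image-recipes.git
cd apertis-image-recipes

# Run debos inside the matching build environment container.
# The recipe file name below is an assumption for the example;
# depending on the host, debos may also need access to /dev/kvm.
docker run --rm -it -v "$PWD":/recipes -w /recipes \
    registry.gitlab.apertis.org/infrastructure/apertis-docker-images/v2020-image-builder:latest \
    debos apertis-ospack-minimal.yaml
```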

By forking the repository on GitLab the whole build pipeline can be reproduced easily with any desired customization under the control of the developer.

Packages and repositories

The large majority of the software components shipped in Apertis are packaged using the Debian packaging format, with the source code stored on GitLab; OBS uses it to generate pre-built binaries, which are published in an APT-compatible repository.

Separate git branches and OBS projects are used to track packages and versions across different parallel releases, see the Release flow and product lines document for more details.

For instance, for the v2020 stable release:

  • the apertis/v2020 git branch tracks the source revisions to be landed in the main OBS project
  • the apertis:v2020:{target,development,sdk} projects build the stable packages
  • the deb https://repositories.apertis.org/apertis/ v2020 target development sdk entry points apt to the published packages

Most of the time the stable channel is frozen, and updates are exclusively delivered through the dedicated channels described below.

Updates are split between small security fixes with a low chance of regressions and updates that also address important but non-security-related issues, which usually benefit from more testing.

For security updates:

  • the git branch is apertis/v2020-security
  • the OBS projects are apertis:v2020:security:{target,development,sdk}
  • deb https://repositories.apertis.org/apertis/ v2020-security target development sdk is the APT repository

Similarly, for the general updates:

  • the git branch is apertis/v2020-updates
  • the OBS projects are apertis:v2020:updates:{target,development,sdk}
  • deb https://repositories.apertis.org/apertis/ v2020-updates target development sdk is the APT repository
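Put together, a production build tracking the stable channel plus security fixes would use an APT configuration along these lines (a sketch; the file name is illustrative):

```
# /etc/apt/sources.list.d/apertis.list (illustrative)
deb https://repositories.apertis.org/apertis/ v2020 target development sdk
deb https://repositories.apertis.org/apertis/ v2020-security target development sdk

# Development builds may additionally preview pending updates:
# deb https://repositories.apertis.org/apertis/ v2020-updates target development sdk
```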

On a quarterly basis the stable channel gets unfrozen and all the updates get rolled into it, while the security and updates channels get emptied.

This approach provides downstreams and product teams with a stable basis to build their products on, without hard-to-control changes. Products are recommended to also track the security channel for timely fixes, enabling product teams to easily identify and review the changes shipped through it.

The updates channel is not directly meant for production, but it offers product teams a preview of the pending changes, letting them proactively detect issues before they reach the stable channel and thus their products.

While the stability of the release channels is suitable for most use cases, sometimes it is desirable to reproduce an old build as closely as possible, ignoring any update regardless of its importance.

To accomplish that goal the package archives are snapshotted regularly, storing their full history. The image build pipeline accepts an optional parameter to use a specific snapshot rather than the latest contents. This results in the execution installing exactly the same packages and versions as the original run, regardless of any changes that landed in the archive in the meantime.

To use a snapshot it is sufficient to change the APT mirror address, for instance going from https://repositories.apertis.org/apertis/ to https://repositories.apertis.org/apertis/20200305T132100Z and similarly for product-specific repositories.
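For instance, pinning a build to the snapshot mentioned above would change the APT configuration roughly like this (a sketch):

```
# Latest contents of the channel:
deb https://repositories.apertis.org/apertis/ v2020 target development sdk

# Same channel, pinned to the 20200305T132100Z snapshot:
deb https://repositories.apertis.org/apertis/20200305T132100Z v2020 target development sdk
```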

Every time an update is published from OBS a snapshot is created, tracking the full history of each archive. More advanced use-cases can be addressed using the optional Aptly HTTP API.

External artifacts

While the packaging pipeline effectively forbids any reliance on external artifacts, the other pipelines in some cases include components not covered by the previously mentioned systems for tracking per-release resources.

For instance, the recipes for the HMI-enabled images include a set of example media files retrieved from a multimedia-demo.tar.gz file hosted on an Apertis webserver.

Another example is given by the apertis-image-builder recipe checking out Debos directly from the master branch on GitHub.

In both cases, any change to the external resources directly impacts all the release channels when building the affected artifacts.

A minimal solution for multimedia-demo.tar.gz would be to put a version in its URL, so that recipes can be updated to download new versions without affecting older recipes. Even better, its contents could be put in a version tracking tool, for instance using the Git-LFS support available on GitLab.
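A minimal sketch of the versioned-URL approach; the host and layout below are hypothetical, not the actual Apertis webserver paths:

```shell
# Hypothetical versioned layout for the sample media bundle:
# each release gets a new version component, old URLs keep working.
base=https://example.apertis.org/multimedia-demo
version=1.0
url="${base}/${version}/multimedia-demo.tar.gz"
echo "$url"
```

Recipes for older channels then keep downloading the version they were written against, while newer recipes can point at newer bundles.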

In the Debos case it would be sufficient to encode in the recipe a specific revision to be checked out. A more robust solution would be to use the packaged version shipped in the Apertis repositories.

Main artifacts and metadata

The purpose of the previously described software items is to generate a set of artifacts, such as those described in the v2019 release artifacts document. Along with the artifacts themselves, a few metadata entries are generated to help track what was used during the build.

In particular, the pkglist files capture the full list of packages installed on each artifact along with their versions. The filelist files instead provide basic information about the actual files in each artifact.

With the information contained in the pkglist files it is possible to find the exact binary package version installed and from there find the corresponding commit for the sources stored on GitLab by looking at the matching git tag.
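As an illustration, assuming a simple `package version` line format for the pkglist files and an `apertis/<version>` tag naming scheme (both are assumptions for the example, not the documented formats):

```shell
# Create a sample pkglist with an assumed "package version" format.
cat > pkglist.sample <<'EOF'
dash 0.5.10.2-5
glib2.0 2.58.3-2co1
EOF

# Look up the version of a given package as installed on the artifact.
version=$(awk '$1 == "glib2.0" { print $2 }' pkglist.sample)

# Under the assumed tag naming scheme, the matching source tag would be:
echo "apertis/${version}"
```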

Other files capture further pieces of information that can be useful to reproduce builds. For instance the build-url file points to the full log of the build, where the recipe commit hash and the Docker image hash can be identified; the Implementation plan section describes a few improvements to make that information easier to retrieve and use.

Package builds

Package builds happen on OBS, which does not have snapshotting capabilities and always builds every package in a clean, isolated environment using the latest package versions for each channel.

Since the purposes taken into account in this document do not involve large-scale package rebuilds, it is recommended to use the SDK images and the devroots in combination with the snapshotted APT archives to rebuild packages in an environment closely matching a past build.

Recommendations for product teams

Builds for production should:

  1. pick a specific stable channel (for instance, v2020)
  2. version control the build pipelines using branches specific to a stable channel
  3. in the build pipeline, use the latest Docker image for that specific channel, for instance v2020-image-builder or a product-specific downstream image based on that
  4. use the main OBS projects for the release channel, for instance apertis:v2020:target, with the security fixes from apertis:v2020:security:target layered on top
  5. store the product-specific packages in OBS projects targeting a specific release channel, layered on top of the projects mentioned in the previous point
  6. use the matching APT archives during the image build process
  7. deploy fixes from the stable channels as often as possible

Development builds are encouraged to also use the contents from the non-security updates (for instance, apertis:v2020:updates:target) to get a preview of the non-time-critical updates that will be folded into the main archive on a quarterly basis.

The assumption is that products will use custom build pipelines tailored to the specific hardware and software needs of the product. However, product teams are strongly encouraged to reuse as much as possible from the reference Apertis build pipelines using the GitLab CI and Debos include mechanisms, and to follow the same best-practices about metadata tracking and build reproducibility described in this document.

Implementation plan

Snapshot the package archive

To ensure that builds can be reproduced, it is fundamental to make the same contents available from the package archive.

The most common approach, also employed in Debian upstream, is to take snapshots of the archive contents so that subsequent builds can point to the snapshotted version and retrieve the exact package versions originally used.

To provide the needed server-side support, the archive manager needs to be switched to the aptly archive manager, as it provides explicit support for snapshots. The build recipes then need to be updated to capture the current snapshot version, and to be able to optionally specify one when initiating the build.
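With aptly, the snapshot-and-publish flow could look roughly like this; the repository and snapshot names are illustrative, and the exact invocations depend on how the archive ends up being managed:

```shell
# Snapshot the current contents of the v2020 repository
# (repository and snapshot names are illustrative).
aptly snapshot create v2020-20200305T132100Z from repo apertis-v2020

# Publish the snapshot under a timestamped prefix so that old
# builds can keep pointing at it indefinitely.
aptly publish snapshot v2020-20200305T132100Z 20200305T132100Z
```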

Due to the way APT works, the increase in storage costs for the snapshot is small, as the duplication is limited to the index files, while the package contents are deduplicated.

Point to the recipe commit hash

The current metadata does not capture the exact commit hash of the recipe used for the build. This is an extremely important piece of information for reproducing the build, and it can be captured easily.

Capture the Docker image hash

For full reproducibility it is recommended to use the exact image originally used, but to be able to do so the image hash needs to be stored in the metadata for the build.

Version control external artifacts

External artifacts like the sample multimedia files need to be versioned just like all the other components. Using Git-LFS and git tags would give fine control to the build recipe over what gets downloaded.

The package name and package version as captured in the pkglist files are sufficient to identify the exact sources used to generate the packages installed on each artifact, as they can be used to identify an exact commit.

However, the process can be further automated by providing explicit hyperlinks to the tagged revision on GitLab.

How to reproduce a release build and customize a package

Identify the recipe and build environment

  1. Open the folder containing the build artifacts, for instance https://images.apertis.org/release/v2021dev1/v2021dev1.0/
  2. Find the recipe-revision.txt metadata, for instance https://images.apertis.org/release/v2021dev1/v2021dev1.0/meta/recipe-revision.txt
  3. The recipe-revision.txt metadata file points to a specific commit in a specific git repository, for instance https://gitlab.apertis.org/infrastructure/apertis-image-recipes/commit/cf6bfb79ea3163465c529868bf333f83d40d2b1a
  4. The apt-snapshot.txt metadata file indicates the snapshot of the APT package archive used for the build
  5. The docker-image.txt reports the Docker image name and hash used for the build, for instance registry.gitlab.apertis.org/infrastructure/apertis-docker-images/v2021dev1-image-builder:cf381b5e78f2

Once all the input metadata are known, the build can be reproduced.

Reproduce the build

  1. On GitLab, fork the previously identified recipe repository
  2. In the forked repository on GitLab, create a new branch pointing to the commit identified in the steps above (for instance, cf6bfb79ea3163465c529868bf333f83d40d2b1a)
  3. Execute a CI pipeline on the newly created branch, specifying parameters for the exact Docker image revision and the APT snapshot identifier
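Step 3 could be done from the command line with the GitLab pipeline trigger API; the variable names below (APT_SNAPSHOT, DOCKER_IMAGE) are hypothetical and need to match whatever the recipe pipeline actually expects:

```shell
# Trigger a pipeline on the forked repository, pinning the inputs.
# The token, project id, branch name, and variable names are placeholders.
curl --request POST \
  --form "token=$TRIGGER_TOKEN" \
  --form "ref=my-reproduction-branch" \
  --form "variables[APT_SNAPSHOT]=20200305T132100Z" \
  --form "variables[DOCKER_IMAGE]=registry.gitlab.apertis.org/infrastructure/apertis-docker-images/v2021dev1-image-builder:cf381b5e78f2" \
  "https://gitlab.apertis.org/api/v4/projects/<project-id>/trigger/pipeline"
```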

When the pipeline completes, the produced artifacts should closely match the original ones, albeit not being bit-by-bit identical.

Customizing the build

On the newly created branch in the forked recipe repository, changes can be committed just like on the main repository.

For instance, to install a custom package:

  1. Check out the forked repository
  2. Edit the relevant ospack recipe to install the custom package, either by adding a custom APT archive in the /etc/apt/sources.list.d folder if available, or retrieving and installing it with wget and dpkg (small packages can even be committed as part of the repository to run quick experiments during development)
  3. Commit the results and push the branch
  4. Execute the pipeline as described in the previous section
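For the second step, a debos ospack recipe fragment retrieving and installing a custom package might look like this; the package URL is a placeholder and the action list is abridged:

```yaml
# Fragment of an ospack recipe (abridged); the URL is a placeholder.
actions:
  - action: run
    description: Install a custom package for a quick experiment
    chroot: true
    command: wget https://example.com/my-custom-package_1.0_arm64.deb && apt install -y ./my-custom-package_1.0_arm64.deb
```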
