Build infrastructure on Intel x86-64

Introduction

The current Apertis infrastructure is largely made of hosts based on the Intel x86-64 architecture, often using virtualized machines.

The only exceptions are:

  • OBS workers used to build packages natively for the ARM 32 bit and ARM 64 bit architectures
  • LAVA workers, which match the reference hardware platforms

While LAVA workers are by nature meant to be hosted separately from the rest of the infrastructure and are handled via geographically distributed LAVA dispatchers, the constraint on the OBS workers is problematic for adopters that want to host a downstream Apertis infrastructure.

Why host the whole build infrastructure on Intel x86-64

Being able to host the build infrastructure solely on Intel x86 64-bit (usually referred to as x86-64 or amd64) machines enables downstream Apertis to be hosted on standard public or private cloud solutions, as these usually only offer x86-64 machines.

Deploying the OBS workers on cloud providers would also allow for implementing elastic workload handling.

Elastic scaling, and the desire to ensure that the cloud approach is tested and viable for downstreams, mean that deploying the approach described in this document is of interest for the main Apertis infrastructure, not just for downstreams.

Some cloud providers, such as Amazon Web Services, have recently started offering ARM 64-bit servers as well, so it should always be possible to adopt a hybrid approach mixing foreign builds on x86-64 and native ones on ARM machines.

In particular, Apertis is currently committed to maintaining native workers for all the supported architectures, aiming for a hybrid setup where foreign packages get built on a mix of native machines and non-native Intel x86 64-bit machines.

Downstreams will be able to opt for fully native, hybrid or Intel-only OBS worker setups.

Why OBS workers need a native environment

Development environments for embedded devices often rely on cross-compilation to build software targeting a foreign architecture from x86-64 build hosts.

However, pure cross-compilation prevents running the unit tests that are shipped with the projects being built, since the binaries produced do not match the current machine.

In addition, supporting cross-compilation across all the projects that compose a Linux distribution involves a considerable effort since not all build systems support cross-compilation, and where it is supported some features may still be incompatible with it.

From the point of view of upstream projects, cross-compilation is in general a less tested path, which often leads cross-building distributors to ship a considerable amount of patches adding fixes and workarounds.

For this reason, all major package-based distributions such as Fedora, Ubuntu, SUSE and in particular Debian, the upstream distribution from which Apertis sources most of its packages, choose to officially support only native compilation for their packages.

The Debian infrastructure thus hosts machines with different CPU architectures, since build workers must run hardware that matches the architecture of the binary package being built.

Apertis inherits this requirement, and currently has build workers with Intel 64-bit, ARM 32-bit and ARM 64-bit CPUs.

CPU emulation

Using the right CPU is fortunately not the only way to execute programs for non-Intel architectures: the QEMU project provides the ability to emulate a multitude of platforms on an x86-64 machine.

QEMU offers two main modes:

  • system mode: emulates a full machine, including the CPU and a set of attached hardware devices;
  • user mode: translates CPU instructions on a running Linux system, running foreign binaries as if they were native.

The system mode is useful when running entire operating systems, but it has a severe performance impact.

The user mode has a much lighter impact on performance as it only deals with translating the CPU instructions in a Linux executable, for instance running an ARMv7 ELF binary on top of the x86-64 kernel running on an x86-64 host.

Using emulation to target foreign architectures from x86-64

The build process on the OBS workers already involves setting up a chroot where the actual compilation happens. By combining this chroot with the static variant of the QEMU user mode emulator, software targeting a foreign architecture can be built on an x86-64 host as if it were a native build.
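As a rough sketch, the chroot injection works like the following sequence of commands; the paths, suite name and mirror are illustrative assumptions, since a real OBS worker drives this through its own build scripts:

```shell
# Provisioning sketch (requires root, debootstrap and qemu-user-static;
# the suite name "bullseye" and the mirror URL are illustrative only).
CHROOT=/var/tmp/armhf-chroot

# Create a minimal ARM 32-bit chroot on the x86-64 host; --foreign stops
# before running any ARM binaries, which the host cannot execute yet.
debootstrap --arch=armhf --foreign bullseye "$CHROOT" http://deb.debian.org/debian

# Copy the static emulator into the chroot so ARM binaries inside it can
# be run through binfmt_misc without any external dependencies.
cp /usr/bin/qemu-arm-static "$CHROOT/usr/bin/"

# Finish the bootstrap from inside the chroot: every armhf binary is now
# transparently executed through qemu-arm-static.
chroot "$CHROOT" /debootstrap/debootstrap --second-stage
```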

The binfmt_misc subsystem in the kernel makes the emulation transparent: whenever a foreign binary is executed, the kernel automatically runs it through the registered emulator. Packages can then be built for foreign architectures without any changes.
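For illustration, a binfmt_misc registration for 32-bit ARM ELF binaries looks roughly like the fragment below. The magic/mask pair matches the ELF header bytes (32-bit, little-endian, machine EM_ARM); production systems normally use the registration shipped by their distribution's qemu-user-static package rather than writing it by hand:

```shell
# Register qemu-arm-static as the interpreter for 32-bit ARM ELF binaries
# (requires root). The trailing "F" flag makes the kernel open the
# interpreter at registration time, so it also works inside chroots.
echo ':qemu-arm:M::\x7f\x45\x4c\x46\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x28\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/usr/bin/qemu-arm-static:F' \
    > /proc/sys/fs/binfmt_misc/register
```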

The emulation-based compilation is also known as Type 4 cross-build in the OBS documentation.

The following diagram shows how the OBS backend can distribute build jobs to its workers.

Each CPU instruction set is marked by the codename used by OBS:

  • x86_64: the Intel x86 64 bit ISA, also known as amd64 in Debian
  • armv7hl: the ARMv7 32 bit Hard Float ISA, also known as armhf in Debian
  • aarch64: the ARMv8 64 bit ISA, also known as arm64 in Debian

Particularly relevant here are the armv7hl jobs building ARMv7 32 bit packages that can be dispatched to:

  1. the native armv7hl worker machine;
  2. the aarch64 worker machine, which supports the ARMv7 32 bit ISA natively and thus can run binaries in armv7hl chroots natively;
  3. the x86_64 worker machine, which uses the qemu-arm-static binary translator to run binaries in armv7hl chroots via emulation.

It's worth noting that some ARM 64 bit server systems do not support the ARMv7 32 bit ISA natively, and would thus require the same emulation-based approach used on the x86-64 machines to execute the ARM 32 bit jobs.

Mitigating the impact on performance

The most obvious way to handle the performance penalty is to use faster CPUs. Cloud providers offer a wide range of options for x86-64 machines, and establishing the appropriate cost/performance balance is the first step. It is possible that the performance of an emulated build on a fast x86-64 CPU may be comparable to or even better than a native build on an older ARMv7 machine.

In addition, compilation is often a largely parallel task:

  1. big software projects like WebKit are made of many compilation units that can be built in parallel
  2. during large scale rebuilds each package can be built in parallel

Even if some phases of the build process do not benefit from multiple cores, most of the time is spent on processing the compilation units, which means that increasing the number of cores on the worker machines can effectively mitigate the slowdown due to emulation on large packages.
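For Debian-style packages, the degree of parallelism is typically requested through the DEB_BUILD_OPTIONS environment variable; a minimal sketch:

```shell
# Ask dpkg-buildpackage (and the package's own build system) to use all
# available cores; nproc reports the core count of the build machine.
export DEB_BUILD_OPTIONS="parallel=$(nproc)"
echo "$DEB_BUILD_OPTIONS"

# A build would then be launched as usual, e.g.:
# dpkg-buildpackage -us -uc
```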

For large scale rebuilds, scaling the number of machines is already helpful, as the build process for each package is isolated from the others.

A different optimization is to substitute selected binaries with ones for the native architecture during the qemu-user emulation. For instance, a real cross-compiler can be injected into the build chroot and made to pretend to be the "native" compiler in the otherwise emulated environment.

This would give the best possible performance as the compilation is done with native x86-64 code, but care has to be taken to ensure that the cross-compiler can run reliably in the foreign chroot, and keeping the native and emulated versions synchronized can be challenging.
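One way to perform such an injection is to shadow the compiler with a wrapper placed earlier in the PATH. The sketch below uses /tmp and the arm-linux-gnueabihf-gcc toolchain name purely as assumptions to illustrate the mechanism; a real deployment would also have to keep the wrapped cross-compiler in lockstep with the emulated compiler it replaces:

```shell
# Sketch: expose an x86-64 cross-compiler under the name "gcc" so the
# emulated environment transparently picks it up. The toolchain name
# arm-linux-gnueabihf-gcc and the /tmp path are illustrative assumptions.
mkdir -p /tmp/cross-shim
cat > /tmp/cross-shim/gcc <<'EOF'
#!/bin/sh
# Forward every compiler invocation to the native cross-compiler.
exec arm-linux-gnueabihf-gcc "$@"
EOF
chmod +x /tmp/cross-shim/gcc

# Putting the shim first in PATH makes build systems resolve it as "gcc".
export PATH=/tmp/cross-shim:$PATH
command -v gcc
```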

Risks

Limited maturity of the support for cross-builds in OBS

Support for injecting the QEMU static emulator in the OBS build chroot seems to be only well tested on RPM-based systems, and there may be some issues with the DEB-based approach used by Apertis.

A feasibility study was done by Collabora in the past demonstrating the viability of the approach, but some issues may need to be dealt with to deploy it at scale.

Versioning mismatches between emulated and injected native components

If native components are injected in the otherwise emulated cross-build environment to mitigate the impact on performance, particular care must be made to ensure that the versions match.

Impact of performance loss on timing-dependent tests

Some unit tests shipped in upstream packages can be very sensitive to timing issues, failing on slower machines. If the performance impact is non-trivial, the emulated environment may be subject to the same failures.

However, this is not specific to the emulated environment: Apertis often faces this kind of issue, where some tests that pass on the main Apertis infrastructure fail due to timing issues on the slower workers that downstream distributions may use.

To mitigate the impact on downstream distributors, the flaky tests usually get fixed or, if the effort required is too large, disabled.

Emulation bugs

The emulator may have bugs that get triggered by the build process of some packages.

Since upstream distributors use native workers, those issues may not be caught before the triggering package is built on the Apertis infrastructure.

Debugging these kinds of issues is often not trivial.

Approach

These are the high level steps to be undertaken to be able to run the whole Apertis build infrastructure on x86-64 machines:

  • Set up an OBS test instance with a single x86-64 worker
  • Configure the test instance and worker for armhf and aarch64 emulated builds
  • Test a selected set of packages by building them for armhf and aarch64
  • Set up other x86-64 workers and test a rebuild of the whole archive, ensuring that all the packages can be built using the emulated approach
  • Devise mitigations in case some packages fail to build in the emulated environment
  • Measure and evaluate performance impact comparing build times with those on the native workers currently in use in Apertis, to decide whether scaling the number of workers is sufficient to compensate the impact
  • Test mitigation approaches over a selected set of packages and evaluate the gains
  • Do another rebuild of the whole archive to ensure that the mitigations didn't introduce regressions
  • Refine and deploy the chosen mitigation approaches to, for instance, ensure that the injected native binaries are kept synchronized with the emulated ones they replace

There's a risk that no mitigation ends up being effective for some packages, so they keep failing with the emulated approach. In the short term those packages will be required to be built on the native workers in a hybrid setup, but they would be more problematic in a hypothetical downstream setup with no native workers, as they could not be built there. In that case, pre-built binaries coming from an upstream with native workers will have to be injected into the archive.

Alternatively, it may be possible to mix type 3 and type 4 cross-builds by modifying the failing packages to make them buildable with a real cross-compiler. This solution carries a much higher maintenance cost, as packages do not generally support being built that way, but it may be an option to be able to do full builds on x86-64 in the few cases where emulation fails.
