Audio management

Introduction

Apertis audio management is built around PulseAudio. Apertis applications use the PulseAudio API as the final layer to access sound features. This does not mean that applications have to deal directly with PulseAudio: applications can still use their preferred sound APIs as intermediate layers for manipulating audio streams. Those intermediate layers will in the end send the streams to PulseAudio using the dedicated plugin. For example, GStreamer applications can use pulsesink, and ALSA applications can use the default ALSA sink, pcm_pulse.

In an analogous manner, applications can capture sound for various purposes. For instance, speech recognition or voice recorder applications may need to capture input from the microphone. The sound will be captured from PulseAudio. ALSA users can use pcm_pulse. GStreamer users can use pulsesrc.

Terminology and concepts

See also the Apertis glossary for background information on terminology.

Standalone setup

A standalone setup is an installation of Apertis which has full control of the audio driver. Apertis running in a virtual machine is an example of a standalone setup.

Hybrid setup

A hybrid setup is an installation of Apertis in which the audio driver is not fully controlled by Apertis. An example of this is when Apertis is running under a hypervisor or using an external audio router component such as the GENIVI Audio Manager. In this case, the Apertis system can be referred to as the Consumer Electronics domain (CE), and the other domain can be referred to as the Automotive Domain (AD).

Different audio sources for each domain

Among others, a standalone Apertis system can generate the following sounds:

  • Application sounds
  • Bluetooth sounds, for example music streamed from a phone or voice call sent from a handsfree car kit
  • Any other kind of event sound; for example, somebody using the SDK can generate event sounds using an appropriate command line

A hybrid Apertis system can generate the same sounds as a standalone system, plus additional sounds that are not always visible to Apertis, for example hardware sources further down the audio pipeline such as:

  • FM Radio
  • CD Player
  • Driver assistance systems

In this case, some interfaces should be provided to interact with the additional sound sources.

Mixing, corking, ducking

Mixing is the action of playing simultaneously from several sound sources.

Corking is a request from PulseAudio asking an application to pause one of its streams. See PulseAudio Corking.

Ducking is the action of lowering the volume of a background source, while mixing it with a foreground source at normal volume.

Use cases

The following section lists examples of usage requiring audio management. It is not an exhaustive list; unlimited combinations exist. Discussion points are highlighted at the end of some use cases.

Application developer

An application developer uses the SDK in a virtual machine to develop an application. They need to play sounds, and may also need to record sounds or test their application on a reference platform. This is a typical standalone setup.

Car audio system

In a car, Apertis is running in a hypervisor, sharing the processor with a real-time operating system controlling the car operations. Apertis is only used for applications and web browsing. A sophisticated Hi-Fi system is installed under a seat and accessible via a network interface. This is a hybrid setup.

Different types of sources

Some systems classify application sound sources in categories. It's important to note that no standard exists for those categories.

Both standalone and hybrid systems can generate different sound categories.

Example 1

In one system of interest, sounds are classified as main sources and interrupt sources. Main sources generally represent long-duration sound sources. The most common case is a media player, but they could also be sound sources emanating from web radio, or games. As a rule of thumb, the following can be used: when two main sources are playing at the same time, neither is intelligible. Main sources often require an action from the user to start playing, whether that is turning the ignition on, or pressing a play button on the steering wheel or the touchscreen. As a consequence, policy mechanisms ensure that only one main source can be heard at a time.

Interrupt sources generally represent short-duration sound sources; they are emitted when an unsolicited event occurs, such as when a message is received by any application or email service.

Example 2

In another system of interest, sounds are classified as main sources, interrupt sources and chimes. Unlike the first example, in this system a source is considered a main source if it is an infinite or loopable source which can only be interrupted by another main source, such as FM radio or a CD player. Interrupt sources are informational sources such as navigation instructions, and chimes are unsolicited events of short duration. These sound sources are not necessarily generated by an application; they could come from a system service instead.

While some music from FM Radio is playing, a new navigation instruction has to be given to the driver: the navigation instructions should be mixed with the music.

Traffic bulletin

Many audio sources can be paused. For example, a CD player can be paused, as can media files played from local storage (including USB mass storage), and some network media such as Spotify.

While some music from one of these sources is playing, a new traffic bulletin is issued: the music could be paused and the traffic bulletin should be heard. When it is finished, the music can continue from the point where the playback was paused.

By their nature, some sound sources cannot be paused. For example, FM or DAB radio cannot be paused.

While some music from an FM or DAB radio is playing, a new traffic bulletin is issued. Because the music cannot be paused, it should be silenced and the traffic bulletin should be heard. When it is finished, the music can be heard again.

Bluetooth can be used when playing a game or watching live TV. As with the radio use-case, Bluetooth cannot be paused.

USB drive

While some music from the radio is playing, a new USB drive is inserted. If the automatic playback from USB drive setting is enabled, the radio sound stops and the USB playback begins.

Rear sensor sound

While some music from the radio is playing, the driver selects reverse gear: the rear sensor sound can be heard mixed with the music.

Blind spot sensor

While some music from Bluetooth is playing, a car passes through the driver's blind spot: a short notification sound can be mixed with the music.

Seat belt

While some music from the CD drive is playing, the passenger removes their seat belt: a short alarm sound can be heard mixed with the music.

Phone call

While some music from the CD drive is playing, a phone call is received: the music should be paused so that the phone can be heard ringing and the call can be answered. In this case, another possibility could be to announce the phone call with a ring sound mixed into the music, and pause the music only if the call is answered.

Resume music

If music playback has been interrupted by a phone call and the phone call has ended, music playback can be resumed.

VoIP

The driver wishes to use internet telephony/VoIP without noticing any difference due to being in a car.

Emergency call priority

While a phone call to emergency services is ongoing, an app-bundle process attempts to initiate lower-priority audio playback, for example playing music. The lower-priority audio must not be heard. The application receives the information that it cannot play.

Mute

The user can press a mute hard-key. In this case, and according to OEM-specific rules, all sources of a specific category could be muted. For example, all main sources could be muted. The OEM might require that some sources are never muted even if the user pressed such a hard-key.

Audio recording

Some apps might want to initiate speech recognition. They need to capture input from a microphone.

Microphone mute

If the user presses a "mute microphone" button (sometimes referred to as a "secrecy" button) during a phone call, the sound coming from the microphone should be muted. If the user presses this button in an application during a video conference call, the sound coming from the microphone should be muted.

Application crash

The Internet Radio application is playing music. It encounters a problem and crashes. The audio manager should know that the application no longer exists. In a hybrid setup, the other audio routers could be informed that the audio route is now free. It is then possible to fall back to a default source.

ClutterGst Applications

The Apertis Multimedia specification recommends the use of GStreamer for writing multimedia applications, either directly or via ClutterGst actors. All of the above use-cases must be supportable when using GStreamer.

Web applications

Web applications should be able to play a stream or record a stream.

Control malicious application

An application should not be able to use an audio role for which it does not have permission. For example, a malicious application could try to simulate a phone call and deliver advertising.

Multiple roles

Some applications can receive both a standard media stream and traffic information.

External audio router

In order to decide priorities, an external audio router can be involved. In this case, Apertis would only be providing a subset of the possible audio streams, and an external audio router could take policy decisions, to which Apertis could only conform.

Non-use-cases

Automatic actions on streams

It is not the purpose of this document to discuss the action taken on a stream when it is preempted by another stream. Deciding whether to cork or silence a stream is a user interface decision. As such it is OEM-dependent.

Streams' priorities

The audio management framework defined by this document is intended to provide mechanism, not policy: it does not impose a particular policy, but instead provides a mechanism by which OEMs can impose their chosen policies.

Multiple independent systems

Some luxury cars may have multiple IVI touchscreens and/or sound systems, sometimes referred to as multi-seat (please note that this jargon term comes from desktop computing, and one of these "seats" does not necessarily correspond to a space where a passenger could sit). We will assume that each of these "seats" is a separate container, virtual machine or physical device, running a distinct instance of the Apertis CE domain.

Requirements

Standalone operation

The audio manager must support standalone operation, in which it accesses audio hardware directly (Application developer).

Integrated operation

The audio manager must support integrated operation, in which it cannot access the audio hardware directly, but must instead send requests and audio streams to the hybrid system. (Different types of sources, External audio router).

Priority rules

It must be possible to implement OEM-specific priority rules, in which it is possible to consider one stream to be higher priority than another.

When a lower-priority stream is pre-empted by a higher-priority stream, it must be possible for the OEM-specific rules to choose between at least these actions:

  • silence the lower-priority stream, with a notification to the application so that it can pause or otherwise minimise its resource use (corking)
  • leave the lower-priority stream playing, possibly with reduced volume (ducking)
  • terminate the lower-priority stream altogether

It must be possible for the audio manager to lose the ability to play audio (audio resource deallocation). In this situation, the audio manager must notify the application with a meaningful error.

When an application attempts to play audio and the audio manager is unable to allocate a necessary audio resource (for example because a higher-priority stream is already playing), the audio manager must inform the application using an appropriate error message. (Emergency call priority)

Multiple sound outputs

The audio manager should be able to route sounds to several sound outputs. (Different types of sources).

Remember preempted source

It should be possible for an audio source that was preempted to be remembered in order to resume it after interruption. This is not a necessity for all types of streams. Some OEM-specific code could select those streams based on their roles. (Traffic bulletin, Resume music)

Audio recording

App-bundles must be able to record audio if given appropriate permission. (Audio recording)

Latency

The telephony latency must be as low as possible. The user must be able to hold a conversation on the phone or in a VoIP application without noticing any form of latency. (VoIP)

Security

The audio manager should not trust applications for managing audio. If some faulty or malicious application tries to play or record an audio stream for which permission wasn't granted, the proposed audio management design should not allow it. (Application crash, Control malicious application)

Muting output streams

During the time an audio source is preempted, the audio framework must notify the application that is providing it, so that the application can make an attempt to reduce its resource usage. For example, a DAB radio application might stop decoding the received DAB data. (Mute, Traffic bulletin)

Muting input streams

The audio framework should be able to mute capture streams. During that time, the audio framework must notify the applications that are using them, so that each application can update its user interface and reduce its resource usage. (Microphone mute)

Control source activity

Audio management should be able to set each audio source to the playing, stopped or paused state based on priority. (Resume music)

Per stream priority

We might want to mix and send multiple streams from one application to the automotive domain. An application might want to send different types of alert. For instance, a new message notification may have higher priority than 'some contact published a new photo'. (Multiple roles)

GStreamer and ClutterGst support

GStreamer and ClutterGst should be supported. (ClutterGst Applications)

Approach

General design proposal

A PulseAudio audio management module listens for specific events such as stream creation, termination, or property changes, and takes actions based on custom business rules. The main point to follow is that the business rules should be implemented as a separate library. Applications provide information about their needs using stream properties.

Distinguish standalone setup from hybrid setup

In this document, the hypothesis is made that the configuration is different for the two setups. There is a set of configuration files, one for the standalone setup and another one for the hybrid setup. Each distinct OEM could have its own set of configuration files depending on its hardware setup. These configuration files would be static.

The configuration files are located in /etc/pulse. They are listed below:

user@apertis:~$ ls /etc/pulse
client.conf  daemon.conf  default.pa  system.pa

/etc/pulse/default.pa is the one that interests us the most. It contains a set of commands to be executed when PulseAudio starts. More specifically, the configuration file can load an audio routing module.

The load-module command is the way to load a module. Parameters can be given to a module for customized behavior:

load-module module-oss-mmap device="/dev/dsp" sink_name=output source_name=input

Additionally, this configuration file can trigger loading of automotive domain specific sinks and/or sources using their respective modules.
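For illustration, a hybrid-setup fragment of /etc/pulse/default.pa could create a local sink whose output is streamed to the automotive domain over RTP. This is only a sketch: the sink name and network parameters are invented, and the exact module-rtp-send parameter names depend on the PulseAudio version.

# Create a local sink for main streams; its monitor source feeds RTP
load-module module-null-sink sink_name=main_sink
# Stream that sink's output to the automotive domain (addresses are examples)
load-module module-rtp-send source=main_sink.monitor destination_ip=192.168.1.10 port=46000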

The automotive domain may also contain additional audio elements such as a mixer. This mixer is the responsibility of the automotive domain alone as long as the proper sink is used from the CE domain.

Separate library concerns

PulseAudio does not support compiling modules out-of-tree, which leads to a maintenance concern: anything done in PulseAudio will have to be a patch to the PulseAudio package itself, which must be maintained and updated for each new version of PulseAudio. The reason behind this is that PulseAudio modules rely on a private library whose API isn't stable.

To mitigate this, it is recommended to minimize the amount of code in the PulseAudio module, and instead implement the business rules in a library that is linked into that module. The code that is integrated into PulseAudio should only provide the mechanism by which policies can be applied: all policies (business rules) should be elsewhere, so that a change to the business rules does not normally require patching PulseAudio. Of course, if the business rules require new functionality because they have new requirements that have not previously been considered, then modifying the PulseAudio patches might still be needed.

External audio router

OEMs might want to use an external audio router for policy management. For that, they can derive a custom library implementing their custom policies from the Apertis audio management library.

The OEM-specific audio manager library could reside in the OEM repository. It would still have to use the interface provided by the PulseAudio module.

Example integration in libcanterbury

A new library in the canterbury source tree can be introduced, libcanterbury-audio-manager, in a subdirectory canterbury/audio-manager/. This library will only be used by the PulseAudio module. It may #include headers from PulseAudio's public API libpulse, but not from the private (module) APIs libpulsecommon and libpulsecore. For example, it could define a class and some GInterfaces with contents similar to this pseudocode:


class CbyAudioManager: GObject
{
    CbyAudioManager *new(CbyAudioBackend *backend);

    void add_stream(CbyAudioStream *stream);
    void remove_stream(CbyAudioStream *stream);
}

interface CbyAudioStream: GObject
{
    const gchar *get_apparmor_label();
    guint32 get_user_id();

    /** @key: a PulseAudio property, for example PA_PROP_MEDIA_ROLE */
    const gchar *get_metadata(const gchar *key);
    ... any other metadata accessors that need to be added ...

    void set_muted(gboolean muted);
    void set_corked(gboolean corked);
    void set_ducked(gboolean ducked);
    void terminate(... error details of some sort ...);
}

interface CbySinkInput: CbyAudioStream
{
    void move_to_sink(gchar *sink);
}

The CbyAudioBackend would be implemented by the PulseAudio plugin, and by a mock implementation in Canterbury's tests.

interface CbyAudioBackend {
    /* Returns the unique identifier of the hardware sink when running standalone,
     * or NULL in an integrated system. */
    gchar *get_hardware_sink();

    /* Create a sink sending RTP to the given address, and return its unique
     * identifier. */
    gchar *create_rtp_sink(struct sockaddr *destination);

    /* ... and similar for sources, etc. ... */
}

Alternatively, those could be in libcanterbury-platform, with source code in canterbury/platform/. They must not be in libcanterbury, because they are not stable SDK-APIs.

Canterbury's automated tests should have a mock implementation of the two GInterfaces, which can be used to test the CbyAudioManager.

A new PulseAudio module can be introduced, module-canterbury, as an Apertis-specific patch, and PulseAudio configured to load it by default. This module defines a GObject class, PulseSinkInput, which implements CbyAudioStream and CbySinkInput, and a second GObject class, PulseSourceOutput, which implements only CbyAudioStream. These wrap a pa_sink_input and pa_source_output respectively. It also instantiates a single global CbyAudioManager.

The PulseAudio configuration file could also statically create the requested sinks or sources, or leave it up to CbyAudioManager to create and report appropriate sinks and sources for the selected setup.

When a new stream appears, module-canterbury immediately routes it to a null source or sink (module-null-source, module-null-sink) and corks it, then calls cby_audio_manager_add_stream to tell the audio manager about it. The CbyAudioManager responds to that signal by doing any asynchronous processing that might be required, then behaving according to its business rules: it can call methods on the streams to retrieve metadata or control them, it can call into libcanterbury-platform to get a CbyProcessInfo or a CbyComponent based on the AppArmor label, and so on. If the stream is accepted by the business rules, the CbyAudioManager would normally move it to an appropriate RTP sink (if in the integrated system mode) or to the hardware sink (otherwise); if the stream is rejected by the business rules, the CbyAudioManager would normally close it.

When a stream closes, module-canterbury detaches it from the corresponding CbyAudioStream, then calls remove_stream passing that CbyAudioStream as an argument. The CbyAudioManager is expected to cancel any transactions that involved the stream wrapper, then unref the stream wrapper. After the stream has been removed, the stream wrapper's methods may fail.

The mechanism by which OEMs define business rules is OEM defined. libcanterbury-audio-manager could load configuration files or even load C plugins. For example, if an external audio router is involved, communicating with this audio router can be done through D-Bus, or use inter domain communication, or any other suitable means.

Using libpulse, an application can query information about a stream using the PulseAudio querying API, which provides access to the stream properties via pa_context_get_sink_input_info. A callback must be given to that function, which in turn receives a pa_sink_input_info structure; the proplist field of that structure contains the metadata stored as a pa_proplist. The libpulse API is intended for PulseAudio clients outside of the PulseAudio process; the audio router library would instead use the GObject interfaces described above.

Diagram of the processes involved

From an implementation point of view, the policy engine implementation should reside in the Canterbury source tree (together with the logic managing app-bundles and their metadata), while the actual mechanics of enforcing the policies are implemented in a PulseAudio module, which should live in the PulseAudio source tree and query the Canterbury library. In that sense, the audio manager implementation is split between PulseAudio and Canterbury.

The process which is responsible for enforcing the actual audio management policies is PulseAudio, not Canterbury. The Audio Manager source code lives in the Canterbury source tree but once compiled it is executed in the context of the PulseAudio process.

Here is a diagram which illustrates this:

/usr/bin/pulseaudio      An audio app.      /usr/bin/canterbury
process                  process            process
/------------------\     /--------------\   /------------------\
|                  |     | /----------\ |   |                  |
|                  |     | | GStreamer| |   |                  |
|                  |     | \----^-----/ |   |                  |
|                  |     |      |       |   |                  |
| /------------\  Pulse  | /----V----\  |   |                  |
| | module-    <==Audio===> libpulse |  |   |                  |
| | canterbury |  protocol \---------/  |   |                  |
| \-----^------/   |     |              |   |                  |
|       | uses     |     | /----------\ |   |                  |
| /-----V--------\ |     | | other    | |   |                  |
| |libcanterbury-| |     | | platform <=D-Bus=> non audio      |
| |audio-manager | |     | | libraries| |   |  related things  |
| \-----^--------/ |     | \----------/ |   |                  |
|       | uses     |     |              |   |                  |
| /-----V--------\ |     |              |   | /--------------\ |
| |libcanterbury-| |     |              |   | |libcanterbury-| |
| |platform      | |     |              |   | |platform      | |
| \--------------/ |     |              |   | \--------------/ |
|                  |     |              |   |                  |
|                  |     |              |   |                  |
\------------------/     \--------------/   \------------------/

Streams metadata in applications

PulseAudio provides API to attach metadata to a stream. The function pa_stream_new_with_proplist is used to attach metadata to a stream during creation. The proplist parameter should be the metadata stored as a pa_proplist.

The current convention is to use a property named media.role. It can be set to values describing the nature of the stream, such as "music", "phone" or any other relevant value. PulseAudio defines a C constant for this property: PA_PROP_MEDIA_ROLE.
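As an illustration, a client could tag a playback stream with the "music" role at creation time. This is a minimal sketch using the public libpulse API; it assumes an already-connected pa_context, and the function name and stream parameters are illustrative.

#include <pulse/pulseaudio.h>

/* Create a playback stream tagged with the "music" role.
 * Assumes "ctx" is an established pa_context. */
static pa_stream *
create_music_stream (pa_context *ctx)
{
    static const pa_sample_spec ss = {
        .format = PA_SAMPLE_S16LE,
        .rate = 44100,
        .channels = 2
    };
    pa_proplist *props = pa_proplist_new ();
    pa_stream *stream;

    pa_proplist_sets (props, PA_PROP_MEDIA_ROLE, "music");
    stream = pa_stream_new_with_proplist (ctx, "music-playback", &ss,
                                          NULL, props);
    pa_proplist_free (props);

    return stream;
}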

See also GStreamer support and ClutterGst support.

Requesting permission to use audio roles in applications

Each audio role is associated with a permission. Before an application can start playing a stream, the audio manager will check whether it has the permission to do so. See Identification of applications. Application bundle metadata describes how to manage the permissions requested by an application. The application can also use bundle metadata to store the default role used by all streams in the application if this is not specified at the stream level.

Audio routing principles

The request to open an audio route is emitted in two cases:

  • when a new stream is created
  • before a stream changes state from Paused to Playing (uncork)

In both cases, before starting playback, the audio manager must check the priority against the business rules, or request the appropriate priority from the external audio router. If the authorization is not granted, the application should stop trying to play the stream and notify the user that an undesirable event occurred.

If an application stops playback, the audio manager will be informed. It will in turn notify the external audio router of the new situation, or handle it according to business rules.

An application that has playback can be requested to pause by the audio manager, for example if a higher priority sound must be heard.

Applications can use the PulseAudio event API to subscribe to events; in particular, applications can be notified about their mute status. pa_stream_get_index associates a server index (sink-input index) with the stream. pa_context_subscribe and pa_context_set_subscribe_callback can then be used to register a callback for notifications. If an event occurs, such as mute or unmute, the callback will be executed. For example, an application playing media from a source such as a CD or USB storage would typically respond to the mute event by pausing playback, so that it can later resume from the same place. An application playing a live source such as on-air FM radio cannot pause in a way that can later be resumed from the same place, but would typically respond to the mute event by ceasing to decode the source, so that it does not waste CPU cycles by decoding audio that the user will not hear.
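The sketch below shows how an application could watch its own sink input for mute changes using those calls; the callback and function names are illustrative, and error handling is omitted for brevity.

#include <pulse/pulseaudio.h>

/* Query the sink input and react to its mute flag. */
static void
sink_input_info_cb (pa_context *c, const pa_sink_input_info *i, int eol,
                    void *userdata)
{
    if (eol || i == NULL)
        return;

    if (i->mute)
        ; /* pause or stop decoding, update the user interface, ... */
}

/* Called for every subscribed server-side event. */
static void
subscribe_cb (pa_context *c, pa_subscription_event_type_t t, uint32_t idx,
              void *userdata)
{
    pa_stream *stream = userdata;

    if ((t & PA_SUBSCRIPTION_EVENT_FACILITY_MASK) != PA_SUBSCRIPTION_EVENT_SINK_INPUT
        || idx != pa_stream_get_index (stream))
        return;

    if ((t & PA_SUBSCRIPTION_EVENT_TYPE_MASK) == PA_SUBSCRIPTION_EVENT_CHANGE)
        pa_operation_unref (pa_context_get_sink_input_info (c, idx,
                                                            sink_input_info_cb,
                                                            NULL));
}

static void
watch_own_stream (pa_context *ctx, pa_stream *stream)
{
    pa_context_set_subscribe_callback (ctx, subscribe_cb, stream);
    pa_operation_unref (pa_context_subscribe (ctx,
                                              PA_SUBSCRIPTION_MASK_SINK_INPUT,
                                              NULL, NULL));
}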

Standalone routing module maps streams metadata to priority

An internal priority module can be written. This module would associate a priority with each of the different streams' metadata. It is loaded statically from the configuration file. See Routing data structure example for an example of data structure.

Hybrid routing module maps stream metadata to external audio router calls

In the hybrid setup, the audio routing functions could be implemented in a separate module that maps audio events to automotive domain calls. However, this module does not perform the priority checks: those are executed in the automotive domain because they can involve a different feature set.

An external library wraps calls to audio router

Audio routing is already subject to latency, so it is not recommended to add more by hiding audio routing protocols behind a D-Bus proxy (note that this would add some latency on stream setup and teardown, not on each sample sent on the stream). Instead, if the policy module is implemented in a library, access to audio routing features could be implemented there without impacting the PulseAudio build.

Identification of applications

The PulseAudio process must determine the identity of the connecting process. It should do so based on uid and AppArmor label.

If it used any identifier not derived from those, it would be possible for the connecting client to fake its identity, which is unacceptable as it would mean the wrong policy is used. It is the PulseAudio process that receives connections as socket file descriptors, so it is only the PulseAudio process that can use aa_getpeercon() or the equivalent lower-level system call getsockopt(..., SO_PEERSEC, ...) to determine the AppArmor label in a race-free way.

Using the native protocol, PulseAudio calls pa_native_protocol_connect upon a new client connection. This is a core PulseAudio API. It fires PA_NATIVE_HOOK_CONNECTION_PUT after successful establishment of the new connection; however, PulseAudio does not provide functions to access the client file descriptor in the hook. Hence some modification of PulseAudio itself is needed, either by adding new APIs to access the needed data, or by testing for AppArmor before firing the hook.
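For reference, here is a sketch of how the patched code could obtain the peer label in a race-free way, assuming access to the client socket file descriptor (client_fd and the function name are illustrative); it uses aa_getpeercon from libapparmor.

#include <stdlib.h>
#include <string.h>
#include <sys/apparmor.h>

/* Return a copy of the connecting client's AppArmor label, or NULL. */
static char *
get_client_label (int client_fd)
{
    char *label = NULL;
    char *mode = NULL;
    char *copy;

    if (aa_getpeercon (client_fd, &label, &mode) < 0)
        return NULL; /* unconfined client, or AppArmor unavailable */

    /* label and mode share a single allocation; freeing label
     * releases both, so take a copy first */
    copy = strdup (label);
    free (label);

    return copy;
}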

Web application support

Web applications are just like any other application. However, the web engine JavaScript API does not provide a way to select the media role. All streams emanating from the same web application bundle would thus have the same role. Since each web application is running in its own process, AppArmor can be used to differentiate them. Web application support for corking depends on the underlying engine. WebKitGTK+ has the necessary support. See changeset 145811.

Implementation of priority within streams

By placing appropriate hooks in PulseAudio, the audio manager will be able to monitor and control streams individually. module-role-cork demonstrates a similar feature: when a new stream with a certain role is started, all other streams within a user-defined list of roles are corked. The audio management plugin will use the same mechanisms as module-role-cork to implement the desired business rules.
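As a point of comparison, module-role-cork is configured declaratively in default.pa with its trigger_roles and cork_roles parameters; the roles in this invocation are examples:

load-module module-role-cork trigger_roles=phone cork_roles=music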

Corking streams

Depending on the audio routing policy, audio streams might be "corked", "ducked" or simply silenced (moved to a null sink).

As long as the role is properly defined, the application developer does not have to worry about what happens to the stream, except for corking. Corking is part of the PulseAudio API and can happen at any time, so it should be supported by applications. It is even possible for a stream to be corked before being started. See PulseAudio Corking for reference.

It is hence recommended to use the PulseAudio Asynchronous API.

A PulseAudio module can request a sink input to be corked using PA_STREAM_EVENT_REQUEST_CORK; this event is sent to the application so that the application itself applies the corking.

pacat.c contains sample code for corking and uncorking.
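A condensed sketch of that pattern is shown below, assuming an already-created stream; the audio manager sends the request-cork or request-uncork event and the client corks or uncorks itself. Function names are illustrative.

#include <string.h>
#include <pulse/pulseaudio.h>

/* Cork or uncork the stream when the server requests it. */
static void
stream_event_cb (pa_stream *s, const char *name, pa_proplist *pl,
                 void *userdata)
{
    if (strcmp (name, PA_STREAM_EVENT_REQUEST_CORK) == 0)
        pa_operation_unref (pa_stream_cork (s, 1, NULL, NULL));
    else if (strcmp (name, PA_STREAM_EVENT_REQUEST_UNCORK) == 0)
        pa_operation_unref (pa_stream_cork (s, 0, NULL, NULL));
}

/* During stream setup: */
static void
setup_cork_handling (pa_stream *stream)
{
    pa_stream_set_event_callback (stream, stream_event_cb, NULL);
}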

If an application is not able to cork itself, the audio manager should enforce corking by muting the stream as soon as possible. However, this has the side effect that the audio sent between the corking request and the effective corking in the application will be lost. A threshold delay can be implemented to give an application enough time to cork itself. The policy of the external audio manager must also be considered: if this audio manager has already closed the audio route when notifying the user, then the data will already be discarded. If the audio manager synchronously requests pause, then the application can take appropriate time to shut down.

Ensuring a process does not override its priorities

In addition to requesting that a stream corks, the stream could be muted so that any data still being received would be silenced.

GStreamer support

GStreamer support is straightforward: pulsesink supports the stream-properties parameter, which can be used to specify the media.role. The GStreamer pipeline state already changes from GST_STATE_PLAYING to GST_STATE_PAUSED when corking is requested.

See also GstPulseSink stream-properties.
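For example, a pipeline built in code could tag its pulsesink as follows (a sketch; the function name and the "music" role are illustrative):

#include <gst/gst.h>

/* Create a pulsesink whose streams carry media.role=music. */
static GstElement *
make_tagged_pulsesink (void)
{
    GstElement *sink = gst_element_factory_make ("pulsesink", "audio-sink");
    GstStructure *props = gst_structure_new ("props",
                                             "media.role", G_TYPE_STRING,
                                             "music", NULL);

    g_object_set (sink, "stream-properties", props, NULL);
    gst_structure_free (props);

    return sink;
}

The returned element can then be set as the audio-sink property of playbin, or linked into a custom pipeline.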

ClutterGst support

ClutterGst is a frequently used API. When using ClutterGst with a user-defined pipeline, the pulsesink element can receive customized properties directly, and we fall back to the GStreamer case. When using the existing ClutterGstPlayer actor, the pipeline can be retrieved in order to listen for the pulsesink element creation. PulseAudio clients also support the PULSE_PROP environment variable.

$ PULSE_PROP='media.role=music' paplay file.wav

See also PulseAudio application properties.

Remembering the previously playing stream

If a stream was playing and has been preempted, it may be desirable to switch back to this stream after the higher-priority stream terminates. To that effect, when a new stream starts playing, a pointer to the stream that was currently playing (or an id) could be stored on a stack. The termination of a playing stream could then restore playback of the previously suspended stream.
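A minimal sketch of that stack, assuming GLib and using PulseAudio sink-input indexes as stream identifiers (function names are illustrative):

#include <glib.h>

static GQueue preempted_streams = G_QUEUE_INIT;

/* Called when a playing stream is preempted by a higher-priority one. */
static void
remember_preempted (guint32 sink_input_idx)
{
    g_queue_push_head (&preempted_streams, GUINT_TO_POINTER (sink_input_idx));
}

/* Called when the currently playing stream terminates. */
static void
on_playing_stream_terminated (void)
{
    guint32 idx;

    if (g_queue_is_empty (&preempted_streams))
        return;

    idx = GPOINTER_TO_UINT (g_queue_pop_head (&preempted_streams));
    /* ask the policy engine to resume (uncork) sink input "idx" ... */
}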

Using different sinks

A specific media.role metadata value should be associated with a priority and a target sink. This makes it possible to implement the requirement of one sink per stream category, for example one sink for main streams and another sink for interrupt streams. In PulseAudio, the default behavior is to mix together all streams sent to the same sink.
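From outside the PulseAudio process, this routing could be expressed with the public introspection API, as in the sketch below; the role-to-sink mapping and function names are illustrative, and inside a module the private pa_sink_input_move_to would be used instead.

#include <string.h>
#include <pulse/pulseaudio.h>

/* Map a media.role to a target sink name. */
static const char *
sink_for_role (const char *role)
{
    if (role != NULL && (strcmp (role, "ringtone") == 0 ||
                         strcmp (role, "new_email") == 0 ||
                         strcmp (role, "traffic_info") == 0))
        return "alert_sink";

    return "main_sink";
}

/* Move a sink input to the sink associated with its role. */
static void
route_stream (pa_context *ctx, uint32_t sink_input_idx, const char *role)
{
    pa_operation_unref (pa_context_move_sink_input_by_name (ctx,
                                                            sink_input_idx,
                                                            sink_for_role (role),
                                                            NULL, NULL));
}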

Routing data structure example

The following tables document routing data defining an A-IVI-inspired stream routing. This is an example; in an OEM variant of Apertis it would be replaced with the business rules that fulfill the OEM's requirements.

App-bundle metadata defines whether the application is allowed to use a given audio role; if not defined, the application is not allowed to use the role. From the role, priorities between streams could be defined as follows.

In a standalone setup:

role            priority     sink        action
music           0 (lowest)   main_sink   cork
phone           7 (highest)  main_sink   cork
ringtone        7 (highest)  alert_sink  mix
customringtone  7 (highest)  main_sink   cork
new_email       1            alert_sink  mix
traffic_info    6            alert_sink  mix
gps             5            main_sink   duck
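Such a table could be encoded in the audio manager library as a simple static array; this is only an illustration of the data structure, with the type and variable names invented for the sketch:

/* One routing rule: maps a media.role to priority, sink and action. */
typedef enum { ACTION_CORK, ACTION_MIX, ACTION_DUCK } StreamAction;

typedef struct {
    const char   *role;
    int           priority;  /* 0 = lowest, 7 = highest */
    const char   *sink;
    StreamAction  action;
} RoutingRule;

static const RoutingRule standalone_rules[] = {
    { "music",          0, "main_sink",  ACTION_CORK },
    { "phone",          7, "main_sink",  ACTION_CORK },
    { "ringtone",       7, "alert_sink", ACTION_MIX  },
    { "customringtone", 7, "main_sink",  ACTION_CORK },
    { "new_email",      1, "alert_sink", ACTION_MIX  },
    { "traffic_info",   6, "alert_sink", ACTION_MIX  },
    { "gps",            5, "main_sink",  ACTION_DUCK },
};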

In a hybrid setup, the priority would be expressed in a form understandable by the automotive domain. The meaning of the action would be internal to the CE domain only, since the CE domain does not know what is happening in the automotive domain.

role            priority   sink        action
music           MAIN_APP1  main_sink   cork
phone           MAIN_APP2  main_sink   cork
ringtone        MAIN_APP3  alert_sink  mix
customringtone  MAIN_APP3  main_sink   cork
new_email       ALERT1     alert_sink  mix
traffic_info    INFO1      alert_sink  mix
gps             INFO2      main_sink   mix

Testability

The key point to keep in mind for testing is that several applications can execute in parallel and use the PulseAudio APIs (and the library API) concurrently; the testing should try to replicate this. However, testing possibilities are limited because the test results depend on the audio policy.

Application developer testing

The application developer is requested to implement corking and error paths. Testing those features will depend on the policy in use.

Having a way to identify the lowest and highest priority definition in the policy could be enough for the application developer. Starting a stream with the lowest priority would not succeed if a stream is already running. Starting a stream with the highest priority would cork all running streams.

PulseAudio command line tools could be enhanced to provide the options to cork a stream.

The developer may benefit from the possibility to customize the running policy.

Testing audio manager library

Testing the audio manager library can be done by an external tool that loads only the library and exercises its implementation. It should both use reproducible patterns and try to push the library to its limits by exercising the API intensively and concurrently, testing error cases and invalid input, and explicitly sending corrupted data. This kind of testing should not depend on the audio policy. Testing that the audio policy responds with values in an appropriate range can be done, but checking each returned value is not feasible, unless the policy module can be provisioned with a dedicated test policy for which the results are already known. This tool could be useful to the OEM for its own audio manager library.

Testing the complete design

Testability of the complete design must be exercised from the application level. It consists of emulating several applications, each creating independent connections with different priorities, and verifying that the interactions are reliable. The policy module could be provisioned with a dedicated test policy for which the results are already known.

Requirements

This design fulfills the following requirements:

Suggested roadmap

The key element to implement here is the AppArmor support in PulseAudio. However, once an application is connected, and once PulseAudio knows that the application is allowed to use an audio role, the rest of PulseAudio behaves normally. The audio manager library can thus be implemented independently, and an implementation supporting only corking and mixing is sufficient. Given that the OEMs need to customize the audio manager library, it should be made first.

The following plan can serve as a base:

  • Implement PulseAudio module core and library core building
  • Implement mock audio manager library listening and printing stream events
  • Implement static policy (lowest, normal, highest). It should use a separate source file to reduce rebase issues.

At this point the OEM could start deriving their library. In parallel, Apertis developers could continue the development as follows:

  • Implement automated tests
  • Implement PulseAudio AppArmor support
  • Implement PulseAudio reading policy for each role
  • Implement a generic policy engine that reads OEM-specific policies from a declarative text file, making it unnecessary for custom code to be written for the majority of OEM policies

Open questions

Roles

  • Do we need to define roles that the application developer can use?

    It's not possible to guarantee that an OEM's policies will not nullify an audio role that is included in Apertis. However, if we do not provide some roles, there is no hope of ever having an application designed for one system work gracefully on another.

  • Should we define roles for input?

    Probably, yes, speech recognition input could have a higher priority than phone call input. (Imagine the use case where someone is taking a call, is not currently talking on the call, and wants to change their navigation destination: they press the speech recognition hard-key, tell the navigation system to change destination, then input switches back to the phone call.)

  • Should we define one or several audio roles not requiring permission for use?

    No, it is explicitly recommended that every audio role requires permission. An app-store curator from the OEM could still grant permission to every application requesting a role.

Policies

  • How can we ensure matching between the policy and application defined roles?

    Each permission in the permission set should be matched with a media role. The number of different permissions should be kept to a minimum.

  • Should applications start stream corked?

    It must be done on both the application side and the audio manager side, because applications cannot be trusted. As soon as a stream opens, the PulseAudio process must cork it, before the first sample comes out; otherwise a malicious application could play undesirable sounds or noises while the audio manager is still deciding what to do with that stream. The audio manager might be making this decision asynchronously, by asking for permission from the automotive domain. The audio manager can choose to uncork, leave corked or kill, according to its policies. On the application side, it is only possible to suggest the best way for an application to behave in order to obtain the best user experience.

  • Should we use media.role or define an Apertis-specific stream property?

Summary of recommendations

  • A PulseAudio module must be implemented to load a library with a minimal interface.
  • That library source can be stored in the canterbury source tree.
  • An initial implementation of that library implements a priority table.
  • Each OEM must derive from that library to implement their business rules.
  • Static sets of configuration files can load different modules depending on hybrid setup or standalone setup.
  • PulseAudio must be modified to check the AppArmor identity of client applications. It will also check that the application has permission to use the requested role. Additionally, if the media.role is not provided in the stream, PulseAudio must check whether a default value is provided in the application bundle metadata.
  • Application bundle metadata contains a default audio role for all streams within an application.
  • Application bundle metadata must contain a permission request for each audio role in use in an application.
  • For each stream, an application can choose an audio role and communicate it to PulseAudio at stream creation.
  • The audio manager library monitors creation and state changes of streams.
  • Depending on business rules, the audio manager library can request an application to cork or mute.
  • Applications should use the PulseAudio asynchronous API to get the finest control possibilities, such as being informed of their mute status.
  • GStreamer's pulsesink supports a stream-properties parameter.
  • ClutterGst applications should retrieve the pipeline to add media.role to the pulsesink.
  • Web applications must specify their media.role using Application bundle metadata.
  • A tool for corking a stream could be implemented.
