On Apr 16, 2021, at 6:22 AM, Marvin Häuser <mhaeuser@posteo.de> wrote:

Good day,

Sorry for the nitpicking.

- Protocols always need a "Revision" field as first member. This is used to be able to expand its capabilities in later revisions without introducing a new, distinct protocol.
- Consider the name EFI_SIMPLE_AUDIO_OUTPUT(!)_PROTOCOL, to not cause confusion if input is ever added. Input in my opinion should be a separate protocol as there is no reason why they would necessarily be coupled topology-wise (think of an USB microphone, it will never have any sort of output).
- To make code safety a bit easier, try to use "CONST" for "IN" (non-OUT) pointers, so that CONST can be propagated where possible.
- Please do *not* make the events caller-owned. We had it multiple times already on production firmware that events are left dangling and may be polled/signaled after ExitBS(). The caller should be able to decide on some policy maybe (i.e. abort or block on ExitBS() until the playback finished), as cut-off audio may be awkward; but the callee definitely should implement "event safety" itself. Maybe avoid exposing events directly at all and provide nice abstractions the caller cannot misuse.
- I don't think audio should be required at all, the required subset should firstly consider minimalism and security. Accessibility will not be of concern for some IoT device, the audio code would simply eat space, and introduce a larger surface for bugs.

Marvin,

Generally how we work this in the UEFI Specification is we make it optional via the following wording: “If a platform includes the ability to play audio in EFI then the EFI_SIMPLE_AUDIO_OUTPUT_PROTOCOL must be implemented.

Basically this requirement will get added to UEFI Specification 2.6.2 Platform-Specific Elements.

Thanks,

Andrew Fish

Best regards,
Marvin

On 16.04.21 01:42, Ethin Probst wrote:
Hi Andrew,

What would that protocol interface look like if we utilized your idea?
With mine (though I need to add channel mapping as well), your
workflow for playing a stereo sound from left to right would probably
be something like this:
1) Encode the sound using a standard tool into a Wave PCM 16.
2) Place the Wave file in the Firmware Volume using a given UUID as
the name. As simple as editing the platform FDF file.
3) Write some BDS code
  a) Lookup Wave file by UUID and read it into memory.
  b) Decode the audio file (audio devices will not do this decoding
for you, you have to do that yourself).
  c) Call EFI_AUDIO_PROTOCOL.LoadBuffer(), passing in the sample rate
of your audio, EFI_AUDIO_PROTOCOL_SAMPLE_FORMAT_S16 for signed 16-bit
PCM audio, the channel mapping, the number of samples, and the samples
themselves.
  d) call EFI_BOOT_SERVICES.CreateEvent()/EFI_BOOT_SERVICES.CreateEventEx()
for a playback event to signal.
  e) call EFI_AUDIO_PROTOCOL.StartPlayback(), passing in the event you
just created.
The reason that LoadBuffer() takes so many parameters is because the
device does not know the audio that your passing in. If I'm given an
array of 16-bit audio samples, its impossible to know the parameters
(sample rate, sample format, channel mapping, etc.) from that alone.
Using your idea, though, my protocol could be greatly simplified.
Forcing a particular channel mapping, sample rate and sample format on
everyone would complicate application code. From an application point
of view, one would, with that type of protocol, need to do the
following:
1) Load an audio file in any audio file format from any storage mechanism.
2) Decode the audio file format to extract the samples and audio metadata.
3) Resample the (now decoded) audio samples and convert (quantize) the
audio samples into signed 16-bit PCM audio.
4) forward the samples onto the EFI audio protocol.
There is another option. (I'm happy we're discussing this now -- we
can hammer out all the details now which will make a lot of things
easier.) Since I'll most likely end up splitting the device-specific
interfaces to different audio protocols, we could make a simple audio
protocol that makes various assumptions about the audio samples your
giving it (e.g.: sample rate, format, ...). This would just allow
audio output and input in signed 16-bit PCM audio, as you've
suggested, and would be a simple and easy to use interface. Something
like:
typedef struct EFI_SIMPLE_AUDIO_PROTOCOL {
  EFI_SIMPLE_AUDIO_PROTOCOL_RESET Reset;
  EFI_SIMPLE_AUDIO_PROTOCOL_START Start;
  EFI_SIMPLE_AUDIO_PROTOCOL_STOP Stop;
} EFI_SIMPLE_AUDIO_PROTOCOL;
This way, users and driver developers have a simple audio protocol
they can implement if they like. It would assume signed 16-bit PCM
audio and mono channel mappings at 44100 Hz. Then, we can have an
advanced protocol for each device type (HDA, USB, VirtIO, ...) that
exposes all the knobs for sample formats, sample rates, that kind of
thing. Obviously, like the majority (if not all) UEFI protocols, these
advanced protocols would be optional. I think, however, that the
simple audio protocol should be a required protocol in all UEFI
implementations. But that might not be possible. So would this simpler
interface work as a starting point?

On 4/15/21, Andrew Fish <afish@apple.com> wrote:

On Apr 15, 2021, at 1:11 PM, Ethin Probst <harlydavidsen@gmail.com>
wrote:

Is there any necessity for audio input and output to be implemented
within the same protocol? Unlike a network device (which is
intrinsically bidirectional), it seems natural to conceptually separate
audio input from audio output.
Nope, there isn't a necessity to make them in one, they can be
separated into two.

The code controlling volume/mute may not have any access to the sample
buffer. The most natural implementation would seem to allow for a
platform to notice volume up/down keypresses and use those to control the
overall system volume, without any knowledge of which samples (if any)
are currently being played by other code in the system.
Your assuming that the audio device your implementing the
volume/muting has volume control and muting functionality within
itself, then.
Not really. We are assuming that audio hardware has a better understanding
of how that system implements volume than some generic EFI Code that is by
definition platform agnostic.

This may not be the case, and so we'd need to
effectively simulate it within the driver, which isn't too hard to do.
As an example, the VirtIO driver does not have a request type for
muting or for volume control (this would, most likely, be within the
VIRTIO_SND_R_PCM_SET_PARAMS request, see sec. 5.14.6.4.3). Therefore,
either the driver would have to simulate the request or return
EFI_UNSUPPORTED this instance.

So this is an example of above since the audio hardware knows it is routing
the sound output into another subsystem, and that subsystem controls the
volume. So the VirtIo Sound Driver know best how to bstract volume/mute for
this platform.

Consider also the point of view of the developer implementing a driver
for some other piece of audio hardware that happens not to support
precisely the same sample rates etc as VirtIO. It would be extremely
ugly to force all future hardware to pretend to have the same
capabilities as VirtIO just because the API was initially designed with
VirtIO in mind.
Precisely, but the brilliance of VirtIO
The brilliance of VirtIO is that it just needs to implement a generic device
driver for a given operating system. In most cases these operating systems
have sounds subsystems that manage sound and want fine granularity of
control on what is going on. So the drivers are implemented to maximizes
flexibility since the OS has lots of generic code that deals with sound, and
even user configurable knobs to control audio. In our case that extra layer
does not exist in EFI and the end user code just want to tell the driver do
some simple things.

Maybe it is easier to think about with an example. Lets say I want to play a
cool sound on every boot. What would be the workflow to make the happen.
1) Encode the sound using a standard tool into a Wave PCM 16.
2) Place the Wave file in the Firmware Volume using a given UUID as the
name. As simple as editing the platform FDF file.
3) Write some BDS code
  a) Lookup Wave file by UUID and read it into memory.
  b) Point the EFI Sound Protocol at the buffer with the Wave file
  c) Tell the EFI Sound Protocol to play the sound.

If you start adding in a lot of perimeters that work flow starts getting
really complicated really quickly.

is that the sample rate,
sample format, &c., do not have to all be supported by a VirtIO
device. Notice, also, how in my protocol proposal I noted that the
sample rates, at least, were "recommended," not "required." Should a
device not happen to support a sample rate or sample format, all it
needs to do is return EFI_INVALID_PARAMETER. Section 5.14.6.2.1
(VIRTIO_SND_R_JACK_GET_CONFIG) describes how a jack tells you what
sample rates it supports, channel mappings, &c.

I do understand how just using a predefined sample rate and sample
format might be a good idea, and if that's the best way, then that's
what we'll do. The protocol can always be revised at a later time if
necessary. I apologize if my stance seems obstinate.

I think if we add the version into the protocol and make sure we have a
separate load and play operation we could add a member to set the extra
perimeters if needed. There might also be some platform specific generic
tunables that might be interesting for a future member function.

Thanks,

Andrew Fish

Also, thank you, Laszlo, for your advice -- I hadn't considered that a
network driver would be another good way of figuring out how async
works in UEFI.

On 4/15/21, Andrew Fish <afish@apple.com> wrote:

On Apr 15, 2021, at 5:07 AM, Michael Brown <mcb30@ipxe.org> wrote:

On 15/04/2021 06:28, Ethin Probst wrote:
- I hoped to add recording in case we in future want to add
accessibility aids like speech recognition (that was one of the todo
tasks on the EDK2 tasks list)
Is there any necessity for audio input and output to be implemented
within
the same protocol? Unlike a network device (which is intrinsically
bidirectional), it seems natural to conceptually separate audio input
from
audio output.

- Muting and volume control could easily be added by just replacing
the sample buffer with silence and by multiplying all the samples by a
percentage.
The code controlling volume/mute may not have any access to the sample
buffer. The most natural implementation would seem to allow for a
platform to notice volume up/down keypresses and use those to control
the
overall system volume, without any knowledge of which samples (if any)
are
currently being played by other code in the system.

I’ve also thought of adding NVRAM variable that would let setup, UEFI
Shell,
or even the OS set the current volume, and Mute. This how it would be
consumed concept is why I proposed mute and volume being separate APIs.
The
volume up/down API in addition to fixed percentage might be overkill, but
it
does allow a non liner mapping to the volume up/down keys. You would be
surprised how picky audiophiles can be and it seems they like to file
Bugzillas.

- Finally, the reason I used enumerations for specifying parameters
like sample rate and stuff was that I was looking at this protocol
from a general UEFI applications point of view. VirtIO supports all of
the sample configurations listed in my gist, and it seems reasonable
to allow the application to control those parameters instead of
forcing a particular parameter configuration onto the developer.
Consider also the point of view of the developer implementing a driver
for
some other piece of audio hardware that happens not to support
precisely
the same sample rates etc as VirtIO. It would be extremely ugly to
force
all future hardware to pretend to have the same capabilities as VirtIO
just because the API was initially designed with VirtIO in mind.

As a developer on the other side of the API, writing code to play sound
files on an arbitrary unknown platform, I would prefer to simply
consume
as simple as possible an abstraction of an audio output protocol and
not
have to care about what hardware is actually implementing it.

It may make sense to have an API to load the buffer/stream and other APIs
to
play or pause. This could allow an optional API to configure how the
stream
is played back. If we add a version to the Protocol that would at least
future proof us.

We did get feedback that it is very common to speed up the auto playback
rates for accessibility. I’m not sure if that is practical with a simple
PCM
16 wave file with the firmware audio implementation. I guess that is
something we could investigate.

In terms of maybe adding text to speech there is an open source project
that
conceptually we could port to EFI. It would likely be a binary that
would
have to live on the EFI System Partition due to size. I was thinking
that
words per minute could be part of that API and it would produce a PCM 16
wave file that the audio protocol we are discussing could play.

Both of these argue in favour of defining a very simple API that
expresses
only a common baseline capability that is plausibly implementable for
every piece of audio hardware ever made.

Coupled with the relatively minimalistic requirements for boot-time
audio,
I'd probably suggest supporting only a single format for audio data,
with
a fixed sample rate (and possibly only mono output).

In my world the folks that work for Jony asked for a stereo boot bong to
transition from left to right :). This is not the CODEC you are looking
for
was our response…. I also did not mention that some languages are right
to
left, as the only thing worse than one complex thing is two complex
things
to implement.

As always: perfection is achieved, not when there is nothing more to
add,
but when there is nothing left to take away. :)

"Simplicity is the ultimate sophistication”

Thanks,

Andrew Fish

Thanks,

Michael

--
Signed,
Ethin D. Probst