From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mout01.posteo.de (mout01.posteo.de [185.67.36.65])
 by mx.groups.io with SMTP id smtpd.web08.10258.1618585425207882247
 for <devel@edk2.groups.io>;
 Fri, 16 Apr 2021 08:03:46 -0700
Authentication-Results: mx.groups.io;
 dkim=fail reason="body hash did not verify" header.i=@posteo.de header.s=2017 header.b=Y7EcQuyF;
 spf=pass (domain: posteo.de, ip: 185.67.36.65, mailfrom: mhaeuser@posteo.de)
Received: from submission (posteo.de [89.146.220.130]) 
	by mout01.posteo.de (Postfix) with ESMTPS id 6E40624002A
	for <devel@edk2.groups.io>; Fri, 16 Apr 2021 17:03:42 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.de; s=2017;
	t=1618585422; bh=Ue+VUgMU2rwwylZIcvgTAEDPFOqVHn3vZlU+rAZyalw=;
	h=Subject:To:Cc:From:Date:From;
	b=Y7EcQuyF5jQl/7v49W3KDCg6G7xtAf/0JpquzUo/ky6P0L7huH440cKUR9NdtIHaA
	 HOjqWoTUJUWqz6ouwtatltBltk5SKmFzMyx7gWKAwfS7qosBF+k5IFglTxOZHdHEST
	 6nzg0bxzMh1J931RrPpxlkVMNKPEfUf7tQ1nWIU5hFgK3dMiF8Fi3gk4GgsARqjlWW
	 6nVjLyf71Ek4mKuYovgyBLQa9IeYEMndw8u1GuA/UsDtFsz6vAcEKxFJr8K8nJ2kDd
	 A0bASRWKklMoC3HZI6J2k1vJqVSCdhxALPBnc9OMyzNcPIfSSrFPqMRlNXRkS7TSGo
	 8MKT3IFlRYaxg==
Received: from customer (localhost [127.0.0.1])
	by submission (posteo.de) with ESMTPSA id 4FMKGh2gJyz9rxL;
	Fri, 16 Apr 2021 17:03:40 +0200 (CEST)
Subject: Re: [edk2-devel] VirtIO Sound Driver (GSoC 2021)
To: devel@edk2.groups.io, afish@apple.com
Cc: Ethin Probst <harlydavidsen@gmail.com>, Michael Brown <mcb30@ipxe.org>,
 Mike Kinney <michael.d.kinney@intel.com>, Leif Lindholm <leif@nuviainc.com>,
 Laszlo Ersek <lersek@redhat.com>,
 "Desimone, Nathaniel L" <nathaniel.l.desimone@intel.com>,
 Rafael Rodrigues Machado <rafaelrodrigues.machado@gmail.com>,
 Gerd Hoffmann <kraxel@redhat.com>
References: <CAJQtwF2aOTztmMOW-QFHovdFkoQHZnPqPxgSbKd+HfqeumD2Fw@mail.gmail.com>
 <CAJQtwF0pcYzFHjNp5JWqNDecwcxuAM3_MbUkZ5FruBUGp=BSCw@mail.gmail.com>
 <A454E2B2-7569-443D-AADF-60384005BE47@apple.com>
 <CAJQtwF0wVsXvN3uHHHmWzHPTmh-Mkyei4mpc_bsaP21bMB9+PA@mail.gmail.com>
 <CO1PR11MB4929B44F6CC94FA661AA1055D24E9@CO1PR11MB4929.namprd11.prod.outlook.com>
 <B2260448-028D-4659-98D6-C695CF3D738A@apple.com>
 <CAJQtwF3-REoeETR46BKr0Q6=b6ZHALbeiBBFtp4fEdFxKz8gwA@mail.gmail.com>
 <4AEC1784-99AF-47EF-B7DD-77F91EA3D7E9@apple.com>
 <CAJQtwF2e4dACRVyibbLOmOEmy34xMdGb1s0YdPFcHmZ2NMoBDA@mail.gmail.com>
 <309cc5ca-2ecd-79dd-b183-eec0572ea982@ipxe.org>
 <A139650C-A76F-4471-AFCC-FFF1BE2E35BB@apple.com>
 <CAJQtwF3kuOD3C2arUfZu_xDbkHq5HHz+LYNB2=AeV8x+q_cPtw@mail.gmail.com>
 <CCB65CBC-304C-42B3-810D-0AC8BEAE29D1@apple.com>
 <CAJQtwF08Apihdsitw2Vs-+iV9rrqgpqkwONSbaY-yb_BP1xCYw@mail.gmail.com>
 <406e5bdb-f6aa-21ce-c96a-b16fb07c181d@posteo.de>
 <2296BE7E-ACEE-4286-9A5C-408B2D1ADC2E@apple.com>
From: =?UTF-8?B?TWFydmluIEjDpHVzZXI=?= <mhaeuser@posteo.de>
Message-ID: <22f0c186-b762-7790-92f6-e8347b73e26b@posteo.de>
Date: Fri, 16 Apr 2021 15:03:39 +0000
MIME-Version: 1.0
In-Reply-To: <2296BE7E-ACEE-4286-9A5C-408B2D1ADC2E@apple.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: quoted-printable

On 16.04.21 16:34, Andrew Fish via groups.io wrote:
>
>
>> On Apr 16, 2021, at 6:22 AM, Marvin H=C3=A4user <mhaeuser@posteo.de=20
>> <mailto:mhaeuser@posteo.de>> wrote:
>>
>> Good day,
>>
>> Sorry for the nitpicking.
>>
>> - Protocols always need a "Revision" field as first member. This is=20
>> used to be able to expand its capabilities in later revisions without=
=20
>> introducing a new, distinct protocol.
>> - Consider the name EFI_SIMPLE_AUDIO_OUTPUT(!)_PROTOCOL, to not cause=
=20
>> confusion if input is ever added. Input in my opinion should be a=20
>> separate protocol as there is no reason why they would necessarily be=
=20
>> coupled topology-wise (think of an USB microphone, it will never have=
=20
>> any sort of output).
>> - To make code safety a bit easier, try to use "CONST" for "IN"=20
>> (non-OUT) pointers, so that CONST can be propagated where possible.
>> - Please do *not* make the events caller-owned. We had it multiple=20
>> times already on production firmware that events are left dangling=20
>> and may be polled/signaled after ExitBS(). The caller should be able=20
>> to decide on some policy maybe (i.e. abort or block on ExitBS() until=
=20
>> the playback finished), as cut-off audio may be awkward; but the=20
>> callee definitely should implement "event safety" itself. Maybe avoid=
=20
>> exposing events directly at all and provide nice abstractions the=20
>> caller cannot misuse.
>> - I don't think audio should be required at all, the required subset=20
>> should firstly consider minimalism and security. Accessibility will=20
>> not be of concern for some IoT device, the audio code would simply=20
>> eat space, and introduce a larger surface for bugs.
>>
>
> Marvin,
>
> Generally how we work this in the UEFI Specification is we make it=20
> optional via the following wording: =E2=80=9CIf a platform includes the=
=20
> ability to play audio in EFI then the EFI_SIMPLE_AUDIO_OUTPUT_PROTOCOL=
=20
> must be implemented.

Yes, this sounds good. Just saying it should not be strictly mandatory=20
on all platforms. :) Thanks!

Best regards,
Marvin

>
> Basically this requirement will get added to UEFI Specification 2.6.2=20
> Platform-Specific Elements.
>
> Thanks,
>
> Andrew Fish
>
>> Best regards,
>> Marvin
>>
>> On 16.04.21 01:42, Ethin Probst wrote:
>>> Hi Andrew,
>>>
>>> What would that protocol interface look like if we utilized your idea?
>>> With mine (though I need to add channel mapping as well), your
>>> workflow for playing a stereo sound from left to right would probably
>>> be something like this:
>>> 1) Encode the sound using a standard tool into a Wave PCM 16.
>>> 2) Place the Wave file in the Firmware Volume using a given UUID as
>>> the name. As simple as editing the platform FDF file.
>>> 3) Write some BDS code
>>> a) Lookup Wave file by UUID and read it into memory.
>>> b) Decode the audio file (audio devices will not do this decoding
>>> for you, you have to do that yourself).
>>> c) Call EFI_AUDIO_PROTOCOL.LoadBuffer(), passing in the sample rate
>>> of your audio, EFI_AUDIO_PROTOCOL_SAMPLE_FORMAT_S16 for signed 16-bit
>>> PCM audio, the channel mapping, the number of samples, and the samples
>>> themselves.
>>> =C2=A0=C2=A0d) call=20
>>> EFI_BOOT_SERVICES.CreateEvent()/EFI_BOOT_SERVICES.CreateEventEx()
>>> for a playback event to signal.
>>> =C2=A0=C2=A0e) call EFI_AUDIO_PROTOCOL.StartPlayback(), passing in the=
 event you
>>> just created.
>>> The reason that LoadBuffer() takes so many parameters is because the
>>> device does not know the audio that your passing in. If I'm given an
>>> array of 16-bit audio samples, its impossible to know the parameters
>>> (sample rate, sample format, channel mapping, etc.) from that alone.
>>> Using your idea, though, my protocol could be greatly simplified.
>>> Forcing a particular channel mapping, sample rate and sample format on
>>> everyone would complicate application code. From an application point
>>> of view, one would, with that type of protocol, need to do the
>>> following:
>>> 1) Load an audio file in any audio file format from any storage=20
>>> mechanism.
>>> 2) Decode the audio file format to extract the samples and audio=20
>>> metadata.
>>> 3) Resample the (now decoded) audio samples and convert (quantize) the
>>> audio samples into signed 16-bit PCM audio.
>>> 4) forward the samples onto the EFI audio protocol.
>>> There is another option. (I'm happy we're discussing this now -- we
>>> can hammer out all the details now which will make a lot of things
>>> easier.) Since I'll most likely end up splitting the device-specific
>>> interfaces to different audio protocols, we could make a simple audio
>>> protocol that makes various assumptions about the audio samples your
>>> giving it (e.g.: sample rate, format, ...). This would just allow
>>> audio output and input in signed 16-bit PCM audio, as you've
>>> suggested, and would be a simple and easy to use interface. Something
>>> like:
>>> typedef struct EFI_SIMPLE_AUDIO_PROTOCOL {
>>> =C2=A0=C2=A0EFI_SIMPLE_AUDIO_PROTOCOL_RESET Reset;
>>> =C2=A0=C2=A0EFI_SIMPLE_AUDIO_PROTOCOL_START Start;
>>> =C2=A0=C2=A0EFI_SIMPLE_AUDIO_PROTOCOL_STOP Stop;
>>> } EFI_SIMPLE_AUDIO_PROTOCOL;
>>> This way, users and driver developers have a simple audio protocol
>>> they can implement if they like. It would assume signed 16-bit PCM
>>> audio and mono channel mappings at 44100 Hz. Then, we can have an
>>> advanced protocol for each device type (HDA, USB, VirtIO, ...) that
>>> exposes all the knobs for sample formats, sample rates, that kind of
>>> thing. Obviously, like the majority (if not all) UEFI protocols, these
>>> advanced protocols would be optional. I think, however, that the
>>> simple audio protocol should be a required protocol in all UEFI
>>> implementations. But that might not be possible. So would this simpler
>>> interface work as a starting point?
>>>
>>> On 4/15/21, Andrew Fish <afish@apple.com <mailto:afish@apple.com>>=20
>>> wrote:
>>>>
>>>>> On Apr 15, 2021, at 1:11 PM, Ethin Probst <harlydavidsen@gmail.com=
=20
>>>>> <mailto:harlydavidsen@gmail.com>>
>>>>> wrote:
>>>>>
>>>>>> Is there any necessity for audio input and output to be implemented
>>>>>> within the same protocol? =C2=A0Unlike a network device (which is
>>>>>> intrinsically bidirectional), it seems natural to conceptually=20
>>>>>> separate
>>>>>> audio input from audio output.
>>>>> Nope, there isn't a necessity to make them in one, they can be
>>>>> separated into two.
>>>>>
>>>>>> The code controlling volume/mute may not have any access to the=20
>>>>>> sample
>>>>>> buffer. =C2=A0The most natural implementation would seem to allow f=
or a
>>>>>> platform to notice volume up/down keypresses and use those to=20
>>>>>> control the
>>>>>> overall system volume, without any knowledge of which samples (if=
=20
>>>>>> any)
>>>>>> are currently being played by other code in the system.
>>>>> Your assuming that the audio device your implementing the
>>>>> volume/muting has volume control and muting functionality within
>>>>> itself, then.
>>>> Not really. We are assuming that audio hardware has a better=20
>>>> understanding
>>>> of how that system implements volume than some generic EFI Code=20
>>>> that is by
>>>> definition platform agnostic.
>>>>
>>>>> This may not be the case, and so we'd need to
>>>>> effectively simulate it within the driver, which isn't too hard to d=
o.
>>>>> As an example, the VirtIO driver does not have a request type for
>>>>> muting or for volume control (this would, most likely, be within the
>>>>> VIRTIO_SND_R_PCM_SET_PARAMS request, see sec. 5.14.6.4.3). Therefore=
,
>>>>> either the driver would have to simulate the request or return
>>>>> EFI_UNSUPPORTED this instance.
>>>>>
>>>> So this is an example of above since the audio hardware knows it is=
=20
>>>> routing
>>>> the sound output into another subsystem, and that subsystem=20
>>>> controls the
>>>> volume. So the VirtIo Sound Driver know best how to bstract=20
>>>> volume/mute for
>>>> this platform.
>>>>
>>>>>> Consider also the point of view of the developer implementing a=20
>>>>>> driver
>>>>>> for some other piece of audio hardware that happens not to support
>>>>>> precisely the same sample rates etc as VirtIO. =C2=A0It would be ex=
tremely
>>>>>> ugly to force all future hardware to pretend to have the same
>>>>>> capabilities as VirtIO just because the API was initially=20
>>>>>> designed with
>>>>>> VirtIO in mind.
>>>>> Precisely, but the brilliance of VirtIO
>>>> The brilliance of VirtIO is that it just needs to implement a=20
>>>> generic device
>>>> driver for a given operating system. In most cases these operating=20
>>>> systems
>>>> have sounds subsystems that manage sound and want fine granularity of
>>>> control on what is going on. So the drivers are implemented to=20
>>>> maximizes
>>>> flexibility since the OS has lots of generic code that deals with=20
>>>> sound, and
>>>> even user configurable knobs to control audio. In our case that=20
>>>> extra layer
>>>> does not exist in EFI and the end user code just want to tell the=20
>>>> driver do
>>>> some simple things.
>>>>
>>>> Maybe it is easier to think about with an example. Lets say I want=20
>>>> to play a
>>>> cool sound on every boot. What would be the workflow to make the=20
>>>> happen.
>>>> 1) Encode the sound using a standard tool into a Wave PCM 16.
>>>> 2) Place the Wave file in the Firmware Volume using a given UUID as t=
he
>>>> name. As simple as editing the platform FDF file.
>>>> 3) Write some BDS code
>>>> =C2=A0=C2=A0a) Lookup Wave file by UUID and read it into memory.
>>>> =C2=A0=C2=A0b) Point the EFI Sound Protocol at the buffer with the Wa=
ve file
>>>> =C2=A0=C2=A0c) Tell the EFI Sound Protocol to play the sound.
>>>>
>>>> If you start adding in a lot of perimeters that work flow starts=20
>>>> getting
>>>> really complicated really quickly.
>>>>
>>>>> is that the sample rate,
>>>>> sample format, &c., do not have to all be supported by a VirtIO
>>>>> device. Notice, also, how in my protocol proposal I noted that the
>>>>> sample rates, at least, were "recommended," not "required." Should a
>>>>> device not happen to support a sample rate or sample format, all it
>>>>> needs to do is return EFI_INVALID_PARAMETER. Section 5.14.6.2.1
>>>>> (VIRTIO_SND_R_JACK_GET_CONFIG) describes how a jack tells you what
>>>>> sample rates it supports, channel mappings, &c.
>>>>>
>>>>> I do understand how just using a predefined sample rate and sample
>>>>> format might be a good idea, and if that's the best way, then that's
>>>>> what we'll do. The protocol can always be revised at a later time if
>>>>> necessary. I apologize if my stance seems obstinate.
>>>>>
>>>> I think if we add the version into the protocol and make sure we have=
 a
>>>> separate load and play operation we could add a member to set the ext=
ra
>>>> perimeters if needed. There might also be some platform specific=20
>>>> generic
>>>> tunables that might be interesting for a future member function.
>>>>
>>>> Thanks,
>>>>
>>>> Andrew Fish
>>>>
>>>>> Also, thank you, Laszlo, for your advice -- I hadn't considered that=
 a
>>>>> network driver would be another good way of figuring out how async
>>>>> works in UEFI.
>>>>>
>>>>> On 4/15/21, Andrew Fish <afish@apple.com <mailto:afish@apple.com>>=
=20
>>>>> wrote:
>>>>>>
>>>>>>> On Apr 15, 2021, at 5:07 AM, Michael Brown <mcb30@ipxe.org=20
>>>>>>> <mailto:mcb30@ipxe.org>> wrote:
>>>>>>>
>>>>>>> On 15/04/2021 06:28, Ethin Probst wrote:
>>>>>>>> - I hoped to add recording in case we in future want to add
>>>>>>>> accessibility aids like speech recognition (that was one of the=
=20
>>>>>>>> todo
>>>>>>>> tasks on the EDK2 tasks list)
>>>>>>> Is there any necessity for audio input and output to be implemente=
d
>>>>>>> within
>>>>>>> the same protocol? =C2=A0Unlike a network device (which is intrins=
ically
>>>>>>> bidirectional), it seems natural to conceptually separate audio=20
>>>>>>> input
>>>>>>> from
>>>>>>> audio output.
>>>>>>>
>>>>>>>> - Muting and volume control could easily be added by just replaci=
ng
>>>>>>>> the sample buffer with silence and by multiplying all the=20
>>>>>>>> samples by a
>>>>>>>> percentage.
>>>>>>> The code controlling volume/mute may not have any access to the=20
>>>>>>> sample
>>>>>>> buffer. =C2=A0The most natural implementation would seem to allow =
for a
>>>>>>> platform to notice volume up/down keypresses and use those to=20
>>>>>>> control
>>>>>>> the
>>>>>>> overall system volume, without any knowledge of which samples=20
>>>>>>> (if any)
>>>>>>> are
>>>>>>> currently being played by other code in the system.
>>>>>>>
>>>>>> I=E2=80=99ve also thought of adding NVRAM variable that would let s=
etup, UEFI
>>>>>> Shell,
>>>>>> or even the OS set the current volume, and Mute. This how it would =
be
>>>>>> consumed concept is why I proposed mute and volume being separate=
=20
>>>>>> APIs.
>>>>>> The
>>>>>> volume up/down API in addition to fixed percentage might be=20
>>>>>> overkill, but
>>>>>> it
>>>>>> does allow a non liner mapping to the volume up/down keys. You=20
>>>>>> would be
>>>>>> surprised how picky audiophiles can be and it seems they like to fi=
le
>>>>>> Bugzillas.
>>>>>>
>>>>>>>> - Finally, the reason I used enumerations for specifying paramete=
rs
>>>>>>>> like sample rate and stuff was that I was looking at this protoco=
l
>>>>>>>> from a general UEFI applications point of view. VirtIO supports=
=20
>>>>>>>> all of
>>>>>>>> the sample configurations listed in my gist, and it seems=20
>>>>>>>> reasonable
>>>>>>>> to allow the application to control those parameters instead of
>>>>>>>> forcing a particular parameter configuration onto the developer.
>>>>>>> Consider also the point of view of the developer implementing a=20
>>>>>>> driver
>>>>>>> for
>>>>>>> some other piece of audio hardware that happens not to support
>>>>>>> precisely
>>>>>>> the same sample rates etc as VirtIO. =C2=A0It would be extremely u=
gly to
>>>>>>> force
>>>>>>> all future hardware to pretend to have the same capabilities as=20
>>>>>>> VirtIO
>>>>>>> just because the API was initially designed with VirtIO in mind.
>>>>>>>
>>>>>>> As a developer on the other side of the API, writing code to=20
>>>>>>> play sound
>>>>>>> files on an arbitrary unknown platform, I would prefer to simply
>>>>>>> consume
>>>>>>> as simple as possible an abstraction of an audio output protocol a=
nd
>>>>>>> not
>>>>>>> have to care about what hardware is actually implementing it.
>>>>>>>
>>>>>> It may make sense to have an API to load the buffer/stream and=20
>>>>>> other APIs
>>>>>> to
>>>>>> play or pause. This could allow an optional API to configure how th=
e
>>>>>> stream
>>>>>> is played back. If we add a version to the Protocol that would at=
=20
>>>>>> least
>>>>>> future proof us.
>>>>>>
>>>>>> We did get feedback that it is very common to speed up the auto=20
>>>>>> playback
>>>>>> rates for accessibility. I=E2=80=99m not sure if that is practical =
with a=20
>>>>>> simple
>>>>>> PCM
>>>>>> 16 wave file with the firmware audio implementation. I guess that i=
s
>>>>>> something we could investigate.
>>>>>>
>>>>>> In terms of maybe adding text to speech there is an open source=20
>>>>>> project
>>>>>> that
>>>>>> conceptually we could port to EFI. It would likely be a binary that
>>>>>> would
>>>>>> have to live on the EFI System Partition due to size. I was thinkin=
g
>>>>>> that
>>>>>> words per minute could be part of that API and it would produce a=
=20
>>>>>> PCM 16
>>>>>> wave file that the audio protocol we are discussing could play.
>>>>>>
>>>>>>> Both of these argue in favour of defining a very simple API that
>>>>>>> expresses
>>>>>>> only a common baseline capability that is plausibly=20
>>>>>>> implementable for
>>>>>>> every piece of audio hardware ever made.
>>>>>>>
>>>>>>> Coupled with the relatively minimalistic requirements for boot-tim=
e
>>>>>>> audio,
>>>>>>> I'd probably suggest supporting only a single format for audio dat=
a,
>>>>>>> with
>>>>>>> a fixed sample rate (and possibly only mono output).
>>>>>>>
>>>>>> In my world the folks that work for Jony asked for a stereo boot=20
>>>>>> bong to
>>>>>> transition from left to right :). This is not the CODEC you are=20
>>>>>> looking
>>>>>> for
>>>>>> was our response=E2=80=A6. I also did not mention that some languag=
es are=20
>>>>>> right
>>>>>> to
>>>>>> left, as the only thing worse than one complex thing is two complex
>>>>>> things
>>>>>> to implement.
>>>>>>
>>>>>>> As always: perfection is achieved, not when there is nothing more =
to
>>>>>>> add,
>>>>>>> but when there is nothing left to take away. =C2=A0:)
>>>>>>>
>>>>>> "Simplicity is the ultimate sophistication=E2=80=9D
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Andrew Fish
>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Michael
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Signed,
>>>>> Ethin D. Probst
>>>>
>>>
>>
>>
>>
>
>=20