From: "Ethin Probst"
Date: Thu, 15 Apr 2021 18:42:13 -0500
Subject: Re: [edk2-devel] VirtIO Sound Driver (GSoC 2021)
To: Andrew Fish
Cc: edk2-devel-groups-io, Michael Brown, Mike Kinney, Leif Lindholm,
 Laszlo Ersek, "Desimone, Nathaniel L", Rafael Rodrigues Machado,
 Gerd Hoffmann

Hi Andrew,

What would that protocol interface look like if we utilized your idea?
With mine (though I need to add channel mapping as well), your workflow
for playing a stereo sound from left to right would probably be
something like this:

1) Encode the sound using a standard tool into a Wave PCM 16.
2) Place the Wave file in the Firmware Volume using a given UUID as the
name. As simple as editing the platform FDF file.
3) Write some BDS code (see the sketch below):
  a) Look up the Wave file by UUID and read it into memory.
  b) Decode the audio file (audio devices will not do this decoding for
you; you have to do that yourself).
  c) Call EFI_AUDIO_PROTOCOL.LoadBuffer(), passing in the sample rate of
your audio, EFI_AUDIO_PROTOCOL_SAMPLE_FORMAT_S16 for signed 16-bit PCM
audio, the channel mapping, the number of samples, and the samples
themselves.
  d) Call EFI_BOOT_SERVICES.CreateEvent()/EFI_BOOT_SERVICES.CreateEventEx()
for a playback event to signal.
  e) Call EFI_AUDIO_PROTOCOL.StartPlayback(), passing in the event you
just created.
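In rough code, step 3 might boil down to something like the sketch
below. Everything here is tentative: the LoadBuffer()/StartPlayback()
names and parameter order are just my proposal, and gEfiAudioProtocolGuid
is a placeholder name for whatever GUID ends up identifying the protocol.

EFI_STATUS
PlayBootSound (
  IN INT16   *Samples,          // decoded PCM from step 3b
  IN UINTN   NumberOfSamples,
  IN UINT32  SampleRate,
  IN UINT32  *ChannelMap
  )
{
  EFI_AUDIO_PROTOCOL  *Audio;
  EFI_EVENT           Done;
  UINTN               Index;
  EFI_STATUS          Status;

  //
  // Find whichever audio driver produced the (proposed) protocol.
  //
  Status = gBS->LocateProtocol (&gEfiAudioProtocolGuid, NULL, (VOID **)&Audio);
  if (EFI_ERROR (Status)) {
    return Status;
  }

  //
  // 3c) Describe the buffer; raw samples alone carry no metadata.
  //
  Status = Audio->LoadBuffer (
                    Audio,
                    SampleRate,
                    EFI_AUDIO_PROTOCOL_SAMPLE_FORMAT_S16,
                    ChannelMap,
                    NumberOfSamples,
                    Samples
                    );
  if (EFI_ERROR (Status)) {
    return Status;
  }

  //
  // 3d) Event the driver signals when playback completes.
  //
  Status = gBS->CreateEvent (0, TPL_CALLBACK, NULL, NULL, &Done);
  if (EFI_ERROR (Status)) {
    return Status;
  }

  //
  // 3e) Start playback and wait for completion.
  //
  Status = Audio->StartPlayback (Audio, Done);
  if (!EFI_ERROR (Status)) {
    Status = gBS->WaitForEvent (1, &Done, &Index);
  }

  gBS->CloseEvent (Done);
  return Status;
}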
The reason that LoadBuffer() takes so many parameters is because the
device does not know the audio that you're passing in. If I'm given an
array of 16-bit audio samples, it's impossible to know the parameters
(sample rate, sample format, channel mapping, etc.) from that alone.

Using your idea, though, my protocol could be greatly simplified. On the
other hand, forcing a particular channel mapping, sample rate and sample
format on everyone would complicate application code. From an
application point of view, one would, with that type of protocol, need
to do the following:

1) Load an audio file in any audio file format from any storage
mechanism.
2) Decode the audio file format to extract the samples and audio
metadata.
3) Resample the (now decoded) audio samples and convert (quantize) the
audio samples into signed 16-bit PCM audio.
4) Forward the samples onto the EFI audio protocol.

There is another option. (I'm happy we're discussing this now -- we can
hammer out all the details now, which will make a lot of things easier.)
Since I'll most likely end up splitting the device-specific interfaces
into different audio protocols, we could make a simple audio protocol
that makes various assumptions about the audio samples you're giving it
(e.g. sample rate, format, ...). This would just allow audio output and
input in signed 16-bit PCM audio, as you've suggested, and would be a
simple and easy-to-use interface. Something like:

typedef struct EFI_SIMPLE_AUDIO_PROTOCOL {
  EFI_SIMPLE_AUDIO_PROTOCOL_RESET  Reset;
  EFI_SIMPLE_AUDIO_PROTOCOL_START  Start;
  EFI_SIMPLE_AUDIO_PROTOCOL_STOP   Stop;
} EFI_SIMPLE_AUDIO_PROTOCOL;
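The member functions could be as minimal as the prototypes below. Again,
just a sketch: I'm assuming Start() takes the sample buffer directly
(since there is no separate load step), and in a real header these
typedefs, plus a forward declaration of the struct, would of course come
before the struct definition above.

typedef
EFI_STATUS
(EFIAPI *EFI_SIMPLE_AUDIO_PROTOCOL_RESET) (
  IN EFI_SIMPLE_AUDIO_PROTOCOL  *This
  );

//
// Samples are assumed to be signed 16-bit PCM, mono, 44100 Hz, so the
// caller only says where the samples are and how many there are.
//
typedef
EFI_STATUS
(EFIAPI *EFI_SIMPLE_AUDIO_PROTOCOL_START) (
  IN EFI_SIMPLE_AUDIO_PROTOCOL  *This,
  IN CONST INT16                *Samples,
  IN UINTN                      NumberOfSamples
  );

typedef
EFI_STATUS
(EFIAPI *EFI_SIMPLE_AUDIO_PROTOCOL_STOP) (
  IN EFI_SIMPLE_AUDIO_PROTOCOL  *This
  );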
This way, users and driver developers have a simple audio protocol they
can implement if they like. It would assume signed 16-bit PCM audio and
mono channel mappings at 44100 Hz. Then, we can have an advanced
protocol for each device type (HDA, USB, VirtIO, ...) that exposes all
the knobs for sample formats, sample rates, that kind of thing.
Obviously, like the majority (if not all) of UEFI protocols, these
advanced protocols would be optional. I think, however, that the simple
audio protocol should be a required protocol in all UEFI
implementations. But that might not be possible.

So would this simpler interface work as a starting point?

On 4/15/21, Andrew Fish wrote:
>
>
>> On Apr 15, 2021, at 1:11 PM, Ethin Probst
>> wrote:
>>
>>> Is there any necessity for audio input and output to be implemented
>>> within the same protocol? Unlike a network device (which is
>>> intrinsically bidirectional), it seems natural to conceptually
>>> separate audio input from audio output.
>>
>> Nope, there isn't a necessity to make them in one, they can be
>> separated into two.
>>
>>> The code controlling volume/mute may not have any access to the
>>> sample buffer. The most natural implementation would seem to allow
>>> for a platform to notice volume up/down keypresses and use those to
>>> control the overall system volume, without any knowledge of which
>>> samples (if any) are currently being played by other code in the
>>> system.
>>
>> You're assuming that the audio device you're implementing the
>> volume/muting for has volume control and muting functionality within
>> itself, then.
>
> Not really. We are assuming that audio hardware has a better
> understanding of how that system implements volume than some generic
> EFI code that is by definition platform agnostic.
>
>> This may not be the case, and so we'd need to effectively simulate it
>> within the driver, which isn't too hard to do. As an example, the
>> VirtIO driver does not have a request type for muting or for volume
>> control (this would, most likely, be within the
>> VIRTIO_SND_R_PCM_SET_PARAMS request, see sec. 5.14.6.4.3). Therefore,
>> either the driver would have to simulate the request or return
>> EFI_UNSUPPORTED in this instance.
>
> So this is an example of the above, since the audio hardware knows it
> is routing the sound output into another subsystem, and that subsystem
> controls the volume. So the VirtIO Sound Driver knows best how to
> abstract volume/mute for this platform.
>
>>> Consider also the point of view of the developer implementing a
>>> driver for some other piece of audio hardware that happens not to
>>> support precisely the same sample rates etc as VirtIO. It would be
>>> extremely ugly to force all future hardware to pretend to have the
>>> same capabilities as VirtIO just because the API was initially
>>> designed with VirtIO in mind.
>>
>> Precisely, but the brilliance of VirtIO
>
> The brilliance of VirtIO is that it just needs to implement a generic
> device driver for a given operating system. In most cases these
> operating systems have sound subsystems that manage sound and want
> fine granularity of control over what is going on. So the drivers are
> implemented to maximize flexibility, since the OS has lots of generic
> code that deals with sound, and even user-configurable knobs to
> control audio. In our case that extra layer does not exist in EFI and
> the end-user code just wants to tell the driver to do some simple
> things.
>
> Maybe it is easier to think about with an example. Let's say I want to
> play a cool sound on every boot. What would be the workflow to make
> that happen?
> 1) Encode the sound using a standard tool into a Wave PCM 16.
> 2) Place the Wave file in the Firmware Volume using a given UUID as
> the name. As simple as editing the platform FDF file.
> 3) Write some BDS code
>   a) Lookup Wave file by UUID and read it into memory.
>   b) Point the EFI Sound Protocol at the buffer with the Wave file.
>   c) Tell the EFI Sound Protocol to play the sound.
>
> If you start adding in a lot of parameters that workflow starts
> getting really complicated really quickly.
>
>> is that the sample rate, sample format, &c., do not have to all be
>> supported by a VirtIO device. Notice, also, how in my protocol
>> proposal I noted that the sample rates, at least, were "recommended,"
>> not "required." Should a device not happen to support a sample rate
>> or sample format, all it needs to do is return EFI_INVALID_PARAMETER.
>> Section 5.14.6.2.1 (VIRTIO_SND_R_JACK_GET_CONFIG) describes how a
>> jack tells you what sample rates it supports, channel mappings, &c.
>>
>> I do understand how just using a predefined sample rate and sample
>> format might be a good idea, and if that's the best way, then that's
>> what we'll do. The protocol can always be revised at a later time if
>> necessary. I apologize if my stance seems obstinate.
>>
>
> I think if we add the version into the protocol and make sure we have
> a separate load and play operation, we could add a member to set the
> extra parameters if needed. There might also be some platform-specific
> generic tunables that might be interesting for a future member
> function.
>
> Thanks,
>
> Andrew Fish
>
>> Also, thank you, Laszlo, for your advice -- I hadn't considered that
>> a network driver would be another good way of figuring out how async
>> works in UEFI.
>>
>> On 4/15/21, Andrew Fish wrote:
>>>
>>>
>>>> On Apr 15, 2021, at 5:07 AM, Michael Brown wrote:
>>>>
>>>> On 15/04/2021 06:28, Ethin Probst wrote:
>>>>> - I hoped to add recording in case we in future want to add
>>>>> accessibility aids like speech recognition (that was one of the
>>>>> todo tasks on the EDK2 tasks list)
>>>>
>>>> Is there any necessity for audio input and output to be implemented
>>>> within the same protocol? Unlike a network device (which is
>>>> intrinsically bidirectional), it seems natural to conceptually
>>>> separate audio input from audio output.
>>>>
>>>>> - Muting and volume control could easily be added by just
>>>>> replacing the sample buffer with silence and by multiplying all
>>>>> the samples by a percentage.
>>>>
>>>> The code controlling volume/mute may not have any access to the
>>>> sample buffer. The most natural implementation would seem to allow
>>>> for a platform to notice volume up/down keypresses and use those to
>>>> control the overall system volume, without any knowledge of which
>>>> samples (if any) are currently being played by other code in the
>>>> system.
>>>>
>>>
>>> I've also thought of adding an NVRAM variable that would let setup,
>>> the UEFI Shell, or even the OS set the current volume, and Mute.
>>> This "how it would be consumed" concept is why I proposed mute and
>>> volume being separate APIs. The volume up/down API in addition to a
>>> fixed percentage might be overkill, but it does allow a non-linear
>>> mapping to the volume up/down keys. You would be surprised how picky
>>> audiophiles can be and it seems they like to file Bugzillas.
>>>
>>>>> - Finally, the reason I used enumerations for specifying
>>>>> parameters like sample rate and stuff was that I was looking at
>>>>> this protocol from a general UEFI applications point of view.
>>>>> VirtIO supports all of the sample configurations listed in my
>>>>> gist, and it seems reasonable to allow the application to control
>>>>> those parameters instead of forcing a particular parameter
>>>>> configuration onto the developer.
>>>>
>>>> Consider also the point of view of the developer implementing a
>>>> driver for some other piece of audio hardware that happens not to
>>>> support precisely the same sample rates etc as VirtIO. It would be
>>>> extremely ugly to force all future hardware to pretend to have the
>>>> same capabilities as VirtIO just because the API was initially
>>>> designed with VirtIO in mind.
>>>>
>>>> As a developer on the other side of the API, writing code to play
>>>> sound files on an arbitrary unknown platform, I would prefer to
>>>> simply consume as simple as possible an abstraction of an audio
>>>> output protocol and not have to care about what hardware is
>>>> actually implementing it.
>>>>
>>>
>>> It may make sense to have an API to load the buffer/stream and other
>>> APIs to play or pause. This could allow an optional API to configure
>>> how the stream is played back.
>>> If we add a version to the Protocol, that would at least
>>> future-proof us.
>>>
>>> We did get feedback that it is very common to speed up the audio
>>> playback rates for accessibility. I'm not sure if that is practical
>>> with a simple PCM 16 wave file with the firmware audio
>>> implementation. I guess that is something we could investigate.
>>>
>>> In terms of maybe adding text-to-speech, there is an open source
>>> project that conceptually we could port to EFI. It would likely be a
>>> binary that would have to live on the EFI System Partition due to
>>> size. I was thinking that words per minute could be part of that API
>>> and it would produce a PCM 16 wave file that the audio protocol we
>>> are discussing could play.
>>>
>>>> Both of these argue in favour of defining a very simple API that
>>>> expresses only a common baseline capability that is plausibly
>>>> implementable for every piece of audio hardware ever made.
>>>>
>>>> Coupled with the relatively minimalistic requirements for boot-time
>>>> audio, I'd probably suggest supporting only a single format for
>>>> audio data, with a fixed sample rate (and possibly only mono
>>>> output).
>>>>
>>>
>>> In my world the folks that work for Jony asked for a stereo boot
>>> bong to transition from left to right :). "This is not the CODEC you
>>> are looking for" was our response… I also did not mention that some
>>> languages are right to left, as the only thing worse than one
>>> complex thing is two complex things to implement.
>>>
>>>> As always: perfection is achieved, not when there is nothing more
>>>> to add, but when there is nothing left to take away. :)
>>>>
>>>
>>> "Simplicity is the ultimate sophistication"
>>>
>>> Thanks,
>>>
>>> Andrew Fish
>>>
>>>> Thanks,
>>>>
>>>> Michael
>>>
>>
>> --
>> Signed,
>> Ethin D. Probst
>
>

--
Signed,
Ethin D. Probst