Subject: Re: [edk2-devel] VirtIO Sound Driver (GSoC 2021)
To: devel@edk2.groups.io, afish@apple.com, Ethin Probst
Cc: Leif Lindholm, Michael Brown, Mike Kinney, Laszlo Ersek, "Desimone, Nathaniel L", Rafael Rodrigues Machado, Gerd Hoffmann
From: Marvin Häuser <mhaeuser@posteo.de>
Message-ID: <6c0a4bf5-482e-b4f2-5df4-74930f4d979c@posteo.de>
Date: Sun, 18 Apr 2021
19:11:10 +0000

On 18.04.21 17:22, Andrew Fish via groups.io wrote:
>
>> On Apr 18, 2021, at 1:55 AM, Ethin Probst wrote:
>>
>>> I think it would be best to sketch use-cases for audio and design
>>> the solutions closely to the requirements. Why do we need to know
>>> when audio finished? What will happen when we queue audio twice?
>>> There are many layers (UX, interface, implementation details) of
>>> questions in coming up with a pleasant and stable design.
>
> We are not using EFI to listen to music in the background. Any audio
> being played is part of a UI element, and there might be
> synchronization requirements.

Maybe I communicated that wrong. I'm not asking because I don't know
what audio is used for; I am saying that ideally there is a written-down
list of usage requirements before the protocol is designed, because that
is what the design targets. The details should follow the needs.

> For example, playing a boot bong on boot, you want that to be
> asynchronous, as you don't want to delay boot to play the sound, but
> you may want to choose to gate some UI elements on the boot bong
> completing. If you are building a menu UI that is accessible, you may
> need to synchronize playback with UI updates, but you may not want to
> make the slow sound playback blocking, as you can get other UI work
> done in parallel.
>
> The overhead for a caller making an async call is not much [1], but
> not having the capability could really restrict the API for its
> intended use. I'd also point out we picked the same pattern as the
> async BlockIo, and there is something to be said for having
> consistency in the UEFI Spec and having similar APIs work in similar
> ways.
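[Editor's note: the BlockIo2-style caller-owned token pattern under discussion can be sketched in plain C. This is only an illustration of the ownership question, not code from any UEFI specification; AUDIO_TOKEN, StartPlayback and DriverCompletePlayback are hypothetical names, and a plain boolean stands in for the EFI_EVENT.]

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative only: a caller-owned completion token in the style of
 * EFI_BLOCK_IO2_TOKEN, simulated in plain C. */
typedef struct {
  bool Signaled;   /* stands in for the caller-owned EFI_EVENT */
  int  Status;     /* completion status filled in by the driver */
} AUDIO_TOKEN;

/* Driver side: remember the token so completion can signal it. */
static AUDIO_TOKEN *mPendingToken = NULL;

int StartPlayback (AUDIO_TOKEN *Token)
{
  if (Token == NULL) {
    return -1;               /* synchronous mode not modelled here */
  }
  Token->Signaled = false;
  mPendingToken   = Token;   /* the driver now holds a caller pointer; */
  return 0;                  /* if the caller frees it early, it dangles */
}

/* Called by the driver when the hardware finishes the buffer. */
void DriverCompletePlayback (int Status)
{
  if (mPendingToken != NULL) {
    mPendingToken->Status   = Status;
    mPendingToken->Signaled = true;   /* gBS->SignalEvent in real code */
    mPendingToken           = NULL;
  }
}
```

The dangling-pointer window between StartPlayback and DriverCompletePlayback is exactly the resource-lifetime concern raised below: the token (and its event) is a caller-owned resource the driver must track per caller.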
I'm not saying there should be *no* async playback, I am saying it may
be worth considering implementing it differently from caller-owned
events. I'm not concerned with overhead, I'm concerned with points of
failure (e.g. leaks).

I very briefly discussed some things with Ethin, and it seems like the
default EDK II timer interval of 10 ms may be problematic, but I am not
sure. Just leaving it here as something to keep in mind.

Best regards,
Marvin

>
> [1] Overhead for making an asynchronous call:
>
>   AUDIO_TOKEN AudioToken;
>   gBS->CreateEvent (EVT_NOTIFY_SIGNAL, TPL_CALLBACK, NULL, NULL,
>     &AudioToken.Event);
>
> Thanks,
>
> Andrew Fish
>
>> I would be happy to discuss this with you on the UEFI Talkbox; I'm
>> draeand on there.
>> As for your questions:
>>
>> 1. The only reason I recommend using an event to signal audio
>> completion is because I do not want this protocol to be blocking at
>> all. (So, perhaps removing the token entirely is a good idea.) The
>> VirtIO audio device says nothing about synchronization, but I imagine
>> it's asynchronous, because every audio specification I've seen out
>> there is asynchronous. Similarly, every audio API in existence -- at
>> least, every low-level OS-specific one -- is
>> asynchronous/non-blocking. (Usually, audio processing is handled on a
>> separate thread.) However, UEFI has no concept of threads or
>> processes. Though we could use the MP PI package to spin up a
>> separate processor, that would fail on uniprocessor, unicore systems.
>> Audio processing needs a priority high enough that it is first in the
>> list of tasks served, while simultaneously not getting a priority
>> that's so high that it blocks everything else.
>> This is primarily because of the way an audio
>> subsystem is designed and the way an audio device functions: the
>> audio subsystem needs to know, immediately, when the audio buffer has
>> run out of samples and needs more, and it needs to react immediately
>> to refill the buffer if required, especially when streaming large
>> amounts of audio (e.g. music). Similarly, the audio subsystem needs
>> the ability to react as soon as is viable when playback is requested,
>> because any significant delay will be noticeable to the end-user. In
>> more complex systems like FMOD or OpenAL, the audio processing thread
>> also needs a high priority to ensure that audio effects, positioning
>> information, dithering, etc., can be configured immediately, because
>> the user will notice if any glitches or delays occur. The UEFI audio
>> protocols obviously will be nowhere near as complex, or as advanced,
>> because no one will need audio effects in a preboot environment.
>> Granted, it's possible to make small audio effects, for example
>> delays, even if the protocol doesn't have functions to do that, but
>> if an end-user wants to go absolutely crazy with the audio samples
>> and mix in a really nice-sounding reverb or audio filter before
>> sending the samples to the audio engine, well, that's what they want
>> to do and that's out of our hands as driver/protocol developers. But
>> I digress. UEFI only has four TPLs, and so what we hopefully want is
>> an engine that is able to manage sample buffering and transmission,
>> but also doesn't block the application that's using the protocol. For
>> some things, blocking might be acceptable, but for speech synthesis
>> or the playing of startup sounds, this would not be an acceptable
>> result and would make the protocol pretty much worthless in the
>> majority of scenarios. So that's why I had an event to signal audio
>> completion -- it was (perhaps) a cheap hack around the
>> cooperatively-scheduled task architecture of UEFI.
>> (At least, I think it's cooperative multitasking;
>> correct me if I'm wrong.)
>> 2. The VirtIO specification does not specify what occurs in the event
>> that a request is received to play a stream that's already being
>> played. However, it does provide enough information for
>> extrapolation. Every request that's sent to a VirtIO sound device
>> must come with two things: a stream ID and a buffer of samples. The
>> sample data must immediately follow the request. Therefore, for
>> VirtIO in particular, the device will simply stop playing the old set
>> of samples and play the new set instead. This goes along with what
>> I've seen in other specifications like the HDA one: unless the device
>> in question supports more than one stream, it is impossible to play
>> two sounds on a single stream simultaneously, and an HDA controller
>> (for example) is not going to perform any mixing; mixing is done
>> purely in software. Similarly, if a device does support multiple
>> streams, it is unspecified whether the device will play two or more
>> streams simultaneously or whether it will pause/abort the playback of
>> one while it plays another. Therefore, I believe (though cannot
>> confirm) that OSes like Windows simply use a single stream, even if
>> the device supports multiple streams, and just make the applications
>> believe that unlimited streams are possible.
>>
>> I apologize for this really long-winded email, and I hope no one
>> minds.
>> :-)
>>
>> On 4/17/21, Marvin Häuser wrote:
>>> On 17.04.21 19:31, Andrew Fish via groups.io wrote:
>>>>
>>>>> On Apr 17, 2021, at 9:51 AM, Marvin Häuser wrote:
>>>>>
>>>>> On 16.04.21 19:45, Ethin Probst wrote:
>>>>>> Yes, three APIs (maybe like this) would work well:
>>>>>> - Start, Stop: begin playback of a stream
>>>>>> - SetVolume, GetVolume, Mute, Unmute: control volume of output
>>>>>>   and enable muting
>>>>>> - CreateStream, ReleaseStream, SetStreamSampleRate: control the
>>>>>>   sample rate of a stream (but not the sample format, since
>>>>>>   signed 16-bit PCM is enough)
>>>>>> Marvin, how do you suggest we make the events then? We need some
>>>>>> way of notifying the caller that the stream has concluded. We
>>>>>> could make the driver create the event and pass it back to the
>>>>>> caller as an event, but you'd still have dangling pointers (this
>>>>>> is C, after all). We could just make an IsPlaying() function and
>>>>>> a WaitForCompletion() function and allow the driver to do the
>>>>>> event handling -- would that work?
>>>>>
>>>>> I do not know enough about the possible use-cases to tell. Aside
>>>>> from the two functions you already mentioned, you could also take
>>>>> in an (optional) notification function.
>>>>> Which possible use-cases does determining playback end have? If
>>>>> it's too much effort, just use EFI_EVENT, I guess; the less code
>>>>> can mess it up, the better.
>>>>
>>>> In UEFI, EFI_EVENT works much better. There is a
>>>> gBS->WaitForEvent() function that lets a caller wait on an event.
>>>> That is basically what the UEFI Shell is doing at the Shell prompt.
>>>> A GUI in UEFI/C is basically an event loop.
>>>>
>>>> Fun fact: I ended up adding gIdleLoopEventGuid to the MdeModulePkg
>>>> so the DXE Core could signal gIdleLoopEventGuid if you are sitting
>>>> in gBS->WaitForEvent() and no event is signaled.
>>>> Basically, in EFI nothing
>>>> is going to happen until the next timer tick, so the
>>>> gIdleLoopEventGuid lets you idle the CPU until the next timer tick.
>>>> I was forced to do this as the 1st MacBook Air had a bad habit of
>>>> thermal tripping when sitting at the UEFI Shell prompt. After all,
>>>> another name for a loop in C code running on bare metal is a power
>>>> virus.
>>>
>>> Mac EFI is one of the best implementations we know of, frankly. I'm
>>> traumatised by Aptio 4 and the like, where (some issues are
>>> OEM-specific, I think) you can have timer events signalling after
>>> ExitBS, there is event clutter on IO polling to the point where
>>> everything lags no matter what you do, and even in "smooth"
>>> scenarios there may be nothing worth the description "granularity"
>>> (events scheduled to run every 10 ms may run every 50 ms). Events
>>> are the last resort for us, if there really is no other way. My
>>> first GUI implementation worked without events at all for this
>>> reason, but as our workarounds got better, we did start using them
>>> for keyboard and mouse polling.
>>>
>>> Timers do not apply here, but what does apply is resource
>>> management. Using EFI_EVENT directly means (to the outside) the
>>> introduction of a new resource to maintain, for each caller
>>> separately. On the other side, there is no resource to misuse or
>>> leak if none is exposed. Yet, if you argue with APIs like
>>> WaitForEvent, something has to signal it. In a simple environment
>>> this would mean some timer event is running and may signal the event
>>> the main code waits for, where the above concerns actually do apply.
>>> :) Again, the recommendation assumes the use-cases are simple enough
>>> to easily avoid them.
>>>
>>> I think it would be best to sketch use-cases for audio and design
>>> the solutions closely to the requirements. Why do we need to know
>>> when audio finished? What will happen when we queue audio twice?
>>> There are many
>>> layers (UX, interface, implementation details) of questions in
>>> coming up with a pleasant and stable design.
>>>
>>> Best regards,
>>> Marvin
>>>
>>>> Thanks,
>>>>
>>>> Andrew Fish.
>>>>
>>>>> If I remember correctly, you mentioned the UEFI Talkbox before; if
>>>>> that is more convenient for you, I'm there as mhaeuser.
>>>>>
>>>>> Best regards,
>>>>> Marvin
>>>>>
>>>>>> On 4/16/21, Andrew Fish wrote:
>>>>>>>
>>>>>>>> On Apr 16, 2021, at 4:34 AM, Leif Lindholm wrote:
>>>>>>>>
>>>>>>>> Hi Ethin,
>>>>>>>>
>>>>>>>> I think we also want to have a SetMode function, even if we
>>>>>>>> don't get around to implementing proper support for it as part
>>>>>>>> of GSoC (although I expect at least for virtio, that should be
>>>>>>>> pretty straightforward).
>>>>>>>>
>>>>>>> Leif,
>>>>>>>
>>>>>>> I'm thinking if we have an API to load the buffer and a 2nd API
>>>>>>> to play the buffer, an optional 3rd API could configure the
>>>>>>> streams.
>>>>>>>
>>>>>>>> It's quite likely that speech for UI would be stored as 8 kHz
>>>>>>>> (or 20 kHz) in some systems, whereas the example for playing a
>>>>>>>> tune in GRUB would more likely be a 44.1 kHz mp3/wav/ogg/flac.
>>>>>>>>
>>>>>>>> For the GSoC project, I think it would be quite reasonable to
>>>>>>>> pre-generate pure PCM streams for testing rather than decoding
>>>>>>>> anything on the fly.
>>>>>>>>
>>>>>>>> Porting/writing decoders is really a separate task from
>>>>>>>> enabling the output. I would much rather see USB *and* HDA
>>>>>>>> support able to play PCM streams before worrying about
>>>>>>>> decoding.
>>>>>>>>
>>>>>>> I agree it might turn out it is easier to have the
>>>>>>> text-to-speech code just encode a PCM directly.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Andrew Fish
>>>>>>>
>>>>>>>> /
>>>>>>>>     Leif
>>>>>>>>
>>>>>>>> On Fri, Apr 16, 2021 at 00:33:06 -0500, Ethin Probst wrote:
>>>>>>>>> Thanks for that explanation (I missed Mike's message). Earlier
>>>>>>>>> I sent a summary of those things that we can agree on: mainly,
>>>>>>>>> that we have mute, volume control, a load buffer, (maybe) an
>>>>>>>>> unload buffer, and a start/stop stream function. Now that I
>>>>>>>>> fully understand the ramifications of this, I don't mind
>>>>>>>>> settling for a specific format and sample rate, and signed
>>>>>>>>> 16-bit PCM audio is, I think, the most widely used one out
>>>>>>>>> there, besides 64-bit floating-point samples, which I've only
>>>>>>>>> seen used in DAWs, and that's something we don't need.
>>>>>>>>> Are you sure you want the firmware itself to handle the
>>>>>>>>> decoding of WAV audio? I can make a library class for that,
>>>>>>>>> but I'll definitely need help with the security aspect.
>>>>>>>>>
>>>>>>>>> On 4/16/21, Andrew Fish via groups.io wrote:
>>>>>>>>>>
>>>>>>>>>>> On Apr 15, 2021, at 5:59 PM, Michael Brown wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 16/04/2021 00:42, Ethin Probst wrote:
>>>>>>>>>>>> Forcing a particular channel mapping, sample rate and
>>>>>>>>>>>> sample format on everyone would complicate application
>>>>>>>>>>>> code. From an application point of view, one would, with
>>>>>>>>>>>> that type of protocol, need to do the following:
>>>>>>>>>>>> 1) Load an audio file in any audio file format from any
>>>>>>>>>>>> storage mechanism.
>>>>>>>>>>>> 2) Decode the audio file format to extract the samples and
>>>>>>>>>>>> audio metadata.
>>>>>>>>>>>> 3) Resample the (now decoded) audio samples and convert
>>>>>>>>>>>> (quantize) the audio samples into signed 16-bit PCM audio.
>>>>>>>>>>>> 4) Forward the samples onto the EFI audio protocol.
>>>>>>>>>>> You have made an incorrect assumption that there exists a
>>>>>>>>>>> requirement to be able to play audio files in arbitrary
>>>>>>>>>>> formats. This requirement does not exist.
>>>>>>>>>>>
>>>>>>>>>>> With a protocol-mandated fixed baseline set of audio
>>>>>>>>>>> parameters (sample rate etc.), what would happen in practice
>>>>>>>>>>> is that the audio files would be encoded in that format at
>>>>>>>>>>> *build* time, using tools entirely external to UEFI. The
>>>>>>>>>>> application code is then trivially simple: it just does
>>>>>>>>>>> "load blob, pass blob to audio protocol".
>>>>>>>>>>>
>>>>>>>>>> Ethin,
>>>>>>>>>>
>>>>>>>>>> Given the goal is an industry standard, we value
>>>>>>>>>> interoperability more than flexibility.
>>>>>>>>>>
>>>>>>>>>> How about another use case. Let's say the Linux OS loader
>>>>>>>>>> (GRUB) wants to have an accessible UI, so it decides to store
>>>>>>>>>> sound files on the EFI System Partition and use our new fancy
>>>>>>>>>> UEFI Audio Protocol to add audio to the OS loader GUI. So
>>>>>>>>>> that version of GRUB needs to work on 1,000s of different PCs
>>>>>>>>>> and a wide range of UEFI Audio driver implementations. It is
>>>>>>>>>> a much easier world if Wave PCM 16-bit just works every
>>>>>>>>>> place.
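[Editor's note: step 3 of the pipeline above, whether done at build time or in the application, ends in the same conversion. A plain-C sketch of quantizing normalized floating-point samples to the signed 16-bit PCM baseline, with clamping; the function name is illustrative, not part of any protocol.]

```c
#include <stdint.h>
#include <stddef.h>

/* Quantize normalized floating-point samples (-1.0 .. 1.0) to signed
 * 16-bit PCM, clamping out-of-range input instead of wrapping.
 * QuantizeToS16 is an illustrative name only. */
void QuantizeToS16 (const double *In, int16_t *Out, size_t Count)
{
  for (size_t Index = 0; Index < Count; ++Index) {
    double Sample = In[Index];
    if (Sample > 1.0) {
      Sample = 1.0;     /* clamp rather than wrap: integer wraparound */
    }
    if (Sample < -1.0) {
      Sample = -1.0;    /* is audible as harsh clicks */
    }
    Out[Index] = (int16_t)(Sample * 32767.0);
  }
}
```

With a fixed baseline format, this conversion lives in the external build tooling, which keeps the in-firmware application down to "load blob, pass blob to audio protocol" as Michael describes.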
>>>>>>>>>> You could add a
>>>>>>>>>> lot of complexity and try to encode the audio on the fly,
>>>>>>>>>> maybe even in Linux proper, but that falls down if you are
>>>>>>>>>> booting from read-only media like a DVD or backup tape (yes,
>>>>>>>>>> people still do that in server land).
>>>>>>>>>>
>>>>>>>>>> The other problem with flexibility is you just made the test
>>>>>>>>>> matrix very large for every driver that needs to get
>>>>>>>>>> implemented. For something as complex as Intel HDA, how you
>>>>>>>>>> hook up the hardware and what CODECs you use may impact the
>>>>>>>>>> quality of the playback for a given board. Your EFI is likely
>>>>>>>>>> going to pick a single encoding that will get tested all the
>>>>>>>>>> time if your system has audio, but all 50 other things you
>>>>>>>>>> support, not so much. So that will require testing, and
>>>>>>>>>> someone with audiophile ears (or an AI program) to test all
>>>>>>>>>> the combinations. I'm not kidding, I get BZs on the quality
>>>>>>>>>> of the boot bong on our systems.
>>>>>>>>>>
>>>>>>>>>>>> typedef struct EFI_SIMPLE_AUDIO_PROTOCOL {
>>>>>>>>>>>>   EFI_SIMPLE_AUDIO_PROTOCOL_RESET Reset;
>>>>>>>>>>>>   EFI_SIMPLE_AUDIO_PROTOCOL_START Start;
>>>>>>>>>>>>   EFI_SIMPLE_AUDIO_PROTOCOL_STOP Stop;
>>>>>>>>>>>> } EFI_SIMPLE_AUDIO_PROTOCOL;
>>>>>>>>>>> This is now starting to look like something that belongs in
>>>>>>>>>>> boot-time firmware. :)
>>>>>>>>>>>
>>>>>>>>>> I think that got a little too simple. I'd go back and look at
>>>>>>>>>> the example I posted to the thread, but add an API to load
>>>>>>>>>> the buffer, and then play the buffer (that way we can add an
>>>>>>>>>> API in the future to twiddle knobs).
>>>>>>>>>> That API also implements the async EFI interface. Trust me,
>>>>>>>>>> the 1st thing that is going to happen when we add audio is
>>>>>>>>>> someone is going to complain that in xyz state we should mute
>>>>>>>>>> audio, or we should honor the audio volume and mute settings
>>>>>>>>>> from setup, or values set in the OS. Or someone is going to
>>>>>>>>>> want the volume keys on the keyboard to work in EFI.
>>>>>>>>>>
>>>>>>>>>> Also, if you need to pick apart the Wave PCM 16-bit file to
>>>>>>>>>> feed it into the audio hardware, that probably means we
>>>>>>>>>> should have a library that does that work, so other audio
>>>>>>>>>> drivers can share that code. Also, having a library makes it
>>>>>>>>>> easier to write a unit test. We need to be security-conscious,
>>>>>>>>>> as we need to treat the audio file as attacker-controlled
>>>>>>>>>> data.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Andrew Fish
>>>>>>>>>>
>>>>>>>>>>> Michael
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Signed,
>>>>>>>>> Ethin D. Probst
>>
>> --
>> Signed,
>> Ethin D. Probst
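[Editor's note: Andrew's point about treating the audio file as attacker-controlled data is worth a concrete sketch. Below is a minimal check of the canonical 44-byte RIFF/WAVE PCM header in plain C, with explicit bounds and field validation. The function name and the accepted limits are illustrative assumptions, not an actual EDK II library API; a real library would also walk optional chunks rather than require the canonical layout.]

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <stdbool.h>

/* Validate the fixed 44-byte canonical RIFF/WAVE PCM header without
 * trusting any attacker-supplied size field. Illustrative only. */
bool ValidateWavHeader (const uint8_t *File, size_t FileSize)
{
  if (File == NULL || FileSize < 44) {
    return false;                 /* too small for any valid header */
  }
  if (memcmp (File, "RIFF", 4) != 0 || memcmp (File + 8, "WAVE", 4) != 0) {
    return false;
  }
  if (memcmp (File + 12, "fmt ", 4) != 0) {
    return false;                 /* canonical layout only, see note */
  }
  /* All multi-byte fields are little-endian per the RIFF format. */
  uint16_t Format   = (uint16_t)(File[20] | (File[21] << 8));
  uint16_t Channels = (uint16_t)(File[22] | (File[23] << 8));
  uint16_t Bits     = (uint16_t)(File[34] | (File[35] << 8));
  if (Format != 1 || Bits != 16 || Channels == 0 || Channels > 2) {
    return false;                 /* baseline: 16-bit integer PCM */
  }
  if (memcmp (File + 36, "data", 4) != 0) {
    return false;
  }
  uint32_t DataSize = (uint32_t)File[40] | ((uint32_t)File[41] << 8) |
                      ((uint32_t)File[42] << 16) | ((uint32_t)File[43] << 24);
  /* The critical check: never trust the declared data size over the
   * actual file size, or a crafted header causes out-of-bounds reads. */
  if (DataSize > FileSize - 44) {
    return false;
  }
  return true;
}
```

Keeping this logic in a shared library, as Andrew suggests, both avoids duplicating it across HDA, USB and VirtIO drivers and makes the attacker-controlled-input checks unit-testable in isolation.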