Subject: Re: [edk2-devel] VirtIO Sound Driver (GSoC 2021)
From: "Andrew Fish"
Date: Fri, 16 Apr 2021 11:09:36 -0700
To: edk2-devel-groups-io, Ethin Probst
Cc: Leif Lindholm, Michael Brown, Mike Kinney, Laszlo Ersek, "Desimone, Nathaniel L", Rafael Rodrigues Machado, Gerd Hoffmann
Message-id: <2B5C3A05-95BC-4239-A84D-B262F1A2D6DC@apple.com>

> On Apr 16, 2021, at 10:55 AM, Ethin Probst wrote:
>
> Also, forgot to add this before sending: yes, speech synthesizers
> usually generate the PCM audio on the fly. They can write it to an
> output file, but if you're just using it in a screen reader, then you
> end up streaming it to the audio device. This raises another issue I
> was pondering, but I (don't think) we need to handle that quite yet.
> The problem involves output generated by the HII, console, etc.: when
> we're using a speech synthesizer, it might be configured to speak at a
> faster or slower rate depending on the preferences of the user (we
> definitely want to let the user control these things because different
> people have different preferences on the speed of speech synthesizers:
> some can understand it at really fast rates and others can't, for
> example). The problem arises when we want to forward output from the
> screen (say, the simple text output protocol). Assume that a user is
> running the EFI shell as an example, which, if I'm not mistaken, uses
> this protocol. The shell calls
> EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL.OutputString(). We probably then want
> this function to forward the string passed in onto the speech
> synthesizer, assuming that accessibility features are enabled (I'm
> assuming we'd want to make that a toggle). The problem is that one can
> call EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL.OutputString() many times. During
> all these calls, text is being sent to the synthesizer, it's generating
> samples, and forwarding those samples onto the audio output protocol.
> The problem is: how do we ensure that these samples don't overlap or
> cause other problems (e.g.: interrupt streams that are still being
> processed)? As I said, these are things we don't need to necessarily
> consider now, but this is a problem we'll need to tackle in the
> future.
>

I’m not sure converting EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL to audio directly would work from a practical point of view.
If you look at the Hii-based setup, a lot of text and a lot of text-based graphics characters get dumped onto the screen in no particular order. For the UEFI Shell there is a lot of cursor control, and the prompt could get redrawn at arbitrary times. I’m not sure how well that maps to pure text to speech. Also, in a text-based GUI you are just changing the attributes of what is selected as you redraw it. Thus for Hii I was thinking it may make more sense to just have the Hii forms browser pick the spots to do text to speech.

I’ve also thought about mechanisms that would let the OS encode audio associated with the boot options so the Hii UI could use that. But I’m guessing speak-on-select in the UI might be more useful than raw EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL to speech. You can always choose to make the audio playback synchronous, or block certain UI progress on audio completion.

Thanks,

Andrew Fish

PS: The edk2 has a ConSplitter, so adding an extra EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL would be easy; I’m just not sure it would work out well.

> On 4/16/21, Ethin Probst wrote:
>> Yes, three APIs (maybe like this) would work well:
>> - Start, Stop: start and stop playback of a stream
>> - SetVolume, GetVolume, Mute, Unmute: control the output volume and enable muting
>> - CreateStream, ReleaseStream, SetStreamSampleRate: control the sample rate of a stream (but not the sample format, since signed 16-bit PCM is enough)
>> Marvin, how do you suggest we make the events then? We need some way
>> of notifying the caller that the stream has concluded. We could make
>> the driver create the event and pass it back to the caller as an
>> event, but you'd still have dangling pointers (this is C, after all).
>> We could just make an IsPlaying() function and a WaitForCompletion()
>> function and allow the driver to do the event handling -- would that
>> work?
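Ethin's three API groups plus the IsPlaying()/WaitForCompletion() idea can be sketched as a table of function pointers backed by a mock driver. Everything here is hypothetical: AUDIO_OUTPUT, AUDIO_STATUS and the Mock* functions are not edk2 or UEFI definitions, plain C types stand in for EFI_STATUS/UINT32, and the "hardware" simply drains one 10 ms period per tick. Because the driver owns the completion mechanism, callers never hold an event pointer, and a caller that waits for each buffer before starting the next gets the synchronous playback Andrew mentions, which is one possible way to keep consecutive OutputString() utterances from overlapping.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes standing in for EFI_STATUS values. */
typedef enum { AUDIO_SUCCESS, AUDIO_INVALID_PARAMETER, AUDIO_NOT_READY } AUDIO_STATUS;

typedef struct AUDIO_OUTPUT AUDIO_OUTPUT;

/* Hypothetical protocol: the three API groups from the list, plus
   IsPlaying/WaitForCompletion so the driver, not the caller, owns any event. */
struct AUDIO_OUTPUT {
  AUDIO_STATUS (*CreateStream)(AUDIO_OUTPUT *This, uint32_t SampleRate);
  AUDIO_STATUS (*ReleaseStream)(AUDIO_OUTPUT *This);
  AUDIO_STATUS (*SetStreamSampleRate)(AUDIO_OUTPUT *This, uint32_t SampleRate);
  AUDIO_STATUS (*Start)(AUDIO_OUTPUT *This, const int16_t *Samples, size_t Count);
  AUDIO_STATUS (*Stop)(AUDIO_OUTPUT *This);
  AUDIO_STATUS (*SetVolume)(AUDIO_OUTPUT *This, uint8_t Percent);
  uint8_t      (*GetVolume)(AUDIO_OUTPUT *This);
  AUDIO_STATUS (*Mute)(AUDIO_OUTPUT *This);
  AUDIO_STATUS (*Unmute)(AUDIO_OUTPUT *This);
  bool         (*IsPlaying)(AUDIO_OUTPUT *This);
  AUDIO_STATUS (*WaitForCompletion)(AUDIO_OUTPUT *This);
  /* Mock driver state; a real driver would keep this in a private context. */
  uint32_t SampleRate;   /* 0 means no stream has been created */
  uint8_t  Volume;       /* 0..100 percent */
  bool     Muted;
  size_t   Remaining;    /* samples the "hardware" has not consumed yet */
};

static AUDIO_STATUS MockCreateStream(AUDIO_OUTPUT *This, uint32_t SampleRate) {
  if (SampleRate == 0) return AUDIO_INVALID_PARAMETER;
  This->SampleRate = SampleRate;
  This->Remaining = 0;
  return AUDIO_SUCCESS;
}
static AUDIO_STATUS MockReleaseStream(AUDIO_OUTPUT *This) {
  This->SampleRate = 0;
  This->Remaining = 0;
  return AUDIO_SUCCESS;
}
static AUDIO_STATUS MockSetRate(AUDIO_OUTPUT *This, uint32_t SampleRate) {
  if (This->Remaining != 0) return AUDIO_NOT_READY;   /* no rate change mid-playback */
  return MockCreateStream(This, SampleRate);
}
static AUDIO_STATUS MockStart(AUDIO_OUTPUT *This, const int16_t *Samples, size_t Count) {
  if (Samples == NULL || This->SampleRate == 0) return AUDIO_INVALID_PARAMETER;
  if (This->Remaining != 0) return AUDIO_NOT_READY;   /* refuse to overlap buffers */
  This->Remaining = Count;
  return AUDIO_SUCCESS;
}
static AUDIO_STATUS MockStop(AUDIO_OUTPUT *This) { This->Remaining = 0; return AUDIO_SUCCESS; }
static AUDIO_STATUS MockSetVolume(AUDIO_OUTPUT *This, uint8_t Percent) {
  if (Percent > 100) return AUDIO_INVALID_PARAMETER;
  This->Volume = Percent;
  return AUDIO_SUCCESS;
}
static uint8_t MockGetVolume(AUDIO_OUTPUT *This) { return This->Volume; }
static AUDIO_STATUS MockMute(AUDIO_OUTPUT *This) { This->Muted = true; return AUDIO_SUCCESS; }
static AUDIO_STATUS MockUnmute(AUDIO_OUTPUT *This) { This->Muted = false; return AUDIO_SUCCESS; }
static bool MockIsPlaying(AUDIO_OUTPUT *This) { return This->Remaining != 0; }

/* Stand-in for the hardware draining one 10 ms period of samples. */
static void MockHardwareTick(AUDIO_OUTPUT *This) {
  size_t Period = (This->SampleRate / 100) ? (This->SampleRate / 100) : 1;
  This->Remaining = (This->Remaining > Period) ? (This->Remaining - Period) : 0;
}
static AUDIO_STATUS MockWaitForCompletion(AUDIO_OUTPUT *This) {
  while (MockIsPlaying(This)) MockHardwareTick(This);  /* real driver: wait on its own event */
  return AUDIO_SUCCESS;
}

static AUDIO_OUTPUT gMockAudio = {
  .CreateStream = MockCreateStream, .ReleaseStream = MockReleaseStream,
  .SetStreamSampleRate = MockSetRate, .Start = MockStart, .Stop = MockStop,
  .SetVolume = MockSetVolume, .GetVolume = MockGetVolume, .Mute = MockMute,
  .Unmute = MockUnmute, .IsPlaying = MockIsPlaying,
  .WaitForCompletion = MockWaitForCompletion, .Volume = 50,
};

/* Caller side: play each utterance to completion before starting the next,
   so consecutive buffers can never overlap. */
static AUDIO_STATUS SpeakInOrder(AUDIO_OUTPUT *Audio, const int16_t *const *Buffers,
                                 const size_t *Counts, size_t N) {
  for (size_t i = 0; i < N; i++) {
    AUDIO_STATUS Status = Audio->Start(Audio, Buffers[i], Counts[i]);
    if (Status != AUDIO_SUCCESS) return Status;
    Audio->WaitForCompletion(Audio);
  }
  return AUDIO_SUCCESS;
}
```

The design choice worth noting is that Start() returns a "not ready" status instead of mixing or interrupting when a buffer is still playing; whether a real protocol should queue, reject, or preempt is exactly the open question raised above.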
>>
>> On 4/16/21, Andrew Fish wrote:
>>>
>>>> On Apr 16, 2021, at 4:34 AM, Leif Lindholm wrote:
>>>>
>>>> Hi Ethin,
>>>>
>>>> I think we also want to have a SetMode function, even if we don't get
>>>> around to implementing proper support for it as part of GSoC (although I
>>>> expect at least for virtio, that should be pretty straightforward).
>>>>
>>>
>>> Leif,
>>>
>>> I think if we have an API to load the buffer and a 2nd API to play the
>>> buffer, an optional 3rd API could configure the streams.
>>>
>>>> It's quite likely that speech for UI would be stored as 8kHz (or
>>>> 20kHz) in some systems, whereas the example for playing a tune in GRUB
>>>> would more likely be a 44.1 kHz mp3/wav/ogg/flac.
>>>>
>>>> For the GSoC project, I think it would be quite reasonable to
>>>> pre-generate pure PCM streams for testing rather than decoding
>>>> anything on the fly.
>>>>
>>>> Porting/writing decoders is really a separate task from enabling the
>>>> output. I would much rather see USB *and* HDA support able to play PCM
>>>> streams before worrying about decoding.
>>>>
>>>
>>> I agree it might turn out it is easier to have the text to speech code
>>> just encode PCM directly.
>>>
>>> Thanks,
>>>
>>> Andrew Fish
>>>
>>>> /
>>>> Leif
>>>>
>>>> On Fri, Apr 16, 2021 at 00:33:06 -0500, Ethin Probst wrote:
>>>>> Thanks for that explanation (I missed Mike's message). Earlier I sent
>>>>> a summary of those things that we can agree on: mainly, that we have
>>>>> mute, volume control, a load buffer, (maybe) an unload buffer, and a
>>>>> start/stop stream function. Now that I fully understand the
>>>>> ramifications of this, I don't mind settling for a specific format and
>>>>> sample rate, and signed 16-bit PCM audio is, I think, the most widely
>>>>> used one out there, besides 64-bit floating point samples, which I've
>>>>> only seen used in DAWs, and that's something we don't need.
>>>>>
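Leif's suggestion to pre-generate pure PCM streams for testing, rather than decoding anything on the fly, needs very little code. This is an illustration only: PcmBufferBytes() and GeneratePcmSquareWave() are hypothetical helpers, the output is assumed to be mono signed 16-bit PCM, and a square wave is used instead of a sine so the sketch needs no floating point. For scale, one second of mono 16-bit speech at 8 kHz is 16,000 bytes, while one second of 44.1 kHz stereo is 176,400 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed to hold Seconds of signed 16-bit PCM. */
static size_t PcmBufferBytes(uint32_t SampleRate, uint16_t Channels, uint32_t Seconds) {
  return (size_t)SampleRate * Channels * sizeof(int16_t) * Seconds;
}

/* Hypothetical helper: fills Out with a mono signed 16-bit square wave at ToneHz.
   Returns the number of samples written, or 0 if the parameters are invalid. */
static size_t GeneratePcmSquareWave(int16_t *Out, size_t Count, uint32_t SampleRate,
                                    uint32_t ToneHz, int16_t Amplitude) {
  /* The tone must sit below the Nyquist limit, and Amplitude must be positive
     so that -Amplitude is still representable as int16_t. */
  if (Out == NULL || ToneHz == 0 || (uint64_t)ToneHz * 2 > SampleRate || Amplitude <= 0) {
    return 0;
  }
  for (size_t i = 0; i < Count; i++) {
    /* Phase / SampleRate is how far sample i is into the current cycle. */
    uint64_t Phase = ((uint64_t)i * ToneHz) % SampleRate;
    Out[i] = (Phase < SampleRate / 2) ? Amplitude : (int16_t)-Amplitude;
  }
  return Count;
}
```

A buffer produced this way can be handed straight to a "load buffer, then play buffer" API, which keeps early driver testing independent of any decoder.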
>>>>> Are you sure you want the firmware itself to handle the decoding of
>>>>> WAV audio? I can make a library class for that, but I'll definitely
>>>>> need help with the security aspect.
>>>>>
>>>>> On 4/16/21, Andrew Fish via groups.io wrote:
>>>>>>
>>>>>>> On Apr 15, 2021, at 5:59 PM, Michael Brown wrote:
>>>>>>>
>>>>>>> On 16/04/2021 00:42, Ethin Probst wrote:
>>>>>>>> Forcing a particular channel mapping, sample rate and sample format
>>>>>>>> on everyone would complicate application code. From an application
>>>>>>>> point of view, one would, with that type of protocol, need to do
>>>>>>>> the following:
>>>>>>>> 1) Load an audio file in any audio file format from any storage
>>>>>>>>    mechanism.
>>>>>>>> 2) Decode the audio file format to extract the samples and audio
>>>>>>>>    metadata.
>>>>>>>> 3) Resample the (now decoded) audio samples and convert (quantize)
>>>>>>>>    the audio samples into signed 16-bit PCM audio.
>>>>>>>> 4) Forward the samples onto the EFI audio protocol.
>>>>>>>
>>>>>>> You have made an incorrect assumption that there exists a
>>>>>>> requirement to be able to play audio files in arbitrary formats.
>>>>>>> This requirement does not exist.
>>>>>>>
>>>>>>> With a protocol-mandated fixed baseline set of audio parameters
>>>>>>> (sample rate etc), what would happen in practice is that the audio
>>>>>>> files would be encoded in that format at *build* time, using tools
>>>>>>> entirely external to UEFI. The application code is then trivially
>>>>>>> simple: it just does "load blob, pass blob to audio protocol".
>>>>>>>
>>>>>>
>>>>>> Ethin,
>>>>>>
>>>>>> Given the goal is an industry standard, we value interoperability
>>>>>> more than flexibility.
>>>>>>
>>>>>> How about another use case.
>>>>>> Let's say the Linux OS loader (GRUB) wants to have an accessible UI,
>>>>>> so it decides to store sound files on the EFI System Partition and
>>>>>> use our new fancy UEFI Audio Protocol to add audio to the OS loader
>>>>>> GUI. That version of GRUB then needs to work on thousands of
>>>>>> different PCs and a wide range of UEFI Audio driver implementations.
>>>>>> It is a much easier world if Wave PCM 16 bit just works everywhere.
>>>>>> You could add a lot of complexity and try to encode the audio on the
>>>>>> fly, maybe even in Linux proper, but that falls down if you are
>>>>>> booting from read-only media like a DVD or backup tape (yes, people
>>>>>> still do that in server land).
>>>>>>
>>>>>> The other problem with flexibility is that you just made the test
>>>>>> matrix very large for every driver that needs to get implemented. For
>>>>>> something as complex as Intel HDA, how you hook up the hardware and
>>>>>> what CODECs you use may impact the quality of the playback for a
>>>>>> given board. Your EFI is likely going to pick a single encoding that
>>>>>> will get tested all the time if your system has audio, but all 50
>>>>>> other things you support, not so much. So that will require testing,
>>>>>> and someone with audiophile ears (or an AI program) to test all the
>>>>>> combinations. I’m not kidding: I get BZs on the quality of the boot
>>>>>> bong on our systems.
>>>>>>
>>>>>>>> typedef struct EFI_SIMPLE_AUDIO_PROTOCOL {
>>>>>>>>   EFI_SIMPLE_AUDIO_PROTOCOL_RESET  Reset;
>>>>>>>>   EFI_SIMPLE_AUDIO_PROTOCOL_START  Start;
>>>>>>>>   EFI_SIMPLE_AUDIO_PROTOCOL_STOP   Stop;
>>>>>>>> } EFI_SIMPLE_AUDIO_PROTOCOL;
>>>>>>>
>>>>>>> This is now starting to look like something that belongs in boot-time
>>>>>>> firmware.
>>>>>>> :)
>>>>>>>
>>>>>>
>>>>>> I think that got a little too simple. I’d go back and look at the
>>>>>> example I posted to the thread, but add an API to load the buffer and
>>>>>> then play the buffer (that way we can add an API in the future to
>>>>>> twiddle knobs). That API also implements the async EFI interface.
>>>>>> Trust me, the 1st thing that is going to happen when we add audio is
>>>>>> someone is going to complain that in xyz state we should mute audio,
>>>>>> or we should honor audio volume and mute settings from setup, or from
>>>>>> values set in the OS. Or someone is going to want the volume keys on
>>>>>> the keyboard to work in EFI.
>>>>>>
>>>>>> Also, if you need to pick apart the Wave PCM 16 bit file to feed it
>>>>>> into the audio hardware, that probably means we should have a library
>>>>>> that does that work, so other audio drivers can share that code.
>>>>>> Having a library also makes it easier to write a unit test. We need to
>>>>>> be security conscious, as we need to treat the audio file as
>>>>>> attacker-controlled data.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Andrew Fish
>>>>>>
>>>>>>> Michael
>>>>>
>>>>> --
>>>>> Signed,
>>>>> Ethin D. Probst
>>
>> --
>> Signed,
>> Ethin D. Probst
>
> --
> Signed,
> Ethin D. Probst
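Andrew's point that WAV parsing belongs in a shared, unit-testable library that treats the file as attacker-controlled data can be illustrated with a minimal bounds-checked parser. It is a sketch under stated assumptions, not an edk2 library class: ParseWavPcm16() and WAV_PCM16_INFO are hypothetical names, plain C types replace UINT8/UINTN, and only plain WAVE_FORMAT_PCM (format tag 1) at 16 bits per sample, mono or stereo, is accepted; WAVE_FORMAT_EXTENSIBLE files are rejected. The key choices are that the RIFF size field is ignored in favor of the real buffer size, and every chunk length is compared against the bytes remaining before it is used.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parse result; Samples points into the caller's buffer. */
typedef struct {
  uint16_t       Channels;
  uint32_t       SampleRate;
  const uint8_t *Samples;      /* little-endian signed 16-bit PCM */
  size_t         SampleBytes;
} WAV_PCM16_INFO;

static uint16_t ReadLe16(const uint8_t *P) { return (uint16_t)(P[0] | (P[1] << 8)); }
static uint32_t ReadLe32(const uint8_t *P) {
  return (uint32_t)P[0] | ((uint32_t)P[1] << 8) | ((uint32_t)P[2] << 16) | ((uint32_t)P[3] << 24);
}

/* Parses an untrusted RIFF/WAVE image of Size bytes. Invariant: Offset <= Size,
   and every length is checked against Size - Offset before it is used. */
static bool ParseWavPcm16(const uint8_t *Data, size_t Size, WAV_PCM16_INFO *Info) {
  bool   HaveFmt = false;
  size_t Offset = 12;                               /* skip "RIFF", size, "WAVE" */

  if (Data == NULL || Info == NULL || Size < 12) return false;
  if (memcmp(Data, "RIFF", 4) != 0 || memcmp(Data + 8, "WAVE", 4) != 0) return false;

  while (Size - Offset >= 8) {                      /* room for a chunk header */
    const uint8_t *Chunk = Data + Offset;
    uint32_t ChunkSize = ReadLe32(Chunk + 4);
    Offset += 8;
    if (ChunkSize > Size - Offset) return false;    /* chunk runs past the buffer */

    if (memcmp(Chunk, "fmt ", 4) == 0) {
      const uint8_t *Fmt = Chunk + 8;
      if (ChunkSize < 16) return false;
      Info->Channels   = ReadLe16(Fmt + 2);
      Info->SampleRate = ReadLe32(Fmt + 4);
      /* Accept only WAVE_FORMAT_PCM (1) with 16 bits per sample. */
      if (ReadLe16(Fmt) != 1 || ReadLe16(Fmt + 14) != 16) return false;
      if (Info->Channels < 1 || Info->Channels > 2 || Info->SampleRate == 0) return false;
      if (ReadLe16(Fmt + 12) != Info->Channels * 2) return false;   /* block align */
      HaveFmt = true;
    } else if (memcmp(Chunk, "data", 4) == 0) {
      if (!HaveFmt || ChunkSize % (Info->Channels * 2u) != 0) return false;
      Info->Samples     = Chunk + 8;
      Info->SampleBytes = ChunkSize;
      return true;
    }
    /* Skip the chunk body plus its pad byte (RIFF chunks are word aligned). */
    Offset += ChunkSize;
    if ((ChunkSize & 1) && Offset < Size) Offset++;
  }
  return false;                                     /* no valid fmt + data pair */
}
```

Because the parser only reads from the caller's buffer and returns a pointer into it, a unit test can feed it hand-built byte arrays, including truncated and oversized chunks, without any firmware environment.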