Re: [edk2-devel] VirtIO Sound Driver (GSoC 2021)

public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed

From: "Ethin Probst" <harlydavidsen@gmail.com>
To: Andrew Fish <afish@apple.com>
Cc: edk2-devel-groups-io <devel@edk2.groups.io>,
	Leif Lindholm <leif@nuviainc.com>,
	 Michael Brown <mcb30@ipxe.org>,
	Mike Kinney <michael.d.kinney@intel.com>,
	 Laszlo Ersek <lersek@redhat.com>,
	 "Desimone, Nathaniel L" <nathaniel.l.desimone@intel.com>,
	 Rafael Rodrigues Machado <rafaelrodrigues.machado@gmail.com>,
	Gerd Hoffmann <kraxel@redhat.com>
Subject: Re: [edk2-devel] VirtIO Sound Driver (GSoC 2021)
Date: Fri, 16 Apr 2021 18:50:01 -0500	[thread overview]
Message-ID: <CAJQtwF21Op48ZqCaK8RM_8ZtUO_+MTxesf55++NkLPTbfLuOGw@mail.gmail.com> (raw)
In-Reply-To: <2B5C3A05-95BC-4239-A84D-B262F1A2D6DC@apple.com>

Andrew,
Thanks for that example -- that makes things a lot easier to
understand now. I think, anyway. As for the simple text output
protocol, I was just using that as an example; I understand that the
HII infrastructure draws graphical characters, as does the simple text
output protocol, and the screen is redrawn quite often. I was just
using that as an example of the problems that we might face. I'd
prefer that the protocols themselves be made accessible instead of
requiring applications to do it; the advantage would be that very
little work, if any, would need to be done to make an application
accessible, and nothing would necessarily need to be rebuilt for
accessibility to be implemented. All someone would have to do is
update the firmware, enable accessibility features, and they
immediately get an accessible user interface, with no changes required
to user-facing applications such as bootloaders, secure boot
utilities, shells, and setup utilities. But that's a long way off, and
for now, we should focus primarily on the audio output protocol
instead of focusing on details that are still abstract.
Is the audio interface I proposed in a couple emails previously
sufficient and something that we can all agree on? We might also want
a GetVersion() function to return the revision number so that users of
it know what revision they're working with in case we decide to add
functions to it later on.
As always, I've updated my gist, and as before, the new version can be
found here: https://gist.github.com/ethindp/56066e0561b901c1e276e82ff469436d

On 4/16/21, Andrew Fish <afish@apple.com> wrote:
>
>
>> On Apr 16, 2021, at 10:55 AM, Ethin Probst <harlydavidsen@gmail.com>
>> wrote:
>>
>> Also, forgot to add this before sending: yes, speech synthesizers
>> usually generate the PCM audio on the fly. They can write it to an
>> output file, but if your just using it in a screen reader, then you
>> end up streaming it to the audio device. This raises another issue I
>> was pondering, but I (don't think) we need to handle that quite yet.
>> The problem involves output generated by the HII, console, etc.: when
>> we're using a speech synthesizer, it might be configured to speak at a
>> faster or slower rate depending on the preferences of the user (we
>> definitely want to let the user control these things because different
>> people have different preferences on the speed of speech synthesizers:
>> some can understand it at really fast rates and others can't, for
>> example). The problem arises when we want to forward output from the
>> screen (say, the simple text output protocol). Assume that a user is
>> running the EFI shell as an example, which, if I'm not mistaken, uses
>> this protocol. The shell calls
>> EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL.OutputString(). We probably then want
>> this function to forward the string passed in onto the speech
>> synthesizer, assuming that accessibility features are enabled (I'm
>> assuming we'd want to make that a toggle). The problem is that one can
>> call EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL.OutputString() many times. During
>> all these calls, text is being sent to the synthesizer, its generating
>> samples, and forwarding those samples onto the audio output protocol.
>> The problem is: how do we ensure that these samples don't overlap or
>> cause other problems (e.g.: interrupt streams that are still being
>> processed)? As I said, these are things we don't need to necessarily
>> consider now, but this is a problem we'll need to tackle in the
>> future.
>>
>
> I’m not sure converting EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL to audio directly
> would work from a practical point of view. If you look at the Hii based
> setup and lot of text and a lot of text based graphics characters get dumped
> into a screen with not particular order. For the UEFI Shell there is a lot
> of cursor control and the prompt could get redrawn at arbitrary times. Not
> sure how well that maps to pure text to speech. Also in a text based GUI you
> are just changing the attributes of what is selected as you redraw it. Thus
> for Hii I was thinking it may make more sense to just have the Hii forms
> browser pick the spots to do text to speech.
>
> I’ve also thought about mechanisms that would let the OS encode auto
> associated with the boot options so the Hii UI could use that. But I’m
> guessing speak on select in the UI might be more useful than raw
> EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL to text. You can always chose to make the
> Audio playback synchronous, or block certain UI progress on audio
> completion.
>
> Thanks,
>
> Andrew Fish
>
> PS The edk2 has a ConSpliter so adding and extra
> EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL would be easy, I’m just not sure it would
> work out well.
>
>> On 4/16/21, Ethin Probst <harlydavidsen@gmail.com> wrote:
>>> Yes, three APIs (maybe like this) would work well:
>>> - Start, Stop: begin playback of a stream
>>> - SetVolume, GetVolume, Mute, Unmute: control volume of output and
>>> enable
>>> muting
>>> - CreateStream, ReleaseStream, SetStreamSampleRate: Control sample
>>> rate of stream (but not sample format since Signed 16-bit PCM is
>>> enough)
>>> Marvin, how do you suggest we make the events then? We need some way
>>> of notifying the caller that the stream has concluded. We could make
>>> the driver create the event and pass it back to the caller as an
>>> event, but you'd still have dangling pointers (this is C, after all).
>>> We could just make a IsPlaying() function and WaitForCompletion()
>>> function and allow the driver to do the event handling -- would that
>>> work?
>>>
>>> On 4/16/21, Andrew Fish <afish@apple.com> wrote:
>>>>
>>>>
>>>>> On Apr 16, 2021, at 4:34 AM, Leif Lindholm <leif@nuviainc.com> wrote:
>>>>>
>>>>> Hi Ethin,
>>>>>
>>>>> I think we also want to have a SetMode function, even if we don't get
>>>>> around to implement proper support for it as part of GSoC (although I
>>>>> expect at least for virtio, that should be pretty straightforward).
>>>>>
>>>>
>>>> Leif,
>>>>
>>>> I’m think if we have an API to load the buffer and a 2nd API to play
>>>> the
>>>> buffer an optional 3rd API could configure the streams.
>>>>
>>>>> It's quite likely that speech for UI would be stored as 8kHz (or
>>>>> 20kHz) in some systems, whereas the example for playing a tune in GRUB
>>>>> would more likely be a 44.1 kHz mp3/wav/ogg/flac.
>>>>>
>>>>> For the GSoC project, I think it would be quite reasonable to
>>>>> pre-generate pure PCM streams for testing rather than decoding
>>>>> anything on the fly.
>>>>>
>>>>> Porting/writing decoders is really a separate task from enabling the
>>>>> output. I would much rather see USB *and* HDA support able to play pcm
>>>>> streams before worrying about decoding.
>>>>>
>>>>
>>>> I agree it might turn out it is easier to have the text to speech code
>>>> just
>>>> encode a PCM directly.
>>>>
>>>> Thanks,
>>>>
>>>> Andrew Fish
>>>>
>>>>> /
>>>>>   Leif
>>>>>
>>>>> On Fri, Apr 16, 2021 at 00:33:06 -0500, Ethin Probst wrote:
>>>>>> Thanks for that explanation (I missed Mike's message). Earlier I sent
>>>>>> a summary of those things that we can agree on: mainly, that we have
>>>>>> mute, volume control, a load buffer, (maybe) an unload buffer, and a
>>>>>> start/stop stream function. Now that I fully understand the
>>>>>> ramifications of this I don't mind settling for a specific format and
>>>>>> sample rate, and signed 16-bit PCM audio is, I think, the most widely
>>>>>> used one out there, besides 64-bit floating point samples, which I've
>>>>>> only seen used in DAWs, and that's something we don't need.
>>>>>> Are you sure you want the firmware itself to handle the decoding of
>>>>>> WAV audio? I can make a library class for that, but I'll definitely
>>>>>> need help with the security aspect.
>>>>>>
>>>>>> On 4/16/21, Andrew Fish via groups.io <afish=apple.com@groups.io>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On Apr 15, 2021, at 5:59 PM, Michael Brown <mcb30@ipxe.org> wrote:
>>>>>>>>
>>>>>>>> On 16/04/2021 00:42, Ethin Probst wrote:
>>>>>>>>> Forcing a particular channel mapping, sample rate and sample
>>>>>>>>> format
>>>>>>>>> on
>>>>>>>>> everyone would complicate application code. From an application
>>>>>>>>> point
>>>>>>>>> of view, one would, with that type of protocol, need to do the
>>>>>>>>> following:
>>>>>>>>> 1) Load an audio file in any audio file format from any storage
>>>>>>>>> mechanism.
>>>>>>>>> 2) Decode the audio file format to extract the samples and audio
>>>>>>>>> metadata.
>>>>>>>>> 3) Resample the (now decoded) audio samples and convert (quantize)
>>>>>>>>> the
>>>>>>>>> audio samples into signed 16-bit PCM audio.
>>>>>>>>> 4) forward the samples onto the EFI audio protocol.
>>>>>>>>
>>>>>>>> You have made an incorrect assumption that there exists a
>>>>>>>> requirement
>>>>>>>> to
>>>>>>>> be able to play audio files in arbitrary formats.  This requirement
>>>>>>>> does
>>>>>>>> not exist.
>>>>>>>>
>>>>>>>> With a protocol-mandated fixed baseline set of audio parameters
>>>>>>>> (sample
>>>>>>>> rate etc), what would happen in practice is that the audio files
>>>>>>>> would
>>>>>>>> be
>>>>>>>> encoded in that format at *build* time, using tools entirely
>>>>>>>> external
>>>>>>>> to
>>>>>>>> UEFI.  The application code is then trivially simple: it just does
>>>>>>>> "load
>>>>>>>> blob, pass blob to audio protocol".
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Ethin,
>>>>>>>
>>>>>>> Given the goal is an industry standard we value interoperability
>>>>>>> more
>>>>>>> that
>>>>>>> flexibility.
>>>>>>>
>>>>>>> How about another use case. Lets say the Linux OS loader (Grub)
>>>>>>> wants
>>>>>>> to
>>>>>>> have an accessible UI so it decides to sore sound files on the EFI
>>>>>>> System
>>>>>>> Partition and use our new fancy UEFI Audio Protocol to add audio to
>>>>>>> the
>>>>>>> OS
>>>>>>> loader GUI. So that version of Grub needs to work on 1,000 of
>>>>>>> different
>>>>>>> PCs
>>>>>>> and a wide range of UEFI Audio driver implementations. It is a much
>>>>>>> easier
>>>>>>> world if Wave PCM 16 bit just works every place. You could add a lot
>>>>>>> of
>>>>>>> complexity and try to encode the audio on the fly, maybe even in
>>>>>>> Linux
>>>>>>> proper but that falls down if you are booting from read only media
>>>>>>> like
>>>>>>> a
>>>>>>> DVD or backup tape (yes people still do that in server land).
>>>>>>>
>>>>>>> The other problem with flexibility is you just made the test matrix
>>>>>>> very
>>>>>>> large for every driver that needs to get implemented. For something
>>>>>>> as
>>>>>>> complex as Intel HDA how you hook up the hardware and what CODECs
>>>>>>> you
>>>>>>> use
>>>>>>> may impact the quality of the playback for a given board. Your EFI
>>>>>>> is
>>>>>>> likely
>>>>>>> going to pick a single encoding at that will get tested all the time
>>>>>>> if
>>>>>>> your
>>>>>>> system has audio, but all 50 other things you support not so much.
>>>>>>> So
>>>>>>> that
>>>>>>> will required testing, and some one with audiophile ears (or an AI
>>>>>>> program)
>>>>>>> to test all the combinations. I’m not kidding I get BZs on the
>>>>>>> quality
>>>>>>> of
>>>>>>> the boot bong on our systems.
>>>>>>>
>>>>>>>
>>>>>>>>> typedef struct EFI_SIMPLE_AUDIO_PROTOCOL {
>>>>>>>>> EFI_SIMPLE_AUDIO_PROTOCOL_RESET Reset;
>>>>>>>>> EFI_SIMPLE_AUDIO_PROTOCOL_START Start;
>>>>>>>>> EFI_SIMPLE_AUDIO_PROTOCOL_STOP Stop;
>>>>>>>>> } EFI_SIMPLE_AUDIO_PROTOCOL;
>>>>>>>>
>>>>>>>> This is now starting to look like something that belongs in
>>>>>>>> boot-time
>>>>>>>> firmware.  :)
>>>>>>>>
>>>>>>>
>>>>>>> I think that got a little too simple I’d go back and look at the
>>>>>>> example
>>>>>>> I
>>>>>>> posted to the thread but add an API to load the buffer, and then
>>>>>>> play
>>>>>>> the
>>>>>>> buffer (that way we can an API in the future to twiddle knobs). That
>>>>>>> API
>>>>>>> also implements the async EFI interface. Trust me the 1st thing that
>>>>>>> is
>>>>>>> going to happen when we add audio is some one is going to complain
>>>>>>> in
>>>>>>> xyz
>>>>>>> state we should mute audio, or we should honer audio volume and mute
>>>>>>> settings from setup, or from values set in the OS. Or some one is
>>>>>>> going
>>>>>>> to
>>>>>>> want the volume keys on the keyboard to work in EFI.
>>>>>>>
>>>>>>> Also if you need to pick apart the Wave PCM 16 byte file to feed it
>>>>>>> into
>>>>>>> the
>>>>>>> audio hardware that probably means we should have a library that
>>>>>>> does
>>>>>>> that
>>>>>>> work, so other Audio drivers can share that code. Also having a
>>>>>>> library
>>>>>>> makes it easier to write a unit test. We need to be security
>>>>>>> conscious
>>>>>>> as we
>>>>>>> need to treat the Audo file as attacker controlled data.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Andrew Fish
>>>>>>>
>>>>>>>> Michael
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Signed,
>>>>>> Ethin D. Probst
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Signed,
>>> Ethin D. Probst
>>>
>>
>>
>> --
>> Signed,
>> Ethin D. Probst
>>
>>
>> 
>>
>>
>
>


-- 
Signed,
Ethin D. Probst

next prev parent reply	other threads:[~2021-04-16 23:50 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-29 20:28 VirtIO Sound Driver (GSoC 2021) Ethin Probst
2021-03-30  3:17 ` [edk2-devel] " Nate DeSimone
2021-03-30 14:41   ` Ethin Probst
2021-03-31  6:34     ` Nate DeSimone
2021-03-30 21:50 ` Rafael Machado
2021-03-30 22:12   ` Ethin Probst
     [not found]   ` <16713E6D64EE917D.25648@groups.io>
2021-03-31  0:01     ` Ethin Probst
2021-03-31  5:02       ` Andrew Fish
2021-03-31  6:41         ` Nate DeSimone
2021-04-01  5:05           ` Ethin Probst
2021-04-05  4:42             ` Andrew Fish
2021-04-05  4:43               ` Ethin Probst
2021-04-05  5:10                 ` Andrew Fish
2021-04-06 14:16           ` Laszlo Ersek
2021-04-06 15:17             ` Ethin Probst
2021-04-13 12:28               ` Leif Lindholm
2021-04-13 14:16                 ` Andrew Fish
2021-04-13 16:53                   ` Ethin Probst
2021-04-13 21:38                     ` Andrew Fish
2021-04-13 23:47                       ` Ethin Probst
     [not found]                       ` <16758FB6436B1195.32393@groups.io>
2021-04-14  1:20                         ` Ethin Probst
2021-04-14  3:52                           ` Andrew Fish
2021-04-14 20:47                             ` Ethin Probst
2021-04-14 22:30                               ` Michael D Kinney
2021-04-14 23:54                                 ` Andrew Fish
2021-04-15  4:01                                   ` Ethin Probst
2021-04-15  4:58                                     ` Andrew Fish
2021-04-15  5:28                                       ` Ethin Probst
2021-04-15 12:07                                         ` Michael Brown
2021-04-15 16:42                                           ` Andrew Fish
2021-04-15 20:11                                             ` Ethin Probst
2021-04-15 22:35                                               ` Andrew Fish
2021-04-15 23:42                                                 ` Ethin Probst
2021-04-16  0:59                                                   ` Michael Brown
2021-04-16  5:13                                                     ` Andrew Fish
2021-04-16  5:33                                                       ` Ethin Probst
2021-04-16 11:34                                                         ` Leif Lindholm
2021-04-16 17:03                                                           ` Andrew Fish
2021-04-16 17:45                                                             ` Ethin Probst
2021-04-16 17:55                                                               ` Ethin Probst
2021-04-16 18:09                                                                 ` Andrew Fish
2021-04-16 23:50                                                                   ` Ethin Probst [this message]
2021-04-16 18:02                                                               ` Andrew Fish
2021-04-17 16:51                                                               ` Marvin Häuser
2021-04-17 17:31                                                                 ` Andrew Fish
2021-04-17 18:04                                                                   ` Marvin Häuser
2021-04-18  8:55                                                                     ` Ethin Probst
2021-04-18 15:22                                                                       ` Andrew Fish
2021-04-18 19:11                                                                         ` Marvin Häuser
2021-04-18 19:22                                                                           ` Marvin Häuser
2021-04-18 21:00                                                                             ` Andrew Fish
2021-04-16 13:22                                                   ` Marvin Häuser
2021-04-16 14:34                                                     ` Andrew Fish
2021-04-16 15:03                                                       ` Marvin Häuser
     [not found]                                                 ` <16762C957671127A.12361@groups.io>
2021-04-16  0:59                                                   ` Ethin Probst
2021-04-16  1:03                                                     ` Michael Brown
2021-04-16  2:06                                                       ` Ethin Probst
2021-04-16  3:48                                                       ` Andrew Fish
2021-04-16  4:29                                                         ` Ethin Probst
2021-04-15  9:51                                     ` Laszlo Ersek
2021-04-15 16:01                                       ` Andrew Fish
2021-04-04 22:05 ` Marvin Häuser

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJQtwF21Op48ZqCaK8RM_8ZtUO_+MTxesf55++NkLPTbfLuOGw@mail.gmail.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox