From mboxrd@z Thu Jan  1 00:00:00 1970
Authentication-Results: mx.groups.io;
 dkim=missing; spf=pass (domain: redhat.com, ip: 209.132.183.28, mailfrom: lersek@redhat.com)
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28])
 by groups.io with SMTP; Mon, 01 Jul 2019 11:01:44 -0700
Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id 8C85F821F3;
	Mon,  1 Jul 2019 18:01:30 +0000 (UTC)
Received: from lacos-laptop-7.usersys.redhat.com (unknown [10.36.118.14])
	by smtp.corp.redhat.com (Postfix) with ESMTP id EA0DA17CF8;
	Mon,  1 Jul 2019 18:01:26 +0000 (UTC)
Subject: Re: [edk2-devel] [PATCH 0/3] MdePkg/BaseLib: Base64Decode: Make it follow its specification
From: "Laszlo Ersek" <lersek@redhat.com>
To: devel@edk2.groups.io, zhichao.gao@intel.com
Cc: Michael D Kinney <michael.d.kinney@intel.com>,
 Liming Gao <liming.gao@intel.com>, Marvin Hauser <mhaeuser@outlook.de>
Reply-To: devel@edk2.groups.io, lersek@redhat.com
References: <20190628035746.24160-1-zhichao.gao@intel.com>
 <c495bd0b-ea4d-7206-8a4f-a7149760d19a@redhat.com>
Message-ID: <2327e0d5-f79d-59e9-9413-12a315203d28@redhat.com>
Date: Mon, 1 Jul 2019 20:01:25 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.9.1
MIME-Version: 1.0
In-Reply-To: <c495bd0b-ea4d-7206-8a4f-a7149760d19a@redhat.com>
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Mon, 01 Jul 2019 18:01:35 +0000 (UTC)
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit

On 07/01/19 13:02, Laszlo Ersek wrote:

> ... Honestly, at this point, I sort of wish we just rewrote this
> function from scratch. The current *approach* of the function is wrong. The
> function currently forms a mental image of how the input data "should"
> look, and tries to parse that -- it tries to shoehorn the input into the
> "expected" format. If the input does not look like the expectation, we
> run into gaps here and there.
> 
> Instead, the function should follow a state machine approach, where the
> outermost loop scans input characters one by one, and makes *absolutely
> no assumption* about the character that has just been found. Every UINT8
> character in the input should be checked against the full possible UINT8
> domain (valid BASE64 range, the equal sign, tolerated whitespace, and
> the rest), and acted upon accordingly.
> 
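
A minimal sketch of that classification step, in standard C rather than edk2 types (uint8_t stands in for UINT8). It assumes ASCII input and a tolerated whitespace set of SP, TAB, CR and LF. That set is an assumption here, not taken from the BaseLib specification. CHAR_CLASS and ClassifyChar() are hypothetical names. The point is that every one of the 256 byte values lands in exactly one class, so the scanning loop never has to guess what it is looking at:

```c
#include <stdint.h>

/* Hypothetical character classes covering all 256 byte values. */
typedef enum {
  CHAR_CLASS_BASE64,      /* 'A'-'Z', 'a'-'z', '0'-'9', '+', '/' */
  CHAR_CLASS_PAD,         /* '=' */
  CHAR_CLASS_WHITESPACE,  /* tolerated whitespace (assumed: SP, TAB, CR, LF) */
  CHAR_CLASS_INVALID      /* every other byte value */
} CHAR_CLASS;

/*
  Classify one input byte. Assumes ASCII, so the letter and digit ranges are
  contiguous. For CHAR_CLASS_BASE64, *Sextet receives the 6-bit value.
*/
static CHAR_CLASS ClassifyChar (uint8_t Ch, uint8_t *Sextet)
{
  if (Ch >= 'A' && Ch <= 'Z') {
    *Sextet = (uint8_t)(Ch - 'A');
    return CHAR_CLASS_BASE64;
  }
  if (Ch >= 'a' && Ch <= 'z') {
    *Sextet = (uint8_t)(Ch - 'a' + 26);
    return CHAR_CLASS_BASE64;
  }
  if (Ch >= '0' && Ch <= '9') {
    *Sextet = (uint8_t)(Ch - '0' + 52);
    return CHAR_CLASS_BASE64;
  }
  if (Ch == '+') {
    *Sextet = 62;
    return CHAR_CLASS_BASE64;
  }
  if (Ch == '/') {
    *Sextet = 63;
    return CHAR_CLASS_BASE64;
  }
  if (Ch == '=') {
    return CHAR_CLASS_PAD;
  }
  if (Ch == ' ' || Ch == '\t' || Ch == '\r' || Ch == '\n') {
    return CHAR_CLASS_WHITESPACE;
  }
  return CHAR_CLASS_INVALID;  /* "the rest": reject */
}
```
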
> For example, valid BASE64 characters should be accumulated into a 24-bit
> value, and flushed when the latter becomes full, and also at the end of
> the scanning loop.
> 
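
A minimal sketch of the 24-bit accumulator, again in standard C (uint32_t / uint8_t standing in for UINT32 / UINT8). ACCUMULATOR, Flush() and DecodeGroups() are hypothetical names. Padding and whitespace are deliberately left out so that only accumulate-and-flush is visible. Four sextets yield three bytes when the accumulator fills, a trailing group of two or three sextets yields one or two bytes at the end of the loop, and a single leftover sextet is rejected:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accumulator: up to four 6-bit groups (24 bits) of pending data. */
typedef struct {
  uint32_t Acc;
  unsigned Sextets;   /* number of sextets currently held, 0..4 */
} ACCUMULATOR;

/* Map a Base64 character to its 6-bit value, or -1 if it is not one (ASCII assumed). */
static int SextetValue (char Ch)
{
  if (Ch >= 'A' && Ch <= 'Z') return Ch - 'A';
  if (Ch >= 'a' && Ch <= 'z') return Ch - 'a' + 26;
  if (Ch >= '0' && Ch <= '9') return Ch - '0' + 52;
  if (Ch == '+') return 62;
  if (Ch == '/') return 63;
  return -1;
}

/*
  Write out the complete bytes held in the accumulator and reset it.
  4 sextets -> 3 bytes, 3 -> 2, 2 -> 1. Returns the number of bytes written.
*/
static size_t Flush (ACCUMULATOR *A, uint8_t *Out)
{
  size_t Bytes = (A->Sextets > 1) ? (size_t)(A->Sextets - 1) : 0;
  /* Left-align the pending bits within 24 bits. */
  uint32_t Aligned = A->Acc << (6 * (4 - A->Sextets));
  for (size_t k = 0; k < Bytes; k++) {
    Out[k] = (uint8_t)(Aligned >> (16 - 8 * k));
  }
  A->Acc = 0;
  A->Sextets = 0;
  return Bytes;
}

/*
  Decode Len Base64 characters (no '=' or whitespace in this sketch).
  Returns the number of bytes written, or (size_t)-1 on bad input.
*/
static size_t DecodeGroups (const char *In, size_t Len, uint8_t *Out)
{
  ACCUMULATOR A = { 0, 0 };
  size_t Written = 0;

  for (size_t i = 0; i < Len; i++) {
    int V = SextetValue (In[i]);
    if (V < 0) {
      return (size_t)-1;
    }
    A.Acc = (A.Acc << 6) | (uint32_t)V;
    if (++A.Sextets == 4) {
      Written += Flush (&A, Out + Written);   /* accumulator full: flush */
    }
  }
  if (A.Sextets == 1) {
    return (size_t)-1;    /* 6 leftover bits cannot form a byte */
  }
  Written += Flush (&A, Out + Written);       /* end of loop: flush remainder */
  return Written;
}
```
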
> Counting vs. decoding can be implemented by making just the flushing
> operation conditional (in counting mode, do not write to memory).
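
Putting the pieces together, here is a sketch of a single-pass decoder in which a NULL output buffer selects counting mode. The only difference between the two modes is whether the flush stores bytes. DecodeOrCount() is a hypothetical name, not the BaseLib Base64Decode() interface. The sketch assumes padding is mandatory (leftover sextets plus '=' characters must reach a multiple of four), rejects data after '=', and does not check that unused trailing bits are zero:

```c
#include <stddef.h>
#include <stdint.h>

/* Return the 6-bit value of a Base64 character, or -1 for any other byte (ASCII assumed). */
static int SextetOf (uint8_t Ch)
{
  if (Ch >= 'A' && Ch <= 'Z') return Ch - 'A';
  if (Ch >= 'a' && Ch <= 'z') return Ch - 'a' + 26;
  if (Ch >= '0' && Ch <= '9') return Ch - '0' + 52;
  if (Ch == '+') return 62;
  if (Ch == '/') return 63;
  return -1;
}

/*
  Hypothetical sketch: if Out is NULL, only the decoded size is computed;
  otherwise bytes are stored to Out, which must hold the counted size.
  Returns the decoded size, or (size_t)-1 on malformed input.
*/
static size_t DecodeOrCount (const uint8_t *In, size_t Len, uint8_t *Out)
{
  uint32_t Acc = 0;
  unsigned Sextets = 0;   /* sextets currently held in Acc */
  unsigned Pads = 0;      /* '=' characters seen so far */
  size_t Size = 0;

  for (size_t i = 0; i < Len; i++) {
    uint8_t Ch = In[i];
    int V = SextetOf (Ch);
    if (V >= 0) {
      if (Pads > 0) {
        return (size_t)-1;                  /* data after padding */
      }
      Acc = (Acc << 6) | (uint32_t)V;
      if (++Sextets == 4) {                 /* 24 bits full: flush */
        if (Out != NULL) {                  /* the only mode-dependent step */
          Out[Size]     = (uint8_t)(Acc >> 16);
          Out[Size + 1] = (uint8_t)(Acc >> 8);
          Out[Size + 2] = (uint8_t)Acc;
        }
        Size += 3;
        Acc = 0;
        Sextets = 0;
      }
    } else if (Ch == '=') {
      if (++Pads > 2) {
        return (size_t)-1;                  /* at most two '=' */
      }
    } else if (Ch == ' ' || Ch == '\t' || Ch == '\r' || Ch == '\n') {
      continue;                             /* tolerated whitespace (assumed set) */
    } else {
      return (size_t)-1;                    /* any other byte is rejected */
    }
  }

  if ((Sextets + Pads) % 4 != 0) {
    return (size_t)-1;                      /* incomplete or mis-padded group */
  }
  if (Sextets > 0) {                        /* end of loop: flush 2 or 3 sextets */
    Acc <<= 6 * (4 - Sextets);              /* left-align within 24 bits */
    for (unsigned k = 0; k < Sextets - 1; k++) {
      if (Out != NULL) {
        Out[Size] = (uint8_t)(Acc >> (16 - 8 * k));
      }
      Size++;
    }
  }
  return Size;
}
```

Calling it once with Out == NULL to obtain the size, then again with a buffer of exactly that size, mirrors the counting vs. decoding split described above. Because both passes run the same state machine, the decoding pass cannot write past the size the counting pass reported.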

If time allows, I'd like to try contributing a version along these lines.
Please give me a bit of time to work on it.

Thanks
Laszlo