From mboxrd@z Thu Jan  1 00:00:00 1970
Authentication-Results: mx.groups.io;
 dkim=missing; spf=pass (domain: redhat.com, ip: 209.132.183.28, mailfrom: lersek@redhat.com)
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28])
 by groups.io with SMTP; Mon, 01 Jul 2019 04:02:28 -0700
Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id 45A2781DE3;
	Mon,  1 Jul 2019 11:02:16 +0000 (UTC)
Received: from lacos-laptop-7.usersys.redhat.com (unknown [10.36.118.14])
	by smtp.corp.redhat.com (Postfix) with ESMTP id DE2F71001B27;
	Mon,  1 Jul 2019 11:02:12 +0000 (UTC)
Subject: Re: [edk2-devel] [PATCH 0/3] MdePkg/BaseLib: Base64Decode: Make it follow its specification
To: devel@edk2.groups.io, zhichao.gao@intel.com
Cc: Michael D Kinney <michael.d.kinney@intel.com>,
 Liming Gao <liming.gao@intel.com>, Marvin Hauser <mhaeuser@outlook.de>
References: <20190628035746.24160-1-zhichao.gao@intel.com>
From: "Laszlo Ersek" <lersek@redhat.com>
Message-ID: <c495bd0b-ea4d-7206-8a4f-a7149760d19a@redhat.com>
Date: Mon, 1 Jul 2019 13:02:12 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.9.1
MIME-Version: 1.0
In-Reply-To: <20190628035746.24160-1-zhichao.gao@intel.com>
X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Mon, 01 Jul 2019 11:02:27 +0000 (UTC)
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit

On 06/28/19 05:57, Gao, Zhichao wrote:
> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1891
> 
> Adjust the coding style.
> Set DestinationSize before return.
> Add addition decription for the RETURN_SUCCESS case.
> 
> Cc: Michael D Kinney <michael.d.kinney@intel.com>
> Cc: Liming Gao <liming.gao@intel.com>
> Cc: Marvin Hauser <mhaeuser@outlook.de>
> Cc: Laszlo Ersek <lersek@redhat.com>
> 
> Zhichao Gao (3):
>   MdePkg/BaseLib: Adjust the coding style in Base64Decode
>   MdePkg/BaseLib: Base64Decode: Make DestinationSize complied to spec
>   MdePkg/BaseLib: Base64Decode: Add decription for RETURN_SUCCESS
> 
>  MdePkg/Library/BaseLib/String.c | 28 ++++++++++++++++++----------
>  1 file changed, 18 insertions(+), 10 deletions(-)
> 

Issues that have not been addressed by this patch set, but should be:

(1) the leading comment says, "Produce Null-terminated binary data in
the output buffer". That's bogus, the binary output is never
NUL-terminated (nor should it be).


(2) One of the RETURN_INVALID_PARAMETER cases is documented as:

"If SourceLength or DestinationSize is bigger than (MAX_ADDRESS
-(UINTN)Destination )."

There are two problems with this.


(2a) SourceLength has nothing to do with Destination. The comment should
be updated -- making sure that (Source + SourceLength) do not overflow
MAX_ADDRESS is worthwhile, but the comment is misleading.


(2b) The code that actually performs the check is off by one. What we
need is that the byte *one past* each buffer still be expressible as a
valid address, so mathematically we need

  (UINTN)Buffer + BufferLength <= MAX_ADDRESS

If you reorder this for a C expression that cannot overflow, you get

  BufferLength <= MAX_ADDRESS - (UINTN)Buffer

If you negate this, to express the failure condition, you get

  BufferLength > MAX_ADDRESS - (UINTN)Buffer

Therefore, the comment for RETURN_INVALID_PARAMETER that says

  DestinationSize is bigger than (MAX_ADDRESS -(UINTN)Destination)

is correct; *however*, the code is wrong:

  *DestinationSize >= (MAX_ADDRESS - (UINTN)Destination)

This failure condition is too lax (IOW, the success condition is too
strict), and shuld be restricted (IOW the success condition should be
relaxed).


(3) Maintaining a signed integer (INTN) BufferSize, and then doing
arithmetic on it with UINTN values, is really bad practice. In
particular, the following expression makes me nervous:

  BufferSize += ActualSourceLength / 4 * 3;

A separate UINTN variable called "EqualSigns" should be introduced,
BufferSize should be made an UINTN, and the logic should be reworked
using those.


(4) "DecodingTable" should be called "mDecodingTable".


(5) The decoding loop checks

  (SourceIndex < SourceLength) && (DestinationIndex < *DestinationSize)

and we have an inner loop

      do {
        Chr = DecodingTable[(UINT8) Source[SourceIndex++]];
      } while (Chr == BAD_V);

Now consider the following case. The caller passes in a valid
(*DestinationSize) that is larger than what is requried for the
decoding. In addition, assume that the input data (Source), which is
otherwise completely valid, is terminated with a space character (or
even with a NUL character, although NUL-termination is not required by
the function's specification).

In the above situation, the innermost loop, which scans for BAD_V, will
fall off the end of Source. The outermost loop condition will evaluate
to TRUE (we have some Source characters left -- namely, one space, or
NUL), and we have room in the Destination buffer too (the caller
specified / allocated a larger DestinationSize thatn what is required).
So we reach the innermost loop, and the space character (BAD_V) will
lead it right off the end of Source.

The outermost loop condition should be changed.

First, the SourceIndex subcondition should be dropped altogether.

Second, *DestinationSize should be set to the actual decoded data size
*before* starting the actual decoding. Then, if we still have output
bytes to produce, in the outermost loop, the Source scanning in the
innermost BAD_V loop is guaranteed to remain in-bounds.


... Honestly, at this point, I sort of wish we just rewrote this
function from zero. The current *approach* of the function is wrong. The
function currently forms a mental image of how the input data "should"
look, and tries to parse that -- it tries to shoehorn the input into the
"expected" format. If the input does not look like the expectation, we
run into gaps here and there.

Instead, the function should follow a state machine approach, where the
outermost loop scans input characters one by one, and makes *absolutely
no assumption* about the character that has just been found. Every UINT8
character in the input should be checked against the full possible UINT8
domain (valid BASE64 range, the equal sign, tolerated whitespace, and
the rest), and acted upon accordingly.

For example, valid BASE64 characters should be accumulated into a 24-bit
value, and flushed when the latter becomes full, and also at the end of
the scanning loop.

Counting vs. decoding can be implemented by making just the flushing
operation conditional (do not write to memory).

Thanks
Laszlo