From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: mx.groups.io; dkim=missing; spf=pass (domain: redhat.com, ip: 209.132.183.28, mailfrom: lersek@redhat.com) Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by groups.io with SMTP; Mon, 01 Jul 2019 04:02:28 -0700 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 45A2781DE3; Mon, 1 Jul 2019 11:02:16 +0000 (UTC) Received: from lacos-laptop-7.usersys.redhat.com (unknown [10.36.118.14]) by smtp.corp.redhat.com (Postfix) with ESMTP id DE2F71001B27; Mon, 1 Jul 2019 11:02:12 +0000 (UTC) Subject: Re: [edk2-devel] [PATCH 0/3] MdePkg/BaseLib: Base64Decode: Make it follow its specification To: devel@edk2.groups.io, zhichao.gao@intel.com Cc: Michael D Kinney , Liming Gao , Marvin Hauser References: <20190628035746.24160-1-zhichao.gao@intel.com> From: "Laszlo Ersek" Message-ID: Date: Mon, 1 Jul 2019 13:02:12 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <20190628035746.24160-1-zhichao.gao@intel.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Mon, 01 Jul 2019 11:02:27 +0000 (UTC) Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit On 06/28/19 05:57, Gao, Zhichao wrote: > REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1891 > > Adjust the coding style. > Set DestinationSize before return. > Add addition decription for the RETURN_SUCCESS case. > > Cc: Michael D Kinney > Cc: Liming Gao > Cc: Marvin Hauser > Cc: Laszlo Ersek > > Zhichao Gao (3): > MdePkg/BaseLib: Adjust the coding style in Base64Decode > MdePkg/BaseLib: Base64Decode: Make DestinationSize complied to spec > MdePkg/BaseLib: Base64Decode: Add decription for RETURN_SUCCESS > > MdePkg/Library/BaseLib/String.c | 28 ++++++++++++++++++---------- > 1 file changed, 18 insertions(+), 10 deletions(-) > Issues that have not been addressed by this patch set, but should be: (1) the leading comment says, "Produce Null-terminated binary data in the output buffer". That's bogus, the binary output is never NUL-terminated (nor should it be). (2) One of the RETURN_INVALID_PARAMETER cases is documented as: "If SourceLength or DestinationSize is bigger than (MAX_ADDRESS -(UINTN)Destination )." There are two problems with this. (2a) SourceLength has nothing to do with Destination. The comment should be updated -- making sure that (Source + SourceLength) do not overflow MAX_ADDRESS is worthwhile, but the comment is misleading. (2b) The code that actually performs the check is off by one. What we need is that the byte *one past* each buffer still be expressible as a valid address, so mathematically we need (UINTN)Buffer + BufferLength <= MAX_ADDRESS If you reorder this for a C expression that cannot overflow, you get BufferLength <= MAX_ADDRESS - (UINTN)Buffer If you negate this, to express the failure condition, you get BufferLength > MAX_ADDRESS - (UINTN)Buffer Therefore, the comment for RETURN_INVALID_PARAMETER that says DestinationSize is bigger than (MAX_ADDRESS -(UINTN)Destination) is correct; *however*, the code is wrong: *DestinationSize >= (MAX_ADDRESS - (UINTN)Destination) This failure condition is too lax (IOW, the success condition is too strict), and shuld be restricted (IOW the success condition should be relaxed). (3) Maintaining a signed integer (INTN) BufferSize, and then doing arithmetic on it with UINTN values, is really bad practice. In particular, the following expression makes me nervous: BufferSize += ActualSourceLength / 4 * 3; A separate UINTN variable called "EqualSigns" should be introduced, BufferSize should be made an UINTN, and the logic should be reworked using those. (4) "DecodingTable" should be called "mDecodingTable". (5) The decoding loop checks (SourceIndex < SourceLength) && (DestinationIndex < *DestinationSize) and we have an inner loop do { Chr = DecodingTable[(UINT8) Source[SourceIndex++]]; } while (Chr == BAD_V); Now consider the following case. The caller passes in a valid (*DestinationSize) that is larger than what is requried for the decoding. In addition, assume that the input data (Source), which is otherwise completely valid, is terminated with a space character (or even with a NUL character, although NUL-termination is not required by the function's specification). In the above situation, the innermost loop, which scans for BAD_V, will fall off the end of Source. The outermost loop condition will evaluate to TRUE (we have some Source characters left -- namely, one space, or NUL), and we have room in the Destination buffer too (the caller specified / allocated a larger DestinationSize thatn what is required). So we reach the innermost loop, and the space character (BAD_V) will lead it right off the end of Source. The outermost loop condition should be changed. First, the SourceIndex subcondition should be dropped altogether. Second, *DestinationSize should be set to the actual decoded data size *before* starting the actual decoding. Then, if we still have output bytes to produce, in the outermost loop, the Source scanning in the innermost BAD_V loop is guaranteed to remain in-bounds. ... Honestly, at this point, I sort of wish we just rewrote this function from zero. The current *approach* of the function is wrong. The function currently forms a mental image of how the input data "should" look, and tries to parse that -- it tries to shoehorn the input into the "expected" format. If the input does not look like the expectation, we run into gaps here and there. Instead, the function should follow a state machine approach, where the outermost loop scans input characters one by one, and makes *absolutely no assumption* about the character that has just been found. Every UINT8 character in the input should be checked against the full possible UINT8 domain (valid BASE64 range, the equal sign, tolerated whitespace, and the rest), and acted upon accordingly. For example, valid BASE64 characters should be accumulated into a 24-bit value, and flushed when the latter becomes full, and also at the end of the scanning loop. Counting vs. decoding can be implemented by making just the flushing operation conditional (do not write to memory). Thanks Laszlo