Subject: Re: [edk2-devel] [PATCH 0/3] MdePkg/BaseLib: Base64Decode: Make it follow its specification
From: "Laszlo Ersek"
To: devel@edk2.groups.io, zhichao.gao@intel.com
Cc: Michael D Kinney, Liming Gao, Marvin Hauser
Reply-To: devel@edk2.groups.io, lersek@redhat.com
References: <20190628035746.24160-1-zhichao.gao@intel.com>
Message-ID: <2327e0d5-f79d-59e9-9413-12a315203d28@redhat.com>
Date: Mon, 1 Jul 2019 20:01:25 +0200

On 07/01/19 13:02, Laszlo Ersek wrote:

> ... Honestly, at this point, I sort of wish we just rewrote this
> function from zero. The current *approach* of the function is wrong. The
> function currently forms a mental image of how the input data "should"
> look, and tries to parse that -- it tries to shoehorn the input into the
> "expected" format. If the input does not look like the expectation, we
> run into gaps here and there.
>
> Instead, the function should follow a state machine approach, where the
> outermost loop scans input characters one by one, and makes *absolutely
> no assumption* about the character that has just been found. Every UINT8
> character in the input should be checked against the full possible UINT8
> domain (valid BASE64 range, the equal sign, tolerated whitespace, and
> the rest), and acted upon accordingly.
>
> For example, valid BASE64 characters should be accumulated into a 24-bit
> value, and flushed when the latter becomes full, and also at the end of
> the scanning loop.
>
> Counting vs. decoding can be implemented by making just the flushing
> operation conditional (do not write to memory).

If time allows, I'd like to attempt contributing a version like this.
Please give me a bit of time to work on that.

Thanks
Laszlo
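
For illustration only, the state machine approach quoted above could be
sketched roughly as below. This is not the eventual patch and not the
BaseLib interface: it is plain C99, and the names b64_scan(), classify()
and flush(), the 0 / -1 return convention, and the exact whitespace set
are all assumptions made up for this sketch. It shows the three ideas
from the quote: every input byte is classified against the full UINT8
domain, valid characters are accumulated into a 24-bit value that is
flushed when full and at the end of the loop, and counting differs from
decoding only in whether flush() stores to memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical character classes covering the whole 0..255 domain. */
typedef enum { CH_SIXBIT, CH_PAD, CH_SPACE, CH_OTHER } CharClass;

/* Classify one input byte; for CH_SIXBIT, store its 6-bit value. */
static CharClass classify(uint8_t ch, uint8_t *six)
{
    if (ch >= 'A' && ch <= 'Z') { *six = (uint8_t)(ch - 'A');      return CH_SIXBIT; }
    if (ch >= 'a' && ch <= 'z') { *six = (uint8_t)(ch - 'a' + 26); return CH_SIXBIT; }
    if (ch >= '0' && ch <= '9') { *six = (uint8_t)(ch - '0' + 52); return CH_SIXBIT; }
    if (ch == '+') { *six = 62; return CH_SIXBIT; }
    if (ch == '/') { *six = 63; return CH_SIXBIT; }
    if (ch == '=') return CH_PAD;
    /* Assumed set of tolerated whitespace. */
    if (ch == ' ' || ch == '\t' || ch == '\n' ||
        ch == '\v' || ch == '\f' || ch == '\r') return CH_SPACE;
    return CH_OTHER;
}

/* Emit the top nbytes of the 24-bit accumulator. The store is the only
 * conditional part: with out == NULL (or a full buffer) we just count. */
static void flush(uint32_t acc, unsigned nbytes,
                  uint8_t *out, size_t out_cap, size_t *produced)
{
    for (unsigned k = 0; k < nbytes; k++) {
        if (out != NULL && *produced < out_cap) {
            out[*produced] = (uint8_t)(acc >> (16 - 8 * k));
        }
        (*produced)++;
    }
}

/* Returns 0 on success, -1 on malformed input. On success, *out_len is
 * the decoded size, even if it exceeds out_cap or out is NULL. */
int b64_scan(const uint8_t *in, size_t in_len,
             uint8_t *out, size_t out_cap, size_t *out_len)
{
    uint32_t acc = 0;      /* up to four 6-bit values = 24 bits */
    unsigned nsix = 0;     /* 6-bit values currently in acc (0..3) */
    unsigned npad = 0;     /* '=' characters seen so far */
    size_t produced = 0;

    for (size_t i = 0; i < in_len; i++) {
        uint8_t six = 0;

        switch (classify(in[i], &six)) {
        case CH_SPACE:
            break;                          /* tolerated, ignored */
        case CH_OTHER:
            return -1;                      /* outside every accepted class */
        case CH_PAD:
            npad++;
            /* '=' may only complete a group holding 2 or 3 characters */
            if (nsix < 2 || nsix + npad > 4) return -1;
            break;
        case CH_SIXBIT:
            if (npad > 0) return -1;        /* data after padding */
            acc = (acc << 6) | six;
            if (++nsix == 4) {              /* accumulator full: flush */
                flush(acc, 3, out, out_cap, &produced);
                acc = 0;
                nsix = 0;
            }
            break;
        }
    }

    /* End of the scanning loop: flush a padded partial group, if any. */
    if (nsix != 0) {
        if (nsix + npad != 4) return -1;    /* truncated or under-padded */
        acc <<= 6 * (4 - nsix);             /* left-align to 24 bits */
        flush(acc, nsix - 1, out, out_cap, &produced);
    }
    *out_len = produced;
    return 0;
}
```

A caller would typically run this twice: first with out == NULL to learn
the required size, then again with a buffer of that size. The sketch
does not check that the unused low bits of a padded group are zero; a
stricter decoder might want to reject such input as well.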