Subject: Re: [edk] [PATCH] ShellPkg/edit: allow non-ASCII characters in edit
To: Heinrich Schuchardt, Ray Ni, Zhichao Gao, Leif Lindholm, Ard Biesheuvel
Cc: devel@edk2.groups.io, Liming Gao
References: <20201102231114.31099-1-xypron.glpk@gmx.de>
From: "Laszlo Ersek"
Date: Wed, 4 Nov 2020 18:23:57 +0100
In-Reply-To: <20201102231114.31099-1-xypron.glpk@gmx.de>

Hi Heinrich,

On 11/03/20 00:11, Heinrich Schuchardt wrote:
> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=2339
>
> Currently it is not possible to add letters outside the range
> U+0020 - U+007F to a file using the edit command.
>
> In Unicode the following are control characters:
>
> * U+0000-U+001F (C0 controls)
> * U+007F (DEL)
> * U+0080-U+009F (C1 controls).
>
> For reference see:
>
> * https://unicode.org/charts/PDF/U0000.pdf
> * https://unicode.org/charts/PDF/U0080.pdf
>
> So the characters we should exclude from the file buffer are:
> U+0000 - U+001F, U+007F - U+009F
>
> Allow all other characters as input to the file buffer in Unicode mode.
> Allow only ASCII characters as input in ASCII mode.
>
> When saving a file in ASCII mode replace non-ASCII characters by a question
> mark ('?').
>
> For editing texts with double width characters (e.g. Japanese) further
> adjustments will be needed.
>
> Signed-off-by: Heinrich Schuchardt
> ---
> Resent.
> Original message https://edk2.groups.io/g/devel/message/51205

thank you for the repost; the patch now looks well-formed to me!
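
For readers following the thread, the input rule from the commit message can be restated as a small standalone predicate. This is only an illustrative sketch, not the patch code: IsAcceptedChar, FILE_TYPE and its enumerators are hypothetical names, and uint16_t stands in for the EDK II CHAR16 type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the editor's file type; not the EDK II definition. */
typedef enum { FILE_TYPE_ASCII, FILE_TYPE_UNICODE } FILE_TYPE;

/*
 * Returns true if Char may be added to the file buffer:
 *   - C0 controls (U+0000-U+001F), DEL (U+007F) and C1 controls
 *     (U+0080-U+009F) are always rejected;
 *   - in ASCII mode, every value from U+007F upwards is rejected as well.
 */
bool IsAcceptedChar(uint16_t Char, FILE_TYPE Type)
{
    if (Char < 0x20) {
        return false;
    }
    if (Char >= 0x7f && (Char <= 0x9f || Type == FILE_TYPE_ASCII)) {
        return false;
    }
    return true;
}
```

With this rule, U+00A0 and above are accepted only in Unicode mode, while printable ASCII (U+0020-U+007E) is accepted in both modes.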
Thanks
Laszlo

> ---
>  .../Edit/FileBuffer.c | 21 +++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/ShellPkg/Library/UefiShellDebug1CommandsLib/Edit/FileBuffer.c b/ShellPkg/Library/UefiShellDebug1CommandsLib/Edit/FileBuffer.c
> index 5659ec981054..3923b83670fa 100644
> --- a/ShellPkg/Library/UefiShellDebug1CommandsLib/Edit/FileBuffer.c
> +++ b/ShellPkg/Library/UefiShellDebug1CommandsLib/Edit/FileBuffer.c
> @@ -1360,6 +1360,8 @@ GetNewLine (
>  /**
>    Change a Unicode string to an ASCII string.
>
> +  Non-ASCII characters are replaced by '?'.
> +
>    @param[in] UStr     The Unicode string.
>    @param[in] Length   The maximum size of AStr.
>    @param[out] AStr    ASCII string to pass out.
> @@ -1378,8 +1380,12 @@ UnicodeToAscii (
>    //
>    // just buffer copy, not character copy
>    //
> -  for (Index = 0; Index < Length; Index++) {
> -    *AStr++ = (CHAR8) *UStr++;
> +  for (Index = 0; Index < Length; Index++, UStr++) {
> +    if (*UStr < 0x80) {
> +      *AStr++ = (CHAR8) *UStr;
> +    } else {
> +      *AStr++ = '?';
> +    }
>    }
>
>    return Index;
> @@ -2154,9 +2160,16 @@ FileBufferDoCharInput (
>
>    default:
>      //
> -    // DEAL WITH ASCII CHAR, filter out thing like ctrl+f
> +    // Do not add Unicode control characters to the file buffer:
>      //
> -    if (Char > 127 || Char < 32) {
> +    // * U+0000-U+001f (C0 controls)
> +    // * U+007f (DEL)
> +    // * U+0080-U+009f (C1 controls)
> +    //
> +    // Do not add non-ASCII characters in ASCII mode.
> +    //
> +    if (Char < 0x20 || (Char >= 0x7f &&
> +        (Char <= 0x9f || FileBuffer.FileType == FileTypeAscii))) {
>        Status = StatusBarSetStatusString (L"Unknown Command");
>      } else {
>        Status = FileBufferAddChar (Char);
> --
> 2.28.0
>
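
The effect of the UnicodeToAscii hunk above can be tried outside the firmware tree with a small standalone sketch. The assumptions are stated up front: UnicodeToAsciiSketch is a hypothetical name, uint16_t and char stand in for the EDK II CHAR16 and CHAR8 types, and the signature is simplified. As in the patch, it is a plain buffer copy that appends no terminator.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copies Length UCS-2 code units from UStr to AStr, replacing every
 * value outside the ASCII range (>= 0x80) with '?'.
 * Returns the number of characters written; no terminator is appended.
 * The caller must provide room for at least Length bytes in AStr.
 */
size_t UnicodeToAsciiSketch(const uint16_t *UStr, size_t Length, char *AStr)
{
    size_t Index;

    for (Index = 0; Index < Length; Index++, UStr++) {
        /* Keep ASCII as-is; anything else cannot be represented in one byte. */
        *AStr++ = (*UStr < 0x80) ? (char)*UStr : '?';
    }

    return Index;
}
```

Before the patch, the bare cast would have truncated a value such as U+3042 to its low byte (0x42, 'B'), silently writing a different character; the replacement by '?' makes the loss visible.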