[PATCH] BaseTools: VfrCompile/Pccts: Fix invalid bytes

public inbox for devel@edk2.groups.io

From: "Joe Richey" <joerichey@google.com>
To: devel@edk2.groups.io
Cc: Bob Feng <bob.c.feng@intel.com>,
	Liming Gao <liming.gao@intel.com>,
	Yonghong Zhu <yonghong.zhu@intel.com>
Subject: [PATCH] BaseTools: VfrCompile/Pccts: Fix invalid bytes
Date: Fri, 10 May 2019 21:24:01 -0700
Message-ID: <20190511042401.115133-1-joerichey@google.com>

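Three text files contain bytes that do not belong in plain ASCII text:
nroff-style backspace overstrikes in antlr1.txt and dlg1.txt, and a
stray 0x13 control byte in KNOWN_PROBLEMS.txt. These bytes can confuse
tooling that tries to operate on the repository, which may misclassify
the files as binary data.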

https://github.com/josephlr/edk2/tree/format

Cc: Bob Feng <bob.c.feng@intel.com>
Cc: Liming Gao <liming.gao@intel.com>
Cc: Yonghong Zhu <yonghong.zhu@intel.com>
Signed-off-by: Joe Richey <joerichey@google.com>
---
 BaseTools/Source/C/VfrCompile/Pccts/KNOWN_PROBLEMS.txt |  2 +-
 BaseTools/Source/C/VfrCompile/Pccts/antlr/antlr1.txt   | 78 ++++++++++----------
 BaseTools/Source/C/VfrCompile/Pccts/dlg/dlg1.txt       |  6 +-
 3 files changed, 43 insertions(+), 43 deletions(-)
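
Note for reviewers: the cleanup in the diff below amounts to dropping
each "character + backspace" overstrike pair and any leftover control
byte. The following is a minimal Python sketch of that idea, not
necessarily how this patch was produced. The names find_suspect_bytes
and clean_text are hypothetical, and the sketch assumes that only tab,
LF, CR, and printable ASCII should remain (so it would also drop form
feeds and any non-ASCII bytes).

```python
import re

# An overstrike pair is any single byte followed by a backspace (0x08).
# nroff-formatted text uses "_\bX" to underline the character X.
_OVERSTRIKE = re.compile(rb".\x08", re.DOTALL)

# Assumption for this sketch: anything other than tab, LF, CR, and
# printable ASCII (0x20-0x7e) is treated as suspect.
_SUSPECT = re.compile(rb"[^\t\n\r\x20-\x7e]")


def find_suspect_bytes(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) for each byte outside the allowed set."""
    return [(m.start(), data[m.start()]) for m in _SUSPECT.finditer(data)]


def clean_text(data: bytes) -> bytes:
    """Drop overstrike pairs first, then any remaining suspect bytes."""
    without_overstrike = _OVERSTRIKE.sub(b"", data)
    return _SUSPECT.sub(b"", without_overstrike)
```

Running find_suspect_bytes over the bytes of each file before and
after applying the patch is one way to check that nothing was missed.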

diff --git a/BaseTools/Source/C/VfrCompile/Pccts/KNOWN_PROBLEMS.txt b/BaseTools/Source/C/VfrCompile/Pccts/KNOWN_PROBLEMS.txt
index 539cf775257b..f073e620ab68 100644
--- a/BaseTools/Source/C/VfrCompile/Pccts/KNOWN_PROBLEMS.txt
+++ b/BaseTools/Source/C/VfrCompile/Pccts/KNOWN_PROBLEMS.txt
@@ -40,7 +40,7 @@
     An bug (or at least an oddity) is that a reference to LT(1), LA(1),
     or LATEXT(1) in an action which immediately follows a token match
     in a rule refers to the token matched, not the token which is in
-    the lookahead buffer.  Consider:\x13
+    the lookahead buffer.  Consider:
 
         r : abc <<action alpha>> D <<action beta>> E;
 
diff --git a/BaseTools/Source/C/VfrCompile/Pccts/antlr/antlr1.txt b/BaseTools/Source/C/VfrCompile/Pccts/antlr/antlr1.txt
index 4a7d22e7f239..140b064217b7 100644
--- a/BaseTools/Source/C/VfrCompile/Pccts/antlr/antlr1.txt
+++ b/BaseTools/Source/C/VfrCompile/Pccts/antlr/antlr1.txt
@@ -9,48 +9,48 @@ NAME
      antlr - ANother Tool for Language Recognition
 
 SYNTAX
-     antlr [_\bo_\bp_\bt_\bi_\bo_\bn_\bs] _\bg_\br_\ba_\bm_\bm_\ba_\br__\bf_\bi_\bl_\be_\bs
+     antlr [options] grammar_files
 
 DESCRIPTION
-     _\bA_\bn_\bt_\bl_\br converts an extended form of context-free grammar into
+     Antlr converts an extended form of context-free grammar into
      a set of C functions which directly implement an efficient
      form of deterministic recursive-descent LL(k) parser.
      Context-free grammars may be augmented with predicates to
      allow semantics to influence parsing; this allows a form of
      context-sensitive parsing.  Selective backtracking is also
      available to handle non-LL(k) and even non-LALR(k) con-
-     structs.  _\bA_\bn_\bt_\bl_\br also produces a definition of a lexer which
+     structs.  Antlr also produces a definition of a lexer which
      can be automatically converted into C code for a DFA-based
-     lexer by _\bd_\bl_\bg.  Hence, _\ba_\bn_\bt_\bl_\br serves a function much like that
-     of _\by_\ba_\bc_\bc, however, it is notably more flexible and is more
-     integrated with a lexer generator (_\ba_\bn_\bt_\bl_\br directly generates
-     _\bd_\bl_\bg code, whereas _\by_\ba_\bc_\bc and _\bl_\be_\bx are given independent
-     descriptions).  Unlike _\by_\ba_\bc_\bc which accepts LALR(1) grammars,
-     _\ba_\bn_\bt_\bl_\br accepts LL(k) grammars in an extended BNF notation -
+     lexer by dlg.  Hence, antlr serves a function much like that
+     of yacc, however, it is notably more flexible and is more
+     integrated with a lexer generator (antlr directly generates
+     dlg code, whereas yacc and lex are given independent
+     descriptions).  Unlike yacc which accepts LALR(1) grammars,
+     antlr accepts LL(k) grammars in an extended BNF notation -
      which eliminates the need for precedence rules.
 
-     Like _\by_\ba_\bc_\bc grammars, _\ba_\bn_\bt_\bl_\br grammars can use automatically-
+     Like yacc grammars, antlr grammars can use automatically-
      maintained symbol attribute values referenced as dollar
-     variables.  Further, because _\ba_\bn_\bt_\bl_\br generates top-down
+     variables.  Further, because antlr generates top-down
      parsers, arbitrary values may be inherited from parent rules
-     (passed like function parameters).  _\bA_\bn_\bt_\bl_\br also has a mechan-
+     (passed like function parameters).  Antlr also has a mechan-
      ism for creating and manipulating abstract-syntax-trees.
 
-     There are various other niceties in _\ba_\bn_\bt_\bl_\br, including the
+     There are various other niceties in antlr, including the
      ability to spread one grammar over multiple files or even
      multiple grammars in a single file, the ability to generate
      a version of the grammar with actions stripped out (for
      documentation purposes), and lots more.
 
 OPTIONS
-     -ck _\bn
-          Use up to _\bn symbols of lookahead when using compressed
+     -ck n
+          Use up to n symbols of lookahead when using compressed
           (linear approximation) lookahead.  This type of looka-
           head is very cheap to compute and is attempted before
           full LL(k) lookahead, which is of exponential complex-
           ity in the worst case.  In general, the compressed loo-
-          kahead can be much deeper (e.g, -ck 10) _\bt_\bh_\ba_\bn _\bt_\bh_\be _\bf_\bu_\bl_\bl
-          _\bl_\bo_\bo_\bk_\ba_\bh_\be_\ba_\bd (_\bw_\bh_\bi_\bc_\bh _\bu_\bs_\bu_\ba_\bl_\bl_\by _\bm_\bu_\bs_\bt _\bb_\be _\bl_\be_\bs_\bs _\bt_\bh_\ba_\bn _\b4).
+          kahead can be much deeper (e.g, -ck 10) than the full
+          lookahead (which usually must be less than 4).
 
      -CC  Generate C++ output from both ANTLR and DLG.
 
@@ -86,20 +86,20 @@ OPTIONS
 
      -ga  Generate ANSI-compatible code (default case).  This has
           not been rigorously tested to be ANSI XJ11 C compliant,
-          but it is close.  The normal output of _\ba_\bn_\bt_\bl_\br is
+          but it is close.  The normal output of antlr is
           currently compilable under both K&R, ANSI C, and C++-
-          this option does nothing because _\ba_\bn_\bt_\bl_\br generates a
+          this option does nothing because antlr generates a
           bunch of #ifdef's to do the right thing depending on
           the language.
 
-     -gc  Indicates that _\ba_\bn_\bt_\bl_\br should generate no C code, i.e.,
+     -gc  Indicates that antlr should generate no C code, i.e.,
           only perform analysis on the grammar.
 
-     -gd  C code is inserted in each of the _\ba_\bn_\bt_\bl_\br generated pars-
+     -gd  C code is inserted in each of the antlr generated pars-
           ing functions to provide for user-defined handling of a
           detailed parse trace.  The inserted code consists of
           calls to the user-supplied macros or functions called
-          zzTRACEIN and zzTRACEOUT.  The only argument is a _\bc_\bh_\ba_\br
+          zzTRACEIN and zzTRACEOUT.  The only argument is a char
           * pointing to a C-style string which is the grammar
           rule recognized by the current parsing function.  If no
           definition is given for the trace functions, upon rule
@@ -110,17 +110,17 @@ OPTIONS
 
      -gh  Generate stdpccts.h for non-ANTLR-generated files to
           include.  This file contains all defines needed to
-          describe the type of parser generated by _\ba_\bn_\bt_\bl_\br (e.g.
+          describe the type of parser generated by antlr (e.g.
           how much lookahead is used and whether or not trees are
           constructed) and contains the header action specified
           by the user.
 
      -gk  Generate parsers that delay lookahead fetches until
-          needed.  Without this option, _\ba_\bn_\bt_\bl_\br generates parsers
-          which always have _\bk tokens of lookahead available.
+          needed.  Without this option, antlr generates parsers
+          which always have k tokens of lookahead available.
 
      -gl  Generate line info about grammar actions in C parser of
-          the form # _\bl_\bi_\bn_\be "_\bf_\bi_\bl_\be" which makes error messages from
+          the form # line "file" which makes error messages from
           the C/C++ compiler make more sense as they will point
           into the grammar file not the resulting C file.
           Debugging is easier as well, because you will step
@@ -128,18 +128,18 @@ OPTIONS
 
      -gs  Do not generate sets for token expression lists;
           instead generate a ||-separated sequence of
-          LA(1)==_\bt_\bo_\bk_\be_\bn__\bn_\bu_\bm_\bb_\be_\br.  The default is to generate sets.
+          LA(1)==token_number.  The default is to generate sets.
 
      -gt  Generate code for Abstract-Syntax Trees.
 
      -gx  Do not create the lexical analyzer files (dlg-related).
           This option should be given when the user wishes to
           provide a customized lexical analyzer.  It may also be
-          used in _\bm_\ba_\bk_\be scripts to cause only the parser to be
+          used in make scripts to cause only the parser to be
           rebuilt when a change not affecting the lexical struc-
           ture is made to the input grammars.
 
-     -k _\bn Set k of LL(k) to _\bn; i.e. set tokens of look-ahead
+     -k n Set k of LL(k) to n; i.e. set tokens of look-ahead
           (default==1).
 
      -o dir
@@ -171,9 +171,9 @@ OPTIONS
           release with option -pr on.  Context computation is off
           by default.
 
-     -rl _\bn
+     -rl n
           Limit the maximum number of tree nodes used by grammar
-          analysis to _\bn.  Occasionally, _\ba_\bn_\bt_\bl_\br is unable to
+          analysis to n.  Occasionally, antlr is unable to
           analyze a grammar submitted by the user.  This rare
           situation can only occur when the grammar is large and
           the amount of lookahead is greater than one.  A non-
@@ -184,14 +184,14 @@ OPTIONS
           the number of calls to the full LL(k) algorithm.  An
           error message will be displayed, if this limit is
           reached, which indicates the grammar construct being
-          analyzed when _\ba_\bn_\bt_\bl_\br hit a non-linearity.  Use this
-          option if _\ba_\bn_\bt_\bl_\br seems to go out to lunch and your disk
-          start thrashing; try _\bn=10000 to start.  Once the
+          analyzed when antlr hit a non-linearity.  Use this
+          option if antlr seems to go out to lunch and your disk
+          start thrashing; try n=10000 to start.  Once the
           offending construct has been identified, try to remove
-          the ambiguity that _\ba_\bn_\bt_\bl_\br was trying to overcome with
+          the ambiguity that antlr was trying to overcome with
           large lookahead analysis.  The introduction of (...)?
           backtracking blocks eliminates some of these problems -
-          _\ba_\bn_\bt_\bl_\br does not analyze alternatives that begin with
+          antlr does not analyze alternatives that begin with
           (...)? (it simply backtracks, if necessary, at run
           time).
 
@@ -208,7 +208,7 @@ OPTIONS
           as the parser file.
 
 SPECIAL CONSIDERATIONS
-     _\bA_\bn_\bt_\bl_\br works...  we think.  There is no implicit guarantee of
+     Antlr works...  we think.  There is no implicit guarantee of
      anything.  We reserve no legal rights to the software known
      as the Purdue Compiler Construction Tool Set (PCCTS) - PCCTS
      is in the public domain.  An individual or company may do
@@ -234,7 +234,7 @@ FILES
           output C++ parser when C++ mode is used.
 
      parser.dlg
-          output _\bd_\bl_\bg lexical analyzer.
+          output dlg lexical analyzer.
 
      err.c
           token string array, error sets and error support rou-
@@ -251,7 +251,7 @@ FILES
           erated by default.  Not used in C++ mode.
 
      tokens.h
-          output #_\bd_\be_\bf_\bi_\bn_\be_\bs for tokens used and function prototypes
+          output #defines for tokens used and function prototypes
           for functions generated for rules.
 
 
diff --git a/BaseTools/Source/C/VfrCompile/Pccts/dlg/dlg1.txt b/BaseTools/Source/C/VfrCompile/Pccts/dlg/dlg1.txt
index 06b320de2abb..5ea5e933c808 100644
--- a/BaseTools/Source/C/VfrCompile/Pccts/dlg/dlg1.txt
+++ b/BaseTools/Source/C/VfrCompile/Pccts/dlg/dlg1.txt
@@ -9,14 +9,14 @@ NAME
      dlg - DFA Lexical Analyzer Generator
 
 SYNTAX
-     dlg [_\bo_\bp_\bt_\bi_\bo_\bn_\bs] _\bl_\be_\bx_\bi_\bc_\ba_\bl__\bs_\bp_\be_\bc [_\bo_\bu_\bt_\bp_\bu_\bt__\bf_\bi_\bl_\be]
+     dlg [options] lexical_spec [output_file]
 
 DESCRIPTION
      dlg is a tool that produces fast deterministic finite auto-
      mata for recognizing regular expressions in input.
 
 OPTIONS
-     -CC  Generate C++ output.  The _\bo_\bu_\bt_\bp_\bu_\bt__\bf_\bi_\bl_\be is not specified
+     -CC  Generate C++ output.  The output_file is not specified
           in this case.
 
      -C[ level]
@@ -69,7 +69,7 @@ OPTIONS
           in or send output to standard out.
 
 SPECIAL CONSIDERATIONS
-     _\bD_\bl_\bg works...  we think.  There is no implicit guarantee of
+     Dlg works...  we think.  There is no implicit guarantee of
      anything.  We reserve no legal rights to the software known
      as the Purdue Compiler Construction Tool Set (PCCTS) - PCCTS
      is in the public domain.  An individual or company may do
-- 
2.21.0.1020.gf2820cf01a-goog