Date: Fri, 10 May 2019 21:24:01 -0700
Message-Id: <20190511042401.115133-1-joerichey@google.com>
Mime-Version: 1.0
X-Mailer: git-send-email 2.21.0.1020.gf2820cf01a-goog
Subject: [PATCH] BaseTools: VfrCompile/Pccts: Fix invalid bytes
From: "Joe Richey"
To: devel@edk2.groups.io
Cc: Bob Feng, Liming Gao, Yonghong Zhu
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Three text files contain bytes that are not printable ASCII (a stray
0x13 and backspace overstrikes, 0x08). This can mess up tooling that
tries to operate on the repository, which may accidentally classify
the files as binary data.

https://github.com/josephlr/edk2/tree/format

Cc: Bob Feng
Cc: Liming Gao
Cc: Yonghong Zhu
Signed-off-by: Joe Richey
---
 BaseTools/Source/C/VfrCompile/Pccts/KNOWN_PROBLEMS.txt | 2 +-
 BaseTools/Source/C/VfrCompile/Pccts/antlr/antlr1.txt | 78 ++++++++++----=
------
 BaseTools/Source/C/VfrCompile/Pccts/dlg/dlg1.txt | 6 +-
 3 files changed, 43 insertions(+), 43 deletions(-)

diff --git a/BaseTools/Source/C/VfrCompile/Pccts/KNOWN_PROBLEMS.txt b/BaseT= ools/Source/C/VfrCompile/Pccts/KNOWN_PROBLEMS.txt index 539cf775257b..f073e620ab68 100644 --- a/BaseTools/Source/C/VfrCompile/Pccts/KNOWN_PROBLEMS.txt +++ b/BaseTools/Source/C/VfrCompile/Pccts/KNOWN_PROBLEMS.txt @@ -40,7 +40,7 @@ An bug (or at least an oddity) is that a reference to LT(1), LA(1), or LATEXT(1) in an action which immediately follows a token match in a rule refers to the token matched, not the token which is in - the lookahead buffer. Consider:=13 + the lookahead buffer.
Consider: =20 r : abc <> D <> E; =20 diff --git a/BaseTools/Source/C/VfrCompile/Pccts/antlr/antlr1.txt b/BaseToo= ls/Source/C/VfrCompile/Pccts/antlr/antlr1.txt index 4a7d22e7f239..140b064217b7 100644 --- a/BaseTools/Source/C/VfrCompile/Pccts/antlr/antlr1.txt +++ b/BaseTools/Source/C/VfrCompile/Pccts/antlr/antlr1.txt @@ -9,48 +9,48 @@ NAME antlr - ANother Tool for Language Recognition =20 SYNTAX - antlr [_=08o_=08p_=08t_=08i_=08o_=08n_=08s] _=08g_=08r_=08a_=08m_=08m= _=08a_=08r__=08f_=08i_=08l_=08e_=08s + antlr [options] grammar_files =20 DESCRIPTION - _=08A_=08n_=08t_=08l_=08r converts an extended form of context-free g= rammar into + Antlr converts an extended form of context-free grammar into a set of C functions which directly implement an efficient form of deterministic recursive-descent LL(k) parser. Context-free grammars may be augmented with predicates to allow semantics to influence parsing; this allows a form of context-sensitive parsing. Selective backtracking is also available to handle non-LL(k) and even non-LALR(k) con- - structs. _=08A_=08n_=08t_=08l_=08r also produces a definition of a l= exer which + structs. Antlr also produces a definition of a lexer which can be automatically converted into C code for a DFA-based - lexer by _=08d_=08l_=08g. Hence, _=08a_=08n_=08t_=08l_=08r serves a = function much like that - of _=08y_=08a_=08c_=08c, however, it is notably more flexible and is = more - integrated with a lexer generator (_=08a_=08n_=08t_=08l_=08r directly= generates - _=08d_=08l_=08g code, whereas _=08y_=08a_=08c_=08c and _=08l_=08e_=08= x are given independent - descriptions). Unlike _=08y_=08a_=08c_=08c which accepts LALR(1) gra= mmars, - _=08a_=08n_=08t_=08l_=08r accepts LL(k) grammars in an extended BNF n= otation - + lexer by dlg. 
Hence, antlr serves a function much like that + of yacc, however, it is notably more flexible and is more + integrated with a lexer generator (antlr directly generates + dlg code, whereas yacc and lex are given independent + descriptions). Unlike yacc which accepts LALR(1) grammars, + antlr accepts LL(k) grammars in an extended BNF notation - which eliminates the need for precedence rules. =20 - Like _=08y_=08a_=08c_=08c grammars, _=08a_=08n_=08t_=08l_=08r grammar= s can use automatically- + Like yacc grammars, antlr grammars can use automatically- maintained symbol attribute values referenced as dollar - variables. Further, because _=08a_=08n_=08t_=08l_=08r generates top-= down + variables. Further, because antlr generates top-down parsers, arbitrary values may be inherited from parent rules - (passed like function parameters). _=08A_=08n_=08t_=08l_=08r also ha= s a mechan- + (passed like function parameters). Antlr also has a mechan- ism for creating and manipulating abstract-syntax-trees. =20 - There are various other niceties in _=08a_=08n_=08t_=08l_=08r, includ= ing the + There are various other niceties in antlr, including the ability to spread one grammar over multiple files or even multiple grammars in a single file, the ability to generate a version of the grammar with actions stripped out (for documentation purposes), and lots more. =20 OPTIONS - -ck _=08n - Use up to _=08n symbols of lookahead when using compressed + -ck n + Use up to n symbols of lookahead when using compressed (linear approximation) lookahead. This type of looka- head is very cheap to compute and is attempted before full LL(k) lookahead, which is of exponential complex- ity in the worst case. 
In general, the compressed loo- - kahead can be much deeper (e.g, -ck 10) _=08t_=08h_=08a_=08n _= =08t_=08h_=08e _=08f_=08u_=08l_=08l - _=08l_=08o_=08o_=08k_=08a_=08h_=08e_=08a_=08d (_=08w_=08h_=08i_= =08c_=08h _=08u_=08s_=08u_=08a_=08l_=08l_=08y _=08m_=08u_=08s_=08t _=08b_= =08e _=08l_=08e_=08s_=08s _=08t_=08h_=08a_=08n _=084). + kahead can be much deeper (e.g, -ck 10) than the full + lookahead (which usually must be less than 4). =20 -CC Generate C++ output from both ANTLR and DLG. =20 @@ -86,20 +86,20 @@ OPTIONS =20 -ga Generate ANSI-compatible code (default case). This has not been rigorously tested to be ANSI XJ11 C compliant, - but it is close. The normal output of _=08a_=08n_=08t_=08l_=08r= is + but it is close. The normal output of antlr is currently compilable under both K&R, ANSI C, and C++- - this option does nothing because _=08a_=08n_=08t_=08l_=08r gener= ates a + this option does nothing because antlr generates a bunch of #ifdef's to do the right thing depending on the language. =20 - -gc Indicates that _=08a_=08n_=08t_=08l_=08r should generate no C co= de, i.e., + -gc Indicates that antlr should generate no C code, i.e., only perform analysis on the grammar. =20 - -gd C code is inserted in each of the _=08a_=08n_=08t_=08l_=08r gene= rated pars- + -gd C code is inserted in each of the antlr generated pars- ing functions to provide for user-defined handling of a detailed parse trace. The inserted code consists of calls to the user-supplied macros or functions called - zzTRACEIN and zzTRACEOUT. The only argument is a _=08c_=08h_=08= a_=08r + zzTRACEIN and zzTRACEOUT. The only argument is a char * pointing to a C-style string which is the grammar rule recognized by the current parsing function. If no definition is given for the trace functions, upon rule @@ -110,17 +110,17 @@ OPTIONS =20 -gh Generate stdpccts.h for non-ANTLR-generated files to include. 
This file contains all defines needed to - describe the type of parser generated by _=08a_=08n_=08t_=08l_= =08r (e.g. + describe the type of parser generated by antlr (e.g. how much lookahead is used and whether or not trees are constructed) and contains the header action specified by the user. =20 -gk Generate parsers that delay lookahead fetches until - needed. Without this option, _=08a_=08n_=08t_=08l_=08r generate= s parsers - which always have _=08k tokens of lookahead available. + needed. Without this option, antlr generates parsers + which always have k tokens of lookahead available. =20 -gl Generate line info about grammar actions in C parser of - the form # _=08l_=08i_=08n_=08e "_=08f_=08i_=08l_=08e" which mak= es error messages from + the form # line "file" which makes error messages from the C/C++ compiler make more sense as they will point into the grammar file not the resulting C file. Debugging is easier as well, because you will step @@ -128,18 +128,18 @@ OPTIONS =20 -gs Do not generate sets for token expression lists; instead generate a ||-separated sequence of - LA(1)=3D=3D_=08t_=08o_=08k_=08e_=08n__=08n_=08u_=08m_=08b_=08e_= =08r. The default is to generate sets. + LA(1)=3D=3Dtoken_number. The default is to generate sets. =20 -gt Generate code for Abstract-Syntax Trees. =20 -gx Do not create the lexical analyzer files (dlg-related). This option should be given when the user wishes to provide a customized lexical analyzer. It may also be - used in _=08m_=08a_=08k_=08e scripts to cause only the parser to= be + used in make scripts to cause only the parser to be rebuilt when a change not affecting the lexical struc- ture is made to the input grammars. =20 - -k _=08n Set k of LL(k) to _=08n; i.e. set tokens of look-ahead + -k n Set k of LL(k) to n; i.e. set tokens of look-ahead (default=3D=3D1). =20 -o dir @@ -171,9 +171,9 @@ OPTIONS release with option -pr on. Context computation is off by default. 
=20 - -rl _=08n + -rl n Limit the maximum number of tree nodes used by grammar - analysis to _=08n. Occasionally, _=08a_=08n_=08t_=08l_=08r is u= nable to + analysis to n. Occasionally, antlr is unable to analyze a grammar submitted by the user. This rare situation can only occur when the grammar is large and the amount of lookahead is greater than one. A non- @@ -184,14 +184,14 @@ OPTIONS the number of calls to the full LL(k) algorithm. An error message will be displayed, if this limit is reached, which indicates the grammar construct being - analyzed when _=08a_=08n_=08t_=08l_=08r hit a non-linearity. Us= e this - option if _=08a_=08n_=08t_=08l_=08r seems to go out to lunch and= your disk - start thrashing; try _=08n=3D10000 to start. Once the + analyzed when antlr hit a non-linearity. Use this + option if antlr seems to go out to lunch and your disk + start thrashing; try n=3D10000 to start. Once the offending construct has been identified, try to remove - the ambiguity that _=08a_=08n_=08t_=08l_=08r was trying to overc= ome with + the ambiguity that antlr was trying to overcome with large lookahead analysis. The introduction of (...)? backtracking blocks eliminates some of these problems - - _=08a_=08n_=08t_=08l_=08r does not analyze alternatives that beg= in with + antlr does not analyze alternatives that begin with (...)? (it simply backtracks, if necessary, at run time). =20 @@ -208,7 +208,7 @@ OPTIONS as the parser file. =20 SPECIAL CONSIDERATIONS - _=08A_=08n_=08t_=08l_=08r works... we think. There is no implicit g= uarantee of + Antlr works... we think. There is no implicit guarantee of anything. We reserve no legal rights to the software known as the Purdue Compiler Construction Tool Set (PCCTS) - PCCTS is in the public domain. An individual or company may do @@ -234,7 +234,7 @@ FILES output C++ parser when C++ mode is used. =20 parser.dlg - output _=08d_=08l_=08g lexical analyzer. + output dlg lexical analyzer. 
=20 err.c token string array, error sets and error support rou- @@ -251,7 +251,7 @@ FILES erated by default. Not used in C++ mode. =20 tokens.h - output #_=08d_=08e_=08f_=08i_=08n_=08e_=08s for tokens used and = function prototypes + output #defines for tokens used and function prototypes for functions generated for rules. =20 =20 diff --git a/BaseTools/Source/C/VfrCompile/Pccts/dlg/dlg1.txt b/BaseTools/S= ource/C/VfrCompile/Pccts/dlg/dlg1.txt index 06b320de2abb..5ea5e933c808 100644 --- a/BaseTools/Source/C/VfrCompile/Pccts/dlg/dlg1.txt +++ b/BaseTools/Source/C/VfrCompile/Pccts/dlg/dlg1.txt @@ -9,14 +9,14 @@ NAME dlg - DFA Lexical Analyzer Generator =20 SYNTAX - dlg [_=08o_=08p_=08t_=08i_=08o_=08n_=08s] _=08l_=08e_=08x_=08i_=08c_= =08a_=08l__=08s_=08p_=08e_=08c [_=08o_=08u_=08t_=08p_=08u_=08t__=08f_=08i_= =08l_=08e] + dlg [options] lexical_spec [output_file] =20 DESCRIPTION dlg is a tool that produces fast deterministic finite auto- mata for recognizing regular expressions in input. =20 OPTIONS - -CC Generate C++ output. The _=08o_=08u_=08t_=08p_=08u_=08t__=08f_= =08i_=08l_=08e is not specified + -CC Generate C++ output. The output_file is not specified in this case. =20 -C[ level] @@ -69,7 +69,7 @@ OPTIONS in or send output to standard out. =20 SPECIAL CONSIDERATIONS - _=08D_=08l_=08g works... we think. There is no implicit guarantee o= f + Dlg works... we think. There is no implicit guarantee of anything. We reserve no legal rights to the software known as the Purdue Compiler Construction Tool Set (PCCTS) - PCCTS is in the public domain. An individual or company may do --=20 2.21.0.1020.gf2820cf01a-goog
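
For anyone who wants to check other files for the same problem, here is a minimal sketch of the two operations this patch amounts to: finding control bytes that do not belong in a text file, and collapsing nroff-style "character, backspace" overstrike pairs such as the `_=08o_=08p...` runs removed above. The function names `suspicious_bytes` and `strip_overstrike` are made up for illustration and are not part of edk2 or of this patch, and the patch itself does not say how the files were actually cleaned.

```python
from __future__ import annotations

import re

# Hypothetical helpers, not part of edk2 or of the patch above.

# nroff renders underline as "_\bX" and bold as "X\bX": one character,
# a backspace (0x08), then the character that is printed over it.
_OVERSTRIKE = re.compile(rb".\x08")

# Control bytes that are normal in text files: tab, LF, CR.
_ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}


def suspicious_bytes(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte) pairs for bytes outside printable ASCII.

    Flags control bytes other than tab/LF/CR, plus DEL and any byte
    with the high bit set.
    """
    return [
        (offset, byte)
        for offset, byte in enumerate(data)
        if (byte < 0x20 and byte not in _ALLOWED_CONTROLS) or byte >= 0x7F
    ]


def strip_overstrike(data: bytes) -> bytes:
    """Drop every 'char + backspace' pair, keeping the overprinted char."""
    return _OVERSTRIKE.sub(b"", data)


if __name__ == "__main__":
    # Byte sequences taken from the removed lines in the diff above.
    print(suspicious_bytes(b"Consider:\x13\n"))
    print(strip_overstrike(b"antlr [_\x08o_\x08p_\x08t_\x08i_\x08o_\x08n_\x08s]"))
```

The stray 0x13 in KNOWN_PROBLEMS.txt is not part of an overstrike pair, so `strip_overstrike` leaves it alone; it would have to be removed separately once `suspicious_bytes` has located it.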