ANTLR(1) PCCTS Manual Pages ANTLR(1)
NAME
antlr - ANother Tool for Language Recognition
SYNTAX
antlr [options] grammar_files
DESCRIPTION
Antlr converts an extended form of context-free grammar into
a set of C functions which directly implement an efficient
form of deterministic recursive-descent LL(k) parser.
Context-free grammars may be augmented with predicates to
allow semantics to influence parsing; this allows a form of
context-sensitive parsing. Selective backtracking is also
available to handle non-LL(k) and even non-LALR(k)
constructs. Antlr also produces a definition of a lexer which
can be automatically converted into C code for a DFA-based
lexer by dlg. Hence, antlr serves a function much like that
of yacc; however, it is notably more flexible and more tightly
integrated with a lexer generator (antlr directly generates
dlg code, whereas yacc and lex are given independent
descriptions). Unlike yacc, which accepts LALR(1) grammars,
antlr accepts LL(k) grammars in an extended BNF notation,
which eliminates the need for precedence rules: operator
precedence is expressed by the nesting of the grammar rules
themselves.
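To illustrate how precedence falls out of rule nesting, here is a hand-written sketch (not antlr output) of the kind of recursive-descent code an LL(1) parser consists of: one C function per rule, with EBNF loops becoming while-style loops that test the next input character. The grammar, the function names, and the choice to evaluate directly instead of building a tree are all hypothetical.

```c
#include <ctype.h>

/* Grammar in an antlr-like EBNF (hypothetical):
 *   expr   : term   ( ("+" | "-") term   )* ;
 *   term   : factor ( ("*" | "/") factor )* ;
 *   factor : NUMBER | "(" expr ")" ;
 * "*" binds tighter than "+" only because term sits below expr,
 * so no precedence declarations are needed. */

static const char *cur;                 /* input position; *cur plays LA(1) */

static void skip_ws(void) { while (*cur == ' ') cur++; }

static int expr(void);

static int factor(void)
{
    skip_ws();
    if (*cur == '(') {                  /* factor : "(" expr ")" */
        cur++;
        int v = expr();
        skip_ws();
        if (*cur == ')') cur++;         /* error recovery omitted */
        return v;
    }
    int v = 0;                          /* factor : NUMBER */
    while (isdigit((unsigned char)*cur)) v = v * 10 + (*cur++ - '0');
    return v;
}

static int term(void)
{
    int v = factor();
    for (;;) {                          /* ( ("*" | "/") factor )* */
        skip_ws();
        if (*cur == '*')      { cur++; v *= factor(); }
        else if (*cur == '/') { cur++; v /= factor(); }
        else return v;                  /* LA(1) cannot continue the loop */
    }
}

static int expr(void)
{
    int v = term();
    for (;;) {                          /* ( ("+" | "-") term )* */
        skip_ws();
        if (*cur == '+')      { cur++; v += term(); }
        else if (*cur == '-') { cur++; v -= term(); }
        else return v;
    }
}

/* Entry point: evaluates a complete expression string. */
int evaluate(const char *text) { cur = text; return expr(); }
```

For example, evaluate("2 + 3 * 4") yields 14, and the loop form makes "10 - 4 - 3" left-associative (3) without left recursion, which an LL parser could not use directly.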
Like yacc grammars, antlr grammars can use automatically
maintained symbol attribute values referenced as dollar
variables. Further, because antlr generates top-down
parsers, arbitrary values may be inherited from parent rules
(passed like function parameters). Antlr also has a
mechanism for creating and manipulating abstract syntax trees.
There are various other niceties in antlr, including the
ability to spread one grammar over multiple files or to
place multiple grammars in a single file, the ability to
generate a version of the grammar with actions stripped out
(for documentation purposes), and lots more.
OPTIONS
-ck n
Use up to n symbols of lookahead when using compressed
(linear approximation) lookahead. This type of lookahead
is very cheap to compute and is attempted before full
LL(k) lookahead, which is of exponential complexity in the
worst case. In general, the compressed lookahead can be
much deeper (e.g., -ck 10) than the full lookahead (which
usually must be less than 4).
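A simplified model may help show why compressed lookahead is cheap but weaker. Full LL(k) keeps, for each alternative, the set of k-token sequences that can start it, and that set can grow exponentially with k; the linear approximation keeps only one token set per lookahead depth. The tokens a through d and the two alternatives below are hypothetical, chosen so that the approximation loses exactly the information needed to decide.

```c
#include <stdbool.h>

enum { A, B, C, D };                 /* hypothetical token numbers */

/* Full LL(2) lookahead for one alternative: an explicit set of pairs. */
struct pair { int t1, t2; };

static bool full_ll2_predicts(const struct pair *alt, int n, int la1, int la2)
{
    for (int i = 0; i < n; i++)
        if (alt[i].t1 == la1 && alt[i].t2 == la2) return true;
    return false;
}

/* Linear approximation: project the pairs onto one set per depth.
   Bit t of depth[d] means token t may appear at lookahead depth d+1. */
static void project(const struct pair *alt, int n, unsigned depth[2])
{
    depth[0] = depth[1] = 0;
    for (int i = 0; i < n; i++) {
        depth[0] |= 1u << alt[i].t1;
        depth[1] |= 1u << alt[i].t2;
    }
}

/* The approximation tests each depth independently. */
static bool approx_predicts(const unsigned depth[2], int la1, int la2)
{
    return ((depth[0] >> la1) & 1u) && ((depth[1] >> la2) & 1u);
}
```

With alternative 1 starting with {a b, c d} and alternative 2 with {a d, c b}, both project to {a, c} at depth 1 and {b, d} at depth 2, so the approximation cannot choose for input "a b" while the full pair sets can.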
-CC Generate C++ output from both ANTLR and DLG.
-cr Generate a cross-reference for all rules. For each
rule, print a list of all other rules that reference
it.
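The cross-reference described above is an inversion of the "rule X calls rule Y" relation. The sketch below assumes a tiny hypothetical grammar stored as a reference matrix; the rule names and the report format are invented for illustration and are not antlr's actual -cr output.

```c
#include <stdio.h>

#define NRULES 3
/* Hypothetical rules; refs[i][j] != 0 means rule i references rule j. */
static const char *rule_name[NRULES] = { "prog", "stmt", "expr" };
static const int refs[NRULES][NRULES] = {
    /* prog */ { 0, 1, 0 },
    /* stmt */ { 0, 1, 1 },   /* stmt is recursive and uses expr */
    /* expr */ { 0, 0, 1 },
};

/* Stores the indices of the rules that reference `target` in `out`
   and returns how many there are. */
static int referenced_by(int target, int out[NRULES])
{
    int n = 0;
    for (int i = 0; i < NRULES; i++)
        if (refs[i][target]) out[n++] = i;
    return n;
}

/* Prints one line per rule: "rule: referencing rules...". */
void print_xref(void)
{
    int who[NRULES];
    for (int r = 0; r < NRULES; r++) {
        int n = referenced_by(r, who);
        printf("%s:", rule_name[r]);
        for (int i = 0; i < n; i++) printf(" %s", rule_name[who[i]]);
        printf("\n");
    }
}
```

Here print_xref would report that prog is referenced by no rule, which is typical of a start symbol.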
-e1 Ambiguities/errors shown in low detail (default).
-e2 Ambiguities/errors shown in more detail.
-e3 Ambiguities/errors shown in excruciating detail.
-fe file
Rename err.c to file.
-fh file
Rename stdpccts.h header (turns on -gh) to file.
-fl file
Rename lexical output, parser.dlg, to file.
-fm file
Rename file with lexical mode definitions, mode.h, to
file.
-fr file
Rename file which remaps globally visible symbols,
remap.h, to file.
-ft file
Rename tokens.h to file.
-ga Generate ANSI-compatible code (default case). This has
not been rigorously tested to be ANSI X3J11 C compliant,
but it is close. The normal output of antlr is
currently compilable under K&R C, ANSI C, and C++;
this option does nothing because antlr generates
#ifdefs that do the right thing depending on the
language.
-gc Indicates that antlr should generate no C code, i.e.,
only perform analysis on the grammar.
-gd C code is inserted in each of the antlr-generated parsing
functions to provide for user-defined handling of a
detailed parse trace. The inserted code consists of
calls to the user-supplied macros or functions called
zzTRACEIN and zzTRACEOUT. The only argument is a char *
pointing to a C-style string which names the grammar
rule recognized by the current parsing function. If no
definition is given for the trace functions, then upon rule
entry and exit, a message will be printed indicating
that a particular rule has been entered or exited.
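Assuming the generated functions call zzTRACEIN and zzTRACEOUT with the rule name as described above, user-supplied definitions might look like the sketch below, which indents the trace by nesting depth. The depth counters and the two rule functions are hypothetical hand-written stand-ins; where the definitions must be made visible to the generated code (for example, through the header action) depends on your build.

```c
#include <stdio.h>

static int trace_depth = 0;       /* current nesting of active rules */
static int trace_max_depth = 0;   /* deepest nesting seen so far */

/* Called on entry to a rule; `rule` is the rule's name. */
void zzTRACEIN(char *rule)
{
    printf("%*senter %s\n", trace_depth * 2, "", rule);
    trace_depth++;
    if (trace_depth > trace_max_depth) trace_max_depth = trace_depth;
}

/* Called on exit from a rule. */
void zzTRACEOUT(char *rule)
{
    trace_depth--;
    printf("%*sexit  %s\n", trace_depth * 2, "", rule);
}

/* Hypothetical stand-ins for two generated parsing functions. */
void atom(void) { zzTRACEIN("atom"); zzTRACEOUT("atom"); }
void expr(void) { zzTRACEIN("expr"); atom(); atom(); zzTRACEOUT("expr"); }
```

Calling expr() prints "enter expr", then two indented enter/exit pairs for atom, then "exit  expr", leaving trace_depth back at 0.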
-ge Generate an error class for each non-terminal.
-gh Generate stdpccts.h for non-ANTLR-generated files to
include. This file contains all defines needed to
describe the type of parser generated by antlr (e.g.
how much lookahead is used and whether or not trees are
constructed) and contains the header action specified
by the user.
-gk Generate parsers that delay lookahead fetches until
needed. Without this option, antlr generates parsers
which always have k tokens of lookahead available.
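The difference between the default and -gk can be modelled with a lookahead buffer that counts how often the lexer is called. This is a sketch under stated assumptions: the token array stands in for the lexer, and the names peek, consume and eager_fill are hypothetical, not the macros antlr generates. Delaying fetches presumably matters most for interactive input, where reading k tokens ahead would make the parser wait for input the user has not yet typed.

```c
#define K 3                       /* lookahead depth, as set by -k */

struct lookahead {
    const int *src;               /* hypothetical stand-in for the lexer */
    int pos;                      /* index of LA(1) in src */
    int fetched;                  /* tokens src[pos .. pos+fetched-1] buffered */
    int fetches;                  /* lexer calls made so far */
};

static void fetch_one(struct lookahead *la) { la->fetched++; la->fetches++; }

/* Default behaviour: keep K tokens available at all times. */
static void eager_fill(struct lookahead *la)
{
    while (la->fetched < K) fetch_one(la);
}

/* -gk style: read LA(i) only when the parser actually asks for it. */
static int peek(struct lookahead *la, int i)
{
    while (la->fetched < i) fetch_one(la);
    return la->src[la->pos + i - 1];
}

/* Drops LA(1); in eager mode the buffer is topped up immediately. */
static void consume(struct lookahead *la, int eager)
{
    if (la->fetched == 0) fetch_one(la);  /* token must be read first */
    la->pos++;
    la->fetched--;
    if (eager) eager_fill(la);
}
```

Starting up an eager parser costs K lexer calls before any decision is made, whereas the lazy version has read only one token after its first peek(&la, 1).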
-gl Generate line information about grammar actions in the
C parser, in the form # line "file" (line being a line
number in the grammar file file), which makes error
messages from the C/C++ compiler more useful because they
point into the grammar file rather than the resulting C
file. Debugging is easier as well, because you step
through the grammar rather than the C file.
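The effect of such line information can be seen with the standard C #line directive: after it, the compiler's notion of the current file and line (and therefore its diagnostics and debug info) refers to the named source. The grammar file name expr.g and line 100 are hypothetical, and the exact spelling antlr emits may differ from the #line form used here.

```c
#include <string.h>

/* The directive says the next line is line 100 of expr.g, so a compiler
   error or a breakpoint on it would be reported against the grammar. */
#line 100 "expr.g"
static const int action_line = __LINE__;
static const char *const action_file = __FILE__;

/* Returns nonzero if the line above is attributed to file:line. */
int attributed_to(const char *file, int line)
{
    return action_line == line && strcmp(action_file, file) == 0;
}
```

Here attributed_to("expr.g", 100) is nonzero even though the code lives in a C file.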
-gs Do not generate sets for token expression lists;
instead generate a ||-separated sequence of
LA(1)==token_number. The default is to generate sets.
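The two code styles can be compared side by side. This sketch assumes hypothetical token numbers and a byte-array bit set; the set layout antlr really uses is not shown here, only the idea of one lookup versus a chain of comparisons on the current token (the parameter la1 plays the role of LA(1)).

```c
#include <limits.h>

/* Hypothetical token numbers, as they might appear in tokens.h. */
enum { ID = 1, NUM = 2, LPAREN = 3, PLUS = 4, MINUS = 5 };

/* Default style: the list {ID, NUM, LPAREN} becomes a bit set indexed
   by token number, and a membership test is a single lookup. */
static const unsigned char start_set[] = {
    (1u << ID) | (1u << NUM) | (1u << LPAREN)
};

static int set_member(const unsigned char *set, int tok)
{
    return (set[tok / CHAR_BIT] >> (tok % CHAR_BIT)) & 1;
}

/* -gs style: the same test as a ||-separated chain of comparisons. */
static int chain_member(int la1)
{
    return la1 == ID || la1 == NUM || la1 == LPAREN;
}
```

Both forms give the same answer for every token; the set form stays constant-time as the list grows, while the chain avoids a table and is easier to read in the generated code.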
-gt Generate code for Abstract-Syntax Trees.
-gx Do not create the lexical analyzer files (dlg-related).
This option should be given when the user wishes to
provide a customized lexical analyzer. It may also be
used in make scripts to cause only the parser to be
rebuilt when a change not affecting the lexical
structure is made to the input grammars.
-k n Set k of LL(k) to n; i.e., set the number of tokens of
lookahead (default 1).
-o dir
Directory where output files should go (default=".").
This is very nice for keeping the source directory
free of files generated by ANTLR and DLG.
-p The complete grammar, collected from all input grammar
files and stripped of all comments and embedded
actions, is listed to stdout. This is intended to aid
in viewing the entire grammar as a whole, and it removes
the need to keep actions short merely so that the
grammar stays readable. Hence, it is preferable to
embed even complex actions directly in the grammar,
rather than to call them as subroutines, since the
subroutine call overhead will be saved.
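The core of such a listing is removing action text. Assuming actions are delimited by << and >>, as in PCCTS 1.x grammars, a deliberately simplified stripper could look like this; it ignores comments, quoted strings and nested actions, all of which the real -p option has to handle.

```c
#include <string.h>

/* Copies grammar text from `in` to `out`, dropping <<...>> actions.
   `out` must be at least as large as `in`. Simplified sketch only. */
static void strip_actions(const char *in, char *out)
{
    while (*in) {
        if (in[0] == '<' && in[1] == '<') {
            const char *end = strstr(in + 2, ">>");
            if (!end) break;        /* unterminated action: stop copying */
            in = end + 2;           /* skip the whole action */
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

For the input a : B <<x = 1;>> C ; the result is a : B  C ; with the action gone and the surrounding whitespace left untouched.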
-pa This option is the same as -p except that the output is
annotated with the first sets determined from grammar
analysis.
-prc on
Turn on the computation and hoisting of predicate
context.
-prc off
Turn off the computation and hoisting of predicate
context. This option makes 1.10 behave like the 1.06
release with option -pr on. Context computation is off
by default.
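Roughly, the context of a predicate is the lookahead under which it applies. The sketch below models a hoisted decision by hand: the predicate is only consulted when LA(1) could start the guarded alternative. The token numbers, the symbol-table query is_type_name and the two alternatives are hypothetical, and this is not the code antlr generates.

```c
#include <string.h>

enum { ID = 1, NUM = 2 };         /* hypothetical token numbers */

/* Hypothetical symbol-table query used as a semantic predicate. */
static int is_type_name(const char *text)
{
    return strcmp(text, "int") == 0 || strcmp(text, "size_t") == 0;
}

/* Chooses between two alternatives:
     1 = declaration (starts with ID, guarded by is_type_name)
     2 = expression statement (starts with ID or NUM)
   The `la1 == ID &&` test is the predicate's context. */
static int choose_alt(int la1, const char *text)
{
    if (la1 == ID && is_type_name(text)) return 1;
    if (la1 == ID || la1 == NUM) return 2;
    return 0;                     /* syntax error */
}
```

Without the context test, the predicate would be evaluated (and could misfire) even for tokens that cannot begin a declaration.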
-rl n
Limit the maximum number of tree nodes used by grammar
analysis to n. Occasionally, antlr is unable to
analyze a grammar submitted by the user. This rare
situation can only occur when the grammar is large and
the amount of lookahead is greater than one. A
nonlinear analysis algorithm is used by PCCTS to handle
the general case of LL(k) parsing. The average
complexity of analysis, however, is near linear due to
some fancy footwork in the implementation which reduces
the number of calls to the full LL(k) algorithm. If this
limit is reached, an error message is displayed indicating
the grammar construct being analyzed when antlr hit a
nonlinearity. Use this option if antlr seems to go out to
lunch and your disk starts thrashing; try n=10000 to start.
Once the offending construct has been identified, try to
remove the ambiguity that antlr was trying to overcome
with large lookahead analysis. The introduction of (...)?
backtracking blocks eliminates some of these problems:
antlr does not analyze alternatives that begin with
(...)? (it simply backtracks, if necessary, at run
time).
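A toy model shows why a node limit is a useful safety valve. Assume (hypothetically) that fanout[d] tokens are possible at lookahead depth d+1 and that every partial lookahead sequence costs one tree node; the tree then has fanout[0] * ... * fanout[k-1] leaves, which grows exponentially with k. The counter below stops expansion once the limit is exceeded, analogous in spirit to -rl, not a copy of antlr's algorithm.

```c
/* Expands the lookahead tree depth-first, charging one node per branch.
   Returns 0 if the whole tree fits within `limit` nodes, -1 otherwise. */
static int expand(int depth, int k, const int *fanout, long *nodes, long limit)
{
    if (depth == k) return 0;
    for (int i = 0; i < fanout[depth]; i++) {
        if (++*nodes > limit) return -1;          /* give up and report */
        if (expand(depth + 1, k, fanout, nodes, limit) < 0) return -1;
    }
    return 0;
}
```

With three possible tokens at each of three depths the tree needs 3 + 9 + 27 = 39 nodes, so a limit of 39 succeeds and a limit of 38 fails; adding a fourth depth would already need 120.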
-w1 Set low warning level. Do not warn if semantic
predicates and/or (...)? blocks are assumed to cover
ambiguous alternatives.
-w2 Ambiguous parsing decisions yield warnings even if
semantic predicates or (...)? blocks are used. Warn if
predicate context is computed and semantic predicates
incompletely disambiguate alternative productions.
- Read grammar from standard input and generate stdin.c
as the parser file.
SPECIAL CONSIDERATIONS
Antlr works... we think. There is no implicit guarantee of
anything. We reserve no legal rights to the software known
as the Purdue Compiler Construction Tool Set (PCCTS); PCCTS
is in the public domain. An individual or company may do
whatever they wish with source code distributed with PCCTS
or the code generated by PCCTS, including the incorporation
of PCCTS, or its output, into commercial software. We
encourage users to develop software with PCCTS. However, we
do ask that credit be given to us for developing PCCTS. By
"credit", we mean that if you incorporate our source code
into one of your programs (commercial product, research
project, or otherwise), you acknowledge this fact somewhere
in the documentation, research report, etc. If you like
PCCTS and have developed a nice tool with the output, please
mention that you developed it using PCCTS. As long as these
guidelines are followed, we expect to continue enhancing
this system and expect to make other tools available as they
are completed.
FILES
*.c output C parser.
*.cpp
output C++ parser when C++ mode is used.
parser.dlg
output dlg lexical analyzer.
err.c
token string array, error sets and error support
routines. Not used in C++ mode.
remap.h
file that redefines all globally visible parser
symbols. The use of the #parser directive creates this
file. Not used in C++ mode.
stdpccts.h
list of definitions needed by C files, not generated by
PCCTS, that reference PCCTS objects. This is not
generated by default. Not used in C++ mode.
tokens.h
output #defines for tokens used and function prototypes
for functions generated for rules.
SEE ALSO
dlg(1), pccts(1)