| ANTLR(1) PCCTS Manual Pages ANTLR(1) | |
| NAME | |
| antlr - ANother Tool for Language Recognition | |
| SYNTAX | |
| antlr [options] grammar_files | |
| DESCRIPTION | |
| Antlr converts an extended form of context-free grammar into | |
| a set of C functions which directly implement an efficient | |
| form of deterministic recursive-descent LL(k) parser. | |
| Context-free grammars may be augmented with predicates to | |
| allow semantics to influence parsing; this allows a form of | |
| context-sensitive parsing. Selective backtracking is also | |
| available to handle non-LL(k) and even non-LALR(k) constructs. | |
| Antlr also produces a definition of a lexer which can be | |
| automatically converted into C code for a DFA-based lexer by | |
| dlg. Hence, antlr serves a function much like that of yacc; | |
| however, it is notably more flexible and is more integrated | |
| with a lexer generator (antlr directly generates dlg code, | |
| whereas yacc and lex are given independent descriptions). | |
| Unlike yacc, which accepts LALR(1) grammars, antlr accepts | |
| LL(k) grammars in an extended BNF notation, which eliminates | |
| the need for precedence rules. | |
| Like yacc grammars, antlr grammars can use automatically | |
| maintained symbol attribute values referenced as dollar | |
| variables. Further, because antlr generates top-down | |
| parsers, arbitrary values may be inherited from parent rules | |
| (passed like function parameters). Antlr also has a mechanism | |
| for creating and manipulating abstract syntax trees. | |
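The idea of inherited values being "passed like function parameters" can be illustrated with a small hand-written recursive-descent fragment. This is only a sketch of the general technique, not antlr output: the rule names, the function names parse_number and parse_digits, and the character-level input are all assumptions made for illustration. The number base is inherited from the calling rule as an ordinary parameter, and the computed value is handed back through a pointer.

```c
#include <ctype.h>

/* Hand-written sketch, NOT actual antlr output; all names are hypothetical. */

/* Rule "digits": 'base' is inherited from the calling rule as a plain
 * function parameter; '*value' carries the result back to the caller.
 * Returns 1 on success, 0 if no digit valid in 'base' was found. */
int parse_digits(const char **input, int base, long *value)
{
    const char *p = *input;
    long v = 0;
    int seen = 0;

    while (isxdigit((unsigned char)*p)) {
        int d = isdigit((unsigned char)*p)
                    ? *p - '0'
                    : tolower((unsigned char)*p) - 'a' + 10;
        if (d >= base)
            break;              /* not a digit in this base */
        v = v * base + d;
        p++;
        seen = 1;
    }
    if (!seen)
        return 0;
    *input = p;                 /* consume what was recognized */
    *value = v;
    return 1;
}

/* Rule "number": either a "0x" prefix followed by digits in base 16,
 * or digits in base 10. The parent rule decides the base and passes
 * it down to the child rule. */
int parse_number(const char **input, long *value)
{
    const char *p = *input;

    if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) {
        p += 2;
        if (!parse_digits(&p, 16, value))
            return 0;
        *input = p;
        return 1;
    }
    return parse_digits(input, 10, value);
}
```

A caller would set a const char * to the start of the text, call parse_number with its address, and on success read the value and continue from the updated pointer.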
| There are various other niceties in antlr, including the | |
| ability to spread one grammar over multiple files or even | |
| multiple grammars in a single file, the ability to generate | |
| a version of the grammar with actions stripped out (for | |
| documentation purposes), and lots more. | |
| OPTIONS | |
| -ck n | |
| Use up to n symbols of lookahead when using compressed | |
| (linear approximation) lookahead. This type of lookahead | |
| is very cheap to compute and is attempted before full | |
| LL(k) lookahead, which is of exponential complexity in | |
| the worst case. In general, the compressed lookahead can | |
| be much deeper (e.g., -ck 10) than the full lookahead | |
| (which usually must be less than 4). | |
| -CC Generate C++ output from both ANTLR and DLG. | |
| -cr Generate a cross-reference for all rules. For each | |
| rule, print a list of all other rules that reference | |
| it. | |
| -e1 Ambiguities/errors shown in low detail (default). | |
| -e2 Ambiguities/errors shown in more detail. | |
| -e3 Ambiguities/errors shown in excruciating detail. | |
| -fe file | |
| Rename err.c to file. | |
| -fh file | |
| Rename stdpccts.h header (turns on -gh) to file. | |
| -fl file | |
| Rename lexical output, parser.dlg, to file. | |
| -fm file | |
| Rename file with lexical mode definitions, mode.h, to | |
| file. | |
| -fr file | |
| Rename file which remaps globally visible symbols, | |
| remap.h, to file. | |
| -ft file | |
| Rename tokens.h to file. | |
| -ga Generate ANSI-compatible code (default case). This has | |
| not been rigorously tested to be ANSI X3J11 C compliant, | |
| but it is close. The normal output of antlr currently | |
| compiles as K&R C, ANSI C, and C++; this option does | |
| nothing because antlr generates a bunch of #ifdef's to | |
| do the right thing depending on the language. | |
| -gc Indicates that antlr should generate no C code, i.e., | |
| only perform analysis on the grammar. | |
| -gd C code is inserted in each of the antlr-generated parsing | |
| functions to provide for user-defined handling of a | |
| detailed parse trace. The inserted code consists of | |
| calls to the user-supplied macros or functions called | |
| zzTRACEIN and zzTRACEOUT. The only argument is a char * | |
| pointing to a C-style string which is the grammar rule | |
| recognized by the current parsing function. If no | |
| definition is given for the trace functions, a message | |
| is printed upon rule entry and exit indicating that a | |
| particular rule has been entered or exited. | |
| -ge Generate an error class for each non-terminal. | |
| -gh Generate stdpccts.h for non-ANTLR-generated files to | |
| include. This file contains all defines needed to | |
| describe the type of parser generated by antlr (e.g., | |
| how much lookahead is used and whether or not trees are | |
| constructed) and contains the header action specified | |
| by the user. | |
| -gk Generate parsers that delay lookahead fetches until | |
| needed. Without this option, antlr generates parsers | |
| which always have k tokens of lookahead available. | |
| -gl Generate line info about grammar actions in the C parser, | |
| of the form # line "file", which makes error messages | |
| from the C/C++ compiler make more sense, as they will | |
| point into the grammar file, not the resulting C file. | |
| Debugging is easier as well, because you will step | |
| through the grammar, not the C file. | |
| -gs Do not generate sets for token expression lists; | |
| instead generate a ||-separated sequence of | |
| LA(1)==token_number. The default is to generate sets. | |
| -gt Generate code for Abstract-Syntax Trees. | |
| -gx Do not create the lexical analyzer files (dlg-related). | |
| This option should be given when the user wishes to | |
| provide a customized lexical analyzer. It may also be | |
| used in make scripts to cause only the parser to be | |
| rebuilt when a change not affecting the lexical structure | |
| is made to the input grammars. | |
| -k n Set k of LL(k) to n; i.e., set the number of tokens of | |
| lookahead (default==1). | |
| -o dir | |
| Directory where output files should go (default="."). | |
| This is very nice for keeping the source directory | |
| clear of ANTLR and DLG spawn. | |
| -p The complete grammar, collected from all input grammar | |
| files and stripped of all comments and embedded | |
| actions, is listed to stdout. This is intended to aid | |
| in viewing the entire grammar as a whole and to eliminate | |
| the need to keep actions concisely stated so that | |
| the grammar is easier to read. Hence, it is preferable | |
| to embed even complex actions directly in the grammar, | |
| rather than to call them as subroutines, since the | |
| subroutine call overhead will be saved. | |
| -pa This option is the same as -p except that the output is | |
| annotated with the first sets determined from grammar | |
| analysis. | |
| -prc on | |
| Turn on the computation and hoisting of predicate context. | |
| -prc off | |
| Turn off the computation and hoisting of predicate context. | |
| This option makes 1.10 behave like the 1.06 release with | |
| option -pr on. Context computation is off by default. | |
| -rl n | |
| Limit the maximum number of tree nodes used by grammar | |
| analysis to n. Occasionally, antlr is unable to | |
| analyze a grammar submitted by the user. This rare | |
| situation can only occur when the grammar is large and | |
| the amount of lookahead is greater than one. A non-linear | |
| analysis algorithm is used by PCCTS to handle the general | |
| case of LL(k) parsing. The average complexity of | |
| analysis, however, is near linear due to some fancy | |
| footwork in the implementation which reduces the number | |
| of calls to the full LL(k) algorithm. If this limit is | |
| reached, an error message is displayed which indicates | |
| the grammar construct being analyzed when antlr hit a | |
| non-linearity. Use this option if antlr seems to go out | |
| to lunch and your disk starts thrashing; try n=10000 to | |
| start. Once the offending construct has been identified, | |
| try to remove the ambiguity that antlr was trying to | |
| overcome with large lookahead analysis. The introduction | |
| of (...)? backtracking blocks eliminates some of these | |
| problems: antlr does not analyze alternatives that begin | |
| with (...)? (it simply backtracks, if necessary, at run | |
| time). | |
| -w1 Set low warning level. Do not warn if semantic | |
| predicates and/or (...)? blocks are assumed to cover | |
| ambiguous alternatives. | |
| -w2 Ambiguous parsing decisions yield warnings even if | |
| semantic predicates or (...)? blocks are used. Warn if | |
| predicate context is computed and semantic predicates | |
| incompletely disambiguate alternative productions. | |
| - Read grammar from standard input and generate stdin.c | |
| as the parser file. | |
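As an illustration of the -gd option described above, the following sketch shows one way user-supplied trace hooks might look. Only the names zzTRACEIN and zzTRACEOUT and their single char * argument come from this manual page; the helpers trace_in and trace_out, the counters, the indentation scheme, and the two stand-in rule functions are assumptions made for the example. Real parsing functions would be generated by antlr, and where the macro definitions must appear so that the generated code sees them is not covered here.

```c
#include <stdio.h>

/* Hypothetical user-supplied trace hooks. Only the macro names and the
 * single char* argument are taken from the manual; the rest is assumed. */
int trace_depth = 0;   /* current rule nesting level */
int trace_calls = 0;   /* total number of hook invocations */

void trace_in(const char *rule)
{
    /* Indent by two spaces per nesting level, then report rule entry. */
    fprintf(stderr, "%*senter %s\n", 2 * trace_depth, "", rule);
    trace_depth++;
    trace_calls++;
}

void trace_out(const char *rule)
{
    trace_depth--;
    fprintf(stderr, "%*sexit %s\n", 2 * trace_depth, "", rule);
    trace_calls++;
}

#define zzTRACEIN(r)  trace_in(r)
#define zzTRACEOUT(r) trace_out(r)

/* Hand-written stand-ins for generated parsing functions; with -gd the
 * generated functions would contain calls like these at entry and exit. */
void term(void)
{
    zzTRACEIN("term");
    /* ... match the tokens of the rule here ... */
    zzTRACEOUT("term");
}

void expr(void)
{
    zzTRACEIN("expr");
    term();
    zzTRACEOUT("expr");
}
```

Calling expr() writes an indented enter/exit trace for expr and the nested term to stderr and leaves trace_depth back at zero.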
| SPECIAL CONSIDERATIONS | |
| Antlr works... we think. There is no implicit guarantee of | |
| anything. We reserve no legal rights to the software known | |
| as the Purdue Compiler Construction Tool Set (PCCTS); PCCTS | |
| is in the public domain. An individual or company may do | |
| whatever they wish with source code distributed with PCCTS | |
| or the code generated by PCCTS, including the incorporation | |
| of PCCTS, or its output, into commercial software. We | |
| encourage users to develop software with PCCTS. However, we | |
| do ask that credit is given to us for developing PCCTS. By | |
| "credit", we mean that if you incorporate our source code | |
| into one of your programs (commercial product, research | |
| project, or otherwise) that you acknowledge this fact somewhere | |
| in the documentation, research report, etc. If you like | |
| PCCTS and have developed a nice tool with the output, please | |
| mention that you developed it using PCCTS. As long as these | |
| guidelines are followed, we expect to continue enhancing | |
| this system and expect to make other tools available as they | |
| are completed. | |
| FILES | |
| *.c output C parser. | |
| *.cpp | |
| output C++ parser when C++ mode is used. | |
| parser.dlg | |
| output dlg lexical analyzer. | |
| err.c | |
| token string array, error sets and error support routines. | |
| Not used in C++ mode. | |
| remap.h | |
| file that redefines all globally visible parser symbols. | |
| The use of the #parser directive creates this file. | |
| Not used in C++ mode. | |
| stdpccts.h | |
| list of definitions needed by C files, not generated by | |
| PCCTS, that reference PCCTS objects. This is not | |
| generated by default. Not used in C++ mode. | |
| tokens.h | |
| output #defines for tokens used and function prototypes | |
| for functions generated for rules. | |
| SEE ALSO | |
| dlg(1), pccts(1) | |