| .TH ANTLR 1 "September 1995" "ANTLR" "PCCTS Manual Pages" | |
| .SH NAME | |
| antlr \- ANother Tool for Language Recognition | |
| .SH SYNTAX | |
| .LP | |
| \fBantlr\fR [\fIoptions\fR] \fIgrammar_files\fR | |
| .SH DESCRIPTION | |
| .PP | |
| \fIAntlr\fP converts an extended form of context-free grammar into a | |
| set of C functions which directly implement an efficient form of | |
| deterministic recursive-descent LL(k) parser. Context-free grammars | |
| may be augmented with predicates to allow semantics to influence | |
| parsing; this allows a form of context-sensitive parsing. Selective | |
| backtracking is also available to handle non-LL(k) and even | |
| non-LALR(k) constructs. \fIAntlr\fP also produces a definition of a | |
| lexer which can be automatically converted into C code for a DFA-based | |
| lexer by \fIdlg\fR. Hence, \fIantlr\fR serves a function much like | |
| that of \fIyacc\fR; however, it is notably more flexible and is more | |
| integrated with a lexer generator (\fIantlr\fR directly generates | |
| \fIdlg\fR code, whereas \fIyacc\fR and \fIlex\fR are given independent | |
| descriptions). Unlike \fIyacc\fR, which accepts LALR(1) grammars, | |
| \fIantlr\fR accepts LL(k) grammars in an extended BNF notation \(em | |
| which eliminates the need for precedence rules. | |
| .PP | |
| Like \fIyacc\fR grammars, \fIantlr\fR grammars can use | |
| automatically-maintained symbol attribute values referenced as dollar | |
| variables. Further, because \fIantlr\fR generates top-down parsers, | |
| arbitrary values may be inherited from parent rules (passed like | |
| function parameters). \fIAntlr\fP also has a mechanism for creating | |
| and manipulating abstract syntax trees. | |
| .PP | |
| There are various other niceties in \fIantlr\fR, including the ability to | |
| spread one grammar over multiple files or even multiple grammars in a single | |
| file, the ability to generate a version of the grammar with actions stripped | |
| out (for documentation purposes), and lots more. | |
| .SH OPTIONS | |
| .IP "\fB-ck \fIn\fR" | |
| Use up to \fIn\fR symbols of lookahead when using compressed (linear | |
| approximation) lookahead. This type of lookahead is very cheap to | |
| compute and is attempted before full LL(k) lookahead, which is of | |
| exponential complexity in the worst case. In general, the compressed | |
| lookahead can be much deeper (e.g., \f(CW-ck 10\fP) than the full | |
| lookahead (which usually must be less than 4). | |
| .IP \fB-CC\fP | |
| Generate C++ output from both ANTLR and DLG. | |
| .IP \fB-cr\fP | |
| Generate a cross-reference for all rules. For each rule, print a list | |
| of all other rules that reference it. | |
| .IP \fB-e1\fP | |
| Ambiguities/errors shown in low detail (default). | |
| .IP \fB-e2\fP | |
| Ambiguities/errors shown in more detail. | |
| .IP \fB-e3\fP | |
| Ambiguities/errors shown in excruciating detail. | |
| .IP "\fB-fe\fP file" | |
| Rename \fBerr.c\fP to file. | |
| .IP "\fB-fh\fP file" | |
| Rename \fBstdpccts.h\fP header (turns on \fB-gh\fP) to file. | |
| .IP "\fB-fl\fP file" | |
| Rename lexical output, \fBparser.dlg\fP, to file. | |
| .IP "\fB-fm\fP file" | |
| Rename file with lexical mode definitions, \fBmode.h\fP, to file. | |
| .IP "\fB-fr\fP file" | |
| Rename file which remaps globally visible symbols, \fBremap.h\fP, to file. | |
| .IP "\fB-ft\fP file" | |
| Rename \fBtokens.h\fP to file. | |
| .IP \fB-ga\fP | |
| Generate ANSI-compatible code (default case). This has not been | |
| rigorously tested to be ANSI X3J11 C compliant, but it is close. The | |
| normal output of \fIantlr\fP is currently compilable under K&R C, | |
| ANSI C, and C++; this option does nothing because \fIantlr\fP | |
| generates a bunch of #ifdefs to do the right thing depending on the | |
| language. | |
| .IP \fB-gc\fP | |
| Indicates that \fIantlr\fP should generate no C code, i.e., only | |
| perform analysis on the grammar. | |
| .IP \fB-gd\fP | |
| C code is inserted in each of the \fIantlr\fR generated parsing functions to | |
| provide for user-defined handling of a detailed parse trace. The inserted | |
| code consists of calls to the user-supplied macros or functions called | |
| \fBzzTRACEIN\fR and \fBzzTRACEOUT\fP. The only argument is a | |
| \fIchar *\fR pointing to a C-style string which is the grammar rule | |
| recognized by the current parsing function. If no definition is given | |
| for the trace functions, upon rule entry and exit, a message will be | |
| printed indicating that a particular rule has been entered or exited. | |
| .IP \fB-ge\fP | |
| Generate an error class for each non-terminal. | |
| .IP \fB-gh\fP | |
| Generate \fBstdpccts.h\fP for non-ANTLR-generated files to include. | |
| This file contains all defines needed to describe the type of parser | |
| generated by \fIantlr\fP (e.g. how much lookahead is used and whether | |
| or not trees are constructed) and contains the \fBheader\fP action | |
| specified by the user. | |
| .IP \fB-gk\fP | |
| Generate parsers that delay lookahead fetches until needed. Without | |
| this option, \fIantlr\fP generates parsers which always have \fIk\fP | |
| tokens of lookahead available. | |
| .IP \fB-gl\fP | |
| Generate line information about grammar actions in the C parser, of the | |
| form \fB#\ \fIline\fP\ "\fIfile\fP"\fR, so that error messages from | |
| the C/C++ compiler \*Qpoint\*U into the grammar file rather than the | |
| resulting C file. Debugging is easier as well, because you step | |
| through the grammar rather than the C file. | |
| .IP \fB-gs\fR | |
| Do not generate sets for token expression lists; instead generate a | |
| \fB||\fP-separated sequence of \fBLA(1)==\fItoken_number\fR. The | |
| default is to generate sets. | |
| .IP \fB-gt\fP | |
| Generate code for Abstract-Syntax Trees. | |
| .IP \fB-gx\fP | |
| Do not create the lexical analyzer files (dlg-related). This option | |
| should be given when the user wishes to provide a customized lexical | |
| analyzer. It may also be used in \fImake\fR scripts to cause only the | |
| parser to be rebuilt when a change not affecting the lexical structure | |
| is made to the input grammars. | |
| .IP "\fB-k \fIn\fR" | |
| Set k of LL(k) to \fIn\fR; i.e., set the number of tokens of lookahead (default is 1). | |
| .IP "\fB-o\fP dir" | |
| Directory where output files should go (default="."). This is very | |
| nice for keeping the source directory clear of ANTLR and DLG spawn. | |
| .IP \fB-p\fP | |
| The complete grammar, collected from all input grammar files and | |
| stripped of all comments and embedded actions, is listed to | |
| \fBstdout\fP. This is intended to aid in viewing the entire grammar | |
| as a whole and to eliminate the need to keep actions concisely stated | |
| so that the grammar is easier to read. Hence, it is preferable to | |
| embed even complex actions directly in the grammar, rather than to | |
| call them as subroutines, since the subroutine call overhead will be | |
| saved. | |
| .IP \fB-pa\fP | |
| This option is the same as \fB-p\fP except that the output is | |
| annotated with the first sets determined from grammar analysis. | |
| .IP "\fB-prc on\fR" | |
| Turn on the computation and hoisting of predicate context. | |
| .IP "\fB-prc off\fR" | |
| Turn off the computation and hoisting of predicate context. This | |
| option makes 1.10 behave like the 1.06 release with option \fB-pr\fR | |
| on. Context computation is off by default. | |
| .IP "\fB-rl \fIn\fR" | |
| Limit the maximum number of tree nodes used by grammar analysis to | |
| \fIn\fP. Occasionally, \fIantlr\fP is unable to analyze a grammar | |
| submitted by the user. This rare situation can only occur when the | |
| grammar is large and the amount of lookahead is greater than one. A | |
| nonlinear analysis algorithm is used by PCCTS to handle the general | |
| case of LL(k) parsing. The average complexity of analysis, however, is | |
| near linear due to some fancy footwork in the implementation which | |
| reduces the number of calls to the full LL(k) algorithm. If this limit | |
| is reached, an error message is displayed indicating the grammar | |
| construct being analyzed when \fIantlr\fP hit a | |
| non-linearity. Use this option if \fIantlr\fP seems to go out to | |
| lunch and your disk starts thrashing; try \fIn\fP=10000 to start. Once | |
| the offending construct has been identified, try to remove the | |
| ambiguity that \fIantlr\fP was trying to overcome with large lookahead | |
| analysis. The introduction of (...)? backtracking blocks eliminates | |
| some of these problems\ \(em \fIantlr\fP does not analyze alternatives | |
| that begin with (...)? (it simply backtracks, if necessary, at run | |
| time). | |
| .IP \fB-w1\fR | |
| Set low warning level. Do not warn if semantic predicates and/or | |
| (...)? blocks are assumed to cover ambiguous alternatives. | |
| .IP \fB-w2\fR | |
| Ambiguous parsing decisions yield warnings even if semantic predicates | |
| or (...)? blocks are used. Warn if predicate context is computed and | |
| semantic predicates incompletely disambiguate alternative productions. | |
| .IP \fB-\fR | |
| Read grammar from standard input and generate \fBstdin.c\fP as the | |
| parser file. | |
| .SH "SPECIAL CONSIDERATIONS" | |
| .PP | |
| \fIAntlr\fP works... we think. There is no implicit guarantee of | |
| anything. We reserve no \fBlegal\fP rights to the software known as | |
| the Purdue Compiler Construction Tool Set (PCCTS) \(em PCCTS is in the | |
| public domain. An individual or company may do whatever they wish | |
| with source code distributed with PCCTS or the code generated by | |
| PCCTS, including the incorporation of PCCTS, or its output, into | |
| commercial software. We encourage users to develop software with | |
| PCCTS. However, we do ask that credit be given to us for developing | |
| PCCTS. By "credit", we mean that if you incorporate our source code | |
| into one of your programs (commercial product, research project, or | |
| otherwise), you acknowledge this fact somewhere in the | |
| documentation, research report, etc. If you like PCCTS and have | |
| developed a nice tool with the output, please mention that you | |
| developed it using PCCTS. As long as these guidelines are followed, | |
| we expect to continue enhancing this system and expect to make other | |
| tools available as they are completed. | |
| .SH FILES | |
| .IP *.c | |
| output C parser. | |
| .IP *.cpp | |
| output C++ parser when C++ mode is used. | |
| .IP \fBparser.dlg\fP | |
| output \fIdlg\fR lexical analyzer. | |
| .IP \fBerr.c\fP | |
| token string array, error sets and error support routines. Not used in | |
| C++ mode. | |
| .IP \fBremap.h\fP | |
| file that redefines all globally visible parser symbols. The use of | |
| the #parser directive creates this file. Not used in | |
| C++ mode. | |
| .IP \fBstdpccts.h\fP | |
| list of definitions needed by C files, not generated by PCCTS, that | |
| reference PCCTS objects. This is not generated by default. Not used in | |
| C++ mode. | |
| .IP \fBtokens.h\fP | |
| output \fI#defines\fR for tokens used and function prototypes for | |
| functions generated for rules. | |
| .SH "SEE ALSO" | |
| .LP | |
| dlg(1), pccts(1) |