Since AMC is a preprocessor, it will very likely have to deal with small sections of code in the native language. To support this, CGL provides routines for operating on foreign code blocks. Currently, the only CGL routines in this area are for the C language, but others can be added easily.
The most frequent activity in AMC when pre-processing C code is simply reading in a code fragment and pasting it in the output surrounded by template code.
This is supported with the copy_c function. This function knows
how to scan through C code without getting confused on strings or comments.
An "artificial" keyword end is added to terminate a
foreign code block. The copy_c function also generates
the appropriate #line directives so that the native C
compiler generates correct error messages. This feature can be
turned off by using the set_copyc_opts CGL function.
Giving it a CGL boolean value as its argument turns this feature
on (if the argument is true) or off (if the argument is false).
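To illustrate what "not getting confused on strings or comments" involves, here is a minimal C sketch of a scanner that looks for a terminating end keyword. It is not AMC's implementation: the function name find_end_keyword is hypothetical, and the sketch assumes ANSI C style /* */ comments only and that the terminator is a standalone identifier spelled end.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* True for characters that may appear inside an identifier or number. */
static int is_word_char(int c) { return isalnum(c) || c == '_'; }

/* Hypothetical helper (not an AMC routine): returns the offset of the first
 * standalone word "end" in src that lies outside string literals, character
 * literals, and comments, or -1 if there is none. */
long find_end_keyword(const char *src)
{
    size_t i = 0;

    assert(src != NULL);
    while (src[i] != '\0') {
        char c = src[i];

        if (c == '"' || c == '\'') {
            /* Skip a string or character literal, honoring backslash escapes. */
            char quote = c;
            i++;
            while (src[i] != '\0' && src[i] != quote) {
                if (src[i] == '\\' && src[i + 1] != '\0')
                    i++;
                i++;
            }
            if (src[i] != '\0')
                i++;                      /* step over the closing quote */
        } else if (c == '/' && src[i + 1] == '*') {
            /* Skip a comment up to and including its terminator. */
            i += 2;
            while (src[i] != '\0' && !(src[i] == '*' && src[i + 1] == '/'))
                i++;
            if (src[i] != '\0')
                i += 2;
        } else if (is_word_char((unsigned char)c)) {
            /* Consume a whole word so "append" or "0end" never matches. */
            size_t start = i;
            while (is_word_char((unsigned char)src[i]))
                i++;
            if (i - start == 3 && strncmp(src + start, "end", 3) == 0)
                return (long)start;
        } else {
            i++;
        }
    }
    return -1;
}
```

For example, scanning the text s = "end"; /* end */ end skips the first two occurrences and reports only the final one, because the others sit inside a string and a comment.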
In addition, it is very important that the resultant C code accurately reflect the source line numbers of the original source file. Therefore, two routines are provided to write directives into the output stream.
c_sync_input will write out a line number synchronization
directive such that the output stream is synchronized with the
filename and line number of the input stream. Once this is no longer
necessary, calling c_sync_output will write a directive that reverts
the previous one (synchronizing with the absolute line number in the
output file).
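The synchronization directives discussed above are standard C #line directives. The following C sketch shows the general shape of such a directive; the helper format_line_directive is hypothetical and only illustrates the text that ends up in the output stream, not how AMC produces it. As a simplification, it rejects file names that would need escaping instead of escaping them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an AMC routine): formats a C "#line" directive
 * telling the compiler that the following line is number `line` of `file`.
 * Returns the number of characters written (excluding the terminator), or
 * -1 if the buffer is too small or the file name would need escaping. */
int format_line_directive(char *buf, size_t size, unsigned line, const char *file)
{
    int n;

    assert(buf != NULL && file != NULL);
    /* Simplification: quotes, backslashes, and newlines are not escaped. */
    if (strpbrk(file, "\"\\\n") != NULL)
        return -1;
    n = snprintf(buf, size, "#line %u \"%s\"\n", line, file);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

A directive that synchronizes with the input would carry the input file name and line number; one that reverts it would carry the name of the output file and the number of the output line that follows the directive.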
Of course, sometimes it is necessary to do quite a bit of pre-processing on input code. For this, AMC includes a complete C lexical analyzer package.
The package includes several functions which provide a parser framework with token lookahead.
c_walk_tokens(expr, err)
This requests a block of code (via the <- operator) and
executes expr for each token found. If the <-
operator is not the next token in the input module, an error is
generated.
During evaluation of expr, two variables are bound to hold
information about the current token.
c_cur_token is set to designate the class of a token, which
can be one of the following:
t_ident - An identifier.
t_flt_num - A floating-point numeric constant.
t_int_num - An integral numeric constant.
t_comment - A comment.
t_str_cnst - A string or quoted character constant.
t_preproc - A pre-processor directive.
t_space - The exact whitespace separating a token. This is very useful if most of the code is being passed through the output verbatim. Passing this along in the "default" case allows the output code to be very human readable when debugging the pre-processed code.
t_oper - An operator (anything that does not fall into one of the other categories).
c_cur_lexeme is the actual set of characters that make up the
token. So for example, for an identifier foo, the variable
c_cur_token would equal t_ident and
c_cur_lexeme would be the actual string foo.
If there is an error during evaluation or lexical analysis, the
err parameter is evaluated. If it evaluates to true,
processing of the token stream continues even in the face of errors.
If it evaluates to false, the error is propagated up and handled
appropriately. If the parameter is omitted, the default is to propagate
the error.
c_walk_opt_tokens(expr, err)
This is very similar to c_walk_tokens, except that no error is
generated if the foreign code operator (<-) is not present.
Instead, expr is simply not evaluated.
c_stop_walking
Calling this function within the expression of a c_walk_tokens
or c_walk_opt_tokens will abort any further walking of the
input file and return AMC's lexical analyzer to its default state.
This is useful if an "end delimiter" is detected and there is more code for AMC in the current module.
This call only affects the innermost enclosing c_walk_tokens
or c_walk_opt_tokens call. If there are no enclosing calls,
this function has no effect. This is similar to C's break
statement.
c_keep_token
Calling this within an enclosing c_walk_tokens or
c_walk_opt_tokens function will result in the current token
being re-evaluated (this is especially useful after shifting states in
a parser).
This call only affects the innermost enclosing c_walk_tokens
or c_walk_opt_tokens call. If there are no enclosing calls,
this function has no effect. This is similar to C's continue
statement.
c_pushback(token, lexeme)
This function forces the next iteration of the next
c_walk_tokens or c_walk_opt_tokens block to get a
token of type token with a lexeme of lexeme.
Unlike some of the other functions in this group, this one can be called
without being in an enclosing c_walk_tokens or
c_walk_opt_tokens block and it will still take effect.
This is very useful for shifting states in a parser or seeding an initial state.
c_remove_all_pushback
This function flushes all of the tokens that were created using a
c_pushback call. This is useful for error recovery.
c_get_token(tok_var_name, lex_var_name)
This function can be used to obtain the next token while inside at least
one nesting of c_walk_tokens or c_walk_opt_tokens.
The two arguments are the names of two variables to bind with the next
token and lexeme. The return value is true if the next token
was obtained or false for end of file.
It is an error to call this function when a call to c_walk_tokens
or c_walk_opt_tokens is not in effect.
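The interplay between pushback and token retrieval described above can be pictured with a small C model. This is only a sketch under stated assumptions, not AMC's implementation: every name in it (token_stream, ts_pushback, ts_remove_all_pushback, ts_get_token) is hypothetical, the token classes are a reduced subset, and the last-pushed-first-returned ordering is an assumption, since the text does not say in which order several pushed-back tokens are delivered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced, illustrative subset of token classes. */
enum token_class { T_IDENT, T_INT_NUM, T_OPER };

struct token {
    enum token_class cls;
    const char *lexeme;
};

#define MAX_PUSHBACK 16

/* A token source: an input array plus a stack of pushed-back tokens. */
struct token_stream {
    const struct token *input;
    size_t count;
    size_t pos;
    struct token pushed[MAX_PUSHBACK];
    size_t npushed;
};

void ts_init(struct token_stream *ts, const struct token *input, size_t count)
{
    assert(ts != NULL);
    ts->input = input;
    ts->count = count;
    ts->pos = 0;
    ts->npushed = 0;
}

/* Models c_pushback: the pushed token is delivered before any real input.
 * Returns false if the pushback stack is full. */
bool ts_pushback(struct token_stream *ts, enum token_class cls, const char *lexeme)
{
    assert(ts != NULL);
    if (ts->npushed == MAX_PUSHBACK)
        return false;
    ts->pushed[ts->npushed].cls = cls;
    ts->pushed[ts->npushed].lexeme = lexeme;
    ts->npushed++;
    return true;
}

/* Models c_remove_all_pushback: discards every pending pushed-back token. */
void ts_remove_all_pushback(struct token_stream *ts)
{
    assert(ts != NULL);
    ts->npushed = 0;
}

/* Models c_get_token: stores the next token in *out and returns true,
 * or returns false once both the pushback stack and the input are empty. */
bool ts_get_token(struct token_stream *ts, struct token *out)
{
    assert(ts != NULL && out != NULL);
    if (ts->npushed > 0) {
        *out = ts->pushed[--ts->npushed];
        return true;
    }
    if (ts->pos < ts->count) {
        *out = ts->input[ts->pos++];
        return true;
    }
    return false;
}
```

In this model, a token pushed back before walking begins is simply the first one returned, which is how pushback can seed an initial parser state, and flushing the stack during error recovery leaves the real input untouched.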
Using this package, it is easy to create powerful pre-processors
that add new syntax to C. The most common approach is to
iterate over the code using one or two map statements.
In the simplest case, no state information is necessary, and only one
map is required to analyze the incoming token. However, if
previous state information is required, two map statements may be
necessary.
A very simple parser can be written like this:
    current_state = (0)
    c_walk_tokens
    (
        map (current_state)
        {
            0: /* Initial state. */
            {
                map (c_cur_token)
                {
                    /* Set current_state appropriately in here. */
                };
            };
            1: /* In state 1. */
            {
                map (c_cur_token)
                {
                    /* State 1 transitions go here. */
                };
            };
        };
    )
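For readers more at home in C than in CGL, the same two-level dispatch (first on the current state, then on the token class) can be sketched in plain C. This is an analogy only, not AMC code: the token array, the reduced set of token classes, and the function count_call_like are all hypothetical. The sketch counts identifiers that are followed by an opening parenthesis, and it uses C's continue to re-dispatch the current token after a state change, much as c_keep_token is described as doing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reduced, illustrative subset of token classes. */
enum token_class { T_IDENT, T_SPACE, T_OPER, T_INT_NUM };

struct token {
    enum token_class cls;
    const char *lexeme;
};

/* Hypothetical example parser: counts identifiers followed (after optional
 * whitespace) by "(".  State 0 waits for an identifier; state 1 has seen one. */
size_t count_call_like(const struct token *toks, size_t n)
{
    int state = 0;
    size_t calls = 0;
    size_t i = 0;

    assert(toks != NULL || n == 0);
    while (i < n) {
        const struct token *t = &toks[i];

        switch (state) {
        case 0: /* Initial state. */
            if (t->cls == T_IDENT)
                state = 1;
            break;
        case 1: /* An identifier has been seen. */
            if (t->cls == T_SPACE)
                break;                  /* whitespace: stay in state 1 */
            if (t->cls == T_OPER && strcmp(t->lexeme, "(") == 0) {
                calls++;
                state = 0;
                break;
            }
            /* Anything else: shift back to state 0 and look at this same
             * token again, without advancing to the next one. */
            state = 0;
            continue;
        }
        i++;
    }
    return calls;
}
```

Re-dispatching matters for input such as a b(: when b arrives in state 1, it must be examined again in state 0 so that it can itself start a new match.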
There are also several other functions that can be considered part
of the "C back end."
is_c_identifier(str)
evaluates to true if the identifier specified fits the rules
defined by ANSI C for valid identifiers. To avoid having to
write the same conditional repeatedly, the function
validate_c_identifier(str)
performs the same validation, but it issues an error message and
aborts compilation if the identifier is not valid. If the identifier
is valid, it does nothing.
Optionally, both is_c_identifier and
validate_c_identifier can take a second parameter, a
boolean that determines whether the C type keywords are flagged as invalid.
Normally, C keywords are never valid identifiers, but when validating
a type name, it may be beneficial to pass false as the second
argument to these functions.
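The kind of check these functions perform can be sketched in C as follows. This is a minimal illustration rather than AMC's implementation: the name sketch_is_c_identifier is hypothetical, the keyword tables are the 32 keywords of ANSI C, and the split into "type keywords" and other keywords is an assumption about what the second argument is meant to control.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* ANSI C keywords that can appear in a type name (assumed split). */
static const char *const type_keywords[] = {
    "char", "const", "double", "enum", "float", "int", "long", "short",
    "signed", "struct", "union", "unsigned", "void", "volatile"
};

/* The remaining ANSI C keywords, which are never valid identifiers. */
static const char *const other_keywords[] = {
    "auto", "break", "case", "continue", "default", "do", "else", "extern",
    "for", "goto", "if", "register", "return", "sizeof", "static", "switch",
    "typedef", "while"
};

static bool in_table(const char *s, const char *const *table, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(s, table[i]) == 0)
            return true;
    return false;
}

/* Hypothetical model of is_c_identifier: true if str has the shape of an
 * ANSI C identifier and is not a keyword.  Type keywords are rejected only
 * when reject_type_keywords is true. */
bool sketch_is_c_identifier(const char *str, bool reject_type_keywords)
{
    const char *p;

    assert(str != NULL);
    /* First character: a letter or underscore (this also rejects ""). */
    if (!(isalpha((unsigned char)str[0]) || str[0] == '_'))
        return false;
    /* Remaining characters: letters, digits, or underscores. */
    for (p = str + 1; *p != '\0'; p++)
        if (!(isalnum((unsigned char)*p) || *p == '_'))
            return false;
    if (in_table(str, other_keywords,
                 sizeof other_keywords / sizeof other_keywords[0]))
        return false;
    if (reject_type_keywords &&
        in_table(str, type_keywords,
                 sizeof type_keywords / sizeof type_keywords[0]))
        return false;
    return true;
}
```

With this model, int is rejected when the flag is true but accepted when it is false, while while is rejected in both cases.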
The write_as_c_string(str)
function evaluates to the string as you would have to write it in
C (quoting the various characters as necessary). It does not, however,
produce the enclosing double-quote (") characters themselves.
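As an illustration of this kind of quoting, the following C sketch escapes a string so that it could be placed between double quotes in C source. It is not AMC's implementation: the name escape_c_string is hypothetical, and the exact set of characters escaped (double quote, backslash, newline, tab, carriage return, and three-digit octal escapes for other non-printable characters) is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of write_as_c_string: writes an escaped copy of src
 * into dst, without surrounding double quotes.  Returns the length of the
 * result, or -1 if dst (of `size` bytes) is too small. */
long escape_c_string(const char *src, char *dst, size_t size)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        char piece[5];              /* longest escape is "\ooo" plus NUL */
        size_t len;

        switch (c) {
        case '"':  strcpy(piece, "\\\""); break;
        case '\\': strcpy(piece, "\\\\"); break;
        case '\n': strcpy(piece, "\\n");  break;
        case '\t': strcpy(piece, "\\t");  break;
        case '\r': strcpy(piece, "\\r");  break;
        default:
            if (isprint(c)) {
                piece[0] = (char)c;
                piece[1] = '\0';
            } else {
                /* Always three octal digits, so a following digit in the
                 * string cannot be absorbed into the escape. */
                snprintf(piece, sizeof piece, "\\%03o", (unsigned)c);
            }
            break;
        }
        len = strlen(piece);
        if (out + len + 1 > size)   /* keep room for the terminator */
            return -1;
        memcpy(dst + out, piece, len);
        out += len;
    }
    if (out + 1 > size)
        return -1;
    dst[out] = '\0';
    return (long)out;
}
```

The caller adds the surrounding double quotes when emitting the literal, which matches the behavior described for write_as_c_string.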