
7. Native Languages

Since AMC is a preprocessor, it will often have to deal with small sections of code in the native language. To support this, CGL provides routines for operating on foreign code blocks. Currently, such routines exist only for the C language, but others can be added easily.

7.1 Copying fragments of C code

The most frequent activity in AMC when pre-processing C code is simply reading in a code fragment and pasting it in the output surrounded by template code.

This is supported by the copy_c function, which knows how to scan through C code without being confused by strings or comments. An "artificial" keyword, end, is added to terminate a foreign code block. The copy_c function also generates the appropriate #line directives so that the native C compiler reports errors against the correct source lines. This feature is controlled with the set_copyc_opts CGL function, which takes a CGL boolean value as its argument: true turns the feature on, and false turns it off.

In addition, it is very important that the resulting C code accurately reflect the line numbers of the original source file. Therefore, two routines are provided to write line number directives into the output stream.

c_sync_input writes a line number synchronization directive so that the output stream is synchronized with the filename and line number of the input stream. Once this is no longer necessary, calling c_sync_output writes a directive that reverts the previous one (synchronizing with the absolute line number in the output file).
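
As an illustration of what such directives look like in the generated C, here is a minimal sketch. It relies only on the standard C #line directive; the file names widget.amc and output.c, the line numbers, and the function names are hypothetical, and the exact text AMC emits is not shown in this document.

```c
/* output.c: a hypothetical generated file; all names are illustrative. */

/* Template code emitted by the preprocessor would appear here. */

/* Synchronize with the input: the next line is reported as line 40 of
   "widget.amc", so diagnostics point at the original source. */
#line 40 "widget.amc"
int user_fragment_line(void) { return __LINE__; }
const char *user_fragment_file(void) { return __FILE__; }

/* Revert: the next line is reported as line 13 of "output.c" again. */
#line 13 "output.c"
int template_line(void) { return __LINE__; }
```

If this listing is saved on its own as output.c, line 13 is the true position of template_line, which is the kind of absolute position a reverting directive has to state.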

7.2 Analyzing C code

Of course, sometimes it is necessary to do quite a bit of pre-processing on input code. For this, AMC includes a complete C lexical analyzer package.

The package includes several functions which provide a parser framework with token lookahead.

c_walk_tokens(expr, err)

This requests a block of code (via the <- operator) and executes expr for each token found. If the <- operator is not the next token in the input module, an error is generated.

During evaluation of expr, two variables are bound to hold information about the current token.

c_cur_token is set to designate the class of a token, which can be one of the following:

c_cur_lexeme is the actual set of characters that make up the token. So for example, for an identifier foo, the variable c_cur_token would equal t_ident and c_cur_lexeme would be the actual string foo.

If there is an error during evaluation or lexical analysis, the err parameter is evaluated. If it evaluates to true, processing of the token stream continues even in the face of errors. If it evaluates to false, the error is propagated up and handled appropriately. If the parameter is omitted, the default is to propagate the error.

c_walk_opt_tokens(expr, err)

This is very similar to c_walk_tokens, except no error is generated if the foreign code operator (<-) is not present.

Instead, expr is simply not evaluated.

c_stop_walking

Calling this function within the expression of a c_walk_tokens or c_walk_opt_tokens will abort any further walking of the input file and return AMC's lexical analyzer to its default state.

This is useful if an "end delimiter" is detected and there is more code for AMC in the current module.

This call only affects the innermost enclosing c_walk_tokens or c_walk_opt_tokens call. If there are no enclosing calls, this function has no effect. This is similar to C's break statement.

c_keep_token

Calling this within an enclosing c_walk_tokens or c_walk_opt_tokens function will result in the current token being re-evaluated (this is especially useful after shifting states in a parser).

This call only affects the innermost enclosing c_walk_tokens or c_walk_opt_tokens call. If there are no enclosing calls, this function has no effect. This is similar to C's continue statement.

c_pushback(token, lexeme)

This function forces the next iteration of the next c_walk_tokens or c_walk_opt_tokens block to get a token of type token with a lexeme of lexeme.

Unlike some of the other functions in this group, this one can be called without being in an enclosing c_walk_tokens or c_walk_opt_tokens block and it will still take effect.

This is very useful for shifting states in a parser or seeding an initial state.

c_remove_all_pushback

This function flushes all of the tokens that were created using a c_pushback call. This is useful for error recovery.
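
The interaction between pushed-back tokens and the real input can be pictured with a small C model. This is only a sketch of the general technique, not AMC's implementation: the types, the function names, the fixed-size buffer, and the choice to deliver the most recently pushed token first are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token classes; only the name t_ident appears in the text. */
enum token_class { T_EOF, T_IDENT, T_OTHER };

struct token {
    enum token_class cls;   /* class of the token */
    const char *lexeme;     /* characters that make up the token */
};

#define MAX_PUSHBACK 16

struct token_stream {
    const struct token *input;          /* real tokens, terminated by T_EOF */
    size_t pos;                         /* index of the next unread token */
    struct token pushed[MAX_PUSHBACK];  /* pending pushed-back tokens */
    size_t npushed;                     /* how many are pending */
};

/* Queue a synthetic token; it is delivered before any real input. */
void pushback(struct token_stream *ts, enum token_class cls,
              const char *lexeme)
{
    assert(ts->npushed < MAX_PUSHBACK);
    ts->pushed[ts->npushed].cls = cls;
    ts->pushed[ts->npushed].lexeme = lexeme;
    ts->npushed++;
}

/* Discard every pending pushed-back token, e.g. during error recovery. */
void remove_all_pushback(struct token_stream *ts)
{
    ts->npushed = 0;
}

/* Return the next token: pushed-back tokens first (most recent first,
   an assumption of this sketch), then the real input. The T_EOF token
   is returned repeatedly once the input is exhausted. */
struct token next_token(struct token_stream *ts)
{
    if (ts->npushed > 0)
        return ts->pushed[--ts->npushed];
    if (ts->input[ts->pos].cls == T_EOF)
        return ts->input[ts->pos];
    return ts->input[ts->pos++];
}
```

Calling pushback before the first next_token call corresponds to seeding an initial state, since the pushed token is seen before anything from the real input.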

c_get_token(tok_var_name, lex_var_name)

This function can be used to obtain the next token while inside at least one nesting of c_walk_tokens or c_walk_opt_tokens.

The two arguments are the names of two variables to bind with the next token and lexeme. The return value is true if the next token was obtained or false for end of file.

It is an error to call this function when a call to c_walk_tokens or c_walk_opt_tokens is not in effect.

Using this package, it is easy to create powerful pre-processors that add new syntax to C. The most common approach is to iterate over the code using one or two map statements.

In the simplest case, no state information is necessary, and only one map is required to analyze the incoming token. However, if previous state information is required, two map statements may be necessary.

A very simple parser can be written like this:


    current_state = (0)
    c_walk_tokens
    (
       map (current_state)
       {
         0: /* Initial state. */
         {
           map (c_cur_token)
           {
             /* Set current_state appropriately in here. */
           };
         };

         1: /* In state 1. */
         {
           map (c_cur_token)
           {
             /* State 1 transitions go here. */
           };
         };
       };
    )

Of course, in a real-world example it would be wise to factor each state handling map into a separate CGL procedure.
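
For readers more familiar with C than CGL, the same shape (an outer dispatch on the current state, with one handler per state) can be sketched in C. Everything here is hypothetical: the token classes, the handler names, and the task itself, which counts identifiers that are immediately followed by an opening parenthesis. A handler that returns 0 asks for the same token to be evaluated again in the new state, which plays the role that c_keep_token plays in CGL.

```c
#include <stddef.h>
#include <string.h>

enum token_class { T_EOF, T_IDENT, T_PUNCT };   /* hypothetical classes */

struct token {
    enum token_class cls;
    const char *lexeme;
};

struct parser {
    int state;   /* 0 = initial, 1 = just saw an identifier */
    int calls;   /* identifiers immediately followed by "(" */
};

/* Each handler returns 1 to consume the token, or 0 to have the same
   token evaluated again in the new state. */
static int state_initial(struct parser *p, const struct token *t)
{
    if (t->cls == T_IDENT)
        p->state = 1;
    return 1;
}

static int state_after_ident(struct parser *p, const struct token *t)
{
    p->state = 0;
    if (t->cls == T_PUNCT && strcmp(t->lexeme, "(") == 0) {
        p->calls++;
        return 1;
    }
    return 0;   /* not a call: let the initial state look at this token */
}

/* Walk a T_EOF-terminated token array, dispatching on the current state. */
int count_calls(const struct token *tokens)
{
    struct parser p = { 0, 0 };
    size_t i = 0;

    while (tokens[i].cls != T_EOF) {
        int consumed = (p.state == 0)
            ? state_initial(&p, &tokens[i])
            : state_after_ident(&p, &tokens[i]);
        if (consumed)
            i++;
    }
    return p.calls;
}
```

The loop always terminates because a token is re-evaluated at most once: the second handler resets the state to 0, and the initial-state handler always consumes.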

Helper Routines

There are also several other functions that can be considered part of the "C back end." is_c_identifier(str) evaluates to true if the specified identifier fits the rules defined by ANSI C for valid identifiers. To avoid repeating the same conditional at every use, the function validate_c_identifier(str) performs the same validation, but it issues an error message and aborts compilation if the identifier is not valid. If the identifier is valid, it does nothing.

Optionally, both is_c_identifier and validate_c_identifier can take a second parameter that is a boolean that defines if the C type keywords are to be flagged as invalid. Normally, C keywords are never valid identifiers, but when validating a type name, it may be beneficial to pass false as the second argument to these functions.
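
The check described above can be approximated in C as follows. This is a sketch under stated assumptions, not AMC's code: the function name is_c_identifier_sketch is hypothetical, the keyword list is the 32 keywords of ANSI C (C89), and the split between "type keywords" and the rest is a guess, since the document does not list which keywords count as type keywords.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed set of "type keywords"; the real set used by AMC may differ. */
static const char *const type_keywords[] = {
    "char", "double", "float", "int", "long", "short", "signed",
    "unsigned", "void"
};

/* The remaining ANSI C (C89) keywords. */
static const char *const other_keywords[] = {
    "auto", "break", "case", "const", "continue", "default", "do", "else",
    "enum", "extern", "for", "goto", "if", "register", "return", "sizeof",
    "static", "struct", "switch", "typedef", "union", "volatile", "while"
};

static bool in_list(const char *s, const char *const *list, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(s, list[i]) == 0)
            return true;
    return false;
}

/* True if s is spelled like a C identifier and is not a keyword. Type
   keywords are rejected only when reject_type_keywords is true. */
bool is_c_identifier_sketch(const char *s, bool reject_type_keywords)
{
    const char *p;

    assert(s != NULL);
    /* First character: a letter or underscore (also rejects ""). */
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return false;
    /* Remaining characters: letters, digits, or underscores. */
    for (p = s + 1; *p != '\0'; p++)
        if (!(isalnum((unsigned char)*p) || *p == '_'))
            return false;
    if (in_list(s, other_keywords,
                sizeof other_keywords / sizeof other_keywords[0]))
        return false;
    if (reject_type_keywords &&
        in_list(s, type_keywords,
                sizeof type_keywords / sizeof type_keywords[0]))
        return false;
    return true;
}
```

With this sketch, "int" is rejected when the second argument is true and accepted when it is false, while "while" is rejected either way.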

The write_as_c_string(str) function evaluates to the string as it would have to be written in C source, escaping characters as necessary. The result does not include the enclosing double-quote (") characters, though.
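
To make the kind of escaping involved concrete, here is a minimal C sketch of such a conversion. The function name c_string_body and its buffer-based interface are hypothetical, and the set of escapes it produces (backslash forms for the double quote, backslash, newline, and tab, and three-digit octal for other non-printable characters) is an assumption; the document does not specify exactly which characters write_as_c_string escapes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write the C string-literal spelling of src into dst, without the
   enclosing double quotes. Returns the number of characters written
   (excluding the terminating NUL), or -1 if dst is too small. */
int c_string_body(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    if (dst_size == 0)
        return -1;

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        char esc[5];    /* longest escape is "\ooo" plus the NUL */
        size_t len;

        switch (c) {
        case '"':  strcpy(esc, "\\\""); break;
        case '\\': strcpy(esc, "\\\\"); break;
        case '\n': strcpy(esc, "\\n");  break;
        case '\t': strcpy(esc, "\\t");  break;
        default:
            if (isprint(c)) {
                esc[0] = (char)c;
                esc[1] = '\0';
            } else {
                /* Three octal digits cannot merge with a following digit. */
                sprintf(esc, "\\%03o", (unsigned)c);
            }
            break;
        }

        len = strlen(esc);
        if (out + len + 1 > dst_size)   /* keep room for the NUL */
            return -1;
        memcpy(dst + out, esc, len);
        out += len;
    }
    dst[out] = '\0';
    return (int)out;
}
```

A caller that wants a complete literal would write the opening and closing double quotes around the result itself, which matches the behaviour described for write_as_c_string.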

