
3. CGL Language Reference Manual

CGL stands for "Code Generation Language." It is a language designed for string processing. It has a very simple syntax and uses built-in functions to perform most of its work. Unlike some scripting languages, CGL does not interpret its source text directly. Rather, all of the CGL code is compiled into an in-memory representation that can be interpreted very efficiently. The only data type that CGL can process is the character string. This is not to say that other data types are impossible in CGL, just that they must be defined via a function library.

CGL is a functional language. Programming in CGL is all about calling functions. Some of these functions are built-ins that perform common actions; others are user-defined (written by you). The main constructs in CGL are function calls and variable assignments. Even loops and conditionals are expressed as calls to special built-ins that either don't evaluate all of their arguments or evaluate some of them more than once.

3.1 Constants

CGL accepts integer literals, quoted strings, and identifiers prefixed with the ~ character as constants. Because there is only one data type in CGL (the string), these constants are always stored as strings. For example, the integer literal 0x0001 is synonymous with the string constant "1". Notice that there is not necessarily a one-to-one correspondence between the characters that make up a constant and the string that it represents.

Identifiers preceded by a ~ are syntactic sugar for one-word strings. For example, ~foo is a synonym for the string constant "foo". This is useful because some CGL built-ins take an identifier name as a parameter.
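To make these rules concrete, here is a small Python model of how the three kinds of constants could be reduced to strings. It is a sketch, not the CGL implementation: the function name constant_to_string is hypothetical, escape sequences inside quoted strings are not modeled, and integer literals are assumed to follow the familiar decimal and 0x-hex conventions (Python's int(text, 0) does the parsing).

```python
def constant_to_string(token: str) -> str:
    """Reduce a CGL-style constant token to the string it stands for (hypothetical model)."""
    if token.startswith("~"):
        # ~foo is sugar for the one-word string "foo".
        return token[1:]
    if len(token) >= 2 and token[0] == token[-1] == '"':
        # Quoted string: strip the quotes (escape sequences are not modeled).
        return token[1:-1]
    # Integer literal: base 0 accepts both decimal and 0x hex forms; the value
    # is then rendered back as a decimal string.
    return str(int(token, 0))


print(constant_to_string("0x0001"))  # 1
print(constant_to_string("~foo"))    # foo
print(constant_to_string('"1"'))     # 1
```

The first and third calls illustrate the point above: two different spellings, 0x0001 and "1", denote the same string.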

Because signed and unsigned numbers have different ranges, unsigned numbers must be marked as such in CGL using the unsigned keyword. For example:


   bnot(unsigned 65535)

This passes the constant 65535 in unsigned form to the bnot function. This becomes important when manipulating numbers that are larger than the largest value a signed long can hold on the machine. For example, on a 32-bit architecture, you will not get the correct answer if you omit the unsigned keyword from the following:

   bnot(unsigned 0xffff0000)
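The Python sketch below illustrates why the keyword matters. It assumes a 32-bit machine, as in the example, on which a literal parsed as signed must fit in a signed long. The names parse_signed, parse_unsigned, and bnot are modeling helpers only; this model raises an error on overflow, whereas a real implementation might instead silently produce a wrong value.

```python
WORD_BITS = 32                           # assumed machine word size
UNSIGNED_MAX = (1 << WORD_BITS) - 1      # 0xffffffff
SIGNED_MIN = -(1 << (WORD_BITS - 1))     # -2147483648
SIGNED_MAX = (1 << (WORD_BITS - 1)) - 1  # 2147483647


def parse_signed(text: str) -> int:
    """Parse a literal as a signed long; out-of-range values are rejected."""
    value = int(text, 0)
    if not SIGNED_MIN <= value <= SIGNED_MAX:
        raise OverflowError(f"{text} does not fit in a signed {WORD_BITS}-bit long")
    return value


def parse_unsigned(text: str) -> int:
    """Parse a literal as an unsigned long (what the unsigned keyword requests)."""
    value = int(text, 0)
    if not 0 <= value <= UNSIGNED_MAX:
        raise OverflowError(f"{text} does not fit in an unsigned {WORD_BITS}-bit long")
    return value


def bnot(value: int) -> int:
    """Bitwise NOT, truncated to one machine word."""
    return ~value & UNSIGNED_MAX


print(hex(bnot(parse_unsigned("0xffff0000"))))  # 0xffff
print(hex(bnot(parse_unsigned("65535"))))       # 0xffff0000

try:
    parse_signed("0xffff0000")  # 4294901760 is larger than SIGNED_MAX
except OverflowError as err:
    print("signed parse failed:", err)
```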

Templates

Compilers frequently have to generate large amounts of text from a template. CGL has a special syntax, introduced by the template keyword, that makes it easy to inline template code. The first non-whitespace character after the template keyword defines the delimiter character.

Each line that is to be included in the template must be prefixed by the delimiter character. Additionally, variables may be expanded within the template by enclosing their names between two delimiter characters. For example, the following procedure evaluates to a bit of assembly language code:


   proc FunctionProlog(1) =
      stack_bytes = (mul(4, arg(1)))

      template
      /
      /  ST   [sp],bp
      /  SUB  sp,  4
      /  COPY bp, sp
      /  SUB  sp, /stack_bytes/
      
      write_debug_desc(arg(1))
      ;

When a variable name is sandwiched between two delimiters in a template, the value of the variable is inserted into the template text when the template is expanded. The first line after the template that does not begin with the delimiter character is treated as normal CGL code.
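A minimal Python sketch of the expansion rule just described, assuming the delimiter has already been read and the remaining source lines are passed in. The function expand_template is hypothetical; it treats unassigned variables as the empty string, as CGL does for ordinary variable fetches, and it only recognizes identifier-like names between delimiters.

```python
import re
from typing import Dict, List


def expand_template(delim: str, lines: List[str], variables: Dict[str, str]) -> str:
    """Expand template lines until the first line not prefixed by the delimiter."""
    # A variable name sandwiched between two delimiters, e.g. /stack_bytes/.
    var_ref = re.compile(re.escape(delim) + r"(\w+)" + re.escape(delim))
    output = []
    for line in lines:
        stripped = line.lstrip()
        if not stripped.startswith(delim):
            break  # back to normal CGL code
        body = stripped[len(delim):]  # drop the leading delimiter
        output.append(var_ref.sub(lambda m: variables.get(m.group(1), ""), body))
    return "\n".join(output)


source = [
    "      /  ST   [sp],bp",
    "      /  SUB  sp,  4",
    "      /  COPY bp, sp",
    "      /  SUB  sp, /stack_bytes/",
    "",
    "      write_debug_desc(arg(1))",
]
# As if stack_bytes = (mul(4, arg(1))) had run with arg(1) equal to "3".
print(expand_template("/", source, {"stack_bytes": "12"}))
```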

3.2 Expressions

Null expressions

The simplest expression possible in CGL is the continue statement, which, as in FORTRAN, does nothing. This is useful when an expression is syntactically required but there is no reason for it to do anything in the calling context. In addition, using continue in place of an empty string ("") is more efficient.

One good use of continue is for stubbing out procedures that are not finished. For example:


   proc UnfinishedRoutine = continue;

This allows the code to compile even though the routine is unfinished.

Variables and assignment

All variables in CGL hold strings and are named with strings. There is no scoping; all variables are global. An identifier in CGL evaluates to the contents of the variable with that name. If the variable has never been assigned to, its value is the empty string ("").

To assign a value to a variable, the following construction is used:


   variable=("Mr. " first_name " " last_name)

The expression inside the parentheses is evaluated, and its result is assigned to the variable.

Some variables are assigned automatically at certain times by built-in routines (particularly those that evaluate their parameters more than once). These variables are in no way special and can be read and changed just like any other.

Function calls

All functions (both user-defined and built-in) are called using the same syntax. For example:


   myfunc(var ".service", 17)

This calls a function named myfunc with two arguments: the value of var concatenated with ".service", and the constant 17. If a function takes no arguments, the call can be abbreviated to the function's name without any parentheses.

An ambiguity exists if a variable has the same name as a function. In this case, the function always takes precedence.
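The lookup rules above (one global namespace, unassigned variables reading as "", functions winning over variables) fit in a few lines of Python. The class Globals is a hypothetical illustration, not CGL's implementation. It also assumes that an assignment contributes the empty string to its surrounding expression, which is what the example under Complex expressions implies.

```python
from typing import Callable, Dict


class Globals:
    """Hypothetical model of CGL's single global namespace."""

    def __init__(self) -> None:
        self.variables: Dict[str, str] = {}
        self.functions: Dict[str, Callable[[], str]] = {}

    def assign(self, name: str, value: str) -> str:
        """name=(value): store the string; assumed to contribute "" to the output."""
        self.variables[name] = value
        return ""

    def fetch(self, name: str) -> str:
        """Evaluate a bare identifier."""
        if name in self.functions:
            # A function with the same name always takes precedence.
            return self.functions[name]()
        # Never-assigned variables read as the empty string.
        return self.variables.get(name, "")


g = Globals()
print(repr(g.fetch("first_name")))  # ''
g.assign("first_name", "Lorelei")
print(g.fetch("first_name"))        # Lorelei
g.functions["first_name"] = lambda: "from the function"
print(g.fetch("first_name"))        # from the function
```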

Complex expressions

A complex expression is one or more constants, variable fetches, variable assignments, or function calls concatenated together. In other words, you can string the elements of expressions together to form larger expressions. For example, consider the following:


   "Hello "
   user_name
   ", you are "
   v=(calculate_age(user_name))
   v " year"
   if(gt(v, 1), "s")  /* Make plural if necessary */
   " old today."

CGL evaluates this as the concatenation of the results of its elements. Assuming that user_name and calculate_age do what their names imply, and given that if and gt are built-in functions, the above expression might evaluate to the following:

   Hello Lorelei Stierlen, you are 32 years old today.
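In other words, evaluating a complex expression means evaluating each element from left to right and joining the results. The Python sketch below models the example that way, with each element as a zero-argument function. calculate_age is a hypothetical stand-in that always returns "32", the conditional is written as a plain Python expression, and the assignment is assumed to contribute nothing to the output (the sample result shows no extra "32").

```python
from typing import Callable, Dict

variables: Dict[str, str] = {"user_name": "Lorelei Stierlen"}


def calculate_age(name: str) -> str:
    return "32"  # hypothetical stand-in for a real lookup


def assign(name: str, value: str) -> str:
    variables[name] = value
    return ""  # the assignment itself adds no text


def concat(*elements: Callable[[], str]) -> str:
    # Elements are evaluated strictly left to right, so the assignment to v
    # happens before the later fetches of v.
    return "".join(element() for element in elements)


greeting = concat(
    lambda: "Hello ",
    lambda: variables["user_name"],
    lambda: ", you are ",
    lambda: assign("v", calculate_age(variables["user_name"])),
    lambda: variables["v"],
    lambda: " year",
    lambda: "s" if int(variables["v"]) > 1 else "",  # if(gt(v, 1), "s")
    lambda: " old today.",
)
print(greeting)
```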

3.3 Procedure definitions

A CGL procedure takes a certain (possibly variable) number of arguments and evaluates to a string result. Unlike languages in which the arguments are given formal names, CGL accesses the arguments by positional number (much like UNIX shell scripts). Therefore, a CGL procedure is an expression that calls special CGL built-ins to get the passed arguments.

User procedures in CGL are defined using the proc keyword. If a CGL procedure is defined as taking no arguments, a special case in the argument-passing logic allows the calling procedure's arguments to be accessed within it via the same access routines. This rule applies recursively down the call stack. As soon as a procedure that takes arguments is called, the old argument list is saved and a new one is created. The old argument list cannot be accessed while it is obscured by the newer one.

When a user-defined procedure is called, its arguments are always evaluated exactly once, from left to right. Built-in CGL functions don't always abide by this rule; this is how the looping and conditional built-ins work.
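These argument-passing rules can be sketched in Python as a stack of argument lists. Everything here (the class ArgFrames, its call method, and the use of Python callables for unevaluated arguments) is a hypothetical model; arg and argcnt mirror the CGL built-ins of the same names, except that this model's arg takes a Python int.

```python
from typing import Callable, List, Optional

Thunk = Callable[[], str]


class ArgFrames:
    """Hypothetical model of CGL's positional argument lists."""

    def __init__(self) -> None:
        self.frames: List[List[str]] = [[]]  # top level has no arguments

    def arg(self, n: int) -> str:
        return self.frames[-1][n - 1]  # arguments are numbered from 1

    def argcnt(self) -> str:
        return str(len(self.frames[-1]))  # CGL values are strings

    def call(self, body: Thunk, args: Optional[List[Thunk]] = None) -> str:
        if args is None:
            # Procedure declared with no arguments: it keeps seeing the
            # caller's argument list (and so on down the call stack).
            return body()
        # Evaluate each argument exactly once, left to right, while the
        # caller's argument list is still the visible one.
        values = [a() for a in args]
        self.frames.append(values)  # obscures the caller's list
        try:
            return body()
        finally:
            self.frames.pop()  # in this model, the caller's list is current again


rt = ArgFrames()


def show_first() -> str:  # a procedure declared with no arguments
    return "first=" + rt.arg(1)


def outer() -> str:  # calls show_first without arguments, sharing its own
    return rt.call(show_first)


print(rt.call(outer, [lambda: "a", lambda: "b"]))       # first=a
print(rt.call(rt.argcnt, [lambda: "x", lambda: "y"]))   # 2
```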

Let's define a simple CGL procedure that takes no arguments and evaluates to the string "Hello World."


   proc HelloWorld = "Hello World.";

The code above defines a CGL procedure called HelloWorld that evaluates to the string "Hello World." While any CGL expression can follow the equal sign, it is perfectly legal for a procedure to evaluate to a constant. If we wanted to write a CGL procedure that takes two arguments (a person's name and age), it would look like this:


   proc HelloPerson(2) = 
        "Hello " 
        arg(1)
        ", who is "
        arg(2) 
        " years old.";

The 2 in parentheses after the procedure name is the number of arguments to expect. Sadly, this routine has a little bug: if a person is one year old, it produces "1 years old," which is clearly poor English. Let's fix this by adding a conditional that decides whether we need the plural or the singular (after all, why can't we write a code generator for English in CGL?):


    proc HelloPerson(2) = 
         "Hello "   
         arg(1)
         ", who is "
         arg(2)
         " year" 
         if(gt(arg(2),1),"s") 
         " old."
         ;

Here is how the conditional works. When evaluation reaches the call to if, its arguments are not evaluated first; because if is a built-in, its handler is called directly. The handler evaluates the first argument, gt(arg(2),1), and tests the truth of the result. The gt primitive compares its arguments and returns true if the first is greater than the second, and the arg function returns the numbered argument. Only if the first argument evaluates to true does the if built-in evaluate its second argument.
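The key point is that if receives its arguments unevaluated. The Python sketch below models this by passing zero-argument functions. The text does not say how CGL represents true and false, so the model assumes a non-empty string is true and that gt returns "1" or ""; cgl_if, gt, hello_person, and plural_suffix are illustrative names, not CGL's implementation.

```python
from typing import Callable, List

Thunk = Callable[[], str]


def gt(a: str, b: str) -> str:
    # Assumed truth encoding: "1" for true, "" for false.
    return "1" if int(a) > int(b) else ""


def cgl_if(cond: Thunk, then: Thunk, otherwise: Thunk = lambda: "") -> str:
    # The condition is always evaluated; then exactly one branch is.
    return then() if cond() != "" else otherwise()


def hello_person(name: str, age: str) -> str:
    return ("Hello " + name + ", who is " + age + " year"
            + cgl_if(lambda: gt(age, "1"), lambda: "s") + " old.")


evaluated: List[str] = []


def plural_suffix() -> str:
    evaluated.append("suffix")  # records a side effect
    return "s"


print(hello_person("Lorelei", "1"))   # Hello Lorelei, who is 1 year old.
print(hello_person("Lorelei", "32"))  # Hello Lorelei, who is 32 years old.
cgl_if(lambda: gt("1", "1"), plural_suffix)
print(evaluated)  # [] : the second argument was never evaluated
```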

But perhaps we want a routine that will take either two or three arguments.

If the third argument is present, we can assume it is the sex of the person and add a title to their name. To define this:


   proc HelloPerson(2:3) = 
        "Hello "
        if (eq(argcnt,3), 
          if (strequ(arg(3),~m), "Mr. ", "Mrs. "))
        arg(1)
        ", who is "
        arg(2)
        " year"
        if(gt(arg(2),1),"s") 
        " old."
        ;

Notice the 2:3 in the procedure definition. This says that the routine can take from two to three arguments. The built-in function argcnt returns the number of arguments actually passed, so we can check it and modify the behavior of the procedure accordingly. This code contains one conditional that checks whether the argument count is numerically three. If it is, a second conditional checks whether the third argument is the string "m"; if so, it evaluates to "Mr. ", otherwise to "Mrs. ". However, we could do a little optimization here and replace the second conditional with this:

   "Mr"
   if (strequ(arg(3),~m), "", "s")
   ". "

Of course, how you factor your CGL code is up to you. In terms of speed, the former is better; in terms of space, the latter is. In practice, CGL is fast enough that even on slow CPUs you shouldn't need to be concerned with this.

It is also possible for a CGL procedure to take an unlimited number of arguments. Either component of the argument count (the minimum or the maximum) may be omitted by writing an asterisk in place of the integer constant; an omitted limit is assumed not to exist. For example, a routine that takes at least three arguments, with no upper limit, could be declared as:


   proc ThreeOrMore(3:*) = 

Or we could define a function that takes up to 7 arguments like this:


   proc UpToSeven(*:7) = 
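One way to picture these declarations is as a (minimum, maximum) pair in which * means "no limit". The Python helpers below are a hypothetical sketch of such a check, not how CGL itself validates calls. They also treat a single number such as 2 as exactly that many arguments, and a lone * as any number, matching the form * declaration used in the Forms section.

```python
from typing import Optional, Tuple


def parse_arity(spec: str) -> Tuple[Optional[int], Optional[int]]:
    """Turn "2", "2:3", "3:*", "*:7" or "*" into (min, max); None means no limit."""
    low, _, high = spec.partition(":")
    if not high:
        high = low  # a single count means exactly that many

    def limit(text: str) -> Optional[int]:
        text = text.strip()
        return None if text == "*" else int(text)

    return limit(low), limit(high)


def accepts(spec: str, count: int) -> bool:
    low, high = parse_arity(spec)
    return (low is None or count >= low) and (high is None or count <= high)


for spec, count in [("2", 2), ("2:3", 3), ("2:3", 4), ("3:*", 10), ("*:7", 8)]:
    print(spec, count, accepts(spec, count))
```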

3.4 Maps

CGL supports a construct called a "map". This is partly a performance hack and partly syntactic sugar. A map is a static translation table that takes the result of an expression and matches it against several constants. Each constant is associated with a CGL expression. The result of the map is the result of the expression associated with the constant that matches the input. This is similar to the "case" statement of some other programming languages.

For example, we can use a map to translate the titles of songs to the artists who wrote them:


   proc Name_That_Tune(1) =
      "The song \"" strcap(arg(1)) "\" was written by "
      map (strlwr(arg(1)))
      [
        "young lust",
        "comfortably numb":   { "Pink Floyd"   };
        "99 luftballons":     { "Nena"         };
        "nookie":             { "Limp Bizkit"  };
        "money for nothing":  { "Dire Straits" };
        "supernova":          { "Liz Phair"    };

        default               { "some band"    };
      ]
      ".\n"
      ;

This procedure takes in the listed song names (on the left) and evaluates the expression on the right for that song. In this case, the associated expression is nothing more than a constant; however, it can be as complex as any other CGL procedure (including other maps).

More than one constant can be associated with the same expression by listing the constants separated by commas; in the example above, there are two songs written by "Pink Floyd".

The bottom entry in the map, default, is a catch-all. It is optional; if it is not specified and the input to the map does not match any of the entries, the map evaluates to nothing.
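The map behavior described so far (several constants sharing one expression, an optional default, and nothing when nothing matches) can be sketched in Python with a dictionary. A hash table is one plausible way to obtain the performance benefit mentioned earlier, but that is an assumption about the implementation. build_map and cgl_map are hypothetical names, "nothing" is modeled as the empty string, the song data comes from the example, and strlwr is modeled by str.lower.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Thunk = Callable[[], str]


def build_map(entries: Iterable[Tuple[Iterable[str], Thunk]]) -> Dict[str, Thunk]:
    """Each entry pairs one or more constants with a single expression."""
    table: Dict[str, Thunk] = {}
    for constants, expression in entries:
        for constant in constants:
            table[constant] = expression
    return table


def cgl_map(value: str, table: Dict[str, Thunk], default: Optional[Thunk] = None) -> str:
    expression = table.get(value, default)
    # Only the selected expression is evaluated; with no match and no
    # default, this model yields the empty string.
    return expression() if expression is not None else ""


SONGS = build_map([
    (["young lust", "comfortably numb"], lambda: "Pink Floyd"),
    (["99 luftballons"], lambda: "Nena"),
    (["nookie"], lambda: "Limp Bizkit"),
    (["money for nothing"], lambda: "Dire Straits"),
    (["supernova"], lambda: "Liz Phair"),
])


def band_for(title: str) -> str:
    return cgl_map(title.lower(), SONGS, default=lambda: "some band")


print(band_for("Comfortably Numb"))       # Pink Floyd
print(band_for("Yesterday"))              # some band
print(repr(cgl_map("yesterday", SONGS)))  # ''
```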

Because CGL is designed to assist in writing language translators, a function, remap, can be called to have a map re-evaluated. If a numeric argument is passed to remap, it is taken as the number of maps to backtrack over for remapping.

This allows parsers to be easily constructed from nested maps. For example, suppose a parser keeps its current state in the variable state, which starts out at 0:


  map (state)
  [
    0: { state_0 };
    1: { state_1 };
    2: { state_2 };
  ];

The code for each state is constructed as a map on the current input token:

  map (cur_token)
  [
    ~IDENT:
    {
      if
      (
        token_is_for_me,

        process_token,
        /* else: */
        state = (1)
        remap(1)
      )
    };
  ]

The code for a specific state decides whether a token is meant for it. If it is, the token is consumed, and the normal return path fetches another token (assuming all of this is called from a routine like walk_tokens). If the token is not meant for state 0, it can be tried in state 1 by changing the state and causing the state machine to "re-map." The argument 1 to the remap function tells it to remap the previously invoked map (the map on state), not the current one (the map on cur_token).

If the argument to remap is larger than the number of maps currently being evaluated, a runtime error is generated.
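The following Python sketch models remap with an exception that unwinds to the right map, which then evaluates its selector again and dispatches once more. It is an illustration under several assumptions: remap(0) means the current map, the argument counts outward from there, and any argument that reaches past the outermost active map raises an error. MapStack, Remap, and the parser pieces are hypothetical names that mirror the state-machine example above.

```python
from typing import Callable, Dict, Optional

Thunk = Callable[[], str]


class Remap(Exception):
    """Unwinds to the map that should be re-evaluated."""

    def __init__(self, levels: int) -> None:
        super().__init__(levels)
        self.levels = levels


class MapStack:
    def __init__(self) -> None:
        self.active = 0  # number of maps currently being evaluated

    def remap(self, levels: int = 0) -> str:
        if levels >= self.active:
            raise RuntimeError(f"remap({levels}): only {self.active} map(s) in progress")
        raise Remap(levels)

    def map(self, selector: Thunk, table: Dict[str, Thunk],
            default: Optional[Thunk] = None) -> str:
        self.active += 1
        try:
            while True:
                expression = table.get(selector(), default)
                try:
                    return expression() if expression is not None else ""
                except Remap as request:
                    if request.levels > 0:
                        request.levels -= 1  # not for this map: keep unwinding
                        raise
                    continue  # levels == 0: re-evaluate the selector and dispatch again
        finally:
            self.active -= 1


maps = MapStack()
variables = {"state": "0", "cur_token": "IDENT"}


def state_0() -> str:
    def ident() -> str:
        # Pretend state 0 never wants an IDENT token.
        variables["state"] = "1"
        return maps.remap(1)  # re-run the map on state, not the map on cur_token
    return maps.map(lambda: variables["cur_token"], {"IDENT": ident})


def state_1() -> str:
    return "state 1 consumed " + variables["cur_token"]


print(maps.map(lambda: variables["state"], {"0": state_0, "1": state_1}))
# state 1 consumed IDENT
```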

3.5 Forms

When a normal CGL procedure is called, all of its arguments are evaluated once and their values are saved so they can be accessed using the arg function. This means that any side effects occur only once during a normal CGL procedure call. Forms let you write procedures that, like some built-ins, evaluate their arguments multiple times.

To declare a procedure as a form, place the form keyword before the number of arguments in the procedure declaration:


  proc my_form(form 2)

Just as with procedures, forms can take a varying number of arguments. Unlike procedures, however, forms access their arguments with a different set of functions: fcount returns the number of arguments passed to the form, and feval evaluates the specified argument.

For example, a form that evaluates each of its arguments twice can be written as follows:


  proc double(form *) = 
    count
    (
      1, fcount,      /* Loop limits. */
      feval(counter)
      feval(counter)
    )
    ;

When double is called, each argument is evaluated twice; more importantly, the side effects of each argument are applied twice.
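The difference between a procedure and a form comes down to when arguments are evaluated. In the Python sketch below, a normal call evaluates its arguments once up front, while a form keeps them as unevaluated callables that fcount and feval work with. FormArgs, call_procedure, and the side-effecting bump argument are hypothetical; count's loop is assumed to run counter from 1 to fcount inclusive and is rendered as a Python range, and this model's fcount returns an int rather than a string.

```python
from typing import Callable, List

Thunk = Callable[[], str]


class FormArgs:
    """Unevaluated arguments as seen by a form (hypothetical model)."""

    def __init__(self, args: List[Thunk]) -> None:
        self._args = args

    def fcount(self) -> int:
        return len(self._args)

    def feval(self, n: int) -> str:
        # Re-evaluates argument n every time, side effects included.
        return self._args[n - 1]()


def double(form: FormArgs) -> str:
    # count(1, fcount, feval(counter) feval(counter))
    return "".join(form.feval(counter) + form.feval(counter)
                   for counter in range(1, form.fcount() + 1))


def call_procedure(body: Callable[[List[str]], str], args: List[Thunk]) -> str:
    values = [a() for a in args]  # normal procedure: each argument evaluated once
    return body(values)


tally = {"n": 0}


def bump() -> str:
    tally["n"] += 1  # side effect
    return str(tally["n"])


print(double(FormArgs([bump])), tally["n"])                        # 12 2
print(call_procedure(lambda v: v[0] + v[0], [bump]), tally["n"])  # 33 3
```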

