CGL stands for ``Code Generation Language.'' It is a language designed for string processing. It has a very simple syntax and uses built-in functions to perform most of its work. Unlike some scripting languages, CGL is not interpreted directly from its source text. Rather, all of the CGL code is first compiled into an in-memory representation that is quite efficient to interpret. The only data type that CGL can process is the character string. This is not to say that other data types are impossible in CGL, just that they must be defined via a function library.
CGL is a functional language with a very simple syntax. Programming in CGL is all about calling functions. Some of these functions are built-ins that perform common actions; others are user-defined (written by you). The main constructs in CGL are calling functions and setting variables. Even looping and conditional statements are done by calling special built-ins that either don't evaluate all of their arguments or evaluate them more than once.
CGL accepts three kinds of constants: integer literals, quoted strings,
and identifiers prefixed with the ~ character.
Because there is only one data type in CGL (the string), these
constants are always stored as strings. For example, the integer literal
0x0001 is synonymous with the string constant
"1"
Notice that there is not necessarily a one-to-one correspondence
between the characters that make up a constant and the string that
it represents.
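The normalization described above can be modeled outside CGL. The following Python sketch is only an illustration, not CGL's implementation. The name normalize_literal is hypothetical, and the sketch assumes that integer literals are stored in canonical decimal form, which is what the 0x0001 example suggests.

```python
def normalize_literal(text: str) -> str:
    """Return the string that a CGL-style integer literal would stand for.

    Assumption: literals are stored as canonical decimal strings.
    int(text, 0) accepts both decimal and 0x-prefixed hexadecimal forms.
    """
    return str(int(text, 0))


print(normalize_literal("0x0001"))  # the same string as the constant "1"
print(normalize_literal("17"))
```

Several different spellings of a literal therefore collapse to one stored string.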
Identifiers preceded by a ~ are syntactic sugar
for one-word strings. For example:
~foo
is a synonym for the string constant "foo". This is useful
because some CGL built-ins take an identifier name as a
parameter.
Because signed and unsigned numbers have different ranges, unsigned numbers
must be identified as such in CGL using the unsigned keyword. For example:
bnot(unsigned 65535)
calls the bnot
function with an unsigned argument. This becomes important when trying to manipulate numbers that
are bigger than the machine's signed long word size. For example, on a
32-bit architecture, you will not get the correct answer if you omit
the unsigned keyword from the following:
bnot(unsigned 0xffff0000)
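To see why the interpretation matters, here is a small Python model of a bitwise NOT on a 32-bit word. It is a sketch under assumptions: the 32-bit width comes from the example in the text, the function names are hypothetical, and nothing here claims to match how CGL's bnot is actually implemented.

```python
WORD_BITS = 32  # assumption: a 32-bit machine word, as in the text's example
MASK = (1 << WORD_BITS) - 1


def bnot_unsigned(value: int) -> int:
    """Bitwise NOT of value, read as an unsigned 32-bit word."""
    return ~value & MASK


def bnot_signed(value: int) -> int:
    """Bitwise NOT of value, read as a signed 32-bit word (two's complement)."""
    result = ~value & MASK
    # Re-interpret the top bit as a sign bit.
    if result & (1 << (WORD_BITS - 1)):
        return result - (1 << WORD_BITS)
    return result


print(bnot_unsigned(65535))       # 4294901760
print(bnot_signed(65535))         # -65536
print(bnot_unsigned(0xFFFF0000))  # 65535
```

The same bit pattern yields two different decimal strings, which is why the caller has to say which range is meant.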
Compilers frequently have to generate large amounts of text from a
template. CGL has a special syntax that makes it easy to inline template
text. This is done with the template
keyword. The first non-whitespace character after the template keyword
defines the delimiter character.
Each line that is to be included in the template must be prefixed by the delimiter character. Additionally, variables may be expanded within the template by using the escape character, as /stack_bytes/ does in the example. The following procedure evaluates to a bit of assembly language code.
proc FunctionProlog(1) =
stack_bytes = (mul(4, arg(1)))
template
/
/ ST [sp],bp
/ SUB sp, 4
/ COPY bp, sp
/ SUB sp, /stack_bytes/
write_debug_desc(arg(1))
;
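The expansion that FunctionProlog performs can be approximated in Python. This is a rough model, not CGL's template engine. It assumes that every template line starts with the delimiter, that a variable is expanded when its name is enclosed in the delimiter, and that the text after the leading delimiter is kept verbatim. The name expand_template and the argument value 3 are hypothetical.

```python
import re


def expand_template(lines, delimiter, variables):
    """Expand template lines in the style described above (a model only)."""
    pattern = re.compile(re.escape(delimiter) + r"(\w+)" + re.escape(delimiter))
    output = []
    for line in lines:
        body = line.strip()
        if not body.startswith(delimiter):
            raise ValueError("template line lacks the delimiter: " + line)
        body = body[len(delimiter):]
        # Unset variables expand to the empty string, as CGL variables do.
        output.append(pattern.sub(lambda m: variables.get(m.group(1), ""), body))
    return "\n".join(output)


text = expand_template(
    ["/ ST [sp],bp", "/ SUB sp, 4", "/ COPY bp, sp", "/ SUB sp, /stack_bytes/"],
    "/",
    {"stack_bytes": "12"},  # mul(4, 3) for a hypothetical FunctionProlog(3)
)
print(text)
```

Only the last line contains a variable reference, so only that line changes during expansion.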
The simplest expression possible in CGL is the continue
statement, which does just what it does in FORTRAN, nothing. Sometimes
this is useful when an expression is necessary but there is no reason
for it to do anything in the calling context. In addition, using continue
in place of an empty string ("") is more efficient.
One good use of continue is for stubbing out procedures that are not finished. For example:
proc UnfinishedRoutine = continue;
All variables in CGL hold strings and are named with strings. There
is no scoping; all variables are global. An identifier in
CGL evaluates to the contents of that variable. If the variable has
never been assigned to, its value is the empty string
("").
To assign a value to a variable, the following construction is used:
variable=("Mr. " first_name " " last_name)
Some variables are assigned automatically at certain times by built-in routines (particularly those that evaluate their parameters more than once). These variables are in no way special and can be assigned to and changed just like any other.
All functions (both user-defined and built-in) are called using the same syntax. For example:
myfunc(var ".service", 17)
calls myfunc with two arguments. If a
function takes no arguments, its calling sequence can be abbreviated
to its name without any parentheses.
An ambiguity exists if a variable is named the same thing as a function. In this case, the function always prevails.
A complex expression is one or more constants, variable fetches, variable assignments, or function calls concatenated together. In other words, you can string the elements of expressions together to form larger expressions. For example, given the following:
"Hello "
user_name
", you are "
v=(calculate_age(user_name))
v " year"
if(gt(v, 1), "s") /* Make plural if necessary */
" old today."
Given that user_name and calculate_age do as
they imply, and given that if and gt are built-in
functions, the above expression might result in the following when
evaluated:
Hello Lorelei Stierlen, you are 32 years old today.
A CGL procedure takes a certain number (possibly variable) of arguments and evaluates to a string result. Unlike some languages where the arguments are given formal names, CGL accesses the arguments by positional number (much like UNIX shell scripts). Therefore, a CGL procedure is an expression that makes calls to special CGL built-ins to get the passed arguments.
User procedures in CGL are defined using the proc keyword.
If a CGL procedure is defined as taking no arguments, a special case in
the argument passing logic allows the calling function's arguments to
be accessed in the procedure via the same access routines. This rule
applies recursively down the call stack. As soon as a function that takes
arguments is called, the old argument list is saved and a new one is
created. The old argument list cannot be accessed while it is
obscured by a newer one.
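The argument-passing rule is easier to follow with a concrete model. The Python sketch below is an assumption-laden illustration rather than CGL's actual mechanism: the class ArgStack and the procedures helper and outer are hypothetical names, and a flag stands in for whether a procedure was declared with arguments.

```python
class ArgStack:
    """Model of the argument lists that the arg function reads from."""

    def __init__(self):
        self.frames = []

    def call(self, takes_arguments, args, body):
        # A procedure declared without arguments reuses the caller's list.
        if not takes_arguments:
            return body(self)
        self.frames.append(list(args))  # the caller's list is now obscured
        try:
            return body(self)
        finally:
            self.frames.pop()  # the caller's list becomes visible again

    def arg(self, n):
        return self.frames[-1][n - 1]  # arguments are numbered from 1


def helper(args):
    # Declared with no arguments, so arg(1) is the caller's first argument.
    return "<" + args.arg(1) + ">"


def outer(args):
    # Declared with two arguments; calls helper without passing any.
    return args.call(False, [], helper) + args.arg(2)


stack = ArgStack()
print(stack.call(True, ["a", "b"], outer))  # <a>b
```

Because helper pushes no list of its own, its arg(1) sees the "a" that was passed to outer.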
When calling a user-defined procedure, the arguments are always evaluated once, from left to right. Built-in CGL functions don't always abide by this rule; this is how the looping and conditional built-ins work.
Let's define a simple CGL procedure that takes no arguments and evaluates to the string "Hello World."
proc HelloWorld = "Hello World.";
The code above defines a CGL procedure, called
HelloWorld, that evaluates to the string "Hello World."
Any CGL expression can follow the equal sign, so it is
perfectly legal to have the procedure return a constant. If we
wanted to write a CGL procedure that took two arguments
(a person's name and their age), it would look like
this:
proc HelloPerson(2) =
"Hello "
arg(1)
", who is "
arg(2)
" years old.";
The 2 in parentheses after the procedure name is the number
of arguments to expect. Sadly, this routine has a little bug.
If a person is one year old, it prints "1 years old," which
is clearly poor English. Let's fix this by adding a conditional
statement to decide if we need a plural word or a singular
word (after all, why can't we write a code generator
for English in CGL?):
proc HelloPerson(2) =
"Hello "
arg(1)
", who is "
arg(2)
" year"
if(gt(arg(2),1),"s")
" old."
;
This works as follows. Because if is a built-in, calling it invokes
the handler for the if function. The handler evaluates the first
argument and tests the truth of the result. Evaluating that argument
calls the gt primitive, which compares its arguments and returns true
if the first is greater than the second. The arg function returns the
numbered argument. Only if the result is true does the if
built-in evaluate the second argument.
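The selective evaluation performed by if can be imitated in Python by passing arguments as zero-argument callables. This is a sketch, not CGL's handler: the names cgl_if and plural_suffix are hypothetical, and representing truth as "1" and falsehood as the empty string is an assumption made only for this model.

```python
evaluations = []  # records which arguments were actually evaluated


def gt(a, b):
    # Assumption: truth is modeled as "1" and falsehood as the empty string.
    return "1" if int(a) > int(b) else ""


def cgl_if(condition, then, otherwise=lambda: ""):
    """Evaluate the condition, then only the branch that is needed."""
    return then() if condition() else otherwise()


def plural_suffix():
    evaluations.append("s")
    return "s"


def hello_person(name, age):
    return ("Hello " + name + ", who is " + age + " year"
            + cgl_if(lambda: gt(age, "1"), plural_suffix) + " old.")


print(hello_person("Ann", "1"))   # Hello Ann, who is 1 year old.
print(hello_person("Bob", "32"))  # Hello Bob, who is 32 years old.
print(evaluations)                # ['s']
```

The evaluations list ends up with a single entry because the suffix is never evaluated for the one-year-old.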
But perhaps we want a routine that will take between two and three arguments.
If the third argument is present, we can assume it is the sex of the person and add a title to their name. To define this:
proc HelloPerson(2:3) =
"Hello "
if (eq(argcnt,3),
if (strequ(arg(3),~m), "Mr. ", "Mrs. "))
arg(1)
", who is "
arg(2)
" year"
if(gt(arg(2),1),"s")
" old."
;
Notice the 2:3 in the procedure definition. This says that
this routine can take from two to three arguments. The built-in
function argcnt returns the number of arguments, so we can
check it and modify the behavior of the function. This code contains
one conditional that checks whether the argument count is numerically three;
if it is, it evaluates a second conditional that checks whether the third
argument is the string "m". If so, it returns "Mr. ";
otherwise it returns "Mrs. ". However, we could technically
do a little optimization here and modify the second conditional
to look like this:
"Mr" if (strequ(arg(3),~m), "", "s") ". "
It is also possible for a CGL procedure to take an unbounded number of arguments. Either component (the minimum or the maximum) of the argument count may be omitted, in which case an asterisk must be used in place of the integer constant. If a component is omitted, that limit is assumed not to exist. For example, a routine that takes a minimum of three arguments but has no maximum could be defined as:
proc ThreeOrMore(3:*) =
Or we could define a function that takes up to 7 arguments like this:
proc UpToSeven(*:7) =
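The argument-count notation is compact enough to model in a few lines of Python. This sketch only mirrors the forms shown in the text ("2", "2:3", "3:*", "*:7"); the function names parse_arity and accepts are hypothetical and say nothing about how CGL checks counts internally.

```python
def parse_arity(spec):
    """Parse an argument-count spec such as "2", "2:3", "3:*" or "*:7".

    Returns (minimum, maximum); None means that limit does not exist.
    """
    if ":" not in spec:
        count = int(spec)
        return count, count
    low, high = spec.split(":")
    return (None if low == "*" else int(low),
            None if high == "*" else int(high))


def accepts(spec, argument_count):
    """Report whether a call with argument_count arguments fits the spec."""
    low, high = parse_arity(spec)
    return ((low is None or argument_count >= low)
            and (high is None or argument_count <= high))


print(parse_arity("2:3"))  # (2, 3)
print(accepts("3:*", 10))  # True
print(accepts("*:7", 8))   # False
```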
CGL supports a construct called a ``map''. This is partially a performance hack and partially some syntactic sugar. A map is a static translation table that takes the result of an expression and matches it against several constants. Each constant is associated with a CGL expression. The result of the map is the result of the expression associated with the constant that matches the input. This is similar to the ``case'' statement of some other programming languages.
For example, we can use a map to translate the titles of songs to the bands that wrote them:
proc Name_That_Tune(1) =
"The song \"" strcap(arg(1)) "\" was written by "
map (strlwr(arg(1)))
[
"young lust",
"comfortably numb": { "Pink Floyd" };
"99 luftballoons": { "Nena" };
"nookie": { "Limp Bizkit" };
"money for nothing": { "Dire Straits" };
"supernova": { "Liz Phair" };
default { "some band" };
]
".\n"
;
More than one constant value can point to the same expression (in the example above, there are two songs written by "Pink Floyd").
The bottom entry in the map, default, is a catch-all. It is
optional; if it is not specified and the input to the map does not
match any of the entries, then the map evaluates to nothing.
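A map behaves much like a lookup table whose values are evaluated only on a match. The Python model below is a sketch under assumptions: cgl_map and BANDS are hypothetical names, and each expression is wrapped in a callable so that only the matching one runs.

```python
def cgl_map(key, entries, default=None):
    """Return the value of the expression whose constants include key.

    entries is a list of (constants, expression) pairs; each expression is a
    zero-argument callable so that only the matching one is evaluated.
    """
    for constants, expression in entries:
        if key in constants:
            return expression()
    # Without a default entry, an unmatched input evaluates to nothing.
    return default() if default is not None else ""


BANDS = [
    (("young lust", "comfortably numb"), lambda: "Pink Floyd"),
    (("99 luftballoons",), lambda: "Nena"),
]

print(cgl_map("Young Lust".lower(), BANDS, lambda: "some band"))  # Pink Floyd
print(cgl_map("unknown", BANDS, lambda: "some band"))             # some band
print(repr(cgl_map("unknown", BANDS)))                            # ''
```

The last call shows the case with no default entry, where the result is the empty string.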
Because CGL is designed to assist the writing of language translators,
a function, remap, can be called to have a map
re-evaluated. If a numeric argument is passed to remap, then
that is taken as the number of maps to backtrack over for remapping.
This allows parsers to be easily constructed with nested maps. For example, suppose a parser has a specific state, which starts out at 0:
map (state)
[
0: { state_0 };
1: { state_1 };
2: { state_2 };
];
map (cur_token)
[
~IDENT:
{
if
(
token_is_for_me,
process_token,
/* else: */
state = (1)
remap(1)
)
};
]
walk_tokens). If
the token is not for state 0, then it can be tried in state 1 by shifting
the state and causing the state machine to "re-map." The
argument 1 to the remap function tells it to remap the
previously invoked map, not the current one.
If the argument to remap is higher than the number of maps that
are being processed, a runtime error is generated.
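One way to picture remap is as a request that travels outward through the active maps until it reaches its target, which then evaluates itself again. The Python sketch below models that idea with an exception. It is not how CGL implements remap, all names (Remap, run_map, run_program, machine) are hypothetical, and it assumes that the token map is evaluated from inside the state map.

```python
class Remap(Exception):
    """Raised by a remap request; depth is the number of maps to skip."""

    def __init__(self, depth=0):
        super().__init__(depth)
        self.depth = depth


def run_map(select, entries):
    """Evaluate a map; re-evaluate it whenever a remap targets it."""
    while True:
        try:
            expression = entries.get(select())
            return expression() if expression else ""
        except Remap as request:
            if request.depth > 0:
                # The request is aimed at an enclosing map: pass it outward.
                raise Remap(request.depth - 1) from None
            # depth == 0: this map is the target, so loop and match again.


def run_program(body):
    try:
        return body()
    except Remap:
        raise RuntimeError("remap depth exceeds the number of active maps") from None


machine = {"state": 0}


def state_0():
    def not_for_me():
        machine["state"] = 1  # shift the state ...
        raise Remap(1)        # ... and remap the enclosing map
    return run_map(lambda: "IDENT", {"IDENT": not_for_me})


result = run_program(lambda: run_map(
    lambda: machine["state"],
    {0: state_0, 1: lambda: "handled in state 1"},
))
print(result)  # handled in state 1
```

A request that is still unhandled when it leaves the outermost map becomes the runtime error mentioned above.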
When a normal CGL procedure is called, all of the arguments are evaluated
once and their values are then saved so they can be accessed using the
arg function. This means that any side effects occur only once
during a normal CGL procedure call. Forms allow user procedures to behave
like the built-ins that evaluate their arguments multiple times.
To declare a procedure as a form, the form keyword is placed before the number of arguments in the procedure declaration:
proc my_form(form 2)
Just as with procedures, forms can have a varying number of arguments.
Unlike procedures, however, their arguments are accessed with a different
set of functions. The number of arguments passed to a form is obtained
with the fcount function. The feval function evaluates the
specified argument each time it is called.
For example, a form that evaluates each of its arguments twice can be written as follows:
proc double(form *) =
count
(
1, fcount, /* Loop limits. */
feval(counter)
feval(counter)
)
;
When double is called, each argument is evaluated twice; more importantly, the side effects of each argument are applied twice.
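The difference between a form and a normal procedure shows up as soon as an argument has a side effect. The Python sketch below models the double form by passing each argument as a callable that is evaluated on demand. It is an illustration only, and next_label and calls are hypothetical names.

```python
def double(*arguments):
    """Model of the double form: each argument is evaluated twice."""
    pieces = []
    for argument in arguments:     # corresponds to count(1, fcount, ...)
        pieces.append(argument())  # first feval(counter)
        pieces.append(argument())  # second feval(counter)
    return "".join(pieces)


calls = {"n": 0}


def next_label():
    calls["n"] += 1  # a side effect that is visible to the caller
    return "L" + str(calls["n"]) + " "


print(double(next_label))  # prints "L1 L2 " (with a trailing space)
print(calls["n"])          # 2
```

Had next_label been passed to a normal procedure, it would have run once and the counter would read 1.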