(b) The Program
The "program", or preprocessed source, is stored as a list of
instructions, and prog[num] is a structure containing details about the
"num"th instruction. There are "stmts" instructions in this list. Note
that "THEN", "ELSE" and "OTHERWISE" are counted as individual
instructions, and that null clauses do not appear in the list of
instructions. prog[0] contains some pointers to the starts of things
(namely the block allocated for the tokenised program itself, the block
allocated for the source, and the beginning of the first comment in the
source). prog[stmts] is a null instruction which points to the end of
the source.
Each instruction is obtained from the source by removing all comments
and labels; stripping all unnecessary blanks and collapsing multiple
spaces into one; concatenating lines together whenever the continuation
character "," appears; tokenising keywords into special characters, and
translating "^" (the variant NOT operator) into "\" (the real NOT
operator). The source is also checked for mismatched quotes, invalid
labels and invalid characters during preprocessing.
Whereas "source" always contains the source of the program being
interpreted, "prog" can sometimes contain the list of instructions
obtained from an "interpret" instruction. In this case, the source
pointer in prog[0] is used instead of source[1] when freeing the source.
Also, certain instructions (such as "call") are required to save the
current program temporarily and go back to the original program, by
searching for it on the program stack.
The current instruction number within the program is stored in the global
variable ppc. The current character position within the program line is
stored in a local variable, often called lineptr.
Each prog[num] consists of the following structure, called "program":
int num; The line number in the source where the instruction
originates - used when tracing source instructions.
char *source; The start of this instruction in the source
char *sourcend; The end of this instruction in the source
int related; Reserved for future use
char *line; The address of the tokenised instruction
The source between "source" and "sourcend" is the current instruction
including any comments on the same line but not including labels.
Comments on separate lines and labels will appear after "sourcend" or
before "source".
(f) The Symbol Table
The symbol table is a multiple-level associative array containing the
definitions of all the REXX symbols which have been assigned values. It
is stored in a block of memory addressed by vartab and of length varlen.
Each level of the table is a separate entity, referring to the active
variables of a program fragment which used the PROCEDURE instruction to
hide the earlier levels, except that each level may contain pointers to
earlier levels as a result of the PROCEDURE EXPOSE instruction.
The start of each level is stored in the array varstk[] as an integer
offset from the start of the entire table. The number of levels is
varstkptr, and the number of elements in the array is varstklen. The
total length of all current levels of the symbol table is
varstk[varstkptr+1].
Each level of the symbol table contains all currently active simple
symbols and stems as a binary search tree. The elements of the tree are
in the following format (as described by the varent data type):
int total length of this element
int position of the left child (as offset from start of level)
int position of the right child (as offset from start of level)
int length of the symbol's name
int amount of memory allocated to hold the symbol's value
int actual length of the symbol's value
char[] the symbol's name, padded [*]
char[] space for the symbol's value
If the symbol is a stem, the terminating dot of the name is not stored,
but the most significant bit of the first character of the name is set.
If the length of the value is negative, then the symbol is considered
to be undefined; it has been deleted by DROP or some other process.
If the "amount of memory" element of the structure is negative, then
this symbol is actually located in an earlier level; the negative value
gives the level number, starting at -1 which means the earliest level
(i.e. the first in the memory block).
Each stem entry in the symbol table has a value containing a default
value and a mini-symbol table. The default value is stored first, in
the format:
int the amount of space allocated to the default value
int the actual length of the default value
char[] space for the default value
Immediately following that are entries for all the tails, in exactly the
same format as for the main symbol table.
(g) The Program Stack
The program stack holds information relating to currently active control
structures, e.g. function calls and DO-END blocks. It is stored in a
block of memory addressed by pstackptr and of length pstacklen. The
length of all data currently on the program stack is epstackptr. The
stack holds a sequence of various entries in the following formats, and
the number of entries in the current function is held in pstacklev:
for a simple DO-END block (type 0), a SELECT block (type 2),
a DO WHILE or DO FOREVER block or (type 8), a "struct minstack" entry:
int stmt the statement number where the block started
char *pos the character where the block started (used for finding the
WHILE or UNTIL part of a repetitive DO instruction)
int len the length of this structure (16)
int type the type value (as mentioned above)
for a repetitive DO with a control variable, an extended "struct
minstack" entry:
int stmt the statement number where the block started
char *pos the character where the block started
char[] the limit value, padded [*] (empty string for "no limit")
int the length of the limit value
char[] the step value, padded [*]
int the length of the step value
char[] the name of the control variable, padded [*]
int the length of the control variable name
int the value which appeared after FOR, or -1 for "no value"
int len the total length of this structure
int type the number 10.
for a repetitive block introduced by "DO count", a "struct forstack" entry:
int stmt the statement number where the block started
char *pos the character where the block started
int fornum the value specified after "DO"
int len the length of this structure (20)
int type the number 15.
for an internal function call, a "struct procstack2" entry:
int stmt the statement number where the function call ocurred
char *csp the caller's cstackptr
int ecsp the caller's ecstaclptr
int csl the caller's cstacklen
char trc the caller's trace flag
char tim the caller's timestamp flag
char form the caller's NUMERIC FORM (0 for SCIENTIFIC)
the compiler will probably insert one byte here
int digits the caller's NUMERIC DIGITS
int fuzz the caller's "NUMERIC DIGITS minus NUMERIC FUZZ"
long mic the caller's timestamp microseconds value
long sec the caller's timestamp seconds value
int address1 the caller's environment number (for ADDRESS)
int address2 the caller's alternate environment number
program *prg the program which was being interpreted at the time
int stmts the number of statements in that program
int len the length of this structure (60)
int type the number 11, becoming 12 when the PROCEDURE instruction
has been executed
Note: the program is stored in case it was generated by "interpret"; if
that is so then the "real" program will be reinstated before the call.
for an INTERPRET instruction, a "struct interpstack" entry:
int stmt the statement number where the INTERPRET occurred
program *prg the program which was being interpreted at the time
int stmts the number of instructions in that program
int len the length of this structure (20)
int type the number 14.
for a command typed during interactive trace mode, a "struct
interactstack" entry (note that a "struct interpstack" entry also
appears above this):
int stmt the statement number where the interruption occurred
char *csp the interrupted program's cstackptr
int ecs the interrupted program's ecstackptr
int csl the interrupted program's cstacklen
int len the length of this structure (16)
int type the number 16.
Occasionally, the interpreter stacks a program line with the sole intent
of having it appear in the traceback. Such an entry would be in the
format of a "struct errorstack":
int stmt the statement number where the error occurred
program *prg the program where the error occurred
int stmts the number of statements in this program
int len the length of this structure (20)
int type the number 20
Note that external calls are no longer stored on the program stack. For
most purposes, they can be seen to be executed by a separate incarnation
of the interpreter.
(h) The "Hash" Tables
There are three "hash" tables, each arranged in a similar way to one
level of the symbol table; however the value of each symbol in the hash
table is a single void* pointer rather than a string of characters.
There are three hash tables, each occupying a separate block of memory
addressed by hashptr[i] and of length hashlen[i] (for i=0,1,2). They
are used to store environment variables, details of open files, and
details about loaded functions respectively. Each element of a hash
table is arranged as follows (and as indicated in the "hashent" data
type):
int length of this element
int position of the left child
int position of the right child
void* value of the element
char[] name of the element, terminated by a zero byte and padded [*]
The value of each element may be one of the following:
In hash table 0: a pointer to the string "NAME=VALUE" which has been
used in a putenv() call.
In hash table 1: either a null pointer, or a pointer to a block of
memory which contains a "struct fileinfo" followed by a zero-terminated
filename (which may be empty if the file name is unknown). The "struct
fileinfo" is described in const.h.
In hash table 2: a pointer to a "funcinfo" structure containing the
address of a loaded function. One of the functions loaded from each
function package also contains the handle which was returned from
dlopen().
(i) The Signal Stack
REXX stores the context of each invocation of the interpreter()
function, so that it can be restored whenever an EXIT or a caught signal
is encountered. The context is stored in the array sigstack[], which at
any time has sigstacklen elements allocated to it. The number of the
highest used element of sigstack[] is interplev. Each entry of the
signal stack contains:
short bits The combined bits for all SIGNAL ON traps which are on
short bitson ...all SIGNAL ON traps which were executed at this level
short callon ...all CALL ON traps which are on
short delay ...all conditions which are delayed
char type 1 if a "signal" trap just occurred, 2 if "call", 0 if none
char which The number for the condition which just occurred (if any)
char *data A description for for the condition which just occurred
int ppc[6] The statement numbers to jump to for all the conditions
or minus the statement number to flag with "label not
found" if a condition occurs
jmp_buf jmp The context of this level of the interpreter.
Note: a negative number in ppc[i] means that the trap instruction for
this condition named a non-existent label. -ppc[i] will be the
statement number of this trap instruction. This means that we can
delay the "label not found" error until it actually happens: moreover,
we can include the condition trap instruction in the traceback.
However, an interpreted condition trap instruction is not guaranteed to
stay around, and therefore it is reported as an error immediately if it
names a non-existent label.
(j) The Shell's Hash Table
The builtin shell keeps a record of where it found each command in a
hash table (yes, a real one ;-) ). The variable "hashtable" points to
an array of bucket pointers, and the variable "hashbuckets" holds the
number of pointers in the table. A bucket pointer is null if there are
no entries in the bucket; otherwise it points to a hashitem structure,
representing the first item in the bucket. That item may in turn point
to another item in the bucket. The items in each bucket are kept in
alphabetical order, to reduce the average time taken for an unsuccessful
search (or an insertion). Each item contains the following information:
struct _hashitem *next The next item in the bucket, or null
int hits Number of times this has been found
int expense Position within $PATH
int dot Whether dot occurred in the path before this
int data Offset from end of header to data
char[] The key; a string of characters ending with 0.
char[] The data; another nul-terminated string.