TERMINOLOGY
===========

`Z source format':  The format of a source file that contains a Z 
        specification plus natural language commentary, and possibly 
        other things like diagrams, table of contents, indexes, 
        specifications in other formal languages (statecharts etc.).
        This is the format that USERS create with some kind of editor.

`Interchange format':  A file or storage format that is used to communicate
        a Z specification between tools.  This may be used to communicate
        just the Z part of a specification, or all of the specification, 
        so I propose two more specific terms: 

   `Z interchange format':  An interchange format that contains ONLY
        the Z constructs of the specification.  (The remaining parts,
        such as natural language, are discarded before creating this format.)

   `Complete interchange format':  An interchange format that contains
        EVERYTHING in the specification, including natural language, 
        diagrams etc. as discussed above. 

`Annotation':  A piece of Z-related information that can be deduced
        automatically by some tool, and attached to a Z construct.
        (Example 1: the inferred type of a Z function application.
                  In the predicate, a = {}, the 'a' and '{}' might be
                  annotated with P Colour, to indicate that a set of Colour
                  values are being compared). 
        (Example 2: the inferred signature of a Z schema expression)

        There has been some debate about whether interchange formats
        should be annotated or not. 
        The advantage of including annotations is that the receiving
        tool does not need to derive the annotations for itself 
        (e.g., the type checking is already done).  On the other hand, 
        if the receiving tool is not interested in that information, it 
        can ignore the annotations.  So I propose the following final four 
        kinds of interchange format (and an abbreviation for each one):


    `Annotated Z interchange format'             (AnnZIF)
    `Non-annotated Z interchange format'         (ZIF)
    `Annotated complete interchange format'      (AnnCIF)
    `Non-annotated complete interchange format'  (CIF)

         (If you use an abbreviation, please spell it out in full
          the first time you use it).

         Note that, by definition, a Z source format is not `annotated',
         because annotations are added automatically by tools.
         However, it is possible that a Z source format could also be
         used as an interchange format.  (In that case, it would be a CIF).

        
`AST':  = `Annotated Syntax Tree'.
        This is a Java data structure for representing Z constructs.
        It contains roughly the same kind of information as an
        Annotated Z interchange format (AnnZIF).

        (NOTE: it would be possible to define four different kinds
        of syntax tree:  non-annotated/annotated x complete/z-only.
        However, I think that the annotated, z-only one will be most
        common.

`Source Locator':  = a reference to a precise position in a Z source file
        (including file name, line number, column number etc.)


REQUIREMENTS
============
Each requirement has a one-line executive summary, followed
by extra explanation that hopefully makes it clear what I mean.

REQUIREMENT 1: CZT can read most existing Z source documents.
    It must be able to read LaTeX format at least, and any 
    other common Z source formats that are in widespread use.

REQUIREMENT 2: CZT can integrate with existing Z tools.
    It must be possible to connect it to existing interactive 
    Z tools, like the CadiZ prover.
    It should also be possible to connect it to batch-oriented
    Z tools that produce reports or perform Z->Z transformations.

    CONSEQUENCES:
    2a. The only way of connecting to existing tools is via files.
        So CZT must communicate with some tools via interchange files.

    2b. CZT must be able to invoke existing tools, so must have
        some knowledge of their command line options etc.
        For the batch-oriented tools that produce output files
        (perhaps Z files), CZT must also know roughly what the
        tool does and what kind of output it produces, so that 
        the output can be incorporated back into CZT.

    2c. Using existing tools means that CZT cannot have a 
        completely uniform user interface.  The core tools of 
        CZT may follow a common GUI-style, but some addon existing
        tools will continue to use their own interfaces.

REQUIREMENT 3: CZT eases the development of new Z tools.
    For example, a student project should be able to reuse existing 
    parsers and typecheckers etc., and get straight into manipulating 
    Z expressions/schemas, with convenient libraries to make 
    the task even easier.

    To help tool builders who wish to use programming languages other
    than Java (the preferred language of CZT), such as Prolog or functional
    languages, a good approach is to provide interchange formats that can be
    easily read by that language. (Perhaps an XML format, and/or a
    Prolog-specific format etc.).  

    However, for Java we can do even better.  The best way of making 
    it DRAMATICALLY easier to create new Z tools for Java programmers 
    is to provide Java AST classes for the Z constructs, plus parsing 
    libraries, typechecking libraries, visitor libraries, and 
    superclasses for various kinds of tools etc.  

    3a. CZT will provide Java AST classes for Z, plus support
        classes such as parsers, typecheckers, visitors, pretty-printers.


REQUIREMENT 4: CZT supports a WYSIWYG editor for Z.
    RATIONALE: useful/necessary for Z beginners, such as students.


REQUIREMENT 5: CZT should produce accurate error messages.
    Where possible, error messages should include accurate references
    to the line and column of the SOURCE file which caused the error.

    CONSEQUENCES:
    5a. Tools that transform the Z AST in various ways must try to preserve
        `source locator' information which provides backlinks to the source.
        For example, each AST object might have an optional attribute that
        records the source locator of that construct
        (which specifies the file name, line number and column number
        where the construct was originally read from).

    5b. Any interchange formats must be able to include these `locators'.


DESIGN ISSUES
=============
Some of the questions that people have been discussing are
(expressed in the above terminology):

  * should `the' Z source format be XML-based?

  * could raw unicode files (a sequence of unicode characters)
     by used as a Z source format?  
     (because it is defined in the Z standard and WYSIWYG 
      editors are, or soon will be, available for it).

  * should `the' interchange format be annotated or not?

  * how abstract should `the' AST be?

I think the `the' assumption (e. `there shall be only one') is unnecessary.
My main conclusion from listening to the discussion is that we
should support:

  1. MULTIPLE source formats (LaTeX and Unicode and others),
  2. MULTIPLE interchange formats (XML, plus others to communicate
     with other existing tools), and
  3. MULTIPLE AST formats (to support different levels of abstraction).

[Actually, I still have hopes that we can get away with just
one AST format, and still support several levels of abstraction,
but more about that in another message.]


PROPOSED ARCHITECTURE
=====================

So, here is an open, extensible style of architecture that
can support multiple kinds of each format.
It is a graph-like architecture, which is comprised of
a set of nodes and arcs, where:

  * each node is a Z specification, in one of the formats
    discussed above:  Z source format;  Interchange Format;  AST.

  * each arc is a tool (either a standalone utility, or a Java class)
    that transforms an input node to an output node.

In fact, other kinds of non-Z nodes are also usefully included, such as
.dvi/.ps/.html output files, tool-specific files etc.

In general, a tool may take several input nodes (though I think most 
will take just one) and produce several output nodes.


For example, an initial instantiation might be:

                 spec.tex             spec.unicode
                    |                 /
                    | Lexer          / Lexer
                    | ______________/ 
                    |/
                    V
                   Z Lexemes
                    |
                    | Parser
                    V
                   AST
                    |
                    | Type checker
                    V
                   AST+TypeAnns
                    /  \     \__________________
                   /    \                       \
                  /      \                       V
        Animator /        \ Test-case          AnnZIF(XML)
                /          \ generator            |
               V            V                     | Your favourite
          Animation       Test                    | XML Z tool here
          Outputs        Suites                   V
                                               Some other file


Wolfgang Grieska will probably recognise the above proposal
as a poorly described version of the Zeta architecture.  :-)
I still think we can use many aspects of the Zeta design,
perhaps its AST (though I like Ian Toyn's suggestion of doing
a comparison study of several ASTs first).
Perhaps we can even use the current Zeta code.
(I think I'm getting close to getting it to compile under Java 1.3,
now that the Pizza people have fixed the error that I found).