TERMINOLOGY =========== `Z source format': The format of a source file that contains a Z specification plus natural language commentary, and possibly other things like diagrams, table of contents, indexes, specifications in other formal languages (statecharts etc.). This is the format that USERS create with some kind of editor. `Interchange format': A file or storage format that is used to communicate a Z specification between tools. This may be used to communicate just the Z part of a specification, or all of the specification, so I propose two more specific terms: `Z interchange format': An interchange format that contains ONLY the Z constructs of the specification. (The remaining parts, such as natural language, are discarded before creating this format.) `Complete interchange format': An interchange format that contains EVERYTHING in the specification, including natural language, diagrams etc. as discussed above. `Annotation': A piece of Z-related information that can be deduced automatically by some tool, and attached to a Z construct. (Example 1: the inferred type of a Z function application. In the predicate, a = {}, the 'a' and '{}' might be annotated with P Colour, to indicate that a set of Colour values are being compared). (Example 2: the inferred signature of a Z schema expression) There has been some debate about whether interchange formats should be annotated or not. The advantage of including annotations is that the receiving tool does not need to derive the annotations for itself (e.g., the type checking is already done). On the other hand, if the receiving tool is not interested in that information, it can ignore the annotations. So I propose the following final four kinds of interchange format (and an abbreviation for each one): `Annotated Z interchange format' (AnnZIF) `Non-annotated Z interchange format' (ZIF) `Annotated complete interchange format' (AnnCIF) `Non-annotated complete interchange format' (CIF) (If you use an abbreviation, please spell it out in full the first time you use it). Note that, by definition, a Z source format is not `annotated', because annotations are added automatically by tools. However, it is possible that a Z source format could also be used as an interchange format. (In that case, it would be a CIF). `AST': = `Annotated Syntax Tree'. This is a Java data structure for representing Z constructs. It contains roughly the same kind of information as an Annotated Z interchange format (AnnZIF). (NOTE: it would be possible to define four different kinds of syntax tree: non-annotated/annotated x complete/z-only. However, I think that the annotated, z-only one will be most common. `Source Locator': = a reference to a precise position in a Z source file (including file name, line number, column number etc.) REQUIREMENTS ============ Each requirement has a one-line executive summary, followed by extra explanation that hopefully makes it clear what I mean. REQUIREMENT 1: CZT can read most existing Z source documents. It must be able to read LaTeX format at least, and any other common Z source formats that are in widespread use. REQUIREMENT 2: CZT can integrate with existing Z tools. It must be possible to connect it to existing interactive Z tools, like the CadiZ prover. It should also be possible to connect it to batch-oriented Z tools that produce reports or perform Z->Z transformations. CONSEQUENCES: 2a. The only way of connecting to existing tools is via files. So CZT must communicate with some tools via interchange files. 2b. CZT must be able to invoke existing tools, so must have some knowledge of their command line options etc. For the batch-oriented tools that produce output files (perhaps Z files), CZT must also know roughly what the tool does and what kind of output it produces, so that the output can be incorporated back into CZT. 2c. Using existing tools means that CZT cannot have a completely uniform user interface. The core tools of CZT may follow a common GUI-style, but some addon existing tools will continue to use their own interfaces. REQUIREMENT 3: CZT eases the development of new Z tools. For example, a student project should be able to reuse existing parsers and typecheckers etc., and get straight into manipulating Z expressions/schemas, with convenient libraries to make the task even easier. To help tool builders who wish to use programming languages other than Java (the preferred language of CZT), such as Prolog or functional languages, a good approach is to provide interchange formats that can be easily read by that language. (Perhaps an XML format, and/or a Prolog-specific format etc.). However, for Java we can do even better. The best way of making it DRAMATICALLY easier to create new Z tools for Java programmers is to provide Java AST classes for the Z constructs, plus parsing libraries, typechecking libraries, visitor libraries, and superclasses for various kinds of tools etc. 3a. CZT will provide Java AST classes for Z, plus support classes such as parsers, typecheckers, visitors, pretty-printers. REQUIREMENT 4: CZT supports a WYSIWYG editor for Z. RATIONALE: useful/necessary for Z beginners, such as students. REQUIREMENT 5: CZT should produce accurate error messages. Where possible, error messages should include accurate references to the line and column of the SOURCE file which caused the error. CONSEQUENCES: 5a. Tools that transform the Z AST in various ways must try to preserve `source locator' information which provides backlinks to the source. For example, each AST object might have an optional attribute that records the source locator of that construct (which specifies the file name, line number and column number where the construct was originally read from). 5b. Any interchange formats must be able to include these `locators'. DESIGN ISSUES ============= Some of the questions that people have been discussing are (expressed in the above terminology): * should `the' Z source format be XML-based? * could raw unicode files (a sequence of unicode characters) by used as a Z source format? (because it is defined in the Z standard and WYSIWYG editors are, or soon will be, available for it). * should `the' interchange format be annotated or not? * how abstract should `the' AST be? I think the `the' assumption (e. `there shall be only one') is unnecessary. My main conclusion from listening to the discussion is that we should support: 1. MULTIPLE source formats (LaTeX and Unicode and others), 2. MULTIPLE interchange formats (XML, plus others to communicate with other existing tools), and 3. MULTIPLE AST formats (to support different levels of abstraction). [Actually, I still have hopes that we can get away with just one AST format, and still support several levels of abstraction, but more about that in another message.] PROPOSED ARCHITECTURE ===================== So, here is an open, extensible style of architecture that can support multiple kinds of each format. It is a graph-like architecture, which is comprised of a set of nodes and arcs, where: * each node is a Z specification, in one of the formats discussed above: Z source format; Interchange Format; AST. * each arc is a tool (either a standalone utility, or a Java class) that transforms an input node to an output node. In fact, other kinds of non-Z nodes are also usefully included, such as .dvi/.ps/.html output files, tool-specific files etc. In general, a tool may take several input nodes (though I think most will take just one) and produce several output nodes. For example, an initial instantiation might be: spec.tex spec.unicode | / | Lexer / Lexer | ______________/ |/ V Z Lexemes | | Parser V AST | | Type checker V AST+TypeAnns / \ \__________________ / \ \ / \ V Animator / \ Test-case AnnZIF(XML) / \ generator | V V | Your favourite Animation Test | XML Z tool here Outputs Suites V Some other file Wolfgang Grieska will probably recognise the above proposal as a poorly described version of the Zeta architecture. :-) I still think we can use many aspects of the Zeta design, perhaps its AST (though I like Ian Toyn's suggestion of doing a comparison study of several ASTs first). Perhaps we can even use the current Zeta code. (I think I'm getting close to getting it to compile under Java 1.3, now that the Pizza people have fixed the error that I found).