The group was exciting in the 1970's, when we were groping for direction and divided by different orientations. I guess it was in this atmosphere that combined purpose with uncertainty that I found my own voice. The common goal was a transformational program development methodology that would improve productivity of designing and maintaining correct software with an emphasis on algorithmic software. We differed as to how to achieve this goal, and my approach was out on a limb.Bob Paige couldn't attend WG2.1 Meeting 53 due to illness. He wrote this retrospective of his work from WG2.1's point of view, to be delivered at the meeting in his absence. He hopes that others may be inspired to write similar summaries.

Based on a few transformations, the most exciting of which was Jay Earley's iterator inversion combined with high-level strength reduction, and also on an overly optimistic faith in the power of science to shed light on this subject, I believed that algorithms could be designed scientifically by application of a small number of rules, whose selection could be simplified (even automated in some cases) if it could be guided by complexity. Almost all others (including the SETL crowd at Courant) disagreed, and accepted the notion that algorithm design was `inspired', and that the most significant steps in a derivation were unexplainable `Eureka' steps.

I knew that my goals were ambitious and there was little supporting evidence. In fact, the case was too flimsy even to begin work on the components of a program development methododogy. It seemed better to gather facts first, to uncover compelling examples that might lead to a theory, and to put together this theory only after the pieces were sufficiently understood and developed.

In order to test the viability of a new transformational idea, I tried to demonstrate how it could be used to improve some aspect of a conventional area of CS; e.g., databases, algorithms, programming languages, etc. (but not transformational programming itself). A conservative, neutral test would be determined by an evaluation of the improvement (independent of the transformational idea) by members of the conventional area. A more subjective, but still useful, clinical test could be made by me checking whether the new idea could improve the quality of education or facilitate my research. (I used the class room not only as a laboratory for clinical tests, but also as a way for me to learn and retool in new subject areas.) After developing the transformational idea further, it could be evaluated directly at WG2.1 meetings or by the standard refereeing process.

Perhaps the most important component of a transformational methodology is a wide-spectrum language for expressing all levels of abstraction from problem specifications down to hardware-level implementations. Since I was a student at NYU with Jack Schwartz as my adviser, naturally I went with SETL, and found it convenient to use in my thesis work in finite differencing (from 1977-1979). As it turned out I was teaching full time at Rutgers starting Fall 1977, and tried lots of research ideas out in the class room early on.

Despite serious semantic flaws in SETL (some of which were fixed much later in SETL2), I was able to make good use of its notations combined with standard abstract notations found in the mathematical preliminaries chapters of standard math texts. Most of my students didn't know the first thing about mathematical notation, and my variant of SETL proved to be a great improvement.

In algorithms courses the built-in abstractions of SETL made algorithm understanding easier even (especially) with weak students who were overwhelmed by Mix-like or Algol-like implementation-level specifications. These abstractions made it easier to model data structures and specify rudimentary strategies of many algorithms with informal transformations into a real implementation. The same built-in abstractions can be reused avoiding language extensions with ADT specifications. It seemed that this approach allowed me to cover much more material and convey much greater understanding.

In database courses SETL notations seemed better than the standard query languages. It facilitated teaching different data models, levels of description including implementations at the level of file systems and indexes. A Philips Research technical report by Lacroix and Pirotte, which compared over 100 database queries in nine different query languages plus English, was instructive. My own SETL variants seemed to have greater clarity. A uniform wide-spectrum notation facilitated description and analysis of file and database organization, and made it easier to describe mappings from query to physical level and from one data model to another.

The wide-spectrum nature of the language made it easy to illustrate transformations at and between various levels of abstraction. The first transformation that I developed was finite differencing, which was based on conventional strength reduction plus Earley's Iterator Inversion. It included supporting transformations such as dead code elimination and stream processing. Differencing is a mathematical idea with an old history. The chain rule mechanism and other features of finite differencing are independent of language, but the wide-spectrum nature and simplicity of SETL seemed ideal for illustrating the power and broad applicability of the transformation in a separate implementation design.

Finite Differencing seemed to explain how to implement so very many algorithms efficiently. It seemed to make teaching much easier. Other aspects of an implementation could be worked out more informally. One of the best examples of how surprising connections can be made, due to finite differencing, took place at the Chamrousse WG2.1 meeting in France. We were all trying to derive one of those efficient max sequence algorithms, and the use of finite differencing at a major step suggested a similar step (to compute nested collective minimum values) in the derivation of Dijkstra's single-source shortest paths algorithm.

During these early years there was also one interesting objective test of finite differencing in the area of databases. In his Turing award address Codd said that integrity control was an important open problem in relational databases. Successful use of finite differencing to solve view maintenance and integrity control was reported in [Paige and Koenig, 1981] and [Paige, 1982]. Ed Clarke, who was working on the same problem with Blaustein and Bernstein, agreed that we made a significant advance over their work.

Over the years we have uncovered extensive evidence that our methodology will scale up and be capable of improving the productivity of large-scale system implementation. Three kinds of evidence have been considered. Our SETL-based specification languages and the transformations that implement them have been shown to be effective in terms of (1) explaining complex algorithms, (2) discovering new algorithms, and, most importantly, (3) improving the productivity of efficient implementations of algorithms.

Most mainstream research in Program Transformations emphasizes principles for constructing derivations with illustrations drawn from known algorithms. The purpose has been to uncover common patterns of abstraction in specification and transformation that could form the basis of a useful methodology for making algorithm design and program development easier. Short, elegant transformational proofs (of correctness integrated with analysis) document implementations of complex algorithms, lend greater confidence in the correctness of complex codes, and provide greater assurance in the reliable modification of such codes.

Perhaps the first examples of nontrivial algorithms being derived by finite differencing were presented in [Paige and Koenig, 1982; Paige, 1981]. Included among these examples is a SETL specification of Dijkstra's naive Bankers Algorithm and its transformation into Habermann's efficient solution. This derivation was done without knowledge of Habermann's solution, and there were no dead ends. Finite differencing homed right in on a solution matching Habermann's time/space bounds.

It is well known that the construction of optimizing compilers is a costly labor-intensive task. Can this labor be reduced by our methods? To answer this question in part, we showed how easy it was to specify dozens of programming language and compiler analysis problems in SQ2+, a functional language composed from the expressions of SETL augmented with least and greatest fixed point expressions. It was also shown how to simplify these specifications, and to transform them by dominated convergence into high-level SETL prototypes [Cai and Paige, 1989]. In [Cai and Paige, 1987] we showed how the constant propagation algorithm of Reif and Lewis could be expressed as set-theoretic equations in a subset of SQ2+ that could be mapped into RAM code guaranteed to run in linear time in the size of the program dataflow relation.

The use of notation has been regarded as a burden to algorithm
designers ever since Knuth came out with Mix. But can
notation also help satisfy the needs of the algorithm community -
precise algorithmic analysis and succinct exposition? An SQ2+
specification of the Single Function Coarsest Partition Problem and
its transformation by dominated convergence, finite differencing, and
real-time simulation was used to derive a new linear-time solution
[Paige *et al.*, 1985].
That algorithm paper was selected for publication
in a special issue of TCS as a best paper from ICALP. In
[Paige and Zhe, 1997]
a much improved explanation of the algorithmic tool
called Multiset Discrimination in
[Cai and Paige, 1995]
was obtained by
specifying the algorithm in low-level SETL and using its type system to
formally explain and analyze the low-level implementation that would
be obtained by real-time simulation. The earlier presentation of this
algorithmic device involved so much indirection from pointer-based
primitives, that most readers were confused. Some of these earlier
readers agreed that the low-level SETL presentation clarified their
understanding.

The viability of a transformational methodology can be demonstrated by using it to `explain' or `prove' well-known algorithms. However, if in the course of such formal explication no new deep structure is uncovered that leads to improved solutions, then its impact on algorithmics and programming productivity is likely to be limited. More powerful evidence favoring a transformational methodology is provided if it can be demonstrated to help facilitate the discovery of new algorithms. Our success with algorithm discovery may be attributed to our reliance on complexity in both specification and transformation.

Our first instance of algorithm discovery by transformation was reported in [Paige and Henglein, 1987], where we used all three transformations to turn an SQ2+ specification of Horn Clause Propositional Satisfiability into a new linear-time pointer machine algorithm. Previously, Dowling and Gallier found a linear-time algorithm that relied heavily on array access. In [Bloom and Paige, 1995] we used dominated convergence and finite differencing to derive a low-level SETL executable prototype from an SQ2+ specification of ready simulation. We then showed informally how the low-level SETL prototype could be turned into an algorithm that runs five orders of magnitude faster than the previous solution in Bloom's thesis. All three transformations, but especially real-time simulation (where types were shown to be useful in modeling complex data structures), were involved in the discovery of a new improved solution to the classical problem of DFA minimization [Keller and Paige, 1996]. Finite differencing was used extensively in [Chang and Paige, 1997] to derive a new improved solution to the classical problem of turning regular expressions into DFA's. Perhaps our most convincing paper-and-pencil result was in [Goyal and Paige, 1997], where my Ph.D. student Deepak Goyal and I used low-level SETL specifications and real-time simulation to improve Willard's time bound for query processing from linear expected to linear worst-case time without degrading space. Willard's original algorithm was extremely difficult, and involved over 80 pages of proofs. Our transformational approach yielded much shorter but also more precise constructive proofs that led to an implementation design. Here is a first successful example of scaling up.

Two summers ago, Goyal designed and implemented `practical' algorithms in Java at Microsoft. He believes that his use of low-level SETL and the data structure selection transformation as part of a programming methodology increased his productivity and improved the quality of the code that was produced.

Perhaps the most compelling evidence that our transformational methodology will scale up and provide a dramatic improvement in the productivity of large high-performance complex systems may be found in the experiments by Cai and Paige [Cai and Paige, 1993]. In that paper we developed a simple but conservative model of productivity. Within that model we demonstrated a five-fold improvement in productivity of high-performance algorithm implementation in C in comparison to hand-coded C.

Those experiments tested an approach to producing C programs by writing programs in a simple variant of low-level SETL, and compiling them into high-performance C. The high performance of the C code produced by the SETL-to-C translator is based on the translator's ability to simulate associative access (e.g., finite set membership or finite map application) on a RAM in real time.

Measuring productivity improvement depended on two assumptions. The first is that one line of low-level SETL takes no more time to compose than one line of C. This assumption is not controversial. Our experience is that one line of low-level SETL can be produced faster than a line of C. The generally lower level of discourse in C creates an intellectual gap between the program and the mathematical function it computes. For example, in order to implement SETL's element deletion operation (s less:= x) efficiently in C would require at least 10 carefully chosen C operations. The need to access data through pointers and cursors in C creates a greater level of indirection (which complicates understanding) than in SETL, where data is accessed directly through values.

The second assumption is that the C code generated automatically from low-level SETL has roughly the same number of lines as equivalent hand-coded C. We found that the generated C was between 10% and 30% larger than hand-coded C, and concluded that the errors in the two assumptions would cancel each other out.

Suppose we measure programming productivity in a given programming language as the number of pretty-printed source lines produced per unit time. Suppose also that productivity decreases as the conceptual difficulty in understanding a program grows. Then in our investigation, which is restricted to highly algorithmic programming (as is found in complex language systems and environments), conceptual difficulty is roughly reflected in the size of the dataflow relation, which can be expected to grow nonlinearly with the number of source lines. Thus, programming productivity P(L), as a function of the number of program source lines L, increases as L decreases.

Let L2 be the size of a C program C2 compiled from a low-level SETL specification S1 of size L1, and let C3 be a hand-crafted C program equivalent to C2. By assumption one, we can use the same productivity function P(L) for programs written in both low-level SETL and C. By assumption two, we know that the size of C3 is roughly the same as the size L2 of C2. Then the improvement in productivity by using our automated program development methodology versus hand-crafted programming is given by the time L2/P(L2) to manually produce C2 divided by the time L1/P(L1) to manually produce S1. A lower bound on this productivity improvement is given by L2/L1, since P(L1)/P(L2) > 1 whenever L1 < L2.

It should be emphasized that our experiments did not measure productivity directly, which would have required difficult and costly analysis of human factors such as programming expertise and intelligence. Instead we measured productivity improvement by exploiting the two assumptions mentioned above to obtain an objective, inexpensive, and credible framework for conducting comparative productivity experiments that could avoid all human factors. The ratio of lines of C code generated automatically from low-level SETL divided by the number of lines of low-level SETL being compiled gives a lower bound on productivity improvement. Every algorithm that we tested in [Cai and Paige, 1993] yielded ratios that exceeded 5, and that grew as the input size grew.

Although we found that the C code generated automatically from low-level SETL had runtime performance comparable to good hand code (whose running time, excluding I/O, was at least 30 times faster than SETL2 running time executed by the standard SETL2 interpreter), only small-scale examples were used. The largest C program was about 10 pages. Low-level SETL lacked procedures, and the type system was highly restricted. The SETL-to-C production rate was too slow - about 24 lines of C per minute on a SPARC 2. Finally, high-level SETL input had to be translated into low-level C input at compilation time.

The hypothesis that productivity improvement will scale up by our methods is based on the methodology and the application domain. Real-time simulation is applied uniformly to each instruction of a program regardless of program size. Program analysis and transformation algorithms have a complex combinatorial nature, which fits our model of productivity.

Over the last few years, I've felt that, finally, there was enough evidence and know-how to put together much of our work into a formal program development methodology that could achieve my goals of 20 years ago. Although I was too sick to carry this work out, I'm happy that my student Deepak Goyal has done it in his Ph.D. thesis, which should be finished some time over the summer. To some of you Deepak's formalisation will be the first real result in transformational programming to come out of my research project. Deepak or I ought to present it at a future meeting. Annie Liu, who's on Deepak's Committee, can tell you more, or you may look at Deepak's home page, which is in the Ph.D. student area of our NYU Dept home page. I think you'll like it.

For anyone interested, Deepak just accepted a research position at IBM TJ Watson starting October 1. Until then, he'll be at NYU.

- [Bloom and Paige, 1995]
- B. Bloom and R. Paige,
Transformational Design and Implementation of a New Efficient
Solution to the Ready Simulation Problem,
Science of Computer Programming 24(3), p. 189-220, 1995.
- [Cai and Paige, 1987]
- J. Cai and R. Paige,
Binding Performance at Language Design Time,
Proceedings of the ACM Conference on Principles of Programming Languages,
p. 85-97, January 1987.
- [Cai and Paige, 1989]
- J. Cai and R. Paige,
Program Derivation by Fixed Point Computation,
Science of Computer Programming 11,
p. 197-261, 1988-89.
- [Cai and Paige, 1993]
- J. Cai and R. Paige,
Towards Increased Productivity of Algorithm Implementation,
Proceedings of ACM SIGSOFT,
in Software Engineering Notes 18(5), p. 71-78, December 1993.
- [Cai and Paige, 1995]
- J. Cai and R. Paige,
Using Multiset Discrimination To Solve Language Processing Problems
Without Hashing,
Theoretical Computer Science 145(1-2), p. 189-228, July 1995.
- [Chang and Paige, 1997]
- C. Chang and R. Paige,
From Regular Expressions to DFA's Using Compressed NFA's,
Theoretical Computer Science 178(1-2), p. 1-36, May 1997.
- [Goyal and Paige, 1997]
- D. Goyal and R. Paige,
The Formal Reconstruction and Improvement of the Linear Time
Fragment of Willard's Relational Calculus Subset,
in Algorithmic Languages and Calculi, p. 382-414,
R. Bird and L. Meertens (eds.), Chapman and Hall, 1997.
- [Keller and Paige, 1996]
- J. P. Keller and R. Paige,
Program Derivation With Verified Transformations -
A Case Study,
Communications in Pure and Applied Mathematics 48(9-10), 1996.
- [Paige, 1981]
- R. Paige,
Formal Differentiation - A Program Synthesis Technique,
UMI Research Press, 1981.
- [Paige, 1982]
- R. Paige,
Applications of Finite Differencing to Database Integrity
Control and Query/Transaction Optimization,
Proceedings of the Workshop on Logical Bases for Databases,
Toulouse, France, December 1982.
Revised version appears in Advances in Database Theory,
Volume 2, p. 171-210,
H. Gallaire, J. Minker and J. M. Nicolas (eds.), Plenum Press, March 1984.
- [Paige
*et al.*, 1985] - R. Paige, R. Tarjan and R. Bonic,
A Linear Time Solution to the Single Function Coarsest Partition
Problem,
Theoretical Computer Science 40(1), p. 67-84, September 1985.
- [Paige and Henglein, 1987]
- R. Paige and F. Henglein,
Mechanical Translation of Set Theoretic Problem Specifications
into Efficient RAM Code - A Case Study,
Journal of Symbolic Computation 4(2), p. 207-232, August 1987.
- Paige and Koenig, 1981
- R. Paige and S. Koenig,
A Transformational Framework for the Automatic Control of
Virtual Data,
Proceedings of the 7th International Conference on Very Large Databases,
p. 306-381, September 1981.
- [Paige and Koenig, 1982]
- R. Paige and S. Koenig,
Finite Differencing of Computable Expressions,
ACM Transactions on Programming Languages and Systems,
4(3) p. 402-454, July 1982.
- [Paige and Yang, 1997]
- R. Paige and Z. Yang,
High Level Reading and Data Structure Compilation,
Proceedings of 24th ACM Conference on Principles of Programming Languages,
p. 456-469, January 1997.

Bob Paige, New York, June 1999