Tabulation for Type Hackers

If a hacker is someone who write programs by debugging the empty file, then a type hacker is someone who writes programs by refining instances of the type class

        class Program a where
                program :: a
Such a person might end up writing a program like this one.

This document is (intended to be) a valid Hugs 1.3c literate script, which describes in a general sort of way a family of functions which memoise functions, in a purely functional way. (Note that it uses multiple parameter type classes which are not necessarily available in later releases of Hugs.)

This document is not yet stable. Do not expect it not to be improved. I would appreciate any suggestions you might have about the style.


> module Memo where

> import Array          -- only used for the example of tabulating by arrays

Memoisation

A computable function might be implemented - in the usual way - by describing an algorithm which when presented with its arguments calculates the result. The disadvantage of this that it might prove necessary to perform the same potentially expensive calculation several times if the function is applied to the same argument several times over.

This inefficiency can be avoided in the case of finite mappings which can always be represented by a table of the possible results of the function, indexed argument. This will not work for arbitrary functions, nor is it necessarily a good idea to evaluate all possible applications of a function of a large argument type -- such as functions of strings. Fortunately, lazy evaluation avoids both of these problems. I think I was first pointed at the idea of lazy memo functions[Hughes85] by John Hughes, back in the early eighties. I have finally done something with the idea.

The idea is to translate a function into a table and back again:


> type Tabulation function table = (table -> function, function  -> table)

in such a way that the composition of the two transformations

> type Memo function = function -> function

> memoise :: Tabulation function table -> Memo function
> memoise (lookUpIn, tabulate) = lookUpIn . tabulate

is the identity on functions of a given type. In general, the table is infinite, but only a finite amount of it will be explored, and so only a finite number of its entries will be calculated and stored. However, once an entry has been made in the table, that application of the function will not be calculated again. The only cost of applying the memoised function to that argument is the cost of looking up the entry in its table.

Thus, for example, if it is possible to tabulate functions of type

        Int -> a
by a table of type
        (Comp IntTo Result a)
it makes sense to apply
        memoise (translation :: Tab (Int -> a) (Comp IntTo Result a))
to a function of one integer argument.

The obvious silly example is the Fibonacci function, which is the fixed point


> fib :: Integral a => Int -> a
> fib = fib_kernel fib

of the recursion

> fib_kernel :: Integral a => (Int -> a) -> (Int -> a)
> fib_kernel f    0  = fromInt 1
> fib_kernel f    1  = fromInt 1
> fib_kernel f (n+2) = f n + f (n+1)

? fib 20
10946 :: Int
(631638 reductions, 765567 cells)
It can be memoised by

> mfib = memoise (translation :: Tabulation (Int -> a) (Comp IntTo Result a))
>                (fib_kernel mfib)

Thus:
? mfib 20
10946 :: Int
(5558 reductions, 11365 cells)
and of course, it is even faster the next time...
? mfib 20
10946 :: Int
(186 reductions, 405 cells)

Memoisable functions

There is a class of functions which can be represented by a table of a given type, and the translation (for notational convenience) is implemented by a Tabulation pair of functions.


> class Representation function table where
>       translation :: Tabulation function table

Note that it would not do to put the definition of
        memo :: Memo function
        memo = memoise translation
into the class Representationi, because one wants it to be indexed by both the function type and by the tabulation.

The simplest case is that a constant value - a function of no arguments - can always be tabulated just by recording it:


> newtype Result a = Result a

> unResult :: Result a -> a
> unResult (Result x) = x

> instance Representation result (Result result) where
>       translation = (unResult, Result)

If we know that there is a way of tabulating a function of one variable, if that argument is Representable by a given tabulation,

> class Representable argument tabulation where
>       tabulate :: (argument -> result) -> tabulation result
>       lookUpIn :: tabulation result -> (argument -> result)

then there is a systematic way of eliminating one argument of that type from the function to be tabulated

> instance (Representable x xTo, 
>	    Representation function (tabulation result)) =>
>             Representation (x -> function) (Comp xTo tabulation result) where
>       translation = ((lup .) . lookUpIn . unComp, Comp . tabulate . (tab .))
>                     where (lup, tab) = translation

The Comp type constructor is needed to distinguish this instance from the one with a Result, and to ensure that
        (Comp aTo bTo) 
is a type constructor of one type argument

> newtype Comp f g a = Comp (f (g a))

> unComp :: Comp f g a -> f (g a)
> unComp (Comp x) = x

This is sufficient to deal with Curried functions of arbitrarily many arguments.

The example tabulates the Ackerman function:

     a    0   y    = y + 1
     a (x+1)    0  = a x 1
     a (x+1) (y+1) = a x (a (x+1) y)
first of all, here is the general recursion scheme:

> ack_kernel a    0   y    = y + 1
> ack_kernel a (x+1)    0  = a x 1
> ack_kernel a (x+1) (y+1) = a x (a (x+1) y)

whose fixed point is the expected recursive implementation

> ack = ack_kernel ack

but whose memoisation is

> mack = memoise (translation :: Tabulation (     Int ->      Int ->        a)
>                                           (Comp IntTo (Comp IntTo Result) a))
>                (ack_kernel mack)

? ack 3 5
253 :: Int
(1739941 reductions, 2121877 cells)
? mack 3 5
253 :: Int
(237025 reductions, 489186 cells)
? mack 3 5
253 :: Int
(137 reductions, 282 cells)

Constant functions

Functions of an argument drawn from the Unit type are essentially constants. For later convenience, the type is given by a constructor


> data Unit a = Unit

> value :: (Unit x -> a) -> a
> value f = f Unit

> lift :: a -> (Unit x -> a)
> lift x Unit = x

Functions of a Unit argument can be tabulated by a single value, which is the result.

> newtype Id a = Id a

> unId :: Id a -> a
> unId (Id x) = x

> instance Representable (Unit a) Id where
>       lookUpIn = lift . unId
>       tabulate = Id . value

In passing, let me remark that a function from (Id a) might as well be from a, and can be tabulated as such:

> instance Representable a b => Representable (Id a) b where
>       lookUpIn = (. unId) . lookUpIn
>       tabulate = tabulate . (. Id)

This instance is used in unwinding the representation of lists as a sums of products. (Id is of course isomorphic to Result, but they are kept apart for mnemonic reasons.)

Functions of a single integer argument

The simplest tabulation of a function of a natural number argument is just a list of it values,


> instance Representable Int [] where
>       tabulate f            = map f [0..]
>       lookUpIn x n | n >= 0 = x !! n

and that would be a perfectly good way of tabulating the Fibonacci function discussed earlier

> nfib = memoise (translation :: Tabulation (Int -> a) (Comp [] Result a)) 
>                (fib_kernel nfib)

? nfib 20
10946 :: Int
(12970 reductions, 15378 cells)
although it only works for natural number arguments. It would be possible to use two lists, or some interleaving, but there is also the problem of the extra O(n) time taken to look up the value of f n. In a world with constant time arrays, of course, it possible to eliminate this at the cost of deciding in advance on the bounds of the array. In the case of small types, such as Char, perhaps
  instance (Bounded i, Ix i) => Representable i (Array i) where
        tabulate f = array bs (zip xs (map f xs)) 
                        where xs = range bs
                              bs = (minBound, maxBound)
        lookUpIn a x = a!x
would be reasonable, but not in the case of Int, or of course an unbounded type like Integer. In that case one would have to make an arbitrary limitation:

> instance Representable Int (Array Int) where
>       tabulate f   = array bs (zip xs (map f xs)) where xs = range bs
>                                                         bs = (0,100)
>       lookUpIn a n = a!n

> afib = memoise (translation :: Tabulation (      Int ->            a)
>                                           (Comp (Array Int) Result a))
>                (fib_kernel afib)

? afib 20
10946 :: Int
(6582 reductions, 8585 cells)

Here is a more intricate tabulation which guarantees a bound of O(log n) on access to an arbitrary value. It tabulates the function in a potentially infinite binary tree:


> data IntTo a = IntTo (IntTo a) a (IntTo a)

There is only one constructor for the type because empty trees will never be inspected in the course of a search for a value and so need never be constructed. The tree to be constructed will consist of two infinite spines, the very leftmost and the very rightmost, from which will depend a number of finite perfectly balanced binary trees.

> instance Representable Int IntTo where

(Aside -- I would like this to be
  instance Integral a => Representable a IntTo where ...
and only left this out because it makes overlapping instances.)

>       tabulate f 
>           = IntTo (linf (back unit)) (f 0) (rinf unit)
>             where linf (a,b) = IntTo (linf (left (a,b))) (f a) (fin (a,b))
>                   fin  (a,b) = IntTo (fin am) (f m) (fin mb)
>                                where (am,m,mb) = mid (a,b)
>                   rinf (a,b) = IntTo (fin (a,b)) (f b) (rinf (right (a,b)))

The inorder traversal of tabulate f is
        map f [..., -2, -1, 0, 1, 2, ...]
which is made up of the concatenation of the inoder traversals of
        ...fin (-1,0),(f 0),fin(0,1),(f 1),fin(1,3),(f 3),fin(3,7),(f 7),...
Note that any call of fin(x,x+1) in tabulation represents a tree containing no values of the function, and since such trees will never be explored these calls will never be evaluated. There is therefore no need for an empty alternative in the data type for the tree.

The values are extracted from the tree by


>       lookUpIn (IntTo l f0 r) n
>                 = pivot 0 (linf l (back unit)) f0 (rinf r unit)

>         where linf (IntTo l fa r) (a,b)
>                 = pivot a (linf l (left (a,b))) fa (fin r (a,b))

>               fin  (IntTo l fm r) (a,b) 
>                 = pivot m (fin l am) fm (fin r mb) where (am,m,mb) = mid (a,b)

>               rinf (IntTo l fb r) (a,b)
>                 = pivot b (fin l (a,b)) fb (rinf r (right (a,b)))

>               pivot p lt eq gt 
>                 = case compare n p of LT -> lt; EQ -> eq; GT -> gt

which follows an exactly similar pattern.

The range of arguments for which the values of a finite tree is defined is given by a pair of integers, (a,b), and the finite tree between a and b (which always differ by a power of two) tabulates

        map f [a+1..b-1]
by dividing the interval into two equal parts and a midpoint:

> mid :: Integral a => (a,a) -> ((a,a), a, (a,a))
> mid   (a,b) = ((a,m), m, (m,b)) where m = (a + b) `div` 2

The successive finite trees along the infinite spines of the table tabulate intervals of successively doubling length, leaving a gap of one value to be used to mark the position along the spine.

> right, left, back :: Integral a => (a,a) -> (a,a)
> right (a,b) = (b, b + 2 * (b - a))
> left  (a,b) = (a + 2 * (a - b), a)
> back  (a,b) = (-b, -a)

The efficiency can be improved (by a constant amount, and only for large values of the argument) by changing the size of the first finite sized trees

> unit :: Integral a => (a,a)
> unit = (0,1)

Functions of a single character argument

The tabulation of a function of characters could be accomplished by any of the means which would have done for a function of an integer by translating the character into a corresponding integer.


> instance Representable Int f => Representable Char f where
>       lookUpIn = (. ord) . lookUpIn
>       tabulate = tabulate . (. chr)

Functions of a pair of arguments

Functions of pairs can be tabulated provided functions of each of the two arguments could be tabulated -- the easiest way is to curry the function and tabulate it almost as it were a function of two arguments.


> instance (Representable x xTo, 
>           Representable y yTo) => Representable (x,y) (Comp xTo yTo) where
>       lookUpIn = uncurry . (lookUpIn .) . lookUpIn . unComp
>       tabulate = Comp . tabulate . (tabulate .) . curry

Functions of an argument of a sum type

An element of a sum type is (detectably) of one of two types,


> data Sum a b = Fst a | Snd b

> unFst :: Sum a b -> a
> unFst (Fst x) = x

> unSnd :: Sum a b -> b
> unSnd (Snd y) = y

and so a function from a sum type is effectively a pair of functions, which can be taken apart

> split :: (Sum a b -> c) -> (a -> c, b -> c)
> split f = (f . Fst, f . Snd)

and reconstructed

> join :: (a -> c, b -> c) -> (Sum a b -> c)
> join (f, g) = jfg where jfg (Fst a) = f a
>                         jfg (Snd b) = g b

in a polymorphic manner. Such a function having been teased apart can be treated as two parallel parts

> par :: (a -> c) -> (b -> d) -> ((a,b) -> (c,d))
> (f `par` g) (x,y) = (f x, g y)

Since a function from a sum type is effectively a pair of functions, it can be tabluated as a pair of tables, one for each of its two components.

> data Pair f g a = Pair (f a, g a)

> unPair :: Pair f g a -> (f a, g a)
> unPair (Pair x) = x

Again the type constructor Pair is purely for labelling, so that (Pair f g) is a type constructor with one argument.

To tabulate a function from a sum type, it is split into components which are tabulated in parallel; and vice versa.


> instance (Representable a f, Representable b g) => 
>			Representable (Sum a b) (Pair f g) where
>       lookUpIn = join . (lookUpIn `par` lookUpIn) . unPair
>       tabulate = Pair . (tabulate `par` tabulate) . split

By way of illustrating the code for sums and pairs, here is a rather weird implementation of Ackerman's function


> fack :: (Sum Int (Int,Int)) -> Int
> fack  = funny_kernel fack

> funny_kernel :: ((Sum Int (Int,Int)) -> Int) -> ((Sum Int (Int,Int)) -> Int)
> funny_kernel f (Fst     x    ) = f (Snd (x,1))
> funny_kernel f (Snd (  0,y  )) = y + 1
> funny_kernel f (Snd (x+1,  0)) = f (Fst x)
> funny_kernel f (Snd (x+1,y+1)) = f (Snd (x, f (Snd (x+1, y))))

? fack (Snd (3,5))
253 :: Int
(1443370 reductions, 1910178 cells)

> mfack :: (Sum Int (Int,Int)) -> Int
> mfack  = memoise (translation :: 
>                   Tabulation (      Sum  Int        (Int,  Int) ->        a)
>                              (Comp (Pair IntTo (Comp IntTo IntTo)) Result a))
>                  (funny_kernel mfack)

? mfack (Snd (3,5))
253 :: Int
(234647 reductions, 489181 cells)

Functions of a list

The most straightforward approach is to say that since a list is the sum of a Unit and a Pair it can be tabulated by a Pair of an Id and a Comp.


> data ListTo f a = ListTo (Pair Id (Comp f (ListTo f)) a)

> unListTo :: ListTo f a -> Pair Id (Comp f (ListTo f)) a
> unListTo (ListTo x) = x

One could go directly for the tabulation
  instance Representable a b => Representable [a] (ListTo b) where
        tabulate (ListTo (Pair (Id l, Comp r))) 
                = join (l, lookUpIn . lookUpIn r)
                  where join (e,f)    []  = e 
                        join (e,f) (x:xs) = f x xs
        lookUpIn f 
                = split (f []) (tabulate (tabulate . (f .) . (:)))
                  where split a as = ListTo (Pair (Id a, Comp as))
but that seems a bit mysterious, and rather too much like real programming. Here, instead is a datatype isomorphic to [a], but using the Sum and Pair constructors of this implementation.

> data List a = List (Sum (Unit a) (Id a, List a))

> unList :: List a -> Sum (Unit a) (Id a, List a)
> unList (List x) = x

The things that correspond to [], (:) and foldr are

> nilList :: List a
> nilList = (List (Fst Unit))

> consList :: a -> (List a) -> (List a)
> consList a as = List (Snd (Id a, as))

> foldrList :: (a -> b -> b) -> b -> List a -> b
> foldrList f e (List (Fst Unit))       = e
> foldrList f e (List (Snd (Id x, xs))) = f x (foldrList f e xs)

The point of that is that a tabulation of functions of lists follows from the standard tabulation of functions of the corresponding List arguments.

> instance Representable (List a) f => Representable [a] f where
>       lookUpIn = (. (foldr consList nilList)) . lookUpIn
>       tabulate = tabulate . (. (foldrList (:) [])) 

The tabulation of functions of List is a straightforward consequence of the implementation of List as a Sum of Pairs:

> instance (Representable a b) => Representable (List a) (ListTo b) where
>       lookUpIn = (. unList) . lookUpIn . unListTo
>       tabulate = ListTo . tabulate . (. List)

So, for example,

> memo_str :: Memo (String -> a) 
> memo_str = memoise (translation :: Tabulation (      String ->            a) 
>                                               (Comp (ListTo IntTo) Result a))

will memoise functions of strings, and

> memo_lpzz :: Memo ([(Int,Int)] -> a) 
> memo_lpzz = memoise (translation :: 
>		       Tabulation (      [      (     Int,  Int)] ->       a)
>                                 (Comp (ListTo (Comp IntTo IntTo)) Result a))

will memoise functions of pairs of Ints.

Here is a really silly function which calculates the Fibonacci number of the length of its argument,

	sfib = fib . length

> sfib = sfib_kernel sfib

> sfib_kernel s      []  = 1
> sfib_kernel s   (x:[]) = 1
> sfib_kernel s (x:y:zs) = s (y:zs) + s zs

An instance of it restricted to strings might be memoised by

> msfib = memoise (translation :: Tabulation (      [      Char] ->      a) 
>                                            (Comp (ListTo IntTo) Result a))
>                 (sfib_kernel msfib)

? sfib  "abcdefghijklmnopqrstuvwxyz"
196418 :: Int
(1178507 reductions, 1374955 cells)
? msfib  "abcdefghijklmnopqrstuvwxyz"
196418 :: Int
(207989 reductions, 424354 cells)
although it has to be said that the overhead is sufficiently large that memoisation in this way might well not be justified:
? sfib  "abcdefghijklmnopqrst"
10946 :: Int
(65675 reductions, 76644 cells)
? msfib  "abcdefghijklmnopqrst"
10946 :: Int
(144308 reductions, 296759 cells)
If this surprises you it might be that you think that the memoisation should be checking to see whether the function has previously been applied to the the same storage -- rather than having to check the value of the string.

I could do with a more sensible example of a function of strings which would benefit from being memoised. Perhaps one of those common sub-sequence things. Or a really sensible one if there is one.

`Remember me in roses'

Here is an utterly weird one: I bet it never occurred to you to memoise functions of rose trees.


> data Rose a = Rose a [ Rose a ]

> instance (Representable a b) => Representable (Rose a) (RoseTo b) where
>       lookUpIn (RoseTo x) (Rose a ts) = lookUpIn x (a,ts)
>       tabulate f = RoseTo (tabulate (\(x,ts) -> (f (Rose x ts))))

> data RoseTo f a = RoseTo (Comp f (ListTo (RoseTo f)) a)

> unRoseTo :: RoseTo f a -> (Comp f (ListTo (RoseTo f)) a)
> unRoseTo (RoseTo x) = x

Of course that might be because there are no functions of that type which are realistically memoisable.

How realistic is this mechanism?

Whilst it is true that the only cost of reapplying a memoised function is the cost of looking up the precomputed result, there is to balance that gain the additional cost of the first application -- which must as well as being calculated also be entered in a table and looked up. Moreover, and more significantly, there is a cost in space consumed by the table which grows as a memoised function is applied to more and more different arguments.

When space is in short supply there would be no harm to the value calculated by a program (as opposed to the time it takes) in abandoning the tabulation of a memoised function, trading back the space used against extra time taken. The mechanism described here has no way of abandoning any of the tabulation of a memoised function - which [Hughes85] assumed would be a function of the garbage collector. However the whole memoisation is discarded as soon as all references to the memoised function itself are lost.

Reference

[Hughes85]
R. J. M. Hughes. Lazy Memo-functions. In Proceedings Conference on Functional Programming and Computer Architecture, pages 129-146, Nancy, 1985. Springer-Verlag.

Geraint Jones