FDR2 User Manual: A.2 Pattern Matching

[Index]

A.2 Pattern Matching

Many of the above examples made use of pattern matching to decompose values. The version of CSP_Mused by FDR 2.1 introduced much better support for pattern-matching; for example, we can write

reverse(<>)    = <>
reverse(<x>^s) = reverse(s)^<x>

rather than

reverse(s) = if null(s) then <> else reverse(tail(s)) ^ <head(s)>

The branches of a function definition must be adjacent in the script, otherwise the function name will be reported as multiply defined.

Patterns can occur in many places within CSP_Mscripts

Function definitions (reverse above)
Direct definitions ‘(x,y)=(7,2)’
Comprehensions ‘{ x+y | (x,y) <- {(1,2),(2,3)} }’
Replicated operators ‘||| (x,y):{(1,2),(2,3)} @ c!x+y->STOP’
Communications ‘d?(x,y)->c!x+y->STOP’

The patterns which are handled in these cases are the same, but the behaviour in the first two cases is different. During comprehensions, replicated operators and communications we can simply discard values which fail to match the pattern: we have a number of such values to consider so this is natural. When a function fails to match its argument (or a definition its value) silently ignoring it is not an option so an error is raised. On the other hand, functions can have multiple branches (as in the case of reverse) which are tried in top to bottom order while the other constructs only allow a single pattern. For example,

f(0,x) = x
f(1,x) = x+1
print f(1,2) -- gives 3
print f(2,1) -- gives an error
print { x+1 | (1,x) <- { (1,2), (2,7) } } -- gives {3}

The space of patterns is defined by

Integer literals match only the corresponding numeric value.
Underscore (‘_’) always matches.
An identifier always matches, binding the identifier to the value.
A tuple of patterns is a pattern matching tuples of the same size. Attempting to match tuples of a different size is an error rather than a match failure.
A simple sequence of patterns is a pattern (<x,y,z>) matching sequences of that length.
The catenation of two patterns is a pattern matching a sequence which is long enough, provided at least one of the sub-patterns has a fixed length.
The empty set is a pattern matching only empty sets.
A singleton set of a pattern is a pattern matching sets with one element.
A datatype tag (or channel name) is a pattern matching only that tag.
The dot of two patterns is a pattern. (A.x)
The combination of two patterns using @@ is a pattern which matches a value only when both patterns do.
A pattern may not contain any identifier more than once.

For example, {}, ({x},{y}) and <x,y>^_^<u,v> are valid patterns. However, {x,y} and <x>^s^t are not valid patterns since the decomposition of the value matched is not uniquely defined. Also (x,x) is not a valid pattern by the last rule: the effect that this achieves in some functional languages requires an explicit equality check in CSP_M.

When a pattern matches a value, all of the (non-tag) identifiers in the pattern are bound to the corresponding part of the value.

The fact that tags are treated as patterns rather than identifiers can cause confusion if common identifiers are used as tags. For example, given

channel n : {0..9}
f(n) = n+1

attempting to evaluate the expression f(3) will report that the function \ n @ n+1 does not accept the value 3. (It accepts only the tag n.) This conflict between pattern matching and tags commonly manifests itself when parameterising a process with channels:

channel in,out,mid:Int

COPY(in,out) = in?x -> out!x -> COPY(in,out)

BUFF  = COPY(in,out)
BUFF2 = COPY(in,mid) [|{|mid|}|] COPY(mid,out)  -- error!

In the definition of COPY, the parameters are pattern-matched to be the channels in and out and so are not new identifiers that get bound to the actual arguments. So, the error with the definition of BUFF2 is that COPY is being applied outside its domain — in the first application, mid does not match the pattern out, since out is also a channel name so it matches only that channel.

Only names defined as tags are special when used for pattern-matching. For example, given

datatype T = A | B
x = A
f(x) = 0
f(_) = 1
g(A) = 0
g(_) = 1

then f is not the same as g since f(B) is 0 while g(B) is 1.

The singleton-set pattern allows us to define the function which picks the unique element from a set as

pick({x}) = x

This function is surprisingly powerful. For example, it allows us to define a sort function from sets to sequences.

sort(f,a) =
  let
    below(x)  = card( { y | y<-a, f(y,x) } )
    pairs     = { (x, below(x)) | x <- a }
    select(i) = pick({ x | (x,n)<-pairs, i==n })
  within < select(i) | i <-<1..card(a)> >

where the first argument represents a ‘<=’ relation on the elements of the second. Because pick works only when presented with the singleton set, the sort function is defined only when the function f provides a total-ordering on the set a.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

This document was generated by Phil Armstrong on May 17, 2012 using texi2html 1.82.