Proofs

This web page is intended for students who want further guidance on how to get started in writing down a proof of a mathematical statement, and what is expected. Writing down a proof of a mathematical statement is a key skill of Foundations of Computer Science, and also various other courses. As a starting-point, you should read Sipser’s textbook, Section 0.3 “Definitions, Theorems and Proofs” (starts on p17).

In understanding what is meant by a valid proof, to some extent we have to learn by studying examples of good proofs. The aim of a mathematical proof is to convince a (human) reader that the claim is true. Proofs are rarely “formal” in the sense of applying an agreed-upon set of rules for deducing one statement from another. Indeed, there are various levels of formality (or precision, or detail) that might be used in the presentation of a proof. To get full credit (if it’s being done as part of an exercise or exam) the presentation should be comparable to a textbook proof, or one that would appear in a published paper. A good proof should convince the reader that he can bet, say, £100 to a penny that the claim is true. You can reasonably expect partial credit for conveying a general idea, the way I present the undecidability of the Post Correspondence Problem in lecture. The aim there was to make the result look plausible: even if you didn’t know that it was an established result, having seen the general idea, you might be more likely to bet that the problem really is undecidable.

Getting started

Suppose you are asked to prove a statement, as part of an exercise or exam question. You may find it useful to start by considering why you should believe it to be true, as discussed in the textbook. If you manage to convince yourself that the statement is true (perhaps by considering examples), you may be able to write up a summary of that general understanding, which would probably constitute an imprecise proof, but would be a good start.

There are also certain standard techniques, and in the context of proving facts about formal languages and computational complexity, you should at least be able to see how to get started on a precise, detailed proof. The following examples show how this might work.

Example. Suppose I say: Prove that if L is a language, then L^*=(L^*)^*. How to start? Note that the superscript * denotes the closure operation, thus L^* is all words that can be constructed by concatenating elements of L.

Since you’re being asked to establish that two languages are the same, it’s pretty safe to start out by saying:

We need to show that

if word w is in L^*, then w∈(L^*)^*, and

if word w is in (L^*)^*, then w∈L^*.

Next there’s a short-cut, which is that we observe that since the closure operation can only add further words to a language, the first of the above two items can be seen to hold. How to prove the second? Once again, the way to continue is really rather automatic: you write down what it means for a word to belong to (L^*)^*, planning to deduce from it that the word belongs to L^*. So you should write something like:

If w∈(L^*)^* then w=w₁w₂…w_n, where all the words w_i belong to L^*.

Setting up this notation is useful. The next point you want to make is that each word w_i is a concatenation of words that belong to L, which you can write down formally as

for all i=1,…,n, w_i=w_i,1w_i,2…w_i,m(i) for non-negative numbers m(i).

We’re aiming for a conclusion that w is a concatenation of words that belong to L, and we can now indeed write down w as

w= (w_1,1w_i,2…w_i,m(1)) (w_2,1w_i,2…w_i,m(2)) … (w_n,1w_i,2…w_i,m(n))

which expresses w as a concatenation of words in L, that is, w∈L^*. General point: notice that it was useful to set up some notation to refer to the subwords that w can be decomposed into.

Another example. consider the exercise: Prove that L_INF is undecidable, where

L_INF={⟨M⟩ : Turing machine M accepts infinitely many words} You may use any undecidability results discussed in lectures.

It should be clear how to get started: you point out that you will reduce from a known undecidable problem. Next, there is a choice to be made regarding which known undecidable problem to use for that purpose. It makes sense to use a problem that “looks like” L_INF, that is, a problem involving encodings of Turing machines, as opposed to, say, the context-free grammar equivalence problem. HALT is a good choice. The next step is automatic: you note that you want design a computable function f that maps any ⟨M,w⟩ to a new (encoding of a) Turing machine ⟨M'⟩ so that M' accepts infinitely many words if any only if M halts on input w.

The construction of f is left as an exercise.