Notes on Contention Resolution

Leslie Ann Goldberg

2 Oct 2000

These notes contain a survey of some work on contention resolution in multiple-access channels. I have written the notes in connection with EPSRC Research Grant GR/L60982, Design and Analysis of Contention-Resolution Protocols. The notes are not currently up to date (they were last updated in October 2002) but I may update them in the future.

A multiple-access channel is a broadcast channel that allows multiple users to communicate with each other by sending messages onto the channel. If two or more users simultaneously send messages, then the messages interfere with each other (collide), and the messages are not transmitted successfully. The channel is not centrally controlled. Instead, the users use a contention-resolution protocol to resolve collisions. Thus, after a collision, each user involved in the collision waits a random amount of time (which is determined by the protocol) before resending.
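The collision/retransmission dynamic can be illustrated with a toy simulation (entirely our own sketch, not any particular protocol from the literature): each backlogged user retransmits in each slot with a fixed probability, and a slot succeeds only when exactly one user sends.

```python
import random

def simulate_slotted_channel(send_prob, n_users, max_slots, seed=0):
    """Toy slotted multiple-access channel: each of n_users holds one
    message and retransmits it in every slot with probability send_prob.
    A slot succeeds exactly when one user transmits; zero senders waste
    the slot and two or more collide.  Returns the number of slots taken
    to clear everyone, or None if max_slots was not enough."""
    rng = random.Random(seed)
    waiting = n_users
    for slot in range(1, max_slots + 1):
        senders = sum(1 for _ in range(waiting) if rng.random() < send_prob)
        if senders == 1:          # a unique sender succeeds and leaves
            waiting -= 1
            if waiting == 0:
                return slot
        # senders == 0 or senders >= 2: the slot is wasted
    return None
```

With a fixed retransmission probability, throughput degrades as the backlog grows; the contention-resolution protocols surveyed below instead adapt the retransmission probability to the (unknown) backlog.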

Although the most familiar multiple-access channels are local-area networks (such as the Ethernet network) which are implemented using wire, multiple-access channels are now being implemented using a variety of technologies including packet-radio, fibre-optics, free-space optics and satellite transmission (see [GFL87]). These multiple-access channels are used for communication in many distributed systems such as distributed computer networks.

Research in contention resolution typically relies on the slotted time assumption [1] to simplify the analysis. The two models [2] considered are the following:

There is no single definition which is used to determine whether or not a contention-resolution protocol is ``good'' (or ``stable'' [4]), but the basic idea is that a protocol should allow sufficiently frequent retransmissions (so that channel time is not wasted while users wait) but should not allow too-frequent retransmissions (which waste channel time with collisions). Typically, one models the execution of the protocol as a Markov chain. If the protocol is good (for a given arrival rate lambda), the corresponding Markov chain is recurrent (with probability 1, it eventually returns to the empty state in which no messages are waiting). Otherwise, the chain is said to be transient (and we also say that the protocol is transient). Note that transience is a very strong form of instability: informally, if a protocol is transient then with probability 1 it spends only a finite number of steps in "small" states (states with few backlogged messages). Another way to measure the success of a protocol is to measure its capacity. A protocol is said to achieve throughput lambda if, when it is run with input rate lambda, the average success rate is lambda. The capacity of the protocol [EH98] is the maximum throughput that it achieves.

Most research on collision-resolution protocols focuses on simple protocols because

Several natural classes of protocols have been defined.

Chlebus has written a nice survey of this topic and also of communication in radio networks. See [Chl00].

Some Results

Age-based protocols: Kelly [Kel85], Kelly and MacPhee [KM87], and MacPhee [MacP89] considered age-based protocols in the queue-free model. They showed, for any given age-based protocol, how to calculate a critical arrival rate such that the expected number of successful transmissions is infinite if and only if the arrival rate is at most the critical arrival rate. They showed that the critical arrival rate is 0 if and only if p0+ ... + pt is omega(log(t)). Proof Method. Ingenoso [Ing93] also considered age-based protocols in the queue-free model. He showed that if p0, p1,... is monotonically decreasing then the protocol is unstable in the sense that the corresponding Markov chain is transient [5]. Proof Method.
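To make the queue-free model concrete, here is a toy simulation (our own sketch; the naming is ours) of an age-based protocol: a message that is t steps old attempts to send with probability p(t), independently of its collision history.

```python
import math
import random

def poisson(rng, lam):
    """Sample a Poisson(lam) variate (Knuth's method; fine for small lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def simulate_age_based(p, lam, n_slots, seed=0):
    """Queue-free model: Poisson(lam) new messages arrive in each slot, and
    a message of age t sends with probability p(t).  A slot succeeds iff
    exactly one message sends.  Returns (successes, final backlog)."""
    rng = random.Random(seed)
    ages, successes = [], 0
    for _ in range(n_slots):
        ages.extend([0] * poisson(rng, lam))      # new arrivals have age 0
        senders = [i for i, a in enumerate(ages) if rng.random() < p(a)]
        if len(senders) == 1:                     # unique sender succeeds
            ages.pop(senders[0])
            successes += 1
        ages = [a + 1 for a in ages]              # everyone else ages
    return successes, len(ages)
```

For example, p(t) = 1/(t+1) gives p0 + ... + pt growing like log(t), which is exactly the boundary case in the Kelly-MacPhee criterion above.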

Backoff protocols:

Acknowledgement-based protocols: Goldberg, Jerrum, Kannan and Paterson [GJKP00] showed that no acknowledgement-based protocol is recurrent for lambda >= .530045 and that the capacity of every acknowledgement-based protocol is at most .530045. Proof Method.

Almost Acknowledgement-based protocols: Raghavan and Upfal's protocol [RU95] applies to the queueing model. It is almost acknowledgement-based in the sense that the only additional information about the multiple-access channel that the N users are given is a reasonably good estimate of the logarithm of N (one that is correct up to a constant factor). If the arrival rate is less than 1/10, then the expected waiting time of each message is O(log N). The idea motivating the protocol is that backoff protocols have high message-waiting times because they do not handle catastrophic events well. In order to handle these unlikely but harmful events (such as the simultaneous arrival of many messages), users of Raghavan and Upfal's protocol observe the history of their successes and, when it seems that the channel is excessively clogged, a user enters a reset state for a while in which it rarely tries to send (so the channel can clear). Description of protocol. Proof Method. Raghavan and Upfal's protocol can be used in systems of more than one multiple-access channel. Goldberg, MacKenzie, Paterson and Srinivasan's paper [GMPS98] gives a queue-free-model protocol and a queueing-model protocol. While the queue-free-model protocol is not acknowledgement-based (the protocol requires the users to share a clock), the queueing-model protocol which simulates it is almost acknowledgement-based in the sense that the only additional information about the multiple-access channel that the N users are given is some (any) upper bound on N. When the arrival rate is less than 1/e, the expected average message waiting-time is O(1). The analysis in the queueing case allows for the possibility that the N users might start and stop running the protocol over time, but it requires that each time a user starts running the protocol, it continues to run it for a while. Description of Protocols. Proof Method.

Full-Sensing Protocols: Mosely and Humblet [MH85] discovered a full-sensing protocol with capacity .48776. This protocol is a "tree protocol" in the sense of Capetanakis [Cap79] and Tsybakov and Mikhailov [TM78]. For a simple analysis of the protocol, see [V86]. Vvedenskaya and Pinsker have shown how to modify Mosely and Humblet's protocol to achieve an improvement in the capacity (in the seventh decimal place) [VP83]. Much work has gone into determining upper bounds on the capacity that can be achieved by a full-sensing protocol. The current best result is due to Tsybakov and Likhanov [TL87], who have shown that no protocol can achieve capacity higher than .568. (For more information, see [EH98] [GFL87] [MP93] [RS90].) In the full-sensing model, one typically assumes that messages are born at real "times" which are chosen uniformly from the unit interval. Recently, Loher [L98a] [L98b] has shown that if a protocol is required to respect these birth times, in the sense that packets must be successfully delivered in their birth order, then no protocol can achieve capacity higher than .4906. Intuitively, the "first-come-first-served" restriction seems very strong, so it is somewhat surprising that the best known algorithm without the restriction (that of Vvedenskaya and Pinsker) does not beat this upper bound. The algorithm of Mosely and Humblet satisfies the first-come-first-served restriction.

Some Protocols

Goldberg and MacKenzie's Multi-Channel Protocol: The variant of polynomial backoff studied by Goldberg and MacKenzie is actually the non-geometric variant discussed in footnote [6]. This variant is better for practical implementation because it does not require as many random numbers to be generated as the geometric variant.

The N-user K-channel protocol studied by Goldberg and MacKenzie is as follows: Each user keeps track of the number of times it has failed on each channel. On each step it checks which channels it would have sent on, had it been running K independent copies of polynomial backoff. If it would have sent on exactly one channel, then it does so. Otherwise, it assumes that all of the messages that it would have sent collided (and it doesn't send any messages).
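One step of this rule for a single user might be sketched as follows (variable and parameter names are ours; the real protocol's bookkeeping differs in its details):

```python
import random

def multichannel_step(fail_counts, alpha, rng):
    """One step for one user in the N-user K-channel rule described above.
    fail_counts[j] tracks the user's failures on channel j; copy j of
    polynomial backoff would send with probability 1/(fail_counts[j]+1)**alpha.
    The user transmits only if exactly one copy would send; otherwise it
    assumes all those hypothetical sends collided and records failures.
    Returns the channel sent on, or None."""
    would_send = [j for j, f in enumerate(fail_counts)
                  if rng.random() < 1.0 / (f + 1) ** alpha]
    if len(would_send) == 1:
        return would_send[0]
    for j in would_send:          # treat the simulated sends as collisions
        fail_counts[j] += 1
    return None
```

Note that when two or more simulated copies would send, the user backs off on all of them even though nothing was actually transmitted, which is exactly the pessimistic assumption described above.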

Goldberg, MacKenzie, Paterson and Srinivasan's Protocols:

Queue-Free-Model Protocol: The model considered is the same as the queue-free model described in these notes except that the users are assumed to share a clock. That is, they all agree about which time-step is being executed. The synchronisation allows the users to use different time steps for different purposes.

The protocol itself is simple. We construct an explicit, easily computable collection {Si,t : i,t = 0,1,2,...} of finite sets of nonnegative integers Si,t where, for all i and t, every element of Si,t is smaller than every element of Si+1,t. A message born at time t which has made i (unsuccessful) attempts to send to the channel so far picks a time w uniformly at random from Si,t and tries to use the channel at time w. If it succeeds, it leaves the system. Otherwise, it increments i and repeats this process.

The sets Si,t are constructed as follows: Imagine an infinite tree with a countably infinite number of leaves ordered left-to-right at level 0. The parents of the leaves are at level 1, and their parents are at level 2, and so on. Each internal node has 4 Delta^2 children (for an appropriately chosen constant Delta, which depends upon lambda). Each node of the tree will be given a "trial set" of time-steps. (If the node is at level i, then its trial set has size b (4 Delta)^i, for a constant b, which is chosen to be sufficiently large.) During the lifetime of the protocol, a message is regarded as residing at some node of the infinite tree. Initially, a message born at time-step t resides at the left-most leaf whose trial set contains only time-steps greater than t. The message picks a time-step w uniformly at random from the trial set of the leaf where it resides and tries to use the channel at time w. If the message fails, then it moves to the parent of its current home. It then picks a time-step uniformly at random from the trial set of its new home and repeats. The trial sets at level i of the tree consist of those time-steps which, when written in base Delta, have zeros in their i least significant digits and a non-zero in their next least significant digit. The trial sets are assigned to the nodes in order. That is, the leftmost leaf of the tree is given the smallest b "level 0" time-steps, the next leaf is given the next smallest b "level 0" time-steps, and so on. The trial set of a node v at level i consists of the b (4 Delta)^i smallest level-i time-steps which are greater than all of the time-steps in the trial sets of v's children. It turns out that all of the time-steps in v's trial set are larger than the time-steps in the trial sets of v's left siblings.
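The base-Delta structure of the trial sets is easy to compute directly: a level-i time-step is one with exactly i trailing zero digits in base Delta. A small sketch (our code):

```python
def level_of(step, delta):
    """Number of trailing zeros in the base-delta representation of step,
    i.e. the tree level whose trial sets this time-step belongs to."""
    assert step >= 1
    i = 0
    while step % delta == 0:
        step //= delta
        i += 1
    return i

def level_steps(i, delta, limit):
    """The level-i time-steps below limit, in increasing order.  Trial
    sets at level i are formed by handing these out, in order, in groups
    whose size grows like b*(4*delta)**i."""
    return [t for t in range(1, limit) if level_of(t, delta) == i]
```

For instance, with delta = 10 the level-1 time-steps below 100 are 10, 20, ..., 90: one trailing zero digit, then a non-zero digit.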

Queueing Model Protocol: The basic idea is to simulate the queue-free-model protocol described above with N users. If the N users are synchronised (that is, they share a clock) the simulation is straightforward, but it involves a few data-structure issues (because each user is responsible for a potentially unbounded number of messages, yet it must manage them in constant time at each time step). The basic solution is for each user to maintain, for each level of the tree, a linked list of its messages at that level. At every "level i" time-step, the user calculates the probability that at least one of these messages tries to send, and sends a message if one tries to send. Technically, if it finds that more than one of the messages under its control tries to send, then it should send a dummy message to the channel (to ensure that the simulation is faithful to the original protocol). It is not necessary to send dummy messages, however. This follows from the fact that the queue-free-model protocol is deletion resilient, which essentially means that the evolution of the protocol from any given state s stochastically dominates the evolution of the protocol from any state s' which is the same as s except that some messages have been removed. Let P be a protocol running on N completely synchronised users which simulates the synchronised queue-free protocol for N^2 - 1 steps, then skips a step, and then continues. (Inputs might arrive during the skipped step.) We will refer to P below.
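The constant-time-per-step simulation hinges on computing, in one shot, the chance that at least one of the user's messages at the current level picks this time-step. A sketch (our names):

```python
import random

def user_sends(n_messages, trial_set_size, rng):
    """Each of the user's n_messages at this level would pick the current
    time-step independently with probability 1/trial_set_size.  Rather
    than flipping n_messages coins, flip one coin with the probability
    that at least one of them picks this step.  Deletion resilience means
    no dummy message is needed when two or more would have collided."""
    p_none = (1.0 - 1.0 / trial_set_size) ** n_messages
    return rng.random() < 1.0 - p_none
```

This is why the linked list per level suffices: the user only needs the count of messages at the current level, not a per-message coin flip.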

Since the users in the queueing model are not synchronised (they do not share a clock), it is necessary to build the synchronisation into the protocol itself. Thus, the structure of the protocol is as follows. Most of the time, the users are synchronised and are simulating the queue-free-model protocol (in fact, they are simulating P). The users occasionally enter a synchronising phase to make sure that the clocks are synchronised (or to resynchronise after a user enters the system). The synchronisation phase has some probability of (undetectably) failing, and thus it must be repeated periodically to guarantee constant expected delay. The synchronising phase of the protocol is somewhat complicated, because it must synchronise the users even though communication between users can only be performed through acknowledgements (or lack thereof) from the multiple-access channel. A further complicating factor is that the synchronising phases have to be robust, in the sense that the model allows users to start and stop, so some users may miss synchronisation phases, others may try to start synchronising in the middle of an ongoing synchronisation phase and so on.

For the purposes of description, we assume that all of the users know N, but it suffices for N to be an upper bound on the number of users. W is taken to be 12 N^4. In the normal state a user maintains a buffer B of size N^7 and a queue Q of unbounded length. B and Q contain messages to be sent, and when a message is generated it is put into B. For each message m in B the user maintains a variable trial(m) which contains the next step on which the user will attempt to send m. The step trial(m) will be chosen using protocol P. When P is ``skipping a step'' the protocol will take the opportunity to try to send some messages from Q: at such steps, with probability 1/(3N), the user attempts to send the first message in Q. Each user also maintains a list L which keeps track of the results (either ``failure'' or ``success'') of the most recent message sending attempts from Q, up to N^2 of them. A user goes into a synchronising state if a message has remained in the buffer for N^7 steps or if L is full (contains N^2 results) and only contains failures. It also goes into a synchronising state from time to time even when these events do not occur. (It synchronises if it has been simulating P for at least N^{40} steps, and it synchronises with probability N^{-30} on any given step.) If the user does go into a synchronising state, it transfers all messages from B to the end of Q. In the synchronising state, a user could be in one of many possible stages, and its actions depend on the stage that it is in. It will always put any generated messages into the queue. Also, it only sends dummy messages in the synchronising state. (The dummy messages are used for synchronising. Real messages that arrive during the synchronisation phase must wait until the next normal phase to be sent.) The various synchronisation stages are as follows (a user goes through these stages in order).

The length of each of these stages is very important, both in terms of achieving a high probability of synchronisation and a high level of robustness. The high probability of synchronisation is achieved by making the ``preliminary'' stages (i.e., JAMMING, FINDING LEADER, and ESTABLISHING LEADER) of length Theta(W) (this is long enough to guarantee all users in a normal state will detect a synchronisation), and the ``synchronising'' stages (i.e. SETTING CLOCK, COPYING CLOCK, and WAITING) of length Theta(WN^2) (this gives users enough time to determine the leader's clock modulo 4W, with high probability). The high level of robustness is achieved by the following properties: The differing lengths of time for the ``preliminary'', ``synchronising'' and POLLING stages, and the fact that only the POLLING stage could cause another synchronisation to occur, guarantee that bad events such as separate groups of users continually interrupting each other's synchronisation cannot occur, even when up to N users are starting at different times (and stopping periodically). Whenever a user joins the multiple-access channel, it starts the protocol in the JAMMING stage with L empty.
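The triggers for a normal-state user to leave the simulation and enter a synchronising state (listed above) can be sketched as follows; the variable names are ours.

```python
import random

def enters_sync_state(oldest_buffer_age, L, steps_simulating_P, N, rng):
    """Whether a user in the normal state enters a synchronising state.
    L holds the results ('failure'/'success') of the most recent
    queue-send attempts, capped at N**2 entries."""
    if oldest_buffer_age >= N ** 7:
        return True                        # a message sat in B too long
    if len(L) == N ** 2 and all(r == "failure" for r in L):
        return True                        # L is full and all failures
    if steps_simulating_P >= N ** 40:
        return True                        # periodic forced resync
    return rng.random() < N ** -30         # random resync on any step
```

The first two triggers detect that something has gone wrong; the last two guard against a synchronisation having undetectably failed.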

Raghavan and Upfal's Protocol: Each user has a message buffer of size O(log(N)) and a queue. During the protocol the user is either in the normal state or in the reset state. During each step in which the user is in the normal state, it places any new messages in the buffer. Then it chooses a random slot of the buffer. If the slot contains a message, then the user tries to send that message. Otherwise, with probability 1/N^2, it tries to send the message at the head of the queue. If

then the user moves all messages in the buffer to the end of the queue and switches to the reset state for a while. During each step in which the user is in the reset state, it places any new messages in the queue. With probability 1/N^2, it tries to send the message at the head of the queue.
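A sketch of one normal-state step (our variable names; the description above does not detail what happens when the buffer is full, so here overflowing arrivals simply join the queue — an assumption of ours):

```python
import random

def normal_state_step(buffer, queue, arrivals, N, rng):
    """One normal-state step: place arrivals in the buffer, pick a random
    buffer slot, and try to send its message if there is one; otherwise,
    with probability 1/N**2, try to send the head of the queue.  buffer
    is a fixed-size list with None marking empty slots.  Returns the
    message the user attempts to send, or None."""
    for m in arrivals:
        if None in buffer:
            buffer[buffer.index(None)] = m
        else:
            queue.append(m)          # overflow handling: our assumption
    slot = rng.randrange(len(buffer))
    if buffer[slot] is not None:
        return buffer[slot]          # try the randomly chosen message
    if queue and rng.random() < 1.0 / N ** 2:
        return queue[0]              # rarely, try the head of the queue
    return None
```

The random-slot rule means a user sends with probability proportional to how full its buffer is, which is the self-regulating behaviour the analysis exploits.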

Some Proof Methods

Aldous's Method: Aldous's instability proof for the binary exponential backoff protocol is based on the following idea. Suppose that at a given time slot it is the case that for every ``small'' positive integer b, there are ``many'' messages which have been sent unsuccessfully b times. (This event occurs with some small, but positive probability.) Aldous shows that when this event occurs, there is a positive probability that the system gets into a worse and worse state with more and more messages building up. His argument is essentially a potential function (Lyapounov function/supermartingale) argument in which the potential of the system at a given time-slot is taken to be the expected number of sends during the time-slot. The proof shows that the potential of the system tends to infinity as time goes on.
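In the version of binary exponential backoff analysed here, a message whose backoff counter is b sends with probability 2^(-b) in each slot, so the potential of a state is simple to write down (a sketch, with our naming):

```python
def beb_potential(counts):
    """Aldous's potential: the expected number of send attempts in the
    current slot, where counts[b] is the number of messages with backoff
    counter b and each such message sends with probability 2**-b."""
    return sum(n * 2.0 ** -b for b, n in enumerate(counts))
```

For example, one message at counter 0, two at counter 1 and four at counter 2 give potential 1 + 1 + 1 = 3 expected sends.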

More details: Define a function L(t) which is Theta(log(t)). The potential at t is said to be ``small'' if it is at most (lambda/2) L(t) and ``large'' otherwise. A state is said to be ``good'' if, for all small enough b, at least lambda 2^b messages have backoff counter b (note that this is the number of messages with backoff counter b in the equilibrium state of an externally blocked channel).
Theorem: Given a good state at time t0, with positive probability, all following states have large potential.
Proof: Pick any time t >= t0 and show that the probability that t is the first step with small potential is very small (small enough to sum over t). In particular,

Note that the particular potential function that Aldous uses can only be used to show instability for protocols for which the proportion of time-slots in which successful transmissions occur is vanishingly small.

Goodman, Greenberg, Madras and March's Method: The proof uses the following theorem [11]:
Theorem: Suppose that epsilon is a positive constant and y(t) is a discrete time stochastic process on a countable state space. Let T be a stopping time [12] with respect to y(t). Suppose f(t) is non-negative and

E(f(t+1) - f(t) | y(0),...,y(t), t < T) <= -epsilon.
Then E(T) <= f(0)/epsilon.
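The theorem is a Foster/Lyapunov-style drift bound. As a sanity check (entirely our toy example, not from the paper), take a walk on the nonnegative integers that steps down with probability q > 1/2 and up otherwise, and let f be the identity: the one-step drift is -(2q-1), so the theorem bounds the expected hitting time of 0 by y(0)/(2q-1).

```python
import random

def hitting_time_of_zero(start, q, rng):
    """First time a walk started at `start`, stepping -1 with probability
    q and +1 otherwise, hits 0.  With f(y) = y the drift theorem bounds
    its expectation by start/(2q-1)."""
    y, t = start, 0
    while y > 0:
        y += -1 if rng.random() < q else 1
        t += 1
    return t

rng = random.Random(42)
runs = [hitting_time_of_zero(5, 0.75, rng) for _ in range(2000)]
mean = sum(runs) / len(runs)
# drift bound: 5 / (2*0.75 - 1) = 10; for this walk the bound is tight
```

Here the empirical mean hovers around the bound 10, illustrating that the theorem can be tight when the drift is constant.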

The Lyapounov function f is defined to be the sum of

The stopping time T is chosen to be the first time that the system returns to the start state. Goodman et al. then show that for every state y other than the start state,
(*) E(f(t+1) - f(t) | y(t)=y) <= -epsilon.
Using the theorem, this implies that E(T) <= E(f(1) | y(0) = start state)/epsilon, which is finite.

The proof of (*) proceeds in two parts:

Goldberg, Jerrum, Kannan and Paterson's Method:

Goldberg and MacKenzie's Method: In order to simplify the analysis, Goldberg and MacKenzie introduce a "step counter" si,j for each user i and channel j. If the backoff counter bi,j for user i and channel j is incremented then si,j is set to the ceiling of (bi,j +1)^alpha. On every step, user i decides to send on channel j with probability equal to the inverse of si,j, and then si,j is decreased by 1.

The general approach is that of Hastad et al (see below) with potential function

f(s) = Sum_(1<=i<=N)Sum_(1<=j<=K) ( qi,j + (bi,j+1)^(alpha + (1/2)) - si,j^(1-1/(4alpha)) ).
Thus the goal is to prove that the corresponding Markov chain is positive recurrent and that
(*) E(lim_(t->infinity)(1/t) Sum_(1<=s<=t) f(s)) is finite.

Once it is proved that the Markov chain is ergodic (positive recurrent) and strictly stationary (the probability distribution remains the same after any given number of steps), the strong ergodic theorem applies and says that (*) boils down to showing that the expectation of f(t) is finite in the stationary distribution. As in the work of Hastad et al., this reduces to showing that for any state s with f(s) <= V, there is a stopping time Ts <= V-1 such that

E( f(y(Ts))^2 - f(y(0))^2 | y(0) = s) <= - f(s).

Thus, the hard part of the proof is once again a detailed analysis of the drift for an appropriately defined stopping time. The main difficulty in the analysis is the dependence between different users, many of whom must simultaneously be successful to decrease the potential. Actually, the analysis is so complicated in the multi-channel case that it would be very nice to have a better proof method.

Goldberg, MacKenzie, Paterson and Srinivasan's Method:

Queue-Free-Model Protocol: The partitioning of time-steps for different purposes makes it possible to prove that as long as lambda is at most 1/e, the expected delay of any message is O(1). Intuitively, the reason that the protocol works is that it handles bursts of arrivals well: When a large number of messages arrives in a short time, these messages are handled independently since the trial sets of the relevant nodes are pairwise disjoint. Over time, the problem messages get spread out from each other (so they succeed) but they also are kept out of the way of new arrivals (because they eventually reside at high levels of the tree).

The analysis of the protocol is based on the following idea, where c denotes e(1+delta) for some constant delta>0: suppose we have ell time slots and at most ell/c messages. A reasonable strategy is as follows:

Note that in all at most ell time steps are used. However, for any positive constant epsilon, if we run the above for a sufficient number (m(epsilon)) of steps, then the probability that any given message survives is at most epsilon and the probability of having ell mu messages remaining is exponentially small in ell.
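One round of this spreading strategy can be sketched directly (our code): each message picks one of the ell slots uniformly at random, and a message succeeds exactly when it is alone in its slot.

```python
import random
from collections import Counter

def one_spreading_round(n_messages, n_slots, rng):
    """Scatter n_messages uniformly over n_slots and return how many
    survive, i.e. land in a slot shared with at least one other message."""
    picks = [rng.randrange(n_slots) for _ in range(n_messages)]
    load = Counter(picks)
    return sum(1 for s in picks if load[s] > 1)
```

With at most ell/c messages for c = e(1+delta), a standard balls-in-bins calculation bounds each message's survival probability strictly below 1, which is what drives the sharp concentration claimed above.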

The protocol is designed to essentially simulate many (an infinite number of!) parallel copies of the above idea. As a message moves up the tree it chooses a random trial time from a set that is 4 Delta times larger. However, each internal node has 4 Delta^2 children, so effectively the size of the trial set is going down by a factor of Delta with each level.

The proof method is to show that for every level i, the probability that at least

b (4 Delta)^i (2 Delta)^{t-1}/(e(1+delta))
messages enter it is doubly exponentially small in i and t. The theorem then follows from the above analysis by looking at the case t=1.

Queueing-Model Protocol: The proof uses the approach of Raghavan and Upfal in showing that, conditioned on any given system state at a given time, the expected system state is good T steps later. The key steps of the proof are

The proof then follows because of the requirement that once a user starts it has to continue running "for a while". This ensures that most steps do not have users starting or stopping, so in fact the expected average waiting time of messages is O(1) (and not O(N^(37))!).

Hastad, Leighton and Rogoff's Method: In order to prove that polynomial backoff is stable, Hastad et al. study the potential function

f(s) = Sum_(1<=i<=N) qi + Sum_(1<=i<=N) (bi+1)^(alpha + (1/2))-N,
where qi is the length of the i'th queue in state s and bi is the size of the i'th backoff counter in s. They infer stability from the following conditions, where delta, d and V are taken to be constants: Condition (1) is used to show that for some constant c (which depends upon delta, d and V),
(*) E(Sum_(0 <= t <= T*) L(y(t)) | y(0)=s) <= c f(s)^2,
where T* is the stopping time giving the time of the first visit to a state with potential at most V and L(y) denotes the number of messages in the system in state y.

By Theorem 14.0.1 of Meyn and Tweedie [MT93], (*) implies that the chain is positive recurrent with a stationary distribution such that the expected value of L (the number of messages in the system) is finite in the stationary distribution.

The proof of (*) is by induction on time, by considering a new stopping time T*j which is the first time that either

Then it is shown by induction on j that
E(Sum_(0 <= t <= T*j) L(y(t)) | y(0)=s) <= c f(s)^2.

The main part of the proof is to establish condition (1), and this is done by detailed (and clever) case analysis (with a careful choice of the stopping time). The analysis is complicated by dependence between the events that are considered.
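For reference, the potential f defined above is cheap to compute from a state (a sketch with our naming):

```python
def hlr_potential(queues, backoffs, alpha):
    """Hastad-Leighton-Rogoff potential of a state: the sum of the N
    queue lengths, plus the sum over users of (b_i+1)**(alpha+1/2),
    minus N."""
    N = len(queues)
    return (sum(queues)
            + sum((b + 1) ** (alpha + 0.5) for b in backoffs)
            - N)
```

The exponent alpha + 1/2 (rather than alpha) is what lets a success, which resets a backoff counter, pay for the queue growth accumulated while that counter was large.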

To prove the instability of binary exponential backoff, Hastad et al. find a potential function for which, for any state in the system, the potential is expected to increase by at least a fixed amount during the subsequent transition.

Given the nature of the potential function, it is easy to see that the expected average waiting time of messages (and, therefore, the expected average number of waiting messages in the system) grows without bound. One can then prove that the system is not positive recurrent by using a theorem such as the following (Theorem 2.2.6 of [FMM95]).
Theorem: For an irreducible Markov chain to be non-positive-recurrent, it is sufficient that there exist a positive function f on the state space and constants c and d such that

Ingenoso's Method: We can assume that the sequence {pi} tends to zero as i tends to infinity. (Otherwise, the result follows from Kelly and MacPhee's result.) Furthermore, we can assume that for any fixed age T, the expected time that elapses before an age-T message next attempts to send is finite. Thus, the sum (from t = T to infinity) of the product (from i = T to t) of (1 - pi) is finite. The finiteness of this sum prevents the {pi}s from tailing off too quickly and guarantees that, with positive probability, the backlog of messages grows without bound. In particular, Ingenoso defines a sequence t1,t2,... of ``observation times'' and shows

Kelly and MacPhee's Method: Consider a system in which the protocol runs in an externally jammed channel, with Poisson arrivals with mean lambda. Let qt denote the sum p0+ ... + pt. Let F(lambda) be a random variable denoting the set of steps in which at most one message tries to transmit. The number of attempted transmissions at step t is Poisson with mean lambda qt, so the expected size of F(lambda) is the sum from t=1 to infinity of

(1 + lambda qt) exp(- lambda qt).
Note that the expected size of F(lambda) is non-increasing in lambda. The critical arrival rate for the protocol is defined to be
lambdac = inf { lambda | E(|F(lambda)|) < infinity }.
If lambda > lambdac then, with probability 1, | F(lambda) | is finite.
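The defining sum is easy to evaluate numerically. As an illustration (our code; the infinite sum is truncated at a finite horizon), take the age-based protocol with p_t = 1/(t+1): then q_t grows like log(t), the t-th term behaves roughly like t^(-lambda), and the truncated sum stays bounded for lambda above 1 but grows with the horizon for lambda below 1.

```python
import math

def expected_F_size(p, lam, horizon):
    """Truncated value of E|F(lambda)|: in an externally jammed channel
    the number of attempts at step t is Poisson with mean lam*q_t, where
    q_t = p_0 + ... + p_t, so step t lies in F(lambda) with probability
    (1 + lam*q_t) * exp(-lam*q_t)."""
    q, total = 0.0, 0.0
    for t in range(horizon):
        q += p(t)                 # running value of q_t
        m = lam * q
        total += (1.0 + m) * math.exp(-m)
    return total
```

For p(t) = 1/(t+1) this numerically suggests a critical arrival rate of 1, consistent with the log(t) boundary discussed below.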

Now suppose that lambda > lambdac and consider a system in which the protocol runs with arrival rate lambda without an externally jammed channel. Consider adding a new stream of Poisson arrivals with mean epsilon for some epsilon > 0. If F(lambda) is finite, the probability that every step in F(lambda) is blocked by one of the new arrivals is at least some constant p > 0. Thus, the probability that there are no successes is at least p. This implies that for any step t, the probability that there are no successes from step t onwards is at least p. This holds even if we condition on the system being in any given state at step t. Thus, if Ti denotes the step in which the i'th success takes place, then the probability that there are no successes after step Ti , conditioned on the fact that Ti is finite, is at least p. Finally, the probability that there are at least n successes is at most (1-p)^n, so the expected number of successes is finite. We conclude that for any arrival rate lambda > lambdac, the expected number of successful transmissions is finite.

We now wish to show that if lambda is less than lambdac then the expected number of successful transmissions is infinite. To do this, we note first that the set of steps in which there is at most one retransmission is a superset of F(lambda). (This is also true when we run without the blocked channel.) On any step in which there is exactly one retransmission, the probability of success is at least e^(-lambda) and on any step in which there is no retransmission, the probability of success is at least p0 lambda e^(-lambda). The result follows because Kelly and MacPhee assume that p0 = 1. (Note: A similar result holds even if p0 = 0. Let T be the smallest integer such that pT > 0. Then consider the set of steps in which there is at most one transmission by messages older than T. Then calculate the (positive) probability of having exactly one send on such a step...)

Finally, we show that the critical arrival rate is zero if and only if qt = omega(log(t)). To see this, note that the expected size of F(lambda) is at most a constant times the sum from t = 1 to infinity of

qt exp(- lambda qt).
Rewriting this, we see that the expected size of F(lambda) is at most a constant times the sum from t=1 to infinity of
qt t^(- lambda (qt/log(t))).
Finally, since qt <= t, the expected size of F(lambda) is at most a constant times the sum from t=1 to infinity of
t^(1- lambda (qt/log(t))).
It is apparent from the above that E( | F(lambda) | ) is finite if qt >= ((2 + epsilon)/lambda)log(t).

Raghavan and Upfal's Method: A basic tool used by Raghavan and Upfal is a lemma which shows that, for any x, if the system has x messages in the buffer at step t then, with high probability, the buffer is at most 1/4 full at step t + T(N), for an appropriately chosen function T(N). The Markov chain describing the messages in the buffers has a stationary distribution, and in that stationary distribution the probability that any user switches to the reset state is at most N^(-8).

These two facts can be used to show that the expected amount of time that a message spends in a buffer is O(log N). Similarly, they can be used to show that, with high probability (with probability at least 1 - N^(-5)), the message is delivered from the buffer and never enters the queue.

The final part of the proof is to show that if the message is put in the queue, then the expected time that it spends there is O(N^4 log(N)). This is established by showing that the probability that the message is put in a queue of any given length is (relatively) low, and that the expected time needed to send the message, given its position in the queue, is not too long. The analysis is done by breaking the period of time after the message is put in the queue into intervals and then proving (using the tool described earlier) that there is a reasonably high probability of successful deliveries from the queue during each interval.

Some References

There has been a huge amount of work on the contention-resolution problem. This bibliography is not intended to contain all publications on the subject. Instead, we are most interested in publications which prove theoretical results about contention on multiple-access channels, without using unproven assumptions, and in publications which are useful for proving such results. I'm sure that I have omitted many important publications. Feel free to tell me about them. (I haven't read all of the publications on this list!)

[IEEE85] Special issue of IEEE Trans. on Information Theory, IT-31 1985.

[Abr73] N. Abramson, The Aloha system, in Computer-Communication Networks, N. Abramson and F. Kuo, editors, Prentice-Hall (1973).

[Ald87] D. J. Aldous, Ultimate Instability of Exponential Back-Off Protocol for Acknowledgement-Based Transmission Control of Random Access Communication Channels, IEEE Trans. on Information Theory, IT-33(2) (March, 1987) 219-223.

[AGM00] H. Al-Ammal, L.A. Goldberg and P. MacKenzie, Binary Exponential Backoff is stable for high arrival rates, Proc. Int'l. Symp. on Theoretical Aspects of Computer Science 17, Lecture Notes in Computer Science 1770, H. Reichel and S. Tison (Eds.) (Springer-Verlag, 2000) 169-180.

[Cap79] J.I. Capetanakis, Tree Algorithms for Packet Broadcast Channels, IEEE Trans. on Information Theory IT-25(5) (1979) 505-515.

[Chl00] B.S. Chlebus, Randomized Communication in Radio Networks, to appear in Handbook on Randomized Computing, Kluwer Academic Publishers, P.M. Pardalos, S. Rajasekaran, J. Reif and J.D.P. Rolim, Eds. (2000).

[CFM83] M. Cottrell, J-C. Fort and G. Malgouyres, Large deviations and rare events in the study of stochastic algorithms, IEEE Trans. Automat. Contr. AC-28 (1983) 907-920.

[EH98] A. Ephremides and B. Hajek, Information theory and communication networks: an unconsummated union, IEEE Trans. Inf. Theory 44(6) (1998) 2416-2432.

[FMM95] G. Fayolle, V.A. Malyshev and M.V. Menshikov, Topics in the Constructive Theory of Countable Markov Chains, Cambridge University Press, 1995.

[FFH86] G. Fayolle, P. Flajolet and M. Hofri, On a functional equation arising in the analysis of a protocol for a multi-access broadcast channel, Adv. Appl. Prob. 18 (1986) 441-472.

[FFHJ85] G. Fayolle, P. Flajolet, M. Hofri and P. Jacquet, Analysis of a stack algorithm for random multiple-access communication, IEEE Trans. on Information Theory IT-31(2) (1985) 244-254.

[FJ87] P. Flajolet and P. Jacquet, Analytic models for tree communication protocols, in Flow control of congested networks, NATO Advance Science Institute Series, Series F: Computer and systems sciences 38 (A.R. Odoni, L. Bianco and G. Szego, Eds.) (Springer-Verlag 1987) 223-234.

[GJKP00] L.A. Goldberg, M. Jerrum, S. Kannan and M. Paterson, A bound on the capacity of backoff and acknowledgement-based protocols, Research Report 365, Department of Computer Science, University of Warwick, Coventry CV4 7AL, UK (Jan 2000).

[GM99] L. A. Goldberg and P. MacKenzie, Analysis of Practical Backoff Protocols for Contention Resolution with Multiple Servers. Journal of Computer and System Sciences 58 (1999) 232--258.

[GMPS98] L. A. Goldberg, P. MacKenzie, M. Paterson and A. Srinivasan, Contention Resolution with Constant Expected Delay, Submitted for journal publication. (Preliminary versions of this work appeared in a paper written by the third and fourth authors (FOCS 95, 104-113) and in a paper written by the first and second authors (FOCS 97, 213-222).)

[GGMM88] J. Goodman, A.G. Greenberg, N. Madras and P. March, Stability of binary exponential backoff, J. of the ACM 35(3) (1988) 579-602.

[GFL87] A. G. Greenberg, P. Flajolet and R. E. Ladner, Estimating the multiplicities of conflicts to speed their resolution in multiple access channels, J. of the ACM 34(2) (1987) 289-325.

[H82] B. Hajek, Acknowledgement-based random access transmission control: An equilibrium analysis, Proc. Int. Conf. Communications New York, IEEE Press, 1982.

[HLR96] Johan Hastad, Tom Leighton and Brian Rogoff, Analysis of Backoff Protocols for Multiple Access Channels, SIAM J. Comput. 25(4) (1996) 740-774.

[Ing93] M.J. Ingenoso, Stability Analysis for Certain Queueing Systems and Multi-Access Communication Channels, PhD Thesis, University of Wisconsin-Madison, 1993.

[Kel85] F. P. Kelly, Stochastic Models of Computer Communication Systems, J. R. Statist. Soc. B 47(3) (1985) 379-395.

[KM87] F.P. Kelly and I.M. MacPhee, The Number of Packets Transmitted by Collision Detect Random Access Schemes, The Annals of Probability 15(4) (1987) 1557-1568.

[L98a] U. Loher, Efficiency of first-come first-served algorithms, Proc. ISIT (1998) p108.

[L98b] U. Loher, Information-theoretic and genie-aided analyses of random-access algorithms, PhD Thesis, Swiss Federal Institute of Technology, DISS ETH No. 12627, Zurich (1998).

[MacP89] I.M. MacPhee, On Optimal Strategies in Stochastic Decision Processes, D. Phil Thesis, University of Cambridge, (1987).

[MF85] P. Mathys and P. Flajolet, Q-ary Collision Resolution Algorithms in Random-Access Systems with Free or Blocked Channel Access, IEEE Trans on Information Theory IT-31(2) (1985) 217-243.

[MB76] R. Metcalfe and D. Boggs, Distributed packet switching for local computer networks, Comm. ACM 19 (1976) 395-404.

[MT93] S.P. Meyn and R.L. Tweedie, Markov Chains and Stochastic Stability, Springer-Verlag, 1993.

[MT81] V.A. Mikhailov and T.S. Tsybakov, Upper bound for the capacity of a random multiple access system, Problemy Peredachi Informatsii, 17(1) (1981) 90-95.

[MP93] M. Molle and G.C. Polyzos, Conflict Resolution Algorithms and their Performance Analysis, Technical Report Number CS93-300, Computer Systems Research Institute, University of Toronto, 1993.

[MH85] J. Mosely and P.A. Humblet, A class of efficient contention resolution algorithms for multiple access channels, IEEE Trans. on Communications, COM-33(2) (1985) 145-151.

[Pip81] N. Pippenger, Bounds on the performance of protocols for a multiple access broadcast channel, IEEE Trans. on Information Theory, IT-27 (1981) 145-151.

[RS90] R. Rom and M. Sidi, Multiple access protocols: performance and analysis, (Springer-Verlag, 1990).

[Ros84] W.A. Rosenkrantz, Some theorems on the instability of the exponential back-off protocol, Performance '84 (E. Gelenbe, Ed.) (North Holland) (1984) 199-205.

[RU95] P. Raghavan and E. Upfal, Contention resolution with bounded delay, Proc. ACM Symp. on Theory of Computing 24 (1995) 229-237.

[Sti74] S. Stidham, The last word on L = lambda W, Operations Research 22 (1974) 417-421.

[SR86] W. Szpankowski and V. Rego, Instability conditions arising in analysis of some multiaccess protocols, technical report CSD-TR-577, Purdue University, Lafayette, Indiana, 1986.

[TL87] B.S. Tsybakov and N. B. Likhanov, Upper Bound on the Capacity of a Random Multiple-Access System, Problemy Peredachi Informatsii, 23(3) (1987) 64-78.

[TM78] B.S. Tsybakov and V.A. Mikhailov, Free synchronous packet access in a broadcast channel with feedback, Probl. Information Transmission, 14(4) (1978) 259-280.

[V86] S. Verdu, Computation of the efficiency of the Mosely-Humblet contention resolution algorithm: a simple method, Proc. of the IEEE 74(4) (1986) 613-614.

[VP83] N. D. Vvedenskaya and M.S. Pinsker, Non-optimality of the part-and-try algorithm, In Abstracts of the International Workshop on Convolutional Codes, Multiuser Communication, Sochi, USSR, (1983) 141-144.


[1] It is standard to work in a slotted model in which time is divided into discrete time slots. During any time slot, each user can send at most one message. The user can determine (by listening to the channel) whether the message was successfully transmitted. While real implementations of multiple-access channels do not fit precisely within the slotted model, it can be shown (for example, see [HLR96] and [KM87]) that results obtained in the slotted model do apply to realistic multiple-access channels.


[3] That is, the probability that k messages arrive on a given time step is (lambda^k / k!) e^(-lambda).
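As a concrete check on this formula, the arrival distribution can be computed and sampled using only the standard library; the function names below are mine, purely for illustration:

```python
import math
import random

def poisson_pmf(k, lam):
    """Probability that exactly k messages arrive in a given slot when
    arrivals are Poisson with rate lam: (lam^k / k!) e^(-lam)."""
    return (lam ** k / math.factorial(k)) * math.exp(-lam)

def sample_arrivals(lam, slots, seed=0):
    """Draw the number of arrivals in each of `slots` time steps by
    inversion of the cumulative distribution function."""
    rng = random.Random(seed)
    counts = []
    for _ in range(slots):
        u, k, cum = rng.random(), 0, poisson_pmf(0, lam)
        while u > cum:
            k += 1
            cum += poisson_pmf(k, lam)
        counts.append(k)
    return counts
```

Inversion sampling converges after a few terms here because the arrival rates of interest are below 1.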

[4] For a good discussion of different notions of stability, see [FMM95], [MT93] and [HLR96].

[4A] In practice, it is possible to implement the full-sensing model when there is a single channel, but this becomes increasingly difficult in situations where there are multiple shared channels, such as optical networks. Thus, acknowledgement-based protocols are sometimes preferable to full-sensing protocols. For work on contention-resolution in the multiple-channel setting, see [GM99].

[5] An irreducible aperiodic Markov chain is said to be transient if the probability of returning to the start state (in this case, the state in which no messages are waiting) is less than 1. Note: An irreducible Markov chain is one in which every state can be reached from every other state. It is said to be aperiodic if, for every state s and every sufficiently large t, there is positive probability of moving from state s back to state s in t steps.
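To make the definition concrete, here is a small Monte-Carlo sketch (the model and function name are mine, not taken from any of the cited papers) using a biased random walk as a stand-in for the backlog chain; with upward drift the estimated return probability is visibly below 1:

```python
import random

def estimate_return_prob(p_up, trials=20000, horizon=2000, seed=0):
    """Estimate the probability that a walk on the non-negative
    integers, started at state 1 (one waiting message), ever returns
    to the empty state 0.  From any state s >= 1 the walk moves to
    s+1 with probability p_up and to s-1 otherwise.  For p_up > 1/2
    the chain is transient: the true return probability is
    (1 - p_up)/p_up < 1."""
    rng = random.Random(seed)
    returns = 0
    for _ in range(trials):
        s = 1
        for _ in range(horizon):
            s += 1 if rng.random() < p_up else -1
            if s == 0:
                returns += 1
                break
    return returns / trials
```

With p_up = 0.7 the estimate comes out close to 3/7, while with p_up = 0.3 (downward drift, a recurrent chain) essentially every trial returns.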

[6] Perhaps the easiest way to see this (which is from Kelly and MacPhee's paper) is to consider the following variant of a backoff protocol: suppose that when a message is sent unsuccessfully for the i'th time, it chooses an integer w in [1,...,(i+1)^k] (for some constant k) and then waits w steps before trying again. This variant is similar to the backoff protocol in which p_i = (i+1)^(-k), but it is easier to understand in this context. Now let R(t) be a random variable representing the number of times that a message tries (in the variant) before it is age t. Clearly, the sum from i=1 to R(t) of (i+1)^k is at least t. Since each of the R(t) terms in this sum is at most (R(t)+1)^k, it follows that (R(t)+1)^(k+1) is at least t, so R(t) is at least t^(1/(k+1)) - 1 and, since k is constant, R(t) is omega(log t). The result for the variant of backoff now follows from Kelly and MacPhee's result. A similar result holds for the geometric version of backoff protocols described in these notes.
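The counting argument in this note can also be checked by simulation. The sketch below is my own illustration: it assumes w is chosen uniformly, follows a single message's clock while ignoring the channel itself, and verifies the deterministic bound R(t) >= t^(1/(k+1)) - 1:

```python
import random

def attempts_by_age(t, k, seed=0):
    """Simulate the polynomial-backoff variant: after its i'th
    unsuccessful try, the message waits w steps, with w drawn
    (assumed uniformly) from 1..(i+1)**k.  Returns R(t), the number
    of tries the message makes before it reaches age t."""
    rng = random.Random(seed)
    age, tries = 0, 0
    while age < t:
        tries += 1                                 # a try at the current age
        age += rng.randint(1, (tries + 1) ** k)    # wait before the next try
    return tries

# R(t) always exceeds t^(1/(k+1)) - 1, yet is far smaller than t itself.
t, k = 10 ** 6, 2
R = attempts_by_age(t, k, seed=3)
assert R >= t ** (1 / (k + 1)) - 1
```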

[7] Aldous notes ``simulations and heuristic arguments ([CFM83], [GGMM88], and [H82]) suggest that, for fairly general [backoff protocols], if lambda is rather smaller than 1/e, then the process quickly reaches a quasistationary distribution which persists for a long, but finite time T_lambda before instability sets in. Formalising this and estimating E[T_lambda] is [a] challenging problem.''

[8] An irreducible aperiodic Markov chain is said to be positive recurrent if the expected time to return to the start state is finite. This implies that the chain has a unique stationary distribution.

[9] More formally, let L_i be the number of messages in the system after step i and let L_avg be the limit as t goes to infinity of (1/t) Sum_(1 <= i <= t) L_i. Their requirement is that the expectation of L_avg be finite (bounded from above by a function of N). This requirement is stronger than just requiring L_avg to be finite with probability 1: for example, consider the case in which L_avg is equal to 2^i with probability 2^(-i) for each positive integer i; then L_avg is finite with probability 1 but its expectation is infinite. Work by Jewell and Stidham (see [Sti74]) shows that, with probability 1, L_avg = lambda W_avg, where W_i is the waiting time of the i'th message and W_avg is the limit as m goes to infinity of (1/m) Sum_(1 <= i <= m) W_i.
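The identity L_avg = lambda W_avg is easy to watch in a toy slotted queue. The model below (Bernoulli arrivals, geometric service, and the function name) is my own illustration, not taken from [Sti74]:

```python
import random

def littles_law_check(lam=0.3, mu=0.6, slots=200000, seed=42):
    """Toy slotted queue illustrating L_avg = lambda * W_avg.
    Each slot: a message arrives with probability lam; then, if the
    queue is nonempty, the head departs with probability mu.
    Returns (L_avg, lambda_hat * W_avg), which should nearly agree."""
    rng = random.Random(seed)
    queue = []      # arrival slots of messages currently in the system
    area = 0        # running sum over slots of the number in the system
    waits = []
    for t in range(slots):
        if rng.random() < lam:
            queue.append(t)
        area += len(queue)                           # occupants during slot t
        if queue and rng.random() < mu:
            waits.append(t + 1 - queue.pop(0))       # W = departure - arrival
    L_avg = area / slots
    W_avg = sum(waits) / len(waits)
    lam_hat = len(waits) / slots                     # observed arrival rate
    return L_avg, lam_hat * W_avg
```

The two returned quantities agree up to simulation noise and end effects from messages still in the system when the run stops.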

[10] Suppose that each user has a sequence of probabilities p_0, p_1, .... Every time the user receives a message when its queue is empty, it acts like an age-based protocol with the above sequence until either it sends the message or another message arrives (after this, the protocol might possibly do something else). Raghavan and Upfal show that if there are n/2 users with empty queues at time t, then the probability that at least two of them send at step t + n/3 is at least a constant q. Thus, if lambda is larger than 1 - q, then either there are many steps with fewer than n/2 users with empty queues or there are many steps with collisions, and an ever-increasing load.

[11] Goodman, Greenberg, Madras and March note that the stated theorem is

[12] A stopping time T with respect to y(t) is a random variable taking non-negative integers as values for which y(0),...,y(t) determine whether or not T = t.