Hidden Encoder in Decoder-Only Generative Models
Supervisor
Andrey Kravchenko (andrey.kravchenko@cs.ox.ac.uk)
Suitable for
Expected background of students and CS techniques that will be applied
The student should know, or be able to quickly learn, how to work with PyTorch, Hugging Face, and IPython notebooks. They should also have a solid mathematical background.
Abstract
In decoder-only LMs, do some layers primarily encode context while later layers convert it into next-token predictions?
This project will quantify and test that separation. You will use probes to map a model's "prediction depth", i.e. how the latent guess for the next token sharpens layer by layer, and ask whether early/mid layers look encoder-like (contextual abstraction) while late layers look decoder-like (strong next-token alignment). Prior work shows that the raw logit lens often gives brittle, biased readouts, while the tuned lens gives calibrated per-layer distributions; you will build on those and other methods.
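As a minimal sketch of the logit-lens idea, the snippet below applies a final LayerNorm and an unembedding matrix to each layer's residual-stream vector to get a per-layer next-token distribution, then scores it with cross-entropy. All dimensions, matrices, and the "ground-truth" token here are random stand-ins for illustration, not taken from any real model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions standing in for a real decoder-only LM (hypothetical).
n_layers, d_model, vocab = 6, 16, 50

def layer_norm(h, eps=1e-5):
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Stand-ins for the residual stream after each block (at one token position)
# and for the model's unembedding matrix.
hidden_states = rng.normal(size=(n_layers + 1, d_model))
W_U = rng.normal(size=(d_model, vocab))

def logit_lens(h):
    """Raw logit lens: final LayerNorm + unembedding applied to an intermediate layer."""
    return softmax(layer_norm(h) @ W_U)

per_layer_probs = np.stack([logit_lens(h) for h in hidden_states])

# Per-layer cross-entropy against a hypothetical ground-truth next token;
# a tuned lens would instead learn an affine probe per layer before unembedding.
true_token = 7
per_layer_ce = -np.log(per_layer_probs[:, true_token])
```

A tuned lens replaces the identity readout above with a small learned affine map per layer, which is what makes its per-layer distributions calibrated.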
The study will involve popular pre-trained open-weight models (e.g., GPT-2, Llama). Possible criteria for distinguishing "encoding" from "decoding" layers:
- Predictivity: per-layer cross-entropy of various probes to the ground-truth next token; the “generation frontier” is where loss drops sharply.
- Causality: activation/attribution patching and targeted ablations to see which layers are necessary for (a) reconstructing/encoding context features vs (b) predicting the next token.
- Mechanism clues: check for contribution of induction heads (next-token copy/continuation circuits) and MLP key-value memories (stored lexical/semantic knowledge) that typically appear in mid/late blocks—evidence for generation-oriented roles.
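To make the causality criterion concrete, here is a toy sketch of activation patching, assuming a purely hypothetical "model" in which each layer contributes a linear residual update: cache each layer's update on a clean input, splice it into a run on a corrupted input, and measure how much of the clean-vs-corrupt logit difference is recovered:

```python
import numpy as np

rng = np.random.default_rng(1)
n_layers, d = 4, 8

# Toy residual-stream "model": each layer adds a linear update (hypothetical stand-in).
Ws = [rng.normal(scale=0.3, size=(d, d)) for _ in range(n_layers)]
W_U = rng.normal(size=(d, 2))  # readout logits for two candidate next tokens

def run(x, patch_layer=None, patch_update=None):
    """Forward pass; optionally splice a cached layer update into the run."""
    h = x.copy()
    updates = []
    for l, W in enumerate(Ws):
        u = h @ W
        if l == patch_layer:
            u = patch_update  # activation patching: overwrite this layer's contribution
        h = h + u
        updates.append(u.copy())
    return h @ W_U, updates

clean_x, corrupt_x = rng.normal(size=d), rng.normal(size=d)
clean_logits, clean_updates = run(clean_x)
corrupt_logits, _ = run(corrupt_x)

# Patch each layer's clean update into the corrupted run and measure how much
# of the clean-vs-corrupt logit difference is recovered (1.0 = full recovery).
metric = lambda logits: logits[0] - logits[1]
den = metric(clean_logits) - metric(corrupt_logits)
recovery = []
for l in range(n_layers):
    patched_logits, _ = run(corrupt_x, patch_layer=l, patch_update=clean_updates[l])
    recovery.append((metric(patched_logits) - metric(corrupt_logits)) / den)
```

In a real model the same logic is applied per attention head, MLP block, or token position (e.g., with TransformerLens hooks), and layers whose patches recover most of the metric are the ones causally responsible for the prediction.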
Supplement with representation analyses to show early→mid layers aggregate and abstract context, aligning with broader findings of layer specialization.
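One standard representation analysis for this is layer-to-layer similarity via linear CKA; a minimal sketch, using random matrices as stand-ins for per-layer activations over a batch of tokens:

```python
import numpy as np

rng = np.random.default_rng(2)
n_tokens, d = 64, 16

def linear_cka(X, Y):
    """Linear CKA between two (tokens x features) activation matrices."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, ord='fro') ** 2
    den = np.linalg.norm(X.T @ X, ord='fro') * np.linalg.norm(Y.T @ Y, ord='fro')
    return num / den

# Hypothetical per-layer activations; in practice these come from hooks on the model.
layer_acts = [rng.normal(size=(n_tokens, d)) for _ in range(5)]
sim = np.array([[linear_cka(A, B) for B in layer_acts] for A in layer_acts])
```

Block structure in the resulting similarity matrix (e.g., a contiguous early-to-mid block) is one common signature of layer specialization.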
Outcomes include a reproducible map of layer roles across models, with: (i) a quantitative prediction-depth metric;
(ii) causal evidence for which layers are primarily “encoders” vs
“decoders” under your
criteria; and (iii) design hints (e.g., where pruning, caching, or routing would least harm “understanding”).
Additionally, the project could compare model families and scales, and test whether moving or skipping "middle" layers preserves understanding but hurts generation.