
Hidden Encoder in Decoder-only generative models

Supervisor

Andrey Kravchenko

Suitable for

MSc in Advanced Computer Science
Computer Science, Part B
Computer Science, Part C

Abstract

Supervised by Andrey Kravchenko (andrey.kravchenko@cs.ox.ac.uk)

Expected background of students and CS techniques that will be applied

The student should know how to work with, or be able to quickly learn, PyTorch, Hugging Face, and IPython notebooks. They should also have a solid mathematical background.


Hidden Encoder in Decoder-only generative models

In decoder-only LMs, do some layers primarily encode context while later layers convert it into next-token predictions? This project will quantify and test that separation. You'll use probes to map a model's "prediction depth" — how the latent guess for the next token sharpens layer by layer — and ask whether early/mid layers look encoder-like (contextual abstraction) while late layers look decoder-like (strong next-token alignment). Prior work shows the raw logit lens often gives brittle, biased readouts, while the tuned lens gives calibrated per-layer distributions; you'll build on those and other methods.
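To make the logit-lens idea concrete: decode each layer's hidden state with the model's final unembedding matrix and measure how sharp the resulting next-token distribution is. The sketch below substitutes random toy tensors for a real model's weights and activations, so all sizes and variable names are illustrative only; in the project, `hiddens` would come from a real model's residual stream and `W_U` from its output embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions standing in for a real decoder-only LM (illustrative sizes).
n_layers, d_model, vocab = 6, 32, 100

# Hidden state of the final token position after each layer (here: random).
hiddens = rng.normal(size=(n_layers, d_model))

# Stand-in for the model's unembedding (output) matrix, shared across layers.
W_U = rng.normal(size=(d_model, vocab)) / np.sqrt(d_model)

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

# Logit lens: decode every intermediate hidden state with the final
# unembedding, yielding one next-token distribution per layer.
per_layer_probs = np.array([softmax(h @ W_U) for h in hiddens])

# Per-layer entropy: a crude signal for how sharp the latent guess is.
# With real activations, this typically falls as depth increases.
entropy = -(per_layer_probs * np.log(per_layer_probs)).sum(axis=1)
```

The tuned lens replaces the direct `h @ W_U` readout with a small learned affine probe per layer, trained so each layer's distribution matches the final one; the per-layer measurement loop stays the same.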

The study will involve popular pre-trained open-source models (e.g., GPT, Llama). Possible criteria for "encoding vs decoding":

  1. Predictivity: per-layer cross-entropy of various probes to the ground-truth next token; the “generation frontier” is where loss drops sharply.
  2. Causality: activation/attribution patching and targeted ablations to see which layers are necessary for (a) reconstructing/encoding context features vs (b) predicting the next token.
  3. Mechanism clues: check for contribution of induction heads (next-token copy/continuation circuits) and MLP key-value memories (stored lexical/semantic knowledge) that typically appear in mid/late blocks—evidence for generation-oriented roles.
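Criterion 1 can be made precise with a simple frontier detector: given per-layer probe losses, find the layer after which cross-entropy drops most sharply. The loss values below are made up purely for illustration; in the project they would come from tuned-lens probes averaged over a held-out corpus.

```python
import numpy as np

# Hypothetical per-layer cross-entropy (nats) of a next-token probe,
# one value per layer, for a single model. Illustrative numbers only.
layer_ce = np.array([6.9, 6.8, 6.5, 5.9, 4.1, 3.2, 3.0, 2.9])

# Layer-to-layer change in loss; negative values mean improvement.
drops = np.diff(layer_ce)

# The "generation frontier": the layer index right after the sharpest drop,
# i.e. where the model's latent next-token guess crystallises.
frontier = int(np.argmin(drops)) + 1
```

On these toy numbers the sharpest drop is between layers 3 and 4, so the frontier lands at layer 4; a robust version would smooth the curve or require the drop to exceed a noise threshold before declaring a frontier.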

Supplement these criteria with representation analyses showing that early-to-mid layers aggregate and abstract context, aligning with broader findings of layer specialization.
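One standard representation analysis is linear CKA, which scores the similarity of two layers' activation matrices and is invariant to rotation and isotropic scaling of the features. A minimal NumPy sketch (function name, shapes, and the random stand-in activations are our own; real activations would come from the model under study):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two (n_samples, n_features) activation matrices."""
    # Center each feature, then compare centered Gram structure:
    # ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F), which lies in [0, 1].
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, ord="fro") ** 2
    den = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return num / den

rng = np.random.default_rng(0)
layer_a = rng.normal(size=(200, 64))  # stand-in: one layer's activations on 200 tokens
layer_b = rng.normal(size=(200, 64))  # stand-in: another layer's activations
```

Tracking CKA between adjacent layers across depth shows where representations change most, which complements the predictivity and patching criteria above: a sharp dip in adjacent-layer similarity near the generation frontier would support the encoder/decoder split.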

Outcomes include a reproducible map of layer roles across models, with: (i) a quantitative prediction-depth metric; (ii) causal evidence for which layers are primarily "encoders" vs "decoders" under your criteria; and (iii) design hints (e.g., where pruning, caching, or routing would least harm "understanding"). Additionally, the project could compare model families and scales, and test whether moving or skipping "middle" layers preserves understanding but hurts generation.