
Quantizing Video Mamba: Robust Streaming Vision on Low-Precision Optical Hardware

Supervisor

Suitable for

MSc in Advanced Computer Science
Mathematics and Computer Science, Part C
Computer Science and Philosophy, Part C
Computer Science, Part C

Abstract

Lumai is developing a 3D optical AI accelerator capable of executing matrix–vector operations with significantly higher energy efficiency than conventional digital hardware such as Nvidia GPUs. Instead of relying on digital parallelism, computation is performed through optical dataflow, enabling extremely high throughput at low energy cost.

However, this architecture imposes two critical constraints that current AI models (like Vision Transformers) are not natively designed for:

  1. Streaming Computation
    Data flows continuously through the processor, with limited random access to past activations or memory. Architectures such as state-space models (e.g., Mamba) are therefore better aligned with the hardware than architectures that rely on full attention and key–value caches, such as Transformers (a minimal code sketch follows this list).

  2. Extreme Low Precision (Int4) + Analog Noise
    To maximize optical efficiency, weights are represented in 4-bit integer format, and computations incur analog noise and non-ideal signal propagation. Many current deep learning models assume FP32/BF16 precision and can degrade significantly under such constraints.
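
To make the streaming constraint concrete, here is a minimal sketch (illustrative shapes and a time-invariant recurrence, not actual Mamba code, which uses input-dependent "selective" parameters) of a state-space layer processing video one frame at a time. The only memory carried between frames is a fixed-size hidden state, whereas full attention would need a key–value cache that grows with the number of frames.

```python
# Minimal sketch of a streaming state-space recurrence: each frame updates a
# fixed-size hidden state, so memory stays O(1) in the number of frames,
# unlike an attention key-value cache that keeps growing.
import torch


class StreamingSSM(torch.nn.Module):
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        # Hypothetical time-invariant diagonal parameters; real Mamba layers
        # make these input-dependent ("selective"), which this sketch omits.
        self.log_decay = torch.nn.Parameter(-torch.rand(d_model, d_state))
        self.B = torch.nn.Parameter(torch.randn(d_model, d_state) * 0.1)
        self.C = torch.nn.Parameter(torch.randn(d_model, d_state) * 0.1)

    def step(self, x_t, h):
        # x_t: (batch, d_model) features of one frame; h: (batch, d_model, d_state).
        h = torch.exp(self.log_decay) * h + self.B * x_t.unsqueeze(-1)
        y_t = (h * self.C).sum(dim=-1)
        return y_t, h


ssm = StreamingSSM(d_model=64)
h = torch.zeros(1, 64, 16)                # the only state carried across frames
for x_t in torch.randn(100, 1, 64):       # e.g. features of 100 video frames
    y_t, h = ssm.step(x_t, h)             # no cache of past activations needed
```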

We are seeking Master's students to explore the algorithmic, software, and hardware co-design challenges associated with this architecture. Given the novelty of running state-space models on optical hardware, these projects have strong potential to lead to publishable research.

Project 1: The Algorithmic Research Path

Title: Quantizing Video Mamba: Robust Streaming Vision on Low-Precision Optical Hardware

State-of-the-art Video Mamba models are trained in high precision (FP32/BF16). If we compress their weights to Int4 and inject analog noise (simulating optical physics), the model's recurrent state may "drift," causing hallucinations that worsen over time.

Core Objectives:

  • Establish the Baseline: Configure a VideoMamba (or similar) model for "Streaming Inference," processing video one frame at a time using purely recurrent states.
  • Build the "Ghost Hardware" Simulator: Implement a custom PyTorch OpticalLinear layer (sketched after this list) that:
    • Quantizes weights to Int4.
    • Injects activation noise.
  • Outlier Suppression: Analyze "activation outliers" (large-magnitude channels that resist low-bit quantization), then implement and evaluate rotation or calibration techniques that smooth these outliers before they reach the optical bottleneck (a rotation sketch appears after the deliverable below).
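
A minimal sketch of the "Ghost Hardware" layer is given below, assuming a symmetric per-row Int4 fake quantizer and a simple additive Gaussian noise model; both are placeholder assumptions rather than Lumai's actual device characteristics.

```python
# Minimal sketch of the "ghost hardware" layer: a drop-in replacement for
# nn.Linear that fake-quantizes its weights to Int4 and injects Gaussian
# noise on the output. Quantizer and noise model are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OpticalLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, noise_std: float = 0.01):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.noise_std = noise_std

    def quantize_int4(self, w: torch.Tensor) -> torch.Tensor:
        # Symmetric per-row quantization to the signed 4-bit range [-8, 7],
        # then dequantization so the rest of the model keeps running in float.
        scale = (w.abs().amax(dim=1, keepdim=True) / 7.0).clamp(min=1e-8)
        return torch.clamp(torch.round(w / scale), -8, 7) * scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = F.linear(x, self.quantize_int4(self.weight), self.bias)
        # Additive Gaussian noise as a stand-in for analog non-idealities,
        # scaled to the typical output magnitude so it tracks the signal level.
        return y + torch.randn_like(y) * self.noise_std * y.detach().abs().mean()
```

In practice such a layer would be swapped in for the linear projections of a pretrained VideoMamba checkpoint, so that accuracy can be measured as a function of the noise level and quantization granularity.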

Deliverable: An "Optical-friendly VideoMamba" model that maintains performance comparable to the full-precision baseline on standard benchmarks while running under simulated 4-bit optical constraints.
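
For the outlier-suppression objective, the sketch below illustrates the core idea behind rotation-based techniques: rotate the activations by an orthogonal matrix and fold the same rotation into the weight's input dimension, which leaves the layer's output unchanged in exact arithmetic while spreading a large outlier channel across many channels, so that low-bit quantization loses less information. The sizes, the QR-based rotation, and the toy outlier are illustrative choices; practical methods typically use cheap Hadamard rotations.

```python
# Minimal sketch of rotation-based outlier suppression.
import torch

torch.manual_seed(0)
d_in, d_out = 64, 32
W = torch.randn(d_out, d_in)                  # weight of some linear layer
x = torch.randn(1, d_in)
x[0, 3] = 50.0                                # an artificial activation outlier

Q, _ = torch.linalg.qr(torch.randn(d_in, d_in))   # random orthogonal matrix

x_rot = x @ Q                                 # rotate the activations
W_rot = W @ Q                                 # fold the rotation into the weights

# The layer's output is preserved up to floating-point error ...
print(torch.allclose(x @ W.T, x_rot @ W_rot.T, atol=1e-3))
# ... while the worst-case activation magnitude typically drops sharply,
# which makes Int4 activation quantization far less lossy.
print(x.abs().max().item(), x_rot.abs().max().item())
```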