Featurising Pixels from Dynamic 3D Scenes with Linear In-Context Learners


1 Google       2 TU Munich       3 Munich Center for Machine Learning

✨ CVPR 2026 Oral Presentation ✨


Paper (arXiv)       Supplemental Material (26MB)       Code

The gist: Linear In-Context Learning (LILA) uses dense cues, such as optical flow and depth, to learn effective pixel-level features from videos. LILA fits a linear mapping from features to cues on a context frame and encourages the same mapping to remain valid on a query frame.
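
The idea of fitting on a context frame and checking on a query frame can be sketched in a few lines of NumPy. This is only an illustrative toy under stated assumptions, not the authors' implementation: the closed-form ridge solver, the regularisation weight `ridge`, the feature dimension (16), the three cue channels (two for flow, one for depth), and the names `fit_linear_map` and `in_context_error` are all hypothetical choices made for this example.

```python
import numpy as np


def fit_linear_map(features, cues, ridge=1e-3):
    """Closed-form ridge regression from features (N, D) to cues (N, C).

    Assumption: a ridge solver is used here for numerical stability;
    the page only states that the mapping is linear.
    """
    dim = features.shape[1]
    gram = features.T @ features + ridge * np.eye(dim)
    return np.linalg.solve(gram, features.T @ cues)  # shape (D, C)


def in_context_error(ctx_feats, ctx_cues, qry_feats, qry_cues, ridge=1e-3):
    """Fit the map on the context frame, then measure MSE on the query frame."""
    weights = fit_linear_map(ctx_feats, ctx_cues, ridge)
    residual = qry_feats @ weights - qry_cues
    return float(np.mean(residual ** 2))


# Toy data: 500 pixels per frame, 16-D features, 3 cue channels.
rng = np.random.default_rng(0)
true_map = rng.normal(size=(16, 3))
ctx_feats = rng.normal(size=(500, 16))
qry_feats = rng.normal(size=(500, 16))
ctx_cues = ctx_feats @ true_map
qry_cues = qry_feats @ true_map

# Both frames share one linear feature-to-cue relation, so the error is tiny.
error = in_context_error(ctx_feats, ctx_cues, qry_feats, qry_cues)

# If the query frame obeyed a different relation, the error would be large.
mismatch = in_context_error(ctx_feats, ctx_cues, qry_feats, -qry_cues)
print(error, mismatch)
```

In a training setup, an error of this kind could plausibly act as the loss that shapes the features, with gradients flowing through the solve; that part requires an autodiff framework and is not shown here.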


Linear In-Context Learning (LILA)


Split-View Comparison

Image frames vs. LILA (PCA)

Video Segmentation (one-shot linear probing)
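
As a rough illustration of what one-shot linear probing usually means, the sketch below fits a linear classifier on the per-pixel features of a single annotated frame and applies it unchanged to another frame. It is a toy under assumptions, not the evaluation protocol of the paper: the least-squares fit to one-hot targets, the synthetic features, and the names `fit_probe` and `predict` are hypothetical.

```python
import numpy as np


def fit_probe(feats, labels, num_classes, ridge=1e-3):
    """Least-squares linear probe from features (N, D) to one-hot labels (N, K)."""
    targets = np.eye(num_classes)[labels]
    gram = feats.T @ feats + ridge * np.eye(feats.shape[1])
    return np.linalg.solve(gram, feats.T @ targets)  # shape (D, K)


def predict(feats, probe):
    """Assign each pixel the class with the highest linear score."""
    return np.argmax(feats @ probe, axis=1)


# Synthetic stand-in for frozen pixel features: each class has a prototype
# vector, and pixels are noisy copies of their class prototype.
rng = np.random.default_rng(0)
num_classes, dim = 3, 16
prototypes = 3.0 * rng.normal(size=(num_classes, dim))


def make_frame(num_pixels):
    labels = rng.integers(0, num_classes, size=num_pixels)
    feats = prototypes[labels] + 0.1 * rng.normal(size=(num_pixels, dim))
    return feats, labels


# One annotated frame trains the probe; a later frame is only predicted.
first_feats, first_labels = make_frame(300)
later_feats, later_labels = make_frame(300)

probe = fit_probe(first_feats, first_labels, num_classes)
accuracy = float(np.mean(predict(later_feats, probe) == later_labels))
print(accuracy)
```

Because only a linear layer is fitted on one frame, segmentation quality on later frames in such a setup depends almost entirely on how well the underlying pixel features separate the objects.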


Citation

@inproceedings{Araslanov:2025:LILA,
  author = {Araslanov, Nikita and Sundermeyer, Martin and Matsuki, Hidenobu and Tan, David Joseph and Tombari, Federico},
  title = {Featurising Pixels from Dynamic 3D Scenes with Linear In-Context Learners},
  booktitle = {CVPR},
  year = {2026},
}