Featurising Pixels from Dynamic 3D Scenes with Linear In-Context Learners

¹ Google ² TU Munich ³ Munich Center for Machine Learning

✨ CVPR 2026 Oral Presentation ✨

Paper (arXiv) | Supplemental Material (26 MB) | Code

The gist: Linear In-Context Learning (LILA) uses dense cues, such as optical flow and depth, to learn effective pixel-level features from videos. By fitting a linear mapping from features to cues on a context frame, LILA encourages the same mapping to remain valid for a query frame.
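The idea in the gist can be illustrated with a small NumPy sketch. This is not the authors' implementation: the page does not state the solver, regulariser, or loss, so the closed-form ridge regression, the mean-squared error, the function names `fit_linear_map` and `in_context_loss`, and the synthetic data are all assumptions made for illustration. Features and cues are treated as flattened per-pixel arrays.

```python
import numpy as np


def fit_linear_map(features, cues, ridge=1e-3):
    """Closed-form ridge regression (an assumed solver).

    Finds W minimising ||features @ W - cues||^2 + ridge * ||W||^2.
    features: (n_pixels, feat_dim), cues: (n_pixels, cue_dim).
    """
    feat_dim = features.shape[1]
    gram = features.T @ features + ridge * np.eye(feat_dim)
    return np.linalg.solve(gram, features.T @ cues)


def in_context_loss(ctx_feats, ctx_cues, qry_feats, qry_cues, ridge=1e-3):
    """Fit the linear map on the context frame, then score it on the query frame."""
    weights = fit_linear_map(ctx_feats, ctx_cues, ridge)
    prediction = qry_feats @ weights
    return float(np.mean((prediction - qry_cues) ** 2))


# Synthetic stand-ins for per-pixel features and dense cues
# (e.g. 2 optical-flow channels + 1 depth channel; purely illustrative).
rng = np.random.default_rng(0)
n_pixels, feat_dim, cue_dim = 500, 16, 3
true_map = rng.normal(size=(feat_dim, cue_dim))

ctx_feats = rng.normal(size=(n_pixels, feat_dim))
qry_feats = rng.normal(size=(n_pixels, feat_dim))
ctx_cues = ctx_feats @ true_map
qry_cues = qry_feats @ true_map

# Features that relate to the cues in the same way in both frames: low loss.
loss_consistent = in_context_loss(ctx_feats, ctx_cues, qry_feats, qry_cues)

# Query features whose channels are permuted relative to the context frame:
# the map fitted on the context frame no longer transfers, so the loss is high.
loss_inconsistent = in_context_loss(ctx_feats, ctx_cues, qry_feats[:, ::-1], qry_cues)

print(f"consistent: {loss_consistent:.2e}, inconsistent: {loss_inconsistent:.2e}")
```

In a training setup, a loss of this kind would presumably be backpropagated into the feature extractor so that a single linear map fitted on one frame keeps explaining the cues in another; that step is omitted here.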

Preview

Linear In-Context Learning (LILA)

Split-View Comparison

Image frames

LILA (PCA)

Video Segmentation (one-shot linear probing)

Citation

@inproceedings{Araslanov:2025:LILA,
  author = {Araslanov, Nikita and Sundermeyer, Martin and Matsuki, Hidenobu and Tan, David Joseph and Tombari, Federico},
  title = {Featurising Pixels from Dynamic 3D Scenes with Linear In-Context Learners},
  booktitle = {CVPR},
  year = {2026},
}

Nikita Araslanov¹,²,³

Martin Sundermeyer¹

Hidenobu Matsuki¹

David Joseph Tan¹

Federico Tombari¹,²,³
