TerraFlow

Multimodal, Multitemporal Representation Learning for Earth Observation

TerraFlow: Multimodal, Multitemporal Representation Learning for Earth Observation

We propose TerraFlow, a novel approach to multimodal, multitemporal learning for Earth observation. TerraFlow builds on temporal training objectives that enable sequence-aware learning across space, time, and modality, while remaining robust to the variable-length inputs commonly encountered in real-world Earth observation data.

Key Features

  • Multimodal & Multitemporal Foundation Model Jointly learns from multiple EO modalities (e.g., optical, SAR, DEM) and temporal sequences via early temporal fusion in a unified transformer.

  • Explicit Temporal Pretraining Incorporates temporal attention with rotary positional embeddings (RoPE) to model relative time differences and handle irregular, variable-length time series.

  • Temporal Disjoint Sampling (TDS) A novel training objective that enforces true temporal reasoning by separating input and target timestamps, encouraging learning of dynamics rather than single-timestep shortcuts.

  • Strong, Parameter-Efficient Performance Consistently outperforms state-of-the-art EO foundation models on GEO-Bench-2 temporal tasks, with small models rivaling much larger baselines.

References