MBZUAI’s New Model Builds Worlds That Remember Your Moves
Unlike existing AI video generation models, PAN simulates what happens next, and how the world evolves when you take an action.
[Image source: Krishna Prasad/MITSMR Middle East]
Imagine you’re navigating a virtual city. You tell the system: “Turn left at the river, head toward the stadium lights.” The scene shifts accordingly—buildings morph, vehicles change lanes, streetlights glow—and this world remembers where you came from and where you’re going. That’s the promise of the new paper from researchers at the Institute of Foundation Models (IFM) of Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), titled “PAN: An Interactive General World Model Capable of Long-Horizon Simulation.”
In simpler terms, the team has built a model that does more than generate one flashy video clip when prompted: it simulates what happens next and how the world evolves when you take actions. Previous models might show a drone flying over a scene if you ask for "a drone flying over a city at sunset," but they typically lose track of the world when you ask what happens five or ten steps later. PAN lets you issue a sequence of instructions, such as "drive around the corner, turn right, stop at the red car," and it keeps the internal state of the scene stable and consistent as you progress.
The key lies in how PAN is structured. Instead of directly predicting what every pixel will look like next, the model builds and evolves a latent "state" of the world (what is in the scene, how its parts are moving, how they relate to one another), and then uses that state to generate the next short video segment. This two-stage process, reasoning in latent space and then rendering, helps the model stay coherent over longer stretches of time, which the authors call "long-horizon simulation."
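For readers who think in code, a rough sketch of that two-stage loop might look like the following. This is an illustrative assumption, not PAN's actual architecture or API: the class names (`LatentDynamics`, `SegmentRenderer`), dimensions, and the simple GRU update stand in for whatever the paper's components actually are.

```python
# Hypothetical sketch of a two-stage world-model loop: a latent world state is
# updated from the current state and a user action, then a separate decoder
# renders the next short video segment from that state. Names and shapes are
# illustrative assumptions, not PAN's implementation.

from dataclasses import dataclass
from typing import List

import torch
import torch.nn as nn


@dataclass
class Frame:
    pixels: torch.Tensor  # (3, H, W) RGB frame


class LatentDynamics(nn.Module):
    """Evolves a latent 'state of the world' given an encoded action."""

    def __init__(self, state_dim: int = 512, action_dim: int = 128):
        super().__init__()
        self.update = nn.GRUCell(action_dim, state_dim)

    def forward(self, state: torch.Tensor, action_emb: torch.Tensor) -> torch.Tensor:
        return self.update(action_emb, state)


class SegmentRenderer(nn.Module):
    """Decodes a latent state into a short clip of frames (stand-in for a video decoder)."""

    def __init__(self, state_dim: int = 512, frames_per_segment: int = 8, hw: int = 64):
        super().__init__()
        self.frames_per_segment = frames_per_segment
        self.hw = hw
        self.to_pixels = nn.Linear(state_dim, frames_per_segment * 3 * hw * hw)

    def forward(self, state: torch.Tensor) -> List[Frame]:
        flat = self.to_pixels(state)
        clip = flat.view(self.frames_per_segment, 3, self.hw, self.hw)
        return [Frame(pixels=f) for f in clip]


def rollout(actions: List[torch.Tensor],
            dynamics: LatentDynamics,
            renderer: SegmentRenderer,
            init_state: torch.Tensor) -> List[Frame]:
    """Reason in latent space first, then render: the state carries history forward."""
    state, video = init_state, []
    for action_emb in actions:
        state = dynamics(state, action_emb)  # update the internal world state
        video.extend(renderer(state))        # render the next short segment
    return video
```

The point of the sketch is the separation of concerns: the compact latent state is what remembers where you came from, while the renderer only has to draw the next few seconds.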
This matters because it brings users closer to an interactive, meaningful simulation of environments and agents, one that can be useful in robotics, planning, virtual training, or any scenario where someone wants to "play out" the consequences of actions rather than settle for a one-shot illusion.
The authors of the paper stress that for real-world utility, you need more than crisp frames: you need action simulation fidelity (does the action you asked for take place?), long-horizon stability (does the simulation hold together over many steps?), and simulative planning ability (can you use the model as a sandbox to test strategies?). PAN reportedly leads open-source models on those metrics.
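To make the third criterion concrete, here is a hypothetical illustration of simulative planning: candidate action sequences are played out inside the world model and scored, and the best-scoring plan is kept. The function name, the `dynamics` interface (borrowed from the sketch above), and the scoring callback are assumptions for illustration, not part of the paper.

```python
# Hypothetical "model as sandbox" planning loop: simulate each candidate plan in
# latent space, evaluate the imagined end state, and return the best plan.

from typing import Callable, List, Sequence

import torch


def plan_by_simulation(candidate_plans: Sequence[List[torch.Tensor]],
                       dynamics,                      # e.g. the LatentDynamics sketch above
                       score: Callable[[torch.Tensor], float],
                       init_state: torch.Tensor) -> List[torch.Tensor]:
    """Roll each candidate plan forward inside the model and keep the best one."""
    best_plan, best_score = None, float("-inf")
    for plan in candidate_plans:
        state = init_state
        for action_emb in plan:
            state = dynamics(state, action_emb)  # simulate, don't act in the real world
        value = score(state)                     # evaluate the imagined outcome
        if value > best_score:
            best_plan, best_score = plan, value
    return best_plan
```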
Like all generative systems, it still has room to improve: rendering quality, realism, and edge-case behaviour remain challenging. As an interactive world builder that can support planning, exploration, and decision-making, however, it has the potential to become a modest but meaningful tool.
