Interaction, Memory, and Efficiency
World models are systems that predict the next state of the world from its past states and the actions taken within it. Among the various approaches, Video World Models built on interactive video generation are particularly promising, owing to the photorealism and scalability of video data and to recent advances in video generation.
However, current video generation models still face significant challenges before they can serve as an ideal Video World Model. This workshop aims to provide a platform for researchers from academia and industry to discuss and address these challenges, foster collaboration, advance related research, and promote the practical application of Video World Models.
- Interaction: effective interaction with virtual worlds, including navigation and object manipulation
- Memory: maintaining consistency over long video sequences with causal reasoning
- Efficiency: real-time, high-quality video generation that addresses throughput and latency
- Applications: robotics, embodied AI, autonomous driving, and more
| Event | Time |
|---|---|
| Opening Remarks | 8:30 AM - 8:35 AM |
| Invited Talk #1 (20 min + 5 min Q&A) | 8:35 AM - 9:00 AM |
| Invited Talk #2 (20 min + 5 min Q&A) | 9:00 AM - 9:25 AM |
| Invited Talk #3 (20 min + 5 min Q&A) | 9:25 AM - 9:50 AM |
| Poster Session + Coffee Break + Best Paper Award | 9:50 AM - 10:40 AM |
| Invited Talk #4 (20 min + 5 min Q&A) | 10:40 AM - 11:05 AM |
| Invited Talk #5 (20 min + 5 min Q&A) | 11:05 AM - 11:30 AM |
| Invited Talk #6 (20 min + 5 min Q&A) | 11:30 AM - 11:55 AM |
| Closing Remarks | 11:55 AM - 12:00 PM |
Submissions must present original, unpublished research. Manuscripts should be 4–8 pages (excluding references) using the CVPR 2026 template. Accepted papers will be published in the CVPR 2026 Workshop Proceedings.
A flexible, non-archival venue for sharing a broad range of contributions without restrictive publishing constraints, formatting requirements, or page limits. We warmly welcome:
For any questions, please contact us at: