
How Can NVIDIA Lyra 2.0 Create Infinite 3D Worlds from a Single Image? [2026 Latest]

📘 What you'll learn in this article

How NVIDIA Lyra 2.0 turns a single image into an explorable 3D world, and how its AI "3D memory" keeps that world consistent over time instead of breaking down.

manabi AI
Created 2026/5/6 · Updated 2026/5/7
Two Minute Papers: NVIDIA's New AI Turns One Photo Into A World That Never Breaks
📅 Published May 3, 2026

The content of this video is summarized below by AI as key points and learning takeaways.

⚠️ Because this summary is AI-generated, it may not be entirely accurate. Please verify important details against the original video.

🎯 Recommended for

  • Anyone interested in AI-driven 3D world generation
  • Anyone curious how a single photo can become an explorable world
  • Anyone following NVIDIA's latest AI research



Beyond 2D Pixels: How Lyra 2.0 Achieves Object Permanence


The landscape of generative AI has shifted from simple image generation to the creation of entire interactive environments. For a long time, AI models struggled with a fundamental concept that even human toddlers master: object permanence. Early attempts at creating 3D worlds from images often suffered from 'memory loss,' where looking away from a scene and looking back would result in a completely different or corrupted visual. NVIDIA Lyra 2.0 represents a significant leap forward by ensuring that these digital worlds never break, maintaining strict long-term coherence across extended sessions.

Traditional video-based AI models often view the world as a series of 2D pixels on a flat screen. While they can generate stunning visuals, they lack an underlying understanding of 3D geometry. This lack of depth leads to inconsistencies where the 'pixels' don't know where they are supposed to be in space. NVIDIA Lyra 2.0 changes this by integrating a diffusion transformer—similar to the architecture used in Sora—but with a critical addition: a 3D memory component that anchors every visual element to a physical scaffold.
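To make the idea of "anchoring pixels to a physical scaffold" concrete, the standard pinhole-camera unprojection lifts each 2D pixel plus an estimated depth into a 3D point. The sketch below is a generic geometric illustration under that assumption, not Lyra 2.0's actual internal representation; the function name and intrinsics are hypothetical.

```python
import numpy as np

def unproject_to_points(depth, fx, fy, cx, cy):
    """Lift a per-pixel depth map into a 3D point cloud (camera frame).

    Pinhole model: pixel (u, v) with depth d maps to
        x = (u - cx) * d / fx,  y = (v - cy) * d / fy,  z = d
    Once pixels live at 3D coordinates, they "know where they are in space."
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grids, 'xy' indexing
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)  # shape (h, w, 3)

# Toy example: a 2x2 "image" at a constant depth of 2.0
depth = np.full((2, 2), 2.0)
pts = unproject_to_points(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(pts.shape)  # (2, 2, 3)
```

A video model without this step only reasons about the (u, v) plane; attaching a z coordinate is what gives every visual element a fixed place to return to.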

💡

Key insight: The ability to maintain object permanence is the bridge between a 'cool video' and a 'functional simulation environment.'

The importance of this cannot be overstated for industries like robotics and autonomous driving. If a robot is being trained in a simulation that changes its layout every time the robot turns its head, the training becomes useless. Lyra 2.0 provides a stable ground truth. By taking a single street view image, for example, the AI can build a persistent world where a robot can learn safely and effectively without the visual data 'hallucinating' new obstacles or removing existing landmarks.

NVIDIA Lyra 2.0 turns a single static photo into a fully explorable, persistent 3D reality. This transition from generative imagery to generative geometry is what sets this research apart. Dr. Károly Zsolnai-Fehér highlights that this technology can recreate the feeling of visiting one's hometown, like Budapest or Pécs, by transforming old photos into explorable digital twins. The emotional and practical implications of this 'memory-preserving' AI are vast, touching everything from historical preservation to high-end game development.

  • Real-world photos become 3D stages.
  • Object permanence is maintained indefinitely.
  • Simulation data for AI agents becomes higher quality.
  • Computational overhead is managed via smart caching.

The Technical Breakthrough: Per-Frame Scaffolding vs. Global Fusion


The core innovation of Lyra 2.0 lies in how it handles its 3D memory. Most previous research attempted to fuse all visual data into one giant, global 3D world. While this sounds logical, it actually introduces a 'photocopy of a photocopy' effect. Tiny errors in depth estimation or camera positioning accumulate over time. Eventually, these microscopic mistakes pile up until the entire scene becomes warped, noisy, or completely corrupted. This phenomenon, known as error accumulation, has been the primary barrier to infinite AI worlds.
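The "photocopy of a photocopy" effect can be shown with a toy simulation (purely illustrative; the noise model and numbers are assumptions, not Lyra's pipeline): compounding each new estimate on top of the previous one produces random-walk drift, while re-anchoring every estimate to the original clean measurement keeps the error bounded.

```python
import random

random.seed(0)

# Each re-estimation of the scene adds a small random error.
# Global fusion re-integrates the *previous* estimate, so errors compound;
# a per-frame cache always reconstructs from the original clean snapshot.
true_depth = 10.0
fused = true_depth
fused_errors, cached_errors = [], []
for step in range(1000):
    noise = random.gauss(0.0, 0.01)   # tiny per-step estimation error
    fused = fused + noise             # drift accumulates on top of drift
    cached = true_depth + noise       # always anchored to the original
    fused_errors.append(abs(fused - true_depth))
    cached_errors.append(abs(cached - true_depth))

print(f"worst drift, global fusion:  {max(fused_errors):.3f}")
print(f"worst drift, per-frame cache: {max(cached_errors):.3f}")
```

After a thousand steps the fused estimate has wandered far from the truth while the anchored one never exceeds a few hundredths, which is the intuition behind abandoning a single global map.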

Instead of a global map, Lyra 2.0 uses what the researchers call a per-frame 3D geometry cache. Imagine the AI keeping a separate little 3D snapshot for every viewpoint it has ever seen. When the user moves the camera back to a previous location, the AI doesn't try to guess what was there. Instead, it asks, 'Which earlier views saw this place best?' and uses those specific snapshots as its memory source. This ensures that the reconstruction is always based on the cleanest possible data rather than a degraded global model.
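The retrieval logic described above can be sketched as follows. This is a minimal illustration, not NVIDIA's implementation: the class name, the pose-distance heuristic, and the string "snapshots" are all assumptions, and the real system would select source views by visibility of the queried region rather than raw camera-to-camera distance.

```python
import numpy as np

class PerFrameGeometryCache:
    """Hypothetical sketch of a per-frame 3D geometry cache: keep one
    snapshot per viewpoint and, on revisit, retrieve the earlier views
    closest to the query pose instead of fusing everything globally."""

    def __init__(self):
        self.snapshots = []  # list of (camera_position, geometry) pairs

    def add_view(self, camera_position, geometry):
        self.snapshots.append((np.asarray(camera_position, float), geometry))

    def best_views(self, query_position, k=2):
        # "Which earlier views saw this place best?" -- approximated here
        # by distance between camera positions.
        q = np.asarray(query_position, float)
        ranked = sorted(self.snapshots, key=lambda s: np.linalg.norm(s[0] - q))
        return [geom for _, geom in ranked[:k]]

cache = PerFrameGeometryCache()
cache.add_view([0, 0, 0], "snapshot_A")
cache.add_view([5, 0, 0], "snapshot_B")
cache.add_view([1, 0, 0], "snapshot_C")
print(cache.best_views([0.2, 0, 0]))  # → ['snapshot_A', 'snapshot_C']
```

Because each snapshot is stored untouched, returning to a location reconstructs from first-generation data; the degraded-copy chain of a fused global model never forms.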

[Article truncated: 6,781 of 11,377 characters (60%) remain, covering 3 more chapters, an editorial perspective, and an FAQ.]
