
How Can NVIDIA Lyra 2.0 Create Infinite 3D Worlds from a Single Image? [2026 Latest]

📘 What you'll learn in this article

How NVIDIA Lyra 2.0 turns a single image into an explorable 3D world, and how its AI "3D memory" keeps that world consistent over time instead of breaking down.

manabi AI
Created 2026/5/6 · Updated 2026/5/7
Two Minute Papers: NVIDIA's New AI Turns One Photo Into A World That Never Breaks
📅 Published May 3, 2026

The content of this video is summarized below by AI as key points and learning takeaways.

⚠️ Because this summary is AI-generated, it may not be entirely accurate. Please verify important details against the original video.

🎯 Recommended for

  • Anyone interested in AI-driven 3D world generation
  • Anyone curious how a single photo can become an explorable world
  • Anyone following NVIDIA's latest AI research



Beyond 2D Pixels: How Lyra 2.0 Achieves Object Permanence


The landscape of generative AI has shifted from simple image generation to the creation of entire interactive environments. For a long time, AI models struggled with a fundamental concept that even human toddlers master: object permanence. Early attempts at creating 3D worlds from images often suffered from 'memory loss,' where looking away from a scene and looking back would result in a completely different or corrupted visual. NVIDIA Lyra 2.0 represents a significant leap forward by ensuring that these digital worlds never break, maintaining strict long-term coherence across extended sessions.

Traditional video-based AI models often view the world as a series of 2D pixels on a flat screen. While they can generate stunning visuals, they lack an underlying understanding of 3D geometry. This lack of depth leads to inconsistencies where the 'pixels' don't know where they are supposed to be in space. NVIDIA Lyra 2.0 changes this by integrating a diffusion transformer—similar to the architecture used in Sora—but with a critical addition: a 3D memory component that anchors every visual element to a physical scaffold.
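To make the idea of "anchoring pixels to a physical scaffold" concrete, the standard pinhole-camera unprojection lifts each 2D pixel plus an estimated depth into a 3D point. The sketch below is a generic geometric illustration under that assumption, not Lyra 2.0's actual internal representation; the function name and intrinsics are hypothetical.

```python
import numpy as np

def unproject_to_points(depth, fx, fy, cx, cy):
    """Lift a per-pixel depth map into a 3D point cloud (camera frame).

    Pinhole model: pixel (u, v) with depth d maps to
        x = (u - cx) * d / fx,  y = (v - cy) * d / fy,  z = d
    Once pixels live at 3D coordinates, they "know where they are in space."
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grids, 'xy' indexing
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)  # shape (h, w, 3)

# Toy example: a 2x2 "image" at a constant depth of 2.0
depth = np.full((2, 2), 2.0)
pts = unproject_to_points(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(pts.shape)  # (2, 2, 3)
```

A video model without this step only reasons about the (u, v) plane; attaching a z coordinate is what gives every visual element a fixed place to return to.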

💡

Key insight: The ability to maintain object permanence is the bridge between a 'cool video' and a 'functional simulation environment.'

The importance of this cannot be overstated for industries like robotics and autonomous driving. If a robot is being trained in a simulation that changes its layout every time the robot turns its head, the training becomes useless. Lyra 2.0 provides a stable ground truth. By taking a single street view image, for example, the AI can build a persistent world where a robot can learn safely and effectively without the visual data 'hallucinating' new obstacles or removing existing landmarks.

NVIDIA Lyra 2.0 turns a single static photo into a fully explorable, persistent 3D reality. This transition from generative imagery to generative geometry is what sets this research apart. Dr. Károly Zsolnai-Fehér highlights that this technology can recreate the feeling of visiting one's hometown, like Budapest or Pécs, by transforming old photos into explorable digital twins. The emotional and practical implications of this 'memory-preserving' AI are vast, touching everything from historical preservation to high-end game development.

  • Real-world photos become 3D stages.
  • Object permanence is maintained indefinitely.
  • Simulation data for AI agents becomes higher quality.
  • Computational overhead is managed via smart caching.

The Technical Breakthrough: Per-Frame Scaffolding vs. Global Fusion


The core innovation of Lyra 2.0 lies in how it handles its 3D memory. Most previous research attempted to fuse all visual data into one giant, global 3D world. While this sounds logical, it actually introduces a 'photocopy of a photocopy' effect. Tiny errors in depth estimation or camera positioning accumulate over time. Eventually, these microscopic mistakes pile up until the entire scene becomes warped, noisy, or completely corrupted. This phenomenon, known as error accumulation, has been the primary barrier to infinite AI worlds.
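The "photocopy of a photocopy" effect can be shown with a toy simulation (purely illustrative; the noise model and numbers are assumptions, not Lyra's pipeline): compounding each new estimate on top of the previous one produces random-walk drift, while re-anchoring every estimate to the original clean measurement keeps the error bounded.

```python
import random

random.seed(0)

# Each re-estimation of the scene adds a small random error.
# Global fusion re-integrates the *previous* estimate, so errors compound;
# a per-frame cache always reconstructs from the original clean snapshot.
true_depth = 10.0
fused = true_depth
fused_errors, cached_errors = [], []
for step in range(1000):
    noise = random.gauss(0.0, 0.01)   # tiny per-step estimation error
    fused = fused + noise             # drift accumulates on top of drift
    cached = true_depth + noise       # always anchored to the original
    fused_errors.append(abs(fused - true_depth))
    cached_errors.append(abs(cached - true_depth))

print(f"worst drift, global fusion:  {max(fused_errors):.3f}")
print(f"worst drift, per-frame cache: {max(cached_errors):.3f}")
```

After a thousand steps the fused estimate has wandered far from the truth while the anchored one never exceeds a few hundredths, which is the intuition behind abandoning a single global map.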

Instead of a global map, Lyra 2.0 uses what the researchers call a per-frame 3D geometry cache. Imagine the AI keeping a separate little 3D snapshot for every viewpoint it has ever seen. When the user moves the camera back to a previous location, the AI doesn't try to guess what was there. Instead, it asks, 'Which earlier views saw this place best?' and uses those specific snapshots as its memory source. This ensures that the reconstruction is always based on the cleanest possible data rather than a degraded global model.
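The retrieval logic described above can be sketched as follows. This is a minimal illustration, not NVIDIA's implementation: the class name, the pose-distance heuristic, and the string "snapshots" are all assumptions, and the real system would select source views by visibility of the queried region rather than raw camera-to-camera distance.

```python
import numpy as np

class PerFrameGeometryCache:
    """Hypothetical sketch of a per-frame 3D geometry cache: keep one
    snapshot per viewpoint and, on revisit, retrieve the earlier views
    closest to the query pose instead of fusing everything globally."""

    def __init__(self):
        self.snapshots = []  # list of (camera_position, geometry) pairs

    def add_view(self, camera_position, geometry):
        self.snapshots.append((np.asarray(camera_position, float), geometry))

    def best_views(self, query_position, k=2):
        # "Which earlier views saw this place best?" -- approximated here
        # by distance between camera positions.
        q = np.asarray(query_position, float)
        ranked = sorted(self.snapshots, key=lambda s: np.linalg.norm(s[0] - q))
        return [geom for _, geom in ranked[:k]]

cache = PerFrameGeometryCache()
cache.add_view([0, 0, 0], "snapshot_A")
cache.add_view([5, 0, 0], "snapshot_B")
cache.add_view([1, 0, 0], "snapshot_C")
print(cache.best_views([0.2, 0, 0]))  # → ['snapshot_A', 'snapshot_C']
```

Because each snapshot is stored untouched, returning to a location reconstructs from first-generation data; the degraded-copy chain of a fused global model never forms.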

[Article truncated: 6,781 of 11,377 characters (60%) remain, covering 3 more chapters, an editorial perspective, and an FAQ.]
