AI Weekly

Frontier Content Roundup

2026-03-24

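Week 13

💡 One-Sentence Summary

AI is moving from perception toward cognition and embodiment, and multimodal world models and safety are becoming key to achieving AGI.

AI Development Roadmap

Week 10

In progress...

Week 11

In progress...

Week 12

**Multimodal World Models and Embodied Intelligence**

Week 13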

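**Multimodal World Models and Embodied Intelligence** **Analysis:** This week's frontier AI content shows several interrelated key trends that together point to one main direction for future AI development: building systems that can understand the complex world and ...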

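Week 14

Embodied Intelligence and Multimodal Fusion

Future direction

💫 This line traces how AI development has evolved. Each node represents one week's core direction; by following how the line changes, you can see how AI has progressed from past directions to the present, and where it may head next.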

📻 Lex Fridman Podcast

#495 – Vikings, Ragnar, Berserkers, Valhalla & the Warriors of the Viking Age

2026/4/9

Lars Brownworth is a historian, teacher, podcaster, and author specializing in Viking history, medieval Europe, and the Byzantine Empire.

https://lexfridman.com/sponsors/ep495-sc
Transcript: https://lexfridman.com/lars-brownworth-transcript

CONTACT LEX:
Feedback – give feedback to Lex: https://lexfridman.com/survey
AMA – submit questions, videos or call-in: https://lexfridman.com/ama
Hiring – join our team: https://lexfridman.com/hiring
Other – other ways to get in touch: https://lexfridman.com/contact

EPISODE LINKS:
https://larsbrownworth.com/
https://www.amazon.com/Sea-Wolves-History-Vikings/dp/1909979120
https://amzn.to/4sHY0xw
https://12byzantinerulers.com/
https://apple.co/4sgSxNi

SPONSORS:
Larridin: Measure AI adoption in your business. https://larridin.com
BetterHelp: Online therapy and counseling. https://betterhelp.com/lex
LMNT: Zero-sugar electrolyte drink mix. https://drinkLMNT.com/lex
Fin: AI agent for customer service. https://fin.ai/lex
Shopify: Sell stuff online. https://shopify.com/lex
Perplexity: AI-powered answer engine. https://perplexity.ai/

PODCAST LINKS:
https://lexfridman.com/podcast
https://apple.co/2lwqZIr
https://spoti.fi/2nEwCF8
https://lexfridman.com/feed/podcast/
https://www.youtube.com/playlist?list=PLrAXtmErZgOdP_8GztsuKi9nrraNbKKp4
https://www.youtube.com/lexclips

#494 – Jensen Huang: NVIDIA – The $4 Trillion Company & the AI Revolution

2026/3/23

Jensen Huang is the co-founder and CEO of NVIDIA, the world’s most valuable company and the engine powering the AI computing revolution.

https://lexfridman.com/sponsors/ep494-sc
Transcript: https://lexfridman.com/jensen-huang-transcript

EPISODE LINKS:
https://nvidia.com
https://x.com/nvidia
https://x.com/NVIDIAAI
https://youtube.com/@nvidia
https://www.instagram.com/nvidia/
https://www.linkedin.com/company/nvidia/
https://www.facebook.com/NVIDIA/
https://github.com/NVIDIA
https://developer.nvidia.com/nemotron

SPONSORS:
Perplexity: AI-powered answer engine. https://perplexity.ai/
Shopify: Sell stuff online. https://shopify.com/lex
LMNT: Zero-sugar electrolyte drink mix. https://drinkLMNT.com/lex
Fin: AI agent for customer service. https://fin.ai/lex
Quo: Phone system (calls, texts, contacts) for businesses. https://quo.com/lex

#493 – Jeff Kaplan: World of Warcraft, Overwatch, Blizzard, and Future of Gaming

2026/3/11

Jeff Kaplan is a legendary Blizzard game designer of World of Warcraft and Overwatch, now preparing to launch a new game, The Legend of California, from his new studio Kintsugiyama – available to wishlist on Steam today, with alpha later in March.

https://lexfridman.com/sponsors/ep493-sc

EPISODE LINKS:
https://store.steampowered.com/app/2550530/The_Legend_of_California
https://www.kintsugiyama.com/

SPONSORS:
Fin: AI agent for customer service. https://fin.ai/lex
Blitzy: AI agent for large enterprise codebases. https://blitzy.com/lex
BetterHelp: Online therapy and counseling. https://betterhelp.com/lex
Shopify: Sell stuff online. https://shopify.com/lex
CodeRabbit: AI-powered code reviews. https://coderabbit.ai/lex
Perplexity: AI-powered answer engine. https://perplexity.ai/

#492 – Rick Beato: Greatest Guitarists of All Time, History & Future of Music

2026/3/1

Rick Beato is a music educator, interviewer, producer, songwriter, and a true multi-instrument musician, playing guitar, bass, cello & piano. His incredible YouTube channel celebrates great musicians & musical ideas, and helps millions of people fall in love with great music all over again.

https://lexfridman.com/sponsors/ep492-sc
Transcript: https://lexfridman.com/rick-beato-transcript

EPISODE LINKS:
https://youtube.com/RickBeato
https://x.com/rickbeato
https://instagram.com/rickbeato1
https://rickbeato.com
https://beatoeartraining.com
https://beatobook.com

SPONSORS:
UPLIFT Desk: Standing desks and office ergonomics. https://upliftdesk.com/lex
BetterHelp: Online therapy and counseling. https://betterhelp.com/lex
LMNT: Zero-sugar electrolyte drink mix. https://drinkLMNT.com/lex
Fin: AI agent for customer service. https://fin.ai/lex
Shopify: Sell stuff online. https://shopify.com/lex
Perplexity: AI-powered answer engine. https://perplexity.ai/

#491 – OpenClaw: The Viral AI Agent that Broke the Internet – Peter Steinberger

2026/2/12

Peter Steinberger is the creator of OpenClaw, an open-source AI agent framework that’s the fastest-growing project in GitHub history.

https://lexfridman.com/sponsors/ep491-sc
Transcript: https://lexfridman.com/peter-steinberger-transcript

EPISODE LINKS:
https://x.com/steipete
https://github.com/steipete
https://steipete.com
https://www.linkedin.com/in/steipete
https://openclaw.ai
https://github.com/openclaw/openclaw
https://discord.gg/openclaw

SPONSORS:
Perplexity: AI-powered answer engine. https://perplexity.ai/
Quo: Phone system (calls, texts, contacts) for businesses. https://quo.com/lex
CodeRabbit: AI-powered code reviews. https://coderabbit.ai/lex
Fin: AI agent for customer service. https://fin.ai/lex
Blitzy: AI agent for large enterprise codebases. https://blitzy.com/lex
Shopify: Sell stuff online. https://shopify.com/lex
LMNT: Zero-sugar electrolyte drink mix. https://drinkLMNT.com/lex

OUTLINE:
(00:00) – Introduction
(03:51) – Sponsors, Comments, and Reflections
(15:29) – OpenClaw origin story
(18:48) – Mind-blowing moment
(28:15) – Why OpenClaw went viral
(32:12) – Self-modifying AI agent
(36:57) – Name-change drama
(54:07) – Moltbook saga
(1:02:26) – OpenClaw security concerns
(1:11:07) – How to code with AI agents
(1:42:02) – Programming setup
(1:48:45) – GPT Codex 5.3 vs Claude Opus 4.6
(1:57:52) – Best AI agent for programming
(2:19:52) – Life story and career advice
(2:23:49) – Money and happiness
(2:27:41) – Acquisition offers from OpenAI and Meta
(2:44:51) – How OpenClaw works
(2:56:09) – AI slop
(3:02:13) – AI agents will replace 80% of apps
(3:10:50) – Will AI replace programmers?
(3:22:50) – Future of OpenClaw community


This Week's Top 3

1
📰TechCrunch AI

📖 Retrieval-augmented generation: giving AI access to external knowledge bases

Mirage raises $75M to continue building models for its AI video editing app Captions

Mirage, the maker of video editing app Captions, has raised $75 million in growth financing from General Catalyst's Customer Value Fund (CVF)....

2
📰TechCrunch AI

📰 Industry news: the latest AI development trends

Agile Robots becomes the latest robotics company to partner with Google DeepMind

Agile Robots will incorporate Google DeepMind's robotics foundation models into its bots while collecting data for the AI research lab....

3
📰TechCrunch AI

📰 Industry news: the latest AI development trends

Air Street becomes one of the largest solo VCs in Europe with $232M fund

London's Air Street Capital has raised a large Fund III with eyes locked on backing early-stage European and North American AI companies....

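This Week's Trending Terms

video, training, language, reasoning, safety

This Week's Core AI Insights

This week the AI field showed three clear trends: accelerating multimodal fusion, growing attention to AI safety, and a thriving agent ecosystem. Multimodal technology was the focus of the week: Mirage, the maker of the AI video editing app Captions, raised $75 million. OpenAI emphasized the safety work behind its Sora model. Academia followed suit, with several papers exploring deeper multimodal integration. As AI capabilities grow, safety concerns are becoming more prominent: OpenAI uses chain-of-thought monitoring to detect and mitigate misalignment risks in its internal coding agents. The agent concept was widely discussed this week and saw concrete implementations; trending GitHub projects such as MoneyPrinterV2, TradingAgents, and MiroFish all revolve around agents.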

AI News

Mirage raises $75M to continue building models for its AI video editing app Captions

Source: TechCrunch AI

Mirage, the maker of video editing app Captions, has raised $75 million in growth financing from General Catalyst's Customer Value Fund (CVF)....

#ComputerVision

Agile Robots becomes the latest robotics company to partner with Google DeepMind

Source: TechCrunch AI

Agile Robots will incorporate Google DeepMind's robotics foundation models into its bots while collecting data for the AI research lab....

#Research

Air Street becomes one of the largest solo VCs in Europe with $232M fund

Source: TechCrunch AI

London's Air Street Capital has raised a large Fund III with eyes locked on backing early-stage European and North American AI companies....

#ReinforcementLearning

Creating with Sora Safely

Source: OpenAI Blog

To address the novel safety challenges posed by a state-of-the-art video model as well as a new social creation platform, we've built Sora 2 and the Sora app with safety at the foundation. Our approac...

#AISafety

How we monitor internal coding agents for misalignment

Source: OpenAI Blog

How OpenAI uses chain-of-thought monitoring to study misalignment in internal coding agents—analyzing real-world deployments to detect risks and strengthen AI safety safeguards....

#ReinforcementLearning #AISafety

OpenAI to acquire Astral

Source: OpenAI Blog

Accelerates Codex growth to power the next generation of Python developer tools...

GitHub Trending Projects (Python)

FujiwaraChoki/MoneyPrinterV2

Automate the process of making money online.

#OpenSource

TauricResearch/TradingAgents

TradingAgents: Multi-Agents LLM Financial Trading Framework

#LLMs #Finance #OpenSource #Research

666ghj/MiroFish

A Simple and Universal Swarm Intelligence Engine, Predicting Anything.

#OpenSource

unslothai/unsloth

Unsloth Studio is a web UI for training and running open models like Qwen, DeepSeek, gpt-oss and Gemma locally.

#LLMs #Training #OpenSource

bytedance/deer-flow

An open-source SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skill, subagents and message gateway, it handles different levels of tasks that could take minutes to hours.

#OpenSource #Research

arXiv Latest AI Papers

WorldCache: Content-Aware Caching for Accelerated Video World Models

Authors: Umair Nawaz, Ahmed Heakl, Ufaq Khan, Abdelrahman Shaker, Salman Khan, Fahad Shahbaz Khan

Diffusion Transformers (DiTs) power high-fidelity video world models but remain computationally expensive due to sequential denoising and costly spatio-temporal attention. Training-free feature cachin...

#ReinforcementLearning #Training #Research

End-to-End Training for Unified Tokenization and Latent Denoising

Authors: Shivam Duggal, Xingjian Bai, Zongze Wu, Richard Zhang, Eli Shechtman, Antonio Torralba, Phillip Isola, William T. Freeman

Latent diffusion models (LDMs) enable high-fidelity synthesis by operating in learned latent spaces. However, training state-of-the-art LDMs requires complex staging: a tokenizer must be trained first...

#Training #Research

UniMotion: A Unified Framework for Motion-Text-Vision Understanding and Generation

Authors: Ziyi Wang, Xinshun Wang, Shuang Chen, Yang Cong, Mengyuan Liu

We present UniMotion, to our knowledge the first unified framework for simultaneous understanding and generation of human motion, natural language, and RGB images within a single architecture. Existin...

#ComputerVision #NLP #Research

ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model

Authors: Haichao Zhang, Yijiang Li, Shwai He, Tushar Nagarajan, Mingfei Chen, Jianglin Lu, Ang Li, Yun Fu

Recent progress in latent world models (e.g., V-JEPA2) has shown promising capability in forecasting future world states from video observations. Nevertheless, dense prediction from a short observatio...

#ComputerVision #ReinforcementLearning #Research

3D-Layout-R1: Structured Reasoning for Language-Instructed Spatial Editing

Authors: Haoyu Zhen, Xiaolong Li, Yilin Zhao, Han Zhang, Sifei Liu, Kaichun Mo, Chuang Gan, Subhashree Radhakrishnan

Large Language Models (LLMs) and Vision Language Models (VLMs) have shown impressive reasoning abilities, yet they struggle with spatial understanding and layout consistency when performing fine-grain...

#LLMs #ComputerVision #Research

X (Twitter) Community AI Highlights

Yann LeCun on AGI and World Models

LeCun emphasizes the importance of world models and self-supervised learning for achieving AGI, moving beyond pure scaling of language models.

#LLMs #ReinforcementLearning

Andrej Karpathy on LLM Training Optimization

Karpathy discusses optimization techniques for LLM training, focusing on efficiency and scalability improvements.

#LLMs #Training

Community Discussion on AI Safety

Growing consensus in the AI community about the importance of safety and alignment as core research areas.

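#AISafety #Research

🚀 Frontier Divergent Insights

Divergent thinking for researchers already at the frontier, covering research intersections, contrarian takes, technical bottlenecks, and community trends.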

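🔬 Research Frontier Intersections

Video world models + reinforcement learning

Projects such as WorldCache aim to accelerate video world models, which could in turn give embodied agents real-time environment prediction.

Potential impact: very high

Multimodal alignment + chain-of-thought monitoring

OpenAI applies chain-of-thought monitoring to its internal coding agents; combining this kind of safety monitoring with multimodal reasoning could enable finer-grained alignment detection.

Potential impact: very high

Robotics foundation models + autonomous learning

The partnership between Google DeepMind and Agile Robots shows how data collection and model integration can drive the evolution of embodied agents.

Potential impact: high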

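💭 Contrarian Thinking

Small models may outperform large models

The industry is currently focused on large-model capabilities, but in specific domains (such as robot control and real-time inference) the efficiency of small models may be decisive.

Why it's overlooked: large models attract more funding and attention, which obscures the potential of small models.

Safety constraints may spur innovation

OpenAI built safety mechanisms into Sora's design from the start rather than patching them in afterward; this kind of constraint-driven design could give rise to new architectural paradigms.

Why it's overlooked: safety is usually seen as a limitation rather than a driver of innovation.

Data collection is scarcer than the models themselves

Google DeepMind's partnership with a robotics company underscores the value of data collection, which may become a stronger competitive moat than model innovation.

Why it's overlooked: academia focuses more on algorithmic innovation than on data infrastructure.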

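⚙️ Technical Debt and Bottlenecks

Real-time video inference latency

Current video world models still struggle to deliver low-latency real-time inference, which limits their use in robot control.

Possible breakthrough: new architectures (such as sparse attention) or hardware acceleration (such as dedicated AI chips).

Interpretability of multimodal alignment

Chain-of-thought monitoring is effective, but the decision process behind it remains a black box that humans find hard to understand and intervene in.

Possible breakthrough: deep integration of explainable-AI techniques with multimodal reasoning.

Generalization of embodied agents

Current robot models perform well in the specific environments they were trained in, but their cross-environment generalization remains limited.

Possible breakthrough: applying meta-learning and adaptive learning frameworks.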

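📱 Deep Trends in the X Community

The shift from model capability to agent autonomy

Community discussion is gradually shifting from "how smart is the model" to "what can the agent do", reflecting growing attention to practical utility.

Future direction: agents, rather than standalone models, will become the primary form of AI applications.

Safety and ethics go mainstream

Discussion on X shows a growing consensus that safety and alignment are core research areas, alongside posts from prominent voices such as Yann LeCun (world models) and Andrej Karpathy (LLM training optimization).

Future direction: AI research will treat safety as a first-class concern rather than an afterthought.

The boom in open-source agent frameworks

Open-source projects such as deer-flow and MoneyPrinterV2 are drawing a lot of attention, indicating strong developer demand for agent frameworks.

Future direction: agent frameworks will become the next generation of development infrastructure, much as deep learning frameworks did in their day.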