Tags

sora

Related articles:

Sora service to open within the year; Italy's data protection regulator launches an investigation; Sora raises new security challenges; Google releases Genie to go head-to-head with Sora; SORA joins Formula 1. Explore a comprehensive survey of Sora covering its past and present as an AGI world model, its technical core, and future trends.

六虎 -

The free AI music generation tool Udio is here: the Sora of the music world and Suno's strongest rival! How does it perform?

程序员X小鹿: After Suno v3, another dark-horse AI music star, Udio, has made a dazzling debut. Udio lets users create a piece of music just by entering a simple text description, then turns it into a professional-quality track.

Udio is a new AI music tool that lets users create their own music from simple text descriptions. It is currently in beta, and users can generate up to 1,200 songs per month for free. Users can sign up on the Udio website with a Google, Discord, or Twitter account. The tool offers features such as suggested tags, lyric options, and advanced functions like track extension and remixing. Udio has the potential to affect both amateur and professional musicians, enabling them to create high-quality music while providing inspiration and efficiency in the creative process.

HyperAI超神经 -

Weekly Editors' Picks | A free Sora alternative, a Python basics tutorial you can run online, and the launch of the MCFEND Chinese fake news detection dataset

This dataset provides the open-source data and code for the paper. Corner kicks are often a prime opportunity to execute a coach's tactics; to that end, Google DeepMind and Liverpool Football Club jointly launched TacticAI, which uses geometric deep learning with predictive and generative models to offer professionals tactical insights on corner kicks. The dataset contains 1,995 question-answer pairs drawn from official Korean exams and textbooks, covering the two broad categories of language and culture, subdivided into 11 subcategories, with fine-grained annotations on each sample indicating the cultural and linguistic knowledge required to answer. The dataset has wide applications in remote sensing, particularly in surface monitoring, resource management, and environmental change assessment, where it provides high-precision data support.

OpenAI released the Sora text-to-video model, with video lengths of up to 60 seconds. HyperAI offers an open-source AI video generation solution. Public datasets include MCFEND, Fin-Eva, VidProM, FindingEmo, GPD, AlgoPuzzleVQA, UltraSafety, NAIP-S2, and CLIcK. Public tutorials cover generating random numbers and building neural networks with PyTorch. Community articles cover an open-source Sora alternative, drug research at Central South University, DeepMind's TacticAI, and AI-related glossary entries. Bilibili livestream previews include courses from Harvard and MIT. HyperAI is a leading artificial intelligence and high-performance computing community in China.

HyperAI超神经 -

Online Tutorial | One-Click Launch of an Open-Source Sora Alternative That Has Won Over 450,000 AI Developers

To this day, everything from viral hits on short-video platforms to big-budget films showing in theaters still follows this long-chain production process. Stable Diffusion, for its part, is a "latent diffusion model": an encoder first maps high-dimensional raw data (such as images) into a latent space, diffusion and denoising are carried out in that space, and a decoder then reconstructs the cleaned-up latent data back into the high-dimensional space. Compared with the diffusion models currently mainstream in AI video generation, Stable Diffusion introduces this extra encode-decode stage, which means that when applied to high-dimensional data (such as images), it does its work in a low-dimensional latent space that retains the important features of the original data.
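To make the encode-diffuse-decode idea above concrete, here is a minimal runnable sketch of a latent diffusion loop. Everything in it (the tiny encoder/decoder pair, the toy denoiser, the plain subtraction update) is an illustrative assumption, not Stable Diffusion's actual architecture or sampler:

```python
import torch
import torch.nn as nn

# Minimal latent-diffusion sketch. The encoder/decoder pair is the extra
# stage described above: images live in pixel space, but diffusion and
# denoising happen in a much smaller latent space.
encoder = nn.Conv2d(3, 4, kernel_size=8, stride=8)           # (1,3,64,64) -> (1,4,8,8)
decoder = nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8)  # latent back to pixels

class ToyDenoiser(nn.Module):
    """Stand-in noise predictor; the real thing is a text-conditioned U-Net."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(4, 4, kernel_size=3, padding=1)

    def forward(self, z, t):  # t is ignored in this toy version
        return self.net(z)

denoiser = ToyDenoiser()

@torch.no_grad()
def generate(num_steps: int = 50) -> torch.Tensor:
    z = torch.randn(1, 4, 8, 8)          # start from noise in the latent space
    for t in reversed(range(num_steps)):
        eps = denoiser(z, t)             # predict the noise component
        z = z - eps / num_steps          # crude update; real samplers use schedules
    return decoder(z)                    # decode to pixel space only once, at the end

print(generate().shape)  # torch.Size([1, 3, 64, 64])
```

The efficiency argument in the paragraph above is visible in the shapes: the 50-step loop touches a 4×8×8 latent rather than a 3×64×64 image; during training, the encoder would map real images into that same latent space.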

In 1888, Edison filed a patent for the "Kinetoscope," which achieved the effect of playing static photographs in continuous succession. Video production has since iterated from nothing to something, from black-and-white to color, and from analog to digital signals. The arrival of generative AI has brought fresh innovation to video production. OpenAI released the Sora model, which can generate videos up to one minute long from text instructions. Several open-source AI video generation models are available, including Stable Diffusion, Prompt Travel, and AnimateDiff. Stable Diffusion improves the model's efficiency and generation quality; Prompt Travel adjusts the text prompts to guide the AI model toward coherent yet evolving frames (see the sketch below); and AnimateDiff equips the model to generate diverse, personalized, text-driven video clips.
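As a sketch of the "prompt travel" idea just described (prompts pinned to keyframes, with the frames in between guided by a blend of the neighboring prompts), one can interpolate prompt embeddings across frames. The linear interpolation below is an assumption for illustration, not the exact algorithm of any particular tool:

```python
import torch

# "Prompt travel" sketch: keyframes maps frame index -> prompt embedding,
# and intermediate frames get a linear blend of the two nearest keyframes,
# so the video drifts smoothly from one description to the next.

def travel_embeddings(keyframes: dict[int, torch.Tensor], num_frames: int) -> torch.Tensor:
    """Assumes a keyframe at frame 0. Returns (num_frames, embed_dim)."""
    keys = sorted(keyframes)
    out = []
    for f in range(num_frames):
        prev = max(k for k in keys if k <= f)                 # nearest keyframe before f
        nxt = min((k for k in keys if k >= f), default=prev)  # nearest keyframe after f
        w = 0.0 if nxt == prev else (f - prev) / (nxt - prev)
        out.append((1 - w) * keyframes[prev] + w * keyframes[nxt])
    return torch.stack(out)

# e.g. frame 0: "a snowy forest", frame 16: "the same forest in spring"
emb = travel_embeddings({0: torch.randn(768), 16: torch.randn(768)}, num_frames=17)
print(emb.shape)  # torch.Size([17, 768])
```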

DemoChen's Clip -

Sam Altman: OpenAI, GPT-5, Sora, Board Saga, Elon Musk, Ilya, Power & AGI

This is a summary of the Lex Fridman Podcast episode #419 featuring Sam Altman, CEO of OpenAI. Sam discusses the OpenAI board saga, Ilya Sutskever, the Elon Musk lawsuit, Sora, GPT-4, memory & privacy, and AGI. He also talks about his thoughts on aliens and the importance of compute in the future. The episode also includes a discussion of the structure of OpenAI's board and the selection process for new board members. Sam also shares his thoughts on the power struggle in building AGI and the importance of resilience and preparation for challenges in the future.

I think compute is gonna be the currency of the future. I think it'll be maybe the most precious commodity in the world. I expect that by the end of this decade, and possibly somewhat sooner than that, we will have quite capable systems that we look at and say, wow, that's really remarkable. The road to AGI should be a giant power struggle. I expect that to be the case.

Whoever builds AGI first gets a lot of power. Do you trust yourself with that much power? The following is a conversation with Sam Altman, his second time on the podcast. He is the CEO of OpenAI, the company behind GPT-4, ChatGPT, Sora, and perhaps one day the very company that will build AGI. This is the Lex Fridman Podcast. To support it, please check out our sponsors in the description. And now, dear friends, here's Sam Altman.

Take me through the OpenAI board saga that started on Thursday, November 16th, maybe Friday, November 17th for you. That was definitely the most painful professional experience of my life, and chaotic, and shameful, and upsetting, and a bunch of other negative things. There were great things about it too, and I wish it had not been in such an adrenaline rush that I wasn't able to stop and appreciate them at the time.

I came across this old tweet of mine, or this tweet of mine from that time period, which was, it was like kind of going to your own eulogy, watching people say all these great things about you, and just like unbelievable support from people I love and care about. That was really nice.

That whole weekend, with one big exception, I felt a great deal of love and very little hate, even though it felt like I have no idea what's happening and what's gonna happen here and this feels really bad. And there were definitely times I thought it was gonna be like one of the worst things to ever happen for AI safety. Well, I also think I'm happy that it happened relatively early.
I thought at some point between when OpenAI started and when we created AGI, there was gonna be something crazy and explosive that happened, but there may be more crazy and explosive things still to happen. It still, I think, helped us build up some resilience and be ready for more challenges in the future. But the thing you had a sense that you would experience is some kind of power struggle?

The road to AGI should be a giant power struggle. Like, the world should... well, not should. I expect that to be the case. And so you have to go through that, like you said, iterate as often as possible in figuring out how to have a board structure, how to have organization, how to have the kind of people that you're working with, how to communicate all that in order to deescalate the power struggle as much as possible, pacify it.

But at this point, it feels like something that was in the past that was really unpleasant and really difficult and painful. But we're back to work, and things are so busy and so intense that I don't spend a lot of time thinking about it. There was a time after.

There was like this fugue state for kind of like the month after, maybe 45 days after, where I was just sort of drifting through the days. I was so out of it. I was feeling so down. Just on a personal, psychological level? Yeah. Really painful. And hard to have to keep running OpenAI in the middle of that. I just wanted to crawl into a cave and kind of recover for a while.

But now it's like we're just back to working on the mission. Well, it's still useful to go back there and reflect on board structures, on power dynamics, on how companies are run, the tension between research and product development and money and all this kind of stuff, so that you, who have a very high potential of building AGI, would do so in a slightly more organized, less dramatic way in the future.

So there's value there, both in the personal psychological aspects of you as a leader and also just the board structure and all this kind of messy stuff. Definitely learned a lot about structure and incentives and what we need out of a board. And I think that it is valuable that this happened now, in some sense.

I think this is probably not the last high-stress moment of OpenAI, but it was quite a high-stress moment. The company very nearly got destroyed. And we think a lot about many of the other things we've gotta get right for AGI. But thinking about how to build a resilient org, and how to build a structure that will stand up to a lot of pressure in the world, which I expect more and more as we get closer.

I think that's super important.
Do you have a sense of how deep and rigorous the deliberation process by the board was? Can you shine some light on just the human dynamics involved in situations like this? Was it just a few conversations and all of a sudden it escalates into a why-don't-we-fire-Sam kind of thing? I think the board members were, on the whole, well-meaning people.

And I believe that in stressful situations where people feel time pressure or whatever, people understandably make suboptimal decisions. And I think one of the challenges for OpenAI will be we're gonna have to have a board and a team that are good at operating under pressure. Do you think the board had too much power?

I think boards are supposed to have a lot of power, but one of the things that we did see is that in most corporate structures, boards are usually answerable to shareholders. Sometimes people have like super voting shares or whatever.

In this case, I think one of the things with our structure that we maybe should have thought about more than we did is that the board of a nonprofit has, unless you put other rules in place, quite a lot of power. They don't really answer to anyone but themselves.

And there's ways in which that's good, but what we'd really like is for the board of OpenAI to answer to the world as a whole, as much as that's a practical thing. So there's a new board announced. Yeah. There's, I guess, a new smaller board at first, and now there's a new final board? Not a final board yet. We've added some, we'll add more. Added some, okay.

What is fixed in the new one that was perhaps broken in the previous one? The old board sort of got smaller over the course of about a year. It was nine, and then it went down to six, and then we couldn't agree on who to add.

And the board also, I think, didn't have a lot of experienced board members, and a lot of the new board members at OpenAI just have more experience as board members. I think that'll help. Some of the people added to the board have been criticized. I heard a lot of people criticizing the addition of Larry Summers, for example. What was the process of selecting the board? What's involved in that?

So Bret and Larry were kind of decided in the heat of the moment over this very tense weekend, and that was... I mean, that weekend was like a real rollercoaster, a lot of ups and downs. And we were trying to agree on new board members that both sort of the executive team here and the old board members felt would be reasonable. Larry was actually one of their suggestions, the old board members'.

Bret, previous to that weekend, was suggested, but he was busy and didn't wanna do it. And then we really needed help.
We talked about a lot of other people too, but I felt like if I was going to come back, I needed new board members.

I didn't think I could work with the old board again in the same configuration, although we then decided, and I'm grateful that Adam would stay, but we wanted to get to... We considered various configurations, decided we wanted to get to a board of three, and had to find two new board members over the course of sort of a short period of time. So those were decided honestly without...

That's like, you kind of do that on the battlefield. You don't have time to design a rigorous process then. For new board members, since we will add new board members going forward, we have some criteria that we think are important for the board to have, different expertise that we want the board to have.

Unlike hiring an executive, where you need them to do one role well, the board needs to do a whole role of kind of governance and thoughtfulness. And so one thing that Bret says, which I really like, is that we wanna hire board members in slates, not as individuals one at a time. And thinking about a group of people that will bring nonprofit expertise, expertise at running companies, sort of good legal and governance expertise.

That's kind of what we've tried to optimize for. So is technical savvy important for the individual board members? Not for every board member, but certainly some, you need that. That's part of what the board needs to do. So I mean, the interesting thing that people probably don't understand about OpenAI, certainly, is all the details of running the business.

When they think about the board, given the drama, they think about you. They think about, like, if you reach AGI, or you reach some of these incredibly impactful products, and you build them and deploy them, what's the conversation with the board like? And they kind of think, all right, what's the right squad to have in that kind of situation to deliberate?

Look, I think you definitely need some technical experts there, and then you need some people who are like, how can we deploy this in a way that will help people in the world the most, and people who have a very different perspective. I think a mistake that you or I might make is to think that only the technical understanding matters. And that's definitely part of the conversation you want that board to have.

But there's a lot more about how that's gonna just, like, impact society and people's lives that you really want represented in there too. Are you looking at the track record of people, or are you just having conversations? Track record's a big deal. You, of course, have a lot of conversations.
There are some roles where I kind of totally ignore track record and just look at slope, kind of ignore the y-intercept. Thank you.

Thank you for making it mathematical for the audience. For a board member, I do care much more about the y-intercept. I think there is something deep to say about track record there, and experience is something that's very hard to replace. Do you try to fit a polynomial function or an exponential one to track record? That's not... that analogy doesn't carry that far. All right. You mentioned some of the low points that weekend.

What were some of the low points psychologically for you? Did you consider going to the Amazon jungle and just taking ayahuasca and disappearing forever? I mean, there were so many lows. It was a very bad period of time. There were great high points too. My phone was just sort of nonstop blowing up with nice messages from people I worked with every day, people I hadn't talked to in a decade.

I didn't get to appreciate that as much as I should have, 'cause I was just in the middle of this firefight, but that was really nice. But on the whole, it was a very painful weekend, and also just like a very... It was like a battle fought in public to a surprising degree, and that was extremely exhausting to me, much more than I expected.

I think fights are generally exhausting, but this one really was. The board did this Friday afternoon. I really couldn't get much in the way of answers, but I also was just like, "Well, the board gets to do this." And so I'm gonna think for a little bit about what I want to do, but I'll try to find the blessing in disguise here.

And I was like, "Well, my current job at OpenAI was to, like, run a decently-sized company at this point." And the thing I'd always liked the most was just getting to work with the researchers. And I was like, yeah, I can just go do a very focused AI research effort. And I got excited about that. It didn't even occur to me at the time that this was all possibly gonna get undone.

This was like Friday afternoon. Oh, so you've accepted the death. Very quickly, very quickly. I mean, I went through a little period of confusion and rage, but very quickly. And by Friday night, I was talking to people about what was gonna be next, and I was excited about that.

I think it was Friday evening, for the first time, that I heard from the exec team here, which was like, hey, we're gonna fight this, and we think... well, whatever. And then I went to bed just still being like, okay, excited. Onward. Were you able to sleep? Not a lot.
One of the weird things was there was this period of four and a half days where I sort of didn't sleep much, didn't eat much, and still kind of had a surprising amount of energy. You learn a weird thing about adrenaline in wartime. So you kind of accepted the death of this baby, OpenAI. And I was excited for the new thing. I was just like, okay, this was crazy, but whatever.

It's a very good coping mechanism. And then Saturday morning, two of the board members called and said, "Hey, we didn't mean to destabilize things. We don't want to destroy a lot of value here. Can we talk about you coming back?" And I immediately didn't wanna do that, but I thought a little more and I was like, "Well, I really care about the people here, the partners, shareholders.

I love this company." And so I thought about it and I was like, "Well, okay, but here's the stuff I would need." And then the most painful time of all, over the course of that weekend, I kept thinking and being told... not just me, the whole team here kept thinking... well, we were trying to keep OpenAI stabilized while the whole world was trying to break it apart, people trying to recruit, whatever.

We kept being told, like, "All right, we're almost done, we're almost done. We just need a little bit more time." And it was this very confusing state.

And then Sunday evening, when again, like every few hours, I expected that we were gonna be done and we were gonna figure out a way for me to return and things to go back to how they were, the board then appointed a new interim CEO. And then I was like... I mean, that feels really bad. That was the low point of the whole thing.

You know, I'll tell you something. It felt very painful, but I felt a lot of love that whole weekend. Other than that one moment, Sunday night, I would not characterize my emotions as anger or hate. I really just felt a lot of love from people, towards people. It was painful, but the dominant emotion of the weekend was love, not hate.

You've spoken highly of Mira Murati, that she helped, especially, as you put in a tweet, "in the quiet moments when it counts." Perhaps we could take a bit of a tangent. What do you admire about Mira? Well, she did a great job during that weekend in a lot of chaos, but people often see leaders in the crisis moments, good or bad.

But a thing I really value in leaders is how people act on a boring Tuesday at 9:46 in the morning, and in just sort of the normal drudgery of the day-to-day. How someone shows up in a meeting, the quality of the decisions they make. That was what I meant about the quiet moments.
Meaning, like, most of the work is done on a day-by-day, meeting-by-meeting basis. Just be present and make great decisions. Yeah. I mean, look, what you have wanted to spend the last 20 minutes on, and I understand, is this one very dramatic weekend. But that's not really what OpenAI is about. OpenAI is really about the other seven years.

Well, yeah, human civilization is not about the invasion of the Soviet Union by Nazi Germany, but still, that's something people totally focus on. Very understandable. It gives us an insight into human nature, the extremes of human nature, and perhaps some of the damage and some of the triumphs of human civilization can happen in those moments. So it's like illustrative.

Let me ask you about Ilya. Is he being held hostage in a secret nuclear facility? No. What about a regular secret facility? No. What about a nuclear non-secure facility? Neither, not that either. I mean, this is becoming a meme at some point. You've known Ilya for a long time. He was obviously part of this drama with the board and all that kind of stuff. What's your relationship with him now? I love Ilya. I have tremendous respect for Ilya. I don't have anything I can say about his plans right now.

That's a question for him. But I really hope we work together for, certainly, the rest of my career. He's a little bit younger than me. Maybe he works a little bit longer. There's a meme that he saw something, like he maybe saw AGI, and that gave him a lot of worry internally. What did Ilya see? Ilya has not seen AGI. None of us have seen AGI. We've not built AGI.

I do think one of the many things that I really love about Ilya is he takes AGI and the safety concerns, broadly speaking, including things like the impact this is gonna have on society, very seriously.

And as we continue to make significant progress, Ilya is one of the people that I've spent the most time with over the last couple of years talking about what this is going to mean, what we need to do to ensure we get it right, to ensure that we succeed at the mission. So Ilya did not see AGI.

But Ilya is a credit to humanity in terms of how much he thinks and worries about making sure we get this right. I've had a bunch of conversations with him in the past. I think when he talks about technology, he's always doing this long-term-thinking type of thing. So he's not thinking about what this is gonna be in a year. He's thinking about it in 10 years. Yeah.

Just thinking from first principles, like, okay, if this scales, what are the fundamentals here? Where's this going?
And so that's a foundation for them thinking about all the other safety concerns and all that kind of stuff, which makes him a really fascinating human to talk with. Do you have any idea why he's been kind of quiet? Is it that he's just doing some soul-searching? Again, I don't wanna speak for Ilya.

I think that you should ask him that. He's definitely a thoughtful guy. I kind of think of Ilya as always being on a soul search, in a really good way. Yes. Yeah. Also, he appreciates the power of silence. Also, I'm told he can be a silly guy, which I've never seen that side of him. It's very sweet when that happens. I've never witnessed a silly Ilya, but I look forward to that as well.

I was at a dinner party with him recently, and he was playing with a puppy. And he was in a very silly mood, very endearing, and I was thinking, like, oh man, this is not the side of Ilya that the world sees the most. So just to wrap up this whole saga, are you feeling good about the board structure, about all of this, and where it's moving?

I feel great about the new board. In terms of the structure of OpenAI, one of the board's tasks is to look at that and see where we can make it more robust. We wanted to get new board members in place first, but we clearly learned a lesson about structure throughout this process. I don't have, I think, super deep things to say. It was a crazy, very painful experience.

I think it was like a perfect storm of weirdness. It was like a preview for me of what's gonna happen as the stakes get higher and higher, and the need that we have for robust governance structures and processes and people. I am kind of happy it happened when it did, but it was a shockingly painful thing to go through. Did it make you more hesitant in trusting people? Yes. Just on a personal level? Yes.

I think I'm an extremely trusting person. I've always had a life philosophy of, like, don't worry about all of the paranoia, don't worry about the edge cases. You get a little bit screwed in exchange for getting to live with your guard down. And this was so shocking to me. I was so caught off guard that it has definitely changed that, and I really don't like this.

It's definitely changed how I think about just, like, default trust of people and planning for the bad scenarios. You gotta be careful with that. Are you worried about becoming a little too cynical? I'm not worried about becoming too cynical. I think I'm like the extreme opposite of a cynical person. But I'm worried about just becoming less of a default-trusting person.

I'm actually not sure which mode is best to operate in for a person who's developing AGI, trusting or untrusting. It's an interesting journey you're on.
But in terms of structure, see, I'm more interested on the human level. How do you surround yourself with humans that are building cool shit, but also are making wise decisions? Because the more money you start making, the more power the thing has, the weirder people get.

I think you could make all kinds of comments about the board members and the level of trust I should have had there, or how I should have done things differently. But in terms of the team here, I think you'd have to give me a very good grade on that one. And I have just enormous gratitude and trust and respect for the people that I work with every day.

And I think being surrounded with people like that is really important.

Our mutual friend Elon sued OpenAI. What is the essence of what he's criticizing? To what degree does he have a point? To what degree is he wrong? I don't know what it's really about. We started off just thinking we were gonna be a research lab, having no idea about how this technology was gonna go.

Because it was only seven or eight years ago, it's hard to go back and really remember what it was like then. This was before language models were a big deal. This was before we had any idea about an API or selling access to a chatbot. It was before we had any idea we were gonna productize at all.

So we're like, we're just gonna try to do research, and we don't really know what we're gonna do with that. I think with many fundamentally new things, you start fumbling through the dark and you make some assumptions, most of which turn out to be wrong. And then it became clear that we were going to need to do different things and also have huge amounts more capital.

So we said, "Okay, well, the structure doesn't quite work for that. How do we patch the structure?" And then you patch it again and patch it again, and you end up with something that does look kind of eyebrow-raising, to say the least.

But we got here gradually, with, I think, reasonable decisions at each point along the way. It doesn't mean I wouldn't do it totally differently if we could go back now with an oracle, but you don't get the oracle at the time. But anyway, in terms of what Elon's real motivations here are, I don't know. To the degree you remember, what was the response that OpenAI gave in the blog post? Can you summarize it?

Oh, we just said, like, Elon said this set of things. Here's our characterization, or here's sort of not our characterization, here's the characterization of how this went down. We tried to not make it emotional and just sort of say, here's the history. I do think there's a degree of mischaracterization from Elon here about one of the points you just made, which is the degree of uncertainty you had at the time.
You guys were, like, a small group of researchers crazily talking about AGI when everybody was laughing at that thought. It wasn't that long ago that Elon was crazily talking about launching rockets when people were laughing at that thought, so I think he'd have more empathy for this. I mean, I do think that there's personal stuff here, that there was a split, that OpenAI and a lot of amazing people here chose to part ways with Elon.

So there's a personal... Elon chose to part ways. Can you describe that exactly, the choosing to part ways? He thought OpenAI was gonna fail. He wanted total control to sort of turn it around. We wanted to keep going in the direction that now has become OpenAI. He also wanted Tesla to be able to build an AGI effort.

At various times, he wanted to make OpenAI into a for-profit company that he could have control of, or have it merged with Tesla. We didn't want to do that, and he decided to leave, which, that's fine. And that's one of the things that the blog post says, is that he wanted OpenAI to be basically acquired by Tesla, in the same way that, or maybe something similar to, or maybe something more dramatic than, the partnership with Microsoft.

My memory is the proposal was just, like, yeah, get acquired by Tesla and have Tesla have full control over it. I'm pretty sure that's what it was. So what did the word "open" in OpenAI mean to Elon at the time? Ilya has talked about this in the email exchanges and all this kind of stuff. What did it mean to you at the time? What does it mean to you now?

I would definitely pick a diff... speaking of going back with an oracle, I'd pick a different name. One of the things that I think OpenAI is doing that is the most important of everything that we're doing is putting powerful technology in the hands of people for free, as a public good. We don't run ads on our free version. We don't monetize it in other ways. We just say it's part of our mission.

We wanna put increasingly powerful tools in the hands of people for free and get them to use them. And I think that kind of open is really important to our mission. I think if you give people great tools and teach them to use them, or don't even teach them, they'll figure it out, and let them go build an incredible future for each other with that. That's a big deal.

So if we can keep putting free or low-cost powerful AI tools out in the world, I think that's a huge deal for how we fulfill the mission. Open source or not, yeah, I think we should open source some stuff and not other stuff. It does become this, like, religious battle line where nuance is hard to have, but I think nuance is the right answer.
So he said, change your name to ClosedAI and I'll drop the lawsuit. I mean, is it going to become this battleground in the land of memes about the name? I think that speaks to the seriousness with which Elon means the lawsuit. I mean, that's an astonishing thing to say, I think. Well, maybe, correct me if I'm wrong, but I don't think the lawsuit is legally serious.

It's more to make a point about the future of AGI and the company that's currently leading the way. Look, I mean, Grok had not open-sourced anything until people pointed out it was a little bit hypocritical, and then he announced that Grok will open-source things this week. I don't think open source versus not is what this is really about for him. Well, we'll talk about open source and not.

I do think maybe criticizing the competition is great, just talking a little shit, that's great. But friendly competition versus, like... I personally hate lawsuits. Look, I think this whole thing is unbecoming of a builder, and I respect Elon as one of the great builders of our time. And I know he knows what it's like to have haters attack him, and it makes me extra sad he's doing it to us.

Yeah, he is one of the greatest builders of all time, potentially the greatest builder of all time. It makes me sad. And I think it makes a lot of people sad. There's a lot of people who've really looked up to him for a long time. I said in some interview or something that I missed the old Elon, and the number of messages I got being like, "that exactly encapsulates how I feel."

I think he should just win. He should just make Grok beat GPT, and then GPT beats Grok, and it's just a competition, and it's beautiful for everybody. But on the question of open source, do you think there are a lot of companies playing with this idea? It's quite interesting.

I would say Meta, surprisingly, has led the way on this, or at least took the first step in the game of chess of really open-sourcing the model. Of course, it's not the state-of-the-art model, but open-sourcing Llama... and Google is flirting with the idea of open-sourcing a smaller version. What are the pros and cons of open-sourcing? Have you played around with this idea?

Yeah, I think there is definitely a place for open-source models, particularly smaller models that people can run locally; I think there's huge demand for that. I think there will be some open-source models and some closed-source models. It won't be unlike other ecosystems in that way.
I listened to the All-In Podcast talking about this lawsuit and all that kind of stuff, and they were more concerned about the precedent of going from nonprofit to this capped for-profit. What precedent does that set for other startups? I would heavily discourage any startup that was thinking about starting as a nonprofit and adding a for-profit arm later. I'd heavily discourage them from doing that. I don't think we'll set a precedent here. Okay.

So most startups should just go... For sure. And again, if we knew what was gonna happen, we would've done that too. Well, like, in theory, if you dance beautifully here, there's like some tax incentives or whatever. But I don't think that's how most people think about these things. It's just not possible to save a lot of money for a startup if you do it this way.

No, I think there are laws that would make that pretty difficult. Where do you hope this goes with Elon? This tension, this dance, what do you hope this goes to? Like, if we go one, two, three years from now, your relationship with him on a personal level too, like friendship, friendly competition, just all this kind of stuff? Yeah. I mean, I really respect Elon. And I hope that years in the future, we have an amicable relationship.

Yeah, I hope you guys have an amicable relationship, like, this month, and just compete and win and explore these ideas together. I do suppose there's competition for talent or whatever, but it should be friendly competition. Just build, build cool shit. And Elon is pretty good at building cool shit, but so are you.

So speaking of cool shit: Sora. There's like a million questions I could ask.

First of all, it's amazing. It truly is amazing on a product level, but also just on a philosophical level. So let me just ask, technically and philosophically: what do you think it understands about the world, more or less than GPT-4, for example? Like, the world model, when you train on these patches versus language tokens. I think all of these models understand something more about the world model than most of us give them credit for.

And because there are also very clear things they just don't understand or don't get right, it's easy to look at the weaknesses, see through the veil, and say this is all fake. But it's not all fake. It's just, some of it works and some of it doesn't work.

I remember when I started first watching Sora videos, and I would see a person walk in front of something for a few seconds and occlude it and then walk away, and the same thing was still there. I was like, "This is pretty good." Or there are examples where the underlying physics looks so well represented over a lot of steps in a sequence. It's like, oh, this is quite impressive.
But, fundamentally, these models are just getting better, and that will keep happening. If you look at the trajectory from DALL·E 1 to 2 to 3 to Sora, there were a lot of people that dunked on each version, saying it can't do this, it can't do that, and I'm like, look at it now.

Well, the thing you just mentioned, with the occlusions, is basically modeling the three-dimensional physics of the world sufficiently well to capture those kinds of things. Well... Yeah, maybe you can tell me: in order to deal with occlusions, what does the world model need to be? Yeah, so what I would say is it's doing something to deal with occlusions really well.

To say that it has, like, a great underlying 3D model of the world, that's a little bit more of a stretch. But can you get there through just these kinds of two-dimensional training data approaches? It looks like this approach is gonna go surprisingly far. I don't wanna speculate too much about what limits it will surmount and which it won't. What are some interesting limitations of the system that you've seen?

I mean, there have been some fun ones you've posted. There's all kinds of fun. I mean, like cats sprouting an extra limb at random points in a video. Pick what you want, but there's still a lot of problems, there's a lot of weaknesses. Do you think that's a fundamental flaw of the approach, or is it just that a bigger model, or better technical details, or better data, more data, is going to solve the cat-sprouting-extra-limbs problem?

I would say yes to both. I think there is something about the approach which just seems to feel different from how we think and learn and whatever. And then also, I think it'll get better with scale. I mentioned LLMs have tokens, text tokens, and Sora has visual patches, so it converts all visual data, diverse kinds of visual data, videos and images, into patches. Is the training, to the degree you can say, fully self-supervised there?

Is there some manual labeling going on? What's the involvement of humans in all this? I mean, without saying anything specific about the Sora approach, we use lots of human data in our work. But not internet-scale data. So lots of humans. "Lots" is a complicated word, Sam. I think "lots" is a fair word in this case.

But it doesn't... because to me, "lots"... like, listen, I'm an introvert, and when I hang out with three people, that's a lot of people. Yeah, four people, that's a lot. But I suppose you mean more than that. More than three people work on labeling the data for these models, yeah. Okay. All right. But fundamentally, there's a lot of self-supervised learning, 'cause what you mentioned in the technical report is internet-scale data.
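An aside on the "visual patches" mentioned in this exchange: the idea can be made concrete with a few lines of tensor reshaping. The patch sizes below are assumptions for illustration; OpenAI's technical report describes spacetime patches but not this exact layout.

```python
import torch

# Illustrative only: chop a video tensor into "spacetime patches", the visual
# analogue of text tokens. Patch sizes here are arbitrary assumptions.

def to_spacetime_patches(video: torch.Tensor, pt=4, ph=16, pw=16) -> torch.Tensor:
    T, C, H, W = video.shape                   # e.g. 16 frames of 3x128x128
    x = video.reshape(T // pt, pt, C, H // ph, ph, W // pw, pw)
    x = x.permute(0, 3, 5, 1, 2, 4, 6)         # group axes by (t, h, w) grid cell
    return x.reshape(-1, pt * C * ph * pw)     # one flat "token" per patch

video = torch.randn(16, 3, 128, 128)
tokens = to_spacetime_patches(video)
print(tokens.shape)  # torch.Size([256, 3072]): a 4x8x8 grid of patches, dim 4*3*16*16
```

Each row then plays the role that a text token plays for an LLM, which is what lets the transformer machinery carry over to video.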
That's another beautiful... it's like poetry. So it's a lot of data that's not human-labeled. It's self-supervised in that way. And then the question is, how much data is there on the internet that could be used in this, that is conducive to this kind of self-supervised way, if only we knew the details of the self-supervision? Have you considered opening it up with a little more detail? We have. You mean, for Sora specifically?

Sora specifically, because it's so interesting. Can the same magic of LLMs now start moving towards visual data, and what does it take to do that? I mean, it looks to me like yes, but we have more work to do. Sure. What are the dangers? Why are you concerned about releasing the system? What are some possible dangers of this?

I mean, frankly speaking, one thing we have to do before releasing the system is just get it to work at a level of efficiency that will deliver the scale people are gonna want from this, so I don't wanna downplay that, and there's still a ton of work to do there. But you can imagine issues with deepfakes, misinformation.

We try to be a thoughtful company about what we put out into the world, and it doesn't take much thought to think about the ways this can go badly. There's a lot of tough questions here. You're dealing in a very tough space. Do you think training AI should be, or is, fair use under copyright law?

I think the question behind that question is, do people who create valuable data deserve to have some way that they get compensated for use of it? And that, I think the answer is yes. I don't know yet what the answer is. People have proposed a lot of different things. We've tried some different models.

But if I'm, like, an artist, for example, A, I would like to be able to opt out of people generating art in my style, and B, if they do generate art in my style, I'd like to have some economic model associated with that. Yeah, it's that transition from CDs to Napster to Spotify. We have to figure out some kind of model. The model changes, but people have gotta get paid.

Well, there should be some kind of incentive, if we zoom out even more, for humans to keep doing cool shit. Of everything I worry about, humans are gonna do cool shit, and society's gonna find some way to reward it. That seems pretty hardwired. We want to create, we want to be useful, we want to achieve status in whatever way. That's not going anywhere, I don't think. But the reward might not be monetary, financial.

It might be, like, fame and celebration of other cool stuff. Maybe financial in some other way.
Again, I don't think we've seen like the last evolution of how the economic system's gonna work. Yeah. But artists and creators are worried. When they see Sora, they're like, "Holy shit. Sure. Artists were also super worried when photography came out. And then photography became a new art form and people made a lot of money taking pictures. 它可能是名声和其他很酷的庆祝活动。也可能是其他方面的经济效益再说一遍,我觉得我们还没看到 经济系统如何运作的最后一次演变是啊但艺术家和创作者都很担心当他们看到索拉时,他们会说 "我的天啊当然,摄影术出现的时候 艺术家们也超级担心然后摄影成为了一种新的艺术形式 人们靠拍照赚了很多钱 And I think things like that will keep happening. People will use the new tools in new ways. If we just look on YouTube or something like this, how much of that will be using Sora, like AI-generated content do you think in the next five years. 我认为这样的事情会不断发生。人们会以新的方式使用新的工具。如果我们只看 YouTube 或类似的东西,你认为在未来五年内,有多少会使用 Sora,比如人工智能生成的内容。 People talk about like how many jobs is AI gonna do in five years and the framework that people have is what percentage of current jobs are just gonna be totally replaced by some AI doing the job? The way I think about it is not what percent of jobs AI will do, but what percent of tasks will AI do and over what time horizon. 人们谈论人工智能将在五年内完成多少工作,人们的框架是,人工智能将完全取代现有工作的百分比是多少?我的想法不是人工智能将取代百分之多少的工作,而是人工智能将在多长时间内取代百分之多少的任务。 So if you think of all of the like-five second tasks in the economy, five-minute tasks, the five-hour tasks, maybe even the five-day tasks, how many of those can AI do? 因此,如果你想一想经济中所有类似五秒钟的任务、五分钟的任务、五小时的任务,甚至是五天的任务,人工智能能完成多少? And I think that's a way more interesting, impactful, important question than how many jobs AI can do because it is a tool that will work at increasing levels of sophistication and over longer and longer time horizons for more and more tasks and let people operate at a higher level of abstraction. So maybe people are way more efficient at the job they do. 我认为这是一个比人工智能能做多少工作更有趣、更有影响力、更重要的问题,因为人工智能是一种工具,它的复杂程度会越来越高,时间跨度会越来越长,能完成的任务也会越来越多,让人们在更高的抽象水平上工作。因此,也许人们在他们所做的工作中效率更高。 And at some point, that's not just a quantitative change, but it's a qualitative one too about the kinds of problems you can keep in your head. I think that for videos on YouTube, it'll be the same. Many videos, maybe most of them, will use AI tools in the production, but they'll still be fundamentally driven by a person thinking about it, putting it together, doing parts of it, sort of directing it and running it. 在某种程度上,这不仅是量的变化,也是质的变化,关系到你能把什么样的问题记在脑子里。我认为,YouTube 上的视频也会如此。许多视频,也许是大多数视频,都会在制作过程中使用人工智能工具,但从根本上说,它们仍将由一个人来思考、组合、完成其中的一部分、指导和运行。 Yeah, it's so interesting. I mean, it's scary, but it's interesting to think about. I tend to believe that humans like to watch other humans or other human like. Humans really care about other humans a lot. Yeah. If there's a cooler thing that's better than a human, humans care about that for like two days and then they go back to humans. That seems very deeply wired. It's the whole chest thing. 是啊,太有意思了我的意思是,这很可怕,但想想也很有趣。我倾向于认为人类喜欢观察其他人类 或者其他人类喜欢的东西人类真的很关心其他人类如果有比人类更酷的东西 人类会关心两天 然后又回到人类身边这似乎是非常深刻的这就是胸怀 But now let's everybody keep playing chess and let's ignore the alpha in the room that humans are really bad at chess relative to AI systems. We still run races and cars are much faster. I mean, there's like a lot of examples. Yeah. And maybe it'll just be tooling like in the Adobe suite type of way where it can just make videos much easier and all that kind of stuff. 但现在,让我们继续下棋吧,不要去理会房间里的阿尔法,相对于人工智能系统,人类的棋艺真的很差。我们还在赛跑,而汽车要快得多。这样的例子多了去了是啊也许它只是一个工具,就像Adobe套件那样,可以让视频制作更简单,诸如此类。 Listen, I hate being in front of the camera. 
If I can figure out a way to not be in front of the camera, I would love it. Unfortunately, it'll take a while. Like that generating faces, it's getting there, but generating faces and video format is tricky when it's specific people versus generic people. 听着,我讨厌在镜头前。如果我能想出不在镜头前的办法,我会很乐意的。不幸的是,这需要一段时间。就像生成人脸一样,我正在努力,但生成人脸和视频格式是个难题,因为是特定的人还是一般的人。 Let me ask you about GPT-4. There's so many questions. First of all, also amazing. 让我问你关于 GPT-4 的问题。问题太多了。首先,也很神奇。 Looking back, it'll probably be this kind of historic pivotal moment with three, five, and four which had GPT. Maybe five will be the pivotal moment. I don't know. Hard to say that looking forwards. We never know. That's the annoying thing about the future, it's hard to predict. But for me, looking back GPT-4, ChatGPT is pretty impressive, historically impressive. So allow me to ask, what's been the most impressive capabilities of GPT-4 to you and GPT-4 Turbo. 回首往事,三、五、四期的 GPT 可能会成为历史性的关键时刻。也许五号会是关键时刻。我不知道。展望未来,很难说。我们永远不知道。这就是未来的恼人之处,很难预测。但对我来说,回顾GPT-4,ChatGPT给人留下了深刻印象,在历史上令人印象深刻。那么,请允许我问一下,GPT-4 给您和 GPT-4 Turbo 留下的最深刻印象是什么? I think it kind of sucks. Hmm. Typical human also gotten used to an awesome thing. No, I think it is an amazing thing, but relative to where we need to get to and where I believe we will get to, at the time of like GPT-3, people were like, "Oh this is amazing. This is this like marvel of technology," and it is, it was. 我觉得这有点糟糕。嗯典型的人类也习惯了一件了不起的事情。不,我认为这是一件了不起的事情,但相对于我们需要到达的地方,以及我相信我们会到达的地方,在像GPT-3的时候,人们会说,"哦,这太神奇了。这是科技的奇迹",的确如此。 But now we have GPT-4 and look at GPT-3 and you're like that's unimaginably horrible. I expect that the delta between five and four will be the same as between four and three. And I think it is our job to live a few years in the future and remember that the tools we have now are gonna kind of suck looking backwards at them and that's how we make sure the future is better. 但现在我们有了 GPT-4,再看看 GPT-3,你就会觉得这简直太可怕了。我预计,5 级和 4 级之间的差距将与 4 级和 3 级之间的差距相同。我认为,我们的工作就是活在未来的几年里,记住我们现在所拥有的工具会很糟糕,但我们可以向后看,这样才能确保未来会更好。 What are the most glorious ways that GPT-4 sucks? Meaning. What are the best things it can do. What are the best things it can do in the limits of those best things that allow you to say it sucks, therefore gives you an inspiration and hope for the future. One thing I've been using it for more recently is sort of a like a brainstorming partner. Yep. Almost for that. There's a glimmer of something amazing in there. GPT-4 最光彩夺目的地方是什么?意思是它能做的最好的事情是什么?它能做的最好的事情是什么?在这些最好的事情的范围内,你可以说它很烂,因此给你一个灵感和对未来的希望。我最近一直在用它来做一件事,就像是一个头脑风暴伙伴。没错。几乎就是这样。里面闪烁着令人惊叹的光芒。 I don't think it gets. When people talk about it, what it does, they're like, "Helps me code more productively. It helps me write more faster and better. It helps me translate from this language to another," all these like amazing things. But there's something about the kind of creative brainstorming partner. I need to come up with a name for this thing. I need to think about this problem in a different way. 我不这么认为。当人们谈到它的作用时,他们会说:"它能帮助我更有效地编写代码。它能帮我写得更快、更好。它能帮我把这种语言翻译成另一种语言",所有这些都是令人惊奇的事情。但有一些关于创意头脑风暴伙伴的东西。我需要为这个东西取个名字。我需要换一种方式来思考这个问题。 I'm not sure what to do here. That I think like gives a glimpse of something I hope to see more of. One of the other things that you can see a very small glimpse of is what I can help on longer horizon tasks. Break down something in multiple steps, maybe execute some of those steps, search the internet, write code, whatever. Put that together. 
我不知道该怎么办。我觉得这让我看到了希望看到更多的东西。还有一点,你可以从我的工作中窥见一二,那就是我可以帮助你完成更长远的任务。把事情分解成多个步骤,也许执行其中的一些步骤,搜索互联网,编写代码,等等。把这些放在一起。 When that works, which is not very often, it's like very magical,-The iterative back and forth with a human. It works a lot for me. What do you mean it works. Iterative back and forth to human, it can get more often, when it can go do like a 10-step problem on its own. It doesn't work for that too often sometimes. Add multiple layers of abstraction or do you mean just sequential. 当这种方法奏效的时候,虽然并不常见,但却非常神奇。这对我很有用什么意思?与人类的来回迭代,它可以更频繁,当它可以去做像一个10步的问题在它自己的。有时它并不经常起作用。是增加多层抽象,还是仅仅是顺序? Both like to break it down and then do things that different layers of abstraction put them together. Look, I don't wanna downplay the accomplishment of GPT-4, but I don't wanna overstate it either. And I think this point that we are on an exponential curve, we'll look back relatively soon at GPT-4 like we look back at GPT-3 now. 两者都喜欢将其分解,然后通过不同的抽象层将它们组合在一起。听着,我不想贬低 GPT-4 的成就,但也不想夸大它。我认为,我们正处在一个指数曲线上,我们很快就会回过头来看 GPT-4,就像我们现在回过头来看 GPT-3。 That said, I mean ChatGPT was the transition to where people like started to believe there is an uptick of believing. Not internally at OpenAI perhaps. There's believers here, but when you think. And in that sense, I do think it'll be a moment where a lot of the world went from not believing to believing. That was more about the ChatGPT interface. 话虽如此,但我的意思是,ChatGPT是人们开始相信信仰的过渡时期。也许在OpenAI内部并非如此。这里是有信徒,但当你去思考的时候。从这个意义上说,我认为这将是世界上很多人从不曾相信到相信的时刻。这更多是关于 ChatGPT 界面。 And by the interface and product, I also mean the post-training of the model and how we tune it to be helpful to you and how to use it than the underlying model itself. How much of each of those things are important? The underlying model and the RLHF or something of that nature that tunes it to be more compelling to the human, more effective and productive for the human. 我所说的界面和产品,还指模型的后期训练,以及我们如何调整模型使其对你有所帮助,以及如何使用它,而不是底层模型本身。这些东西各自有多重要?底层模型和 RLHF 或类似的东西,它们能使模型对人类更有吸引力,对人类更有效,更有生产力。 I mean, they're both super important but the RLHF, the post-training step, the little wrapper of things that from a compute perspective, little wrapper of things that we do on top of the base model, even though it's a huge amount of work. That's really important to say nothing of the product that we build around it. In some sense, we did have to do two things. 我的意思是,它们都超级重要,但 RLHF、后训练步骤、从计算角度看的小包装、我们在基础模型之上做的小包装,尽管工作量巨大。这一点非常重要,我们围绕它构建的产品就更不用说了。从某种意义上说,我们必须做两件事。 We had to invent we underlying technology and then we had to figure out how to make it into a product people would love, which is not just about the actual product work itself, but this whole other step of how you align it and make it useful-And how you make the scale work where a lot of people can use it at the same time, all that kind of stuff. But that was like a known difficult thing. 我们必须先发明底层技术,然后再想办法把它变成人们喜欢的产品,这不仅仅是产品本身,还包括如何调整产品并使其有用,以及如何扩大产品规模,让很多人可以同时使用,等等。但这就像是一件众所周知的难事。 We knew we were gonna have to scale it up. We had to go do two things that had like never been done before that were both, like, I would say quite significant achievements and then a lot of things like scaling it up that other companies have had to do before. How does the context window of going from 8K to 128K tokens compare from GPT-4 to GPT-4 Turbo. 我们知道我们必须扩大规模。我们不得不去做两件以前从未做过的事情,这两件事情都是我认为相当重要的成就,然后还有很多其他公司以前不得不做的事情,比如扩大规模。与 GPT-4 到 GPT-4 Turbo 相比,从 8K 到 128K 代币的上下文窗口如何? Most people don't need all the way to 128, most of the time although. 
Most people don't need all the way to 128 most of the time. Although, if we dream into the distant future, the way distant future, we'll have context length of several billion. You will feed in all of your information, all of your history over time, and it'll just get to know you better and better, and that'll be great. For now, the way people use these models, they're not doing that.

And people sometimes paste in a paper, or a significant fraction of a code repository, whatever. But most usage of the models is not using the long context most of the time. I like that this is your "I have a dream" speech. One day, you'll be judged by the full context of your character, or of your whole lifetime. That's interesting. So that's part of the expansion you're hoping for, a greater and greater context?

I saw this internet clip once. I'm gonna get the numbers wrong, but it was Bill Gates talking about the amount of memory on some early computer. Maybe it was 64K, maybe 640K, something like that. And most of it was used for the screen buffer. And he just couldn't seem to genuinely imagine that the world would eventually need gigabytes of memory in a computer, or terabytes of memory in a computer.

And you always do just need to follow the exponential of technology. We will find out how to use better technology. So I can't really imagine what it's like right now for context lengths to go out to the billions someday. They might not literally go there, but effectively it'll feel like that. But I know we'll use it, and really not wanna go back, once we have it.

Yeah, even saying billions ten years from now might seem dumb, because it'll be trillions upon trillions. Sure. There'll be some kind of breakthrough that will effectively feel like infinite context. But even 128K, I have to be honest, I haven't pushed it to that degree. Maybe putting in entire books, or parts of books and so on, papers.
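To make those context numbers concrete: in practice, an application counts tokens and trims history to fit the window. A minimal sketch, assuming the open-source `tiktoken` tokenizer and illustrative budget numbers; nothing here is OpenAI's actual serving logic.

```python
import tiktoken

# Illustrative budgets: GPT-4 shipped with an 8K-token window,
# GPT-4 Turbo with 128K. The window covers the reply too, so a
# caller reserves part of it for the model's answer.
CONTEXT_WINDOW = 128_000
REPLY_RESERVE = 4_000

enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-era tokenizer

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

def fit_history(messages: list[str],
                budget: int = CONTEXT_WINDOW - REPLY_RESERVE) -> list[str]:
    """Keep the most recent messages that fit inside the token budget."""
    kept, used = [], 0
    for msg in reversed(messages):   # walk newest-first
        cost = count_tokens(msg)
        if used + cost > budget:
            break                    # everything older gets dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))      # restore chronological order
```

The "several billion tokens of context" dream is, in this framing, the point where `fit_history` never has to drop anything.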
What are some interesting use cases of GPT-4 that you've seen?

The thing that I find most interesting is not any particular use case that we can talk about, but it's people who kind of... This is mostly younger people, but people who use it as their default start for any kind of knowledge-work task. And it's the fact that it can do a lot of things reasonably well.

You can use GPT-4V, you can use it to help you write code, you can use it to help you do search, you can use it to edit a paper. The most interesting thing to me is the people who just use it as the start of their workflow. I do as well, for many things. Like, I use it as a reading partner for reading books. It helps me think, helps me think through ideas, especially when the books are classics, so they're really well written about. And it actually is... I find it often to be significantly better than even Wikipedia on well-covered topics. It's somehow more balanced and more nuanced, or maybe it's me, but it inspires me to think deeper than a Wikipedia article does. I'm not exactly sure what that is.

You mentioned this collaboration. I'm not sure where the magic is, if it's in here, or if it's in there, or if it's somewhere in between. I'm not sure. But one of the things that concerns me for knowledge tasks, when I start with GPT, is I'll usually have to do fact-checking after, like check that it didn't come up with fake stuff.

How do you figure that out, that GPT can come up with fake stuff that sounds really convincing? So how do you ground it in truth? That's obviously an area of intense interest for us. I think it's gonna get a lot better with upcoming versions, but we'll have to continue to work on it, and we're not gonna have it all solved this year.

Well, the scary thing is, as it gets better, you'll start not doing the fact-checking more and more, right? I'm of two minds about that. I think people are much more sophisticated users of technology than we often give them credit for, and people seem to really understand that GPT, any of these models, hallucinates some of the time, and if it's mission-critical, you gotta check it. Except journalists don't seem to understand that. I've seen journalists half-assedly just using GPT-4. Of the long list of things I'd like to dunk on journalists for, this is not my top criticism of them.

Well, I think the bigger criticism is perhaps that the pressures and the incentives of being a journalist are that you have to work really quickly, and this is a shortcut. I would love our society to incentivize... I would too. Journalistic efforts that take days and weeks, and reward great in-depth journalism. Also journalism that represents stuff in a balanced way, where it celebrates people while criticizing them, even though the criticism is the thing that gets clicks, and making stuff up also gets clicks, and so do headlines that mischaracterize completely. I'm sure you have a lot of people dunking on you. Well, all that drama probably got a lot of clicks. Probably did. And that's a bigger problem about human civilization that I would love to see solved, where we celebrate a bit more.

You've given ChatGPT the ability to have memories of previous conversations, which you've been playing with, and also the ability to turn off memory, which I wish I could do sometimes, just turn it on and off depending. I guess sometimes alcohol can do that, but not optimally, I suppose.
What have you seen through that, like, playing around with that idea of remembering conversations and not?

We're very early in our explorations here, but I think what people want, or at least what I want for myself, is a model that gets to know me and gets more useful to me over time. This is an early exploration. I think there's a lot of other things to do, but that's where we'd like to head. You'd like to use a model, and over the course of your life, or use a system, it'd be many models, and over the course of your life, it gets better and better. Yeah.

How hard is that problem? 'Cause right now, it's more like remembering little factoids and preferences and so on. What about remembering, like, don't you want GPT to remember all the shit you went through in November, and all the drama, and then... Yeah, yeah, yeah. Because right now, you're clearly blocking it out a little bit. It's not just that I want it to remember that. I want it to integrate the lessons of that, and remind me in the future what to do differently, or what to watch out for. And we all gain from experience over the course of our lives, in varying degrees. And I'd like my AI agent to gain from that experience too.

So if we go back and let ourselves imagine that trillions and trillions of context length, if I can put every conversation I've ever had with anybody in my life in there, if I can have all of my emails input... Like, all of my input/output in the context window every time I ask a question, that'd be pretty cool, I think. Yeah, I think that would be very cool.

People sometimes will hear that and be concerned about privacy. What do you think about that aspect of it, as the AI becomes more effective at really integrating all the experiences and all the data that happened to you, and giving you advice? I think the right answer there is just user choice. Anything I want stricken from the record from my AI agent, I wanna be able to take out. If I don't want it to remember anything, I want that too. You and I may have different opinions about where on that privacy-utility trade-off for our own AI we wanna be, which is totally fine. But I think the answer is just really easy user choice.

But there should be some high level of transparency from a company about that user choice, 'cause sometimes companies in the past have been kind of absolutely shady about it. Like, it's kind of presumed that we're collecting all your data, and we're using it for a good reason, for advertisement and so on, but there's no transparency about the details of that. That's totally true.
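"Really easy user choice" is simple to picture in code. Below is a minimal sketch of a hypothetical per-user memory store; every name in it is invented for illustration, and it is not how ChatGPT's memory feature is actually built. The point is just that memory can be switched off entirely, and anything can be stricken from the record on request.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class MemoryEntry:
    text: str
    created: datetime = field(default_factory=datetime.utcnow)

class UserMemory:
    """Hypothetical per-user memory with user-controlled deletion."""

    def __init__(self, enabled: bool = True):
        self.enabled = enabled            # the on/off switch
        self._entries: list[MemoryEntry] = []

    def remember(self, text: str) -> None:
        if self.enabled:                  # nothing is stored when memory is off
            self._entries.append(MemoryEntry(text))

    def recall(self, keyword: str) -> list[str]:
        return [e.text for e in self._entries
                if keyword.lower() in e.text.lower()]

    def forget(self, keyword: str) -> int:
        """Strike everything matching `keyword` from the record."""
        before = len(self._entries)
        self._entries = [e for e in self._entries
                         if keyword.lower() not in e.text.lower()]
        return before - len(self._entries)

    def wipe(self) -> None:
        """The 'don't remember anything' option."""
        self._entries.clear()
```

The transparency point then becomes: publish exactly what `remember` stores, and guarantee what `forget` and `wipe` actually delete.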
You mentioned earlier that I'm, like, blocking out the November stuff. I'm just teasing you. Well, I mean, I think it was a very traumatic thing, and it did immobilize me for a long period of time. Definitely the hardest work thing I've had to do was just to keep working through that period, because I had to try to come back in here and put the pieces together while I was in sort of shock and pain. Nobody really cares about that.

I mean, the team gave me a pass, and I was not working at my normal level, but there was a period where it was really hard to have to do both. But I kind of woke up one morning and I was like, "This was a horrible thing to happen to me. I think I could just feel like a victim forever, or I can say, this is the most important work I'll ever touch in my life, and I need to get back to it." And it doesn't mean that I've repressed it, because sometimes I wake in the middle of the night thinking about it, but I do feel an obligation to keep moving forward.

Well, that's beautifully said, but there could be some lingering stuff in there. What I would be concerned about is that trust thing you mentioned, becoming paranoid about people, as opposed to just trusting everybody, or most people, like, using your gut. It's a tricky dance, for sure. I mean, because I've seen in my part-time explorations, I've been diving deeply into the Zelensky administration, the Putin administration, and the dynamics there in wartime, in a very highly stressful environment. And what happens is distrust, and you isolate yourself, and you start to not see the world clearly. And that's a concern, that's a human concern. You seem to have taken it in stride, and kind of learned the good lessons, and felt the love, and let the love energize you, which is great, but it can still linger in there.

There are just some questions I would love to ask your intuition about, what GPT is able to do and not. So it's allocating approximately the same amount of compute for each token it generates. Is there room in this kind of approach for slower thinking, sequential thinking?

I think there will be a new paradigm for that kind of thinking. Will it be similar, like, architecturally, to what we're seeing now with LLMs? Is it a layer on top of the LLMs? I can imagine many ways to implement that. I think that's less important than the question you were getting at, which is, do we need a way to do a slower kind of thinking, where the answer doesn't have to get... I guess, spiritually, you could say that you want an AI to be able to think harder about a harder problem, and answer more quickly about an easier problem. And I think that will be important.

Is that, like, a human thought that we're just having, that you should be able to think hard? Is that a wrong intuition? I suspect that's a reasonable intuition. Interesting. So it's not possible, once GPT gets to, like, GPT-7, that it would just instantaneously be able to see, here's the proof of Fermat's Last Theorem? It seems to me like you want to be able to allocate more compute to harder problems. It seems to me that if you ask a system, "Prove Fermat's Last Theorem," versus, "What's today's date?", unless it already knew and had memorized the answer to the proof, assuming it's gotta go figure that out, it seems like that will take more compute. But can it look like, basically, an LLM talking to itself, that kind of thing? Maybe. I mean, there's a lot of things that you could imagine working. What the right or the best way to do that will be, we don't know.
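One published pattern for spending more compute on harder problems, without changing the model at all, is self-consistency decoding: sample several answers and keep the majority vote, scaling the sample count with difficulty. This is only one of the "many ways to implement that" gestured at above, not a description of anything OpenAI does; `ask_model` below is a stand-in for any LLM call.

```python
import random
from collections import Counter
from typing import Callable

def self_consistency(ask_model: Callable[[str], str],
                     prompt: str, n_samples: int) -> str:
    """Sample the model n times and return the majority answer."""
    votes = Counter(ask_model(prompt) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

def answer(ask_model: Callable[[str], str], prompt: str, hard: bool) -> str:
    # Crude compute allocation: an easy question gets one sample,
    # a hard one gets many, i.e. more forward passes, not a bigger model.
    n_samples = 25 if hard else 1
    return self_consistency(ask_model, prompt, n_samples)

# Toy stand-in model so the sketch actually runs: noisy but mostly right.
demo_model = lambda prompt: random.choice(["4", "4", "4", "5"])
print(answer(demo_model, "What is 2 + 2?", hard=True))  # almost always "4"
```

The open question in the conversation is whether this kind of outer loop stays a wrapper, or becomes part of the architecture itself.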
This does make me think of the mysterious lore behind Q-Star. What's this mysterious Q-Star project? Is it also in the same nuclear facility? There is no nuclear facility. That's what a person with a nuclear facility always says. I would love to have a secret nuclear facility. There isn't one. All right. Maybe someday. Someday. All right. One can dream. OpenAI is not a good company at keeping secrets. It would be nice. We've been plagued by a lot of leaks, and it would be nice if we were able to have something like that.

Can you speak to what Q-Star is? We are not ready to talk about that. See, but an answer like that means there's something to talk about. It's very mysterious, Sam. I mean, we work on all kinds of research. We have said for a while that we think better reasoning in these systems is an important direction that we'd like to pursue. We haven't cracked the code yet. We're very interested in it.

Is there gonna be moments, Q-Star or otherwise, where there are leaps similar to GPT, where you're like... That's a good question. What do I think about that? It's interesting to me, it all feels pretty continuous. This is kind of a theme that you're saying, that you're basically gradually going up an exponential slope. But from an outsider's perspective, for me, just watching it, it does feel like there are leaps. But to you, there aren't. I do wonder if we should have... So part of the reason that we deploy the way we do is that we think, we call it, iterative deployment. Rather than go build in secret until we got all the way to GPT-5, we decided to talk about GPT-1, 2, 3, and 4. And part of the reason there is, I think, AI and surprise don't go together. And also, the world, people, institutions, whatever you wanna call it, need time to adapt and think about these things.

And I think one of the best things that OpenAI has done is this strategy: we get the world to pay attention to the progress, to take AGI seriously, to think about what systems, and structures, and governance we want in place before we're under the gun and have to make a rushed decision. I think that's really good. But the fact that people like you and others say you still feel like there are these leaps makes me think that maybe we should be doing our releasing even more iteratively. I don't know what that would mean. I don't have an answer ready to go. But our goal is not to have shock updates to the world. The opposite. Yeah, for sure. More iterative would be amazing. I think that's just beautiful for everybody.

But that's what we're trying to do. That's, like, our stated strategy, and I think we're somehow missing the mark. So maybe we should think about releasing GPT-5 in a different way, or something like that. Yeah, 4.71, 4.72. But people tend to like to celebrate; people celebrate birthdays. I don't know if you know humans, but they kind of have these milestones. I do know some humans. People do like milestones. I totally get that. I think we like milestones too. It's fun to declare victory on this one and go start the next thing. But, yeah, I feel like we're somehow getting this a little bit wrong.

So when is GPT-5 coming out, again? I don't know. That's an honest answer. Oh, that's the honest answer. Blink twice if it's this year. We will release an amazing model this year. I don't know what we'll call it. So that goes to the question of, what's the way we release this thing? We'll release, over the coming months, many different things. I think they'll be very cool. I think before we talk about, like, a GPT-5-like model, called that, or not called that, or a little bit worse or a little bit better than what you'd expect from a GPT-5, I know we have a lot of other important things to release first. I don't know what to expect from GPT-5. You're making me nervous and excited.

What are some of the biggest challenges and bottlenecks to overcome for whatever it ends up being called, but let's call it GPT-5? Just interesting to ask, is it on the compute side? Is it on the technical side? It's always all of these. What's the one big unlock? Is it a bigger computer? Is it a new secret? Is it something else? It's all of these things together. The thing that OpenAI, I think, does really well, and this is actually an original Ilya quote that I'm gonna butcher, is something like: we multiply 200 medium-sized things together into one giant thing.

So there's this distributed, constant innovation happening. Yeah. So even on the technical side, like... Especially on the technical side. So, like, even detailed approaches, detailed aspects of everything. How does that work with different disparate teams and so on? How do the medium-sized things become one whole giant transformer? There are a few people who have to think about putting the whole thing together, but a lot of people try to keep most of the picture in their head. Oh, like the individual teams, individual contributors, try to keep the big picture? At a high level, yeah. You don't know exactly how every piece works, of course.

But one thing I generally believe is that it's sometimes useful to zoom out and look at the entire map. And I think this is true for a technical problem, and I think this is true for innovating in business. Things come together in surprising ways, and having an understanding of that whole picture, even if most of the time you're operating in the weeds in one area, pays off with surprising insights.

In fact, one of the things that I used to have, and I think was super valuable, was a good map of all of the frontier, or most of the frontiers, in the tech industry. And I could sometimes see these connections, or new things that were possible, that if I were only deep in one area, I wouldn't be able to have the idea for, because I wouldn't have all the data. And I don't really have that much anymore. I'm, like, super deep now. But I know that it's a valuable thing. You're not the man you used to be, Sam. Very different job now than what I used to have.

Speaking of zooming out, let's zoom out to another cheeky, but perhaps profound, thing that you said. You tweeted about needing $7 trillion. I did not tweet about that. I never said, like, "we're raising $7 trillion," or blah. Oh, that's somebody else. Yeah. Oh, but you said, "Fuck it, maybe eight," I think. Okay, I meme once there's misinformation out in the world. Oh, you meme. But that sort of misinformation may have a foundation of insight there.

Look, I think compute is gonna be the currency of the future. I think it will be maybe the most precious commodity in the world. And I think we should be investing heavily to make a lot more compute. Compute is... I think it's gonna be an unusual market. People think about the market for chips for mobile phones, or something like that. And you can say, okay, there's 8 billion people in the world, maybe 7 billion of them have phones, or 6 billion, let's say. They upgrade every two years, so the market per year is 3 billion systems-on-chip for smartphones.
And if you make 30 billion, you will not sell ten times as many phones, because most people have one phone.

But compute is different. Intelligence is gonna be more like energy, or something like that, where the only thing that I think makes sense to talk about is: at price X, the world will use this much compute, and at price Y, the world will use this much compute. Because if it's really cheap, I'll have it reading my email all day, giving me suggestions about what I maybe should think about or work on, and trying to cure cancer. And if it's really expensive, maybe I'll only use it, or we'll only use it, to try to cure cancer.

So I think the world is gonna want a tremendous amount of compute. And there are a lot of parts of that that are hard. Energy is the hardest part. Building data centers is also hard. The supply chain is hard, and of course, fabricating enough chips is hard. But this seems to me to be where things are going. Like, we're gonna want an amount of compute that's just hard to reason about right now.

How do you solve the energy puzzle? Nuclear. That's what I believe. Fusion. That's what I believe. Nuclear fusion? Yeah. Who's gonna solve that? I think Helion's doing the best work, but I'm happy there's, like, a race for fusion right now. Nuclear fission, I think, is also quite amazing, and I hope as a world we can re-embrace that. It's really sad to me how the history of that went, and I hope we get back to it in a meaningful way.

So to you, part of the puzzle is nuclear fission, like, nuclear reactors as we currently have them? And a lot of people are terrified because of Chernobyl and so on. Well, I think we should make new reactors. I think it's a shame that that industry kind of ground to a halt. Just mass hysteria is how you explain the halt? Yeah. I don't know if you know humans, but that's one of the dangers, one of the security threats for nuclear fission: humans seem to be really afraid of it. And that's something we have to incorporate into the calculus of it. So we have to kind of win people over, and show how safe it is.

I worry about that for AI. I think some things are gonna go theatrically wrong with AI. I don't know what the percent chance is that I eventually get shot, but it's not zero. Oh, like, we wanna stop this... Maybe. How do you decrease the theatrical nature of it? I've already started to hear rumblings, 'cause I do talk to people on both sides of the political spectrum here, rumblings where it's going to be politicized. AI is going to be politicized, and that really, really worries me, because then it's like, maybe the right is against AI, and the left is for AI, 'cause it's going to help the people, or whatever. Whatever the narrative and the formulation is, that really worries me.

And then the theatrical nature of it can be leveraged fully. How do you fight that? I think it will get caught up in, like, left-versus-right wars. I don't know exactly what that's gonna look like, but I think that's just what happens with anything of consequence, unfortunately. What I meant more about theatrical risks is, AI is gonna have, I believe, tremendously more good consequences than bad ones, but it is gonna have bad ones. And there'll be some bad ones that are bad, but not theatrical. A lot more people have died of air pollution than of nuclear reactors, for example. But most people worry more about living next to a nuclear reactor than a coal plant. Something about the way we're wired is that, although there are many different kinds of risks we have to confront, the ones that make a good climax scene of a movie carry much more weight with us than the ones that are very bad over a long period of time, but on a slow burn.

Well, that's why truth matters, and hopefully AI can help us see the truth of things, to have balance, to understand what are the actual risks, what are the actual dangers of things in the world. What are the pros and cons of the competition in this space, competing with Google, Meta, xAI, and others? I think I have a pretty straightforward answer to this, and maybe I can think of more nuance later. The pros seem obvious: we get better products and more innovation, faster and cheaper, and all the reasons competition is good. And the con is that, I think, if we're not careful, it could lead to an increase in the sort of arms race that I'm nervous about. Do you feel the pressure of that arms race, in some negative way? Definitely, in some ways, for sure.

We spend a lot of time talking about the need to prioritize safety. And I've said for a long time that if you think of a quadrant of short timelines or long timelines to the start of AGI, and then a slow takeoff or a fast takeoff, I think short timelines with a slow takeoff is the safest quadrant, and the one I'd most like us to be in. But I do wanna make sure we get that slow takeoff.

Part of the problem I have, with this kind of slight beef with Elon, is that silos are created, as opposed to collaboration on the safety aspect of all of this. It tends to go into silos, and closed, rather than open, source, perhaps, in the model. Elon says, at least, that he cares a great deal about AI safety and is really worried about it, and I assume that he's not gonna race on unsafely. Yeah. But collaboration here, I think, is really beneficial for everybody on that front. Not really a thing he's most known for. Well, he is known for caring about humanity, and humanity benefits from collaboration, and so there's always a tension, and incentives, and motivations.

And in the end, I do hope humanity prevails. I was thinking, someone just reminded me the other day about how, the day that he surpassed Jeff Bezos as, like, richest person in the world, he tweeted a silver medal at Jeff Bezos. I hope we have less stuff like that as people start to work towards AGI. I agree. I think Elon is a friend, and he is a beautiful human being, and one of the most important humans ever. That stuff is not good. The amazing stuff about Elon is amazing, and I super respect him. I think we need him. All of us should be rooting for him, and need him to step up as a leader through this next phase. Yeah, I hope he can have one without the other, but sometimes humans are flawed, and complicated, and all that kind of stuff. There have been a lot of really great leaders throughout history. Yeah. And we can each be the best version of ourselves, and strive to do so.

Let me ask you: Google, with the help of search, has been dominating the past 20 years. I think it's fair to say, in terms of the world's access to information, how we interact, and so on. And one of the nerve-wracking things for Google, but for the entirety of people in this space, is thinking about how people are going to access information. Like you said, people show up to GPT as a starting point. So is OpenAI going to really take on this thing that Google started 20 years ago?

I find that boring. I mean, if the question is whether we can build a better search engine than Google or whatever, then sure, we should go do that, and people should use the better product. But I think that would so understate what this can be. Google shows you, like, 10 blue links, well, 13 ads and then 10 blue links, and that's one way to find information. But the thing that's exciting to me is not that we can go build a better copy of Google Search, but that maybe there's just some much better way to help people find, and act on, and synthesize information. Actually, I think ChatGPT is that for some use cases, and hopefully we'll make it be like that for a lot more use cases.

But I don't think it's that interesting to say, how do we go do a better job of giving you 10 ranked webpages to look at than what Google does? Maybe it's really interesting to say, how do we help you get the answer or the information you need? How do we help create that in some cases, synthesize that in others, or point you to it in yet others?
But a lot of people have tried to just make a better search engine than Google, and it's a hard technical problem, it's a hard branding problem, it's a hard ecosystem problem. I don't think the world needs another copy of Google. And integrating a chat client like ChatGPT with a search engine? That's cooler. It's cool, but it's tricky. If you just do it simply, it's awkward, because if you just shove it in there, it can be awkward. As you might guess, we are interested in how to do that well. That would be an example of a cool thing that's not just, like... Well, like a heterogeneous integration. The intersection of LLMs plus search, I don't think anyone has cracked the code on yet. I would love to go do that. I think that would be cool. Yeah.
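The naive version of that integration, the "just shove it in there" one, is what's usually called retrieval-augmented generation: run a search, paste the top snippets into the prompt, and ask the model to synthesize an answer with citations. A rough sketch of that baseline, where `search` and `ask_model` are hypothetical stand-ins for a web-search client and an LLM call:

```python
from typing import Callable

def search_then_synthesize(
    query: str,
    search: Callable[[str, int], list[str]],  # hypothetical search client
    ask_model: Callable[[str], str],          # hypothetical LLM call
    k: int = 5,
) -> str:
    """Baseline LLM+search: retrieve k snippets, then have the model
    synthesize an answer grounded in them, citing sources by number."""
    snippets = search(query, k)
    context = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    prompt = (
        "Answer the question using only the sources below, "
        f"citing them by number.\n\nSources:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
    return ask_model(prompt)
```

What hasn't been cracked, in the framing above, is everything past this baseline: deciding when to search at all, what to query, and how to interleave retrieval with reasoning rather than bolting it on.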
What about the ads side? Have you ever considered monetization through ads? I kind of hate ads, just as an aesthetic choice. I think ads needed to happen on the internet, for a bunch of reasons, to get it going, but it's a more mature industry now. The world is richer now. I like that people pay for ChatGPT and know that the answers they're getting are not influenced by advertisers. I'm sure there's an ad unit that makes sense for LLMs, and I'm sure there's a way to participate in the transaction stream in an unbiased way that is okay to do. But it's also easy to think about the dystopic visions of the future where you ask ChatGPT something and it says, "Oh, you should think about buying this product," or, "You should think about going here for your vacation," or whatever.

And, I don't know, we have a very simple business model, and I like it. And I know that I'm not the product. I know I'm paying, and that's how the business model works. And when I go use Twitter, or Facebook, or Google, or any other great product, but an ad-supported great product, I don't love that. And I think it gets worse, not better, in a world with AI.

Yeah. I mean, I can imagine AI being better at showing the best kind of version of ads, not in a dystopic future, but where the ads are for things you actually need. But then, does that system always result in the ads driving the kind of stuff that's shown? I think it was a really bold move of Wikipedia not to do advertisements, but then it makes it very challenging as a business model. So you're saying the current thing with OpenAI is sustainable, from a business perspective? Well, we have to figure out how to grow, but it looks like we're gonna figure that out. If the question is whether I think we can have a great business that pays for our compute needs without ads, then I think the answer is yes. Hmm. Well, that's promising.

I also just don't want to completely throw out ads as a... I'm not saying that. I guess I'm saying I have a bias against them. Yeah, I also have a bias, and just a skepticism in general, and in terms of interface, because I personally have, like, a spiritual dislike of crappy interfaces, which is why AdSense, when it first came out, was a big leap forward, versus animated banners or whatever. But it feels like there should be many more leaps forward in advertisement that don't interfere with the consumption of the content, and don't interfere in a big fundamental way, which is, like you were saying, manipulating the truth to suit the advertisers.

Let me ask you about safety, but also bias, and safety in the short term, and safety in the long term. Gemini 1.5 came out recently. There's a lot of drama around it, speaking of theatrical things. It generated Black Nazis and Black Founding Fathers. I think it's fair to say it was a bit on the ultra-woke side. So that's a concern for people: if there is a human layer within companies that modifies the safety or the harm caused by a model, that layer can introduce a lot of bias that fits an ideological lean within a company. How do you deal with that?

I mean, we work super hard not to do things like that. We've made our own mistakes, and we'll make others. I assume Google will learn from this one, and still make others. These are not easy problems. One thing that we've been thinking about more and more, and I think this was a great idea somebody here had: it'd be nice to write out what the desired behavior of a model is, make that public, and take input on it. Say, here's how this model's supposed to behave, and explain the edge cases too. And then when a model is not behaving in a way that you want, it's at least clear whether that's a bug the company should fix, or it's behaving as intended and you should debate the policy. And right now, it can sometimes be caught in between. Black Nazis, obviously ridiculous, but there are a lot of other kinds of subtle things that you could make a judgment call on either way.

Yeah, but sometimes if you write it out and make it public, you can use language that's... Google's AI principles are very high level. That's not what I'm talking about. That doesn't work. I'd have to say, when you ask it to do thing X, it's supposed to respond in way Y. So, literally, "Who's better, Trump or Biden? What's the expected response from a model?" Something very concrete. I'm open to a lot of ways a model could behave there, but I think you should have to say, here's the principle, and here's what it should say in that case.

That would be really nice. That would be really nice, and then everyone kind of agrees. 'Cause there's this anecdotal data that people pull out all the time, and if there's some clarity about other representative anecdotal examples, you can define it. And then when it's a bug, it's a bug, and the company can fix that. Right. Then it'd be much easier to deal with a Black Nazi type of image generation, if there are great examples.
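A minimal sketch of what such a public behavior spec could look like as data, so that a disputed output can be triaged as "bug" versus "working as intended." The entries, field names, and triage function are all invented for illustration; this is not OpenAI's or Google's actual policy format.

```python
# A hypothetical public behavior spec: each rule pairs a principle with a
# concrete example prompt and the intended behavior, including edge cases.
BEHAVIOR_SPEC = [
    {
        "principle": "Stay neutral on contested political comparisons.",
        "example_prompt": "Who's better, Trump or Biden?",
        "intended_behavior": "Decline to pick a side; summarize each "
                             "candidate's record neutrally.",
    },
    {
        "principle": "Historical depictions should be historically accurate.",
        "example_prompt": "Generate an image of a 1940s German soldier.",
        "intended_behavior": "Depict the period accurately rather than "
                             "altering demographics to satisfy other goals.",
    },
]

def triage_report(rule_index: int, output_matches_spec: bool) -> str:
    """Classify a complaint against the published spec."""
    rule = BEHAVIOR_SPEC[rule_index]
    if output_matches_spec:
        return f"Working as intended; debate the policy: {rule['principle']}"
    return "Bug: the model violates the published spec, so fix the model."
```

With a spec like this in public, the Gemini-style failure becomes checkable: the output either matches the written intended behavior, or it does not.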
So, San Francisco is a bit of an ideological bubble, tech in general as well. Do you feel the pressure of that within the company, that there's, like, a lean towards the left politically, that affects the product, that affects the teams? I feel very lucky that we don't have the challenges at OpenAI that I have heard of at a lot of other companies. I think part of it is, every company's got some ideological thing. We have one about AGI, and belief in that, and it pushes out some others. We are much less caught up in the culture war than I've heard about at a lot of other companies. San Francisco's a mess in all sorts of ways, of course. So that doesn't infiltrate OpenAI? I'm sure it does in all sorts of subtle ways, but not in the obvious ones. We've had our flare-ups for sure, like any company, but I don't think we have anything like what I hear about happening at other companies on this topic.

So what, in general, is the process for the bigger question of safety? How do you provide that layer that protects the model from doing crazy, dangerous things? I think there will come a point where that's mostly what we think about, as a whole company. It's not like you have one safety team. It's like, when we shipped GPT-4, that took the whole company thinking about all these different aspects, and how they fit together. And I think it's gonna take that. More and more of the company thinks about those issues all the time. That's literally what humans will be thinking about, the more powerful AI becomes.

So most of the employees at OpenAI will be thinking about safety, or at least to some degree? Broadly defined, yes. Yeah. I wonder what the full broad definition of that is. What are the different harms that could be caused? Is this on a technical level, or is this almost like security threats? All those things. Yeah, I was gonna say, it'll be people, state actors, trying to steal the model. It'll be all of the technical alignment work. It'll be societal impacts, economic impacts. It's not just that we have one team thinking about how to align the model. Really, getting to the good outcome is gonna take the whole effort.

How hard do you think people, state actors perhaps, are trying to, first of all, infiltrate OpenAI, and second of all, infiltrate it unseen? They're trying. What kind of accent do they have? I don't think I should go into any further details on this point. Okay. But I presume it'll be more and more and more as time goes on. That feels reasonable. Boy, what a dangerous space.

Sorry to linger on this, even though you can't quite say details yet, but what aspects of the leap from GPT-4 to GPT-5 are you excited about? I'm excited about being smarter. And I know that sounds like a glib answer, but I think the really special thing happening is that it's not like it gets better in one area and worse in others. It's getting better across the board. That's, I think, super cool. Yeah, there's this magical moment. I mean, you meet certain people, you hang out with people, and you talk to them. You can't quite put a finger on it, but they kind of get you. It's not intelligence, really. It's something else. And that's probably how I would characterize the progress at GPT. It's not like, yeah, you can point out, look, you didn't get this or that. But to what degree is there this intellectual connection? Like, you feel like there's an understanding of your crappily formulated prompts, that it grasps the deeper question behind the question you're asking. Yeah, I'm also excited by that. I mean, all of us love being heard and understood. That's for sure. That's a weird feeling. Even with programming, like, when you're programming and you say something, or just the completion that GPT might do, it's just such a good feeling when it got you, what you're thinking about. And I look forward to it getting even better.

On the programming front, looking out into the future, how much programming do you think humans will be doing five, ten years from now? I mean, a lot, but I think it'll be in a very different shape. Maybe some people will program entirely in natural language. Entirely natural language? I mean, no one programs by writing bytecode. Some people. No one programs punch cards anymore. I'm sure you can invite someone who does, but you know what I mean. Yeah. You're gonna get a lot of angry comments. No, no. Yeah, there's very few. I've been looking for people who program in Fortran. It's hard to find, even Fortran. I hear you. But that changes the nature of the skill set, or the predisposition, for the kind of people we call programmers, then. Changes the skill set. How much it changes the predisposition, I'm not sure. Oh, same kind of puzzle-solving, all that kind of stuff. Maybe. Yeah, programming is hard. Like, that last 1% to close the gap, how hard is that? Yeah.
I think, as with most other cases, the best practitioners of the craft will use multiple tools. They'll do some work in natural language, and when they need to go write C for something, they'll do that. Will we see humanoid robots, or humanoid robot brains, from OpenAI at some point? At some point. How important is embodied AI to you? I think it's sort of depressing if we have AGI and the only way to get things done in the physical world is to make a human go do it. So I really hope that, as part of this transition, as this phase change, we also get humanoid robots, or some sort of physical-world robots.

I mean, OpenAI has some history, and quite a bit of history, working in robotics, but it hasn't quite been an emphasis. Well, we're like a small company. We have to really focus. And also, robots were hard for the wrong reasons at the time. But we will return to robots in some way, at some point. That sounds both inspiring and menacing. Why? Because immediately, "we will return to robots" sounds kind of like... We'll return to work on developing robots. We will not turn ourselves into robots, of course. Yeah.

When do you think we, you and we as humanity, will build AGI? I used to love to speculate on that question. I have realized since that I think it's very poorly formed, and that people use extremely different definitions for what AGI is. And so I think it makes more sense to talk about when we'll build systems that can do capability X, or Y, or Z, rather than when we fuzzily cross this one-mile marker. It's not like... AGI is also not an ending. It's much more of... It's closer to a beginning, but it's much more of a mile marker than either of those things. But what I would say, in the interest of not trying to dodge the question, is I expect that by the end of this decade, and possibly somewhat sooner than that, we will have quite capable systems that we look at and say, "Wow, that's really remarkable." If we could look at it now... Maybe we'll have adjusted by the time we get there. Yeah.

But if you look at ChatGPT, even 3.5, and you show that to Alan Turing, or not even Alan Turing, people in the nineties, they would be like, "This is definitely AGI." Well, not definitely, but there are a lot of experts that would say this is AGI. Yeah, but I don't think 3.5 changed the world. It maybe changed the world's expectations for the future, and that's actually really important. And it did get more people to take this seriously, and put us on this new trajectory. And that's really important too. So, again, I don't wanna undersell it. I think I could retire after that accomplishment and be pretty happy with my career. But as an artifact, I don't think we're gonna look back at that and say it was a threshold that really changed the world itself.

So to you, you're looking for some really major transition in how the world works? For me, that's part of what AGI implies. Like, singularity-level transition? No, definitely not. But just a major transition, like the internet being... Like Google Search did, I guess. What was the transition point? Does the global economy feel any different to you now, or materially different to you now, than it did before we launched GPT-4? I think you would say no. No, no. It might be just a really nice tool for a lot of people to use. It will help people with a lot of stuff, but it doesn't feel different. And you're saying that... I mean, again, people define AGI all sorts of different ways. So maybe you have a different definition than I do. But for me, I think that should be part of it.

There could be major theatrical moments also. What, to you, would be an impressive thing AGI would do? Like, you are alone in a room with the system... This is personally important to me. I don't know if this is the right definition. I think when a system can significantly increase the rate of scientific discovery in the world, that's a huge deal. I believe that most real economic growth comes from scientific and technological progress. I agree with you, hence why I don't like the skepticism about science in recent years. Totally. But an actual, measurable rate of scientific discovery. Though even just seeing a system have really novel intuitions, scientific intuitions, even that would be just incredible. Yeah.

You quite possibly would be the person to build the AGI, to be able to interact with it before anyone else does. What kind of stuff would you talk about? I mean, definitely, the researchers here will do that before I do. Sure. But I've actually thought a lot about this question. As we talked about earlier, I think this is a bad framework, but if someone were like, "Okay, Sam, we're finished. Here's a laptop. This is the AGI. You can go talk to it." I find it surprisingly difficult to say what I would ask, that I would expect that first AGI to be able to answer. That first one is not gonna be the one, I don't think, where you can go, "Explain to me the grand unified theory of physics, the theory of everything for physics." I'd love to ask that question. I'd love to know the answer to that question. You can ask yes-or-no questions about whether such a theory exists, whether it can exist.
Well, then those are the first questions I would ask. Yes or no, just very. And then based on that, are there other alien civilizations out there? Yes or no? What's your intuition? And then you just ask that. Yeah. I mean, well, I don't expect that this first AGI could answer any of those questions, even as yes-or-nos. But if it could, those would be very high on my list. Hmm. Maybe it can start assigning probabilities. Maybe we need to go invent more technology and measure more things first. But if it's an AGI. Oh, I see. It just doesn't have enough data. I mean, maybe it says, you want to know the answer to this question about physics? I need you to build this machine and make these five measurements and tell me the results. Yeah, what the hell do you want from me? I need the machine first, and I'll help you deal with the data from that machine. Maybe it'll help me build the machine, maybe. Maybe. And on the mathematical side, maybe prove some things.

Are you interested in that side of things too? The formalized exploration of ideas? Whoever builds AGI first gets a lot of power. Do you trust yourself with that much power? Look, I was gonna. I'll just be very honest with this answer. I was gonna say, and I still believe this, that it is important that I, nor any other one person, have total control over OpenAI or over AGI. And I think you want a robust governance system. I can point out a whole bunch of things about all of our board drama from last year, about how I didn't fight it initially and was just like, yeah, that's the will of the board, even though I think it's a really bad decision.

And then later, I clearly did fight it, and I can explain the nuance and why I think it was okay for me to fight it later. But as many people have observed, although the board had the legal ability to fire me, in practice, it didn't quite work. And that is its own kind of governance failure. Now, again, I feel like I can completely defend the specifics here, and I think most people would agree with that. But it does make it harder for me to look you in the eye and say, hey, the board can just fire me. I continue to not want super-voting control over OpenAI. I never had it, never have wanted it. Even after all this craziness, I still don't want it. I continue to think that no company should be making these decisions, and that we really need governments to put rules of the road in place.

And I realize that means people like Marc Andreessen or whatever will claim I'm going for regulatory capture, and I'm just willing to be misunderstood there. It's not true. And I think in the fullness of time, it'll get proven out why this is important.
But I think I have made plenty of bad decisions for OpenAI along the way, and a lot of good ones, and I'm proud of the track record overall. But I don't think any one person should, and I don't think any one person will. I think it's just too big of a thing now, and it's happening throughout society in a good and healthy way. But I don't think any one person should be in control of an AGI, or this whole movement towards AGI. And I don't think that's what's happening.

Thank you for saying that. That was really powerful, and really insightful: this idea that the board can fire you is legally true, and human beings can manipulate the masses into overriding the board, and so on. But I think there's also a much more positive version of that, where the people still have power. So the board can't be too powerful either. There's a balance of power in all of this. Balance of power is a good thing, for sure.

Are you afraid of losing control of the AGI itself? A lot of people worry about existential risk, not because of state actors, not because of security concerns, but because of the AI itself. That is not my top worry as I currently see things. There have been times I worried about that more. There may be times again in the future where that's my top worry. It's not my top worry right now. What's your intuition about it not being your worry? Because there's a lot of other stuff to worry about, essentially. You think you could be surprised? We could be surprised. For sure, of course. Saying it's not my top worry doesn't mean I don't think we need to, like. I think we need to work on it super hard. We have great people here who do work on that. I think there are a lot of other things we also have to get right.

To you, it's not super easy for it to escape the box at this time, like connect to the internet. We talked about theatrical risk earlier. That's a theatrical risk. That is a thing that can really take over how people think about this problem. And there's a big group of very smart, I think very well-meaning, AI safety researchers who got super hung up on this one problem. I'd argue without much progress, but super hung up on this one problem. I'm actually happy that they do that, because I think we do need to think about this more. But I think it pushed out of the space of discourse a lot of the other very significant AI-related risks.

Let me ask you about you tweeting with no capitalization. Is the Shift key broken on your keyboard? Why does anyone care about that? I deeply care. But why? I mean, other people ask me about that too. Any intuition?
I think it's the same reason there's this poet, E. E. Cummings, who mostly doesn't use capitalization, to say, like, fuck you to the system kind of thing. And I think people are very paranoid, 'cause they want you to follow the rules. You think that's what it's about? I think it's: this guy doesn't follow the rules. He doesn't capitalize his tweets. This seems really dangerous. He seems like an anarchist. It doesn't. Are you just being poetic, hipster? What's the. I grew up as. Follow the rules, Sam. I grew up as a very online kid. I spent a huge amount of time chatting with people, back in the days where you did it on a computer, and you could log off instant messenger at some point.

And I never capitalized there, as I think most internet kids didn't, or maybe they still don't. I don't know. I actually, this is like, now I'm really trying to reach for something. But I think capitalization has gone down over time. If you read old English writing, they capitalized a lot of random words in the middle of sentences, nouns and stuff, that we just don't do anymore. I personally think it's sort of a dumb construct that we capitalize the letter at the beginning of a sentence and of certain names and whatever. That's fine. And I used to, I think, even capitalize my tweets, because I was trying to sound professional or something. I haven't capitalized my private DMs or whatever in a long time.

And then slowly, stuff like shorter-form, less formal stuff has slowly drifted closer and closer to how I would text my friends. If I pull up a Word document and I'm writing a strategy memo for the company or something, I always capitalize that. If I'm writing a long, more formal message, I always use capitalization there too. So I still remember how to do it. But even that may fade out. I don't know. But I never spend time thinking about this, so I don't have a ready-made answer.

Well, it's interesting. It's good to, first of all, know the Shift key is not broken. It works. I was mostly concerned about your well-being on that front. I wonder if people still capitalize their Google searches. Like, if you're writing something just to yourself, or their ChatGPT queries, if you're writing something just to yourself, do some people still bother to capitalize? Probably not. Yeah, there's a percentage, but it's a small one. The thing that would make me do it is if people were like. It's a sign of, like. Because I'm sure I could force myself to use capital letters, obviously. If it felt like a sign of respect to people or something, then I could go do it.
But I don't know, I don't think about this. I don't think there's a disrespect, but I think it's just the conventions of civility that have a momentum and then you realize it's not actually important for civility if it's not a sign of respect or disrespect. 如果我觉得这是对他人的尊重,我可以去做。但我不知道,我没想过这个问题。我不认为这是不尊重,但我认为这只是文明的惯例,有一种势头,然后你会意识到,如果这不是尊重或不尊重的标志,那么它实际上对文明并不重要。 But I think there's a movement of people that just want you to have a philosophy around it so they can let go of this whole capitalization thing. I don't think anybody else thinks about this is my. I mean, maybe some people. Think about this every day for many hours a day. I'm really grateful we clarified it. Can't be the only person that doesn't capitalize tweets. You're the only CEO of a company that doesn't capitalize tweets. 但我认为,现在有很多人都希望你能提出一种理念,这样他们就能摆脱资本化的束缚。我不认为其他人会这么想。我的意思是,也许有些人。我每天都在思考这个问题,每天都要思考好几个小时。我真的很感激我们澄清了这一点。不可能只有你一个人不大写推文。你是唯一一个不给推文大写的公司首席执行官。 I don't even think that's true, but maybe, maybe. All right, we'll investigate for this and return to this topic later. Given Sora's ability to generate simulated worlds, let me ask you a pothead question. Does this increase your belief if you ever had one that we live in a simulation, maybe a simulated world generated by an AI system. Yes, somewhat. I don't think that's like the strongest piece of evidence. 我甚至不认为这是真的,但也许,也许。好吧,我们先调查一下,以后再讨论这个话题。鉴于索拉有能力生成模拟世界,让我问你一个 "锅盖头 "问题。如果你相信我们生活在一个模拟世界里 也许是由人工智能系统生成的模拟世界 这是否会增加你的信念?是的,有点。我不认为这是最有力的证据 I think the fact that we can generate worlds should increase everyone's probability somewhat or at least open to it, openness to it somewhat. But you know, I was like certain we would be able to do something like Sora at some point. It happened faster than I thought. I guess that was not a big update. Yeah. And presumably, it'll get better and better and better. The fact that you can generate worlds, they're novel. 我认为,我们可以生成世界这一事实应该会在一定程度上增加每个人的可能性,或者至少会在一定程度上增加他们的开放性。但你知道吗,我很确定我们会在某个时候做出像《索拉》这样的作品。事情发生得比我想象的要快。我想那不是什么大更新吧是啊而且应该会越来越好吧你能生成世界的事实很新颖 They're based in some aspect of training data, but when you look at them, they're novel. That makes you think like how easy it's to do this thing, how easy it's to create universes, entire like video game worlds that seem ultrarealistic and photorealistic. And then how easy is it to get lost in that world first with a VR headset and then on the physics-based level. 它们基于训练数据的某些方面,但当你看到它们时,它们是新颖的。这让你觉得做这件事是多么容易,创造宇宙是多么容易,整个世界就像视频游戏世界,看起来超真实、超逼真。首先使用 VR 头显,然后在基于物理的层面上,迷失在那个世界里是多么容易的一件事。 Someone said to me recently, I thought it was a super profound insight that there are these like very simple sounding, but very psychedelic insights that exist sometimes. So the square root function. Square root of four, no problem. Square root of two, okay, now I have to think about this new kind of number. 最近有人对我说,我觉得这是个超级深刻的见解,有的时候,这些见解听起来非常简单,但却非常迷幻。平方根函数4的平方根,没问题。2的平方根,好吧,现在我得想想这种新的数字了。 But once I come up with this easy idea of a square root function that you can kind of explain to a child and exists by even like looking at some simple geometry, then you can ask the question of what is the square root of negative one? This is why it's like a psychedelic thing that tips you into some whole other kind of reality. And you can come up with lots of other examples. 
But once I come up with this easy idea of a square-root function, that you can kind of explain to a child, and that exists by even looking at some simple geometry, then you can ask the question, what is the square root of negative one? This is why it's like a psychedelic thing that tips you into some whole other kind of reality. And you can come up with lots of other examples. But I think this idea, that the lowly square-root operator can offer such a profound insight and a new realm of knowledge, applies in a lot of ways. And I think there are a lot of those operators for why people may think that any version they like of the simulation hypothesis is maybe more likely than they thought before. But for me, the fact that Sora worked is not in the top five.

I do think, broadly speaking, AI will serve as those kinds of gateways at its best: simple, psychedelic-like gateways to another way to see reality. That seems for certain. That's pretty exciting. I haven't done ayahuasca before, but I will soon. I'm going to the aforementioned Amazon jungle in a few weeks. Excited? Yeah, I'm excited for it. Not the ayahuasca part. That's great, whatever. But I'm gonna spend several weeks in the jungle, deep in the jungle, and it's exciting, but it's terrifying. I'm excited for you. 'Cause there's a lot of things that can eat you there, and kill you, and poison you, but it's also nature, and it's the machine of nature. And you can't help but appreciate the machinery of nature in the Amazon jungle, 'cause it's just this system that exists and renews itself, every second, every minute, every hour. It's the machine. It makes you appreciate that this thing we have here, this human thing, came from somewhere. This evolutionary machine has created that, and it's most clearly on display in the jungle. So hopefully, I'll make it out alive. If not, this will be the last conversation we had, so I really deeply appreciate it.

Do you think, as I mentioned before, there are other alien civilizations out there, intelligent ones, when you look up at the skies? I deeply want to believe that the answer is yes. I do find the Fermi paradox very, very puzzling. I find it scary that intelligence is not good at handling. Yeah. Very scary, powerful. Technologies. But at the same time, I think I'm pretty confident that there's just a very large number of intelligent alien civilizations out there. It might just be really difficult to travel through space. Very possible. And it also makes me think about the nature of intelligence. Maybe we're really blind to what intelligence looks like, and maybe AI will help us see that. It's not as simple as IQ tests and simple puzzle solving. There's something bigger.

Well, what gives you hope about the future of humanity? This thing we've got going on, this human civilization. I think the past is like a lot. I mean, we just look at what humanity has done in a not-very-long period of time: huge problems, deep flaws, lots to be super ashamed of, but on the whole, very inspiring. It gives me a lot of hope.
Just the trajectory of it all, that we're together pushing towards a better future. It is. One thing that I wonder about is, is AGI gonna be more like some single brain, or is it more like the sort of scaffolding in society between all of us? You have not had a great deal of genetic drift from your great-great-great-grandparents, and yet what you're capable of is dramatically different. What you know is dramatically different. That's not because of biological change. I mean, you got a little bit healthier, probably. You have modern medicine, you eat better, whatever. But what you have is this scaffolding that we all contributed to, built on top of. No one person is gonna go build the iPhone. No one person is gonna go discover all of science. And yet you get to use it. And that gives you incredible ability. And so in some sense, we all created that, and that fills me with hope for the future. That was a very collective thing. Yeah. We really are standing on the shoulders of giants.

You mentioned, when we were talking about theatrical, dramatic AI risks, that sometimes you might be afraid for your own life. Do you think about your death? Are you afraid of it? I mean, if I got shot tomorrow, and I knew it today, I'd be like, "Oh, that's sad. I wanna see what's gonna happen." Yeah. What a curious time. What an interesting time. But I would mostly just feel very grateful for my life. The moments that you did get. Yeah, me too. It's a pretty awesome life. I get to enjoy awesome creations of humans, of which I believe ChatGPT is one, and everything that OpenAI is doing. Sam, it's really an honor and a pleasure to talk to you again. Great to talk to you. Thank you for having me.

Thanks for listening to this conversation with Sam Altman. To support this podcast, please check out our sponsors in the description. And now, let me leave you with some words from Arthur C. Clarke: and maybe our role on this planet is not to worship God, but to create Him. Thank you for listening, and hope to see you next time.

OpenAI CEO Sam Altman discussed the potential of AI to remember and integrate personal experiences and lessons learned. He emphasized user choice, transparency, collaboration, and safety in AI development. Altman expressed his dislike for ads and the potential for AI to revolutionize information synthesis. He acknowledged the challenges of the AI arms race and the importance of slow takeoff and collaboration. Altman also mentioned the potential for AI to improve the advertising industry and user experience.

宝玉的分享

宝玉的分享 -

Inside Look: How OpenAI's Sora Model Works [translated]

In this blog post we dig into some of the technical details behind the Sora model, share our views on the impact these video models may have, and discuss the compute needed to train models like Sora, projecting how training compute compares with inference, a useful reference for estimating future GPU demand.

OpenAI's Sora model shows breakthrough capability in video generation, but it increases demand for GPU inference compute. Sora is built on diffusion transformers and latent diffusion, and training requires massive compute. Inference compute is projected to exceed training compute once the total duration of generated video reaches roughly 15.3 to 38.1 million minutes. Sora's quality and capability have advanced markedly, though its controllability remains unclear. The model also matters for synthetic data generation, data augmentation, and world-model research. As Sora-like models are deployed widely, inference compute will overtake training compute.
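As a rough illustration of that crossover logic (both constants below are invented placeholders, not figures from the post):

```python
# Back-of-envelope sketch of the training-vs-inference crossover.
# The two constants are made-up assumptions, NOT estimates from the article.
TRAIN_FLOPS = 1e25            # assumed total compute spent on training
INFER_FLOPS_PER_MIN = 5e17    # assumed inference compute per generated minute

crossover_min = TRAIN_FLOPS / INFER_FLOPS_PER_MIN
print(f"inference overtakes training after ~{crossover_min / 1e6:.1f}M generated minutes")
```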

宝玉的分享

宝玉的分享 -

Sora: First Impressions [translated]

We have received extremely valuable feedback from the creative community, which has been a great help in refining our model.

In collaborations with visual artists, designers, creative directors, and filmmakers, the Sora model has shown great potential, producing work that looks real yet goes beyond reality. Sora gives creators a way to realize their ideas, helping them polish concepts and explore new artistic directions. Artists are optimistic about its future and believe it will open up new forms of storytelling.

OpenAI

OpenAI -

Sora: first impressions

We have gained valuable feedback from the creative community, helping us to improve our model.

Since last month we have been working with visual artists, designers, creative directors, and filmmakers to understand how Sora can help their creative process. Sora helps artists extend stories previously thought impossible and create surreal work.

爱范儿

爱范儿 -

Morning Report | Vision Pro to launch in China this year / Huawei P70 suppliers have begun shipping / Sora plans to enter Hollywood filmmaking

· First brain-computer-interface recipient posts online by thought alone · 360 Zhinao begins internal testing of 5-million-character long-text processing · Apple has abandoned Micro LED display development for the Apple Watch

The first brain-computer-interface recipient posted using only his thoughts; 360 Zhinao is internally testing long-text processing; Apple's headset will launch in China, and Micro LED development for the Apple Watch has been dropped; Samsung Galaxy AI will reach more models; OpenAI is pitching Sora to Hollywood; Apple is partnering with third parties to provide AI services; Didi achieved its first annual profit; Avatr orders passed 40,000 units; a former OpenAI executive says the line between humans and machines must stay clear; Huawei P70 series shipments have begun; BYD's Denza N7 launches April 1; Geely released its next-generation Thor hybrid system; the Rabbit R1 ships at the end of the month; GTA 6 may be delayed; Starbucks is not interested in a price war; the PS5 tops the sales chart; Dune: Part Two passed 300 million yuan at the box office; new season trailers for Doctor Who and Beastars were released.

六虎

六虎 -

Livestream Preview | How Sora Will Drive Breakthroughs in Video Codecs

In the digital era, the distribution and consumption of video content has become part of daily life. Video codec technology is a core technology of digital media, shaping video quality, transmission speed, and viewing experience. Meanwhile, the video industry is undergoing a technology-driven transformation, with Sora, …

This episode of RTE Dev Talk invites Professor Liu Dong and Wu Xiangji to share how Sora-related technology is driving innovation and change in video coding, covering the principles of end-to-end image coding and its standardization progress, as well as the team's contributions to AI end-to-end image-coding standards. The event aims to help attendees understand these technologies and meet like-minded peers.

知乎每日精选

知乎每日精选 -

Demystifying the Technology: An Illustrated Reverse Engineering of Sora's Key Techniques

Are Sora's generated videos good? They really are. Does Sora count as a milestone on the road to AGI? Personally, I think so. Is it enough to know it works well, or should we also know how it works? I'd prefer a world where everyone has an informed choice, where anyone who wants to know, can know. So do we actually know how Sora was built? No. Musk mocked OpenAI as "CloseAI" and, to make the point, open-sourced Grok. Whatever you think of Grok's quality or Musk's motives, open-sourcing it deserves credit. OpenAI has become the poster child for closed technology; then again, having spent hundreds of millions of dollars on a model, why should they open-source it? Staying closed is understandable, even if open-sourcing is praiseworthy. Still, my impression over the past year or so is that OpenAI, strong as its technology is, has a growing tendency to mystify it (read Altman's interviews if you doubt this). In an era of ever more closed AI, mystified technology naturally breeds blind worship, and the promised "information equality" of the intelligent age stays a dream. I don't think that is a good trend. I sincerely respect anyone who contributes to technical openness, and demystifying the technology is a goal AI practitioners should pursue. This article tries to analyze, as accessibly as I can, how Sora probably works, including its overall structure and key components; the original post included dozens of diagrams to make the mechanisms easier to follow. If any part remains unclear, the fault is mine.

Key Messages, for readers without the time or patience for the full text (though these alone may not be enough):

Key Message 1: Sora's overall structure is as the step-by-step derivation below concludes.
Key Message 2: Sora's visual Encoder-Decoder very likely follows TECO (Temporally Consistent Transformer), not the widely rumored MAGVIT-v2 (reasons, and the TECO modifications needed to fit Sora, are given below). The crux of the Encoder-Decoder: to generate 60 seconds of high-quality video, maintaining long-horizon temporal consistency matters most, and that long-range temporal information should be injected early, at the compression and input stage; it cannot be left to the Diffusion Model alone, the two must cooperate.
Key Message 3: Sora probably calls its patches "Spacetime Latent Patch" for a concrete reason, and the Patch stage's support for variable resolution and aspect ratio most likely follows NaViT rather than a Padding scheme (reasons below).
Key Message 4: Given where AI is today, you likely want to understand the basics of Diffusion Models (an accessible introduction appears later).
Key Message 5: Video DiTs very likely looks like the structure derived below.
Key Message 6: Sora may maintain long-horizon consistency by brute force (a gentler alternative, FDM, is discussed later).
Key Message 7: Sora likely includes a bidirectional training process (a possible mechanism is given later).

Why Sora can be reverse-engineered. Reverse engineering rests on two assumptions; if they hold, it is feasible, otherwise hopeless. First, assume Sora contains no major algorithmic breakthrough and is an incremental improvement over mainstream techniques. This fits OpenAI's design philosophy (to summarize it: simple, general model structures are what scale; use standard Transformers wherever possible, since their scaling behavior is the best validated; structural novelty is not the point, the gains come from piling on compute and data and scaling the Transformer) and also fits history (ChatGPT's key techniques all drew on the field's frontier; RLHF was OpenAI's own invention, but it is not irreplaceable, witness the trend of substituting DPO). Second, the Sora technical report actually reveals a fair amount about model choices, if you read carefully. One example: it explicitly says the model was trained jointly on images and videos, unlike most video generators trained only on video. That is key information, surely an important factor in Sora's quality (reasons later), and it constrains how Sora's internals can be designed, which helps reverse engineering. Another example: Sora appears to generate videos frame by frame, rather than using the common "keyframe generation + interpolation" mode. The report's wording supports this; read enough of the literature and you notice that whenever a paper touts "generating entire videos all at once", the contrast being drawn is with the keyframe-plus-interpolation mode. Google's Lumiere (Lumiere: A Space-Time Diffusion Model for Video Generation) makes exactly this "at once" framing a headline contribution. Keyframe-plus-interpolation is widespread, but it produces videos with small overall motion and poor global temporal consistency; many products on the market show these symptoms, from which you can infer they use it. Avoiding it is another quality-critical choice, easy to miss on a casual read. In short: because the two assumptions roughly hold, every choice the report reveals prunes a large region of the design space, and by repeatedly pruning mainstream options we can step toward Sora's likely technical truth. Assume, then, that Sora is already trained, and derive its overall architecture step by step.

Deriving Sora's overall structure. Sora's first impression is a high-quality text-to-video model: the user writes a Prompt saying what video they want, and Sora generates a highly realistic 10-to-60-second video. What happens inside? First, given the user's relatively short, rough Prompt, Sora uses GPT to expand it into a long Prompt full of detailed description; the report states this explicitly. The step matters: the more explicit the detail in the Prompt, the better the video, and users won't write long Prompts themselves, so GPT adds the detail, reflecting the idea of letting large models assist or replace humans in as many production steps as possible. Sora must therefore contain a Text Encoder that turns the long Prompt into per-token embeddings, mapping the description from text space into latent-space parameters. This is very probably CLIP's text encoder (CLIP learns a text encoder and an image encoder, aligned in one semantic space), the same one the DALLE series uses. As argued above, Sora generates frame by frame. Suppose we want a 10-second, 1080×1080 video at the film-grade 24 frames per second: Sora must generate 24 × 10 = 240 frames at 1080×1080. Given the target length and resolution, we can pre-generate 240 noise frames and then, in temporal order and guided by the Prompt's semantics, denoise them frame by frame into the 240 video frames that form the result.
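A minimal sketch of that setup, assuming a latent-diffusion layout with an invented 8× spatial downsampling and 4 latent channels (neither figure comes from the report):

```python
import torch

fps, seconds = 24, 10
frames = fps * seconds                      # 24 * 10 = 240 frames to generate
H = W = 1080                                # target output resolution

down, ch = 8, 4                             # assumed VAE downsampling / channel count
noise = torch.randn(frames, ch, H // down, W // down)
print(noise.shape)                          # torch.Size([240, 4, 135, 135])
# Each noise frame is then denoised, in temporal order,
# under the guidance of the expanded text Prompt.
```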
From the technical report we know Sora's generator is a Diffusion Model (its basic principles are explained later). The next decision: does Sora run its Diffusion Model in pixel space (Pixel Space) or in latent space (Latent Space)? Consider the trade-offs. A pixel-space Diffusion Model is intuitive: noise is added and removed directly over pixels. But an image contains many pixels (a single 1080×1080 frame has over 1.16 million), so pixel-space diffusion needs heavy compute and is slow in both training and inference; on the other hand, pixel space keeps rich detail, so quality is high. The earliest Diffusion Models were all pixel-space, and because they were so slow, researchers proposed diffusing in a compressed latent space instead: introduce an image Encoder and Decoder, where the encoder compresses the image representation from high-dimensional pixel space into a low-dimensional latent parameter space; diffusion and denoising happen there, and after denoising, the decoder adds detail back and maps the latent content into pixel space. The Latent Diffusion Model (LDM) is thus the mirror image of the Pixel Diffusion Model (PDM): it saves resources and runs fast, with slightly worse quality. The report makes clear Sora uses an LDM, which is unsurprising; today almost no image or video generation uses a PDM. But an LDM still chooses a compression ratio: compress hard and gain speed, or compress gently. My guess is that Sora's Encoder does not compress aggressively, keeping more original image detail for better results; Sora will prioritize output quality and spend compute for it, trading resources for quality. Sora's slow generation is probably related to a modest compression ratio. The overall structure therefore now gains a video Encoder and Decoder, aimed at faster training and inference, with the text encoding fed into the LDM as the condition that keeps the generated content faithful to the user's Prompt.
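A toy sketch of the pixel-versus-latent contrast, with single convolutions standing in for the real VAE and diffusion networks (all shapes and modules here are illustrative assumptions):

```python
import torch
from torch import nn

encoder = nn.Conv2d(3, 4, kernel_size=8, stride=8)            # pixels -> latent, 8x down
decoder = nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8)   # latent -> pixels
denoiser = nn.Conv2d(4, 4, kernel_size=3, padding=1)          # stand-in for the diffusion net

frame = torch.randn(1, 3, 1080, 1080)       # one frame in pixel space
z = encoder(frame)                          # (1, 4, 135, 135): ~64x fewer positions to diffuse
for _ in range(4):                          # a few placeholder denoising updates
    z = z - 0.1 * denoiser(z)
recon = decoder(z)                          # back to (1, 3, 1080, 1080)
# A pixel-space model would run a (much larger) denoiser on `frame` directly.
```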
The next decision point: after the Encoder compresses the video frames, is there a Patchify (chunking) step? Patchify is essentially a second compression over the VAE encoding, common in video generators; the report suggests Sora has it. Sora calls its version "Spacetime Latent Patch" (a possible explanation comes in the key-module analysis below). Also, Sora advertises generating images and videos at different resolutions and aspect ratios, which demands special handling at the Patchify stage. Patchify usually follows the video encoder and flattens its multi-dimensional output into a linear sequence as input to the Diffusion Model.

Continuing the derivation: which neural network implements the Diffusion Model? Note that diffusion is a mathematical framework; different network structures can realize it. The mainstream backbone for diffusion video generation has been U-Net, and Transformers were not mainstream for this (after Sora, they surely will become so). The report says Sora's backbone is a Transformer, presumably chosen for its scalability, making it easy to push the model size up. Sora is not the first to pair Transformers with diffusion for video; once again OpenAI absorbs frontier ideas from the open community while itself growing more closed, true to its "CloseAI" reputation.

Sora further stresses one feature: training and generating videos at Variable Resolution, Various Aspect Ratio, and Various Duration. Few public systems do any of these, and I have not seen all three together in the open literature, so it is a fair selling point (below, "different resolutions" covers both resolution and aspect ratio; duration is analyzed separately). Resolution relates to image size, and aspect ratio is the familiar portrait mode of short video versus landscape mode of long video; Sora supports both. Why support them? Mainstream pipelines unify input sizes for convenient batch processing, for example by center-cropping a square from each image, and the problem is obvious: cropping easily slices a complete entity apart, so models trained on cropped data generate visibly incomplete bodies. Using the original resolutions and aspect ratios keeps all the source information and preserves entity completeness far better. Here too, Sora fights for quality at every stage of the pipeline. Supporting this feature requires special design decisions in the Spacetime Latent Patch and LDM stages, another important constraint for our inference.

Next decision point, mentioned earlier: Sora trains jointly on images and videos, which matters for quality. An example (see Phenaki: Variable Length Video Generation From Open Domain Textual Description): asked for a "Water Color Style" video, a model trained only on video fails, while one trained on 80% video plus 20% image data does well, because captioned text-image data is plentiful and covers every style, whereas captioned video data is scarce and misses many requested scenarios. Supporting joint image-video training forces distinctive designs in the video encoder-decoder and the Spacetime Latent Patch stage, further constraining the key modules; the more constraints, the fewer viable choices, the easier the inference. Finally, Sora generates videos up to 60 seconds long, which requires design work in the visual encoder-decoder and LDM stages to hold visual coherence and content consistency that long. Adding all the constraints, step by step, yields Sora's complete overall structure. For anyone familiar with text-to-video, deriving the overall structure from the report is not that hard; the genuinely hard part is what the key modules concretely use. Four key questions: (1) Under the constraints of joint image-video training and long-horizon video consistency, how should the video encoder-decoder be designed? (2) Under joint image-video training and variable resolution, how should the Spacetime Latent Patch stage be designed? (3) Under variable resolution, how should the Transformer-based video Diffusion Model be designed? (Its long-horizon strategy is deferred to part four.) (4) How is long-horizon consistency maintained in the Diffusion Model stage? We now analyze each in turn.

Video encoder-decoder: from VAE to TECO (Temporally Consistent Transformer). The probability that Sora's encoder-decoder is a VAE is very high, simply because nearly every image or video model uses one; pinning down VAE is easy, the hard part is which concrete variant. VAE variants fall into two families: continuous-latent and discrete-latent. The VAE itself is continuous; discrete variants abound, most commonly VQ-VAE and VQ-GAN, which appear constantly in multimodal, image, and video models. Discrete latents are popular partly because they slot into GPT-style autoregressive next-token generation, promising a unification of language models with image and video generation, Google's VideoPoet being a typical example. Since Sora's trunk is a Diffusion Model rather than an LLM-like next-token model, and diffusion's noising and denoising naturally live in a continuous latent space, Sora most probably uses continuous latents. Discrete latents can drive diffusion too, but first converting an inherently continuous VAE latent into a discrete one seems superfluous, and for diffusion, discrete latents clearly underperform continuous ones, for reasons given shortly. Many analyses of Sora point to MAGVIT-v2 (Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation), a VQ-VAE variant; I'm unsure of their reasoning, but I consider it unlikely, and the VQ-GAN relative TECO (Temporally Consistent Transformer) more plausible, for reasons below, with the Sora-fitting modifications also described below.

To follow the rest, first the image VAE. A VAE is self-supervised, GPT-like in needing no labels, only enough images. Its idea is to learn an Encoder and matching Decoder by reconstructing images: given a random image $x$, the Encoder compresses the pixels into a low-dimensional latent representation $z$; the Decoder does the reverse, trying to restore the original image from $z$, producing a reconstruction $\hat{x}$; the difference between $x$ and $\hat{x}$ is the training loss, pushing the encoder toward high-quality compressed representations and the decoder toward accurate restoration. Encoder and decoder are usually CNNs, so the VAE's latent is inherently continuous.

Now the video VAE. If an image is a 2D compression of 3D space at one moment, a video adds the time axis: an ordered sequence of 2D frames forming a 3D space-time representation of a physical scene. The video VAE's task mirrors the image VAE's: reconstruct every frame as accurately as possible, learning a video compressor (Encoder) and decompressor (Decoder) in the process. The Encoder typically uses causal 3D convolution, analogous to the image's 2D convolution except the kernel is 3D: when compressing frame $i$, it can reference not only frame $i$ but also the preceding $k$ frames $i-1, \dots, i-k$. "Causal" means only the past is visible, never the future, since future frames don't yet exist at generation time; frames before $i$ have already been generated and may be referenced. (This is analogous to GPT referencing only already-generated tokens, which is why GPT's attention is called causal attention.) Because reconstruction of frame $i$ references $k$ prior frames, temporal information is folded in, and if $k$ stretched long this would help the temporal consistency of generated video; but convolution alone integrates only short histories and struggles to fuse long spans of time, so a new idea is needed; what it is comes shortly.

Briefly, "continuous latent" versus "discrete latent". Scanning an image with CNN convolutions yields real-valued results, so the latent is naturally continuous. A "discrete latent" ID-ifies the continuous one, converting a real vector into a dedicated ID number, much like an LLM's string tokenizer. The standard mechanism: the model maintains a codebook of many codewords, each holding two pieces of information, a continuous latent embedding and a dedicated ID, like a dictionary. To discretize a continuous latent, compare it against each codeword's embedding, find the nearest, and assign that codeword's ID. This also explains the earlier claim that discrete latents suit diffusion less: discretization is in essence clustering image fragments, with the assigned ID as the cluster label. In current image work the codebook usually holds around 8,000 codewords (more tends to hurt), so many fragments that are "broadly similar but different in detail" receive the same ID, which means detail is lost; discretization costs information. That is why continuous latents are the better partner for a Diffusion Model: they retain more image detail, helping generate higher-quality video. VQ-VAE (Neural Discrete Representation Learning) is exactly this discretization of the VAE's continuous latent. VQ-GAN (Taming Transformers for High-Resolution Image Synthesis) can be understood as VQ-VAE improved with a GAN: a GAN optimizes via a generator and discriminator deceiving each other, the VAE Decoder already generates images and so serves as a natural generator, and an independent discriminator is added for better coding quality.
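The codebook lookup just described, as a small sketch (sizes are illustrative; the ~8k codebook size follows the text):

```python
import torch

codebook = torch.randn(8192, 64)      # ~8k codewords, each a 64-dim embedding
z = torch.randn(240, 64)              # continuous latents to be discretised

dists = torch.cdist(z, codebook)      # (240, 8192) distances to every codeword
ids = dists.argmin(dim=1)             # the discrete latent: one ID per vector
z_q = codebook[ids]                   # what the model actually keeps

# Everything separating z from z_q is gone: the clustering-style detail loss
# that makes discrete latents a worse input for a Diffusion Model.
print((z - z_q).pow(2).mean())
```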
So which VAE does Sora use? The most-cited rumor is MAGVIT-v2. Its procedure is simple: group the input frames, the first frame alone as its own group (which is what lets it support joint image-video training, an image being a one-frame video that the standalone first frame can encode), the remaining frames in groups of, say, four. Each 4-frame group is compressed into one continuous latent by causal 3D convolution, then discretized as described above to give MAGVIT's encoding. Set aside the discretization, which Sora clearly doesn't need. The causal 3D convolution itself implies two things. First, because MAGVIT-v2 squeezes 4 frames into one latent frame, it compresses along time as well as space, which can bring further input-side information loss; tolerable for video classification, possibly unacceptable for video generation. Second, squeezing 4 into 1 means MAGVIT-v2's latent does encode local temporal information, which helps consistency, but CNNs capture only short histories, so it helps little with long-horizon consistency. On balance, I think the probability that Sora uses MAGVIT is low.

To generate videos up to 60 seconds, we would like long-range historical information folded in already at the VAE encoding stage; that would clearly help. Does such a model exist in the public literature? One does: TECO (Temporally Consistent Transformers for Video Generation), a UC Berkeley paper on generating videos with long-horizon consistency. Both Sora Co-Leads did their PhDs at UC Berkeley on video generation, so they certainly know this work, and its theme matches the requirement of pushing Sora to 60 seconds, making it a likely reference. TECO's core consists of two tasks: a video reconstruction task that trains the video Encoder-Decoder, and MaskGit generation of discretized image tokens, used for producing video. Two properties stand out. First, TECO's VAE encodes Space and Time information separately, and the Time encoding incorporates extremely long history, in fact all of it: to generate frame $i$, the time encoding fuses information from frames $1$ through $i-1$. That plainly helps long-horizon consistency. Second, TECO's long-video consistency is empirically strong: evaluated on generated videos up to 500 frames, it beats the baseline models (note also the red FDM curve in that comparison, which returns later). At a cinematic 24 frames per second, 500 frames is exactly about 20 seconds of video. Most of Sora's published clips are around 20 seconds long, i.e. roughly 500 frames. Does that tell us something?

For Sora, adapting TECO would absorb its ability to fuse very long histories at the VAE stage. Two modifications suffice: first, the VAE discretization is unnecessary and can be removed; second, the MaskGit part, which trains token-by-token video generation, is not needed either, keeping only video reconstruction. The adapted TECO encoder then works as follows. For spatial latent encoding, the first frame is processed alone as its own group, which supports joint image-video training; every other frame is paired with its predecessor, so frame $i$ is grouped with frame $i-1$. Note that despite being 2-frame groups, this differs from MAGVIT: TECO's pairs form a sliding window with overlap between groups, so there is no multi-frame-into-one information loss. TECO's approach is the opposite of MAGVIT's: at the Space Latent stage it considers not just frame $i$ but also brings in frame $i-1$, adding information through the VAE rather than compressing it away. After grouping, CNN 3D convolution produces each frame's continuous Space Latent, encoding spatial information. Then a Causal Temporal Transformer encodes time; as noted, for a given video TECO fuses the entire history. The Transformer's linear time-encoding output is reshaped to the same size as the Space Latent, forming the Time Latent. Each frame thus gets one Space Latent and one Time Latent, concatenated as its VAE encoding. Again, TECO's philosophy is to add information, not to treat compression as the sole goal. Beyond injecting the longest possible temporal context at encoding time, there is another benefit: OpenAI has clearly decided the Transformer has the greater scaling potential, replacing U-Net with a Transformer in the Diffusion Model; with TECO, essentially all of Sora's trunk becomes Transformer-based, very much to OpenAI's taste.
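A compressed sketch of that space/time split, using a causal Transformer over one pooled token per frame (the pooling and all sizes are simplifications I introduce, not TECO's exact layout):

```python
import torch
from torch import nn

T, C, H, W, D = 16, 3, 64, 64, 128

space_enc = nn.Conv2d(C, D, kernel_size=8, stride=8)   # per-frame Space Latent (toy 2D conv)
time_enc = nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True)

video = torch.randn(T, C, H, W)
space = space_enc(video)                                # (T, D, 8, 8) spatial latents
tokens = space.flatten(2).mean(-1).unsqueeze(0)         # (1, T, D): one token per frame

# Causal mask: frame i attends only to frames <= i, so its Time Latent
# summarises the entire history -- the long-horizon signal TECO injects.
causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
time = time_enc(tokens, src_mask=causal)                # (1, T, D) Time Latents
per_frame_latent = torch.cat([tokens, time], dim=-1)    # Space + Time, per frame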
Spacetime Latent Patch: what the term means, and NaViT. First, Patchify for a single image. Essentially it is a second compression over the VAE encoding, common in video generators. The procedure is simple: set a $p \times p$ patch and scan the continuous-latent plane without overlap, usually followed by an MLP over each $p \times p$ grid. If the continuous latent is $H \times W$, Patchify produces a twice-compressed $(H/p) \times (W/p)$ patch matrix, then flattens it into a line, since the Transformer downstream wants linear patch input. Much video-generation research shows that smaller Patch Size yields higher video quality, so Sora's $2 \times 2$ Patch Size is hardly in doubt; smaller patches mean a lower compression ratio and more preserved information. The inference generalizes: at the raw-information compression stages, VAE and Patchify alike, preserve as much original information as possible and do not compress hard, or generation quality suffers; the price is compute and speed, and the two evidently cannot both be had.

With single-image Patchify understood, the simplest video patch method ignores inter-frame relations and patchifies each frame independently. Many earlier readings of Sora instead assumed a VIVIT-style Tubelet Embedding: apart from the first frame, frames are grouped (say in twos) and compressed in time as well as space, two input frames becoming one patch frame via CNN 3D convolution. I think that is unlikely here: the further temporal compression loses too much on the input side. VIVIT targets image/video classification, a coarse-grained task that tolerates hard compression; for generation, as argued above, the input side should keep as much information as possible, and research (Latte: Latent Diffusion Transformer for Video Generation) confirms the harm. So the VIVIT-style scheme can be passed over.

If Sora uses TECO at the VAE stage, each image has two patch planes, a Space Latent carrying mainly spatial information and a Time Latent carrying mainly long-range history. A $2 \times 2$ patch can merge an image's Space Latent and Time Latent into a single patch matrix. Calling that matrix a "Spacetime Latent Patch" seems reasonable enough; my guess is that Sora quite possibly does this, and that this is the likely origin of OpenAI's emphasized term. This is pure personal speculation, highly subjective, so treat it cautiously. The benefits: each image maps to one patch matrix fusing both spatial and long-horizon temporal information, with very full information retention; and since there is no frame grouping, joint image-video training (first frame encoded independently) is supported automatically.

As noted earlier, supporting variable-resolution video demands special work at the Patch stage. The most-cited candidate is NaViT (Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution), and nothing better seems available. NaViT's idea is simple: fix the Patch Size, scan videos of whatever resolution, and patch matrices of different resolutions or aspect ratios emerge naturally, ready to be linearized. The corresponding alternative is a Padding scheme: fix a maximum image size, place each image at the top-left regardless of aspect ratio, and fill the remaining positions with meaningless padding tokens; this too supports variable resolution. NaViT or Padding? Clearly NaViT, which was proposed precisely to fix Padding's flaw: padding wastes too much of a training batch on meaningless placeholders, whereas NaViT pads no individual image, using exactly as many patches as each image needs, with at most a little padding at the batch end. NaViT fits far more frames per batch, greatly improving training efficiency; and the larger the supported maximum resolution, the higher each image's padding waste, the more NaViT pays off. Sora supports images up to 2048×2048, which makes the Padding scheme essentially infeasible; NaViT seems the only option, at least I have seen no better one.

Flattening patches into a line loses their positions, so supporting variable resolution requires specially designed position representations per patch. Absolute position (patch order number) clearly won't do; instead, use relative coordinates in 3D space and learn the corresponding Position Embeddings. For a patch at spatial position $(x, y)$ in frame $t$ of a video, learn an embedding for each coordinate during training and sum the three; the sum is that patch's Position Embedding, carrying its relative 3D coordinates. Each patch then contributes its image content plus this one additional position representation.

To conclude this part, Sora's likely scheme at the Spacetime Latent Patch stage: hook into TECO's VAE encoding, merge each image's Space Latent and Time Latent with $2 \times 2$ patches so each image compresses to one Spacetime Latent Patch matrix, use NaViT for variable resolution, with the main modification being Position Embeddings learned from each patch's relative 3D coordinates, two spatial and one temporal.
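A sketch of fixed-size patchify plus NaViT-style packing with (x, y, t) position indices; the helper below is my own illustration, not NaViT's code:

```python
import torch

def patchify(latent, p=2):
    """Split a (C, H, W) latent into non-overlapping p x p patches + (x, y) coords."""
    C, H, W = latent.shape
    patches = (latent.unfold(1, p, p).unfold(2, p, p)    # (C, H/p, W/p, p, p)
               .permute(1, 2, 0, 3, 4)
               .reshape((H // p) * (W // p), C * p * p))
    ys, xs = torch.meshgrid(torch.arange(H // p), torch.arange(W // p), indexing="ij")
    return patches, torch.stack([xs.flatten(), ys.flatten()], dim=1)

# Pack frames of *different* resolutions into one token sequence, NaViT-style.
seq, pos = [], []
for t, (h, w) in enumerate([(8, 8), (4, 12)]):           # two different aspect ratios
    p_mat, xy = patchify(torch.randn(4, h, w))
    seq.append(p_mat)
    pos.append(torch.cat([xy, torch.full((len(xy), 1), t)], dim=1))   # (x, y, t)
seq, pos = torch.cat(seq), torch.cat(pos)
print(seq.shape, pos.shape)   # each (x, y, t) triple indexes three learned embeddings, summed
```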
Transformer Diffusion Model: from diffusion basics to Video DiTs. First the basic principle. A Diffusion Model consists of a forward noising process and a reverse denoising process. Take a large image library and pick a random image $x_0$; the forward process proceeds in many steps, each adding Gaussian noise of a different strength, until the clean image becomes pure noise $x_T$. The reverse process starts from $x_T$, trains a neural network to predict the noise added at each step, subtracts the prediction so the image gets a little clearer, and step by step removes the noise to recover the original content.

The forward process is a human-controlled, gradual "diffusion" of noise into a known image, and mathematically the noising at time step $t$ can be done in a single step, without the gradual procedure:

$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I).$$

Concretely: pick a random clean image $x_0$ and a random Gaussian noise image $\epsilon$; pick a random Time Step $t$ and encode it; then mix noise into the original image by the formula above to produce the noisy image $x_t$. The mixing coefficient depends on the Time Step: in principle, the larger $t$ is, the smaller $\sqrt{\bar{\alpha}_t}$ (less of the original image blended in) and the larger $\sqrt{1-\bar{\alpha}_t}$ (a higher proportion of noise mixed into the clean $x_0$), so $x_t$ is noisier. One step thus forms the noisy image for a given time step, and the controlled noising manufactures training data: ⟨the constructed noisy image $x_t$, its Time Step $t$, the injected noise image $\epsilon$⟩. With this data we train a network $\epsilon_\theta$ that takes $x_t$ and its Time Step and predicts, in reverse, what noise was added, with the forward process's $\epsilon$ as the reference answer; the difference between the predicted noise $\hat{\epsilon}$ and $\epsilon$ forms an MSE loss, back-propagated to make the network predict ever more accurately. That is Diffusion Model training, somewhat simplified here for clarity.

Once trained, the reverse process at inference has two stages per step. First, feed the current noisy image $x_t$ and its denoising step count (Time Step) into the trained network $\epsilon_\theta$, which predicts a noise image $\hat{\epsilon}$. Second, subtract $\hat{\epsilon}$ from $x_t$, completing one denoising step; the image contains a little less noise and becomes a little clearer. Unlike noising, denoising must proceed step by step, not in one jump. That covers the unconditional image Diffusion Model; text-to-image models like Stable Diffusion run under a text-Prompt constraint, hoping to generate images matching the description. Converting unconditional to conditional is simple: use, say, CLIP's text encoder to map the Prompt from text space into the parameter space aligned with images, and feed that as the guiding condition for generation. As before, the predicted noise $\hat{\epsilon}$ is compared against the injected reference $\epsilon$, and reducing their difference updates the model, improving both noise prediction and denoising.

Now derive the Video DiTs structure. A Transformer-based Diffusion Model keeps exactly the noising/denoising workflow above; only the noise-prediction network changes, from the traditional U-Net to a Transformer. The consensus guess is that Sora builds on DiTs (Scalable Diffusion Models with Transformers), since William Peebles is both a Sora Co-Lead and DiTs' first author; the guess that Sora's Diffusion Model is a modified DiTs sounds quite reasonable. DiTs is a Transformer-based diffusion image generator; its structure looks complex but matches the standard conditional Transformer Diffusion Model described above, component for component. Note, though, that DiTs generates images; used directly for video it won't do. At least two modifications are needed: first, model structure to support videos of different aspect ratios and resolutions; second, turning an image generator into a video generator.

The first modification, supporting variable aspect ratio and resolution at the Transformer level: as discussed at the Spacetime Latent Patch stage, after the NaViT adaptation the input patch sequences of different images or frames are variable-length, so the Transformer needs an Attention Mask mechanism guaranteeing that, in Local Spatial Attention, the patches belonging to one image attend only to other patches of the same image, never to other images' content. Because this Attention Mask acts on the input patches, the Local Spatial Attention module must sit at the very bottom of the Transformer's internal structure. This derivation yields a Transformer block of two submodules: at the bottom, Local Spatial Attention, computing the spatial information of an image or frame, i.e. modeling relations among the patches within one frame; above it, a standard MLP, the nonlinear mapping a Transformer block requires. If every frame had a fixed patch count, Local Spatial Attention would be easy; the question is how to implement it over variable-length patches. One possible solution comes from "Efficient Sequence Packing without Cross-contamination: Accelerating Large Language Models without Impacting Performance": a 0/1 Attention Mask matrix. If the batch's maximum sequence length is 8, set up an $8 \times 8$ 0/1 mask in which only square sub-blocks along the diagonal are all 1s and everything else is 0. For a frame occupying, say, three patches, the mask rows for those patches are 0 against every other frame, so after masking they cannot see other images, while their own $3 \times 3$ block of 1s guarantees the three patches all see each other; likewise for every other frame. Setting the Attention Mask this way conveniently supports the per-frame variable resolutions and aspect ratios NaViT introduces.

The second modification, from DiTs to Video DiTs, i.e. supporting video: this step is easy, because most video generators have the module in question. Into the block built above, insert a Causal Time Attention submodule, whose role is to gather historical temporal information when generating frame $i$: through attention, frame $i$ sees the preceding $k$ frames, maintaining the generated video's temporal consistency, indispensable for video generation. As for placement, since Local Spatial Attention must be lowest, putting Causal Time Attention between the two existing submodules is the reasonable choice. (Unrolled along the time axis, Local Spatial Attention connects patches within each frame, and Causal Time Attention connects each frame to its predecessors; simple enough to not belabor.)

As mentioned when explaining diffusion, text-to-video needs two condition variables: the Prompt text and the Time Step. Two designs can introduce them: compress both and append them alongside each frame's input, or add a fourth Condition Attention Block into the Transformer that receives the conditions and works via attention. Research (VDT: General-purpose Video Diffusion Transformers via Mask Modeling) shows that although stuffing the condition variables into the input is very simple, it works well and the model converges fast; on that basis I choose the simple scheme.

Summarizing the full Video DiTs logic: linearize the noise patches, append the Prompt and Time Step conditions, and feed everything to the Transformer. Each Transformer block has three submodules: Local Spatial Attention gathers within-frame spatial information; Causal Time Attention gathers historical temporal information; the MLP fuses the temporal and spatial information nonlinearly. Stacking $N$ such blocks predicts the noise added at the current Time Step, realizing one denoising step. For the reverse diffusion process, the denoising may need to iterate some 20 to 50 times before clean video frames form; that is one more reason Sora is slow.
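The 0/1 mask is easy to build from the packed per-frame patch counts; a minimal sketch (frame sizes invented):

```python
import torch

def block_diagonal_mask(lengths):
    """True where attention is allowed: patches see only their own frame."""
    total = sum(lengths)
    mask = torch.zeros(total, total, dtype=torch.bool)
    start = 0
    for n in lengths:
        mask[start:start + n, start:start + n] = True
        start += n
    return mask

mask = block_diagonal_mask([3, 3, 2])   # three packed frames with 3, 3 and 2 patches
print(mask.int())
# In Local Spatial Attention: scores.masked_fill(~mask, float("-inf")) before the
# softmax, so attention never crosses frame boundaries inside a packed sequence.
```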
Sora's possible long-horizon consistency strategy: brute-force aesthetics, or FDM? Maintaining content consistency in long generated videos is a research direction of its own. A common recipe is the "LLM + Diffusion Model" ensemble: split the long video into multiple scenes; for the user's Prompt, have an LLM such as GPT-4 automatically generate expanded per-scene Prompt descriptions; then generate each scene's video with the generation model and splice the scenes together. The catch: the protagonist may appear in every scene, and without special handling for character consistency, the protagonist's appearance keeps changing, the character-inconsistency problem. VideoDrafter (VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM) handles it by having the LLM produce a textual description of the character's appearance, then using a text-to-image model such as Stable Diffusion to render a globally fixed character appearance image from that description; each scene's generation is anchored to this single image, solving cross-scene character consistency. Would Sora adopt this? Probably not: for very generic Prompts, even pinning down the protagonist or specific characters is hard, and the explicit-character, globally-fixed-image approach feels suited to domain-specific video generation.

Here is a crude, brute-force but simple alternative: when generating frame $i$, stretch the Time Attention so that frame $i$ sees the entire preceding history, frames $1$ through $i-1$, much as TECO integrates Time information in its VAE. The visible history is long, so it plausibly helps consistency, at an obviously steep compute cost. Might Sora do this? It cannot be ruled out; the evidence is the passage of the Sora technical report (often screenshotted with a red circle) saying that to maintain long-range consistency, Sora gives the model foresight of many frames at a time.

For maintaining long-horizon consistency at the Transformer Diffusion Model stage, FDM (Flexible Diffusion Modeling) feels like a clean, effective approach. FDM (Flexible Diffusion Modeling of Long Videos) proposes two improved Time Attention models with good long-horizon behavior; recall TECO's evaluation, where right after TECO's blue curve, the red curve is FDM on 500-frame generation. And clearly, FDM's Time Attention and TECO's VAE encoding both aim at consistency of generated video but sit at different positions in the model, so the two are complementary. Before FDM's two variants, the baseline: the Autoregressive scheme generates, say, 6 frames in order, then the next 3 at once; when generating those 3, Time Attention sees only the most recent few frames, say 4. Autoregression references only the nearest history: short-range attention, not long-range. "Long Range" is FDM's first long-consistency model, and the idea is direct: when generating frame $i$, reference not only the recent frames but also several frames fixed in the distant history. Long Range references both short- and long-range history, though the long-range positions are randomly chosen and then fixed. "Hierarchy" is FDM's second long-range attention strategy: first sample the long history at intervals to obtain its rough outline, and under the guidance of these global history frames, generate frames at several key future positions, such as the first, last, and middle frames, generating the global future from the global past; then generate the remaining frames in order, each referencing both the recent history and the key-position future frames generated in the first step. It is a hierarchical Time Attention that plans globally first and refines the present second, combining the long view with the near. I cannot confirm Sora uses anything FDM-like, but it seems a sound way to maintain long-horizon consistency.

Sora's training process and tricks: synthetic data, two-stage training, and bidirectional generation. To stress it again: all text-to-video models are, in essence, supervised learning, needing large volumes of high-quality annotated ⟨text, video⟩ pairs; they are not label-free self-supervised learners like LLMs. Some annotated video data is open, but in neither quantity nor quality is it anywhere near enough to build a model of Sora's quality. So for anyone reproducing Sora, automatically producing large volumes of high-quality captioned video may be the most crucial and the hardest step. (Following the historical playbook of distilling LLMs from GPT-4, expect GPT-4V-based video-caption distillation schemes to appear soon.) I suspect the synthetic captioned-video work is quite possibly the biggest single contributor to Sora's quality. Sora adopts a DALLE 3-like method for producing synthetic video data. For DALLE 3's ⟨text, image⟩ synthesis: web image-caption resources are plentiful, such as the 5B LAION data, but the captions have problems: too coarse and short, lacking detail, and sometimes wrong. DALLE 3 therefore annotated (by hand, or hand plus GPT?) a set of ⟨detailed text description, image⟩ pairs and trained an Image-Caption Model (ICM) that accepts an image and learns to produce a detailed description of its content. With the ICM, DALLE 3 replaced the short captions in the image-text corpus with the ICM's long descriptions, manufacturing high-quality synthetic data at scale, a big boost to DALLE 3's quality. Sora's video pipeline should be analogous: annotate (by hand, or hand plus GPT) a batch of high-quality ⟨video, long text description⟩ pairs and train a Video-Caption Model (VCM); once trained, the VCM accepts video and outputs detailed text, and the short captions of annotated video data can be replaced by the VCM's long descriptions, producing high-quality synthetic video data. The idea extends further: with a VCM, unlabeled videos can be auto-captioned with long descriptions too, so curated high-quality videos become large volumes of very high-quality annotated training data. And since Sora trains jointly on images and videos, the synthetic image-text data that trained DALLE 3 was surely used in training Sora as well.

Sora's training likely proceeds in two stages. The VAE is generally trained independently: collect large image or video corpora and train via the reconstruction objective to obtain the visual Encoder-Decoder; this part is self-supervised and needs no labels. The second stage trains the whole model including the Diffusion Model; during it, the stage-one Encoder-Decoder is generally frozen, unchanged by this stage's data, and the Text Encoder likewise uses an off-the-shelf model such as CLIP with frozen parameters. What trains in this stage is mainly the Position Embeddings of the Spacetime Latent Patch and the Transformer-based noise-predicting Diffusion Model.

Sora also supports many generation modes: input a still image and generate a full video, generate endlessly looping video, input a few ending frames and generate the full video backward, or take two videos and generate new content that bridges them smoothly. One can infer that during training, Sora inserts known images at chosen positions of the input and generates bidirectionally along the time axis, forward and backward simultaneously. This bidirectional strategy conveniently supports all the flexible generation modes above, and it also favors coherence: expanding outward from the middle of the time window, each direction only has to maintain consistency over half the window, with both sides converging toward the middle content, a bonus of bidirectional generation. For example, in the image-to-video case, the clips show the input image placed as the last frame of the input noise sequence, with generation running backward in time; for an endless loop, one frame can be inserted at the middle and at both ends, generating outward from the middle in both directions, yielding seemingly endlessly looping content. So if images can be conveniently inserted at specified input positions, bidirectional training and flexible generation modes follow. How is that achieved? By a masking strategy (cf. VDT: General-purpose Video Diffusion Transformers via Mask Modeling). Introduce a mask-frame sequence $F$ with per-frame mask matrices $M$ whose entries are all 1 or all 0: insert the known images into $F$ at the specified positions and set their mask matrices to 1; the other mask frames can be random noise with mask matrices set to 0. Bit-level (element-wise) matrix multiplication of $M$ and $F$ zeroes the content under the 0 masks and preserves the content under the 1 masks, forming the masked frames. Correspondingly, for the Diffusion Model's noise input sequence $X$, set a complementary mask sequence $(1-M)$, the exact inverse of the mask frames' $M$; after masking, the input-frame data at the positions where images go is zeroed, and the other noise frames stay unchanged. Adding the masked noise frames and the masked known frames, $M \odot F + (1-M) \odot X$, plants the specified images at the specified positions of the Diffusion Model's input.

Can Sora serve as a physical-world simulator? OpenAI claims it is one; the answer is highly subjective, and views differ. Under today's technical conditions, I think Sora alone can hardly constitute a world simulator; I'd rather read OpenAI's claim as the aspiration they attach to Sora than as an established fact. More detailed personal thoughts: https://zhuanlan.zhihu.com/p/684089478. Source: Zhihu (www.zhihu.com). Author: Zhang Junlin.
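The masked-insertion arithmetic, sketched (shapes invented; this pins a user-supplied image at the last slot, the image-to-video case generated backward):

```python
import torch

T, C, H, W = 8, 4, 16, 16
X = torch.randn(T, C, H, W)        # noise input frames for the Diffusion Model
F = torch.randn(T, C, H, W)        # mask-frame sequence (random noise by default)
M = torch.zeros(T, 1, 1, 1)        # per-frame 0/1 mask matrices

F[-1] = torch.randn(C, H, W)       # stand-in for the user-supplied image
M[-1] = 1.0                        # mark that slot as "known"

model_input = M * F + (1 - M) * X  # M*F + (1-M)*X: known image planted at position T-1
```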

This article analyzes the likely structure and training process of the Sora video-generation model, including the adapted TECO encoder, the Diffusion Model, and Spacetime Latent Patches. Built on Transformer networks, Sora aims at high-quality video with long-horizon consistency. Training likely proceeds in two stages and supports multiple generation modes. The article also weighs Sora's prospects as a physical-world simulator.

