Nicholas

LIVE: Google's Jeff Dean on the Coming Transformations in AI

At AI Ascent 2025, Jeff Dean makes bold predictions. Discover how the pioneer behind Google's TPUs and foundational AI research sees the technology evolving, from specialized hardware to more organic systems, and future engineering capabilities.

Published May 16, 2025
Uploaded Jun 11, 2026
File type
Podcast

Full transcript

AI-generated transcript with timestamped sections.

0:00-1:29

[00:00] Hi, and welcome to Training Data. [00:01] We are mixing it up for this week's episode and dropping a conversation that was filmed live at Sequoia's annual AI conference in San Francisco with Google's chief scientist and AI lead Jeff Dean. Jeff is interviewed by our partner and Google alum, Bill Coughran. [00:14] We hope you enjoy this special conversation with Jeff about the future of model development and compute, whether or not he likes vibe coding (hint: he does), and his expected timelines for a 24/7 software developer agent. [00:29] - We have Jeff Dean. And if you read Jeff's bio, he's run [00:34] everything at some point in Google, including overseeing the genesis of this industry and the BERT paper that kind of sparked things so many years ago. And we're very fortunate at Sequoia to have our partner, Bill Coughran, who spent about a decade before Sequoia running most of engineering at Google with Jeff. And so please welcome Jeff and Bill. Thanks. No, thank you. And Jeff, it's [01:04] for a few years, and Jeff still is occasionally willing to talk to me, which I'm very proud of. We have an occasional dinner, which is great fun. Yeah, no, he's now the chief scientist, I think, at Alphabet. So I thought we'd start. Obviously, a lot of the people in the room are excited about AI and what's happened. Google clearly introduced a lot of the tech that the

1:34-3:11

[01:34] Where do you see things going these days as you look out, both within Google but also in the industry as a whole? [01:41] Yeah, I mean, I think... [01:43] This sort of period has been a fairly long time in developing, even though it's sort of come into sort of popular visibility only in the last three or four years. But really, starting maybe in 2012 and '13, people were starting to be able to use these really, you know, at that time, what seemed like large neural networks to solve interesting problems. And the same sort of algorithmic approach would work for vision and for speech and for language. [02:13] And kind of brought... [02:16] uh, attention to, you know, machine learning as a way to solve those problems rather than sort of more traditional handcrafted approaches. And one of the things we were interested in, in 2012 even, was how can you scale and train very, very large neural networks. So we've trained a neural network that at the time was 60x larger than anything else, and we used 16,000 CPU cores because that's what we had [02:46] in our mind that scaling these approaches would really work well. And there's been, you know, a whole bunch of evidence of that and, you know, hardware improvements to help increase our ability to scale to larger and larger models, larger data sets. You know, we had an expression, bigger model, more data, better results, which has been sort of relatively true for the last 12 or 15 years.

3:16-4:48

[03:16] have are capable of doing really interesting things. They can't solve every problem. They can solve a growing set of problems year over year, because the models get better. We have better algorithmic improvements that [03:34] show us how to train larger models with the same compute cost, more capable models. And then we have scaling of hardware. We have increasing compute per-- [03:48] per unit of hardware. And also we have reinforcement learning and post-training kinds of approaches that are making the models better and sort of guiding them into the ways that we want them to behave. And that's really exciting, I think. Multi-modality is another big thing, like having the ability to put in audio or video or images or text or code and have it sort of output all those kinds of things as well is pretty useful. [04:18] The industry is, I think, mesmerized by agents right now. How real do you think agents are? I know Google introduced an agent framework. Some of this stuff, not Google's necessarily, but some of the agent stuff seems to be a little bit vaporware to me. So, yeah. Sorry, folks. I'm a little direct, as some folks will tell you. It's all good. I mean, I think there's a lot of promise there because, you know, we – [04:45] I do see a path for

4:48-6:32

[04:48] agents with [04:50] you know, the right training process to eventually be able to do many, many things in the virtual sort of computer environment that humans can do today. You know, right now, they can sort of do some things, but not most things, but... [05:04] the path for increasing the capability there is reasonably clear. You get more reinforcement learning going. You have more agent experience that it can learn from. You have early nascent products that can do some things, but not most things, but are still incredibly useful for people. [05:23] And I think similar things will happen in sort of physical robotic agents as well. Like, [05:29] right now... [05:30] we're probably close to making that transition from, you know, [05:35] robots in messy environments like this room kind of don't quite work today, but you can see a path where in the next year or two, they'll start to be able to do 20 useful things in this room. And that will introduce, you know, pretty expensive robotic products that can do those 20 things. And then learning from experience, they will then get cost engineered to now have something that's 10 times cheaper and can do, you know, a thousand things. [06:05] And more improvement in capability. [06:09] It's exciting. It is, and it does seem like it's coming, even though it's a favorite word today. I guess one of the other things that comes up, I think, with a lot of young companies is what's happening with the large models. I mean, clearly, Google has Gemini 2.5 Pro and Deep Research and so forth, and then there's OpenAI and a number of other players.

6:38-8:29

[06:38] open source, closed source, where are things going? How do you think about that? Obviously, Google has a strong position and wants to, I'm sure, dominate in that area. But how do you see the landscape? Yeah, I mean, I think... [06:54] Clearly, it takes quite a lot of investment to build the absolute cutting-edge models, but [07:02] and I think there won't be 50 of those. There may be like a handful. And there are an awful lot... once you have those capable models, it's possible to make much lighter-weight models [07:16] that can be used for many more things, because you can use techniques like distillation, which I was a co-author on and which got rejected from NeurIPS 2014 as "unlikely to have impact." I've heard that technique may have helped DeepSeek. So that's a really nice technique. [07:34] Uh... [07:35] technique if you have a better model and then you can put it into a smaller scale thing that actually is pretty lightweight and fast and all the kinds of properties you might want. [07:45] So [07:46] I mean, I think there will be quite a number of different players in this space because, you know, different shape models or models that focus on different kinds of things. But I also think, you know, a handful of really capable general purpose ones will do pretty well. [08:02] - Thank you. [08:03] Fair enough. I guess hardware is the other thing that's interesting. It looks to me like every large player is building their own hardware. Obviously, Google has been very public about the TPU program, but Amazon has their own. Rumors are Meta has one. Rumors are OpenAI is building one. You know, there's lots of hardware, and yet the industry seems to only hear about NVIDIA.
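
Illustration (not part of the conversation): the distillation idea mentioned above, training a small "student" model to match the output distribution of a larger "teacher", can be sketched in a few lines of plain Python. This is a minimal sketch under assumptions: the temperature-softened KL loss and the T-squared scaling follow the commonly used formulation of distillation, and the function names and example logits are made up for illustration. Nothing here is taken from the talk or from any Google codebase.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log-probabilities of a temperature-softened softmax (numerically stable)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The temperature value and the T^2 factor are assumptions taken from the
    usual formulation of distillation, not details stated in the transcript.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher ("soft targets")
    log_q = log_softmax(student_logits, temperature)  # student predictions
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


# Hypothetical logits for a 3-class problem.
teacher = [4.0, 1.0, 0.0]
close_student = [3.0, 1.0, 0.0]
far_student = [0.0, 1.0, 4.0]

print(distillation_loss(teacher, close_student))  # small loss
print(distillation_loss(teacher, far_student))    # much larger loss
```

In practice this loss would be minimized with gradient descent over the student's parameters, usually alongside the ordinary label loss; the sketch only shows what is being minimized.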

8:33-10:04

[08:33] That's not true in your office. But how do you think about that? How important is specialized hardware for this stuff? Yeah, well, I mean, it's very clear that having hardware that is focused on sort of machine learning style computations... [08:50] I like to say accelerators for reduced-precision linear algebra are what you want, and you want them to be better and better generation over generation, and you want them to be connected together at large scale with super high-speed networking so that you can spread your model computation out over as many compute devices as possible. [09:09] I think it's super important. I helped bootstrap the TPU program in 2013 because it seemed obvious we would want a lot of compute for inference at that time. That was the first generation. And then the next generation of TPUs, TPUv2, was focused on both inference and training because we saw... [09:28] a big need there. And I think we're on now, we stopped numbering them for some annoying reason. So now we're on Ironwood, which is coming out any day now, and Trillium before that. Be careful. That sounds like an Intel chip naming strategy, which hasn't worked that well. Small edit distance from Itanium, which is a little scary. [09:58] in the room. [10:00] I have a lot of friends who are physicists.

10:05-11:56

[10:05] They were a little surprised when Geoff Hinton and his colleagues won the Nobel in physics. I guess, how do you see AI... [10:18] You know, some of the physicists I know are sort of offended that a non-physicist is starting to win Nobel Prizes. How far do you think AI is going to go in various fields at this point? Pretty far, I think. I mean, also this year, my colleagues Demis and John Jumper won it for chemistry. I almost forgot that. Yes, yes. So double Nobel Prize celebration Monday and Tuesday or whatever it was. [10:48] That's a sign that really AI is influencing lots of different kinds of science because [10:54] you know, [10:56] at its core, you know, can you learn from interesting data? And a lot of [11:01] parts of science are about making connections between things and understanding them. And if you can have AI-assisted help in doing that, you know, one of the things I've seen in many different fields of science is many disciplines often have incredibly expensive computational simulators of some process. Like, weather forecasting is a good example, or [11:23] fluid dynamics or quantum chemistry simulations. And often what you can do is [11:29] use those simulators as training data for a neural net, and then build something that approximates the simulator, but now is 300,000 times faster. And that just changes how you do science, because all of a sudden, well, I'm going to go to lunch and screen 10 million molecules. That's now possible, instead of, you know, I would have to run that for a year on compute I don't have. And I think that just kind of fundamentally changes your...

11:56-13:27

[11:56] process of how you do things. [11:58] And we'll make faster discoveries. [12:01] I think it's probably the most interesting if there are questions from the audience at this point. I have other questions for Jeff, but... [12:12] Thank you. [12:12] Thank you. [12:13] Well, actually, just to quickly follow up on the, you know, Geoff Hinton famously left Google after studying, I guess, the effects of or the differences between digital and analog computing as a future platform for inference and learning. And I'm wondering, is the future of inference hardware analog computing? [12:33] It's definitely a possibility. I mean, I think, like, analog has some nice properties in terms of it being very, very power efficient. You know, I think there's a lot of room for digital things to be much more specialized for inference as well. So, and it's a little bit easier to work with typically. But [12:54] I think there is a general direction of how can we make inference hardware that is [13:00] you know, 10, 20, 50,000 times more efficient than what we have today. [13:06] And that seems eminently possible if we put our minds to it. [13:10] It's actually something I'm spending a bit of time on. [13:14] Hi. I was just going to ask about [13:20] developer experience versus hardware. I think the TPU hardware is extremely impressive, but there's a lot in the zeitgeist about how

13:27-15:09

[13:27] CUDA or different technologies are easier to use than the TPU layer. And so I'd be curious for your perspective on that. And is that something you've been thinking about or getting a lot of angry emails about? Yeah. [13:40] Yeah, I mean, I don't connect with Cloud TPU customers all that much, but definitely the experience can be improved. One of the things we started working on in 2018 is a system called Pathways, [13:52] which is really designed to enable us to take lots of different computing devices [13:58] and then give... [14:00] sort of a really nice abstraction with those, where you have a virtual-to-physical device mapping that is managed by the underlying runtime system. And, you know, we have support for that for both PyTorch and JAX. We primarily use JAX in-house. But what we have is a single JAX Python process that just looks like it has 10,000 devices on it, [14:23] and you just write your code as you would as an ML researcher, and off you go. And you can prototype it with four, eight, or 16, or 64 devices, and then you change a constant, and you run against a different Pathways backend with 1,000, 10,000 chips, and off you go. Like, our largest Gemini models [14:42] are trained with a single Python process driving the entire thing, with tens of thousands of chips, and it works quite well. [14:50] So pretty good developer experience, I think. [14:53] One thing I would say is, to date, we had not offered that to cloud customers, but we just announced at Cloud Next that we're now going to have Pathways available for cloud customers, so then everyone else can have the delightful experience of a single Python process with thousands of devices attached. Yeah.

15:09-16:42

[15:09] Thank you. [15:10] And I agree that's a much better experience than managing like 64 processes for your 256 chips. Like, why would you want to do that? [15:23] Thank you. [15:24] I love using the Gemini API. It would be even easier if it got one API key rather than the Google Cloud credential setup. Do you guys have a plan to unify the Google Cloud Gemini stack with the Gemini project setup right now that's more for testing stuff? [15:48] Yeah, I think there's a bunch of streamlining that is being looked at. It's a known problem, not something I spend a lot of time on personally, but I know like Logan and others are on the... [15:57] the developer side are aware of this friction. We'd like to make it frictionless to use our models. [16:07] Thank you. [16:08] Is that working? Okay. So it's an interesting time in computing. You've got the confluence of Moore's Law and Dennard scaling being completely dead, [16:18] with AI just scaling like crazy. [16:22] You have a pretty unique position in the world of driving these supercomputers and infrastructure that [16:27] is being built. [16:28] And you know how to map the workloads onto these things, which is a unique sort of skill. What do you think the future of computing is going to look like? What is the computing infrastructure [16:39] heading towards, like from an asymptotic

16:42-18:14

[16:42] thought experiment level. [16:44] Yeah, I mean, it's really clear that we [16:47] have dramatically changed the kinds of computations we want to run on computers [16:52] in the last, say, [16:54] five years, [16:55] 10 years. [16:57] And that was initially a small ripple, but it's pretty clear now that you want to run incredibly large neural networks at incredibly high performance and incredibly low power, [17:09] and you also want to train them. Training and inference are pretty [17:13] different kinds of workloads. So I think it's useful to think of those two as you probably want [17:20] different solutions for the two, or somewhat specialized solutions. And I think you're going to see all kinds of adaptation of compute platforms for this new reality that you really just want to run [17:35] incredibly capable models. [17:37] Um... [17:38] And [17:40] so some of that will be in low-power environments, like your phone. Like, you'd like your phone to run incredibly good models with lots of parameters super fast, so that when you talk to your phone, it just talks back to you, and it can help you do all kinds of things. You're going to want to run these on robots [17:58] and autonomous vehicles. We already do somewhat, but even better hardware for that will make those systems much easier to build, much more capable [18:09] physical agents in the world. And then you want to run them at incredibly large scale in data centers.

18:14-19:46

[18:14] And you also then want to use lots of inference time compute for some kinds of problems, but not others. So it's pretty clear you want to use 10,000 times as much compute for some problems as for others. [18:29] and that's a nice new scaling knob we have that can make your model much more capable or give you [18:35] you know, [18:36] better answers or make the model capable of doing things with that much compute that it can't do with, you know, 1x as much compute. But you shouldn't spend 10,000 times as much compute on everything. So how do you make your systems work well for that? [18:50] I think that's a combination of hardware, system software, model and algorithmic tricks, distillation, all these things can help you make [18:59] amazing models come to life in small compute footprints. [19:02] One thing I've noticed is the computer science, at least traditionally, you know, when people are studying algorithms and computational complexity, it was all op count based. And I think as people are rediscovering hardware and details of hardware and system design, I think one of the things that's come back into focus is you need to think about network bandwidth and memory bandwidth and so forth. [19:32] kind of [19:33] traditional algorithmic analysis needs to be completely rethought just because of realities of what real computation looks like. Yeah, one of my office mates in grad school did his thesis on like,

19:46-21:17

[19:46] cache-aware algorithms, because the order-of-magnitude big-O kind of notation didn't account for the fact that some operations are 100x worse than others. Yeah, no, that's right. And I think in modern ML computing, you care about data movement at the incredibly small level, like moving things from SRAM into accumulators costs you some [20:08] tiny number of picojoules, but it's way more than the actual operation could [20:14] cost you. So it's important to have picojoules at the tip of your tongue these days. One other quick question. Do you vibe code? [20:30] I've been trying it a little bit. It actually works surprisingly well. Yeah, I mean, we've had some nice... [20:38] We have a little demo chat room. Actually, we have a lot of chat rooms. We sort of run Gemini via chat room. So I'm in like 200 chat rooms. And when I wake up and brush my teeth, I get like nine notifications because my London colleagues are busily doing things. [20:53] Like we had one where people can send out cool demos of things I've seen and, uh, [20:57] one that was particularly cool was... [21:01] You feed in a YouTube educational-oriented video. And the prompt is just something like, please make me an educational game that uses graphics and interactivity to help illustrate the concepts of this video.

21:17-22:49

[21:17] And, you know, it doesn't work every time, but 30% of the time you get something that's actually kind of cool and related to differential equations or traveling to Mars or, you know, doing some kind of cell aspect thing. And, you know, that's just an incredible sign for education. [21:34] The tools we now have and will have in the next few years, really have this amazing opportunity to change the world in so many positive ways. [21:44] So I think we should all remember that as [21:46] is kind of what we should be striving for. [21:50] Would you mind passing there and then maybe there? Yeah, we'd love to hear your thoughts about the future of search. And especially given Chrome... [22:01] such big distribution, right? And then especially Chrome already know the credentials like payments and then web signing credentials. [22:11] Have you... [22:12] I thought about getting Gemini just directly into Chrome, making the Chrome app Gemini app instead of have a separate app. I say this because I'm long-term Googler. So just think about-- Yeah, I mean, I think there are definitely lots of interesting downstream uses one could make of the core Gemini models or other models. [22:36] Can it? [22:36] help you do stuff in your browser or on your full computer desktop by observing what you're doing and, you know, doing OCR on tabs or maybe it has access to the--

22:49-24:36

[22:49] the raw tab contents, that seems like it will be incredibly helpful. [22:55] You know, I think [22:56] We have some early work in this area that we've published public demos of. [23:02] in video form that seem pretty useful, things like Mariner and things like that. [23:07] TBD. [23:08] Can you pass the... Jeff, question for you. So thank you for your comments. Very insightful. Earlier you mentioned the number of foundational model players will likely only be a handful. And this is largely because of the infrastructure costs and the scale of investment to remain at that cutting edge. And so as this battle for the frontier unfolds, how do you see the... [23:38] you see this end game going? Where does this lead us? Is it just whoever writes the biggest check to build the biggest cluster wins? Or is it better? You just talked about better utilization of unified memory optimization and different efficient uses of what you already have? Or is it the consumer experience? Where does this arms race lead us? [24:06] Isn't it just whoever gets to Skynet first, the game's over? [24:12] Yeah. [24:12] Yeah, I mean, I think-- [24:14] It... [24:16] It's going to require really good, insightful algorithmic work, as well as really good systems, hardware, and infrastructure work. I don't think either one of those is more important than the other, because what we've seen in, say, our Gemini progression from generation to generation is the algorithmic improvements are...

24:36-26:09

[24:36] as important or maybe even more so than the hardware improvements, or the larger amount of hardware we're putting to the problem. But both are incredibly important. And then I think from a product standpoint, [24:50] you know, what [24:51] It's... [24:53] There's sort of early stage products in this space, but I don't think we've collectively hit on what is the thing that-- or it's probably going to be many things-- [25:03] that become [25:05] the [25:06] the daily used products for billions of people. [25:09] Right. I think there's probably some in the educational space or in, you know, general information retrieval that is search like, but, but, [25:17] taking advantage of the strengths of [25:22] you know, [25:24] large multimodal models. I think, [25:27] Probably helping people get stuff done in, you know, whatever work environment they find themselves in is going to be an incredibly useful thing. And how will that get. [25:38] manifested in product settings. How do I manage my team of 50 virtual agents that are off doing things? And they'll probably be mostly doing the right thing, but occasionally they'll need to consult [25:50] with me about some choice they need to make. I need to give them a bit of steering. How do I manage [25:57] you know, 50 virtual interns. It's going to be complicated. [26:01] I... [26:03] - Hi Jeff, thanks for being here, right here. - Sorry. - I literally cannot think of anyone

26:09-27:43

[26:09] better in the world to ask this question. How far do you believe we are from [26:16] Having an AI... [26:18] operating 24/7 at the level of a junior engineer. [26:22] Thank you. [26:25] Thank you. [26:26] Not that far. [26:27] Yeah. Is that six weeks or six years? [26:36] Every year in AI seems like a dog seven or something. I will claim that's probably possible in the next year-ish. [26:44] Thank you. [26:45] Yeah. [26:46] Wow. [26:46] Hi, Jeff. You talked about scaling pre-training and now scaling RL. How do you think about, like, in the future trajectory of these models? Will it be, you know, one large model with all the compute or a constellation of smaller models that have been distilled from these larger models, both working in parallel? How do you see that? [27:06] the future landscape. [27:08] Yeah, I mean, I've always been a big fan of models that are kind of sparse and have different parts of expertise in different parts of the model. Because, you know, from our weak biological analogies, that's partly how our real brains get so power efficient is, you know, we're 20 watts or whatever. And we can do a lot of things. But our Shakespeare poetry part is not active when we're like worried about the garbage truck backing up at us in the car. [27:38] We do some of that with mixture of expert style models.
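
Illustration (not part of the conversation): the mixture-of-experts idea mentioned above, where only a few "expert" sub-networks are active for a given input, can be sketched with top-k routing in plain Python. This is a toy under assumptions: the experts are stand-in scalar functions, the router scores are fixed hypothetical numbers rather than the output of a learned router, and the names `top_k_gate` and `moe_forward` are invented for this sketch.

```python
import math


def top_k_gate(router_logits, k=2):
    """Select the k highest-scoring experts and softmax-normalize their scores."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    m = max(router_logits[i] for i in chosen)  # stabilize the exponentials
    exps = {i: math.exp(router_logits[i] - m) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    gates = top_k_gate(router_logits, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())


# Four toy experts, each just scaling its input (stand-ins for sub-networks).
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]

# Hypothetical router scores; in a real model these depend on the input.
router_logits = [0.1, 2.0, -1.0, 2.0]

print(top_k_gate(router_logits, k=2))             # experts 1 and 3, equal weight
print(moe_forward(1.0, experts, router_logits))   # 0.5 * 2.0 + 0.5 * 4.0 = 3.0
```

With k=2 out of 4 experts, half of the expert compute is skipped for this input. The routing here is regular: every input activates the same number of equally sized experts.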

27:43-29:24

[27:43] We did some of the early work in that space where we had like, 2,048 experts and showed that it gave you dramatic improvements in efficiency, like 10 to 100 X more efficient. [27:55] sort of model quality per training flop. And that's super important. But it feels like we're not really fully exploring the space yet because right now the kinds of... [28:10] sparsity people tend to do is incredibly regular. It feels like you want paths through your model that are like 100 or 1,000 times more expensive than other paths, and you want experts or pieces of your model that are tiny amounts of compute and some that are very large amounts of compute. Maybe they should have different structures. [28:28] And I think you want to be able to extend your model with new parameters or new bits of space. And maybe you want to be able to compact parts of your model, running a distillation process on this piece of it to make it one quarter of the size, and then you have some background garbage collection thing that is now like, "Oh, great, I have more memory to use, so I'm going to put those parameters, or put those bytes of memory somewhere else and make more effective use of them somewhere else." [28:56] And so that to me seems like a much more organic, continuous learning system than what we have today. So I, you know, the only problem with this is what we're doing today is incredibly effective. So it becomes a bit hard to completely change what you're doing to be more like that. But I really do think there are huge benefits to doing things in that style rather than the sort of

29:24-30:52

[29:24] more rigidly defined model that we have today. [29:29] Thank you. [29:30] I think one more question and then we'll probably wrap up. [29:35] Hey, I wanted to return to the junior engineer inside a year. I'm curious, what advancements do you think we need [29:41] to get there. Obviously, just maybe code generation gets better. But outside of code generation, [29:46] What do you think gets us there? Tool use, agentic planning? [29:49] Yeah, I mean, I think they, you know, this hypothetical virtual engineer probably needs a better sense of many more things than just writing code in an IDE. Like it needs to know how to like run tests and like debug performance issues and all those kinds of things. And we know how human engineers do those things. They learn how to use various tools that we have and can make use of them to accomplish that. And they, you know, [30:17] get that wisdom from more experienced engineers typically. Um, [30:23] or reading lots of documentation. And I feel like, [30:26] you know, [30:27] junior virtual engineer is going to be pretty good at reading documentation and sort of trying things out in virtual environments. And so that seems like a way to get better and better at some of these things. Uh, [30:39] And, you know, I don't know how far we'll... [30:43] take us, but it seems like it'll take us pretty far. [30:47] Jeff, thank you for coming and sharing your wisdom. Thank you. Great to see you.