Modal CEO Erik Bernhardsson: Scalable on-demand compute for the cloud

Kshitij [00:00:04]:
Hello everyone, welcome to another episode of the Tractable podcast. I'm Kshitij, cofounder and CTO here at Orb. And today I have with me Erik. Erik is the founder and CEO of Modal, a serverless platform that makes deploying your code to the cloud, and specifically Modal's cloud, really easy. Modal is known for being simple to get started with and super scalable, even at really large workloads, both on CPUs and GPUs. Companies big and small use Modal, including Ramp, Substack, and Suno. So excited to dive in with Erik today.

Kshitij [00:00:36]:
Erik, welcome.

Erik [00:00:37]:
Thank you. And, yeah, it's great to be here. By the way, that was a phenomenal pitch. I'm just gonna steal that.

Kshitij [00:00:44]:
Well, glad to hear that. Look, I think I wanna dive right into the technical conversation. So I wanna talk a lot about Modal. Before we get there, tell me a little bit about your technical background before starting Modal. What sort of problems have you historically been interested in? And then maybe you can kind of pave the path from that to Modal and what you're doing now.

Erik [00:01:03]:
Yeah. So, kind of rewinding the clock a little bit: I grew up in Sweden, and I started coding more than 30 years ago. I did a lot of programming competitions in high school and university. And then when I graduated, I ended up joining this then-obscure music streaming company called Spotify; I was employee number 30 or something like that. This was back in Sweden, and, for whatever reason, I was able to convince them to hire me to build a music recommendation system despite barely having any experience with it. I started working on that and spent 7 years at Spotify. I did a little bit of everything, but in particular I built a music recommendation system, and I also did a lot of other sorts of data, AI, and machine learning stuff. I later left and was the CTO of a fintech company called Better, which has its own sort of interesting roller coaster story. I left 3 years ago and started Modal in early 2021, roughly.

Kshitij [00:01:57]:
And that's interesting that you worked on the recommendation system at Spotify. What were the primary challenges there? Was it having to come up with a lot of things from scratch because this technology wasn't super hot at the time? I mean, it sounds pretty early in those days.

Erik [00:02:13]:
Yeah. I mean, do you remember the Netflix Prize? That was kinda cool, and I took a lot of inspiration from that. It's a slightly different problem because they had ratings and Spotify didn't have ratings. But in many ways it was quite inspired by that. Spotify had a lot of data: we looked at who listens to what, and we don't necessarily know whether they liked it or not, but just the fact that they listened to something is a pretty strong signal. And it turns out, roughly, what you can do is take all that data, put it in a big matrix, and then factorize that matrix.
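
For readers who want the concrete shape of that idea, here is a minimal sketch of factorizing a play-count matrix with alternating least squares in plain numpy. The data and hyperparameters are invented, and production systems (including Spotify's) used confidence-weighted variants at vastly larger scale:

```python
import numpy as np

# Toy "who listened to what" matrix: rows are users, columns are tracks,
# entries are play counts -- implicit feedback, not explicit ratings.
plays = np.array([
    [5., 0., 2., 0.],
    [0., 3., 0., 1.],
    [4., 0., 0., 2.],
])

k, lam = 2, 0.1                        # latent dimensions, L2 regularization
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(plays.shape[0], k))  # user factors
V = rng.normal(scale=0.1, size=(plays.shape[1], k))  # track factors

# Alternating least squares: hold one factor matrix fixed, solve a ridge
# regression for the other, and repeat until the factors settle.
for _ in range(20):
    U = plays @ V @ np.linalg.inv(V.T @ V + lam * np.eye(k))
    V = plays.T @ U @ np.linalg.inv(U.T @ U + lam * np.eye(k))

scores = U @ V.T   # predicted affinity of every user for every track
```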

Erik [00:02:45]:
And a lot of NLP models actually worked really well too, so we also started looking at those. Because you can think of each track someone listens to as a token, essentially, and it becomes a sequence prediction problem if you incorporate time. So I spent a lot of time building that, and, I mean, back then it was 2008, 2009.
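
The token framing maps naturally onto word2vec-style models. A hedged sketch using gensim, with listening sessions standing in for sentences (the session data here is invented):

```python
from gensim.models import Word2Vec

# Listening sessions as "sentences", track IDs as tokens.
sessions = [
    ["track_9", "track_4", "track_7"],
    ["track_4", "track_7", "track_1"],
    ["track_9", "track_1", "track_4"],
]

# Skip-gram: tracks that appear in similar contexts get nearby vectors.
model = Word2Vec(sessions, vector_size=32, window=3, min_count=1, sg=1, epochs=50)
print(model.wv.most_similar("track_4", topn=2))  # co-occurring tracks
```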

Erik [00:03:03]:
There was Hadoop, there was stuff like that.

Erik [00:03:08]:
And it wasn't very nice to work with, but, yeah, we had to scale it out. So I spent a lot of time building distributed, large-scale machine learning methods, and built a lot of data pipelines. I ended up open-sourcing a workflow tool called Luigi to manage these, because we had very complex data workflows. And then I needed a vector database, and there was nothing like that, so I ended up building my own, called Annoy, which is also open source, though it sees less use today. Yeah. I mean, it was an amazing problem.
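
Luigi and Annoy are both still on GitHub. As a hedged illustration of how the two might fit together, here is a small Luigi task that builds an Annoy index over precomputed track vectors; the file names and dimensions are hypothetical:

```python
import luigi
import numpy as np
from annoy import AnnoyIndex

class BuildTrackIndex(luigi.Task):
    """Build an approximate-nearest-neighbor index over track vectors."""
    dim = luigi.IntParameter(default=32)

    def output(self):
        return luigi.LocalTarget("tracks.ann")

    def run(self):
        vectors = np.load("track_vectors.npy")   # hypothetical upstream output
        index = AnnoyIndex(self.dim, "angular")  # cosine-style distance
        for i, vec in enumerate(vectors):
            index.add_item(i, vec)
        index.build(10)                          # 10 trees: recall/speed tradeoff
        index.save(self.output().path)

if __name__ == "__main__":
    luigi.build([BuildTrackIndex()], local_scheduler=True)
```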

Erik [00:03:36]:
I think it was crazy, looking back, that I was in my twenties and they put so much faith in me. But early Spotify was very much a culture like that: you were given very audacious goals. And I spent 2 years just hacking on it with nothing in production, pure research. And then, 2, 3 years in, we were able to get it into production and started building features on top of it.

Kshitij [00:04:00]:
Yeah. I wanna dwell on that a little bit. How do you think about the development environment today? Obviously, Modal is now becoming a part of that ecosystem, but if you have, let's say, a new grad or an engineer starting today, learning the basics of how to deal with data and how to think about deployments, it's a very, very different experience than the one you had at Spotify. Right?

Erik [00:04:22]:
Yeah.

Kshitij [00:04:23]:
Because so many of the primitives are already built for you, or at least available to you. So what do you think about that journey? Is that us stepping in the right direction as an industry? Or is there some tension there?

Erik [00:04:36]:
I think some stuff is good. Actually, I'll take that back: most of it is quite good. But I think, at the end of the day, we've also made it harder. And I think that's the story

Erik [00:04:44]:
of software engineering as a whole. Right? When I started my career, the cloud didn't exist. AWS launched S3 and EC2 around, I think... maybe it was around the same time, actually, 2008, something like that. We started using it at Spotify at that time, but in a very limited way. Most things were on-prem. And in a way, because it was pre-Terraform, pre-cloud, all this stuff, it was conceptually easier. You just SSH into some box and write code.

Erik [00:05:13]:
Right? And I look back at early coding, in many ways, CGI-bin and PHP, and it's sort of an easier model. With the cloud and many other things, we've obviously added so many different abstractions; in many ways we've actually made it harder to do things, but also easier in certain ways. So I think it was conceptually easier to get started when I started, because there was less mental overhead. But, obviously, the ability to use all these tools and to deliver value at scale makes

Erik [00:05:43]:
it so much easier today to actually scale up, and you don't have to worry about distributed data or building your own scaling layer and all these things that, back then, Spotify had to invest an insane amount of time in. Just one thing that I think about a lot: Spotify built its own peer-to-peer system just for distributing music data, which is today a problem that's typically solved by CDNs, because the cost of CDNs has dropped by probably 2 to 3 orders of magnitude. So, yeah, things are easier today in many ways, but also harder. And part of that is, you know, I wanted to build a tool, kind of selfishly, that I always wanted to have, that makes it easy to work with the cloud again and conceptually removes a lot of the dissonance of: I have a local environment, I have a cloud environment, how do I make them work together? And a lot of that actually inspired how I built Modal.

Kshitij [00:06:36]:
Yeah. It's a perfect segue for me. So let's talk about Modal. I go to modal.com, I see "cloud functions reimagined"; what does that actually mean? Tell me a little bit about the product thesis and the context in which you're working. Obviously, I think about serverless functions, and lots of people think about Lambda. Where does Modal fit in, and what does it mean to reimagine a cloud function? Right?

Erik [00:06:59]:
Yeah. We actually ended up changing that headline just a couple days ago; I forgot what it said. But, kind of taking a step back, what I wanted to create first was this ability to use this fantastic technology called the cloud while maintaining fast feedback loops. When I think about my own career as a developer, which started in ‘99 or something like that, that feedback loop is always the best way to look at what makes me feel productive: do I have fast feedback loops? And that was always my challenge working with the cloud and working with data.

Erik [00:07:37]:
And I would say, to some extent, you look at frontend engineers, they kinda figured it out. You look at back-end engineers, they kinda figured it out. But then you look at data, machine learning, AI, and I just felt there was no good infrastructure solution for how these teams want to work with a compute abstraction. So I started thinking about how to build that abstraction, how to build something that makes it super easy to take code locally and execute it in the cloud, scale things out, and deploy things. Maybe you have an inference model and you wanna deploy it and have it auto-scaled. Maybe you have some very bursty one-offs: you just wanna process some video file, but now you wanna scale it out to 100,000 video files or something like that. And a lot of those things have historically been quite hard to do.
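
As a rough sketch of what that looks like in Modal's Python SDK (drawn from Modal's public docs; decorator names and parameters may differ by version, and the video-processing function is a stand-in):

```python
import modal

app = modal.App("video-batch")
image = modal.Image.debian_slim().pip_install("ffmpeg-python")

@app.function(image=image, gpu="A10G", timeout=600)
def process(url: str) -> str:
    # This body runs in a container in Modal's cloud, not locally.
    ...
    return f"processed {url}"

@app.local_entrypoint()
def main():
    urls = [f"https://example.com/videos/{i}.mp4" for i in range(100_000)]
    # Fan out: the platform autoscales containers to work through the batch.
    for result in process.map(urls):
        print(result)
```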

Erik [00:08:28]:
And so what I realized was a lot of that comes back to a bad feedback loop: you get something working locally, but now you have to write YAML, you have to write a Dockerfile, you have to get things working in Kubernetes, etcetera. That completely ruins the fast feedback loop. So what I wanted was something that lets me execute things in the cloud but retains basically the same developer experience, almost the same as if I'm running things locally. That actually turns out to be quite a hard technical problem, because you need to take code, containerize it, and execute it in the cloud, ideally in a couple of seconds or less, to have that fast, fluent feedback loop.

Erik [00:09:07]:
So we ended up having to go pretty deep into infrastructure land and spend time on how to start containers very quickly. We realized that in order to do that, you have to solve container image distribution, so we ended up building a new file system to serve container data. And we realized Kubernetes is just not fast enough, so we built our own scheduler. But it was totally worth it, because now I feel we have this system that lets you iterate super quickly. And then, actually, this serverless thing ended up being a little bit of a serendipitous discovery. We first built a system that focused more on the developer feedback loop, more of a research-type setting. But a lot of this was a year and a half in, mid

Erik [00:09:47]:
2022. A lot of people started dabbling with gen AI and Stable Diffusion, and a bunch of those teams came to us and said, wait a minute, could we use Modal for this?

Erik [00:09:56]:
And actually, serverless inference made a lot of sense for us, because we had GPU support and the ability to start containers very quickly. So we started focusing more on serverless in more of a production setting, more of an online setting. And it turned out a lot of the infrastructure we had built for iterating quickly lent itself quite well to serving online traffic, because we had the ability to start containers very quickly, and we had GPU support and auto-scaling and all those things you want.

Kshitij [00:10:28]:
Yeah. And I guess the way that I would think about that thesis, just reflecting it back at you, is this idea that you can write a bunch of code locally, you can run a function in the command line, and you shouldn't have to take on a bunch of mental overhead to figure out how you're gonna run it in production. All the DevOps associated with that should ideally be as simple as possible. When does that abstraction, if at all, become leaky? I already see instances of this where it's, great, I'll deploy my code to a serverless function, but I don't think about the fact that my RDS instance can't handle an unlimited number of connections. Right? And so then the database instance falls over. Maybe there are some cases where you do want the engineer to think about the difference between their local environment and their production environment.

Kshitij [00:11:16]:
At what point are you over-abstracting? And how do you think about the use cases Modal solves in relation to those?

Erik [00:11:25]:
Yeah. I think that's an incredibly hard problem, because on one hand, in order to simplify things, you need to make assumptions. But when you do that, or add abstractions, they either become leaky or they become limited or something like that. And I think that's been the problem with a lot of the alternatives to Modal that I look at. Arguably, one of our biggest competitors is basically just homegrown Kubernetes wrappers: most companies at a certain size that do a significant amount of data, AI, and machine learning workloads have built some sort of abstraction over Kubernetes.

Kshitij [00:12:00]:
Right.

Erik [00:12:01]:
Those teams use that to deploy things into production. In many cases, those abstractions are not very good in terms of the ability to iterate quickly. But also, to your point, they end up being leaky: you have this abstraction, but then you actually do need to understand how Docker works, you do need to understand how Kubernetes works. And I don't have a particularly great solution to that, but I do think the fact that we focus specifically on data, AI, and

Kshitij [00:12:26]:
machine learning

Erik [00:12:27]:
Mhmm. Does mean that we can think more about what we're actually trying to accomplish and what the set of use cases for this is, and build for that. And that makes it a lot easier. So far, I think we've found a set of abstractions that work reasonably well for those people without being super limiting; you can do quite a lot of stuff in Modal. It's somewhat flexible and dynamic. And occasionally people run up against limitations, and then we sit down and kinda rethink: maybe we need to redesign this stuff and revisit it. But I don't know.

Erik [00:12:57]:
It's like any program, any tool. Right? That's fundamentally the decision: you start with 100% of the use cases, and then you say, actually, we only wanna support this 50%, and then you build an amazing experience that does that 50% really nicely.

Kshitij [00:13:14]:
Yeah, and it's interesting, because I think the words you choose also influence the use cases people think you want to serve. Right? So we're talking about serverless. Lots of people think of serverless, at least initially, as just, you know, a 20-second function that's doing something lightweight and maybe has variable demand. But I know that a lot of folks are using Modal for significantly more intensive and longer-running workloads.

Erik [00:13:40]:
Yep.

Kshitij [00:13:40]:
One, does that resonate? And two, is that something you've had to reframe for people: hey, yes, it's quote-unquote serverless computing, but it's not just for these really small tasks that take seconds or milliseconds to return?

Erik [00:13:54]:
Yeah. I don't know if we necessarily use the term serverless that much when we talk to people, because our audience is very data-focused, so for them, serverless is not necessarily the frame. But I do think your question is really good, and I've thought a lot about this: we've had a decade of Lambda now. And when I saw Lambda the first time, I was like, whoa! This is super cool, everything should be a Lambda.

Erik [00:14:14]:
But, first of all, I think Lambda in itself has a lot of usability challenges; it's quite annoying to work with. And I also wonder if, fundamentally, the compute model doesn't actually lend itself well to back-end stuff. One example: a typical back-end application is mostly IO-bound. You have lots of coroutines just sitting around waiting for IO, and Lambda is actually not that good for that, because it ends up being very expensive to fire off a Lambda for every request when most of the request is just sitting around waiting for a database or whatever. So one thought I've had is that maybe serverless is a great technology, but people have been focusing on the wrong application, which is back-end use cases. The other thing is, back-end use cases often have more predictable traffic.

Kshitij [00:15:06]:
Right.

Erik [00:15:06]:
So I don't know. My thing is, basically: maybe for serverless, the real application is data, AI, machine learning. Compute-heavy stuff. Right? And when you're dealing with compute-heavy stuff, you also need much more than what Lambda can do. You need GPUs. You need long-running things.

Erik [00:15:24]:
You need things that maybe use, I don't know, 100 gigabytes of RAM and run for an hour. That's sort of a little bit of a thesis I have. I'm biased, but maybe the real killer app for serverless is not what people think it is, back end; it's actually data, AI, machine learning.

Kshitij [00:15:42]:
It almost feels like the thing that Modal is offering is managed compute, and serverless just happens to be a way people think about it and attach a lot of preconceived notions to.

Erik [00:15:53]:
Exactly.

Kshitij [00:15:54]:
Let's actually talk about the fact that you all support GPUs, because that's a use case you're leaning into. Was that a decision from the beginning? How did you navigate that? And was that a result of AI interest, or was that something where you had some foresight to say, look, this is an important use case for us to support?

Erik [00:16:13]:
Yeah. I mean, it always felt like a pretty core focus for any platform that deals with data, AI, machine learning; you kinda have to do GPUs today. What we didn't know at the time was how much demand there would be for GPUs, and how much of a problem people would have finding capacity and dealing with it. And I think that's also how I think about why Modal has been so successful specifically with GPUs: there's a lot more value to be provided

Kshitij [00:16:47]:
Yeah.

Erik [00:16:47]:
specifically around just getting GPU capacity and dealing with the containerization and the drivers and all of that stuff. It's kind of annoying to run stuff on GPUs, and we reduce that burden quite a lot; it's significantly easier to get stuff running on a GPU with Modal.

Kshitij [00:17:06]:
And actually, that's a particularly interesting point, even from a go-to-market lens, because you're building core infrastructure, you're running it for other people, you're effectively getting GPU capacity for other people, but there's real COGS to that. I know you have a pricing model that's fairly precise; I think it bills by the second. But how do you think more broadly about capacity planning and the cloud economics of running Modal?

Erik [00:17:36]:
Totally. And maybe just to explain, for listeners not familiar with the business model: we run a multi-tenant model, similar to Lambda, which means we run user code in our cloud. We're almost a cloud provider ourselves, in a way, even though we use existing cloud providers under the hood. And then we charge customers based on the time it takes, the CPU usage, and the memory usage. And so I think the benefit is that revenue comes quite naturally; it's a sort of obvious point of monetization, just thinking from a commercial angle here, and I think we benefit from that.
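
To make the billing model concrete: usage-based pricing of this kind is just duration multiplied by resources, something like the sketch below. The per-second rates are invented for illustration, not Modal's actual price sheet:

```python
# Hypothetical per-second rates -- NOT real prices, illustration only.
CPU_PER_CORE_SEC = 0.000038
MEM_PER_GIB_SEC = 0.00000667
GPU_PER_SEC = 0.000306

def charge(cpu_cores: float, mem_gib: float, gpus: int, seconds: float) -> float:
    """Bill = duration x (CPU + memory + GPU) at per-second granularity."""
    return seconds * (cpu_cores * CPU_PER_CORE_SEC
                      + mem_gib * MEM_PER_GIB_SEC
                      + gpus * GPU_PER_SEC)

# A 90-second inference call on 1 GPU with 4 cores and 16 GiB of RAM:
print(f"${charge(4, 16, 1, 90):.4f}")
```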

Erik [00:18:12]:
I think, obviously, it is a different model than a pure software company, and margins will never be, you know, 95%+.

Erik [00:18:21]:
You know? Exactly. It's very different. And so we have significant COGS, but we've been able to get pretty good margins. And I think a lot of that comes from our ability, for instance, to leverage spot capacity to some extent. We went multi-region very early, and I think that's been very beneficial: if you pool capacity from different regions, you get a bigger pool. And it turns out the latency associated with moving things around is not that significant for most gen AI applications. You can send a request from the US over to Europe, run it on a GPU, and send the result back to the US without meaningful degradation from a user experience point of view. So I think it is more challenging, for sure.

Erik [00:19:04]:
And one thing I've been thinking about is what I think of as the WeWork problem: how do you handle demand that's very short-term, very ephemeral, and very volatile? Yeah. And then, on our side, in some cases we have to make reservations, 1-year or 3-year reservations. Right.

Kshitij [00:19:22]:
Exactly.

Erik [00:19:22]:
And doing that well, I think, is a very hard problem. But we spend a lot of time on it. Just one example: on our scheduler side, the code that goes out and provisions and terminates cloud instances is actually running, 24/7, a mixed-integer programming problem to optimize our cloud spend, keep utilization high, find the lowest-cost spot instances, and move things around. And I think those are the things you need to do in order to get the margins to a place where you want them to be.
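
For a sense of what a provisioning problem looks like as a mixed-integer program, here is a deliberately tiny sketch using PuLP. The instance types, prices, and constraints are invented, and Modal's real formulation is certainly richer:

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum

# name: (cpu cores, gpus, $/hour) -- all numbers made up for illustration.
types = {
    "cpu_large":     (32, 0, 1.10),
    "gpu_spot":      (16, 1, 0.90),
    "gpu_on_demand": (16, 1, 2.50),
}
demand_cpu, demand_gpu = 180, 7
spot_cap = 4  # pretend only 4 spot instances are available right now

prob = LpProblem("provisioning", LpMinimize)
n = {t: LpVariable(t, lowBound=0, cat="Integer") for t in types}

prob += lpSum(n[t] * types[t][2] for t in types)                # minimize $/hour
prob += lpSum(n[t] * types[t][0] for t in types) >= demand_cpu  # cover CPU demand
prob += lpSum(n[t] * types[t][1] for t in types) >= demand_gpu  # cover GPU demand
prob += n["gpu_spot"] <= spot_cap                               # spot availability

prob.solve()
for t, var in n.items():
    print(t, int(var.value()))
```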

Kshitij [00:19:54]:
Yeah. That makes a lot of sense. And it sounds like there's a lot of technical investment in that area, not just thinking about it from a go-to-market lens. One thing I'm curious about, on a slight tangent: there's been a lot of talk about the cloud economics of free tiers. I don't know if you've seen this all over Twitter, especially with some of these database providers. I'm curious if you have a point of view on that: to what extent should you as a provider, and, as you were saying, effectively a cloud reseller, be thinking about the free tier as a growth model? To what extent should you just be focusing on the core business and driving revenue? What's the mix of that story?

Erik [00:20:37]:
I don't have a general point of view. I think it's so hard; it's so idiosyncratic. I worked at Spotify for 7 years, and for Spotify, it worked beautifully. Right? The free tier was never a moneymaker, but it was a necessary condition in order to get growth. And, eventually, you monetize those users through premium. And I think in a way we have a similar view, and I think everyone who has a freemium model has a similar view: you almost have to think about the free users as CAC, customer acquisition cost.

Erik [00:21:09]:
Right. And then you have to recoup that later because they upgrade. And I think what you're referring to is that PlanetScale got rid of its free tier recently, and I understand that it's hard, especially for databases, because you end up getting a lot of hobby users. And something that's also specific to databases is the COGS of running them: unless you have a truly serverless model, it's actually kind of expensive to run a database 24/7, unless you can scale down to 0. Which is why I think Neon, for instance, is still doing a free model, and I think for them it's totally fine to do it, because they've reduced the cost. I think if you're doing databases, you almost have to do serverless. Right?

Kshitij [00:21:53]:
Yeah.

Erik [00:21:54]:
I think for us, it's been a little bit different. Obviously, we have the serverless advantage, but more importantly, what we see is customers come in and they spend $30 a month, and we front that: you get a $30 free credit every month when you use Modal. But such a high proportion of those users end up converting into high enough spenders that we can recoup those credits pretty quickly.

Kshitij [00:22:23]:
Yeah.

Erik [00:22:23]:
So it's just more of a conversion question: what's the LTV? You know? If you're fronting $30 a month, you need to get some of those users to

Kshitij [00:22:33]:
Right.

Erik [00:22:33]:
in a few months, they need to start spending $100 a month, $1,000 a month, $10,000 a month. And they do with Modal. That's fundamentally the reason why we can subsidize that tier.

Kshitij [00:22:44]:
Yeah. And I think that's an interesting framing. One thing I've heard is, even from a business discipline and accounting perspective, lots of people will take that $30 a month, or, you know, free trial credits, and put that under sales and marketing spend. Right? It's not COGS; it's sales and marketing. And I think what that incentivizes is taking a close look at the conversion rate, the churn rate, and the growth rate of those customers, because now someone is on the hook for that translating into revenue in the future. Right? Yeah.

Erik [00:23:16]:
Exactly. We don't do that yet, but we do look at gross revenue and net revenue, net of credits. And we have okay margins in both cases. So we can support it, at least for now. I mean, I wanna keep supporting it forever, because I'm a big fan of freemium in general, being biased from, you know, the Spotify days and so on. But I think for us it makes sense: these people convert eventually, and they convert in high enough proportion that we can recoup.

Kshitij [00:23:43]:
Awesome. Okay. So maybe let's talk about developer productivity, because that's a big part of what Modal does as a product, but it's also the context in which I hear a lot about Modal; on social media, people really are champions of the developer experience. I think it's an interesting question: is there a silver bullet there? Is there something you found to work really well? Or is this just months or years of being really aware that that's one of the primary value props early on? Yeah, I wanna hear a little bit more about that from you.

Erik [00:24:16]:
I mentioned some of it, which is that I think one of the best ways to think about developer productivity is understanding it from the point of view of feedback loops. How fast do you get feedback? When you traditionally write code, maybe the editor, these days, can even flag things as you're typing. Then, separately, you have maybe the interpreter or the compiler flagging things. And then you have unit tests. People refer to this as shift-left. Right? You wanna take these complex feedback loops and find problems as early as possible. The cost of finding a problem in production and fixing it is 100 times higher than if you're finding it while you're writing code, because then you can just fix it. And so that, to me, is the number one way to look at developer productivity.

Erik [00:25:05]:
But I think there are so many other aspects of it too. I think a lot about building and developing tools for humans, and that's hard. It's hard enough to write code for computers, but when you're actually writing code for humans, that ends up being a much harder problem, because you're not just dealing with the compiler's understanding of the code; now you have to think about a human's understanding of the code. And so when I think about what I want the Modal SDK to look like, I'm trying to visualize in my head: here's a user looking at this for the first time. What are they gonna see? What are their mental concepts gonna be? And I think that's another guiding principle I think a lot about: how do we build an SDK

Erik [00:25:57]:
that latches on to people's existing mental models of things and kind of just feels intuitive. People get this big beautiful pile of LEGO blocks, and they can just put them together and build something the first time they try it out. To me, that's the magic of the tools that do this well. I think React is just well-designed: it has a couple of core foundational building blocks, and you can snap them together and build almost anything, and it kinda just works. Torch is another example.

Erik [00:26:33]:
I think they've done phenomenal work. There's a lot of complexity around Torch, all the tracing and eager mode and all of that stuff. But it works, and I don't have to wrap my head around it the way I felt I did with TensorFlow when I tried it. There are these well-designed frameworks that, what's the word I'm looking for, really show how to do this well.

Kshitij [00:26:59]:
Yeah. This is the idea of the code being designed for human consumption, for the human mental model. I don't know if you're familiar, but it feels very Bret Victor-esque to me, this idea that you have to think about the human behind the screen. I think one of the big challenges is designing an SDK, or just a developer experience more broadly, that is a good fit for the user that is just getting started on day 1, and also a good fit for the user that is really trying to dig deep and assemble their own complexity from the primitives.

Kshitij [00:27:32]:
But I think, like you're saying, that kind of comes from the complexity unfolding as you need it rather than it being presented to you all upfront. Right?

Erik [00:27:41]:
Yeah. There's this idea I've heard people talk about, progressive revelation of complexity: you don't have to understand everything when you get started. I think it's kind of a beautiful concept.

Kshitij [00:27:54]:
Sweet. So maybe let's go back a little bit to talking about some of the technical challenges.
One of the things you mentioned in this conversation, as part of this really fast feedback loop, is that you've invested in building your own container system, and I think that's brought you a lot of performance wins. And that seems like a pretty big investment, especially for a scaling team and a scaling company. How did that go? I'm curious about the technical journey there.

Erik [00:28:21]:
Yeah. I mean, when I started hacking on this, it was just myself, and pretty quickly I ran up against these challenges with Docker and with Kubernetes and things like that. And I had conversations with VCs at that point, more to get to know people, and the consistent feedback I got was: why would you build your own file system? You're nuts. People were nervous about it, but I was pretty convinced that this could be done, and that it was possible to build a proof of concept that's actually not as big as people think. And fundamentally, to me, the question is also: if you're an infrastructure company, to some extent you can't just be a wrapper.

Erik [00:29:03]:
What's your competitive advantage if you choose that route? If you wanna be an infrastructure company, you have to be very willing to go down into the guts of the infrastructure and kinda rethink a lot of your foundational stuff. And I personally love that stuff, so for me it comes very naturally. I've encountered a shocking number of infrastructure founders and infrastructure companies who don't wanna do that, and, to me, I don't even understand why you're building an infrastructure company then. There are many fun parts, but I love infrastructure, and I love building stuff. And I always respect people who build databases. Database companies, to me, are the craziest thing you can build, because you have to spend 3 years in a cave building a database until it's even useful. Modal had a little bit of an easier route.

Erik [00:29:47]:
We could make it kinda useful just a year in. I don't mind that personally, but I think for some people, it's a little scary.

Kshitij [00:29:55]:
And I know that specifically for this container problem, and I think Cloudflare has talked a lot about this as well, security and sandboxing are really tricky, or at least they can be. Has that been true for you? And are there trade-offs you've had to make to ensure you're focusing on that part of the problem too, which, I imagine, especially for larger customers, is of course a really big point of evaluation?

Erik [00:30:19]:
Yeah. So Docker has a lower-level primitive called runc, which was initially what we used. And if you think about what containers are, sort of traditional Linux containers, it's basically runc running the containers for Docker, and it's chroot and seccomp, maybe, and namespaces and a few things. And that hasn't historically been a super strong way of isolating code, and especially since we're building a multi-tenant environment, we wanted something stronger. We ended up using gVisor instead. gVisor is a product from Google.

Erik [00:30:57]:
So, I mean, there are limits to how much we wanna build ourselves; even though we built our own scheduler and our own file system, we think gVisor is a super solid product. It's basically somewhere in between a VM and a container runtime. It provides its own layer of virtual memory, it intercepts all the syscalls, and it has its own very limited implementation: basically, it implements all the syscalls in terms of a much smaller number of syscalls. So you have a very low surface to the kernel. So there are a few different reasons we felt gVisor is, today, strong enough isolation of code that we felt very comfortable with it.

Erik [00:31:38]:
I think in the future we'll want even more isolation options; some larger customers, obviously, are gonna wanna run things inside their own environment or have dedicated machines or whatever, and we'll look at that once we get there. But right now, gVisor is just a very strong isolation mechanism, and we're pretty happy with it.

Kshitij [00:31:56]:
And so you built your own file system, you built your own scheduler, and maybe related to a couple of those things, I saw that you all recently put out an architecture deep dive where you talk about how you translate incoming HTTP requests into function invocations, and how that's a hard problem. Do you wanna talk a little bit about that? Because I imagine that's a thing you've invested in deeply from a technical perspective over what looks like many months.

Erik [00:32:23]:
Yeah. I mean, I think one of the things that made this a little bit easier for Modal is that we decided to take a very integrated approach between the infrastructure and the SDK. We basically said: in data, AI, machine learning, everyone uses Python. We're just gonna focus on Python for now. Maybe in the future we'll add other languages, but let's do Python for now. And it turns out, specifically for web requests, pretty much every single web framework in Python uses an underlying thing called ASGI or WSGI. I forget what it stands for, but it's kind of a lower-level protocol. And we realized we could take advantage of that.

Erik [00:33:01]:
We ended up building a small Rust service that takes incoming web requests, serializes them to, basically, ASGI over protobuf, and then we use that internally in our function call architecture to create function calls with these ASGI payloads, which we send out to containers running web handlers. On the Python side, we basically take the ASGI out of the protobuf and just execute it. And that works pretty well; we got it working for WebSockets recently, and we can handle very high request volume. Not everything in Modal is a web handler, by the way; most things are just native Python functions calling other native Python functions. But you can take a function in Modal and turn it into a web handler by changing one line of code.
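
For context on what's being serialized: ASGI boils a web request down to plain Python dicts, which is what makes it practical to shuttle over a wire format like protobuf. A minimal ASGI app, showing the standard protocol rather than Modal's internal code:

```python
# An ASGI app is just an async callable taking a connection "scope" dict
# plus receive/send channels that exchange event dicts. Because scope and
# events are plain data, they can be serialized and shipped to a remote
# container that replays them into any Python web framework.
async def app(scope, receive, send):
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"hello from a container"})
```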

Erik [00:33:49]:
And we do serve, I don't know, several thousand web requests a second, and we're pretty happy with the end result. We wrote a blog post about it yesterday, which, as you mentioned, we published on the blog.

Kshitij [00:33:59]:
Yeah. And I think that's interesting, because it's an example of trying to go, and I think successfully going, pretty far to make the abstractions actually work between the local environment and the cloud environment. The kind of, quote, unquote, lazier version of that is to just wrap the function in a REST API. And then it might work for simple use cases, but it's not actually a function invocation. Right? And so then I imagine you run into lots and lots of mental model problems between the two.

Erik [00:34:27]:
Yeah. And I think the programming model that we wanted, for non-web-endpoint functions, is you just want it to feel like you're calling a function in Python. Right? But you wanna make it work across container boundaries. So you have one function running on one image calling another function running on a different image. Maybe the second one is running on a GPU, who knows? You wanna be able to just pass data like you normally would. And that works in Modal because Python has built-in support for pickling, serializing and deserializing. So that part works reasonably well for most objects.
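
A hedged sketch of that programming model using Modal's public SDK (function names and images here are illustrative, and details may vary by SDK version): one function calls another that may live in a different container, with arguments and return values traveling as pickled objects.

```python
import modal

app = modal.App("pipeline")
np_image = modal.Image.debian_slim().pip_install("numpy")

@app.function(image=np_image)
def preprocess(raw: bytes):
    import numpy as np
    return np.frombuffer(raw, dtype=np.uint8)  # return value gets pickled

@app.function(image=np_image, gpu="any")
def embed(raw: bytes) -> float:
    # Looks like a local call, but runs in a separate container,
    # possibly on a different image and a different machine.
    batch = preprocess.remote(raw)
    return float(batch.mean())
```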

Kshitij [00:35:10]:
Awesome. Well, we've talked a lot about the economic story and the technical architecture. Let's go all the way back to the product. What's the thing coming in Modal that you're most excited about over the next, let's say, 6 months? Is it just scaling the existing workloads and use cases? Or is there something in particular that you just can't wait to launch?

Erik [00:35:32]:
Right now, a lot of it is scaling, frankly, which is a good problem to have. But we definitely feel like we've reached the end of life for certain internal services. We have to rewrite a lot of stuff: breaking things up, decomposing them, and rewriting them in Rust, generally. So there's gonna be a lot of work going into that. In terms of features, it's hard to say; there's a lot of enterprise table-stakes stuff.

Erik [00:35:54]:
And one area that I've been very interested in recently is training. I think Modal so far has found the most use for inference. Yeah. We have a couple of customers, Suno, for instance, that does AI-generated music; they run a few hundred up to a few thousand GPUs at a point in time, and we handle all the music generation for them on GPUs through Modal. Those types of use cases have been where we've seen the most success. More recently, we've seen some success with LLM fine-tuning. Mhmm. So Ramp, for instance, uses Modal for fine-tuning LLMs, and a few other companies do too.

Erik [00:36:33]:
I think training is the next area that I'm very interested in, in particular distributed training. You can train models on Modal already, and you can use multiple GPUs on a single node, but you can't do multi-node training.

Kshitij [00:36:46]:
Yeah.

Erik [00:36:47]:
So that would be one area that I'm quite interested in trying to figure out. Could we make the ergonomics really nice? If you're training a model, could you change a couple of lines and now you're training it distributed, using 100 GPUs, 100 times faster, in very bursty ways? And you don't have to think about launching instances and running bash scripts; you just have the ability to take some code and parallelize it.

Kshitij [00:37:14]:
Yeah.

Erik [00:37:15]:
That’s an area that I'm very fascinated by. I think we can build some really cool products there.

Kshitij [00:37:20]:
Awesome. Well, that's really exciting, and I think it's particularly interesting that you get to take all of these new use cases unfolding in front of you and really think about how you can make them easier from a developer experience perspective, but ideally also unlock new things that would otherwise potentially not be feasible if you had to build a lot of this infrastructure yourself. So, totally excited about the future of Modal. Thanks again for chatting today. I really had a good time.

Erik [00:37:47]:
Yeah. Thanks. Thanks for having us or having me. This is awesome. Really enjoyed it.

Kshitij [00:37:51]:
Awesome. Thanks, Erik.