This transcript was created using speech recognition software. While it has been reviewed by human transcribers, it may contain errors. Please review the episode audio before quoting from this transcript and email [email protected] with any questions.
Casey, I have a surprise for you.
So last week on the show, you made a promise to listeners that our show would always be human, that we would never replace ourselves with AI.
Well, I took that as a challenge.
So I have spent the past week building an AI version of you.
(LAUGHING) No! For real?
Yeah. I sent many, many hours of audio of you talking to a company called Play, which is an AI startup, and had them build me AI versions of you and of me.
(LAUGHING) Oh, no.
Do you want to hear it?
Casey Newton. I think Elon Musk has a lot of great ideas about social media.
I love getting PR emails from cryptocurrency startups, and, frankly, I wish I got a lot more of them. I hate puppies.
I’m going to seize your laptop and throw it into the ocean. You cannot be trusted around technology.
I’m sorry, but don’t wrong me now because I can get you canceled in like 10 seconds flat just by releasing this audio.
Oh, man. I’m glad we can have a laugh about this now before some synthetic voice clip triggers armed conflict somewhere in the world.
All right, let’s have the AIs do the intro.
I’m Kevin Roose, a technology columnist at “The New York Times.”
And I’m Casey Newton from “Platformer.”
And you’re listening to a very special AI episode of “Hard Fork.” This week on the show, a new type of AI is revolutionizing the way humans make art, come up with new ideas, and express their creativity. It’s called generative AI, and it’s the next big thing in Silicon Valley.
And later, we’ll sit down with Emad Mostaque, the founder and CEO of Stability.AI, who is on a controversial mission to bring generative AI to billions of people.
I’m so relieved by how bad that sounded.
I think it sounded like you, but I don’t think it sounded like me. Do you think it sounded like me?
No. But in fairness, the company did say that I didn’t send enough training data, so they’re going to let their models run. I’m going to send them some more data. So that is Version 1. Version 2 will be indistinguishable from you and me. And if, God forbid, something were to happen to you, we could continue making this podcast.
Right, or if I just want to take a week off.
People say, did the show seem a little different this week? It’s like, eh, did it? I don’t know.
So as our robot voices said, this is going to be a special AI episode of “Hard Fork.” And before we start, I just want to make a disclaimer. I think this should be obvious from listening to the quality of our speech versus the AI speech, but the rest of this episode will be us.
100 percent human.
100 percent human, no AI voices at all. So, OK, now that that’s out of the way, let’s do the show.
Yeah. So I think for the past 10 years that we’ve been writing about tech, so often we would hear CEOs explain away how they were going to solve any problem by saying AI and waving their hands. And I think it’s been easy to assume that it just kind of wasn’t going to happen, or it was going to take a lot longer to happen, because for everyone waving their hands, we didn’t seem to see a lot of it in everyday applications. And then all of a sudden, DALL-E launched, honestly. And that was when I felt like, OK, AI seems like it’s here in a new way and a much more consequential way.
Yeah. So AI is not new. It’s been around for a long time. And for most of that time, it’s been doing sort of analysis of existing things, like you’d take a news feed, and you put AI on it, and it just ranks it in order of what it predicts will get the most engagement or something. This new kind of AI that everyone in tech is talking about right now is called generative AI, which is that it takes something like a text prompt, and it turns it into something totally new like an image or a video or, as we heard at the beginning of the episode, an audio file. And it feels like 2010 in social media right now.
But it’s not just hype. I think we should also be clear about that. I think some of it certainly is hype, but there’s also just been an astounding number of breakthroughs in just what’s possible to do with AI over the past few years.
Yeah. Is it worth taking a step back and trying to define AI? I remember there was a time maybe in the middle of the last decade where people said, AI is just what people call stuff that feels magic or not quite possible with computers. And then when it becomes possible, we stop calling it AI. So what do we actually mean by AI?
Yeah. Well, AI in the classic definition is like a computer system that can learn on its own, where you’re not just saying, if this then that; where you’re not just giving it a set of linear step-by-step instructions, but it’s actually using techniques like deep learning to sort of figure out problems on its own. You put a bunch of data into a model maybe, and it learns something about that data and uses it to do something.
So the type of AI that we’re going to be talking about today is generative AI. And that really is only a few years old in its current form. So it really started with these what are called large language models. GPT-3 is the best known example.
And this came out of OpenAI, a startup here in San Francisco. And they basically proved that if you build a large enough supercomputer, and you train it on enough data, words scraped from the internet, you can create essentially the world’s best autocomplete. You can plug in any text and tell it to finish it, tell it to transform it into some style, make it sound like Shakespeare, make it sound like science fiction, and it can do that with pretty amazing results.
I have this friend who’s a machine learning engineer. And I was asking him a few weeks ago, why is it that this is having such a moment? And he said, a really funny thing about this is there hasn’t been a lot of technological innovation over the past three years with this stuff. All that happened was that they started making the large language models even larger.
So they used to train them on hundreds of millions of parameters, they call them. Now, they’re training them on billions. And the really weird thing that happened was just by adding way more parameters, there were these emergent properties of these systems that no one ever could have predicted. And one of those was just it got really good at this generative stuff. So something that used to be more of an analytical tool is now feeling much more like a creative one.
Totally. And it started with text. That was the first big breakthrough. And then companies started figuring out how to apply the same techniques to different kinds of media. So first, as you mentioned, was DALL-E, which is the AI image generator that was released by OpenAI. The second version was released earlier this year, and that’s the one that sort of got all the attention.
But since then, Google has come out with their version of this. Meta has their version of this. Just in the last couple of weeks, they’ve started using the same techniques to create videos, so now you can create a short five-second video out of a text prompt. Audio is coming along very quickly. And they’re also doing all kinds of crazy stuff like using large language models to complete code snippets.
So more than a million programmers now are using this thing called GitHub Copilot, which is basically just autocomplete for coders. You type in a line of Python, and it will finish it for you. So it’s a pretty amazing moment in Silicon Valley, as we’ll talk about with Emad Mostaque, the CEO of Stability.AI, who just announced this huge $101 million funding round this week. But I think it’s also just worth taking a beat and catching everyone up, catching ourselves up on where we are, why this is the big hot trend in tech right now, and where it’s going.
Yeah. There’s sort of an old joke when I worked for a newspaper, and they would say a story is like anything that an editor saw on the side of the freeway while they were on their way to work. And the joke is that editors don’t actually have any good story ideas. But reporters are the same, where it’s like, it kind of doesn’t feel like it’s a story until it’s happening to you.
And earlier this year, I saw the news about DALL-E 2. I thought, oh, that’s kind of interesting. I’m a person who writes a newsletter three days a week, and I need illustrations for that newsletter. I write about the same five or six companies almost every single day. I’m often using the same stock art of Meta, TikTok, YouTube, whatever. But then DALL-E comes along, and I start to think, hmm, could I get access to that? What would I do? So I wind up getting access. And then all of a sudden, it’s like, hmm, could I see the Facebook logo in a tornado? Could I see the Twitter logo on fire? And I start typing these into DALL-E. And I get these illustrations that are not only incredible, but I got them in 15 seconds. The speed with which these images are generated, it’s almost an afterthought.
I spend all day writing the column and 15 seconds illustrating it, started putting it in the newsletter, and people really liked it. They’re writing in with suggestions. Maybe try this. Maybe try that.
So generative AI became huge for me this year because it is now what I think of as an indispensable tool in my workflow for doing my own job. And if you had asked me at the start of this year, what role do you think AI will play in your business, I would have said none, you know?
Yeah. And this is happening in all kinds of industries. So I’ve been spending some of this week talking with people in a bunch of different creative industries about how they are already using generative AI like DALL-E, like Midjourney, like Stable Diffusion, in their daily work. These are not first adopter tech people. These are architects and filmmakers.
I talked to a filmmaker who is using generative AI models to create concept images. So when he goes to pitch a studio on a new project, instead of just pulling images from Google Images or Getty, he can actually go in and create — I think one example used was a Melbourne tram car in a dust storm. And he can show that to the studio to give them the vibe of what he’s pitching.
I talked to an interior designer who is using these apps to generate options for clients to say, here’s how your room could look if it had a tropical theme. Here’s how your room could look if it had a mid-century modern theme. I talked to people in advertising, who are starting to use this as the first step in their creative process.
And I think it really helps to separate it from other tech hype cycles like, for example, crypto, which a lot of the tech world has been obsessed with for the past couple of years, which raised tons of money and got tons of attention and had tons of people sort of saying this was the future of finance and media and everything else, but never really had the kind of organic take-off among real users, whereas DALL-E has, I think, a million and a half users, Midjourney has 3 million people in its Discord server. So it really is taking off.
Yeah. Like with crypto, we would always just say, well, that seems kind of interesting, but what are you doing with it? And people would be like, well, it doesn’t matter what we’re doing with it. Look how expensive it is. It’s like, look how much more valuable it is. This is like, no, these are just tools that people are using.
Yeah. Sequoia, the big venture capital firm, just put out a blog post saying it thinks generative AI is going to unlock trillions of dollars of economic value. I went to this big party on Monday night that was thrown by Stability.AI, which is the company that makes Stable Diffusion, which is one of the biggest image generators that uses this type of AI.
Which I was not invited to, by the way. I don’t feel great about that.
Yeah. Well, maybe work for a giant media company.
Mm, that’s one idea.
Maybe don’t be an independent newsletter.
No, look, I’m a DALL-E user, and these people have a different business model. So I think they’re sending a — message received.
And I never expect much of startup launch parties, but this was genuinely extremely buzzy. So I walk in, and immediately I see Sergey Brin, the co-founder of Google, a billionaire, very famous person in the tech industry.
He won’t go to just any party.
Yeah. I assume he’s not just pulling up Eventbrite and going to random startup parties.
And he is one of several billionaires in attendance. I guess, nominally, this was a party to celebrate this big funding round that Stability.AI just raised, $101 million, which, just to put that into perspective, a lot of startups, when they raise a seed round, it’s like $3 million, $5 million, maybe $10 million.
Yeah. For a seed round, this is basically as big as it gets today.
Yeah. So that funding round and this party, all of this investor hype, really convinced me that generative AI is the big thing in Silicon Valley right now and is going to be the focus of a lot of attention. But there are still a lot of really thorny questions people have about this kind of AI.
There’s the sort of age old question of whether artists and illustrators are going to lose their jobs because of this AI, but also there are questions about copyright. All these AI models are trained on a number of different kinds of material, including some that is copyrighted. Just this week, it was reported that GitHub’s Copilot tool, that autocomplete for programmers, was actually generating copyrighted code.
Yeah. And I would say that the first real trend of this technology was to create non-consensual nude imagery mostly of women and sharing it on sites like Pornhub. So it’s used to harass people, and it’s pretty ugly. Some states have laws against that. Some states don’t. But that’s just one of the ways that these tools have been misused, and I think we’re about to find a lot more of them.
Yeah. And Stable Diffusion, this AI model made by Stability.AI, I think it’s a really good case study for a lot of these concerns because it’s open source. Anyone can download a version of it totally for free and run it on their computer. They can modify it or fine tune it. They can take off the safety filters, whereas a lot of other image generators like DALL-E 2 or Midjourney, these are closed source. And they come with some pretty strict rules about what you can make with them.
And I should say, Stability.AI does have an app called DreamStudio that uses Stable Diffusion and that does have safety filters built into it. But people are downloading and modifying and building versions of Stable Diffusion that don’t have any rules at all, where basically anything goes. Regulators, who are not known for being on top of every tech trend the moment it happens, are already starting to get pretty worried about generative AI.
So just a few weeks ago, Anna Eshoo, who’s a Democratic Congresswoman from California, sent a letter to the US National Security Advisor and the Office of Science and Technology Policy basically calling out Stable Diffusion in particular, this open source AI model, saying that it was being used to create violent images among other graphic topics and saying that, basically, federal regulators needed to get their arms around this technology because the potential for abuse was so high.
So all that being said, we’ve got the CEO of Stability.AI here with us today. His name is Emad Mostaque. He’s a former hedge fund manager, an oil trader. He’s done a lot of things. He worked on this big data project during COVID-19. And recently he kind of burst onto the AI scene from nowhere. And he’s become a pretty powerful figure.
And all these questions are kind of what I want to talk to him about, like what is Emad thinking about how this technology that he is putting into the world could be used and could be misused? So after the break, we’re going to talk with Emad Mostaque.
Emad Mostaque, welcome to the show.
Hey. Thanks for having me.
So the big news this week is that you all have just announced that you’ve raised $101 million, which, by the way, is a very high number, but also a very specific number. You must have not wanted $100 million. What went on there?
The extra one is for luck.
(LAUGHING) Yeah, right.
No. Yeah. I mean, we kicked off a year ago. And this was a seed round, the first substantial fundraise that we did, just in our week of launch in August. It’s not cheap to train big AI models. You need huge supercomputers.
You released this statistic. You have 4,000 of these giant processors, these A100s that are sort of the top of the line graphics processors, which already makes you one of the biggest supercomputers in the world, correct?
Yeah. I think we’re up to 5,892 today.
And you said that you are going to use some of this money to expand that over the next year. How big do you think your supercomputer will get?
We’ll get about five to 10 times as big. It’ll be one of the fastest. Stable Diffusion, we took 100,000 gigabytes of images, 2 billion images, and we used this great big supercomputer to squish it down to a 2-gigabyte file. It now runs on an iPhone, and it’s a file that can create anything. I mean, it’s a bit like that show, “Silicon Valley,” from HBO. We’re Pied Piper, but it’s reality because we’ve had two types of AI.
Classical AI was big data. They took large amounts of information, and they targeted Kevin to target him ads. Now, we kind of moved on to a new type of AI that kind of learns from principle. So it learns, what is a cup? A cup is you cup your hand. It’s a World Cup. It’s an actual cup that you have water in.
And this is a new type of thing that’s allowed us to break through to almost human level across a range of things. So people are taking Stable Diffusion. They’re making images from it, but they’re also making 3D from it. They’re making architectural drawings from it. They’re using it to do dynamic fashion.
Because we released it open source and free to the world, there have been 1,000 different projects done. So I think we’ve had 200,000 developers download it in just a few weeks. And they’re just creating amazing things.
Yeah. Let’s talk about that open source approach that you’ve taken because I think it’s worth noting how strange this is in the landscape of AI research and development. So usually someone like OpenAI, they release something like DALL-E or GPT-3, and they put all these rules around it.
They say, you can’t generate porn. You can’t generate violent imagery. You have to basically be approved to build an app on top of our AI systems. What Stability did was basically just give this stuff away for free to anyone who wanted to use it with, really, no rules at all or very light rules. So why did you take that approach?
Yeah. I mean, we released it with an ethical use license, and there was a safety filter in it. I mean, there are good rules and some sensible things. You don’t want it to be used for violence or deepfakes, and we can discuss that later. But at the same time, let’s take OpenAI. That was set up in 2015. Then Microsoft did a billion-dollar investment in them in 2019.
DALL-E 2 is the equivalent image generator from them — again, a wonderful feat of engineering. But you have to ask some questions. For example, until a couple of months ago, you couldn’t use the word Ukraine or any city in Ukraine or anything to do with Ukraine, and you couldn’t use it in Ukraine. Who do you discuss that with? Nobody. They’re a private company. So all of a sudden, you have all of the world allowed to be immensely creative except for Ukrainians.
Similarly on diversity, how do you do diversity if you’re a centralized thing? What they do is they randomly allocate gender and race if it’s a non-gendered, non-race word. So you have female Indian Sumo wrestlers, if you type in Sumo wrestler. Our take was different, which is that you should give these primitives out so people can create their own models. They can bring their own data and customize it to themselves. On the bad side of things, on deepfakes and other stuff, these models weren’t quite good enough to do that versus some of the existing ones.
But we thought, let’s open up the discussion. So we just announced a $200,000 prize for the best open source deepfake detector. And we spent 10 times the compute that we did on the image generation model on an image identification model that will be used to identify bad and illegal and other content. So that’s the approach that we’re taking, which is that we trust people, and we trust the community, as opposed to having a centralized, unelected entity controlling the most powerful technology in the world.
Wait. So if we type in Sumo wrestler using your tech, we’ll just see Japanese men only?
Yeah, pretty much. And if you use Japan Diffusion, which is the fine-tuned version for the Japanese community, you’ll see even better Sumo wrestlers, whereas if you do it on some of these centralized ones like OpenAI, you get random races and things like that.
So part of your argument here is just that when it comes to these generative AI tools, one size does not fit all. You have to make it contextual to whatever you’re trying to create.
Yeah, and I think that’ll happen with centralized tools. I mean, basically I believe this is one of the ultimate tools for freedom of expression. And I believe expression should be free. I’m very American for a Brit, I know. And I think that’s the power. The power is in diversity.
So we’re taking this and building Indian versions of it. We just did a deal to do all the Bollywood stuff, so you’ll have generated Bollywood things with Eros Entertainment. We’re taking it to Thailand, taking it to Vietnam, and we’re seeing this global community come and build. And it’s just amazing to see the diversity of output there, which will never happen if it’s centralized.
So I’m interested in how this technology gets to the average person. I’ve downloaded something called Diffusion Bee, which is a very easy-to-use user interface. Much like DALL-E, I can type in any words. But Stable Diffusion will actually create an image based on a model that’s downloaded entirely to the laptop that’s right in front of me right now.
At the same time, you’re talking about things like very personalized, contextualized AI. And I’m just wondering, what’s the path from you to me using that? Presumably, you’re not going to be making all these apps yourself. So is it that developers will get an API from you, or how are these tools all going to get built?
Yes. So I mean, we have our API. And then we have DreamStudio.AI, which is our own version of implementation of that. So this week, we announced animation, and we’ve got 3D and video and other things coming in the next few months. So you’ll be able to create anything you can imagine — kind of insane. However, what is happening is that people are taking this, and they’re integrating to other stuff. So if you use Canva, for example, there is a Stable Diffusion integration where you can create —
Canva is like an online Photoshop, basically?
Yeah. Or if you use Photoshop, there’s a plugin.
So you can literally go into Photoshop and say, not just I would like to edit this image, but I would like to insert something totally new into it?
Exactly. So you’ve got Captain Jack Sparrow, and you’re like, on his shoulder, I don’t want that parrot. You’d kind of highlight it. Instead I want, I don’t know, a possum. And it will create a possum in two seconds.
One of the things that you said in your talk on Monday night at this event was that you believe that the world — I want to make sure I use your exact words here — the world has been creatively constipated, and we’re going to let them poop rainbows. What did you mean by that?
Well, look, we’re talking now, right? So audio is the easiest way to communicate, even if you deepfake someone’s voice or whatever. The next level above is text. The reason that you’re a great writer, Kevin, is that you’re a great wordsmith. But that’s effort. It’s effort to create text, so that’s the next level above.
Visual is the hardest, the artistic process being so difficult. But then PowerPoint is visual communication, and it’s incredibly difficult. Everyone hates it. So I’m like, this AI can make it easier to write text and easier for anyone to be able to create PowerPoints or images as they want. It lowers the bar. It lowers that effort. It lowers the pain.
So what happens then is that people will be able to communicate more because how many of the people believe that they can create? I would say maybe 10 percent believe that they can create flawlessly. I believe that maybe another 30 percent believe they could if they put in massive amounts of effort, but the majority believe that they can’t. But the reality is anyone can with these tools.
So like I said, allow us to poop rainbows and make a wonderful new future. That’s the way we go. And that’s going to be one of the biggest changes, I think, in human communication over the next couple of years that we’ve had since maybe the printing press because that enabled us to use text, and now it enables us to use visuals without any barrier. Snapchat and TikTok and things were the first elements towards that. This is the next evolution.
So this stuff right now is, I would say, kind of primitive. I mean, you can generate things that are realistic. But sometimes if you’re trying to do faces, it will be a little off, won’t quite cross the uncanny valley. Sometimes you ask for a hand, and it comes back with seven or eight fingers.
So there’s still some bugs in the software. Where do you think this will be five years from now?
I think you’ll have the “Ready Player One” experience/Holodeck. It will be perfectly high resolution, no issues whatsoever, because, basically, what we’ve done is there’s this concept of a Type 1 and a Type 2 brain. There’s a logical brain that has retrieval, and then there’s that brain that just understands concepts.
We’ve now got both. We have search, and we have concepts. So the way that I tell people to use this in the developer community is don’t think of it just as a one-stop shop, where you put words in and images out. Put it as part of a process where you cross-check and reference.
And then when you go into Unreal Engine, the plans that Apple have around augmented reality, the fact that we’ve got AI chips everywhere, it’ll get faster and faster and better and better because this will be the biggest investment theme, the most interest of anything in the world. Self-driving cars have $100 billion of investment. They got nowhere.
This area, every single content provider in the world will have to think not what’s our Metaverse strategy, but what’s our generative AI strategy now? Because we have massive amounts of content. How are we going to make that for audio, video, 3D, et cetera, in the next generation?
So now you’ve raised this big round of money, $101 million, from venture capital firms. Presumably, they’re going to want that money back at some point. They’re going to want you to make money. How are you making money right now? Because you’re giving away Stable Diffusion for free. It’s all open source. So how are you making money now, and how do you plan to make money in the future?
So yeah, basically all the servers and databases in the world now are run on open source. And it’s a multi-billion-dollar market cap industry. And it comes down to two things, scale and service. So scale is making it easy for people to do this. That’s our API.
So every time someone creates an image through one of your APIs, you get a cut of that?
Yeah, or DreamStudio.
So we get a nice percentage. The second part is service, which is there’s very few people who can build these models, but every content provider in the world wants to have their own versions of this model. You want a Hello Kitty model, or you want a Bollywood model or whatever.
So what’s happening is that we’re going to a lot of these big content libraries and saying, look, this is the actual Metaverse. We call it the multiverse because we think everyone should have their own models, right? And so we’re embedding teams there to create the models for them and sharing in the upside. But you have service contracts, all these other contracts around that, because they’re a specialized thing right now.
So we’re very bullish on this because we think this is the future of content, whereby you have that rich generative future where everyone can personalize and contextualize these things. This whole entire media space will be generative assisted. I don’t think it replaces. I think it enhances.
Well, I’m curious — and it gets philosophical — but what are you thinking of as, I don’t know, humanity’s sustaining advantage here? What is the role of just sort of human consciousness in creation when we’re now giving so much of it to these automated tools?
So I think automated is an interesting thing because this is basically a snapshot of ourselves that kind of extends out, but it requires us as the initial agent to do it. So that’s why I said, if someone used [INAUDIBLE], that person’s thing, because it doesn’t do anything without that person. And so I think we need to consider that very carefully.
So you’re saying that human intention is really what matters here? It’s the fact that we’re telling these machines to help us that is what makes the process fundamentally human and more an extension of us than some sort of robot replacement?
Exactly. And this is the case of many tools that we have like photography and other things. You have Photoshop, right? If you use Photoshop to create a copyrighted entity, and then you sell it, that’s your fault. These tools don’t do anything by themselves. There’s a 2-gigabyte file that you input, and then it creates an output. So we have to kind of take it back to that original human.
But what it does now is it opens up access just like, again, the printing press opened up access. Now anyone has access to visual creativity. Like the first version of this I actually did for my seven-year-old daughter because she was like, I want to create, Dad, and this is fun, and it’s painting, and look at all the stuff you’re doing. And she created this wonderful piece called “Happy Eve” that she sold for $3,500 as an NFT for India COVID relief, and she donated all the money. I was like, holy crap, this is a big thing. And I was like, why aren’t you making more? She made like eight more pieces. And she’s like, Dad, the value of one’s own uniqueness will only go up in time as the sector grows. So she’s going to pay for her own university.
We’ll be right back.
We’re back with Emad Mostaque. Emad, I want to ask you about a letter that came out just a few weeks ago. Congresswoman Anna Eshoo from California wrote a letter to the US National Security Advisor and the Office of Science and Technology Policy basically sort of calling attention to this new generative AI area and, in particular, calling for a regulatory response to Stable Diffusion, to this piece of software that you have built.
She called Stable Diffusion unsafe. She said it was being used to create violent images of Asian women being beaten and other sort of pornographic images. And she says that this could be a potentially dangerous tool. So I wanted to just see, have you read that letter? What are your thoughts on it?
We, yeah, just were made aware of it recently, and we’ve read it with interest. I think it’s this thing — again, it comes down to intentionality and who is responsible for the use of a tool. So if a tool could be used for violence, I mean, we’re here in America. We have guns, and guns are dangerous, but we’re still allowed to use them, right?
This tool is the purest example of a tool for expression that you can possibly get. I’d say it’s very interesting in an American context to call for regulation of a tool for expression. But it’s part of this bigger thing of who is responsible for AI?
So within the European Union context, basically there are calls to — well, the current system is basically calling to regulate it so that the makers of AI are responsible for the end use of that AI, even if it’s academic or open source. That leads to a very interesting future, where basically this technology would be locked away from everyone. And so that base level of creativity that could unleash across the world is removed because of concerns of the other side — blocking of the means, as it were.
It’s got similar things in other types of law at least to a very orthodox position. And this is the position of AI ethics, which is that there could be something bad, so we must not release it as a Google or a Meta or something else, rather than saying let’s open up the conversation, and let’s figure out systems to deal with the bad, and let’s trust in people in the community to have overwhelming good. I think there should be a debate.
Who is responsible for AI? Who is responsible for the output of AI? And how can we make sure that it’s open and positive? But I don’t think that happens if the only debate that happens is behind closed doors and big technology companies.
So this debate is so reminiscent to me of the debate that we’ve had around social media over the past decade, which started with some very similar ideas of social media is going to democratize expression, and people who’ve never had a voice before will now have one. And I think after 10 years, there are a lot of people who really dislike Facebook and think that Twitter is really bad for the discourse.
And even if the technology was created with the best of intentions and the tools themselves weren’t actively encouraging people to do harm, ultimately a lot of harm was done. And so now you see a lot of calls for regulation around the world. I’m curious how you feel about social media. Do you feel it’s been a net positive, and did it inform the way you’ve thought about Stability.AI?
100 percent, because what you had was centralized AI controlling us. And the incentive is advertising, which is manipulation of the mind. And so I saw a vision of an intelligent internet where everyone has their own AI as opposed to centralizing AI, which was the default under this with AI that’s even more convincing. I didn’t like that future. I didn’t want that for my children. And I thought it needed to be expanded, not only through the open source, but across the world, so other people would have a voice here in this discussion and debate because there is this positivity. There is this negativity. We don’t know where it’s going to end up.
But, again, we’re in a democracy. We should have this as an open extended debate. And I’ve tried to create a company that is collaborative and open. We cooperate, and we’re willing to hear feedback.
Maybe I’m just a coward about these things. But when I think about the potential for both the good and the harm that AI can do, I think the best approach is to parcel it out little by little, see what harms result, see if you can regulate that away, create policies around it. So when I look at what OpenAI has done, I think that feels responsible to me.
When I look at what you’re doing, I think you’re going to accelerate both the good and the bad at a really fast clip. And so one question I have for you is what’s your case for the acceleration? What’s the case to do everything all at once?
So I think, again, this is the question of — it is a question of responsibility. It’s a question of organization and other things. We are in a unique place right now to organize some of the discord here to hopefully coordinate this for the better because no one else was stepping in.
This technology was emerging regardless, and we saw that. And we said, OK, we have a responsibility here to try and do our best to guide the thing, but then get other people into the room on this. I think that the parceling out, you never know what that looks like. But when someone breaks, they might break it from a less good perspective, as it were.
And I was very terrified of that, especially because, like I said, this technology is being used for very nefarious terms. The good, though, I think, massively outweighs the bad in this because there’s nothing like creation. We are in a society of consumption at the moment. And if you look at what art therapy does, if you look at the things around, well, the joy that happens with creation and people using this technology, why should we cut that off from the world? Who are the self-appointed representatives that decide that?
I think that’s wrong. Again, it’s the blocking of the means. The possibility of any type of evil means that we cannot have anything, whereas, in fact, what’s best is when we’re stronger together, and we work as a society to combat the evil, and then we push the good.
Yeah. I mean, I think what you’re hearing from me and Casey as reporters who have spent the past 10 years covering the unanticipated consequences of social media is this sense that a lot of the people — it’s not just regulators who wish that some of this stuff had happened more slowly and that they had had time to sort of make rules and adjustments. A lot of the people who are building this stuff have recently said, I wish we had gone slower. I wish we had taken trust and safety more seriously.
There’s this Netflix documentary, “The Social Dilemma,” which has all these former Facebook and Twitter employees saying, we went too fast. We didn’t understand what we were doing. So I guess, what makes you confident that your approach is the right one? Are we going to see you in a Netflix documentary 10 years from now saying, I wish we had gone slower, I wish we had thought about this stuff?
So I’ve spoken to just about everybody on that documentary about this.
And so basically, what we’re doing is what if you’d created Facebook and Twitter without an ad incentive, and you’d also been accelerating the tools to counterbalance this? And we spent 10 times as much compute and money on the identification of images as opposed to the creation of images. We have a $200,000 deepfake prize to create technology to counterbalance that because, again, this technology isn’t our best model, actually.
Again, we trust in the community, and we trust in this decentralization as opposed to this kind of centralized coordination, whereby these decisions were made separately. These algorithms are locked away, and they’re not interrogable. They’re not understandable. For good and ill, because it’s not perfect, you can interrogate the data sets, you can interrogate the model, you can interrogate the code of Stable Diffusion and the other things we’re doing around language and others because, again, we believe that’s a public good and a public right. And we’re seeing it being improved all the time from biases, trust, and safety, versus, again, being in these big companies that their incentive is not the public good.
Their incentive is the bottom line, and it’s manipulation of you to serve ads. We can’t forget that. And I don’t believe that that can be trusted. I believe it needs to be open and for the people, as it were.
Although it’s interesting to think about. One thing that makes me maybe a little bit less nervous about the usage of these tools to create really bad images in particular is that the content policies of the big social networks will prevent them from being uploaded and shared for the most part. If I create a bunch of extremely violent imagery, I can’t put it on Facebook. I can’t put it on Twitter. And so interestingly, it’s like these same networks that you’re being critical of are actually going to play a huge role in preventing the spread of these materials.
Oh, 100 percent, and because these networks have now adapted for society, right? And this is the thing. Does society accept unethical, immoral, and illegal things? No, it doesn’t. So right now, 4chan and other interesting websites online have had this model for eight weeks. Has incredibly bad stuff been done on the back of that? Not really. I mean, it hasn’t broken the internet.
Again, how much good has been done from that? There’s some amazing art and things that are being created, hundreds, if not thousands, of projects being built on the back of this. We have to acknowledge that. We live in a society, as it were, that has these rules, regulations, and other things.
Social networks and others, they have this motivation that I believed was dangerous for this artificial intelligence, but they’re full of good people and developers trying to do the right thing. It’s just that, like I said, we couldn’t have a world that I thought was just this one closed AI approach, where these decisions are being made without consultation.
Have you seen anything made with Stable Diffusion or any of your other tools, where you’ve been like, ooh, wish that wasn’t made using our tools, kind of feel like we should have made a rule about that?
Not particularly, no. I mean, there’s been some pretty gross stuff because it doesn’t really understand anatomy. And so people were trying to really push — I was like, ooh, that’s a bit gross. But, again, I think it’s a tool like any other, like a Photoshop or things like that.
It isn’t a tool that makes stuff by itself. It’s the intention of the end user. And so there’s nothing that you can make with Stable Diffusion that you can’t make with Photoshop. It’s just become a bit easier.
One other criticism that you often hear of generative AI is that it’s copying or stealing art from people. I was reading recently this interview with this artist named Greg Rutkowski, who’s a famous fantasy illustrator. People love his work.
And so people who have been using these image generators have sort of figured out that if you just add Greg Rutkowski to the end of your prompt, it makes it look amazing and which is great for them because they can now create stuff that’s like Greg Rutkowski-level of good. But for Greg Rutkowski, who didn’t give permission for his art to be ingested by the AI model, it’s now like everyone can essentially rip him off with very little effort. So how do you think about that?
So I think this is a very valid thing, I mean the fears and concerns around that. So like I said, it was trained on 2 billion images. It was a snapshot of Google Images, basically. So anything you could find through there, you can find through here.
And then it learns from the relationship between words and images. That’s why you need a huge frickin’ supercomputer to kind of do that. So it learns principles. You can’t recreate any image from the data set, but it understands what a cup is or what Greg is and other things.
In fact, the interesting thing is our data set didn’t actually have very many pictures of Greg Rutkowski. Instead it was another model that was released by OpenAI, which had much more of his. We don’t know because we don’t know what the data set is that was part of this that introduced the concept of Greg into this model.
I just love that he’s like the secret word, the code word that unlocks the really good level of AI art.
Well, basically, prompting is like magic, right? It’s like, ah, I’ve got a spell.
But respond to that, because I think this is a valid fear that a lot of working artists have, which is that they are going to be put out of work by people who are essentially using these big AI models to do what they have been doing for a living for years.
Yeah. I mean, I think this was the concern around photography and a lot of these other kind of tools that came through. And it depends on the definition of what an artist is. Is it an aesthetic thing, someone who’s communicating through a community, or is it a job, where you’re an illustrator, et cetera? And these technologies have been improving to displace some of these things all of the time.
The way that I think about it is that, like I said, with a couple of images, you can teach it any style. So even if there was a base model, you bring in a couple of images, it will learn FlibbertiGibbet, which is a new artist from Brooklyn. I don’t know. Whatever.
And, again, it comes down to the intention of the individual user. If they’re breaking copyright or something illegal, that is one thing. Unethical and immoral is another thing because you can combine Greg with van Gogh, with Banksy, and it will incorporate all those styles because it understands that. But it’ll never recreate an image, because, like I said, this is not a compression algorithm. It’s a learning algorithm. It’s like a human brain, again, which is kind of crazy. However, I do think that we should build systems that listen to the community. So we’ve been talking to, I think, hundreds of artists. And so one of the things we’re doing is trying to figure out attribution mechanisms, opt-in and opt-out mechanisms as well, because, again, we listen unlike most others.
We don’t know what the models and data sets from some of the big companies are that understand these concepts, whereas ours is open, interrogable, and so we receive flack. But at the same time, we are listening and building tools around that, and the community’s also doing that. So we’re supporting community initiatives around this as well.
So you can see a world where a Greg can just opt out of having his style included in a model?
But isn’t that a rule? I mean, isn’t your whole pitch that this kind of thing should be open and free, and people should be able to use these giant models to create whatever they want?
Yeah. The technologies should be open and free, and they should use it ethically per the license. But I mean, we get to decide what we put in our model, right? And so these are open decisions where we’re listening to feedback. But then other people can take our code, and they can create their own models. That’s the thing. It’s open infrastructure.
So as a company, we decide what we have for the models that we release, but then we also have our implementations of those models. So our safety filter on DreamStudio is a lot stricter than the safety filter that is in the open model release because that’s a decision we make as a company versus an open source kind of decision. And everyone should have their own one. So for example, you can have an opt-out scheme, whereby people upload their styles.
Again, it’s difficult because even if there’s a picture in “The New York Times” that it picks up, it will learn from that style and that little thing. And then you can show that to Photoshop. And, as I say, this is an opt-out list of filters. And they can choose whether or not to use that.
So this message that you’re promoting, it’s very much oriented around this idea of we are the populist alternative to big tech. And on Monday, at this big Exploratorium party, you gave this talk. You said it was only your ninth talk ever. But you said, basically, that this is the way that we’re going to take on big tech to make sure that AI is not in the hands of a few giant corporations.
I was standing right next to Sergey Brin from Google when you said that, and he was not happy.
He was like, what, I’m the panopticon that you’re talking about here?
Yeah, what did I ever do?
Yeah, what did I do to you, Emad? So that was funny, A, but I think it’s also a very strong pitch right now because a lot of people are skeptical of big tech. But I also wonder because now you just raised $101 million. You have venture investors, including some of the same ones who have invested in other big tech companies.
You are a big tech company now. I know you only have 100 employees and not 100,000 or however many Google has. But how do you think you will keep Stability.AI from repeating some of the flaws with big tech now that you have all of this money, all these investors, all these expectations around growth?
It’s not easy. We have 100 employees. We have 100,000 people in our community. That’s kind of where our strength comes from, diverse across the entire world. We also give them revenue share, which is very weird. And we give them the upside because we treat them as artists effectively.
From the perspective of big tech, big tech is creating the panopticon because they can’t help it, because they’ve got no alternative. We’ve given that an alternative now. So we’re actually working with big tech now and giving them an outlet to be part of this thing. We’re kind of Switzerland that everyone can participate in as the neutral party.
And so we’ve had amazing talks with all of them because they don’t want to head in this direction. They actually want to, and especially the engineers, make things free and open, but at the same time have the regulation and have the trust and safety things. So we’re getting guidance and input around that to have a happy medium because it can’t be one extreme of pure libertarian and just release this and let it go and the other extreme of no one can have anything. It will be somewhere in the middle.
So I think that these kind of factors we have will help us do that. With our venture investing, we raised it on our terms, so we have complete independence, as opposed to, for example, OpenAI, they raised a billion dollars from Microsoft, and Microsoft have an exclusive license to their technology. Again, these misaligned incentives are very difficult to combat against. We would hope that the community, our team, and our position will help us counterbalance that because, also, it’s good business for us to be in this position, because no one else really occupies this position.
Yeah. I don’t know. There’s another world, though, where it’s like, these big corporations, for all of their evils, they are also reined in, in terms of what they can do, because they’re so afraid of the PR crises that might result from them releasing an AI that leads to some sort of bad thing. You’re in this position right now that I think is enviable to them, where very few people have heard of you yet. And so you have much wider latitude to just do whatever and not be as worried as much about what a regulator or the public might think.
Again, we’re in active talks with regulators. And the public, again, is the community and extended. So we release this, and we’re like, it’s open. And then it got a bit crazy. It’s been eight weeks and two days, and it’s just been insane. And so 1,000 projects sprung up. And the community is like, Stability, why don’t you step in and coordinate this and have an official mouthpiece? We’re like, OK, fine. So we went in, and we turned the Reddit into an official Reddit. And they’re like, how dare you? Corporate overlords. And we’re like, we’re just trying to make things a bit more organized. And then we had to kind of give it back. There’s always this kind of push-pull. I think, again, it’s community first, but it can’t be direct democracy. We’re going to make mistakes, and we’re going to make right things, and we’re going to be put under increasing amounts of scrutiny because what we’re doing is actually fundamentally important. And so I hope that we get the right input to do that right, but we might not.
As you said, big tech is in this unenviable position because they can’t release it for PR things. Because it’s like Promethean fire from the gods — again, it’s the next generation of communication, which is insane — It can be used to burn things down. It can be used to activate the light of humanity. But the only way we can figure out how to do this best is together. So that’s why I want to work with big tech, want to work with little tech, want to work with regulators, want to work with everyone to try and figure out how to do this right.
You invoked this idea that AI is neither good nor bad. It’s neutral. It’s just a tool. This is one of the things that I try to push back on really hard whenever it comes up because I think that all technology is infused with values because it is made by humans. And humans, including you, have values. So do you really think that AI is neutral the way that you are building it, or do you think that people in positions like yours have a responsibility to ensure that their values are reflected in the tools that they’re building?
I would agree with you that it does infuse the values. And, again, there are decisions to be made. And these are decisions where, again, the buck stops with myself as CEO of Stability on this or with some of the developers and others. And they will be different because you can take a lot of different choices here.
What I do think, though, in terms of, as you said, good or bad, it’s that once the model is out there, it is infused with certain values. But models like this are generalist models, whereas classical models are models that are more specific. There is a bit of a difference there. And I think, again, it’s the who is liable for the use of the model. This is a very complicated question that we’re trying to figure out now across various jurisdictions and geographies. So I think we need to separate some of these things. But, again, we have to realize that nothing springs from nothing, as it were. There are these biases. There are deliberate decisions that are being made. And, again, I’ve been putting this out and saying, for this thing, the buck stops with me. I made this decision, or my company made these decisions. And, again, we should be held accountable for that.
One interesting difference, it occurs to me, between generative AI and social media is that in social media, as Casey and others have written about, the market is demanding more involvement. Parler, Truth Social, Gettr, these free speech apps with very minimal rules, they’re not popular. What is popular is Twitter and Facebook and Instagram and TikTok, all of which have pretty strict guidelines.
In generative AI, what seems to be happening is that the market wants openness. I mean, Stable Diffusion has, I think — and correct me if I’m wrong here — more users than DALL-E or Midjourney. Is that right?
Yeah, it’s got more users than both.
So what do you think the difference is there? Why are people in this field looking for sort of a more free-for-all environment than people on social media?
So social media, again, you have centralizing AI that directs information from one place to another. And it’s been this kind of equalizer in a degree, but it’s also been an extender, whereby people, such as yourselves, have an almost disproportionate platform to influence. And that’s how the algorithms are set up.
People like easy, whereas generative media is a more personal experience. There’s no social aspect to it. It’s already integrated into TikTok, so they’ve just introduced their generative thing. They should use Stable Diffusion instead because it’s much better. But this is the way of things. But this is going to go exponential. And, again, it’s a very personal thing, where you put it out, and then you use the social networks to communicate.
Where do you think generative AI goes from here beyond photos, beyond videos, beyond the creative professions? Is there a world in which this type of technology changes the law, medicine, science, journalism? I mean, where do you think this is headed?
100 percent. It’s going to be everywhere. We look ahead 10 to 20 years, and you have an AI that learns principles as well as an AI that can search logic. That’s insane because it isn’t just generative media. It’s like a reflection of the human brain.
And I think, again, it’s the most powerful technology in the world, which is why I said we couldn’t trust for it to be just in the hands of a few. And so we were like, let’s put it out to the people because it should be AI for the people, by the people, where they can create it themselves.
Emad, thank you so much for — this is really a challenging but a productive conversation, and I’m glad you stopped by to talk to us about it.
Thanks so much for having me on the show.
That’s it for “Hard Fork” this week. Our show is produced by Davis Land. We’re edited by Paula Szuchman. This episode was fact-checked by Caitlin Love with original music by Dan Powell, Elisheba Ittoop, and Marion Lozano. Our show was engineered by Corey Schreppel.
Special thanks to Hannah Ingber, Nell Gallogly, Kate LoPresti, Shannon Busta, Mahima Chablani, Jeffrey Miranda, and Mahmoud Felfel from Play.ht. You can email us at [email protected]. That’s it for this week. See you next time.
We also don’t have to say, that’s all for this week, see you next time every week. You guys can do whatever you want. I just put that in there.
We should have something that feels a little bit less AI.
Because an AI would say, that’s all for next week, see you next time —
— whereas we might say, adios, cowboys. We’ll catch you on the range down the road. We’ll see you down the dusty trail!