A 22-person team valued at $1 billion! A conversation with Character.ai's CEO: rather than studying medicine directly, it is better to study artificial intelligence

Wall Street News: Character.ai is one of the hottest startups in the current AI boom. The company's main product is a customizable AI chatbot aimed at the entertainment needs of consumers, offering emotional companionship and user-defined fictional characters. Character.ai lets users create their own chatbots with specific personalities, settings, and knowledge, modeled on celebrities, historical figures, fictional characters from literature, film, and television, and even animals, providing a novel and immersive chatting experience.

Image source: Generated by Unbounded AI

At the beginning of this year, Character.ai completed a US$150 million Series A financing at a valuation of more than US$1 billion, becoming a unicorn with a team of only 22 people.

In April, Character.ai CEO Noam Shazeer, a former member of the Google Brain team, was interviewed by the podcast No Priors.

Key takeaways:

  1. As early as 2021, Google had the ability to launch an AI chatbot before OpenAI, but gave up over safety concerns. That big-company timidity is also why he left Google to start a business.
  2. Character's biggest advantage is its user-oriented product strategy. Its fully customizable AI chatbots have become a way for many people to dispel loneliness, and some users even say Character is their new counselor. Noam believes AI has great potential for emotional support, and that emotional support does not require high intelligence: a pet dog does the job well even though it is not smart and cannot talk, so an AI with limited parameters can do it too.
  3. Data requirements tend to increase exponentially with computing power, but data is not scarce: the internet can provide almost unlimited data, and Character is also considering using AI to generate more.
  4. Character.ai is still in the burning-money-for-scale stage, and its business model is still being explored. In the future the team will consider expanding into a to-B business.
  5. Noam believes AGI is the goal of many AI startups, but **the real reason he started a company is to push technology forward and use it to overcome hard problems, such as intractable diseases. He points out that AI can accelerate many lines of research: rather than studying medicine directly, it is better to study AI.**

The following is a transcript of the podcast audio. ELAD and SARAH are the hosts. For readability, some passages have been omitted.

Early work experience at Google, and the birth of Transformer

ELAD:

You have worked in NLP and AI for a long time. You worked at Google on and off for 17 years, starting with an interview question about spell checking. When I joined Google, one of the main systems for ad targeting at the time was Phil Cluster, which I believe you and George Herrick wrote. I would like to hear the history of your work on NLP and language models for AI: how it all evolved, how you got started, and what sparked your interest.

NOAM:

Thank you, Elad. Yes, I have always felt a natural pull toward AI and the hope of making the computer do something clever; it seems like the most fun game around. I was lucky enough to discover Google early on and was involved in a lot of the early projects there, things you maybe wouldn't call artificial intelligence now. In 2012 I joined the Google Brain team and got to do fun stuff with a bunch of really smart people, even though I had never done deep learning or neural networks before.

ELAD:

You were one of the authors of the 2017 Transformer paper, and you later worked on Mesh-TensorFlow. Can you talk a little bit about how all of that came about?

NOAM:

Deep learning succeeded because it is really well suited to modern hardware: this generation of chips is built for matrix multiplication and other operations that need a lot of computation relative to communication. So deep learning really took off; it runs thousands of times faster than anything else. Once I got the hang of it, I started designing things that were really smart and fast. The most exciting problem was language modeling, because there is an essentially unlimited amount of data: just scrape the web and you can get all the training data you want.
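As a rough sketch of that "computation versus communication" point, consider the arithmetic intensity of a dense matrix multiply. The snippet below is only an illustrative back-of-envelope estimate (it assumes float32 operands and counts just the minimum memory traffic, not any particular chip):

```python
# Rough arithmetic-intensity estimate for an n x n float32 matrix multiply:
# floating-point operations performed per byte moved to or from memory.
def arithmetic_intensity(n: int) -> float:
    flops = 2 * n ** 3            # multiply-adds in C = A @ B
    bytes_moved = 3 * n * n * 4   # read A and B, write C, 4 bytes per float32
    return flops / bytes_moved

for n in (256, 1024, 4096):
    print(f"n={n}: {arithmetic_intensity(n):.1f} FLOPs/byte")
# Intensity grows linearly with n, so large matrix multiplies keep a chip busy
# computing instead of waiting on memory or communication.
```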

The problem definition is very simple: predict the next word. "The fat cat sat on the...", what comes next? It is very easy to define, and if you can do it well, you get everything you see now: you can talk directly to this thing, and it is really artificial intelligence. So around 2015 I started working on language modeling with recurrent neural networks, which were the big thing at the time. Then the Transformer appeared.
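To make the "predict the next word" objective concrete, here is a minimal toy sketch: a bigram counter over a made-up corpus. It is nothing like the neural models discussed in this conversation; it only illustrates what the prediction task itself looks like:

```python
# Toy illustration of next-word prediction: a bigram model that counts which
# word tends to follow which. Real systems learn far richer context, but the
# objective -- predict the next word -- is the same.
from collections import Counter, defaultdict

corpus = "the fat cat sat on the mat the fat dog sat on the rug".split()

following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent word seen after `word` in the corpus."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))   # 'fat' -- the most common continuation
print(predict_next("sat"))   # 'on'
```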

I overheard my colleagues next door chatting about wanting to replace RNNs with something better. I was like, this sounds good, I want to help, RNNs are annoying, this will be more interesting.

ELAD:

Can you quickly describe the difference between a recurrent neural network and a transformer or attention based model?

NOAM:

A recurrent neural network is a sequential computation: as you read each word, you compute your new brain state from the old brain state and the content of the next word, and then you predict the next word. So you have this very long chain of computations that has to be performed one step at a time. The magic of the Transformer is that you can process the entire sequence at once.

The prediction for the next word still depends on what the previous words were, but it happens in a constant number of steps, and you can exploit the parallelism: you can look at the whole sequence at once, which is exactly the kind of parallelism modern hardware is good at.
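Here is a small NumPy sketch of that contrast: the RNN loop has to walk the sequence one step at a time because each state depends on the previous one, while a Transformer-style layer touches every position in one batched matrix operation. The shapes and weights are arbitrary illustrations, not any real model:

```python
# Illustrative contrast: sequential RNN state updates vs. a parallel
# attention-style layer that processes all positions at once.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 6, 8                      # 6 tokens, hidden size 8
x = rng.normal(size=(seq_len, d))      # token embeddings

# RNN: each state depends on the previous one, so the loop cannot be parallelized.
W_h = rng.normal(size=(d, d)) * 0.1
W_x = rng.normal(size=(d, d)) * 0.1
h = np.zeros(d)
for t in range(seq_len):               # strictly one step after another
    h = np.tanh(h @ W_h + x[t] @ W_x)

# Transformer-style layer: one set of matrix products covers every position.
W_q, W_k, W_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v    # all positions computed together
scores = Q @ K.T / np.sqrt(d)          # every token scores every token
weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
out = weights @ V
print(h.shape, out.shape)              # (8,) vs (6, 8)
```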

Now your parallelism scales with the length of the sequence, and everything works really well. As for attention itself, it's kind of like creating a big in-memory key-value association: you build a big table with an entry for each word in the sequence, and then you look things up in that table. It's all fuzzy and differentiable, one big differentiable function you can backpropagate through. People had been using this for two-sequence problems like machine translation: when you translate English to French and generate the French sequence, you look at the English sequence and try to attend to the right positions. The insight here is that you can use that same attention to look back into the past of the sequence you are generating. And, like deep learning itself, it took off because it works well on existing hardware, on GPUs and TPUs, and it brings that same advantage to sequences.
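The "big table" intuition can be sketched as scaled dot-product attention with a causal mask, so each position can only look back into the past of the sequence. This is again an illustrative NumPy sketch (single head, no learned projections), not Character.ai's implementation:

```python
# Attention as a soft, differentiable key-value lookup with a causal mask:
# each query position mixes the values of earlier positions only.
import numpy as np

def causal_attention(Q, K, V):
    """Each row of Q retrieves a weighted mix of V rows at earlier positions."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                   # query-key similarities
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)           # block attention to the future
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)       # softmax: a "fuzzy" table lookup
    return weights @ V                              # weighted sum of values

rng = np.random.default_rng(0)
n, d = 5, 4
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
print(causal_attention(Q, K, V).shape)              # (5, 4)
```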

SARAH:

Yeah, I think the classic example people use to visualize it is the same sentence in French and English: the word ordering is different, so it is not a one-to-one mapping between the sequences, and you have to figure out how to handle that with parallel computation without losing information. It's a very elegant way to do it.

ELAD:

It also seems like the technique is being used in lots of different fields. Obviously there are multimodal language models, things like ChatGPT or what you're doing at Character. I've also been amazed by applications like AlphaFold, the protein folding work Google did, where it delivered a huge jump in performance. Are there application areas you've found really unexpected, relative to how Transformers work and what they can do?

NOAM:

I've just kept my head down on language, because it's the one problem where, if you solve it, you can do anything. I hope this thing gets good enough that I can ask it how to cure cancer and it invents a solution. So I've been largely ignoring what people are doing in all these other modalities. A lot of the early success in deep learning was with images, and people got excited about images, but I ignored that too, because a picture may be worth a thousand words, but it takes a million pixels, so text is a thousand times denser. I'm a big fan of text. Still, it's pretty exciting to see it take off in all these other modalities; these things are great and super useful for building products people want to use. But I think a lot of the core intelligence is going to come from these text models.

Limitations of large models: computing power is not a problem, and neither is data

ELAD:

What do you think the limitations of these models are? People often talk about scale, as if you just throw more computing power at it and the thing will keep scaling. There is data, and different types of data that may or may not be available. And there are algorithmic tweaks, adding new things like memory or recurrence or something like that. What do you think are the big things people still need to build, and where do you think the architecture taps out?

NOAM:

Yeah, I don't know if it will tap out. I mean, we haven't seen it tap out yet, and the work put in so far is probably nothing compared to what is coming. So there are likely all sorts of roughly 2x inefficiencies people will recover with better training algorithms, better model architectures, better ways of building chips, quantization, and so on. And then there are the factors of 10, 100, and 1,000 of scaling and money that people will throw at this thing, because everyone has just realized it is incredibly valuable. At the same time, I don't think anyone sees a wall on how good this thing can get. So I think it's just going to keep getting better, and I don't know what would stop it.

SARAH:

What do you think of the idea that we can keep increasing computing power, but there isn't enough training data for the largest models? We've used all the readily available text data on the internet, so we have to improve quality or turn to human feedback. What do you think?

NOAM:

There are something like 10 billion people, and each person produces 1,000 or 10,000 words, which is a huge amount of data. We all have a lot of conversations with AI systems, so I have a feeling that a lot of data is going to flow into some AI systems; I mean in a privacy-preserving way, I hope that data can flow. Data requirements do tend to scale exponentially with computing power, because you train a bigger model and then throw more data at it. But I'm not worried about a lack of data, and we may be able to generate more data with AI.
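Taking the figures above purely as illustrative assumptions (the transcript gives no time frame for the word counts), the rough arithmetic looks like this:

```python
# Back-of-envelope check on the data argument, using the conversation's figures
# as illustrative assumptions only (no time frame is given in the transcript).
people = 10e9            # "10 billion people"
words_per_person = 1e4   # upper figure of "1,000 or 10,000 words" each

total_words = people * words_per_person
print(f"{total_words:.0e} words")   # 1e+14 words at that rate
```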

ELAD:

What do you think are the main problems these models will solve going forward? Is it hallucination, memory, or something else?

NOAM:

I have no idea. I kind of like hallucinations.

SARAH:

This is also a feature.

NOAM:

The thing we most want to do is memory, because our users definitely want their virtual friends to remember them. You can do a lot with personalization; you want to take in a lot of data and use it effectively. There's also a lot of work going into figuring out what's real and what's hallucinated. I think we'll fix that, of course.

Character.ai's entrepreneurial story

ELAD:

Tell me a little bit about LaMDA and your role in it. How did Character come about?

NOAM:

My co-founder, Daniel Freitas, is the hardest-working, most driven, smartest person I've ever met. He has been working on building chatbots his whole life; he's been trying to build them since he was a kid. Then he joined Google Brain, read some papers, and decided that neural language modeling was a technique that could really generalize into a truly open-ended chatbot.

He didn't get much support for it, so he pursued it as a side project in his 20% time.

Then he recruited an army of 20%-time helpers who helped him build the system.

He even went around grabbing other people's TPU quota. He called his project Meena because he liked the name; it came to him in a dream, I guess. At some point I looked at the leaderboard and thought, what is this thing called Meena, and why does it have 30 TPU points?

ELAD:

Right, LaMDA. I know it was an internal chatbot Google built before ChatGPT. It made headlines because an engineer believed it was sentient.

NOAM:

Yeah, we moved it onto some bigger language models, and then there was a buzz inside the company, and Meena was renamed LaMDA. By then we had left, and there were people who believed it was sentient.

SARAH:

Why wasn't it released, and what were the concerns?

NOAM:

For a large company, launching a product that might say anything is a bit dangerous. I guess it's just a question of risk. So, after much deliberation, starting a company seemed like the right idea.

SARAH:

What is Character's origin story like?

NOAM:

We just wanted to build something and get it to market as quickly as possible. I put together a scrappy team of engineers and researchers, got some computing power, and started the company.

ELAD:

How did you recruit?

NOAM:

Some people we had met at Google, and then we happened to be introduced to Myat, who came from Meta and had built a lot of their large language model work and their neural language model infrastructure. Some other people from Meta followed him. They are all great.

ELAD:

Do you have specific requirements or testing methods when looking for talent, or is it just a regular interview?

NOAM:

I think it largely comes down to motivation. Daniel is very focused on motivation; he's looking for something between a strong desire and a childhood dream. There are a lot of good people we didn't hire because they didn't reach that level, but we also hired a lot of people who are perfect for a start-up: very talented and very driven.

Siri and Alexa are already on the market; don't compete head-on with big companies on functionality

SARAH:

Speaking of childhood dreams, would you describe the product? You have these bots; they can be created by users or by Character, and they can be public figures, historical figures, or fictional characters. How did you come up with this format?

NOAM:

Users often know better than you what they want to do with this thing. **Siri, Alexa, and Google Assistant are already on the market; there is no need to compete with those big companies on functionality.**

If you try to present a public persona that everyone loves, you end up with nothing but boredom. And people don't like being bored, they want to interact with things that feel like people.

So basically you need to offer multiple characters and let people invent characters as they please. There's also something I like about the name Character: it has several different meanings, a unit of text, a personality, a role.

SARAH:

So, what do people want? A friend? Help writing a novel? Something completely new?

NOAM:

Some users chat with virtual public figures and influencers on our product, and users can create a character and talk to it. Some users may feel lonely and just need someone to talk to; many have no one to talk to. Some say this character is now their new counselor.

SARAH:

There are two ways to think about the emotional side, right? How important is the relationship people form with the characters, and what level are we at in terms of expressing coherent emotion?

NOAM:

Yes. I mean, you probably don't need that high a level of intelligence for emotional support. Emotions are great and super important, but a dog can also do a great job of emotional support, and dogs provide it with very little verbal ability.

ELAD:

What do you think happens to the system when you scale up?

NOAM:

I think we should be able to make it smarter in various ways. Getting more computing power, training a bigger model, and training for longer should make it smarter, more knowledgeable, and better at what people want and are looking for.

SARAH:

You have some users who use Character for many hours a day. Who is your target audience, and what usage pattern do you expect?

NOAM:

We're going to leave that up to the user to decide. Our goal has always been to get stuff out there and let users decide what they think it's good for.

What we see is that among the people who sent a message on the Character website today, the average active time is two hours. That's crazy, but it's significant; it says people are finding some kind of value.

And as I said, it's really hard to say exactly what that value is, because it's a big mix of things. But our goal is to make this thing more useful, let people customize it, and let them decide what they want to do with it. Let's get it into users' hands and see what happens.

Burning money for scale; the to-C business is the first priority

SARAH:

How do you think about commercialization?

NOAM:

**We lose money per user and make it up with volume.**

SARAH:

Good. That's a great strategy.

NOAM:

No, I'm kidding.

ELAD:

A classic 1990s business model, so that's fine.

SARAH:

This is also a business model for 2022.

ELAD:

You should issue a token and turn it into a cryptocurrency thing.

NOAM:

**We will monetize at some point soon. This is a business that benefits from a lot of computing power, and instead of burning investors' money, we hope to provide value to enough users and make money along the way. We may try something like a premium subscription later, and as we develop new features, later paid tiers may go up in price.**

ELAD:

I mean, Character as a to-C service has really taken off in a dramatic way. If you look at the number of users and the usage time per user, it's crazy. Will you start a to-B business in the future, like customer service bots?

NOAM:

Right now we have 22 employees, so we need to prioritize, and we are hiring. The first priority is to-C.

SARAH:

So you said one of the key reasons LaMDA wasn't launched right away was safety. How do you two think about that?

NOAM:

There were other reasons too. For example, Google doesn't want people to hurt themselves or hurt other people, and it needs to block pornography. There has been some controversy around all of this.

ELAD:

Do you think all this is the path to AGI or superintelligence? For some companies, this appears to be part of the goal, and for others, it does not appear to be an explicit goal.

NOAM:

Yes, AGI is the goal of many AI startups. **The real reason for me is that I want to push technology forward. There are so many technical problems in the world that could be solved, such as intractable diseases, and we can come up with technical solutions.**

That's why I've been working on artificial intelligence: **rather than studying medicine directly, it's better to study artificial intelligence, which can then be used to speed up all the other research. So basically that's why I'm working so hard on AI, and why I wanted to start a company that is both AGI-first and product-first.**

The product depends entirely on the quality of the AI. The biggest determinant of our product's quality is how smart the model is, so right now we are fully motivated to make the AI better and make the product better.

ELAD:

Yeah, it's a really nice feedback loop, because to your point, when you make the product better, more people interact with it, which helps make the product better still. It's a very clever approach. How far do you think we are from artificial intelligence that is as smart as or smarter than humans? Obviously it's already smarter than humans in some ways, but I'm curious about the general case.

NOAM:

We are always amazed by the ways in which artificial intelligence can outperform humans. Some AI can now do your homework for you. I wish I had something like this when I was a kid.

ELAD:

What advice would you give to those who have a background similar to yours? Like what did you learn as a founder that you didn't necessarily learn when you were working at Google or elsewhere?

NOAM:

Good question. Basically, you learn from your horrible mistakes, although I don't think we've made any very, very bad ones, or at least we've recovered from them.

SARAH:

What kind of talent are you looking for?

NOAM:

So far? 21 of the 22 are engineers, and we will hire more engineers, whether in deep learning or front-end and back-end. We also definitely need to hire more people on the business and product side.

ELAD:

Last two or three quick questions. Who is your favorite mathematician or computer scientist?

NOAM:

I worked with Jeff Dean (head of Google Brain) a lot at Google. He's really nice and fun to work with. I think he's working on their large language models right now. That's one regret about leaving Google, and I hope to work with him again in the future.

ELAD:

Do you think mathematics was invented or discovered?

NOAM:

I think maybe it's discovered. Maybe everything is already out there, and we're just discovering it.
