Dialogue with MyShell Founder: To Build a Super Dream Factory for Robots
Interview: Afra, Zohar, AI Vanguard
Editor: Afra, ChatGPT, AI Vanguard
The Starting Point of MyShell's Explosion
"After the earliest Demo bot went live, within the third or fourth week, our user community exceeded 8,000 people, and at that time, we saw a total of 30,000 users in the backend."
"There were 8,000 people in the group chatting and discussing issues every day. Throughout March and April, we relied on the community to contribute code and develop different modules, which built us up. During this time, some users even wanted to invest money in us." "As of the date of publication, the total number of users has surpassed 100,000."
MyShell's Growth is Very "Organic," More Like Evolution than Design
"During that time, GPT and many large language models emerged, and we felt that its text capabilities were impressive; however, we wondered if we could add a particularly engaging voice to make it not just a chat tool, but also help users learn new languages. Coincidentally, at that time, there was a need for practicing spoken English, so we spent a day building a bot. After we finished, we were amazed, and Rick was very happy. Talking to Samantha (Note: MyShell's earliest bot, which used Scarlett Johansson's voice) to practice English alleviated the awkwardness of speaking English with people."
"Then we posted about it on social media, and unexpectedly, through that post, the Telegram group grew from a few dozen people to two or three hundred, then suddenly to 1,000, and quickly surged to 8,000."
"A very human-like bot that can have direct voice conversations with you; just press the voice button to speak, and the bot will respond to you with voice."
MyShell is a No-Code Bot Creation Platform
AI Vanguard: First, we would like you to introduce your MyShell product, talk about your current achievements, and your future development plans.
Rick, MyShell Founder: Our goal is to create a no-code platform that lets university students who have never learned programming easily create the bots they want. Our bot workshop feature officially launched recently. Since we opened the workshop, users have been creating enthusiastically: there are nearly 60 public user-created bots, plus over 100 private ones. By comparison, we created only 5 bots in the previous two months.
We have various types of bots on our platform, such as language learning, education, and pure tool types. We hope users can combine their favorite bots based on their interests. Currently, we have integrated voice generation capabilities and plan to add image modules in the future. We want to make bots more human-like and combinable to meet the needs of various niche markets.
Outstanding Bot Showcase
YUKI - IELTS Teacher Ben
- [IELTS Teacher Ben] helps you with one-on-one simulation practice and speaking correction.
- Usage example:
- https://app.myshell.ai/share/c177f1ca50d248b6a31bde4f3f64485c
Kaiserwetter - MBTI Simulation
- Chat with any MBTI personality in any role.
- Usage example:
- https://app.myshell.ai/share/07bfd887a5414ff7bae3d0be985ddae8
We categorize the roles on the platform into model providers, bot creators, and users. We hope to establish a healthy and sustainable economic model that fosters organic collaboration among them. Users can choose their favorite bots, bot authors can select quality models, and model authors can obtain the application scenarios and high-quality data they need on the platform.
AI Vanguard: I understand that you are using a LangChain-like architecture, hoping to let more people participate in building this ecosystem through a no-code approach.
Ethan, MyShell Founder: Yes, that is our goal. We hope users can create the AI they want with lower barriers and higher efficiency, without needing to understand any code. LangChain primarily integrates the text modality so that developers can build text input and output faster; we believe multimodality is crucial. Therefore, we have developed and integrated the voice modality and plan to add image understanding and generation in the near future. This makes our platform both simpler and more diverse, which we consider a very important aspect of multimodal integration.
Large Language Models Should Serve as Super Glue, Connecting Other Modalities and Services
AI Vanguard: Regarding the issue of personalization, I think it can be discussed further. Because, in fact, personalization is not just about the appearance and voice of the robot; more importantly, it is about its communication ability and the services it provides. How do you view this issue?
Rick, MyShell Founder: We divide bots into two layers: the surface layer, which is the communication interface that interacts with users; and the underlying capability layer, which is what the bot can do. We believe that large language models should serve as super glue, connecting other modalities and services. We compare this type of bot to a traffic dispatcher, which can distribute user commands to other modules that are better suited to handle those issues.
We believe the capability layer should be largely homogeneous across bots, especially for basic functions such as ordering takeout or solving math problems. What matters is differentiation in the surface layer, the communication interface. The bot's interface needs to be very human-like: it should establish good emotional communication with users, better understand their intentions, and mobilize different small models to work together in the background.
Ethan, MyShell Founder: The large model can understand user intentions based on their usage habits, but there should be many small models working together in the background. In the front, there will be a large language model or a dedicated model that is most familiar with the user to manage the dispatch of different capability modules behind the scenes. For example, asking about the weather, solving translation issues, or other functional problems actually requires the bot to have a strong understanding of the user's habits and intentions.
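The "traffic dispatcher" idea described above can be illustrated with a minimal Python sketch. This is not MyShell's implementation: the module names, the `classify_intent` function, and its keyword rules are all hypothetical, with the keyword rules standing in for the front language model that would recognize the user's intent.

```python
from typing import Callable, Dict


# Hypothetical capability modules. In a real system each would wrap a
# specialized small model or an external service.
def weather_module(text: str) -> str:
    return "weather: forecast lookup would run here"


def translation_module(text: str) -> str:
    return "translation: translation model would run here"


def chat_module(text: str) -> str:
    return "chat: the front model answers directly"


MODULES: Dict[str, Callable[[str], str]] = {
    "weather": weather_module,
    "translation": translation_module,
    "chat": chat_module,
}


def classify_intent(text: str) -> str:
    """Stand-in for the front language model.

    Keyword rules are an assumption made for illustration; the interview
    describes a large or dedicated model performing this step.
    """
    lowered = text.lower()
    if "weather" in lowered:
        return "weather"
    if "translate" in lowered:
        return "translation"
    return "chat"


def dispatch(text: str) -> str:
    """Route a user message to the module best suited to handle it."""
    return MODULES[classify_intent(text)](text)


if __name__ == "__main__":
    print(dispatch("What's the weather like tomorrow?"))
    print(dispatch("Please translate this sentence into French."))
    print(dispatch("I had a long day."))
```

Keeping the modules behind a plain dictionary means a new capability can be added without touching the dispatcher itself; only the intent step needs to learn about it.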
Rick, MyShell Founder: Let me give a concrete example using a familiar scenario. When we are in a work meeting and run into a specialized problem, we usually bring in another person and say, "Can you take a look at these issues and give some suggestions?" Similarly, if you are chatting with our bot Samantha and say, "We are going out for dinner tonight, about 12 people, do you have any suggestions?" Samantha might pull a chef bot into the chat and let it make the arrangements. These bots share a common trait: each bot knows that the others exist and what unique capabilities they provide, and will dispatch another bot when needed to provide the corresponding service.
Secondly, regarding multimodal capabilities, the bot will support different types of models and services, and it can choose how to respond at its layer. For example, if I ask Samantha for some home decoration suggestions, if this is done through text modality, communication may be cumbersome for both parties. However, if we can invoke image modality, the issue can be resolved in seconds. Knowing when to invoke which modality is a key part of what we consider personalized UI.
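The bot-to-bot hand-off Rick describes, where each bot knows which capabilities the others offer, can be sketched roughly as follows. The `Bot` and `BotRegistry` classes, the name `ChefBot`, and the capability labels are hypothetical and chosen only for illustration; the interview does not describe how MyShell implements this.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Bot:
    name: str
    capabilities: FrozenSet[str]


class BotRegistry:
    """Shared directory so every bot can see what the others offer."""

    def __init__(self) -> None:
        self._bots: Dict[str, Bot] = {}

    def register(self, bot: Bot) -> None:
        self._bots[bot.name] = bot

    def find(self, capability: str, exclude: str) -> Optional[Bot]:
        """Return the first registered bot, other than `exclude`, that has the capability."""
        for bot in self._bots.values():
            if bot.name != exclude and capability in bot.capabilities:
                return bot
        return None


def handle(registry: BotRegistry, host: Bot, needed: str) -> str:
    """Answer directly if the host bot can; otherwise pull in another bot."""
    if needed in host.capabilities:
        return f"{host.name} handles '{needed}' directly"
    helper = registry.find(needed, exclude=host.name)
    if helper is None:
        return f"{host.name} has no bot to delegate '{needed}' to"
    return f"{host.name} invites {helper.name} to handle '{needed}'"


if __name__ == "__main__":
    registry = BotRegistry()
    samantha = Bot("Samantha", frozenset({"conversation"}))
    chef = Bot("ChefBot", frozenset({"meal_planning"}))  # hypothetical bot name
    registry.register(samantha)
    registry.register(chef)
    print(handle(registry, samantha, "meal_planning"))
```

In this sketch the decision about which capability is needed is passed in as a string; in the scenario from the interview, the conversational bot would infer it from the user's message.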
The Future Large Models Will Become More Powerful, but Will Only Be in the Hands of a Few Leading Companies
AI Vanguard: What impact has the emergence of large models had on the industry? What is the future direction of large models?
Ethan, MyShell Founder: First of all, the emergence of large models, such as the GPT series, has posed a significant challenge to various NLP algorithms over the past decade. In the past, we used different algorithms to solve various independent problems, such as specialized translation or error correction algorithms. But now, a super-large model can achieve capabilities that previously required multiple models. This has rendered many specialized algorithms ineffective, as the new model's performance on specialized issues has surpassed that of traditional specialized models.
Secondly, large models like GPT-3 have more than 100 billion parameters, which makes them very difficult and costly for startups to train on consumer-grade hardware or small-scale proprietary hardware. However, we have also seen solutions like LoRA (Low-Rank Adaptation of Large Language Models), which trains only a small number of additional parameters on top of a pre-trained super-large model to adapt it to new data and scenarios, reducing costs.
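To give a sense of the savings, here is a small Python calculation. In LoRA, a pre-trained weight matrix of size d × k stays frozen, and only two small matrices, B (d × r) and A (r × k), are trained, with the rank r much smaller than d and k. The specific sizes below (d = k = 4096, r = 8) are illustrative assumptions, not figures from the interview.

```python
def full_finetune_params(d: int, k: int) -> int:
    """Trainable parameters when the whole d x k weight matrix is updated."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for a rank-r update: B is d x r, A is r x k."""
    return r * (d + k)


if __name__ == "__main__":
    d, k, r = 4096, 4096, 8  # illustrative sizes, assumed for this example
    full = full_finetune_params(d, k)
    lora = lora_params(d, k, r)
    print(f"full fine-tuning: {full:,} trainable parameters")
    print(f"LoRA (rank {r}): {lora:,} trainable parameters")
    print(f"fraction trained: {lora / full:.4%}")
```

With these assumed sizes, the low-rank update trains 65,536 parameters for that matrix instead of 16,777,216, or about 0.39% of the original count.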
At the same time, we believe that future large models will become increasingly powerful, but they will likely only be in the hands of a few leading companies. The vigorous development of the open-source community will lead to the adoption of solutions similar to LoRA, utilizing cutting-edge general models and proprietary data, resulting in countless small models and specialized models.
We believe that large language models will increasingly resemble a brain, connecting all APIs to link all algorithms and tools. It will dispatch external knowledge, coordinate external services, and obtain inputs from outside to complete complex tasks.
AI Vanguard: Currently, we can see that other models trying to catch up with GPT-4 either need special data or must exceed large models in specific fields through extensive proprietary data training. If GPT-5 emerges, what challenges do you predict it will pose to models currently trying to catch up with OpenAI?
Ethan, MyShell Founder: We believe that while GPT-5 may be very powerful, its costs will also be very high. Therefore, we think future models may diversify, with people choosing models based more on cost-effectiveness and demand. GPT-5 may be more likely to be used for producing high-quality data (in large volumes and standardized formats); although its usage cost is high, it is still cheaper than human labor. There are already similar cases, such as Stanford using GPT-generated data to train small models.
We also have a judgment: we see that Apple seems to have made no moves in the era of large language models, but Apple is a company with strong terminal capabilities and chip production capabilities. Therefore, it is very likely that some of Apple's dedicated chips for mobile devices can efficiently run local large language models, which can solve data privacy issues and optimize response times. I believe that in the future, Apple is likely to play a very interesting role in the AI wave, changing the current competitive landscape where everyone only uses OpenAI interfaces.
AI Startups That Want to Build Barriers Can Start from Algorithms and Data
AI Vanguard: From the perspective of entrepreneurs, what do you think are the biggest obstacles and challenges currently facing AI startups?
Ethan, MyShell Founder: I think there is a very dangerous situation: the underlying large model companies, such as OpenAI, are iterating their functions, which may actually consume many opportunities for traditional companies and even some emerging startups based on the GPT series. We now find it difficult to predict the capabilities of GPT-4 and GPT-5, as well as how they will evolve. Therefore, many infrastructure layers closely related to OpenAI may be replaced by features developed by OpenAI itself.
For example, Grammarly is currently facing such a situation. When choosing a startup direction and accumulating product technology, everyone needs to think about how to balance their relationship with these underlying giants, which is a very worthy consideration.
As for ourselves, our first judgment is that multimodality is a particularly important point. At present, we mainly invest our algorithm development and people in highly personalized, human-like voice synthesis. We believe OpenAI is unlikely to reach the voice modality and this overall direction within a year, and that is the technological and product advantage we hope to maintain. We also combine the latest text-modality products on the market with our own small models, fine-tuned from open-source algorithms and data, to create our products, so that we avoid placing all our energy and barriers too close to large language models.
Additionally, the open-source community is evolving faster and faster. Since the start of this year, its progress on large language models has been very rapid, with the best-performing open-source models already very close to the performance of GPT-3.5. In the past three months alone, we have seen the leak of Facebook's LLaMA pre-trained model, the work by academic groups such as Stanford and CMU on Alpaca and Vicuna, and MiniGPT, which can understand images. We believe that the energy of the open-source community is very important; it is a unique and significant force in competing with large companies in the era of GPT.
In this context, for MyShell, we need to think about how to build technological barriers to prevent the open-source community from erasing our competitive advantages. We need to build barriers in algorithms and proprietary data because, regardless of how external open-source algorithms iterate, we can always use the latest open-source algorithms and our proprietary data to create capabilities that are stronger than open-source or even general models. In addition to technological barriers, we also need to consider how to build multi-sided network capabilities through short-term technological advantages to solidify community and content barriers. For example, Douyin and Taobao are both multi-sided supply and consumption networks. If a platform already has a large number of active creators and users, newcomers will face non-technical competitive pressures and find it difficult to break through this blockade.
AI Vanguard: So, do you both already have specific ideas to face these two challenges?
Rick, MyShell Founder: I think we need to go with the flow. Open-source is becoming stronger, large models are becoming stronger, and the best entrepreneurial ideas should adapt to these changes. Ideally, as these open-source communities strengthen and large models become stronger, your entrepreneurial ideas will also become stronger. We need to find such ideas because anyone trying to challenge these two forces may face sudden failure this year.
Ethan, MyShell Founder: This year, everyone is experiencing FOMO regarding large language models, but we believe that multimodality is particularly important. Therefore, our barrier-building focuses on voice, because past voice synthesis technologies have been unsatisfactory in both cost and quality and have not reached large-scale application. This year, we have been able to synthesize any human voice at a cost two orders of magnitude lower than all existing APIs while achieving emotionally rich voice effects.
The second point is that on our platform, we particularly care about building a closed data loop as users use the product, accumulating high-quality datasets. For example, I released a bot called voice collector; we hope users can provide some voice or text data while using the product, so that our algorithms become more human-like and warm. This is proprietary data accumulated in specific scenarios on the platform, and we also hope to maintain a very harmonious cooperative relationship with the open-source community. Regardless of how open-source models iterate, our proprietary data in specific scenarios will always be a barrier for us. We provide creators with useful tools and powerful capabilities to attract more users, ultimately forming a barrier based on content and the creator ecosystem. Once this barrier is formed, we will no longer fear rapid changes in the underlying technology, because if our monetization efficiency and the platform's operational efficiency are the highest, we can always choose to access the best APIs or train the best open-source models on our proprietary data.
This Will Be an Era of New Technological Acceleration
AI Vanguard: Could you talk about your past entrepreneurial experiences and why you chose to start a business at this particular time? Why did you choose to approach it from the Web3 perspective?
Rick, MyShell Founder: We started our entrepreneurial journey in the AI field in 2013. Since then, apart from one or two regular jobs, we have spent most of our time building startups, so starting another company is a very natural choice for us.
In 2013, I founded a graphics and imaging company, mainly working on low-level AR SDKs. At that time, Apple had not yet launched ARKit, so we developed similar products. Later, I met Ethan, who was studying at Oxford University and joined my company as an intern when he returned to China. Ethan then founded a VR startup that mainly addressed filming and roaming issues in VR environments, which eventually became the VR viewing product for Beike.
Over the years, we have been researching AI algorithms and trying to commercialize them. We have accumulated a lot of experience, especially in the large-scale implementation of algorithms and stable output quality. Later, we jointly joined an AI unicorn company, where we were mainly responsible for the robotics department. This experience made us feel that making robots is a very interesting thing because robots are typical multimodal products that introduce another modality when one modality does not work. This idea is actually in line with our current approach to software robots and planted the seeds for our entrepreneurship.
We chose to start a business at this time because we saw the powerful capabilities of large models like GPT-4 and felt that this would be a new era of technological acceleration. Since seeing ChatGPT at the end of last year, we have been in a state of confusion and shock, caught in a mix of extreme excitement and fear. We found that even the most cutting-edge people in the industry were also surprised by OpenAI's rapid development.
We believe that natural language is a very important field. AI can now work directly with natural language, and once the seal of natural language is opened, the boundaries between humans and machines will be broken, leading us into a new technological acceleration where more modalities may be integrated and connected. This means that many things we did in the past may no longer be that important, and we feel both excited and fearful. In this technological acceleration, there is no choice but to reset ourselves: to reset all past understandings and judgments about entrepreneurial models and technologies, to rethink problems, and to start anew. Therefore, we decided to start a business in March this year and quickly launched the first demo.
Ethan, MyShell Founder: From my personal perspective, we chose this path because of Web3's economic model and the efficiency gains of multi-sided networks. We hope to use these as tools to help us build a multimodal bot creation platform. Additionally, the AI era brings new possibilities for solving content production efficiency issues. Whether it is image generation algorithms (like Stable Diffusion) or text generation models (like GPT), AI can help people without professional knowledge or programming skills solve productivity problems in specific scenarios, improving productivity by at least one to two orders of magnitude. In this case, how value is defined and distributed becomes particularly important. The multi-sided network platforms and cryptographic technology of Web3 can make it much more efficient for us to build a multi-sided creator platform, and they address new ownership and value distribution issues in the AI era. Through the multi-sided mechanisms of Web3, we can distribute economic benefits in a decentralized way through smart contracts and provide liquidity for the platform through token holding mechanisms. Although existing technologies are not yet mature, in areas such as data assets, model assets, and data privacy, cryptographic and blockchain technologies have the potential to support designs that resist big-company dominance and community economic systems with multiple roles. Therefore, we are building our model more from this perspective, as the traditional company form does not suit platforms like ours.
Pandora's Box Has Been Opened; The AI Arms Race Will Not Stop
AI Vanguard: Many industry leaders are beginning to worry about the development of AI, such as Geoffrey Hinton leaving Google and warning about AI's future. What are your thoughts?
Rick, MyShell Founder: I think we can see a problem here: many of the internet infrastructures we have built today, including various systems, may not be prepared to face today's new artificial intelligence. Many things may be vulnerable in the face of new large models, which is a safety issue. There is also a data issue; there is good data, such as teaching you IELTS or providing emotional companionship, but there is also bad data, which includes dirty information, misleading information, and internet trolls. When such situations arise, we can only "use magic to defeat magic," employing a larger defensive model to prevent it. These issues can be very troublesome for many small companies or individuals lacking security awareness.
Ethan, MyShell Founder: Yes, because this technology is created by humans. Once humans discover something particularly useful, various forces will begin to compete with one another. The current AI arms race between companies like Microsoft and Google is very much like the competition between the United States and the Soviet Union during the race to the Moon; neither side will concede or stop. This will therefore be driven by all kinds of human desires and keep evolving. As for what the future looks like, we can only watch how it unfolds.
Rick, MyShell Founder: I particularly understand why OpenAI's founder Sam Altman is also working on the Worldcoin project: we may face very serious data pollution in the future, so we need to ensure data ownership. Data must have responsible parties; it must be issued by someone who can be held legally accountable. You can lie, but we need to be able to prove that the data was issued by a specific person, so that this person can be held responsible for it.
Ethan, MyShell Founder: Worldcoin mainly aims to ensure that every person in the physical world has a unique identity ID in both the internet and blockchain worlds. If this can be solved, it may help address the data ownership issues that Rick mentioned earlier. Additionally, I believe that the Worldcoin project embodies Sam's thoughts on how to construct future human society.
The Most Important Thing in Entrepreneurship is to Have an Open Mind and Not Hold Too Much Inertia
AI Vanguard: As experienced entrepreneurs, what advice can you give to those who want to enter the AI field?
Rick, MyShell Founder: First of all, I think entrepreneurship is not the only way out. For many people who do not start businesses, following the new generation of AI dividends will present numerous opportunities. For example, many niche scenarios that previously lacked human resources may be well filled. The overall social production value will see a leap forward. Ordinary people can better plan their lives or invest funds in profitable areas.
However, for entrepreneurs, I think the most important thing is to have an open mind. Relying on past experience, or on the inertia of the internet industry over the past two to three decades, may lead many people to mistakenly think this is just another mobile internet opportunity. In reality, AI may open a new technological acceleration in entirely new ways. Therefore, do not hold on to too much inertia; having an open mind is essential for success in this field.
Ethan, MyShell Founder: I believe that in this wave of the AI era, many small models for specific scenarios will emerge, and the composability between algorithms and models will become stronger and more flexible. Therefore, a product may integrate technologies from different companies within the same modality to provide services to users. In this case, technological evolution will be rapid, and products will become increasingly flexible. Thus, entrepreneurs need to have keen observation skills and innovative thinking to cope with this rapidly changing era.