a16z Crypto's latest research: Four new business models under the integration of AI and blockchain

ChainCatcher Selection
2023-08-16 12:56:38
The current situation and opportunities under the trend of integration between blockchain and AI.

Original video: Web3 with a16z, AI & Crypto

Authors: Dan Boneh (Professor at Stanford University and Senior Research Advisor at a16z crypto), whose work focuses on cryptography, computer security, and machine learning; Ali Yahya (General Partner at a16z crypto), formerly at Google Brain and one of the core contributors to Google's machine learning library TensorFlow.

Compiled & Edited by: Qianwen, ChainCatcher

Neal Stephenson once wrote a science fiction novel called "The Diamond Age," in which an artificial intelligence device serves as a mentor to people throughout their lives. When you are born, you are paired with an AI that knows you very well: it understands your preferences, follows you throughout your life, helps you make decisions, and guides you in the right direction. This sounds great, but you would never want such technology to fall into the hands of an intermediary tech giant, because that would give the company enormous control and raise a series of privacy and sovereignty issues.

We want this technology to truly belong to us, and from that comes a vision of achieving it with blockchain. You can embed AI in smart contracts, and with the power of zero-knowledge proofs you can keep your data private. Over the next few decades, this technology will become increasingly intelligent, and you will be able to use it however you want or change it in any way you wish.

So what is the relationship between blockchain and AI? What kind of world will AI lead us to? What are the current state and challenges of AI? What role will blockchain play in this process?

AI and Blockchain: A Balancing Act

The development of artificial intelligence, including the vision described in "The Diamond Age," has been under way for a long time, but it has recently taken a leap forward.

First, AI is largely a top-down, centrally controlled technology. In contrast, crypto is a bottom-up technology for decentralized collaboration. In many ways, cryptocurrency is a study of how to build decentralized systems that enable large-scale human cooperation without a true central controller. From this perspective, it is natural for these two technologies to come together.

AI is a sustaining innovation that reinforces the business models of existing tech companies, helping them make top-down decisions. The best example is Google, which decides what content to present across billions of users and billions of page views. Cryptocurrency, on the other hand, is essentially a disruptive innovation, and its business model is fundamentally opposed to that of large tech companies. It is therefore a movement led by rebels at the fringes rather than by the powers that be.

As a result, AI is closely tied to privacy, and the two interact. The incentives around AI lead to less and less user privacy, because companies want to acquire all our data, and AI models trained on ever larger datasets become more effective. On the other hand, AI is not perfect; models may be biased, and bias can lead to unfair outcomes. That is why there are currently many papers on algorithmic fairness.

I believe we are heading down a path where everyone's data is aggregated into massive training runs to optimize these models. Cryptocurrency, however, is moving in the opposite direction, increasing personal privacy and giving users sovereignty over their data. It can be said that crypto is a technology that counters a certain type of AI, because it helps us distinguish content created by humans from content created by AI. In a world flooded with AI-generated content, cryptography will become an important tool for preserving and maintaining human content.

Cryptocurrency is like the Wild West because it is completely permissionless, and anyone can participate. You have to assume that some participants are malicious. There is therefore a greater need for tools that help you tell honest participants from dishonest ones, and machine learning and AI, as intelligent tools, can be very beneficial in this regard.

For example, there are projects that use machine learning to identify suspicious transactions submitted to wallets. This way, a user's transaction can be flagged before it is submitted to the blockchain. This can effectively prevent users from accidentally sending all their funds to attackers or doing something they will regret later. Machine learning can also help you predict which transactions might be exposed to MEV.

Just as LLMs can be used to detect fake data or malicious activity, they can also be used to generate fake data. A typical example is deepfakes: you can create a video of someone saying something they never said. However, blockchain can actually help mitigate this problem.

For instance, a timestamp on the blockchain shows that on this date, you said such and such. If someone forges a video, you can use the timestamp to refute it. All this data, the truly authentic data, is recorded on the blockchain and can be used to prove that the deepfake video is indeed fake. So I believe blockchain can help combat forgery.
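The timestamping idea can be sketched in a few lines of Python. This is a minimal illustration, not a real blockchain client: `ToyLedger` is a hypothetical in-memory stand-in for an append-only chain, and the timestamp value is made up. The point is only that a hash of the authentic recording, committed at a known time, lets anyone later check whether a given file was the one committed.

```python
import hashlib


class ToyLedger:
    """Hypothetical in-memory stand-in for an append-only blockchain."""

    def __init__(self):
        self._entries = []  # list of (sha256 hex digest, timestamp)

    def commit(self, content: bytes, timestamp: float) -> str:
        """Record the hash of `content` together with the time it was committed."""
        digest = hashlib.sha256(content).hexdigest()
        self._entries.append((digest, timestamp))
        return digest

    def first_seen(self, content: bytes):
        """Return the earliest commit time for `content`, or None if never committed."""
        digest = hashlib.sha256(content).hexdigest()
        times = [t for d, t in self._entries if d == digest]
        return min(times) if times else None


ledger = ToyLedger()
original = b"recording of what I actually said"
forged = b"recording of something I never said"

ledger.commit(original, timestamp=1_692_000_000.0)  # illustrative timestamp

print(ledger.first_seen(original))  # 1692000000.0
print(ledger.first_seen(forged))    # None
```

Only the hash is stored, so the recording itself can stay private until it is needed as evidence.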
We can also rely on trusted hardware to achieve this. Devices like cameras and phones can sign the images and videos they capture by default. There is a standard for this called C2PA, which specifies how cameras should sign data. In fact, there is now a Sony camera that can take photos and videos and then generate a C2PA signature for them. This is a complex topic, and we won't elaborate further here.

Typically, newspapers do not publish images exactly as the camera captured them. They crop the photos and apply a few other permitted edits. Once you start editing an image, what the recipient, the final reader, or the user in the browser sees is no longer the original image, and the C2PA signature can no longer be verified against it.

The question is how users can confirm that the image they see derives from one that was correctly signed by a C2PA camera. This is where ZK technology comes into play: you can prove that the edited image is actually a downsampled, grayscale version of a correctly signed image. The publisher attaches a short ZK proof to the edited image in place of the C2PA signature, and readers can still confirm that what they see is a real image. In this way, ZK technology can be used to combat disinformation.
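To make the claim being proven concrete, here is a small Python sketch of the statement itself: "the published image equals grayscale-then-downsample of an original whose digest the camera signed." Everything in it is an assumption for illustration. Images are nested lists of RGB tuples, a SHA-256 digest stands in for the signed C2PA data, and the check runs in the clear. A real ZK proof would establish the same relation without revealing the original image.

```python
import hashlib


def to_grayscale(image):
    """Convert rows of (r, g, b) tuples to rows of integer gray levels."""
    return [[(r + g + b) // 3 for (r, g, b) in row] for row in image]


def downsample(gray, factor=2):
    """Keep every `factor`-th pixel in each direction."""
    return [row[::factor] for row in gray[::factor]]


def digest(image) -> str:
    """Stand-in for the data a camera would sign (assumption, not C2PA itself)."""
    return hashlib.sha256(repr(image).encode("utf-8")).hexdigest()


def statement_holds(original, signed_digest, published) -> bool:
    """The relation a ZK proof would attest to, checked here in the clear."""
    return (digest(original) == signed_digest
            and downsample(to_grayscale(original)) == published)


original = [[(10, 20, 30), (40, 50, 60)],
            [(70, 80, 90), (100, 110, 120)]]
signed = digest(original)

published = downsample(to_grayscale(original))  # [[20]]
tampered = [[21]]

print(statement_holds(original, signed, published))  # True
print(statement_holds(original, signed, tampered))   # False
```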

How Can Blockchain Break the Deadlock?

AI is essentially a centralized technology. It largely benefits from economies of scale, as relying on a single data center makes things more efficient. Moreover, data, machine learning models, and machine learning talent are typically controlled by a few tech companies.

So how can we break the deadlock? Cryptocurrency can help us decentralize AI through technologies like ZKML, which can be applied to compute, to data, and to the machine learning models themselves. For instance, in terms of computation, zero-knowledge proofs let a party prove that a model's inference or training process was carried out correctly.

This way, you can outsource this process to a large community. In this distributed process, anyone with a GPU can contribute computing power to the network and train models in this way, without relying on a large data center that centralizes all GPUs.

Whether this makes sense from an economic perspective is uncertain. But at least with the right incentives, a long-tail effect can be achieved: you can tap every bit of GPU capacity that exists. Allowing all these people to contribute computing power for model training or inference can provide an alternative to the large tech companies that control everything. To achieve this, various important technical issues must be resolved. In fact, there is a company called Gensyn that is building a decentralized GPU computing market primarily for training machine learning models. In this market, anyone can contribute their GPU computing power, and anyone can use the computing power available in the network to train their large machine learning models. This will become an alternative to centralized big tech companies like OpenAI, Google, and Meta.

Imagine a scenario where Alice has a model she wants to protect. She sends the model to Bob in encrypted form, and Bob needs to run his data through this encrypted model. How can this be done? By using what is known as homomorphic encryption, which allows computation on encrypted data. With the encrypted model and his plaintext data, Bob can run the encrypted model on the data and obtain an encrypted result. He sends the encrypted result back to Alice, and she can decrypt it to see the plaintext result.
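A toy version of this exchange can be written with the Paillier cryptosystem, which is additively homomorphic. This is a sketch under strong assumptions: the "model" is just a vector of small non-negative integer weights, the primes are far too small to be secure, and Paillier supports only linear operations, so it is not the fully homomorphic encryption a real neural network would need. The function names are illustrative.

```python
import math
import random


def keygen(p: int = 1789, q: int = 1861):
    """Toy Paillier key pair; these primes are far too small for real use."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid shortcut because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt a non-negative integer m < n under public key n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, priv, c: int) -> int:
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def encrypted_dot(n: int, enc_weights, inputs) -> int:
    """Bob's step: Enc(w)^x = Enc(w*x), and multiplying ciphertexts adds plaintexts."""
    n2 = n * n
    acc = 1  # a trivial encryption of 0
    for c, x in zip(enc_weights, inputs):
        acc = (acc * pow(c, x, n2)) % n2
    return acc


# Alice encrypts her model weights and sends them to Bob.
pub, priv = keygen()
enc_weights = [encrypt(pub, w) for w in [3, 5, 2]]

# Bob evaluates the encrypted model on his plaintext data.
enc_result = encrypted_dot(pub, enc_weights, [4, 1, 7])

# Alice decrypts the result: 3*4 + 5*1 + 2*7
print(decrypt(pub, priv, enc_result))  # 31
```

Bob never sees the weights, only ciphertexts, yet the value Alice decrypts is the correct dot product.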

This technology actually exists today. The problem is that it currently works well only for medium-sized models. Can we scale it up to larger models? This is quite a challenge, and it will take the efforts of more companies.

Current Status, Challenges, and Incentive Mechanisms

I believe we need to achieve decentralization in computation. The first problem is verification. You can use ZK to solve it, but currently these technologies can only handle smaller models. The challenge we face is that the performance of these cryptographic primitives is far from what is needed for training or running inference on ultra-large models. A lot of work is therefore under way to improve the performance of the proving process so that it can efficiently prove larger and larger workloads.

At the same time, some companies are using techniques that are not purely cryptographic. They adopt game-theoretic mechanisms that let many independent parties work together. This is a non-cryptographic, optimistic game-theoretic approach, but it still serves the larger goal of decentralizing AI and helping to create a more open AI ecosystem. This is the goal proposed by companies like OpenAI.

The second major issue is the distributed systems problem. For example, how do you coordinate a large community to contribute GPUs to a network so that it feels like an integrated, unified computing layer? There will be many challenges, such as how to reasonably decompose the machine learning workload and allocate different workloads to different nodes in the network, as well as how to efficiently complete all this work.

Current technology can basically be applied to medium-sized models but not to models as large as GPT-3 or GPT-4. Of course, we have other methods. For example, we can have multiple parties run the same training job and then compare the results, creating a game-theoretic mechanism that incentivizes people not to cheat. If someone cheats, others can dispute their reported training results as incorrect, and the cheater will not receive rewards.
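One simple way to picture this compare-and-reward mechanism is the following Python sketch. It assumes a strict honest majority among the workers and that results can be compared exactly (for example, as hashes of the trained weights). The `settle` function, the worker names, and the reward amount are all hypothetical.

```python
from collections import Counter


def settle(results: dict, reward: int):
    """Accept the majority result and pay only the workers who reported it.

    `results` maps worker name -> reported result (e.g. a hash of the output).
    Assumes a strict honest majority; otherwise the dispute must be escalated.
    """
    tally = Counter(results.values())
    accepted, votes = tally.most_common(1)[0]
    if votes <= len(results) // 2:
        raise ValueError("no strict majority; escalate the dispute")
    honest = [worker for worker, result in results.items() if result == accepted]
    share = reward // len(honest)
    payouts = {worker: (share if worker in honest else 0) for worker in results}
    return accepted, payouts


reports = {"alice": "0xabc", "bob": "0xabc", "mallory": "0xdef"}
accepted, payouts = settle(reports, reward=90)

print(accepted)  # 0xabc
print(payouts)   # {'alice': 45, 'bob': 45, 'mallory': 0}
```

Redundant computation costs more than running the job once, which is the price paid here for not needing a cryptographic proof.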
We can also decentralize the data sources used to train large machine learning models. In the same spirit, the community can collect the data and train the model itself, rather than having a centralized institution responsible for it. This can be achieved by creating a market, similar to the computing market we just described.

We can also view this from an incentive perspective, encouraging people to contribute new data to a large dataset, which can then be used to train models. The difficulty here is similar to the verification challenge. You must somehow verify that the data people contribute is indeed good data. This data should neither be duplicate data nor randomly generated garbage data, nor should it be inauthentic data generated in some way.
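The duplicate check, at least, is easy to sketch. The Python below fingerprints each submission after normalizing case and whitespace and accepts only the first copy. It is a minimal illustration with made-up data and hypothetical function names, and it deliberately does nothing about the harder problems named above, such as random garbage or inauthentic data.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash the text after normalizing case and whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def accept_new(submissions):
    """Keep the first copy of each distinct, non-empty submission."""
    seen = set()
    accepted = []
    for contributor, text in submissions:
        fp = fingerprint(text)
        if not text.strip() or fp in seen:
            continue
        seen.add(fp)
        accepted.append((contributor, text))
    return accepted


submissions = [
    ("alice", "Rainfall in the valley was 12 mm on Monday."),
    ("bob", "rainfall in  the valley was 12 mm on monday."),  # near-verbatim copy
    ("carol", "Rainfall in the valley was 3 mm on Tuesday."),
    ("dave", "   "),                                           # empty
]

accepted = accept_new(submissions)
print([contributor for contributor, _ in accepted])  # ['alice', 'carol']
```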

Moreover, it is essential to ensure that the data does not somehow poison the model; otherwise, the model's performance will only deteriorate. Perhaps we must rely on a combination of technical and social solutions. For example, community members could build up credibility through some kind of reputation metric, so that the data they contribute is more trustworthy than it would otherwise be.

Otherwise, achieving comprehensive data distribution will take a very long time. One significant challenge in machine learning is that models can only cover the distribution range that the training dataset can reach. If there are some inputs that far exceed the distribution range of the training data, your model may behave completely unpredictably. To ensure that the model performs well in edge cases, black swan data points, or data inputs that may be encountered in the real world, we need as comprehensive a dataset as possible.

Therefore, if you have such an open, decentralized market for providing data to datasets, you can allow anyone in the world with unique data to contribute this data to the network, which is a better approach. Because if you try to do this as a centralized company, you simply cannot know who owns this data. Therefore, if you can create an incentive mechanism that encourages these people to come forward and provide this data, then I believe you can achieve significantly better long-tail data coverage.

So we must have some mechanism to ensure that the data people provide is authentic. One way is to rely on trusted hardware: embed trusted hardware in the sensors themselves and trust only data that is correctly signed by that hardware. Otherwise, we must have other mechanisms to discern the authenticity of the data.

Currently, there are two significant trends in machine learning. First, the methods for measuring the performance of machine learning models are continuously improving but are still at an early stage; it remains hard to know how well a given model really performs. Second, we are becoming increasingly adept at explaining how models work.

Based on these two trends, at some point we may be able to understand the impact of datasets on machine learning model performance. If we can tell whether a dataset contributed by a third party improves a model's performance, then we can reward such contributions and create the incentives for that market to exist.

Imagine an open market where people contribute trained models that solve specific types of problems. Someone could create a smart contract with some tests embedded in it, and anyone could then use ZKML to supply a model along with a proof that the model passes those tests. You now have the tools needed to create a market that rewards people for contributing machine learning models that solve certain problems.
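The shape of such a contract can be simulated in plain Python. This sketch makes several assumptions: `ModelBounty` is a hypothetical class, not real smart-contract code; the test cases, threshold, and reward are invented; and the model is run in the clear. In a ZKML setting the contract would instead verify a proof that a committed model reaches the required accuracy, without running or seeing the model.

```python
class ModelBounty:
    """Hypothetical bounty: pays the first model that passes the embedded tests."""

    def __init__(self, test_cases, threshold: float, reward: int):
        self.test_cases = test_cases  # list of (input, expected output)
        self.threshold = threshold    # minimum fraction of tests to pass
        self.reward = reward
        self.winner = None

    def submit(self, contributor: str, model) -> int:
        """Return the payout for this submission (0 if it fails or is too late)."""
        if self.winner is not None:
            return 0
        correct = sum(1 for x, expected in self.test_cases if model(x) == expected)
        if correct / len(self.test_cases) >= self.threshold:
            self.winner = contributor
            return self.reward
        return 0


# Illustrative task: predict whether a number is odd (1) or even (0).
bounty = ModelBounty([(1, 1), (2, 0), (3, 1), (4, 0)], threshold=0.75, reward=100)

weak_payout = bounty.submit("bob", lambda x: 0)        # 50% accuracy, rejected
good_payout = bounty.submit("alice", lambda x: x % 2)  # 100% accuracy, paid
late_payout = bounty.submit("carol", lambda x: x % 2)  # bounty already claimed

print(weak_payout, good_payout, late_payout)  # 0 100 0
print(bounty.winner)                          # alice
```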

How Do AI and Crypto Form Business Models?

I believe the vision behind the intersection of cryptocurrency and artificial intelligence is that you can create a set of protocols that distribute the value derived from this new technology of AI to more people, allowing everyone to contribute and share in the benefits brought by this new technology.

Thus, those who can profit will be those who contribute computing power, those who contribute data, or those who contribute new machine learning models to the network, thereby training better machine learning models to solve more significant problems.

The demand side of the network can also profit. These participants use the network as the infrastructure for training their machine learning models. Perhaps their models will power something interesting, such as the next generation of chat tools. Since these companies have their own business models, they can capture value themselves.

The people who establish this network will also profit. For example, they can create a token for the network that is distributed to the community. All these people will collectively own this decentralized network for compute, data, and models, and they can also capture some of the value from all the economic activity conducted through it.

You can imagine that every transaction conducted through this network, every payment for computing fees, data fees, or model fees, could incur a certain fee that goes into a treasury controlled by the entire network. Token holders collectively own this network. This is essentially the business model of the network itself.
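The arithmetic behind such a fee model is simple enough to sketch in Python. The 2.5% fee rate, the payment amounts, and the token balances below are invented purely for illustration, and the function names are hypothetical.

```python
def route_payment(amount: int, fee_bps: int):
    """Split a payment into (amount to provider, fee to treasury).

    `fee_bps` is the protocol fee in basis points (1 bps = 0.01%).
    """
    fee = amount * fee_bps // 10_000
    return amount - fee, fee


def pro_rata_claims(treasury: int, balances: dict):
    """Each holder's share of the treasury, proportional to tokens held."""
    supply = sum(balances.values())
    return {holder: treasury * tokens // supply for holder, tokens in balances.items()}


FEE_BPS = 250  # hypothetical 2.5% protocol fee

treasury = 0
for payment in (1000, 2000, 400):  # compute, data, and model fees
    _to_provider, fee = route_payment(payment, FEE_BPS)
    treasury += fee

print(treasury)  # 85
print(pro_rata_claims(treasury, {"holder_a": 600, "holder_b": 400}))
# {'holder_a': 51, 'holder_b': 34}
```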

AI Enhancing Code Security

Many listeners may have heard of Copilot, a tool used to generate code. You can try using such code-generation tools to write Solidity contracts or cryptographic code. I want to emphasize that this is actually very dangerous, because these systems often generate code that runs but is not secure.

In fact, we recently wrote a paper on this issue. It showed that if you ask Copilot to write a simple encryption function, the encryption it produces is functionally correct but uses the wrong mode of operation, so you end up with an insecure encryption scheme.

You might ask, why does this happen? One reason is that these models are fundamentally trained on existing code, such as GitHub repositories, and many GitHub repositories are vulnerable to various attacks. The code these models learn to write may therefore work correctly but not be secure. It is a case of garbage in, garbage out. So I hope people are very cautious when using these generative models to produce code, carefully checking whether the code actually does what it is supposed to do and does it safely.

You can combine AI models with other tools when generating code to ensure that the process does not go wrong. For example, one idea is to have an LLM generate a specification for a formal verification tool. Then you ask the same LLM instance to generate a program that meets the specification, and use the formal verification tool to check whether the program indeed meets it. If there are errors, the tool will catch them. These errors can be fed back to the LLM, and ideally, the LLM can revise its work and generate a corrected version of the code.
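The generate, verify, and feed-back loop can be sketched as follows. Nothing here calls a real LLM or a real formal verification tool: `stub_llm` is a hypothetical stand-in that returns a buggy absolute-value function first and a fixed one after receiving feedback, and `check_abs_spec` is an exhaustive check over a small range standing in for a verifier. Only the control flow is the point.

```python
def generate_until_verified(generate, verify, max_rounds: int = 5):
    """Ask for a candidate, check it, and feed errors back until it verifies.

    `generate(feedback)` returns a candidate program; `verify(candidate)`
    returns a list of error messages (empty means the spec is met).
    """
    feedback = None
    for round_no in range(1, max_rounds + 1):
        candidate = generate(feedback)
        errors = verify(candidate)
        if not errors:
            return candidate, round_no
        feedback = errors
    return None, max_rounds


def stub_llm(feedback):
    """Hypothetical stand-in for an LLM: buggy at first, fixed after feedback."""
    if feedback is None:
        return lambda x: x                  # bug: negative inputs stay negative
    return lambda x: -x if x < 0 else x


def check_abs_spec(f):
    """Stand-in for a verifier. Spec: f(x) >= 0 and f(x) is x or -x."""
    errors = [f"f({x}) = {f(x)} violates the spec"
              for x in range(-50, 51)
              if not (f(x) >= 0 and f(x) in (x, -x))]
    return errors[:3]  # report only the first few counterexamples


program, rounds = generate_until_verified(stub_llm, check_abs_spec)
print(rounds)       # 2
print(program(-7))  # 7
```

A real verifier proves the property for all inputs rather than testing a finite range, which is what makes the final program trustworthy.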

In the end, if you repeat this process, you will ideally end up with a piece of code that has been formally verified to meet the specification. And since humans can read the specification, you can check that it describes the program you wanted to write. In fact, many people are already evaluating how well LLMs can find software vulnerabilities, for example when auditing smart contracts or C and C++ code.

So, will we reach a point where code generated by LLMs is less likely to contain bugs than code generated by humans? For example, when we discuss autonomous driving, we care about whether it is less likely to crash than human drivers. I believe this trend will only become stronger, and the degree to which AI technology is integrated into existing toolchains will also increase.

You can integrate it into formal verification toolchains, and you can also integrate it into other tools, such as those that check for memory management issues. You can also integrate it into unit testing and integration testing toolchains, so that LLMs do not operate in a vacuum. They can receive real-time feedback from other tools that connect them to ground truth.

I believe that combining ultra-large machine learning models trained on all the data in the world with these other tools may produce programmers that are better than human programmers. Even if they still make mistakes, they might simply be superhuman. This will be a significant moment in software engineering.

AI and Social Graphs

Another possibility is that we might be able to build decentralized social networks that behave very much like Weibo, but with a social graph that is entirely on-chain. It is almost like a public good that anyone can build on. As a user, you control your identity on the social graph. You control your data, who you follow, and who can follow you. On top of this, a whole host of companies can build portals onto the social graph, providing users with experiences similar to Twitter, Instagram, TikTok, or anything else they want to create.

But all of this is built on the same social graph, which no one owns and which no billion-dollar tech company sits in the middle of and fully controls.

This is an exciting world because it means it can be more vibrant, with an ecosystem built by people together. Every user can have more control over what they see and do on the platform.

But at the same time, users also need to separate signal from noise. For example, good recommendation algorithms need to be developed to filter all the content and show you the feeds you genuinely want to see. This opens the door to an entire market, a competitive environment of participants providing such services. You can use AI-based algorithms to curate content for you. As a user, you can decide whether to use a specific algorithm, perhaps the one built by Twitter, or another one. Either way, you will need tools like machine learning to help you filter out the noise and sort through all the junk in a world where generative models can produce junk of every kind.

Why Is Proof of Humanity Important?

A very relevant question is, in a world flooded with AI-generated content, how do you prove that you are indeed human?

Biometric technology is one possible direction. One project, Worldcoin, uses iris scans as biometric information to verify that you are a real person, ensuring that you are indeed alive and not just a photo of an eye. The system relies on secure hardware that is difficult to tamper with, so the proof that emerges on the other end, a zero-knowledge proof that conceals your actual biometric information, is hard to forge.

On the internet, no one knows you are a robot. This is why I think proof-of-humanity projects become so important: knowing whether you are interacting with a robot or a human will matter a great deal. Without proof of humanity, you cannot determine whether an address belongs to one person or a group of people, or whether ten thousand addresses belong to ten thousand different people or to one person pretending to be ten thousand.

This is crucial in governance. If every participant in a governance system can prove that they are indeed human, and can do so uniquely because they have only one set of eyeballs, then the governance system will be fairer and less oligarchic than one in which influence depends on who locks the largest amount of money into a smart contract.
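The difference between the two governance styles shows up clearly in a small Python tally. This is an illustrative sketch with invented addresses and balances. The `person_of` mapping is a hypothetical stand-in for whatever a proof-of-humanity system would provide, namely a link from each address to one verified, unique person.

```python
from collections import Counter


def token_weighted_tally(votes):
    """Each vote counts in proportion to the tokens behind it."""
    tally = Counter()
    for _address, choice, tokens in votes:
        tally[choice] += tokens
    return tally


def one_person_one_vote_tally(votes, person_of):
    """Count at most one vote per verified human, however many addresses they use."""
    tally = Counter()
    counted = set()
    for address, choice, _tokens in votes:
        person = person_of.get(address)
        if person is None or person in counted:
            continue  # unverified address, or this person already voted
        counted.add(person)
        tally[choice] += 1
    return tally


votes = [
    ("0xW1", "B", 1000), ("0xW2", "B", 1000), ("0xW3", "B", 1000),  # one whale
    ("0xA", "A", 10), ("0xB", "A", 10), ("0xC", "A", 10), ("0xD", "A", 10),
]
person_of = {
    "0xW1": "whale", "0xW2": "whale", "0xW3": "whale",
    "0xA": "p1", "0xB": "p2", "0xC": "p3", "0xD": "p4",
}

print(dict(token_weighted_tally(votes)))                  # {'B': 3000, 'A': 40}
print(dict(one_person_one_vote_tally(votes, person_of)))  # {'B': 1, 'A': 4}
```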

AI and Art

AI models mean we will live in a world rich in media, where communities around any specific media or narratives around specific media will become increasingly important.

For example, Sound.xyz is building a decentralized music streaming platform that allows artists and musicians to upload music and then connect directly with their community by selling NFTs. For instance, people can comment on tracks on the Sound.xyz website, and others who play the song can see the comments, similar to the earlier SoundCloud feature. Purchasing NFTs also supports artists, helping them make a sustainable living and create more music. But the beauty of all this is that it gives artists a real platform for interacting with their community. Artists are everyone's artists.

With the role of cryptocurrency here, you can create a community around a piece of music, and if a piece of music is created solely by a machine learning model without any human element, then this community would not exist.

Much of the music we encounter will be entirely generated by AI, so the tools for building communities and telling stories around art, music, and other types of media will be very important. They will distinguish the media we genuinely care about, want to invest in, and spend time engaging with from generic media.

There may be some synergies between the two, since much music will be enhanced or generated by AI. But there can still be a human element involved: a creator who uses AI tools to make a new piece of music still has their unique voice, their artist page, their community, and their followers.

A synergy then emerges between these two worlds. Everyone has the best music because AI has empowered everyone, but everyone also has the human elements and stories, which are coordinated and realized through crypto, bringing all these people together on one platform.

In terms of content generation, this is undoubtedly a brand new world. So how do we distinguish between human-generated art that needs support and machine-generated art?

This actually opens a door for collective art, generated through the creative process of an entire community rather than a single artist. Some projects are already doing this: the community votes on-chain to shape the prompts, and a machine learning model generates artworks from those prompts. Perhaps what you generate is not one piece of art but ten thousand. Then you use another machine learning model, which is also trained on community feedback, to select the best piece from these ten thousand works.
