Foresight Ventures: The Best Attempt at a Decentralized AI Marketplace

Foresight Ventures
2023-07-05 16:41:15

Author: Ian, Foresight Ventures

TL;DR

  • A successful decentralized AI marketplace must tightly integrate the strengths of AI and Web3, leveraging distributed computing power, asset ownership, and revenue distribution to lower the barriers to AI applications. It should encourage developers to upload and share models while protecting users' data privacy rights, thereby building a developer-friendly platform for trading and sharing AI resources that meets user needs.
  • Data-driven AI marketplaces have greater potential. Marketplaces that focus solely on models require a large number of high-quality models to support them, but early platforms lack user bases and quality resources, resulting in insufficient incentives for excellent model providers, making it difficult to attract quality models. In contrast, data-driven marketplaces can accumulate a wealth of valuable data and resources, especially private domain data, through decentralization, distributed data collection, incentive layer design, and guarantees of data ownership. However, data marketplaces also need to address the challenges of data privacy protection, with solutions including designing more flexible policies that allow users to customize privacy level settings.
  • The success of a decentralized AI marketplace relies on the accumulation of user resources and strong network effects, where the value that users and developers can gain from the marketplace exceeds what they can obtain outside of it. In the early stages of the marketplace, the focus is on accumulating high-quality models to attract and retain users, and then, after establishing a library of quality models and data barriers, shifting to attracting and retaining more end users. Moreover, an excellent AI marketplace needs to find a balance of interests among all parties and properly handle factors such as data ownership, model quality, user privacy, computing power, and incentive algorithms.

1. Web3's AI Marketplace

1.1 Review of AI Tracks in the Web3 Field

First, let's review the two major directions I previously mentioned regarding the combination of AI and crypto: ZKML and decentralized computing networks.

ZKML

ZKML makes AI models transparent + verifiable, meaning that the model architecture, model parameters and weights, and model inputs can be verified across the entire network. The significance of ZKML lies in creating the next phase of value for the Web3 world without sacrificing decentralization and trustlessness, providing the capability to support broader applications and create greater possibilities.

Foresight Ventures: AI + Web3 = ?

Computing Networks

Computing resources will be the battlefield of the next decade, and investment in high-performance computing infrastructure will rise exponentially. The application scenarios for decentralized computing fall into two directions: model inference and model training. Demand for training large AI models is the greatest, but it also faces the biggest challenges and technical bottlenecks, including complex data synchronization and network optimization issues. Model inference has more opportunities to land, and its future growth potential is large enough as well.

1.2 What is an AI Marketplace?

AI marketplaces are not a new concept; Hugging Face can be considered the most successful AI marketplace (except for the lack of a trading and pricing mechanism). In the NLP field, Hugging Face provides an extremely important and active community platform where developers and users can share and use various pre-trained models.

From the success of Hugging Face, we can see that an AI marketplace needs to have:

a. Model Resources

Hugging Face offers a large number of pre-trained models covering various NLP tasks. This richness of resources attracts a large number of users, making it the foundation for forming an active community and accumulating users.

b. Open Source Spirit + Sharing

Hugging Face encourages developers to upload and share their models. This spirit of open sharing enhances the vitality of the community and allows the latest research results to be quickly utilized by a wide range of users. This accelerates the verification and promotion of research results based on the accumulation of excellent developers and models.

c. Developer-Friendly + Usability

Hugging Face provides easy-to-use APIs and documentation, allowing developers to quickly understand and use the models it offers. This lowers the barriers to use, enhances user experience, and attracts more developers.

Although Hugging Face does not have a trading mechanism, it still provides an important platform for sharing and using AI models. Therefore, it can also be seen that AI marketplaces have the opportunity to become valuable resources for the entire industry.

Decentralized AI Marketplace in Short:

Based on the above elements, a decentralized AI marketplace is built on blockchain technology, allowing users to have ownership of their data and model assets. The value brought by Web3 is also reflected in the incentive and trading mechanisms, where users can freely select or be matched with suitable models through the system, and can also list their trained models to earn revenue.

Users have ownership of their AI assets, and the AI marketplace itself does not have control over the data and models. Instead, the development of the marketplace relies on the user base and the subsequent accumulation of models and data. This accumulation is a long-term process, but it is also a process of gradually establishing product barriers, supported by the number of users and the quantity/quality of models and data uploaded by users.

1.3 Why Focus on Web3's AI Marketplace?

1.3.1 Alignment with the Direction of Computing Applications

Due to communication pressures and other constraints, decentralized computing may find it difficult to land in training base models, but the pressure is much smaller for fine-tuning, making fine-tuning one of the best scenarios for decentralized computing networks to land.

A bit of background knowledge: Why is the fine-tuning phase easier to land?

Foresight Ventures: Rational View on Decentralized Computing Networks

The training of AI models is divided into pretraining and fine-tuning. Pretraining involves a large amount of data and computation, which can be referenced in my previous article's analysis. Fine-tuning is based on the base model, using data from specific tasks to adjust model parameters, allowing the model to perform better for specific tasks. The computational resources required for model fine-tuning are much smaller than those for pretraining, mainly for the following two reasons:

  1. Data Volume: In the pretraining phase, the model needs to be trained on large-scale datasets to learn general language representations. For example, the pretraining of the BERT model was conducted on Wikipedia and BookCorpus, which contain billions of words. In the fine-tuning phase, the model typically only needs to be trained on a small-scale dataset for specific tasks. For instance, the fine-tuning dataset for a sentiment analysis task may only contain a few thousand to tens of thousands of comments.
  2. Training Steps: The pretraining phase usually requires millions or even billions of training steps, while the fine-tuning phase typically only requires a few thousand to tens of thousands of steps. This is because the pretraining phase needs to learn the basic structure and semantics of the language, while the fine-tuning phase only needs to adjust part of the model's parameters to adapt to specific tasks.

For example, in the case of GPT-3, the pretraining phase used 45TB of text data, while the fine-tuning phase only required about 5GB of data. The training time for the pretraining phase can take weeks to months, while the fine-tuning phase only takes a few hours to days.
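The scale gap described above can be made concrete with a back-of-envelope estimate. The sketch below uses the common rule of thumb of roughly 6 FLOPs per model parameter per training token; all figures are illustrative assumptions rather than measured values:

```python
# Back-of-envelope compute comparison of pretraining vs fine-tuning.
# All figures are illustrative assumptions, using the common rule of
# thumb of ~6 FLOPs per model parameter per training token.
def training_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

params = 175e9            # GPT-3-scale parameter count
pretrain_tokens = 300e9   # order of magnitude reported for GPT-3 pretraining
finetune_tokens = 1e9     # an assumed task-specific corpus

ratio = training_flops(params, pretrain_tokens) / training_flops(params, finetune_tokens)
print(f"pretraining needs roughly {ratio:.0f}x the compute of this fine-tune")
# prints: pretraining needs roughly 300x the compute of this fine-tune
```

Even under generous assumptions about the fine-tuning corpus, the gap is two to three orders of magnitude, which is why fine-tuning is the more realistic workload for a decentralized network.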

1.3.2 The Starting Point of AI and Crypto Intersection

A key point in judging whether a Web3 project is reasonable is whether it is crypto for the sake of crypto, whether the project maximizes the value brought by Web3, and whether the Web3 enhancement brings differentiation. Clearly, Web3 provides irreplaceable value in terms of ownership, revenue distribution, and computing power for such AI marketplaces.

I believe an excellent Web3 AI marketplace can tightly integrate AI and crypto. The perfect combination is not what the AI marketplace can bring to Web3 in terms of applications or infrastructure, but rather what Web3 can provide for the AI marketplace. For example, every user can have ownership of their AI models and data (e.g., encapsulating AI models and data as NFTs) and can trade them as commodities, which effectively utilizes the value that Web3 can leverage. This not only incentivizes AI developers and data providers but also broadens the application of AI. If a model is good enough, the owner has a stronger motivation to upload and share it with others.
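As a rough illustration of what "encapsulating a model as an NFT" might look like, here is a minimal sketch. The schema, field names, and royalty mechanism are assumptions made for this example, in the general spirit of ERC-721 metadata rather than a description of any existing standard:

```python
import hashlib
import json
from dataclasses import dataclass, asdict

# Minimal sketch of wrapping a model artifact as a tradable asset record.
# All field names are illustrative assumptions; a real marketplace would
# define its own on-chain schema.
@dataclass
class ModelAsset:
    owner: str
    name: str
    content_hash: str   # hash of the model weights, pinning what is owned
    royalty_bps: int    # creator's share of each resale, in basis points

def mint(owner: str, name: str, weights: bytes, royalty_bps: int = 500) -> ModelAsset:
    return ModelAsset(owner, name, hashlib.sha256(weights).hexdigest(), royalty_bps)

asset = mint("0xA11ce", "sentiment-ft-v1", b"\x00fake-weights")
metadata = json.dumps(asdict(asset))  # what a tokenURI might point to
```

The content hash ties ownership to a specific artifact, while the royalty field hints at how resale revenue could flow back to the creator — the "stronger motivation to upload and share" mentioned above.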

At the same time, decentralized AI marketplaces may introduce entirely new business models, such as selling and renting models and data, and task outsourcing.

1.3.3 Lowering the Barriers to AI Applications

Everyone should and will have the ability to train their own AI models, which requires a platform with sufficiently low barriers to provide resource support, including base models, tools, data, and computing power.

1.3.4 Demand and Supply

While large models have powerful inference capabilities, they are not all-powerful. Often, fine-tuning for specific tasks and scenarios yields better results and greater practicality. Therefore, from the demand side, users need an AI model marketplace to obtain useful models for different scenarios; developers, for their part, need a platform that offers substantial resource convenience so they can build models and earn revenue from their expertise.

2. Model-Based vs. Data-Based

2.1 Model Marketplace

Model

Tooling is the selling point here. As the first link in the chain, the project needs to attract enough model developers in the early stages to deploy quality models, thereby establishing supply for the marketplace.

In this model, the main attraction for developers is the convenient and user-friendly infrastructure and tooling. Data relies on the developers' own capabilities, which is why some experienced individuals in a certain field can create value; the data in that field needs to be collected and fine-tuned by the developers themselves to produce better-performing models.

Thoughts

Recently, I've seen many projects about the combination of AI marketplaces and Web3, but I wonder: is creating a decentralized AI model marketplace a false proposition?

First, we need to consider a question: what value can Web3 provide?

If it is merely token incentives or narratives of model ownership, that is far from enough. Looking at it practically, high-quality models on the platform are the core of the entire product, and excellent models usually mean extremely high economic value. From the perspective of model providers, they need sufficient motivation to deploy their quality models to the AI marketplace, but can the incentives from tokens and ownership meet their expectations for model value? For a platform that lacks a user base in its early stages, it is clearly far from sufficient. Without exceptionally good models, the entire business model will not hold. So the question becomes how to ensure that model providers earn enough revenue in the early stages when there are few end users.

2.2 Data Marketplace

Model

Based on decentralized data collection, the design of the incentive layer and the narrative of data ownership can onboard more data providers and users who label data. With the support of crypto, the platform has the opportunity to accumulate a large amount of valuable data over time, especially the currently scarce private domain data.

What excites me the most is that this bottom-up development model resembles a crowdfunding approach. Even the most experienced individuals cannot possess complete data in a field, and one of the values that Web3 can provide is permissionless and decentralized data collection. This model not only concentrates expertise and data from various fields but also provides AI services to a larger user base. Compared to a single user's data, these crowdsourced data are collected from the actual scenarios of a large number of real users, thus better reflecting the complexity and diversity of the real world, which can greatly enhance the model's generalization ability and robustness, allowing AI models to perform well in various environments.

For example, a person may have extensive experience in nutrition and have accumulated a lot of data, but relying solely on personal data is far from enough to train an excellent model. While users share data, they can also effectively reach and utilize valuable data contributed by other users in the same field across the platform, achieving better fine-tuning results.

Thoughts

From this perspective, creating a decentralized data marketplace could also be a good attempt. Data, as a "commodity" with lower barriers, shorter production chains, and a broader base of providers, can better leverage the value Web3 offers. Incentive algorithms and data ownership mechanisms can motivate users to upload data. In the current model, data resembles a one-time commodity that loses value almost immediately after use. In a decentralized data marketplace, users' data can be reused and continue generating profits, realizing its value over the longer term.

Using data as an entry point to accumulate users seems like a good choice. One of the core barriers of large models is high-quality and multidimensional data. After onboarding a large number of data providers, these individuals have the opportunity to further convert into end users or model providers. An AI marketplace based on this can indeed provide underlying value for excellent models, motivating algorithm engineers to contribute models on the platform from the perspective of training models.

This motivation is a change from 0 to 1. Currently, large companies have vast amounts of data, allowing them to train more accurate models, making it difficult for small companies and individual developers to compete. Even if users possess highly valuable data in a certain field, this small portion of data cannot exert value without the support of a larger dataset. However, in a decentralized marketplace, everyone has the opportunity to access and use data, and these experts join the platform with valuable incremental data, thus further enhancing the quality and quantity of the platform's data. This enables everyone to potentially train excellent models and even drive AI innovation.

Data itself is indeed well-suited to become a competitive barrier for this type of AI marketplace. First, excellent incentive layers and secure privacy protections can allow more individual users to participate in contributing data to the entire protocol. Moreover, as the number of users increases, the quality and quantity of data will also continuously improve. This will generate community and network effects, increasing the value that the marketplace can provide and broadening its dimensions, thus enhancing its attractiveness to new users. This is the process of establishing barriers for the marketplace.

Therefore, fundamentally, to build a data-driven AI marketplace, the following four points are most important:

  1. Incentive Layer: Design algorithms that effectively incentivize users to provide high-quality data, balancing the intensity of incentives with the sustainability of the marketplace.
  2. Privacy: Protect data privacy and ensure the efficiency of data usage.
  3. Users: Rapidly accumulate users in the early stages and collect more valuable data.
  4. Data Quality: Data comes from various sources, necessitating the design of effective quality control mechanisms.
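In its simplest form, the incentive layer in point 1 could split a reward pool pro rata by quality-weighted contribution. A minimal sketch, with provider names, units, and quality scores as illustrative assumptions:

```python
# Minimal sketch of an incentive-layer payout: a reward pool is split
# pro rata by quality-weighted contribution. Names, units, and quality
# scores below are illustrative assumptions.
def allocate_rewards(contributions: dict, pool: float) -> dict:
    """contributions: {provider: (units_of_data, quality_score in [0, 1])}"""
    weights = {p: units * quality for p, (units, quality) in contributions.items()}
    total = sum(weights.values())
    if total == 0:
        return {p: 0.0 for p in contributions}
    return {p: pool * w / total for p, w in weights.items()}

payouts = allocate_rewards(
    {"alice": (100, 0.9), "bob": (300, 0.5), "carol": (50, 1.0)},
    pool=1000.0,
)
```

A production design would also need sybil resistance and sustainability constraints, but even this toy version shows the balance point: rewards scale with both quantity and quality, so flooding the platform with low-quality data is not the dominant strategy.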

Why are model providers not listed as key factors in this scenario?

Mainly because of the four points above: once they are in place, attracting excellent model providers is a natural outcome.

2.3 Value and Challenges of the Data Marketplace

Private Domain Data

The value of private domain data lies in its unique and hard-to-obtain information within specific fields, which is especially important for fine-tuning AI models. Using private domain data can create more accurate and personalized models, which will perform better in specific scenarios than models trained on public datasets.

Currently, the process of building base models can access a large amount of public data; therefore, the focus of the Web3 data marketplace is not on these data. How to obtain and incorporate private domain data during training is currently a bottleneck. By combining private domain data with public datasets, the model's adaptability to diverse problems and user needs, as well as its accuracy, can be increased.

For example, in the healthcare scenario, AI models using private domain data can typically improve prediction accuracy by 10% to 30%. According to Stanford's research, deep learning models using private healthcare data achieved a prediction accuracy for lung cancer that exceeded models trained on public data by 15%.

Data Privacy

Will privacy become a bottleneck for AI + Web3? From the current state of development, the direction of AI in Web3 has gradually become clear, yet it seems no application can avoid the topic of privacy. Decentralized computing must ensure the privacy of data and models in both training and inference; one precondition for ZKML is likewise ensuring that models are not misused by malicious nodes.

AI marketplaces are built on the foundation of ensuring that users control their own data. Therefore, although user data is collected in a decentralized and distributed manner, no node should directly access the raw data at any stage of collection, processing, storage, or usage. Current encryption methods face bottlenecks in practice; take fully homomorphic encryption (FHE), for example:

  1. Computational Complexity: FHE is more complex than traditional encryption methods, significantly increasing the computational overhead of performing AI model training under fully homomorphic encryption, making model training extremely inefficient, if not infeasible. Therefore, for tasks that require a large amount of computational resources, such as deep learning model training, fully homomorphic encryption is not an ideal choice.
  2. Computational Error: In the computation process of FHE, errors gradually accumulate, ultimately affecting the calculation results and impacting the performance of AI models.
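To make these bottlenecks concrete, here is a toy Paillier scheme — additively (not fully) homomorphic — with deliberately tiny, insecure parameters. Even adding two small integers happens over a modulus around n², which hints at why full model training under encryption is so expensive:

```python
import math
import random

# Toy Paillier cryptosystem: additively (not fully) homomorphic, with
# tiny, insecure parameters chosen only for illustration. Real deployments
# use ~2048-bit moduli, part of why encrypted computation is so costly.
p, q = 10007, 10009
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)      # Carmichael function of n
g = n + 1                         # standard generator choice
mu = pow(lam, -1, n)              # modular inverse of lam mod n

def encrypt(m: int) -> int:
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:    # r must be invertible mod n
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    l = (pow(c, lam, n2) - 1) // n    # Paillier's "L function": L(x) = (x-1)/n
    return (l * mu) % n

c_sum = (encrypt(20) * encrypt(22)) % n2   # multiplying ciphertexts...
assert decrypt(c_sum) == 42                # ...adds the plaintexts
```

Paillier is exact and only supports addition; fully homomorphic schemes that also support multiplication carry noise that grows with every operation, which is the error-accumulation problem in point 2.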

Privacy also has levels; there is no need to be overly anxious.

Different types of data have varying degrees of privacy needs. Only certain types of data, such as medical records, financial information, and sensitive personal information, require high levels of privacy protection.

Therefore, discussions about decentralized AI marketplaces need to consider the diversity of data, and most importantly, balance. To maximize user participation and the richness of platform resources, it is necessary to design a more flexible strategy that allows users to customize privacy level settings; not all data requires the highest level of privacy.
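One way to realize customizable privacy levels is a simple per-category default with user overrides; the tiers, category names, and defaults below are assumptions for illustration only:

```python
from enum import Enum
from typing import Optional

# Sketch of user-customizable privacy tiers. The tiers, categories, and
# defaults are assumptions about how a marketplace might let contributors
# choose per-dataset handling; they describe no existing system.
class Privacy(Enum):
    PUBLIC = "public"          # raw data visible to all nodes
    AGGREGATED = "aggregated"  # only statistics leave the contributor
    ENCRYPTED = "encrypted"    # processed under encryption only

DEFAULTS = {
    "medical_records": Privacy.ENCRYPTED,
    "financial": Privacy.ENCRYPTED,
    "product_reviews": Privacy.PUBLIC,
}

def effective_policy(category: str, user_override: Optional[Privacy] = None) -> Privacy:
    # A user override may tighten or loosen the default for their own data;
    # unknown categories fall back to a middle tier rather than the strictest.
    return user_override or DEFAULTS.get(category, Privacy.AGGREGATED)
```

The point of the design is the balance argued above: sensitive categories get strong protection by default, while most data avoids paying the full cost of encrypted computation.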

3. Reflections on Decentralized AI Marketplaces

3.1 Will Users' Control Over Assets Lead to Platform Collapse?

The advantage of a decentralized AI marketplace lies in users' ownership of resources. Users can indeed withdraw their resources at any time, but once users and resources (models, data) accumulate to a certain extent, I believe the platform will not be affected. Of course, this also means that the project will spend a lot of funds in the early stages to stabilize users and resources, which can be very challenging for a startup team.

Community Consensus

Once a decentralized AI marketplace forms strong network effects, more users and developers will become sticky. Additionally, the increase in user numbers will lead to an increase in the quality and quantity of data and models, making the market more mature. The value that different interest-driven users obtain from the marketplace will also increase. Although a small number of users may choose to leave, theoretically, the growth rate of new users in this case should not slow down, allowing the market to continue to develop and provide greater value.

Incentive Mechanism

If the incentive layer is designed reasonably, as the number of participants increases and various resources accumulate, the benefits obtained by all parties will also rise accordingly. A decentralized AI marketplace not only provides a platform for users to trade data and models but may also offer a mechanism for users to profit from their own data and models. For example, users can earn rewards by selling their data or allowing others to use their models.

  • For model developers: other platforms may not have enough data to support fine-tuning a better-performing model.
  • For data providers: another platform may not have such a robust data foundation, and a small piece of data from a single user cannot exert value or generate sufficient usage and revenue.

Summary

Although in a decentralized AI marketplace, the project party only plays the role of facilitating and providing the platform, the real barrier lies in the accumulation of users leading to the accumulation of data and models. Users do have the freedom to withdraw from the market, but a mature AI marketplace often ensures that the value they derive from the market exceeds what they can obtain outside of it, which means users have no motivation to withdraw from the market.

However, if a majority of users or a portion of high-quality model/data providers choose to withdraw, the market may be affected. This also aligns with the dynamic changes and adjustments of user entry and exit that exist in various economic systems.

3.2 Which Comes First, the Chicken or the Egg?

From the above two paths, it is difficult to say which one will ultimately prevail, but it is clear that a data-based AI marketplace makes more sense and has a much higher ceiling than the first one. The biggest difference is that a data-based marketplace is in the process of continuously strengthening barriers, and accumulating users is also the process of accumulating data. Ultimately, the value endowed by Web3 is to enrich a massive decentralized database, creating a positive cycle. At its core, this type of platform does not need to retain data but rather provides a market for contributing data, making it lighter. Ultimately, this is a large data marketplace, and this barrier is very difficult to replace.

From the perspective of supply and demand, an AI marketplace needs to have two key elements:

  1. A large number of excellent models
  2. End users

From a certain perspective, these two conditions seem interdependent. On one hand, the platform needs enough users to motivate model and data providers; only by accumulating enough users can the incentive layer maximize its value and set the data flywheel turning, which in turn attracts more model providers to deploy models. On the other hand, a sufficient number of end users must be drawn in by useful models; users choose a platform largely on the quality and capability of the models it hosts. Without a critical mass of excellent models, that demand does not exist. No matter how advanced the routing algorithm, routing without good models is empty talk, much as the App Store is premised on Apple's ecosystem.
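As a toy version of the routing idea, one could score models by tag overlap and rating; the fields and the scoring rule here are assumptions, and real matching would be far more sophisticated, but it illustrates that routing is only as good as the model pool it draws from:

```python
# Naive sketch of matching a user request to a model by tag overlap,
# breaking ties by rating. Field names and the scoring rule are
# illustrative assumptions; real routing would likely be learned.
def route(request_tags: set, models: list):
    def score(m):
        overlap = len(request_tags & m["tags"])
        return (overlap, m["rating"])
    ranked = sorted(models, key=score, reverse=True)
    return ranked[0] if ranked else None   # no models, nothing to route to

models = [
    {"name": "sentiment-ft", "tags": {"nlp", "sentiment"}, "rating": 4.6},
    {"name": "med-qa",       "tags": {"nlp", "medical"},   "rating": 4.8},
]
best = route({"sentiment", "nlp"}, models)  # matches "sentiment-ft"
```

With an empty model library, `route` has nothing to return no matter how the scoring is tuned — the code-level restatement of "routing without good models is just talk."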

Thus, a relatively good development strategy is:

Initial Strategy

  • Accumulate quality models. In the initial stage, the most important focus should be on building a quality model library. The reason is that regardless of how many end users there are, without high-quality models for them to choose from and use, the platform will lack attractiveness, and users will not have stickiness or retention. By focusing on building a quality model library, the platform can ensure that early users can find the models they need, thereby establishing brand reputation and user trust, gradually building community and network effects.

Expansion Strategy

  • Attract end users. After establishing a quality model library, the focus should shift to attracting and retaining more end users. A large number of users will provide sufficient motivation and benefits for model developers to continuously provide and improve models. Additionally, a large user base will generate a significant amount of data, further enhancing model training and optimization.

Summary

What is the best attempt at an AI marketplace? In a nutshell: a platform that can provide a sufficient number of high-quality models and efficiently match users with suitable models to solve their problems. This statement resolves two tensions: first, the platform can provide sufficient value to the supply side (model developers and data providers), ensuring there are enough high-quality models on the platform; second, these "products" can provide efficient solutions for users, thus accumulating more users and safeguarding the interests of all parties.

A decentralized AI marketplace is a direction where AI + Web3 can easily land, but a project must clearly understand the true value that this platform can provide and how to onboard a large number of users in the early stages. The key lies in finding a balance of interests among all parties while properly handling factors such as data ownership, model quality, user privacy, computing power, and incentive algorithms, ultimately becoming a platform for sharing and trading data, models, and computing power.

ChainCatcher reminds readers to view blockchain rationally, enhance risk awareness, and be cautious of virtual token issuance and speculation. All content on this site is market information or related parties' opinions only and does not constitute any form of investment advice.
ChainCatcher: Building the Web3 world with innovators