Sora has emerged: could 2024 be the transformative year for AI + Web3?

YBB Capital
2024-02-23 17:06:53
Exploring the future of AI and Web3 convergence: decentralized computing power, big data, Dapp innovation, and their profound impact on industrial transformation.

Author: YBB Capital Zeke

Introduction

On February 16, OpenAI announced "Sora," its latest text-controlled video generation diffusion model, marking another milestone for generative AI with high-quality generated videos spanning a wide range of visual data types. Unlike AI video generation tools such as Pika, which are still at the stage of producing a few seconds of video from multiple images, Sora achieves scalable video generation by training in the compressed latent space of videos and images and breaking them down into spatiotemporal patches. The model also demonstrates the ability to simulate both the physical and digital worlds, and its 60-second demos can aptly be described as a "universal simulator of the physical world."

In terms of architecture, Sora continues the technical path of earlier GPT models, "big data - Transformer - Diffusion - emergence," which means its maturation will likewise require computing power as its engine. Moreover, the amount of data required for video training is far larger than for text training, further increasing the demand for computing power.

We have already explored the importance of computing power in the AI era in our earlier article, "Prospects for Potential Tracks: Decentralized Computing Power Market." With the recent surge in AI's popularity, a large number of computing power projects have emerged in the market, and other DePIN projects that benefit indirectly (such as storage and computing power) have also seen a wave of growth. So, beyond DePIN, what other sparks can the intersection of Web3 and AI ignite, and what opportunities are hidden within this track? The main purpose of this article is to update and complement earlier writing and to consider the possibilities for Web3 in the AI era.

Three Major Directions in AI Development

Artificial Intelligence (AI) is an emerging science and technology aimed at simulating, extending, and enhancing human intelligence. Since its inception in the 1950s and 1960s, AI has, after more than half a century of development, become an important technology driving change across social life and many industries. Throughout this process, the interwoven development of three major research directions—symbolism, connectionism, and behaviorism—has become the cornerstone of today's rapid advancement in AI.

Symbolism

Also known as logicism or rule-based systems, symbolism posits that simulating human intelligence through the processing of symbols is feasible. This approach represents and manipulates objects, concepts, and their interrelations within a problem domain using symbols, and employs logical reasoning to solve problems, achieving significant success particularly in expert systems and knowledge representation. The core idea of symbolism is that intelligent behavior can be realized through the manipulation of symbols and logical reasoning, where symbols represent a high abstraction of the real world.

Connectionism

Also known as neural network methods, connectionism aims to achieve intelligence by mimicking the structure and function of the human brain. This method constructs a network composed of numerous simple processing units (similar to neurons) and learns by adjusting the connection strengths between these units (similar to synapses). Connectionism particularly emphasizes the ability to learn and generalize from data, making it especially suitable for pattern recognition, classification, and continuous input-output mapping problems. Deep learning, as a development of connectionism, has made breakthroughs in fields such as image recognition, speech recognition, and natural language processing.

Behaviorism

Behaviorism is closely related to the study of bionic robotics and autonomous intelligent systems, emphasizing that intelligent agents can learn through interaction with their environment. Unlike the first two approaches, behaviorism does not focus on simulating internal representations or thought processes, but rather achieves adaptive behavior through a cycle of perception and action. Behaviorism posits that intelligence is exhibited through dynamic interaction and learning with the environment, making this approach particularly effective in mobile robots and adaptive control systems that need to operate in complex and unpredictable environments.

Despite the essential differences among these three research directions, they can interact and integrate in practical AI research and applications, collectively driving the development of the AI field.

Overview of AIGC Principles

Currently experiencing explosive growth, generative AI (Artificial Intelligence Generated Content, AIGC) is an evolution and application of connectionism, capable of mimicking human creativity to generate novel content. These models are trained using large datasets and deep learning algorithms to learn the underlying structures, relationships, and patterns present in the data. Based on user input prompts, they generate novel and unique outputs, including images, videos, code, music, designs, translations, question answering, and text. The current AIGC is fundamentally composed of three elements: deep learning (DL), big data, and large-scale computing power.

Deep Learning

Deep learning is a subfield of machine learning (ML) in which algorithms are modeled loosely on the human brain. The human brain contains billions of interconnected neurons that work together to learn and process information; similarly, deep learning neural networks (artificial neural networks) consist of multiple layers of artificial neurons working together inside a computer. These artificial neurons are software modules, called nodes, that use mathematical computations to process data, and artificial neural networks use them to solve complex problems.

Hierarchically, a neural network can be divided into an input layer, hidden layers, and an output layer, with parameters connecting adjacent layers (a minimal code sketch of such a network follows the list below).

  • Input Layer: The input layer is the first layer of the neural network, responsible for receiving external input data. Each neuron in the input layer corresponds to a feature of the input data. For example, when processing image data, each neuron may correspond to a pixel value of the image.
  • Hidden Layer: Hidden layers process the data received from the input layer and pass it on to deeper layers of the network. They process information at different levels of abstraction and adjust their behavior as new information arrives; a deep learning network can have hundreds of hidden layers, allowing a problem to be analyzed from many perspectives. For instance, if you were given an image of an unknown animal to classify, you would compare it with animals you already know by examining features such as ear shape, number of legs, and pupil size. The hidden layers of a deep neural network work in the same way: when a deep learning algorithm tries to classify an animal image, each hidden layer processes a different feature of the animal and contributes to an accurate classification.
  • Output Layer: The output layer is the final layer of the neural network, responsible for generating the network's output. Each neuron in the output layer represents a possible output category or value. For example, in a classification problem, each output layer neuron may correspond to a category, while in a regression problem, the output layer may have only one neuron, whose value represents the predicted result.
  • Parameters: In a neural network, the connections between different layers are represented by weights and biases, which are optimized during training to enable the network to accurately recognize patterns in the data and make predictions. Increasing the number of parameters can enhance the model capacity of the neural network, meaning the model's ability to learn and represent complex patterns in the data. However, increasing parameters also raises the demand for computing power.
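
To make the layer and parameter terminology above concrete, here is a minimal sketch of a small feedforward classifier in PyTorch; the layer widths and the ten-class output are illustrative assumptions rather than a reference to any particular model.

```python
import torch
import torch.nn as nn

# A minimal feedforward classifier: input layer -> two hidden layers -> output layer.
# The sizes (784 inputs, 128/64 hidden units, 10 classes) are illustrative assumptions.
model = nn.Sequential(
    nn.Linear(784, 128),  # input layer -> first hidden layer (weights + biases are parameters)
    nn.ReLU(),
    nn.Linear(128, 64),   # second hidden layer
    nn.ReLU(),
    nn.Linear(64, 10),    # output layer: one neuron per candidate class
)

# Each nn.Linear contributes a weight matrix and a bias vector; counting them shows
# how quickly parameters (and hence compute demand) grow with layer width.
total_params = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {total_params}")  # 784*128+128 + 128*64+64 + 64*10+10 = 109,386
```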

Big Data

To train effectively, neural networks typically require large volumes of diverse, high-quality data from multiple sources. Such data forms the foundation for training and validating machine learning models: by analyzing big data, a model can learn the patterns and relationships within it, enabling prediction or classification.

Large-Scale Computing Power

Many factors combine to drive the demand for large-scale computing power: the multi-layered, complex structure of neural networks; the large number of parameters; the need to process big data; the iterative training method (during training the model must iterate repeatedly, performing forward and backward propagation through every layer, including computing activation functions, loss functions, gradients, and weight updates); high-precision computation requirements; parallel computing; optimization and regularization techniques; and the model evaluation and validation process.
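
To show why this iterative cycle is compute-hungry, here is a minimal single-batch training loop in PyTorch; the random stand-in data, batch size, learning rate, and step count are arbitrary assumptions used purely for illustration.

```python
import torch
import torch.nn as nn

# A small classifier; random tensors stand in for a real dataset.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 784)          # one batch of 32 fake samples
y = torch.randint(0, 10, (32,))   # fake labels

for step in range(100):           # real training repeats this loop millions of times
    logits = model(x)             # forward propagation through every layer
    loss = loss_fn(logits, y)     # loss function measures prediction error
    optimizer.zero_grad()
    loss.backward()               # backward propagation computes gradients for all parameters
    optimizer.step()              # weight update; every iteration touches every parameter
```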

Sora

As OpenAI's latest video generation AI model, Sora represents a significant advancement in AI's ability to process and understand diverse visual data. By employing video compression networks and spatiotemporal patch technology, Sora can transform vast amounts of visual data captured from various devices around the world into a unified representation, enabling efficient processing and understanding of complex visual content. Leveraging a text-conditioned diffusion model, Sora can generate videos or images that closely match text prompts, demonstrating high levels of creativity and adaptability.
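
OpenAI's technical report describes compressing videos into a latent representation and cutting that latent into spacetime patches for a Transformer to operate on. Sora's internals are not public, so the sketch below is only a conceptual illustration of what patchifying a latent video tensor could look like; the tensor dimensions, patch sizes, and function name are all assumptions.

```python
import torch

# Hypothetical latent video tensor: (frames, channels, height, width).
# The real shape of Sora's latent space is not public; these values are assumptions.
latent = torch.randn(16, 4, 32, 32)

def to_spacetime_patches(latent, t=2, p=4):
    """Split a latent video into non-overlapping spatiotemporal patches,
    then flatten each patch into a token-like vector for a Transformer."""
    f, c, h, w = latent.shape
    patches = latent.reshape(f // t, t, c, h // p, p, w // p, p)
    patches = patches.permute(0, 3, 5, 1, 2, 4, 6)   # group by (time, y, x) position
    return patches.reshape(-1, t * c * p * p)        # one row per patch

tokens = to_spacetime_patches(latent)
print(tokens.shape)  # (8 * 8 * 8, 2 * 4 * 4 * 4) = (512, 128)
```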

However, despite Sora's breakthroughs in video generation and in simulating interactions with the real world, it still faces limitations: the accuracy of its physical-world simulation, consistency over long videos, understanding of complex text instructions, and training and generation efficiency. Moreover, Sora essentially continues the "big data - Transformer - Diffusion - emergence" technical path, achieving a kind of brute-force aesthetic on the back of OpenAI's near-monopoly on computing power and its first-mover advantage; other AI companies still have the potential to overtake it through technological innovation.

Although Sora has little direct relation to blockchain, I personally believe that over the next year or two its influence will compel the emergence and rapid development of other high-quality AI generation tools, which will extend into multiple Web3 tracks such as GameFi, social platforms, creative platforms, and DePIN. A general understanding of Sora is therefore necessary, and how future AI will effectively integrate with Web3 is a key question for us to consider.

Four Paths of AI x Web3

As noted above, the underlying foundation required for generative AI consists of three elements: algorithms, data, and computing power. From the perspective of versatility and generative capability, AI is a tool that disrupts methods of production, while the two main roles of blockchain are to reconstruct relations of production and to decentralize. I therefore believe the collision of the two can produce the following four paths:

Decentralized Computing Power

Since I have previously written on this topic, the main purpose of this section is to update the state of the computing power track. Computing power is unavoidable in any discussion of AI, and since the birth of Sora the demand for it has become almost unimaginable. Recently, at the 2024 World Economic Forum in Davos, Switzerland, OpenAI CEO Sam Altman stated explicitly that computing power and energy are the biggest constraints at this stage, and that their future importance may even rival that of currency. On February 10, Sam Altman announced an astonishing plan on Twitter to raise $7 trillion (roughly 40% of China's 2023 GDP) to reshape the current global semiconductor industry landscape and build a chip empire. When I wrote my earlier computing power articles, my imagination was still limited to national blockades and monopolistic giants; a single company aiming to control the global semiconductor industry now seems quite crazy.

Thus, the importance of decentralized computing power is self-evident. The characteristics of blockchain can indeed address the current extreme monopoly of computing power and the high costs of purchasing dedicated GPUs. From the perspective of AI needs, the use of computing power can be divided into two directions: inference and training. Currently, there are very few projects focused on training, as the decentralized network needs to integrate with neural network design and has extremely high hardware demands, making it a high-barrier and difficult-to-implement direction. In contrast, inference is relatively simpler; on one hand, the decentralized network design is not complex, and on the other hand, the hardware and bandwidth requirements are lower, making it a more mainstream direction at present.

The imagination space of the decentralized computing power market is enormous, often linked to the keyword "trillion-scale," and it is a topic that is easily hyped in the AI era. However, judging from the recent surge of projects, most are hastily assembled efforts to ride the wave. They raise the banner of decentralization while staying silent about the inefficiencies of decentralized networks, and there is a high degree of design homogeneity, with many projects looking very similar (a one-click L2 plus mining design), which may ultimately end in chaos. In such a situation, it is genuinely difficult to carve out a share from the traditional AI track.

Algorithm and Model Collaboration Systems

Machine learning algorithms are those that can learn patterns and rules from data and make predictions or decisions based on them. Algorithms are technology-intensive, as their design and optimization require deep expertise and technological innovation. Algorithms are the core of training AI models, defining how data is transformed into useful insights or decisions. Common generative AI algorithms include Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Transformers, each designed for a specific domain (such as painting, language recognition, translation, or video generation) or purpose, and then trained into specialized AI models.
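
As a concrete illustration of one of these algorithm families, here is a minimal GAN training step in PyTorch: a generator learns to produce samples that a discriminator cannot distinguish from real data. The layer sizes, learning rates, and random stand-in data are illustrative assumptions, not a production recipe.

```python
import torch
import torch.nn as nn

# Minimal GAN skeleton: a generator maps random noise to fake samples,
# a discriminator tries to tell fake samples from real ones.
# Sizes (64-dim noise, 784-dim samples) are illustrative assumptions.
generator = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

loss_fn = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

real = torch.randn(32, 784)                 # stand-in for a batch of real data
fake = generator(torch.randn(32, 64))       # generated samples

# Discriminator step: label real samples 1 and generated samples 0.
d_loss = loss_fn(discriminator(real), torch.ones(32, 1)) + \
         loss_fn(discriminator(fake.detach()), torch.zeros(32, 1))
d_opt.zero_grad()
d_loss.backward()
d_opt.step()

# Generator step: try to make the discriminator label fakes as real.
g_loss = loss_fn(discriminator(fake), torch.ones(32, 1))
g_opt.zero_grad()
g_loss.backward()
g_opt.step()
```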

With so many algorithms and models, each with its own strengths, can they be integrated into a model that is both versatile and powerful? The recently popular Bittensor is a leader in this direction, using mining incentives to encourage different AI models and algorithms to collaborate and learn from one another, with the aim of producing more efficient and versatile AI models. Commune AI (code collaboration) takes a similar approach, but algorithms and models are treated as closely guarded proprietary assets by today's AI companies and are not easily shared.

The narrative of an AI collaboration ecosystem is therefore novel and intriguing, using the advantages of blockchain to offset the drawback of AI algorithms operating in isolation. Whether it can create corresponding value, however, remains uncertain. Leading AI companies are highly capable of updating and integrating their closed-source algorithms and models: OpenAI, for instance, has evolved from early text generation models to multi-domain generative models in less than two years. Projects like Bittensor may need to find alternative paths in the fields their models and algorithms target.

Decentralized Big Data

From a simple perspective, using private data to feed AI and labeling data align very well with blockchain, with the main concerns being the prevention of junk data and malicious behavior; data storage can also benefit DePIN projects such as FIL and AR. From a more complex perspective, using blockchain data for machine learning (ML) to address the accessibility of blockchain data is another interesting direction (one of Giza's exploratory directions).

In theory, blockchain data is accessible at any time and reflects the state of the entire chain. For those outside the blockchain ecosystem, however, obtaining this vast amount of data is not easy: fully storing a blockchain requires extensive expertise and significant specialized hardware resources. Several solutions have emerged in the industry to overcome this. RPC providers expose node access via APIs, and indexing services make data extraction possible through SQL and GraphQL; both play key roles. Yet these methods have limitations. RPC services are ill-suited to high-density use cases that require extensive data queries and often fail to meet demand, and while indexing services provide a more structured way to retrieve data, the complexity of Web3 protocols makes building efficient queries extremely difficult, sometimes requiring hundreds or even thousands of lines of complex code. This complexity is a significant barrier for general data practitioners and for anyone without a deep understanding of Web3's details. The cumulative effect of these limitations highlights the need for a more accessible and usable way to obtain blockchain data, one that can promote broader applications and innovation in the field.
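
To make the accessibility gap concrete, the sketch below contrasts a raw JSON-RPC request with a GraphQL query against an indexing service. The endpoint URLs are placeholders, and the GraphQL schema (a `swaps` entity with `amountUSD` and `timestamp` fields) is an illustrative assumption modeled on typical DEX subgraphs rather than any specific deployment.

```python
import requests

# 1) Raw JSON-RPC: ask an Ethereum node for the latest block.
#    The endpoint URL is a placeholder; a real one comes from your RPC provider.
rpc_payload = {
    "jsonrpc": "2.0",
    "method": "eth_getBlockByNumber",
    "params": ["latest", False],
    "id": 1,
}
block = requests.post("https://rpc.example.com", json=rpc_payload).json()

# 2) Indexed data via GraphQL: ask a hypothetical subgraph for recent swaps.
#    The schema (swaps, amountUSD, timestamp) is an illustrative assumption.
graphql_query = """
{
  swaps(first: 5, orderBy: timestamp, orderDirection: desc) {
    amountUSD
    timestamp
  }
}
"""
swaps = requests.post("https://indexer.example.com/subgraph",
                      json={"query": graphql_query}).json()
```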

Combining ZKML (zero-knowledge machine learning, which reduces the on-chain burden of machine learning) with high-quality blockchain data could therefore produce datasets that solve the accessibility problem of blockchain data. AI can significantly lower the barrier to accessing blockchain data, allowing developers, researchers, and ML enthusiasts to reach more high-quality, relevant datasets over time and facilitating the construction of effective and innovative solutions.

AI Empowering Dapps

Since ChatGPT exploded in popularity in 2023, AI-empowered Dapps have become a very common direction. Versatile generative AI can be integrated via APIs to simplify and add intelligent analysis to data platforms, trading bots, blockchain encyclopedias, and other applications. It can also serve as a chatbot (like Myshell) or an AI companion (Sleepless AI), and generative AI can even create NPCs in blockchain games. However, because the technical barrier is low, most projects simply plug in an API and make minor adjustments, and the result is poorly integrated with the product itself, which is why such projects are rarely worth mentioning.
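
As a sketch of the "plug in an API" pattern described above, the snippet below forwards a user question from a hypothetical Dapp backend to OpenAI's chat completions endpoint; the model name, prompt, and surrounding application logic are assumptions for illustration only.

```python
import os
import requests

def ask_assistant(question: str) -> str:
    """Forward a user question from a Dapp frontend to a hosted LLM API.
    The endpoint and request shape follow OpenAI's chat completions API;
    the model name and error handling are simplified assumptions."""
    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "gpt-3.5-turbo",
            "messages": [
                {"role": "system", "content": "You answer questions about on-chain data."},
                {"role": "user", "content": question},
            ],
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Example: a trading-bot Dapp asking for a plain-language summary.
# print(ask_assistant("Summarize today's largest ETH transfers in one sentence."))
```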

With the arrival of Sora, however, I personally believe that AI-empowered GameFi (including the metaverse) and creative platforms will be a focus of attention in the near future. Given the bottom-up nature of the Web3 field, it is certainly hard to produce products that compete with those of traditional game or creative companies, but the emergence of Sora may well break this predicament (perhaps within just two to three years). Judging from Sora's demos, it already has the potential to compete with micro-drama companies, and Web3's vibrant community culture can generate a wealth of interesting ideas. When the only limit is imagination, the barriers between the bottom-up Web3 industry and top-down traditional industries will come down.

Conclusion

With the continuous advancement of generative AI tools, we will experience more epoch-making "iPhone moments" in the future. Although many people scoff at the combination of AI and Web3, I believe that the current directions are mostly sound, and the pain points that need to be addressed boil down to three: necessity, efficiency, and compatibility. The integration of the two, while still in the exploratory stage, does not prevent this track from becoming mainstream in the next bull market.

Maintaining sufficient curiosity and openness toward new things is the mindset we need. Historically, the shift from horse-drawn carriages to automobiles was a foregone conclusion; as with inscriptions and the NFTs of the past, holding on to too many biases will only cause us to miss opportunities.
