In January, it surged by 400%. What exactly is the AI + Crypto dark horse Bittensor?
Author: Knower
Compiled by: Luffy, Foresight News
Friends, long time no see. I hope all of you have been enjoying some positive price movements in the cryptocurrency space recently. To reap substantial rewards, I decided to write a formal report on the AI crypto project Bittensor. I am not an expert in cryptocurrency, so you might think I am not very familiar with AI. However, in reality, I have spent a significant amount of my free time researching AI outside of cryptocurrency, and I have been familiarizing myself with important updates, advancements, and existing infrastructure in the AI field over the past 3-4 months.
Some of my earlier tweets were inaccurate and light on analysis, so I want to set the record straight here. After reading this article, your understanding of Bittensor will far exceed expectations. This is a somewhat lengthy report, and I am not intentionally piling on words; it is largely due to the numerous images and screenshots. Please do not input the article into ChatGPT for a summary; I have invested a lot of time in this content, and you cannot grasp the whole story in that way.
As I was writing this report, a friend (who happens to be a Crypto Twitter KOL) told me, "AI + Cryptocurrency = The Future of Finance." Keep this in mind as you read through this article.
Is AI the final piece of the puzzle for cryptocurrency technology to begin dominating the world, or is it just a small step towards achieving our goals? The answer is for you to find out; I am just providing some food for thought.
Background Information
In Bittensor's words, Bittensor is essentially a language for writing numerous decentralized commodity markets or "sub-networks" under a unified token system, aimed at directing the power of digital markets towards the most important digital commodity in society—artificial intelligence.
Bittensor's mission is to establish a decentralized network that can compete with models that "only giants like OpenAI could achieve" through unique incentive mechanisms and advanced sub-network architecture. It is best to think of Bittensor as a complete system made up of interoperable components, a machine built on blockchain to better facilitate the proliferation of AI capabilities on-chain.
There are two key participants in managing the Bittensor network: miners and validators. Miners are individuals who submit pre-trained models to the network in exchange for a share of rewards. Validators are responsible for confirming the validity and accuracy of these model outputs and selecting the most accurate outputs to return to users. For example, when a Bittensor user requests an AI chatbot to answer simple questions related to derivatives or historical facts, this question will be answered regardless of how many nodes are currently running in the Bittensor network.
The steps for users to interact with the Bittensor network are simply described as follows: users send queries to validators, validators propagate them to miners, and then validators rank the outputs from miners, with the highest-ranked miner's output being sent back to the user.
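The query flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `route_query`, the `Miner` and `Scorer` types, and the scoring callback are hypothetical names invented here, not part of the Bittensor SDK, and real validators score outputs in sub-network-specific ways.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only; not the Bittensor SDK.
Miner = Callable[[str], str]           # takes a query, returns an output
Scorer = Callable[[str, str], float]   # (query, output) -> quality score


def route_query(
    query: str, miners: Dict[str, Miner], score: Scorer
) -> Tuple[str, str, List[str]]:
    """Validator-side flow: fan the query out, rank outputs, return the best.

    Returns (best_miner_uid, best_output, all_uids_ranked_best_first).
    """
    if not miners:
        raise ValueError("no miners registered")

    # 1. The validator propagates the user's query to every miner.
    responses = {uid: miner(query) for uid, miner in miners.items()}

    # 2. The validator ranks the miners by the score of their outputs.
    ranked = sorted(
        responses, key=lambda uid: score(query, responses[uid]), reverse=True
    )

    # 3. The highest-ranked miner's output is sent back to the user.
    best = ranked[0]
    return best, responses[best], ranked
```

In this sketch the scoring function is passed in as a parameter, since how outputs are judged is exactly the part that differs between sub-networks.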
It's all very straightforward.
Because rewards track output quality, miners are pushed to provide the best outputs they can. Bittensor has thereby created a positive feedback loop in which miners compete with each other, introducing more refined, accurate, and higher-performing models to gain a larger share of TAO (the Bittensor ecosystem token) and to deliver a better user experience.
To become a validator, users must be among the top 64 holders of TAO and have registered a UID on any sub-network of Bittensor (which provides access to independent economic markets for various forms of AI). For example, Sub-network 1 focuses on label prediction through text prompts, while Sub-network 5 focuses on image generation. These two sub-networks may use different models because their tasks are entirely different and may require different parameters, precision, and other specific features.
Another key aspect of the Bittensor architecture is the Yuma Consensus mechanism, which acts somewhat like a CPU, allocating Bittensor's available resources across the sub-networks. Yuma is described as a hybrid of PoW and PoS, with additional functionality for transmitting and facilitating intelligence off-chain. While Yuma supports most of Bittensor's network, sub-networks can choose whether or not to rely on Yuma consensus. The specifics are complex and vague, and there are various sub-networks and corresponding GitHub repositories, so if you just want a rough understanding, knowing that Yuma consensus takes a top-down approach will suffice.
But what about the models?
Contrary to popular belief, Bittensor does not train models itself. Training is an extremely expensive and slow process that only larger AI labs or research organizations can afford. I tried to determine conclusively whether model training happens within Bittensor, but what I found was inconclusive.
The decentralized training mechanism sounds a bit convoluted, but it is not difficult to understand. The task of Bittensor validators is to "evaluate the continuous game of models generated by miners on the Falcon Refined Web 6T token unlabelled dataset," scoring each miner based on two criteria (timestamp and loss relative to other models). The loss function is a machine learning term that describes the difference between predicted values and actual values in a certain type of simulation, representing the degree of error or inaccuracy of the model output given the input data.
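To make the loss idea concrete, here is a minimal sketch of one common loss for language models (mean cross-entropy over the correct tokens) and of ranking miners by it. The function names are hypothetical, and the tie-break rule (earlier timestamp wins when losses are equal) is an assumption made for illustration; the source only says that timestamp and relative loss are the two criteria, not how they are combined.

```python
import math
from typing import List, Sequence, Tuple


def cross_entropy(probs_for_true_tokens: Sequence[float]) -> float:
    """Mean negative log-probability the model gave to the correct tokens.

    Lower is better: 0.0 means the model was certain and right every time.
    """
    return -sum(math.log(p) for p in probs_for_true_tokens) / len(
        probs_for_true_tokens
    )


def rank_by_loss(entries: List[Tuple[str, float, int]]) -> List[str]:
    """Rank miners given (miner_id, loss, upload_timestamp) tuples.

    Assumption for this sketch: lower loss ranks first, and an earlier
    timestamp breaks ties between equal losses.
    """
    return [uid for uid, _, _ in sorted(entries, key=lambda e: (e[1], e[2]))]
```

A model that assigns probability 0.5 to each correct token has a loss of ln 2 (about 0.693), while one that assigns 0.9 has a loss of about 0.105, which is why the second would rank higher.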
Regarding the loss function, here is the latest performance of sn9 (the relevant sub-network) that I obtained from Discord yesterday; keep in mind that lower loss does not necessarily mean average loss:
"If Bittensor itself does not train models, what else can it do?!"
In fact, the "creation" process of large language models (LLMs) is divided into three key stages: training, fine-tuning, and in-context learning (with a bit of inference added).
Before we proceed with some basic definitions, let's take a look at the Sequoia Capital report on LLMs from June 2023, which found that "typically, aside from using LLM APIs, 15% of companies build custom language models from scratch or based on open-source libraries. Custom model training has significantly increased compared to a few months ago. This requires its own compute stack, model hub, hosting, training frameworks, experiment tracking, etc., sourced from popular companies like Hugging Face, Replicate, Foundry, Tecton, Weights & Biases, PyTorch, Scale, etc."
Building a model from scratch is a daunting task, and 85% of surveyed founders and teams are reluctant to undertake it. When most startups and independent developers only want to leverage large language models in external applications or software-based services, the workload of self-hosting, tracking results, creating or importing complex training scenarios, and various other tasks is overwhelming. For 99% of people in the AI industry, creating something comparable to GPT-4 or Llama 2 is not feasible.
This is why platforms like Hugging Face are so popular, as you can download pre-trained models from their website, a process that is very familiar and common for people in the AI industry.
Fine-tuning is more challenging but suitable for those who want to provide applications or services based on large language models in specific niche areas. This could be a chatbot developed by a legal services startup, with its model fine-tuned based on various specific data and examples from lawyers, or a biotechnology startup developing a model that is fine-tuned based on potentially existing biotechnology-related information.
Regardless of the purpose, fine-tuning is meant to further infuse personality or expertise into your model, making it more suitable and accurate for performing tasks. While it is undeniably useful and more customizable, everyone finds it difficult, and even a16z thinks so:
Bittensor does not actually train models, but miners who submit their models to the network claim to fine-tune them in some form. This information is not publicly disclosed, or is at least difficult to verify. Miners keep their model structures and functionalities confidential to protect their competitive advantage, though some are accessible.
Let's take a simple example: if you are participating in a competition with a $1 million prize, where everyone is competing for who has the best-performing LLM, would you reveal that you are using GPT-4 if all your competitors are using GPT-2? While the reality is more complex than this example suggests, it is not much different. Miners are rewarded based on the accuracy of their outputs, so those with better fine-tuned models have an advantage over miners with less fine-tuned or average-performing models.
I mentioned in-context learning earlier, and it may be the last piece of non-Bittensor information I will introduce. In-context learning is a broadly defined process used to guide language models toward more desirable outputs. Inference is the process that models continuously undergo when evaluating inputs, and the training results may affect the accuracy of output labels. While training is costly, it only occurs when the model is ready to reach the training level specified by the team during the model creation process. Inference is always happening and relies on various additional services.
Current State of Bittensor
With the background knowledge established, I will explore some details regarding the performance of Bittensor sub-networks, current capabilities, and future plans. To be honest, it is challenging to find high-quality articles on this topic. Fortunately, some members of the Bittensor community sent me information, but even so, forming an opinion requires a lot of work. I lurked in their Discord looking for answers, during which I realized that I had been a member for about a month but had not checked any channels (I never use Discord, preferring Telegram and Slack).
In any case, I decided to see what Bittensor's original vision was, and I found the following in previous reports:
I will introduce it in the next few paragraphs, but the theory of composability does not hold. There has been some research on this topic, and the previous screenshot comes from the same report that defines the Bittensor network as a sparse mixture model (a concept proposed in a 2017 research paper).
Bittensor has many sub-networks, enough for me to feel it necessary to dedicate an entire section of this report to them. Believe it or not, although they are critical to the network's usefulness and underpin all of its technology, there is no dedicated section on Bittensor's website that introduces them and how they operate. I even asked on Twitter, but it seems that the mysteries of sub-networks can only be understood by those who hang out in Discord for hours and learn about each sub-network's operations on their own. Despite the daunting task ahead of me, I still did some work.
Sub-network 1 (commonly abbreviated as sn1) is the largest sub-network in the Bittensor network, responsible for text generation services. Among the top 10 validators in sn1 (I used the same top 10 ranking for other sub-networks), there are about 4 million TAO staked, followed by sn5 (responsible for image generation), which has about 3.85 million TAO staked. By the way, all this data can be found on TaoStats.
The multimodal sub-network (sn4) has about 3.4 million TAO, sn3 (data scraping) has about 3.4 million TAO, and sn2 (multimodal) has about 3.7 million TAO. Another rapidly growing sub-network is sn11, responsible for text training, with a TAO staked amount similar to that of sn1.
In terms of miner and validator activity, sn1 is also the absolute leader, with over 40/128 active validators and 991/1024 active miners. Sn11 actually has the most miners of all sub-networks, with 2017/2048. The following chart describes the registration costs of sub-networks over the past month and a half:
Currently, the cost to register a sub-network is 182.12 TAO, a significant drop from the peak of 7,800 TAO in October, although I am not entirely sure if this number is accurate. In any case, with over 22 registered sub-networks and Bittensor gaining increasing attention, we are likely to see more sub-networks registered in due time. Some of these sub-networks seem to take a while to gain traction.
Regarding other sub-networks, sn9 is a cool sub-network specifically for training:
Here is a description of the Bittensor scraping sub-network:
The sub-network models are unique, exemplifying a common technique in machine learning research known as mixture of experts (MoE), where models are divided into multiple parts and provided with individual labels instead of being assigned an entire task. This is interesting to me because Bittensor is not a unified model; it is actually a network of models that are queried in a semi-random manner. BitAPI, a product built on sn1, is an example of this process: it randomly samples from the top 10 miners for each inbound user query. While there may be dozens or even hundreds of miners in any given sub-network, the best-performing models receive more rewards.
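The semi-random routing described for BitAPI can be sketched as follows. This is a guess at the general shape, not BitAPI's actual code: `sample_top_miner`, the `scores` mapping, and uniform sampling within the top group are all assumptions made for illustration.

```python
import random
from typing import Dict, Optional


def sample_top_miner(
    scores: Dict[str, float], k: int = 10, rng: Optional[random.Random] = None
) -> str:
    """Pick one miner uniformly at random from the k highest-scoring miners.

    `scores` maps a hypothetical miner id to its current validator score.
    """
    if not scores:
        raise ValueError("no miners to sample from")
    rng = rng or random.Random()

    # Keep only the k best-scoring miners, best first.
    top_k = sorted(scores, key=lambda uid: scores[uid], reverse=True)[:k]

    # Route the query to one of them at random (uniform is an assumption).
    return rng.choice(top_k)
```

Each query therefore lands on a single model chosen from the leaders, which is routing between separate models rather than combining them into one.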
Currently, combining multiple models or composing multiple models to increase or "stack" functionality is not feasible; this is not how large language models operate. I tried to reason with community members, but I think it is important to note that, for now, Bittensor is not an example of a collection of unified models but rather a network of models with different functionalities.
Some people compare Bittensor to on-chain oracles that provide access to ML models. Bittensor separates the core logic of the blockchain from the validation of sub-networks, running models off-chain to accommodate more data and higher computational costs, thus enabling more powerful models. You may recall that the only process completed on-chain is inference. See below for Bittensor's explanation:
I think many in the community are overly focused on trying to convince everyone that Bittensor will change the world, while in reality, they are just making progress in changing the way AI and cryptocurrency interact. They are unlikely to transform the entire network of miners uploading models into an extremely intelligent supercomputer—this is not how machine learning works. Even the best-performing and most expensive available models are years away from meeting the definition of artificial general intelligence (AGI).
As the machine learning community continues to iterate and implement new features, the definition of AGI often varies, but the basic idea is that AGI can reason, think, and learn entirely like a human. The core dilemma lies in the fact that scientists classify humans as beings with consciousness and free will, which is difficult to quantify in humans, let alone in powerful neural network systems.
For now, sub-networks are a unique way to break down various tasks related to AI-based applications, and the community and teams are responsible for attracting builders who hope to leverage these core functionalities of the Bittensor network.
It is also worth noting that Bittensor is highly efficient in the machine learning field outside of cryptocurrency. Opentensor and Cerebras released the BTLM-3b-8k open-source LLM as early as July this year. Since then, BTLM has been downloaded over 16,000 times on Hugging Face and has received very positive reviews.
Some have stated that due to BTLM's lightweight architecture, BTLM-3b ranks high in the same category as Mistral-7b and MPT-30b, becoming "the best model per VRAM." Below is a chart from the same tweet listing the models and their data accessibility ratings, with BTLM-3b receiving a good score:
I mentioned on Twitter that Bittensor has not done anything to accelerate AI research, so I think it is only right to acknowledge my mistake here. Additionally, I have heard that BTLM-3b is used for validation in some cases because it is inexpensive and runs quickly on most hardware.
Uses of TAO
Don't worry, I haven't forgotten about the token.
Bittensor draws heavy inspiration from Bitcoin and borrows very similar tokenomics from the OG playbook, with a maximum supply of 21 million TAO and a halving every 10.5 million blocks. As of the writing of this article, the circulating supply of TAO is approximately 5.6 million, with a market cap of nearly $800 million. The distribution of TAO is considered extremely fair, and this Bittensor report notes that early supporters did not receive any tokens. This is difficult to verify, but we trust our sources.
TAO serves both as a reward token for the Bittensor network and as an access token for the Bittensor network, allowing TAO holders to stake, participate in governance, or use their TAO to build applications on the Bittensor network. One TAO is minted every 12 seconds, with newly minted tokens being evenly distributed to miners and validators.
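The emission figures above can be checked with simple arithmetic. The sketch below assumes one block every 12 seconds and an initial reward of 1 TAO per block (inferred from "one TAO is minted every 12 seconds"), and it treats halvings as happening on a fixed block schedule; the constant and function names are made up for illustration, and the real chain logic may differ.

```python
SECONDS_PER_BLOCK = 12           # assumption: one block per 12 seconds
BLOCKS_PER_HALVING = 10_500_000  # from the article
INITIAL_REWARD = 1.0             # assumption: 1 TAO minted per block


def tao_per_day(reward: float = INITIAL_REWARD) -> float:
    """TAO minted per day at a given per-block reward."""
    return reward * 86_400 / SECONDS_PER_BLOCK


def years_per_halving() -> float:
    """Approximate length of one halving period in years."""
    return BLOCKS_PER_HALVING * SECONDS_PER_BLOCK / (365.25 * 86_400)


def supply_after(halving_periods: int) -> float:
    """Total TAO minted after the given number of complete halving periods."""
    return sum(
        BLOCKS_PER_HALVING * INITIAL_REWARD / 2**i for i in range(halving_periods)
    )
```

Under these assumptions, about 7,200 TAO are minted per day, each halving period lasts roughly four years, the first period mints 10.5 million TAO, and the series 10.5 + 5.25 + 2.625 + ... million approaches the 21 million cap without exceeding it.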
In my view, given TAO's tokenomics, it is easy to envision a world where the reduced emissions after each halving intensify competition among miners, naturally resulting in higher quality models and a better overall user experience. However, smaller rewards could also have the opposite effect: instead of attracting fierce competition, they could lead to stagnation in the number of deployed models or competing miners.
I could continue discussing the utility of TAO, price prospects, and growth drivers, but the previously mentioned report does a pretty good job in this regard. Most of Crypto Twitter has identified a very reliable narrative behind Bittensor and TAO, and any additional content I add at this point would not further enhance that. From an external perspective, I would say that this is quite reasonable tokenomics, with nothing unusual. However, I should mention that currently, purchasing TAO is very difficult, as it has not yet been listed on most exchanges. This situation may change in 1-2 months, and I would be very surprised if Binance does not list TAO soon.
Outlook
I am definitely a fan of Bittensor and hope they can achieve their bold mission. As the team stated in the Bittensor Paradigm article, Bitcoin and Ethereum are revolutionary because they democratized access to finance and made the idea of completely permissionless digital markets a reality. Bittensor is no exception, aiming to democratize AI models within a vast intelligent network. Despite my support, it is clear that they are far from achieving their desired goals, which is true for most projects built on cryptocurrency. This is a marathon, not a sprint.
If Bittensor wants to stay ahead, they need to continue fostering friendly competition and innovation among miners while expanding the possibilities of sparse mixture model architecture, MoE concepts, and the decentralized composition of intelligence. Accomplishing all this alone is already challenging enough, and incorporating cryptocurrency technology into it makes it even more difficult.
The road ahead for Bittensor is still long. Although discussions around TAO have increased in recent weeks, I believe that most of the crypto community does not fully understand how Bittensor currently operates. There are some obvious questions without simple answers, including: a) Is high-quality inference at scale achievable? b) How will it attract users? and c) Does pursuing the goal of composite large language models make sense?
Whether you believe it or not, supporting the narrative of decentralized currency is actually quite a challenge, although rumors of ETFs make it a bit easier.
Building a decentralized network composed of intelligent models that can iterate and learn from each other sounds too good to be true, partly because it is. Given current context windows and the limitations of large language models, it is impossible for a model to self-improve over and over again until it reaches the level of AGI; even the best models are still limited. Nevertheless, I believe that building Bittensor as a decentralized LLM hosting platform with novel economic incentives and built-in composability is not only positive but is actually one of the coolest experiments currently in the crypto space.
Incorporating economic incentives into AI systems poses challenges, and Bittensor states that if miners or validators attempt to game the system in any form, they will adjust the incentive mechanisms accordingly. Here is an example from June of this year, where token release was reduced by 90%:
This is entirely predictable in blockchain systems, so let’s not pretend that Bitcoin or Ethereum have been 100% perfect throughout their entire lifecycles.
For outsiders, the adoption of cryptocurrency has historically been a hard pill to swallow, and AI is similarly controversial, if not more so. Combining the two makes it harder to maintain user growth and activity, and overcoming that takes time. If Bittensor ultimately achieves its goal of composite large language models, it could be a profoundly significant achievement.