Vana: Let your data flow freely like tokens in the AI era to create value
Written by: Thinking Oddly
Have you ever wondered why social media platforms like Reddit and X (formerly Twitter) can be used for free? The answer is actually hidden in the posts you make every day, the likes you give, and even the time you spend scrolling.
In the past, these platforms sold your attention as a commodity to advertisers. Today, they have found a bigger buyer—AI companies. A single data licensing agreement with Google reportedly brings Reddit $60 million annually. Yet this vast wealth has nothing to do with you and me, the people who created the data.
What’s even more unsettling is that the AI trained on our data may replace our jobs in the future. While AI may also create new job opportunities, the wealth concentration effect brought about by this data monopoly undoubtedly exacerbates social inequality. We seem to be sliding into a cyberpunk world controlled by a few tech giants.
So, as ordinary people, how can we protect our interests in this AI era? Since the rise of AI, many have come to view blockchain as humanity's last line of defense against it. Based on this thinking, some innovators have begun to explore solutions. They propose that first, we must reclaim ownership and control over our data; second, we should use this data to collaboratively train an AI model that truly serves ordinary people.
This idea may seem idealistic, but history tells us that every technological revolution begins with a "crazy" concept. Today, a new public chain project called "Vana" is turning this concept into reality. As the first decentralized data liquidity network, Vana aims to transform your data into freely circulating tokens, thereby promoting a truly user-controlled decentralized AI.
Founders of Vana and the Origin of the Project
In fact, the birth of Vana can be traced back to a classroom at the Massachusetts Institute of Technology (MIT) Media Lab, where two young individuals with dreams of changing the world—Anna Kazlauskas and Art Abal—met.
Left: Anna Kazlauskas; Right: Art Abal
Anna Kazlauskas majored in computer science and economics at MIT, and her interest in data and cryptocurrency dates back to 2015. At that time, she was involved in early Ethereum mining, which gave her a profound understanding of the potential of decentralized technology. Subsequently, Anna conducted data research at international financial institutions such as the Federal Reserve, European Central Bank, and World Bank, which made her realize that in the future world, data would become a new form of currency.
Meanwhile, Art Abal was pursuing a master's degree in public policy at Harvard University and conducting in-depth research on data impact assessment at the Belfer Center for Science and International Affairs. Before joining Vana, Art led innovative data collection methods at the AI training data provider Appen, which significantly contributed to the birth of many generative AI tools today. His insights into data ethics and AI accountability infused Vana with a strong sense of social responsibility.
When Anna and Art met in a course at the MIT Media Lab, they quickly discovered their shared passion for data democratization and user data rights. They realized that to truly address the issues of data ownership and AI fairness, a new paradigm was needed—a system that would allow users to genuinely control their own data.
It was this shared vision that prompted them to co-found Vana. Their goal is to create a revolutionary platform that not only fights for data sovereignty for users but also ensures that users can derive economic benefits from their own data. Through innovative mechanisms like the Data Liquidity Pool (DLP) and Proof of Contribution system, Vana enables users to securely contribute private data and collectively own and benefit from the AI models trained on this data, thus promoting user-led AI development.
Vana's vision has quickly gained recognition in the industry. To date, Vana has announced that it has completed a total of $25 million in funding, including a $5 million strategic round led by Coinbase Ventures, an $18 million Series A round led by Paradigm, and a $2 million seed round led by Polychain. Other notable investors include Casey Caruso, Packy McCormick, Manifold, GSR, DeFiance Capital, and more.
In a world where data is the new oil, the emergence of Vana undoubtedly provides us with an important opportunity to reclaim data sovereignty. So, how does this promising project operate? Let’s delve into Vana's technical architecture and innovative concepts.
Vana's Technical Architecture and Innovative Concepts
Vana's technical architecture can be described as a meticulously designed ecosystem aimed at achieving data democratization and maximizing value. Its core components include Data Liquidity Pools (DLP), Proof of Contribution mechanisms, Nagoya Consensus, user self-hosted data, and a decentralized application layer. These elements collectively build an innovative platform that protects user privacy while unlocking the potential value of data.
- Data Liquidity Pool (DLP): The Foundation of Data Valuation
The Data Liquidity Pool is the fundamental unit of the Vana network, akin to "liquidity mining" for data. Each DLP is essentially a smart contract specifically designed to aggregate certain types of data assets. For example, the Reddit Data DAO (r/datadao) is a successful DLP case that has attracted over 140,000 Reddit users, aggregating users' Reddit posts, comments, and voting history.
After users submit data to the DLP, they can earn specific token rewards for that DLP, such as the RDAT token for the Reddit Data DAO (r/datadao). These tokens not only represent the user's contribution to the data pool but also grant users governance rights and future profit-sharing rights for the DLP. Notably, Vana allows each DLP to issue its own tokens, providing a more flexible value capture mechanism for different types of data assets.
In Vana's ecosystem, the top 16 DLPs can also receive additional VANA token emission rewards, further stimulating the formation and competition of high-quality data pools. In this way, Vana cleverly transforms scattered personal data into liquid digital assets, laying the groundwork for data valuation and liquidity.
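The DLP mechanics above can be sketched in a few lines of Python. Everything here is illustrative rather than Vana's actual contract logic: the class name, the epoch emission of 1,000 tokens, and the pro-rata split by contribution score are all assumptions made for the example.

```python
# Illustrative sketch of a Data Liquidity Pool: contributors earn the
# DLP's own token (e.g. RDAT) in proportion to their validated
# contribution scores. Not Vana's real contract code.

class DataLiquidityPool:
    def __init__(self, token_symbol: str, reward_per_epoch: float):
        self.token_symbol = token_symbol
        self.reward_per_epoch = reward_per_epoch  # assumed emission schedule
        self.scores = {}    # contributor address -> contribution score
        self.balances = {}  # contributor address -> DLP token balance

    def submit(self, contributor: str, score: float) -> None:
        """Record a validated contribution score for a contributor."""
        self.scores[contributor] = self.scores.get(contributor, 0.0) + score

    def distribute_epoch_rewards(self) -> None:
        """Split this epoch's token emission pro rata by contribution score."""
        total = sum(self.scores.values())
        if total == 0:
            return
        for addr, score in self.scores.items():
            reward = self.reward_per_epoch * score / total
            self.balances[addr] = self.balances.get(addr, 0.0) + reward
        self.scores.clear()  # reset scores for the next epoch

pool = DataLiquidityPool("RDAT", reward_per_epoch=1000.0)
pool.submit("alice", 3.0)
pool.submit("bob", 1.0)
pool.distribute_epoch_rewards()
print(pool.balances)  # {'alice': 750.0, 'bob': 250.0}
```

The key design point the sketch captures is that rewards are relative: a contributor's payout depends on everyone else's scores in the same epoch, which is what makes the pool behave like "liquidity mining" for data.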
- Proof of Contribution: Accurate Measurement of Data Value
Proof of Contribution is the key mechanism that Vana employs to ensure data quality. Each DLP can customize a unique Proof of Contribution function based on its characteristics. This function not only verifies the authenticity and integrity of the data but also assesses the contribution of the data to the performance improvement of AI models.
Taking the ChatGPT Data DAO as an example, its Proof of Contribution encompasses four key dimensions: authenticity, ownership, quality, and uniqueness. Authenticity is ensured by verifying the data export links provided by OpenAI; ownership is validated through users' email verification; quality is assessed by using an LLM to score randomly sampled conversations; and uniqueness is determined by calculating the feature vectors of the data and comparing them with existing data.
This multi-dimensional assessment ensures that only high-quality, valuable data is accepted and rewarded. Proof of Contribution serves not only as the basis for data pricing but also as a key guarantee for maintaining the overall data quality of the ecosystem.
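A four-dimension check like the one described for the ChatGPT Data DAO could be sketched as follows. The URL prefix, the equal weighting of quality and uniqueness, and the pass/fail gating on authenticity and ownership are all assumptions for illustration, not the DAO's published scoring function.

```python
# Hedged sketch of a Proof of Contribution function covering the four
# dimensions named above: authenticity, ownership, quality, uniqueness.
# Thresholds, weights, and the URL check are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Submission:
    export_url: str       # link to the provider's data export (authenticity)
    email_verified: bool  # whether the contributor passed email checks (ownership)
    quality_score: float  # e.g. an LLM's 0-1 score on sampled conversations
    similarity: float     # 0-1 similarity to data already in the pool

def proof_of_contribution(sub: Submission) -> float:
    """Return a 0-1 contribution score; 0.0 means the submission is rejected."""
    # Authenticity: hard gate on the export link's origin (illustrative check).
    if not sub.export_url.startswith("https://chat.openai.com/"):
        return 0.0
    # Ownership: hard gate on email verification.
    if not sub.email_verified:
        return 0.0
    # Uniqueness: near-duplicates of existing data are worth less.
    uniqueness = 1.0 - sub.similarity
    # Quality and uniqueness combined with equal weights (an assumption).
    return 0.5 * sub.quality_score + 0.5 * uniqueness
```

Treating authenticity and ownership as hard gates while blending quality and uniqueness into a graded score mirrors the logic in the text: fake or stolen data earns nothing, and among genuine data, fresher and better data earns more.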
- Nagoya Consensus: Decentralized Data Quality Assurance
Nagoya Consensus is the heart of the Vana network, drawing inspiration from and improving upon Bittensor's Yuma Consensus. The core idea of this mechanism is to collectively assess data quality through a group of validating nodes, using a weighted average to arrive at a final score.
A further innovation is that validating nodes are required not only to assess data but also to score the rating behavior of other validating nodes. This "dual-layer assessment" mechanism greatly enhances the fairness and accuracy of the system. For example, if a validating node gives a high score to obviously low-quality data, other nodes will penalize this misconduct with a punitive score.
Every 1800 blocks (approximately 3 hours) constitutes a cycle, during which the system allocates corresponding rewards to validating nodes based on the comprehensive scores from that period. This mechanism not only incentivizes validating nodes to remain honest but also quickly identifies and eliminates bad behavior, thereby maintaining the healthy operation of the entire network.
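The two layers described above — a weighted average over validator scores, plus penalties for validators who deviate from consensus — can be sketched like this. The deviation tolerance of 0.3 and the weight-halving penalty are invented for the example; the source does not specify Nagoya Consensus's actual parameters.

```python
# Illustrative sketch of the Nagoya Consensus idea: the final data score
# is a weighted mean of validator scores, and validators whose ratings
# deviate far from consensus lose weight in later rounds. The tolerance
# and penalty factor are assumptions, not documented parameters.

def consensus_score(scores: dict, weights: dict) -> float:
    """Weighted mean of validator scores for one data submission."""
    total_weight = sum(weights.values())
    return sum(scores[v] * weights[v] for v in scores) / total_weight

def update_weights(scores: dict, weights: dict, consensus: float,
                   tolerance: float = 0.3) -> dict:
    """Dual-layer assessment: halve the weight of outlier validators."""
    new_weights = {}
    for v, s in scores.items():
        if abs(s - consensus) > tolerance:
            new_weights[v] = weights[v] * 0.5  # punitive halving (assumed)
        else:
            new_weights[v] = weights[v]
    return new_weights

scores = {"v1": 0.10, "v2": 0.15, "v3": 0.90}   # v3 overrates low-quality data
weights = {"v1": 1.0, "v2": 1.0, "v3": 1.0}
first = consensus_score(scores, weights)          # ~0.383, dragged up by v3
weights = update_weights(scores, weights, first)  # v3's weight halved to 0.5
second = consensus_score(scores, weights)         # ~0.28, closer to honest scores
```

Repeating the update over many epochs shrinks a dishonest validator's influence toward zero, which is the self-correcting behavior the dual-layer design aims for.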
- Non-Custodial Data Storage: The Last Line of Privacy Protection
One of Vana's significant innovations lies in its unique data management approach. In the Vana network, users' raw data is never truly "on-chain"; instead, users choose their own storage locations, such as Google Drive, Dropbox, or even personal servers running on their MacBooks.
When users submit data to the DLP, they are actually providing a URL pointing to the encrypted data and an optional content integrity hash. This information is recorded in Vana's data registration contract. When validators need to access the data, they request a decryption key, then download and decrypt the data for verification.
This design cleverly addresses the issues of data privacy and control. Users always maintain complete control over their data while being able to participate in the data economy. This not only ensures the security of the data but also opens up possibilities for broader future data application scenarios.
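The flow just described — encrypt locally, store anywhere, register only a URL and an integrity hash on-chain — can be sketched end to end. The on-chain registry is mocked as a dictionary, and the XOR "cipher" is a toy stand-in for real encryption; both are assumptions made to keep the example self-contained.

```python
# Sketch of the non-custodial flow: the user encrypts data locally,
# uploads it to storage of their choice, and the registration contract
# records only a pointer plus an integrity hash. The registry dict and
# the XOR cipher are illustrative stand-ins, not Vana's implementation.

import hashlib
import secrets

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy XOR stream cipher (same function encrypts and decrypts)."""
    keystream = (key * (len(data) // len(key) + 1))[:len(data)]
    return bytes(a ^ b for a, b in zip(data, keystream))

registry = {}  # stand-in for Vana's data registration contract

def register_data(user: str, raw: bytes, storage_url: str, key: bytes) -> bytes:
    ciphertext = xor_cipher(raw, key)
    # Only the pointer and the content integrity hash are recorded on-chain.
    registry[user] = {
        "url": storage_url,
        "sha256": hashlib.sha256(ciphertext).hexdigest(),
    }
    return ciphertext  # the user uploads this blob to their chosen storage

def validator_verify(user: str, fetched: bytes, key: bytes) -> bytes:
    """Validator checks integrity against the registry, then decrypts."""
    record = registry[user]
    assert hashlib.sha256(fetched).hexdigest() == record["sha256"]
    return xor_cipher(fetched, key)

key = secrets.token_bytes(32)
blob = register_data("alice", b"reddit export", "https://drive.example/abc", key)
assert validator_verify("alice", blob, key) == b"reddit export"
```

Because only the ciphertext hash and URL live on-chain, the user can revoke access by moving or deleting the stored file, which is the control property the text emphasizes.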
- Decentralized Application Layer: Diversified Realization of Data Value
At the top of Vana is an open application ecosystem. Here, developers can utilize the data liquidity accumulated by DLPs to build various innovative applications, while data contributors can derive actual economic value from these applications.
For example, a development team might train a specialized AI model based on the data from the Reddit Data DAO. Users who contributed data can not only use the model once training is complete but also receive a share of the profits generated by the model according to their contribution ratio. In fact, such AI models have already been developed; for details, you can read “Why the Old Token r/datadao is Reviving After Hitting Bottom?”.
This model not only incentivizes more contributions of high-quality data but also creates a truly user-led AI development ecosystem. Users transition from mere data providers to co-owners and beneficiaries of AI products.
In this way, Vana is reshaping the landscape of the data economy. In this new paradigm, users shift from passive data providers to active participants and co-beneficiaries in building the ecosystem. This not only creates new channels for individuals to capture value but also injects new vitality and innovative momentum into the entire AI industry.
Vana's technical architecture not only addresses core issues in the current data economy, such as data ownership, privacy protection, and value distribution, but also paves the way for future data-driven innovations. As more data DAOs join the network and more applications are built on the platform, Vana has the potential to become the infrastructure for the next generation of decentralized AI and data economy.
Satori Testnet: Vana's Public Testing Ground
With the launch of the Satori testnet on June 11, Vana showcased the prototype of its ecosystem to the public. This platform serves not only as a technical validation site but also as a rehearsal for future mainnet operation modes. Currently, the Vana ecosystem offers participants three main pathways: running DLP validating nodes, creating new DLPs, or submitting data to existing DLPs to participate in "data mining."
- Running DLP Validating Nodes
Validating nodes are the gatekeepers of the Vana network, responsible for verifying the quality of data submitted to the DLP. Running a validating node requires not only technical capabilities but also sufficient computing resources. According to Vana's technical documentation, the minimum hardware requirements for a validating node are 1 CPU core, 8GB RAM, and 10GB of high-speed SSD storage.
Users interested in becoming validators need to first choose a DLP and then register as a validator through that DLP's smart contract. Once registration is approved, validators can run validating nodes specific to that DLP. Notably, validators can run nodes for multiple DLPs simultaneously, but each DLP has its unique minimum staking requirements.
- Creating New DLPs
For users with unique data resources or innovative ideas, creating new DLPs is an attractive option. Creating a DLP requires a deep understanding of Vana's technical architecture, particularly the Proof of Contribution and Nagoya Consensus mechanisms.
The creators of new DLPs need to design specific data contribution goals, validation methods, and reward parameters. At the same time, they must implement a Proof of Contribution function that accurately assesses data value. While this process is complex, Vana provides detailed templates and documentation support.
- Participating in Data Mining
For most users, submitting data to existing DLPs to participate in "data mining" may be the most straightforward way to get involved. Currently, there are 13 DLPs that have been officially recommended, covering various fields from social media data to financial prediction data.
- Finquarium: Aggregating financial prediction data.
- GPT Data DAO: Focused on ChatGPT chat data exports.
- Reddit Data DAO: Concentrating on Reddit user data, officially launched.
- Volara: Focused on collecting and utilizing Twitter data.
- Flirtual: Collecting dating data.
- ResumeDataDAO: Concentrating on LinkedIn data exports.
- SixGPT: Collecting and managing LLM chat data.
- YKYR: Collecting Google Analytics data.
- Sydintel: Crowdsourcing intelligence to reveal the dark corners of the internet.
- MindDAO: Collecting time series data related to user happiness.
- Kleo: Building the world's most comprehensive browsing history dataset.
- DataPIG: Focusing on token investment preference data.
- ScrollDAO: Collecting and utilizing Instagram data.
Some of these DLPs are still in development, while others are already online, but all are in the pre-mining stage. Users can only officially submit data for mining once the mainnet is launched. However, users can lock in participation eligibility in various ways in advance. For example, users can participate in relevant challenge activities in the Vana Telegram App or pre-register on the official websites of various DLPs.
Conclusion
The emergence of Vana signifies a paradigm shift in the data economy. In the current wave of AI, data has become the "oil" of the new era, and Vana seeks to reshape the extraction, refining, and distribution model of this resource.
Essentially, Vana is building a solution to the "tragedy of the commons" for data. Through clever incentive design and technological innovation, it transforms personal data—a seemingly infinite supply that is difficult to monetize—into a manageable, priceable, and tradable digital asset. This not only opens new avenues for ordinary users to participate in the distribution of AI dividends but also provides a possible blueprint for the development of decentralized AI.
However, Vana's success still faces many uncertainties. Technically, it needs to find a balance between openness and security; economically, it must prove that its model can generate sustained value; and socially, it must address potential data ethics and regulatory challenges.
On a deeper level, Vana represents a reflection and challenge to the existing data monopoly and AI development model. It raises an important question: in the age of AI, do we choose to continue reinforcing the existing data oligopoly, or do we attempt to build a more open, fair, and diverse data ecosystem?
Regardless of whether Vana ultimately succeeds, its emergence provides us with a window to rethink data value, AI ethics, and technological innovation. In the future, projects like Vana may become important bridges connecting Web3 ideals with AI realities, pointing the way for the next stage of development in the digital economy.