Yao Qian: Data Custody Promotes Data Security and Sharing
Source: Yao Qian, China Finance Magazine
In the era of the digital economy, data has become a new type of production factor and a fundamental, strategic resource driving economic transformation and upgrading. Turning data into data assets and enabling their orderly circulation and compliant use is an important issue for the development of the digital economy. In recent years, China has successively promulgated and implemented relevant laws and regulations such as the "Cybersecurity Law," "Data Security Law," and "Personal Information Protection Law," initially establishing a legal guarantee system for data. On December 2, 2022, the Central Committee of the Communist Party of China and the State Council issued the "Opinions on Building a Data Basic System to Better Play the Role of Data Elements," proposing twenty policy measures covering a property rights system for data elements, a circulation and transaction system, a benefit distribution system, and a governance system. These programmatic documents provide important guidance for exploring concrete implementation plans for the confirmation of data rights and for the pricing, circulation, transaction, use, distribution, and governance of data.
Dilemmas in Data Rights Distribution
As data is a brand-new production factor, how it should be priced and how its benefits should be distributed has attracted the attention of many researchers and industry practitioners. In February 2022, Turing Award winner and Chinese Academy of Sciences academician Yao Qizhi released a data element pricing algorithm and benefit distribution platform. He regards data pricing as a novel interdisciplinary field involving economics, computational science, and artificial intelligence, and requiring theoretical foundations in information economics, game theory, and computational economics. Information economics studies the value and role of information in economic activities; cooperative game theory can provide a theoretical basis for multi-party modeling of data; and computational economics covers the joint modeling of data elements and the calculation of computing costs. Yao Qizhi's research shows that, based on cooperative game theory, the contribution of different data to a decision-making model can be quantified, with data elements that contribute more being more valuable. By coupling the utility functions of economic agents with these contributions to the decision-making model, the economic value of different data elements can be assessed quantitatively in a reasonable and fair way, and data elements can thereby be priced and their benefits distributed. This is the pricing mechanism for data elements; in practice, the market mechanism must also play its role so that data resources are priced effectively and allocated reasonably. To this end, it is crucial to straighten out the relationships among all parties.
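To make the cooperative-game intuition concrete, the sketch below (an illustration only, not Yao Qizhi's published algorithm) computes Shapley values for a toy set of three data sources, given an assumed utility function v(S) that measures how well a decision model performs when built on a coalition S of sources; each source's Shapley value can then be read as its share of the total value to be distributed. The utility numbers are invented for illustration.

```python
from itertools import permutations

# Toy utility v(S): the value a decision model achieves using coalition S of
# data sources. The numbers are illustrative assumptions, not measurements.
UTILITY = {
    frozenset(): 0.0,
    frozenset({"A"}): 4.0,
    frozenset({"B"}): 3.0,
    frozenset({"C"}): 1.0,
    frozenset({"A", "B"}): 8.0,
    frozenset({"A", "C"}): 5.5,
    frozenset({"B", "C"}): 4.5,
    frozenset({"A", "B", "C"}): 10.0,
}

def shapley_values(sources, v):
    """Average each source's marginal contribution over all join orders."""
    values = {s: 0.0 for s in sources}
    orders = list(permutations(sources))
    for order in orders:
        coalition = frozenset()
        for s in order:
            values[s] += v[coalition | {s}] - v[coalition]
            coalition = coalition | {s}
    return {s: total / len(orders) for s, total in values.items()}

prices = shapley_values(["A", "B", "C"], UTILITY)
total = UTILITY[frozenset({"A", "B", "C"})]
for source, phi in prices.items():
    print(f"data source {source}: Shapley value {phi:.2f} "
          f"({phi / total:.0%} of total value {total})")
```

In practice, the utility of a coalition would have to be estimated by evaluating the decision model on that subset of data, and the exact computation would be replaced by sampling-based approximations, since the number of coalitions grows exponentially with the number of data sources.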
Data stakeholders can be divided into two levels: the first comprises data subjects, data processors, and data users, who are directly involved in data production and consumption; the second comprises regulatory agencies, the state, and international organizations, which are indirectly involved. In the business scenarios directly related to data production and consumption, data subjects generate raw data, typically including Know Your Customer (KYC) data and transaction detail data; data processors collect and control raw data and process it into data products and services, such as customer profiles and statistical analyses; and data users obtain data products and services from data processors for commercial purposes, including marketing and risk identification. In the scenarios indirectly related to data production and consumption, regulatory agencies supervise the industry according to their responsibilities, for example in anti-money laundering and anti-monopoly work; the state legislates on data governance, as in the Cybersecurity Law, the Data Security Law, and the Personal Information Protection Law, and controls cross-border data flows; and international organizations promote the establishment of global data standards, such as the data message standards ISO 8583 and ISO 20022.
Currently, the distribution of rights among data stakeholders exhibits many unreasonable features, chiefly that data processors monopolize data rights by leveraging their technological advantages and application scenarios. Data users obtain data products and services from data processors and pay for them, but because data processors monopolize data rights, data subjects receive no benefit from transferring their raw data, the state cannot collect the corresponding digital taxes, and regulatory agencies, lacking data, face difficulties in supervision and law enforcement. Furthermore, data processors often use their technological advantages to establish proprietary standards in order to retain data benefits, leading to data silos and monopolies.
Data Custody Infrastructure Reshaping Data Rights Distribution Pattern
Under the traditional model, data processors handle both data storage and data usage; under the new data custody model, the storage, usage, and management of data are separated, and data custodians provide public, trustworthy data storage and custody services for all parties. Storage is undertaken by specialized data custody institutions, initially focusing on high-value data and database logs and gradually transitioning to full data storage. Data processors collect and process data under regulatory conditions and provide data products and services to data users, and the processed data must likewise be stored uniformly by data custody institutions. Data custody also supports regulatory agencies and relevant national departments in preventing data abuse, monitoring cross-border data flows, enforcing the law, and collecting digital taxes.
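The following is a minimal sketch of how this separation of duties might look in code. The class and method names (DataCustodian, deposit, grant_access, audit_trail) are assumptions made for illustration, not part of any existing system: the custodian alone persists data, every access by a processor is mediated and logged, and regulators read the audit trail rather than the raw data.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AccessRecord:
    processor: str
    dataset: str
    purpose: str
    at: str

@dataclass
class DataCustodian:
    """Stores data on behalf of subjects; storage is separated from usage."""
    _store: dict = field(default_factory=dict)
    _audit: list = field(default_factory=list)

    def deposit(self, subject: str, dataset: str, records: list) -> None:
        # Data subjects (or processors, for processed products) entrust data here.
        self._store[(subject, dataset)] = list(records)

    def grant_access(self, processor: str, subject: str, dataset: str, purpose: str) -> list:
        # Usage is mediated: every read is logged before data is released.
        self._audit.append(AccessRecord(processor, dataset, purpose,
                                        datetime.now(timezone.utc).isoformat()))
        return list(self._store[(subject, dataset)])  # a copy, not the stored original

    def audit_trail(self) -> list:
        """Regulators inspect who used which data and why, without the raw data."""
        return list(self._audit)

custodian = DataCustodian()
custodian.deposit("subject-001", "kyc", [{"customer_id": "001", "risk_tier": "low"}])
rows = custodian.grant_access("processor-A", "subject-001", "kyc", "customer profiling")
print(rows)
print(custodian.audit_trail())
```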
The new data custody infrastructure replaces the traditional model centered on data controllers with a new type of production relationship centered on data itself, fundamentally altering the distribution pattern of data rights and helping to establish a fair pricing mechanism between data users and data processors (see Figure 1).
From the perspective of the data processing and service flow: data subjects entrust raw data to data custodians; data processors obtain the data, process it, and entrust the resulting data products to custodians as well; data custodians supervise data processors' data usage and service processes; and data processors provide data products and services to data users in a market-oriented manner.
From the perspective of the data rights distribution flow: data users consume data products and services and pay data custodians; data custodians distribute raw-data rights to data subjects according to the rules and value-added data rights to data processors; data custodians report regulatory data and cooperate with law enforcement as required by regulators; data custodians pay digital taxes as required by the state; and data custodians conduct data governance according to common standards.
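As an illustration of this distribution flow, the sketch below settles a single payment under assumed split ratios. The percentages are placeholders chosen for the example, not figures prescribed by the article: the custodian receives the user's payment, withholds the digital tax, and allocates the remainder between the data subject (raw-data rights), the data processor (value-added rights), and its own service fee.

```python
# Placeholder split ratios for illustration; actual shares would come from the
# pricing rules agreed among subjects, processors, custodians, and regulators.
DIGITAL_TAX_RATE = 0.05      # remitted by the custodian to the state
SUBJECT_SHARE = 0.40         # raw-data rights, paid to data subjects
PROCESSOR_SHARE = 0.55       # value-added rights, paid to data processors
CUSTODY_FEE = 1.0 - SUBJECT_SHARE - PROCESSOR_SHARE  # custodian's service fee

def settle(payment: float) -> dict:
    """Split one data-product payment received by the custodian."""
    tax = payment * DIGITAL_TAX_RATE
    distributable = payment - tax
    return {
        "digital_tax": round(tax, 2),
        "data_subject": round(distributable * SUBJECT_SHARE, 2),
        "data_processor": round(distributable * PROCESSOR_SHARE, 2),
        "custodian_fee": round(distributable * CUSTODY_FEE, 2),
    }

print(settle(1000.0))
# {'digital_tax': 50.0, 'data_subject': 380.0, 'data_processor': 522.5, 'custodian_fee': 47.5}
```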
International Practices in Data Custody
In recent years, international exploration in data custody has begun, achieving initial results in certain areas, with practices in copyright custody being of particular reference value.
To balance knowledge dissemination with copyright protection, the global non-profit organization Creative Commons has launched a licensing model that attempts to provide a free, simple, and standardized way of granting copyright permissions, allowing others to copy, distribute, and use works while ensuring that copyright is not infringed. There are six types of licenses. The most permissive allows reusers to distribute, adapt, and build upon the original work in any medium, including for commercial purposes, as long as they credit the author; the most restrictive only allows reusers to copy and distribute the work in unadapted form, for non-commercial purposes, and with attribution to the original author. Creative Commons has gathered educators, artists, technologists, legal experts, social activists, and international groups that support open knowledge sharing. They entrust the copyright of their works to content platforms that support Creative Commons licenses, allowing reusers to distribute, remix, adapt, and build upon the original works according to the specified rules. Platforms such as Wikipedia, Google, Bing, Flickr, and YouTube have integrated Creative Commons licenses, and over 1.4 billion works, covering fields such as literature and the arts, open education, and scientific research, in video or audio formats, have been entrusted to these platforms for open sharing under the licenses.
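The six Creative Commons licenses differ along three dimensions: whether commercial use is allowed, whether derivative works are allowed, and whether derivatives must be shared under the same terms; attribution is required by all six. The small Python mapping below restates this standard permission matrix for concreteness; the helper function is illustrative only.

```python
# The six standard Creative Commons licenses; attribution (BY) is required by all.
CC_LICENSES = {
    "CC BY":       {"commercial": True,  "derivatives": True,  "share_alike": False},
    "CC BY-SA":    {"commercial": True,  "derivatives": True,  "share_alike": True},
    "CC BY-NC":    {"commercial": False, "derivatives": True,  "share_alike": False},
    "CC BY-NC-SA": {"commercial": False, "derivatives": True,  "share_alike": True},
    "CC BY-ND":    {"commercial": True,  "derivatives": False, "share_alike": False},
    "CC BY-NC-ND": {"commercial": False, "derivatives": False, "share_alike": False},
}

def may_reuse(license_code: str, commercial: bool, adapting: bool) -> bool:
    """Check whether a given reuse (commercial? adapted?) is permitted."""
    terms = CC_LICENSES[license_code]
    if commercial and not terms["commercial"]:
        return False
    if adapting and not terms["derivatives"]:
        return False
    return True

# The "most permissive" and "most restrictive" licenses mentioned above:
print(may_reuse("CC BY", commercial=True, adapting=True))        # True
print(may_reuse("CC BY-NC-ND", commercial=True, adapting=False)) # False
```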
The license-based model of work custody and sharing effectively resolves the tension between protecting creators' rights and sharing knowledge openly, and the data custody idea proposed in this article is consistent with it. A concern, however, is that because the custodians of works include commercial platforms such as Google and YouTube, their profit-seeking nature may ultimately lead them away from the original intention of open knowledge sharing. To avoid such conflicts of commercial interest, a better solution for data custody is to entrust data either to trustworthy non-profit public institutions or to custodians built on trustworthy Web 3.0 technologies.
There are precedents for the former approach. The Public Library of Science (PLOS), established in 2001, is a non-profit organization aimed at promoting the open sharing of scientific journals worldwide. For over twenty years, PLOS has organized many influential journals for open sharing. Researchers can publish their results on PLOS after rigorous peer review, and the results can then be accessed freely and without restriction. In addition, PLOS stores the underlying data associated with published results in dedicated databases and publishes it alongside the research articles, ensuring that the data in the articles is verifiable, replicable, and reusable, which helps drive new scientific research. Overall, the publishing platform created by PLOS can be regarded as a trustworthy data custody infrastructure.
The latter idea is currently being actively explored. Blockchain technology has unique advantages in copyright confirmation and protection, as it can effectively avoid conflicts between commercial interests and public services without relying on specific institutions. Currently, Creative Commons is actively researching how to integrate the knowledge licensing model with Web 3.0 technology to better achieve free and open sharing of knowledge.
Conclusion
As trustees for all data subjects, data custody institutions can effectively ensure that data is secure, controllable, and efficiently utilized by managing data assets centrally. Just as front-end stock trading relies on back-end share registration and custody, data custody institutions serve as the back-end infrastructure for big data exchanges, together with those exchanges forming a complete big data infrastructure system. Data custody institutions can be industry alliances formed by relevant organizations to promote the joint construction and sharing of data; they can also use blockchain technology to achieve on-chain custody, rights confirmation, transaction, circulation, and rights distribution on consortium chains or managed public chains. Which approach is better remains to be explored and verified in practice.