The mainnet is about to launch. How does the distributed storage protocol EpiK turn garbage data into valid data?

ChainNews
2021-08-09 11:26:19
EpiK Protocol integrates data labeling, distributed storage, and data application functions, attempting to solve the problem of the market's lack of effective data.

Written by: Zeo Zhang

Source: ChainNews

Tencent founder Ma Huateng admitted at the 2017 "Entering the Smart New Era" China (Shenzhen) IT Leaders Summit:

Currently, a lot of big data is garbage data because it lacks labels. No matter how good the algorithm is, it cannot compute anything useful from such data. Data cleaning and labeling are very difficult, and we even have to spend a great deal of manpower cleaning the data before letting AI learn from it.

This statement reveals a core difficulty in the development of artificial intelligence. After decades of iteration in internet computing, the accumulation of algorithms and computing power has pushed artificial intelligence to a new stage, but the lack of high-quality, effective data has become one of the major obstacles to its development. To address this problem, EpiK Protocol, a distributed storage protocol for AI data, proposes a solution built on blockchain technology.

On August 15, 2021, the EpiK Protocol, which has been running smoothly on the testnet for a year, will officially launch its mainnet. As a project that integrates data labeling, distributed storage, and data application functions for the first time, EpiK Protocol attempts to solve the market's lack of effective data and build a distributed storage protocol for AI data that is co-built, shared, and beneficial to all.


Why is there a lack of effective data?

On one hand, platforms tacitly allow, or even encourage, fabricated traffic ("brushing"), resulting in a proliferation of useless data.

In the traditional internet industry, brushing is a common phenomenon: more than 90% of newly opened online stores brush orders to attract traffic; even WeChat accounts with over a million subscribers routinely inflate read counts to satisfy advertisers; worse still, one travel platform was exposed for using a combination of bots and humans to impersonate users and post tens of millions of fake reviews, distorting users' genuine evaluations of products.

In the blockchain industry, which has always touted openness, transparency, and on-chain traceability, data brushing is still not uncommon. The star project in the distributed storage field, Filecoin, has also faced doubts about "ineffective data": when the Filecoin network was just launched, high mining rewards attracted a large number of miners, and some miners even used external programs to fill virtual data or package worthless garbage data.

The rapid increase in Filecoin's storage computing power led to a sudden explosion in the amount of stored data. Additionally, since the Filecoin network initially could not distinguish between stored data, there was very little real and effective data, resulting in a waste of physical storage and negatively impacting the overall development of the Filecoin ecosystem.

On the other hand, the high cost of data processing burdens most artificial intelligence systems.

As we all know, artificial intelligence requires continuous deep learning, which necessitates a massive amount of data support.

A large user base is active on the internet every day, generating a vast array of data. However, this data cannot be used directly. Deep learning in artificial intelligence requires obtaining datasets, labeling data, etc., among which data labeling incurs significant manpower costs.

The widespread application of deep learning networks requires a large amount of labeled data for training to achieve the desired results. However, in the era of big data, although there is an abundance of data, the vast majority remains unlabeled, and labeling these training data requires human intervention.

The higher the quality requirements for data, the more detailed the labeling needs become, which in turn raises the requirements for the quality and expertise of the labelers, leading to higher corresponding costs.

For a long time, this data has been processed by specialized data labeling companies (such as Amazon Mechanical Turk) for use in fields like artificial intelligence. However, the three-party collaboration among users, data processing companies, and data demanders makes acquiring valuable data extremely expensive.


Labeling, Storage, Sales: EpiK Protocol's One-Stop Data Service

The EpiK Protocol ecosystem introduces three roles: domain experts, bounty hunters, and data enterprises, aiming to build a decentralized, large-scale, co-built, shared, and beneficial AI data storage protocol. Through decentralized storage technology IPFS, decentralized autonomous organizations (DAO), and token economic models, it organizes and incentivizes global community members to compile human knowledge from various fields into usable AI data and continuously update this eternal knowledge base.

In terms of data labeling, EpiK Protocol connects C-end users to launch an AI data labeling system.

"Domain experts" design AI data formats for different fields and publish data labeling tasks; anyone can register as a "bounty hunter," participate in data labeling, become an AI teacher, and earn EPK token rewards.

After completing data labeling, "bounty hunters" return the processed data, and "domain experts" verify the AI data in their respective fields to earn EPK token rewards. Additionally, "domain experts" will optimize AI data formats based on the data results, creating a virtuous cycle that continuously improves data quality.


In terms of data storage, EpiK Protocol launches an AI data storage system.

Data that has been labeled and verified will be uploaded by "domain experts" to the AI data storage system for distributed storage, and devices participating in data storage can also earn EPK token rewards.

In terms of data sales, data enterprises can access data by staking EPK and pay to download valid data from the AI data storage system.

Since EpiK Protocol's AI data labeling system is aimed directly at C-end users, it eliminates intermediary data labeling companies, simplifying data processing and circulation and thus reducing costs. For example, a usable piece of AI dialect-speech data costs about 12 yuan in the traditional market, versus about 2 yuan in the EpiK Protocol system, roughly one sixth of the traditional cost.

More importantly, because the EpiK Protocol system has "domain experts" from various industries overseeing data governance, the effective data generated by the EpiK Protocol system can more accurately meet the data needs of different AI fields.


An Open Economic Model Collaborating with B-end, C-end, and Industry Experts

Unlike the current distributed storage business model, which focuses mainly on B-end archival data storage services, EpiK Protocol is a decentralized, collaborative AI data storage protocol that works with B-end enterprises, C-end users, and domain experts. In cost control, revenue generation, and service experience, it aims to rival centralized internet giants.

C-end Users: Lower Data Labeling Threshold

EpiK Protocol has developed the AI data collection application "Knowledge Continent" for C-end users, lowering the data labeling threshold while enhancing its fun aspect. The cartoon interface and concise layout turn the tedious and complex task of data labeling into an enjoyable game, organizing its global community members to co-build a large-scale open AI database through gamification.


Various industries can create AI data types in "Knowledge Continent," including finance, medicine, law, social media, e-commerce, etc. In the future, as Knowledge Continent develops and is utilized, companies with data needs can choose to collaborate here to collect, organize, and process data together.

Additionally, EpiK Protocol's labeling efficiency is higher. In the three weeks since its AI data labeling system went live, EpiK Protocol has labeled 17,272 valid data points with complete metadata, each manually verified up to 10 times. Compared to traditional labeling methods, EpiK Protocol's labeling efficiency is nearly 10 times higher.


Most importantly, users participating in EpiK Protocol data labeling can earn higher rewards. In traditional data labeling models, data processors merely serve as manual labor and do not enjoy data profit-sharing rights. However, in EpiK Protocol, the EPK earned for contributing to data effectively grants data equity, allowing contributors to share in the profit dividends from data usage in the future. As data demand increases, the demand for EPK will rise, leading to appreciation in EPK value, benefiting EPK holders.

B-end Users: Incentivizing Effective Data

The EpiK Protocol AI data storage system adopts a classic 1 + 3 configuration, that is, 1 daemon + 3 miners (8 cores, 16 GB RAM, 250 GB SSD, 3 TB HDD, 15 Mbps bandwidth). Compared to Filecoin, the minimum computing power required to participate in block production in the EpiK Protocol AI data storage system is 0, storage is free of charge, and there is no need to specify nodes; replicas and storage duration are unlimited by default, making better use of each idle storage device.

Most importantly, while Filecoin allows for the storage of useless data to earn computing power, in the EpiK Protocol storage system, only data verified by "domain experts" can earn computing power. This not only ensures high-quality data but also further curbs the negative impact of ineffective data wasting storage space.

Project Team

EpiK Protocol boasts top industry advisors and powerful investment institutions. Notable AI scientist Ben Goertzel, founder of SingularityNET and the chief scientist behind the world's first robot citizen Sophia, serves as an advisor to EpiK Protocol, assisting in promoting the European and American data markets and helping to build a high-quality AI data ecosystem.


In terms of financing, EpiK Protocol has garnered the favor of institutions including FBG Capital, JACKDAW, 1475, ChainUp Capital, and 7 O'clock Capital, helping the distributed storage of AI data enter the public eye with new momentum.

Mainnet Launching Soon

According to the latest news from the team, EpiK "Mainnet 1.0 Rosetta" will officially launch at 12:00 PM on August 15, 2021. This date also marks the one-year anniversary of the EpiK testnet launch. Currently, the testnet 5.0 has over 60,000 testing nodes and is producing blocks stably.

As the mainnet approaches, mining has become one of the focal points for EpiK Protocol users. As the incentive token for the EpiK Protocol ecosystem, EPK has a total issuance of 1 billion, with the following specific distribution rules:

  1. Genesis Team: 5%, released 1/16 every 90 days;
  2. Foundation: 5%, released 1/4 every 90 days;
  3. Investors: 20%, released 1/7 every 90 days;
  4. Community: 70%, block production rewards decrease every 90 days, halving over 4 years, fully released over 50 years.
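The non-community tranches above follow simple linear vesting. As a rough sketch (not EpiK code), the following assumes a total supply of 1 billion EPK, that each tranche unlocks its stated fraction at the end of every 90-day period, and that the function and tranche names are purely illustrative:

```python
# Illustrative EPK vesting calculator based on the schedule quoted above.
# Assumptions: 1,000,000,000 EPK total supply; releases occur in steps at
# the end of each 90-day period. Names here are made up for the example.

TRANCHES = {
    # name: (allocation in EPK, fraction released per 90-day period)
    "genesis_team": (50_000_000, 1 / 16),   # 5% of supply, ~4 years to vest
    "foundation":   (50_000_000, 1 / 4),    # 5% of supply, ~1 year to vest
    "investors":    (200_000_000, 1 / 7),   # 20% of supply, ~21 months
}

def released(tranche: str, days: int) -> float:
    """EPK unlocked from `tranche` after `days` days, capped at the allocation."""
    allocation, per_period = TRANCHES[tranche]
    periods = days // 90  # completed 90-day periods
    return min(allocation, allocation * per_period * periods)

# After one year (four 90-day periods) the foundation tranche is fully vested,
# while the genesis team has unlocked only 4/16 of its allocation.
for name in TRANCHES:
    print(name, released(name, 360))
```

Under these assumptions, the genesis team's 1/16-per-period schedule takes 16 × 90 = 1,440 days (about 4 years) to vest fully, mirroring the community tranche's 4-year halving cadence.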


EpiK Protocol has two major systems: AI data labeling and AI data storage, corresponding to two main participation methods:

The first type is to participate in labeling AI data and become an EPK bounty hunter. By utilizing spare time to label data in different AI fields, the more tasks completed, the higher the EPK earnings. Those who answer questions seriously also have the chance to win knowledge badge NFTs, which can later be used to participate in EPK airdrop events.

The second type is to participate in storing AI data and become an EPK storage node. By utilizing idle storage devices, one can participate in storing valid AI data. Each storage node needs to complete a basic stake of 1,000 EPK to have block production rights. Storage nodes are randomly selected for block production opportunities, but the probability of being selected is linked to the amount of successfully stored data: the more data stored, the higher the probability of selection, and the top 100 storage providers for the same file enjoy double computing power.

Only data verified by domain experts counts as valid storage and earns storage nodes computing power. Therefore, to take on more valid storage, additional traffic collateral is required: 1 EPK covers accessing 10 MiB of data or packaging 10 MiB of data. Currently, the unlocking period for withdrawing the basic stake is 0 days, while the unlocking period for withdrawing traffic collateral is 3 days.
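The staking arithmetic above is straightforward. This small sketch combines the two figures quoted in the article (a 1,000 EPK basic stake per node, plus 1 EPK of traffic collateral per 10 MiB of data); the function names are invented for illustration and are not EpiK APIs:

```python
# Illustrative stake arithmetic for an EpiK storage node, per the figures
# quoted above: basic stake of 1,000 EPK, plus 1 EPK per 10 MiB of data
# packaged or accessed. Function names are hypothetical.

BASIC_STAKE_EPK = 1_000      # required for block production rights
MIB_PER_EPK = 10             # 1 EPK of traffic collateral covers 10 MiB

def traffic_collateral_epk(data_mib: float) -> float:
    """Traffic collateral (EPK) needed to package or access `data_mib` MiB."""
    return data_mib / MIB_PER_EPK

def total_stake_epk(data_mib: float) -> float:
    """Basic stake plus traffic collateral for a node storing `data_mib` MiB."""
    return BASIC_STAKE_EPK + traffic_collateral_epk(data_mib)

# A node packaging 1 GiB (1,024 MiB) of expert-verified data:
print(total_stake_epk(1024))  # 1102.4
```

Note that only expert-verified data contributes to the computing power that weights a node's block-production odds, so collateral spent packaging unverified data earns nothing.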

Conclusion

EpiK Protocol has also begun exploring distributed governance, launching EpiK DAO on July 20. As the first DAO governance model in the distributed storage track, community users can leverage EpiK DAO to participate in the dynamic adjustment of EpiK ecosystem resources, effectively ensuring the sustainable development of the EpiK community and addressing potential resource misallocation issues in the future.

As the era of Web 3.0 approaches, the importance of data is increasingly highlighted. EpiK Protocol has created a low-threshold, high-efficiency data revenue-sharing closed loop from data labeling to distributed data storage and enterprise data application integration. It will be exciting to see how EpiK Protocol realizes rich application scenarios in the future.

ChainCatcher reminds readers to view blockchain rationally, enhance risk awareness, and be cautious of various virtual token issuances and speculations. All content on this site is solely market information or related party opinions, and does not constitute any form of investment advice.