2022 Year-End Summary: The Current State and Future of Decentralized Storage
Author: Jason, Puzzle Ventures
TL;DR
Separating non-core data from the main chain and storing it in a DSN (Decentralized Storage Network) has become a mainstream solution for addressing scalability, enhancing interoperability, and protecting privacy.
Filecoin has significant advantages in miner network size and storage cost, but Arweave's permanent storage offers a more reliable solution for NFTs and social applications.
L1 storage expansion networks can offer new solutions with higher interoperability than DSNs, but the model still needs time to be validated.
Future decentralized storage scenarios will include complete interoperability, data availability, security, and user/developer-friendly off-chain computing provided through middleware and APIs.
Development History of Decentralized Storage
Data storage has always been a cornerstone and a key focus area of the tech industry. Since companies such as Amazon, Microsoft, and Alibaba began offering centralized cloud storage, the market has grown to more than $100 billion (2021) and is highly concentrated among the top providers (Amazon 40%, Microsoft 22%, Alibaba Cloud 10%; source: Gartner). With the explosion of Web3, a number of decentralized storage providers have emerged to meet demands that the traditional cloud storage market cannot satisfy.
Although most dApps are still deployed on AWS, and more than 50% of Ethereum nodes were running on AWS before the Merge[1], Web3 projects have gradually recognized that connecting their back ends and metadata to decentralized storage is both important and inevitable. As defined by Ethereum.org, decentralized storage is a data-sharing system built on a P2P network in which each operator holds a portion of the overall data, and the full data can be reconstructed algorithmically[2].
While decentralized storage can address issues such as data privacy and censorship resistance, decentralized systems are still immature and have many urgent problems to solve. After repeated attempts at storing data on the main chain, the DSN (Decentralized Storage Network) has become the mainstream solution: it can scale storage capacity almost without limit while preserving data security and privacy.
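The idea that each operator holds only part of the data, yet the whole can be reconstructed, can be illustrated with a minimal sketch. It assumes a toy scheme of equal-sized shards plus a single XOR parity shard, so any one lost shard can be rebuilt. The function names are hypothetical, and real networks use stronger erasure codes and proofs than this.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, n_shards: int) -> List[bytes]:
    """Split data into n_shards equal pieces and append one XOR parity shard."""
    shard_len = -(-len(data) // n_shards)  # ceiling division
    padded = data.ljust(shard_len * n_shards, b"\x00")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(n_shards)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]


def reconstruct(shards: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the original data when at most one shard (data or parity) is missing."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("a single parity shard can only recover one lost shard")
    shards = list(shards)
    if missing:
        present = [s for s in shards if s is not None]
        # XOR of all surviving shards equals the missing one.
        shards[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity shard and the zero padding.
    return b"".join(shards[:-1])[:original_len]


if __name__ == "__main__":
    data = b"decentralized storage network"
    shards = split_with_parity(data, 4)   # 4 data shards + 1 parity, one per operator
    shards[2] = None                      # one operator goes offline
    print(reconstruct(shards, len(data)) == data)
```

Each shard would live with a different operator; losing any single operator still leaves enough information to recover the file.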
Table 1 Comparison of Blockchain Data Storage Solutions
For specific on-chain expansion solutions, please refer to the article: https://zhuanlan.zhihu.com/p/48078642
The author believes that the off-chain storage layer will eventually become key infrastructure in the Web3 ecosystem, handling data storage needs at both the protocol layer and the operational layer through a combination of hot and cold storage.
In the future, the scalability side of the "impossible triangle" (the blockchain trilemma) can be addressed by fully separating the storage layer while maintaining interoperability, with middleware and API services built on top of the cold storage layer improving the efficiency of data computation, retrieval, and delivery.
Current Status of Decentralized Storage: Classification and Scale
Decentralized storage essentially serves the application layer of the Web3 ecosystem, so its solutions are geared toward end users: handling data storage, computation, and retrieval more efficiently and at lower cost. In terms of classification, Arweave, Filecoin, and Storj are the three leading independent decentralized storage networks. Filecoin and Storj are more loosely decentralized P2P storage networks, while Arweave is a more tightly organized network of storage nodes.
On the other hand, storage aggregation networks of various types have become increasingly attractive gateways because they meet user needs for cost optimization and convenient storage, though they still rely on Arweave or Filecoin underneath. In addition, L1 expansion networks, represented by EthStorage, focus on designs that can call the main chain's DA (data availability) layer directly while providing additional computing power, integrating deeply with L1.
(Note: The figure lists mainly existing and new projects and excludes defunct ones.)
Source: Puzzle Ventures
Moreover, computing networks and coordinating middleware, whether built on top of underlying storage networks or independently, are also part of the decentralized storage ecosystem, but this article does not focus on them because they are primarily ecosystem dApps.
From the perspective of scale, Filecoin leads significantly in revenue, FDV, and market share, and its storage utilization continues to rise. However, quarterly revenue of just over three million dollars is small by either Web2 or Web3 standards, especially for a network in a near-monopoly position. The decentralized storage sector as a whole therefore still has considerable room to grow.
Table 2 Decentralized Storage Q3 2022 Revenue and Scale
Source: Messari, Web3 Index, CoinGecko
Meanwhile, the quarterly revenues of Arweave and Storj are still in the hundreds of thousands of dollars, which works out to daily revenue of just over a thousand dollars, far below their potential market size. This is partly a result of the overall slump during the bear market, and partly because current DSN storage solutions have not yet reached the tipping point at which they fit Web3 demand well. Going forward, the DSN market will need more middleware and aggregators to bring in user traffic, while the networks themselves continue to adjust and optimize their costs and value.
Mainstream Solutions: Comparison of Filecoin and Arweave
1) Functionality
At this stage, the different technical architectures adopted by Filecoin and Arweave also lead to differences in functionality. Following a research report by Fundamental Labs[3], we evaluate functionality along six dimensions: flexibility of storage range, permanence of storage, ability to avoid redundancy, incentives for storing data, universality of stored data, and data availability. To some extent, Arweave and Filecoin are complementary: Arweave emphasizes the permanence and stability of data storage, which makes it better suited to metadata and historical data, while Filecoin offers more flexible storage (in duration and data type), which makes it better suited to personal and non-critical data.
Source: Fundamental Labs
In terms of recent technical progress, Filecoin network version 17 ("Shark") mainly establishes the programmability of the storage protocol by adding a smart contract interaction layer on top of the storage layer, with the FEVM planned for launch in February 2023. Data availability and activity on Filecoin should improve as a result, supporting more diverse interoperability, which gives Filecoin a flexibility advantage over Arweave. Meanwhile, Arweave continues to work on data security, the stability of node incentives (the endowment), and storage entry points, steadily reinforcing confidence in the concept of permanent storage. It is therefore hard to say which solution is superior; both protocols are reasonably well developed across these functions and meet the basic needs of decentralized storage.
2) Costs
Directly comparing the costs of Filecoin and Arweave is also difficult; this article aims only to give a reasonable sense of both. For comparability, it uses the price of storing the same amount of data for 200 years as the basis for comparison (Arweave is generally considered to support storage for 200 years).
Table 3 Comparison of Cloud Storage Costs (Data extracted in December 2022)
Source: Amazon.com, Arweavefees.com, File.app, Filfox.info
Table 3 shows that the basic S3 Standard tier from AWS costs about $0.023 per GB per month, so storing data for 200 years would cost $55.2 per GB ($0.023 × 12 months × 200 years). The Glacier Deep Archive tier, which is better suited to long-term cold storage, brings the 200-year price down to $2.38 per GB, but limits the number of archive operations per year. Arweave's price fluctuates with the price of the AR token; in December 2022 it was $1.42 per GB as a one-time fee. Since S3 Standard costs about $0.276 per GB per year, Arweave's fee is equivalent to roughly five years of S3 Standard storage, and every year beyond that favors Arweave.
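The arithmetic behind these figures can be reproduced with a short sketch. The prices are the ones quoted in the text (December 2022); the function names are illustrative, and the Glacier monthly rate is back-calculated from the quoted 200-year price rather than taken from an AWS price list.

```python
MONTHS_PER_YEAR = 12
HORIZON_YEARS = 200  # comparison horizon used in the article


def total_cost(monthly_rate_per_gb: float, years: int = HORIZON_YEARS) -> float:
    """Cumulative cost per GB of a pay-per-month service over the horizon."""
    return monthly_rate_per_gb * MONTHS_PER_YEAR * years


def breakeven_years(one_time_fee_per_gb: float, monthly_rate_per_gb: float) -> float:
    """Years after which a one-time fee is cheaper than paying monthly."""
    return one_time_fee_per_gb / (monthly_rate_per_gb * MONTHS_PER_YEAR)


def implied_monthly_rate(total_per_gb: float, years: int = HORIZON_YEARS) -> float:
    """Monthly rate per GB implied by a quoted total over the horizon."""
    return total_per_gb / (MONTHS_PER_YEAR * years)


S3_STANDARD_MONTHLY = 0.023    # USD per GB per month, from the text
GLACIER_200Y_TOTAL = 2.38      # USD per GB over 200 years, from the text
ARWEAVE_ONE_TIME = 1.42        # USD per GB, one-time, from the text

if __name__ == "__main__":
    print(f"S3 Standard, 200 years: ${total_cost(S3_STANDARD_MONTHLY):.2f} per GB")
    print(f"Arweave vs S3 break-even: {breakeven_years(ARWEAVE_ONE_TIME, S3_STANDARD_MONTHLY):.1f} years")
    print(f"Implied Glacier rate: ${implied_monthly_rate(GLACIER_200Y_TOTAL):.5f} per GB per month")
```

The sketch ignores retrieval fees, request charges, and token price movements, all of which would shift the comparison in practice.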
Pricing on Filecoin, on the other hand, is more complex: different miners offer different prices, and the price is affected by the FIL token price, the storage duration, and how often the data is retrieved. In any case, Filecoin's storage cost is already several orders of magnitude lower than that of Arweave or S3 and can almost be treated as zero.
But is price the most important factor? The permanence of Arweave's storage matters more for Web3. For example, if an NFT loses its metadata and image data, it instantly loses its value and becomes a dead link. In Filecoin's storage protocol, the highly decentralized management approach carries a risk of garbage collection: files that are not pinned on a storage node may be deleted by accident.
In addition, when a Filecoin storage term expires, the storage arrangement has to be renewed, which creates further risk. Arweave, by contrast, uses the AR incentive mechanism to encourage nodes to store data for 200 years and compensates for possible future depreciation of AR, while running periodic random checks to confirm that nodes still hold the data intact. These measures all aim to ensure that a node earns rewards only by storing data completely, which safeguards the security and reliability of the stored data. In a Web3 world where hacking incidents are frequent, the value of reliable data storage cannot be measured by storage price alone.
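The notion of periodic random checks can be illustrated with a generic spot-check sketch. This is not Arweave's actual consensus or proof mechanism; it only assumes that a verifier keeps one SHA-256 digest per chunk and asks the node for randomly chosen chunks. The class name, function names, and chunk size are all hypothetical.

```python
import hashlib
import secrets
from typing import List

CHUNK_SIZE = 1024  # hypothetical chunk size in bytes, chosen for illustration


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Cut the data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_digests(chunks: List[bytes]) -> List[bytes]:
    """Digests the verifier keeps; far smaller than the data itself."""
    return [hashlib.sha256(c).digest() for c in chunks]


class StorageNode:
    """A node that claims to hold every chunk of a file."""

    def __init__(self, chunks: List[bytes]):
        self.chunks = list(chunks)

    def respond(self, index: int) -> bytes:
        """Answer a challenge by returning the requested chunk."""
        return self.chunks[index]


def spot_check(node: StorageNode, expected: List[bytes], rounds: int = 5) -> bool:
    """Challenge the node on random chunk indices and compare digests."""
    for _ in range(rounds):
        index = secrets.randbelow(len(expected))
        if hashlib.sha256(node.respond(index)).digest() != expected[index]:
            return False  # node failed to produce the original chunk
    return True


if __name__ == "__main__":
    chunks = split_chunks(secrets.token_bytes(10 * CHUNK_SIZE))
    digests = chunk_digests(chunks)
    honest = StorageNode(chunks)
    print(spot_check(honest, digests))
```

Because the challenged index is unpredictable, a node that discards part of the data risks failing a check, and the more it discards, the sooner it is likely to be caught.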
3) Ecosystem
The decentralized storage ecosystem includes miners/nodes and ecosystem dApps.
Table 4 Comparison of Ecosystem Data between Filecoin and Arweave
Source: Filscan.io, Messari
Thanks to its early community building and high profile, Filecoin currently has a storage capacity of about 18.9 EB, roughly 170,000 times that of Arweave. However, over more than three years of development, Arweave's storage capacity has grown 2.5 times faster than Filecoin's, while Filecoin's growth has settled into a stable, roughly linear trend. In terms of market capitalization, Arweave's current FDV is about 40% of Filecoin's, which suggests the market expects Arweave to catch up quickly with Filecoin's storage capacity and ecosystem.
According to the roadmap Filecoin released in November 2022[4], Filecoin will focus on computation and on onboarding a broader range of data, with the goal of truly becoming Layer 0 of the Web3 world. To expand storage capacity, Filecoin encourages more miners to build nodes through incentive programs such as FIL+ and ESPA Bootcamps.
In terms of data onboarding and computation, Filecoin will focus on specific niche markets, such as NFT metadata, Web2 data sources, etc., and then establish on-chain computing power through the Filecoin Virtual Machine to make these data more conveniently usable. Recent Filecoin proposals (FIP-0044, FIP-0045) also indicate that data onboarding is currently a development focus.
In contrast, Arweave's latest version, 2.6[5], focuses primarily on reducing cost and improving efficiency: it aims to lower overall storage costs and prices through hardware iteration while making data storage and retrieval more efficient. Arweave thus hopes to attract more developers and users by building a better foundation, and to serve further use cases such as gateways and computation through ecosystem projects.
Therefore, Filecoin's logic is to provide a unified, storage-focused decentralized storage network reliant on a miner ecosystem; while Arweave resembles a standard Web3 ecosystem, creating a user-friendly environment suitable for development through a complete underlying architecture and incentive mechanisms, achieving organic growth in user and dApp numbers.
From the perspective of ecosystem projects, Arweave has already incubated a number of high-quality projects targeting technical optimization, storage entry, and application expansion, including Bundlr, Kyve, ArDrive, KwilDB, etc., with most project leaders being young entrepreneurs with significant growth potential. Filecoin's ecosystem projects have a large base advantage in quantity, resembling an open-source environment that can meet broader storage-related needs but lacks a focused development direction.
New Solutions: Aggregation Networks and L1 Expansion Networks
1) Aggregation Networks
The concept of aggregation networks actually emerged as early as the launch of Filecoin, with early projects like Coldstack. However, Coldstack did not achieve significant success, mainly because a purely Uber-style aggregator cannot genuinely enhance storage efficiency and cannot provide further value beyond that. While storage protocol aggregation networks can indeed address user pain points regarding storage efficiency and costs, there are various approaches to achieve this.
Firstly, these aggregation networks must perform better than the gateways of Filecoin and Arweave themselves to gain market recognition. As Filecoin develops towards FVM chain-based computation, it may squeeze out a batch of aggregation networks built on Filecoin that provide additional computation capabilities. Secondly, different aggregation protocols should solve problems that Filecoin or Arweave themselves find difficult to achieve, rather than creating a unified aggregation traffic entry (at least not as an early entry point), because the ultimate control of a unified traffic entry will be held by Filecoin and Arweave themselves.
Jackal is an aggregator solution built on Filecoin that focuses on enhancing data privacy and security. It encrypts files with AES-256 and handles management and billing through a PoS L1 chain built with Cosmos, effectively creating a private cloud storage protocol on top of Filecoin. Jackal will also keep three copies of each file through its Proof-of-Persistence algorithm and optimize storage prices. Redundancy and cost are not the core problems of decentralized storage, but stronger data privacy and security is something Filecoin itself will find hard to deliver in the short term.
4EVERLAND is another platform, aggregating Filecoin, Arweave, and Dfinity as underlying storage layers. It focuses on providing core decentralized storage, computing, and networking capabilities for dApps, creating a connection layer between the underlying decentralized storage layer and the application layer. Web hosting and Bucket are currently its two main products. Hosting handles front-end hosting, accelerated by more than 200 global gateways in 4EVERLAND, while Bucket serves as an aggregated gateway. Both products make decentralized storage considerably more usable in practice and address details such as multi-currency payments and data dashboards.
2) L1 Storage Expansion Networks
L1 storage expansion networks are essentially similar to existing Layer 2 solutions, a collective term for solutions that enhance L1 performance and scalability. EthStorage is the first Layer 2 solution centered on storage expansion, and it covers three categories of function: Proof of Publication, External Data Retention, and Access Protocol.
Proof of Publication achieves full CRUD (create, read, update, delete) functionality for stored data through its specialized KZG commitments and Reed-Solomon codes, whereas Filecoin and Arweave currently support only CRD. With the Update function added and combined with Access Protocol, which allows resources hosted in Ethereum contracts to be rendered directly via Web3 URLs, it becomes easy to implement on-chain applications such as Web3 Email, Web3 Blog, and Web3 Drive, all of which support future large-scale adoption of Web3.
In addition, because it is EVM-compatible and stores data on-chain, EthStorage has shorter interaction paths than Filecoin and Arweave and therefore offers higher composability. For example, EthStorage can store NFT metadata on-chain in an EVM environment and combine NFTs through smart contracts, enabling new programmable features that improve the user experience. Large-scale programmability of on-chain data should also give rise to new DeFi, GameFi, and SocialFi applications. At present, L1 storage expansion networks and DSNs are both complementary and competitive; it will take time for L1 storage expansion networks to prove they can survive in the EVM ecosystem and to see their storage paradigms adopted as official standards.
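One way to picture the difference between CRD and CRUD is to compare a content-addressed store, where the address is derived from the data, with a key-addressed store, where the key stays fixed while the value changes. The sketch below is purely illustrative: the class names are hypothetical, and it does not represent the actual API or internals of EthStorage, Filecoin, or Arweave.

```python
import hashlib
from typing import Dict


class ContentAddressedStore:
    """CRD only: the address is the hash of the data, so 'editing' yields a new address."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def create(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def read(self, address: str) -> bytes:
        return self._blobs[address]

    def delete(self, address: str) -> None:
        self._blobs.pop(address, None)


class KeyAddressedStore:
    """CRUD: a stable key can point at new data after an update."""

    def __init__(self) -> None:
        self._values: Dict[str, bytes] = {}

    def create(self, key: str, data: bytes) -> None:
        if key in self._values:
            raise KeyError(f"{key} already exists")
        self._values[key] = data

    def read(self, key: str) -> bytes:
        return self._values[key]

    def update(self, key: str, data: bytes) -> None:
        if key not in self._values:
            raise KeyError(f"{key} does not exist")
        self._values[key] = data  # same key, new content

    def delete(self, key: str) -> None:
        self._values.pop(key, None)


if __name__ == "__main__":
    cas = ContentAddressedStore()
    v1 = cas.create(b"blog post, draft 1")
    v2 = cas.create(b"blog post, draft 2")
    print(v1 != v2)  # every revision lives at a different address

    kv = KeyAddressedStore()
    kv.create("blog/post-1", b"blog post, draft 1")
    kv.update("blog/post-1", b"blog post, draft 2")
    print(kv.read("blog/post-1"))
```

With a stable key, a link to a blog post or mailbox keeps working after the content changes, which is the property that mutable applications such as those named above depend on.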
Future Development Possibilities
In summary, Filecoin and Arweave have become the representative projects of decentralized storage, marking the completion of its 0-to-1 phase. In terms of design philosophy, moving non-core on-chain data to a storage layer has become the main approach to scaling Ethereum.
At the same time, storage aggregation networks and L1 expansion networks, as supplements and extensions, can meet specific user needs more efficiently and at lower cost. The underlying DSNs still have many issues to resolve: a lack of ecosystem applications, the separation of storage from computation, and the difficulty of onboarding Web2 data are all holding DSNs back from their next wave of growth. This article predicts that the following types of projects may become the engines of that growth:
- DSN Aggregators: Aggregation entry points built on different/same DSN networks, providing cost-optimized choices, unified token payments, data availability/invocation services, and other tools to enhance user-friendliness. Similar projects: Lighthouse, 4EVERLAND
- Off-chain Computation: Although Filecoin has laid out decentralized computation on-chain, purely on-chain computation cannot meet more complex computational requirements. Comprehensive solutions that combine on-chain data and off-chain computation can enhance the value of data usage by completing more complex calculations and expanding use cases. Similar projects: KwilDB, Tableland
- L1 Storage Expansion Networks: The advantages of L1 storage and computing expansion are that they can directly connect deeply with the L1 ecosystem, even meeting the data storage needs of projects with high demands like social and gaming, while also enabling rapid large-scale integration after obtaining official recognition. Similar projects: EthStorage
- Third-party Data Security Networks: Providing security protection for data stored in existing DSNs, auditing, backing up storage nodes, and blocking hacker attacks, along with offering relevant compensation insurance plans. Similar projects: Jackal
References:
[1] https://cointelegraph.com/news/3-cloud-providers-accounting-for-over-two-thirds-of-ethereum-nodes-data
[2] https://ethereum.org/en/developers/docs/storage/
[3] https://6pjecoitbb3mbacc67rvmct3gbugplnty5ptok4rgunlo24tvq.arweave.net/89JBORMIdsCAQvfjVgp7MGhnrbPHXzcrk_TUat2uTrA
[4] https://filecoin.io/blog/posts/the-filecoin-masterplan/
[5] https://arweave.news/arweave-2-6-major-upgrade-more-data-less-energy