Decentralized Storage: Concept, Practice, and Prospects
Author: PermaDAO
Decentralized storage is a method of data storage that does not rely on a single central control point. This approach contrasts with traditional centralized storage (such as conventional cloud storage services like Amazon S3 or Google Cloud), which is typically managed by a single company or organization.
Mainstream Decentralized Storage
Currently, the mainstream decentralized storage solutions on the market include Arweave, Filecoin, and Storj. Each has its own features and design philosophy:
- Arweave focuses on long-term or permanent data storage.
- Filecoin offers a decentralized marketplace similar to traditional cloud storage, supporting flexible storage needs.
- Storj emphasizes providing secure and privacy-protecting decentralized cloud storage services.
All three platforms utilize blockchain technology, but their application scenarios, technical implementations, and payment models differ, making each suitable for different types of storage needs:
- Arweave
  - Goal: To provide a long-term, permanent data storage solution. Arweave aims to store data "forever," primarily for long-term data preservation.
  - Technology: Uses a blockchain variant called "Blockweave." Unlike a traditional blockchain, each new Blockweave block references not only the previous block but also a randomly selected earlier block, a design that encourages miners to keep old data available over the long term.
  - Payment Model: Users pay a one-time fee for data storage, and theoretically, the data can be accessed permanently once stored.
- Filecoin
  - Goal: To create a decentralized storage marketplace similar to traditional cloud storage services.
  - Technology: Filecoin is the incentive layer of IPFS (InterPlanetary File System). It uses "proof of replication" and "proof of spacetime" to ensure data is correctly stored.
  - Payment Model: Users pay storage providers based on the amount of data stored and the duration. This is a more traditional leasing model, allowing users to increase or decrease storage as needed and pay accordingly.
- Storj
  - Goal: To provide users with a decentralized cloud storage solution, focusing on security and privacy protection.
  - Technology: Storj uses encryption and sharding to protect data security and privacy. Data is encrypted and split into multiple small pieces on the client side before being distributed across nodes worldwide.
  - Payment Model: Storj's payment model is similar to traditional cloud storage, charging based on the storage space and bandwidth used.
In comparison, Arweave stands out for its emphasis on permanent storage, focusing more on censorship resistance and data durability. Filecoin and Storj both operate a storage marketplace, using blockchain technology to rebuild the storage market.
Business Architecture Analysis
The theoretical basis for Arweave's permanent storage is a cost trend similar to "Moore's Law." According to statistics on data storage costs from 1980 to the present, storage costs have been decreasing at a rate of about 20% per year. If this trend continues, the yearly cost forms a decreasing geometric series, so the cumulative cost of storing data indefinitely converges to a finite value instead of growing without bound. Arweave bases its permanence on this: it calculates the cost of storing the data for 200 years, and users pay that fee upfront when they store the data.
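The convergence argument can be checked with a few lines of arithmetic. The sketch below is only an illustration of the geometric series, assuming a flat 20% yearly decline (the figure quoted above) and a first-year cost normalized to 1; it is not Arweave's actual pricing formula, and the function names are hypothetical.

```python
def cumulative_storage_cost(first_year_cost: float, annual_decline: float, years: int) -> float:
    """Total cost over `years` when the yearly cost shrinks by `annual_decline` each year."""
    ratio = 1.0 - annual_decline
    # Closed form of the geometric series c + c*r + ... + c*r^(years - 1).
    return first_year_cost * (1.0 - ratio ** years) / (1.0 - ratio)


def perpetual_storage_cost(first_year_cost: float, annual_decline: float) -> float:
    """Limit of the same series as the number of years goes to infinity: c / (1 - r)."""
    return first_year_cost / annual_decline


# Assumption: first-year cost normalized to 1.0, decline rate of 20% per year.
cost_200_years = cumulative_storage_cost(1.0, 0.20, 200)
cost_forever = perpetual_storage_cost(1.0, 0.20)

print(f"200 years: {cost_200_years:.6f} x first-year cost")
print(f"forever:   {cost_forever:.6f} x first-year cost")
```

Under these assumptions the 200-year total is numerically indistinguishable from the infinite-horizon total (about five times the first-year cost), which is why a 200-year prepayment can be treated as paying for permanence.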
At the same time, Arweave has designed a very elegant and simple data mining mechanism. We can call it "Effective Data Mining."
"Effective data" refers to data that has already been stored in the Arweave network and for which users have paid the 200-year storage fee. The other role in the network, miners, mine using effective data and provide read access to it. Unlike other storage blockchains, Arweave does not require miners to store any particular data; instead, its incentive rules encourage each miner to store as much "effective data" as possible. In the Arweave network, the more "effective data" a miner stores, the greater their mining "computational power."
Suppose there are 100 TB of effective data in the Arweave network. Miners are not required to store all 100 TB. A miner can mine while storing only 100 MB of data, but their computational power will be very small. If a miner chooses to store all 100 TB, their computational power reaches its maximum.
In the "Effective Data Mining" mechanism, the Arweave network incentivizes miners to store as much data as possible but does not force them to store all data. So, is there a possibility of data loss under this incentive model? Below is a simulation calculation regarding data loss:
In the first and second rows, 0.5 indicates that a single node stores 50% of the data. Assume the network holds 200,000 blocks and has 200 nodes, each of which randomly stores 100,000 blocks (50% of the block data). A given block is inaccessible only if none of the 200 nodes holds it, so the probability is 0.5^200 ≈ 6.223 × 10^-61. By comparison, the data reliability offered by cloud services is 99.9999999%, which corresponds to a loss probability of 10^-7 percent (10^-9 as a fraction). The Arweave calculation above reaches an astonishing 10^-61.
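The figure can be reproduced directly. This is a minimal sketch that assumes every node chooses its blocks independently and uniformly at random, which is the simplification the simulation above appears to make; the function name is hypothetical.

```python
import math


def block_loss_probability(node_count: int, stored_fraction: float) -> float:
    """Probability that none of `node_count` independent nodes holds a given block."""
    return (1.0 - stored_fraction) ** node_count


# Figures from the text: 200 nodes, each storing 50% of all blocks.
p_loss = block_loss_probability(200, 0.5)

# Loss probability implied by "99.9999999%" reliability, written as a fraction.
cloud_loss = 1e-9

print(f"per-block loss probability: {p_loss:.3e}")
print(f"orders of magnitude below the cloud figure: {math.log10(cloud_loss / p_loss):.1f}")
```

The result depends heavily on the independence assumption. If many nodes drop the same unpopular blocks, the real probability would be higher than this estimate.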
Both Filecoin and Storj have established a data storage market using blockchain technology. Among them, Storj primarily improves data privacy. This article mainly explains the principles of Filecoin.
Similar to a traditional order book, users need to place bids in the trading market when using Filecoin, specifying the duration and number of backups for data storage, and miners will accept profitable orders. To ensure fairness in the entire trading market, Filecoin has established a complex economic model with various rules, including penalties and small installment payments. Its core technologies are proof of replication and proof of spacetime.
Proof of Replication: Miners prove that they have stored their own copy of the user's data on dedicated physical storage. Each time miners prove they have stored user data, the network pays them a fee.
Proof of Spacetime: Proof of replication alone does not guarantee that the data is stored continuously, because a miner could hold the data only at the moment a proof is submitted. Filecoin therefore adds proof of spacetime, which is aimed at ensuring that miners keep storing the data over the whole agreed period.
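The intuition behind checking storage repeatedly over time can be shown with a toy challenge-response audit. This is emphatically not Filecoin's proof of replication or proof of spacetime, which rely on far more elaborate cryptographic proofs; it is a simplified sketch with hypothetical names (`respond`, `ToyAuditor`) that only shows why a provider who discards the data cannot answer a fresh challenge later.

```python
import hashlib
import os


def respond(stored_data: bytes, nonce: bytes) -> str:
    """Prover side: hash a fresh nonce together with the data currently held."""
    return hashlib.sha256(nonce + stored_data).hexdigest()


class ToyAuditor:
    """Verifier that precomputes challenge/response pairs before handing the data over."""

    def __init__(self, data: bytes, rounds: int) -> None:
        self._pending = []
        for _ in range(rounds):
            nonce = os.urandom(16)
            self._pending.append((nonce, respond(data, nonce)))

    def next_challenge(self) -> bytes:
        """Return the nonce for the next audit round without consuming it."""
        return self._pending[0][0]

    def verify(self, answer: str) -> bool:
        """Consume one audit round and check the prover's answer against it."""
        _, expected = self._pending.pop(0)
        return answer == expected


data = b"user file contents"
auditor = ToyAuditor(data, rounds=3)

# Round 1: an honest provider still holds the data and answers correctly.
honest_ok = auditor.verify(respond(data, auditor.next_challenge()))

# Round 2: a provider that deleted the data after round 1 can no longer answer.
cheater_ok = auditor.verify(respond(b"", auditor.next_challenge()))

print(honest_ok, cheater_ok)
```

Because each round uses a nonce the provider has never seen, an old answer cannot be replayed, so passing every round requires holding the data throughout the period being audited.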
In summary, the basis and implementation plan for Arweave's permanence are:
- The cost of permanence decreases year by year.
- Incentivizing miners through "Effective Data Mining" to achieve permanent data storage.
Filecoin and Storj create decentralized storage markets using blockchain technology. Their models resemble a traditional order book: users who need storage place orders, and miners accept those orders and store the data. The core technologies of Filecoin are proof of replication and proof of spacetime.
Storage Operations
There are two ways to store data on Arweave. The first is to send data directly to Arweave nodes and pay in AR. The second is to use the ANS-104 (Bundled Data) bundling protocol to package data in batches and submit it to Arweave.
Directly Storing Data to Arweave
Users only need to prepare a wallet holding AR to complete this action. Use the following code to store a file named file.pdf to Arweave:
For more documentation, refer to: https://github.com/ArweaveTeam/arweave-js.
Using ANS-104 to Store Data to Arweave (Recommended)
Arweave produces blocks relatively slowly, roughly one every 2 minutes, and a single block can only process 1000 transactions, which greatly limits the number of storage transactions Arweave can handle. The amount of data per transaction is not the bottleneck: a single Arweave transaction can hold an arbitrarily large payload, so users can store 100 MB or even 10 GB directly in one transaction. ANS-104 was created to solve the problem of scaling transaction count.
ANS-104 is a bundling standard that can pack tens of thousands of independent data items into a single ordinary Arweave transaction. It can be compared to Ethereum's Layer 2 Rollup solutions, with the distinction that ANS-104 does not compromise data security; the bundled data is also 100% fully stored on Arweave.
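A rough upper bound shows how much headroom bundling adds. The sketch below uses the block time and per-block cap quoted above; the bundle size of 10,000 items is an assumed value standing in for "tens of thousands," and the calculation optimistically assumes every transaction in every block is a full bundle.

```python
BLOCK_TIME_SECONDS = 120      # about 2 minutes per block (figure from the text)
MAX_TX_PER_BLOCK = 1000       # per-block transaction cap (figure from the text)
ITEMS_PER_BUNDLE = 10_000     # assumption: one bundle carries 10,000 data items


def items_per_day(items_per_transaction: int) -> int:
    """Upper bound on data items written per day for a given items-per-transaction ratio."""
    blocks_per_day = 24 * 60 * 60 // BLOCK_TIME_SECONDS
    return blocks_per_day * MAX_TX_PER_BLOCK * items_per_transaction


direct_items = items_per_day(1)
bundled_items = items_per_day(ITEMS_PER_BUNDLE)

print(f"direct transactions: {direct_items:,} items/day")
print(f"with bundling:       {bundled_items:,} items/day")
```

With these numbers, direct transactions top out at 720,000 items per day, while bundling raises the ceiling by the bundle size, to 7.2 billion items per day.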
The code example for storing data using ANS-104 is as follows:
This code uses the arseeding light node as the bundling service. The arseeding light node is a fully open-source Arweave data node that supports all native Arweave node interfaces and adds the ANS-104 interface. In addition, arseeding integrates the cross-chain payment protocol everPay, so users and developers can pay for data permanence with assets such as ETH, BNB, USDT, and USDC as well as AR.
For more documentation, refer to: https://web3infra.dev/docs/Arseeding/guide/quickStart.
Storage Costs
At the time of writing, storing 1 GB of data on Arweave costs about $7.5. For the latest storage fees, visit: https://ar-fees.arweave.dev/.
Retrieving and Downloading Data from Arweave
Arweave has a standardized GraphQL service interface, allowing individuals and organizations to implement Arweave indexing according to standards. Here are two typical and useful indexing gateways:
- ArweaveNet Gateway, the most comprehensive index. https://arweave.net/graphql
- KNN3 Gateway, real-time retrieval of arseeding node data, fast speed. https://knn3-gateway.knn3.xyz/arseeding/graphql
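As a hedged illustration of what a lookup against such a gateway might look like, the sketch below builds a GraphQL request that searches transactions by tag. The field names (`transactions`, `tags`, `edges`, `node`) follow the commonly documented Arweave gateway schema and are an assumption relative to this article; the tag values and helper names are hypothetical.

```python
import json
import urllib.request

GATEWAY_GRAPHQL_URL = "https://arweave.net/graphql"


def build_tag_query(tag_name: str, tag_value: str, first: int = 5) -> dict:
    """Build a GraphQL request body that looks up transactions carrying a given tag."""
    query = (
        "query { transactions("
        f"tags: [{{name: {json.dumps(tag_name)}, values: [{json.dumps(tag_value)}]}}], "
        f"first: {int(first)}) "
        "{ edges { node { id tags { name value } } } } }"
    )
    return {"query": query}


def run_query(payload: dict, url: str = GATEWAY_GRAPHQL_URL) -> dict:
    """POST the request body to the gateway and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Hypothetical example: look for items tagged as PDF files.
payload = build_tag_query("Content-Type", "application/pdf")
print(payload["query"])
```

Calling `run_query(payload)` would send the request over the network. Each returned `id` can then be used to download the corresponding data.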
To download Arweave data, you only need to know the data's ARID or ItemID. Here is a code example:
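A minimal sketch in Python, assuming the gateway serves the raw data at `<gateway>/<id>` (as arweave.net does for transaction IDs). The helper names are hypothetical, and the gateway URL is a parameter so that a different gateway, for example an arseeding node, can be substituted.

```python
import urllib.request

DEFAULT_GATEWAY = "https://arweave.net"


def data_url(item_id: str, gateway: str = DEFAULT_GATEWAY) -> str:
    """Build the download URL for an ARID or ItemID on the given gateway."""
    return f"{gateway.rstrip('/')}/{item_id}"


def download(item_id: str, destination: str, gateway: str = DEFAULT_GATEWAY) -> None:
    """Fetch the data behind `item_id` and write it to a local file."""
    with urllib.request.urlopen(data_url(item_id, gateway), timeout=30) as response:
        payload = response.read()
    with open(destination, "wb") as out_file:
        out_file.write(payload)


# "EXAMPLE_ID" is a placeholder, not a real identifier.
print(data_url("EXAMPLE_ID"))
```

To save a file locally, call `download("<your ARID or ItemID>", "file.pdf")` with a real identifier.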
Filecoin Storage Method
Unfortunately, Filecoin does not provide storage tools for ordinary users and developers; for regular developers, Filecoin is in an unusable state. Some sporadic technical documents can be found that describe storage solutions through third-party service providers, but upon careful examination of the service providers' documentation, most only offer IPFS storage, and these providers may not necessarily store data on Filecoin. Perhaps due to the author's limited ability, I could not find a good way to store data on Filecoin, nor is there a corresponding interface to directly retrieve data from Filecoin.
Storj Storage Method
Storj's storage method is similar to Web2: developers register on the official website and obtain an API key. Storj's storage is compatible with the AWS S3 interface, so I won't elaborate further. Storj's storage costs are very low, with 1 GB of storage for 1 month costing only $0.004. However, converted to 200 years of storage ($0.004 × 12 months × 200 years), the cost is $9.6, slightly higher than Arweave.
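The comparison is simple arithmetic, reproduced here as a sketch. It assumes the Storj price stays flat for 200 years and ignores bandwidth charges, and it uses the $7.5 per GB Arweave figure quoted earlier in this article.

```python
STORJ_USD_PER_GB_MONTH = 0.004     # Storj price quoted in the text
ARWEAVE_USD_PER_GB_ONE_TIME = 7.5  # Arweave one-time price quoted in the text
YEARS = 200

# Assumption: a flat monthly price for the whole period, with no bandwidth fees.
storj_total = STORJ_USD_PER_GB_MONTH * 12 * YEARS
difference = storj_total - ARWEAVE_USD_PER_GB_ONE_TIME

print(f"Storj, 200 years:  ${storj_total:.2f} per GB")
print(f"Arweave, one-time: ${ARWEAVE_USD_PER_GB_ONE_TIME:.2f} per GB")
print(f"difference:        ${difference:.2f} per GB")
```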
From the storage operations, it can be seen that Arweave's transaction processing model is consistent with that of Bitcoin, Ethereum, and other blockchains. Filecoin does not provide usable SDKs and interfaces; it is disappointing that the so-called storage leader is effectively unusable for developers. Storj's storage method is entirely consistent with Web2.
It is worth noting that Arweave is blockchain-native storage: once data is sent to Arweave, it cannot be deleted or tampered with. Filecoin and Storj operate on a leasing model, where project parties can stop the storage lease at any time. Under this model, the data does not have blockchain characteristics; it behaves like data stored in centralized cloud services.
To more clearly distinguish the differences between Arweave and Filecoin and other data storage solutions, we can refer to the data on Arweave as "consensus data." Whether it is data on BTC or Ethereum, all belong to consensus data, which possesses characteristics of immutability and traceability. The data stored in the Filecoin storage leasing market cannot be called consensus data.
Development Prospects
Decentralized storage has split into two quite different business lines. The line represented by Arweave focuses on consensus data, emphasizing decentralization, censorship resistance, and traceability. The line represented by Filecoin centers on a decentralized market, emphasizing the allocation of storage resources and proof of successful storage.
A parallel can be drawn with the development of DeFi. Early on, IDEX built an order book market with blockchain technology, a very traditional business model that handles token exchange through makers placing orders and takers filling them. The explosion of DeFi came instead from Uniswap's AMM trading model and liquidity mining, which let trades execute fully automatically and made liquidity composable, ultimately leading to the explosive growth of DeFi Summer. In the current decentralized storage track, Filecoin represents an order book market built with blockchain technology, while Arweave uses a unified, AMM-like model to manage data supply and demand. This unified model makes data pricing and processing simpler and makes it easier to turn ordinary data into consensus data. Consensus data may in turn usher in an explosion of data composability.
It is also necessary to mention the SCP theory (Storage Consensus Paradigm), whose core idea is that as long as the stored data has consensus, applications built on that data can also reach consensus. SCP emphasizes off-chain computation: data can be stored on various chains such as BTC and Ethereum, and a unique state is formed by aggregating the data on the blockchain. Since this state comes out the same no matter which computing unit derives it, why compute it on-chain and waste so many computing resources?
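The idea can be illustrated with a deterministic state function applied to an ordered log. This is a simplified sketch, not the SCP specification: the record format, the "mint" and "transfer" rules, and all names are hypothetical. The point is only that anyone who replays the same stored records with the same rules arrives at the same state.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record format: (kind, sender, recipient, amount).
Record = Tuple[str, str, str, int]


def compute_state(records: Iterable[Record]) -> Dict[str, int]:
    """Replay an ordered log of records and return the resulting balances."""
    balances: Dict[str, int] = {}
    for kind, sender, recipient, amount in records:
        if amount <= 0:
            continue  # invalid amounts are skipped by every replayer alike
        if kind == "mint":
            balances[recipient] = balances.get(recipient, 0) + amount
        elif kind == "transfer" and balances.get(sender, 0) >= amount:
            balances[sender] -= amount
            balances[recipient] = balances.get(recipient, 0) + amount
        # Any other record is invalid and is ignored deterministically.
    return balances


# The log stands in for immutable, ordered data read from the storage layer.
LOG = [
    ("mint", "", "alice", 100),
    ("transfer", "alice", "bob", 30),
    ("transfer", "bob", "carol", 50),  # invalid: bob only holds 30
    ("transfer", "bob", "carol", 10),
]

state_on_node_a = compute_state(LOG)
state_on_node_b = compute_state(list(LOG))  # an independent replay of the same log

print(state_on_node_a)
print(state_on_node_a == state_on_node_b)
```

The storage layer only has to guarantee the content and order of the log. The computation itself can run anywhere, because invalid records are rejected by the same rules on every machine.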
The currently popular BRC20 and Bitcoin inscriptions rely on off-chain computation consensus. The BRC20 protocol is consistent with the storage consensus emphasized by Arweave SCP: both use the blockchain as a data layer that provides immutable and traceable transaction data, with state calculations conducted entirely off-chain. With Arweave's storage capabilities, the SCP theory can draw on a more powerful consensus data set. The Arweave SCP theory has developed a complete engineering solution, Permaweb, which is equivalent to the ultimate version of a Bitcoin indexer. Permaweb can handle not only assets but also text, images, and even video. Imagine a not-so-distant future where a super-powerful indexer can perform streaming playback, creating a completely decentralized TikTok.
Currently, the Permaweb solution supports a wide range of application types. Whether the application is cloud storage, content co-creation, or a game, this architecture can easily be used for development. Data can also be composed across Permaweb applications. For example, a writer can upload their content and copyright to Arweave through a content co-creation application, and in a separate game, developers can directly reference the writer's content and let players pay the author for the copyright.
The biggest dilemma currently facing DePIN is blockchain performance. DePIN devices will enter thousands of households, but no blockchain can support such a massive volume of user interactions. Most DePIN projects still process data in a centralized way, which causes DePIN to lose its decentralized character. Consensus data can empower DePIN more strongly: once DePIN data becomes permanent, it also becomes composable. For example, a green energy certificate can offset the energy consumed by blockchain PoW computation, serve as an identifier in content creation, and become a badge in games. Data and value will flow everywhere.
Consensus data is also applicable to the field of AI. Human knowledge and history should be preserved forever; consensus data can ensure that AI cannot pollute or tamper with human knowledge and history. Similarly, consensus data can serve as the best raw material for AI, allowing it to learn and process various effective information.