IOSG Weekly Brief | Deconstructing the Data Availability Layer: The Overlooked Lego Blocks in a Modular Future
Author: Jiawei, IOSG Ventures
Editor: Olivia, IOSG Ventures
- There is almost no disagreement on using erasure coding to solve the data availability issue for light clients; the difference lies in how to ensure that the erasure coding is correctly encoded. Polygon Avail and Danksharding use KZG commitments, while Celestia employs fraud proofs.
- For Rollup data availability, if we understand DAC as a consortium chain, then what Polygon Avail and Celestia are doing is making the data availability layer more decentralized—essentially providing a "DA-Specific" public chain to enhance the trust level.
- In the next 3 to 5 years, the architecture of blockchain will inevitably evolve from monolithic to modular, with the layers loosely coupled. We may see the emergence of many modular component providers such as Rollup-as-a-Service (RaaS) and Data Availability-as-a-Service (DAaaS), giving blockchain architecture Lego-like composability. Modular blockchain is one of the important narratives supporting the next cycle.
- In modular blockchains, the execution layer has already been "divided into four parts," with few newcomers; the consensus layer is fiercely competitive, with Aptos and Sui emerging. Although the competitive landscape of public chains has not yet settled, its narrative is old wine in a new bottle, making it difficult to find reasonable investment opportunities. The value of the data availability layer remains to be explored.
Modular Blockchain
Before discussing data availability, let's briefly review modular blockchains.
Image source: IOSG Ventures, modified from Peter Watts
There is no strict definition for the layering of modular blockchains; some layering methods start from Ethereum, while others lean towards a more generalized perspective, mainly depending on the context of the discussion.
- Execution Layer: Two things happen at the execution layer. For a single transaction, it executes the transaction and makes state changes; for a batch of transactions, it computes the state root of that batch. Currently, part of the work of Ethereum's execution layer is delegated to Rollups, such as StarkNet, zkSync, Arbitrum, and Optimism.
- Settlement Layer: This can be understood as the process by which the Rollup contract on the main chain verifies the validity of state roots, through validity proofs (zkRollup) or fraud proofs (Optimistic Rollup).
- Consensus Layer: Regardless of whether PoW, PoS, or other consensus algorithms are used, the consensus layer is meant to reach an agreement on something in a distributed system, namely the validity of state transitions. In the context of modularity, the meanings of the settlement layer and consensus layer are somewhat similar, so some researchers unify them.
- Historical State Layer: Proposed by Polynya (specifically for Ethereum). After introducing Proto-Danksharding, Ethereum only maintains immediate data availability within a certain time window, after which it performs pruning operations, delegating this work to others. For example, Portal Network or other third parties storing this data can be classified into this layer.
- Data Availability Layer: What problems exist with data availability? What are the corresponding solutions? This is the main issue this article will focus on, and we will not summarize it here.
Image source: IOSG Ventures
Back in 2018 and 2019, data availability was discussed mainly in the context of light clients; under the later Rollup perspective, it took on another layer of meaning. This article explains data availability in two contexts: "nodes" and "Rollups."
DA in Nodes
Image source: https://medium.com/metamask/metamask-labs-presents-mustekala-the-light-client-that-seeds-data-full-nodes-vs-light-clients-3bc785307ef5
First, let's look at the concepts of full nodes and light clients.
Because full nodes download and verify every transaction in each block themselves, they do not need an honesty assumption to ensure that the state is executed correctly, which provides a strong security guarantee. However, running a full node requires storage, computing power, and bandwidth, and apart from miners, ordinary users and applications have little incentive to bear that cost. Moreover, if a node only needs to verify certain information on-chain, running a full node is clearly unnecessary.
This is where light clients come in. In IOSG's article "Multi-Chain Ecology: Our Current Stage and Future Landscape," we briefly introduced light clients. The term "light client" is used in contrast to full nodes: light clients often do not interact directly with the chain but rely on nearby full nodes as intermediaries to request the information they need, such as downloading block headers or verifying account balances.
As nodes, light clients can quickly synchronize the entire chain because they only download and verify block headers. In cross-chain bridge models, light clients are deployed as smart contracts: the light client on the target chain only needs to verify whether the tokens on the source chain are locked, without verifying all of the source chain's transactions.
Where is the problem?
There is an implicit issue: since light clients only download block headers from full nodes, rather than downloading and verifying each transaction themselves, a malicious full node (block producer) can construct a block containing invalid transactions and send it to light clients to deceive them.
A natural solution is "fraud proofs": as long as one honest full node monitors the validity of blocks, it can construct a fraud proof upon discovering an invalid block and send it to light clients to alert them. Alternatively, after receiving a block, light clients can actively query the network for fraud proofs; if none is received within a certain period, they can assume the block is valid. This way, light clients can achieve almost the same level of security as full nodes (while still relying on honesty assumptions).
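The waiting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LightClient` class, the `challenge_window` parameter, and the status labels are hypothetical names introduced here, and the validity of a fraud proof is passed in as a flag rather than checked.

```python
from dataclasses import dataclass, field


@dataclass
class LightClient:
    """Toy light client that accepts a header once a challenge window passes."""
    challenge_window: int                         # hypothetical waiting period, in seconds
    headers: dict = field(default_factory=dict)   # block hash -> time first seen
    disputed: set = field(default_factory=set)    # block hashes hit by a valid fraud proof

    def receive_header(self, block_hash: str, now: int) -> None:
        # Remember when the header was first seen; later copies do not reset the clock.
        self.headers.setdefault(block_hash, now)

    def receive_fraud_proof(self, block_hash: str, proof_is_valid: bool) -> None:
        # A real client would verify the proof itself; here the result is an input.
        if proof_is_valid:
            self.disputed.add(block_hash)

    def status(self, block_hash: str, now: int) -> str:
        if block_hash not in self.headers:
            return "unknown"
        if block_hash in self.disputed:
            return "rejected"
        if now - self.headers[block_hash] >= self.challenge_window:
            return "accepted"
        return "pending"


client = LightClient(challenge_window=60)
client.receive_header("0xabc", now=0)
print(client.status("0xabc", now=10))   # still inside the window
print(client.status("0xabc", now=60))   # window elapsed with no fraud proof
```

Note that the sketch silently assumes a fraud proof can always be produced for an invalid block, which is exactly the assumption the following paragraphs call into question.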
However, the discussion above assumes that block producers always publish all block data, which is the basic premise for generating fraud proofs. A malicious block producer may hide some of the data when publishing a block. In that case, full nodes can download the block, find that it is incomplete, and reject it as invalid, but light clients, which by design only download headers, cannot. Moreover, without the missing data, full nodes cannot generate fraud proofs to warn light clients.
Another scenario is that, due to network reasons, some data may be uploaded later, and we may not even be able to determine whether the data loss is due to objective conditions or intentional actions by the block producer—thus, the reward and punishment mechanism for fraud proofs cannot take effect.
This is the data availability issue we want to discuss in nodes.
Image source: https://github.com/ethereum/research/wiki/A-note-on-data-availability-and-erasure-coding
The above image presents two scenarios: one, a malicious block producer publishes a block with missing data, and the honest full node issues a warning, but then the producer supplements the remaining data; two, an honest block producer publishes a complete block, but a malicious full node issues a false warning. In both cases, others in the network see complete block data after T3, but there are malicious actions involved.
From this perspective, using fraud proofs to ensure the data availability of light clients has vulnerabilities.
Solution
In September 2018, Mustafa Al-Bassam (now CEO of Celestia) and Vitalik Buterin proposed using multi-dimensional erasure coding to check data availability: light clients only need to randomly download and verify a portion of the data to gain high confidence that all data blocks are available and can be reconstructed if necessary.
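To see why a handful of random samples is enough, consider a rough calculation. Erasure coding forces a block producer who wants to make a block unrecoverable to withhold a large fraction of the extended data (as we understand it, roughly a quarter in the two-dimensional construction, or more than half with a simple one-dimensional code at rate 1/2). The Python sketch below treats that fraction as an input and treats samples as independent draws, which is a simplifying assumption; the function names are hypothetical.

```python
import math


def detection_probability(withheld_fraction: float, samples: int) -> float:
    """Chance that at least one of `samples` independent random draws hits a
    withheld chunk, given the fraction of chunks that is withheld."""
    if not 0.0 <= withheld_fraction <= 1.0:
        raise ValueError("withheld_fraction must be between 0 and 1")
    return 1.0 - (1.0 - withheld_fraction) ** samples


def samples_needed(withheld_fraction: float, target: float) -> int:
    """Smallest number of samples whose detection probability reaches `target`."""
    if not (0.0 < withheld_fraction < 1.0 and 0.0 < target < 1.0):
        raise ValueError("both arguments must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - withheld_fraction))


for fraction in (0.25, 0.5):
    n = samples_needed(fraction, 0.99)
    print(fraction, n, detection_probability(fraction, n))
```

The number of samples depends only on the withheld fraction and the target confidence, not on the block size, which is what makes the approach attractive for light clients.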
There is almost no disagreement on using erasure coding to solve the data availability issue for light clients; Reed-Solomon erasure coding is used in Polygon Avail, Celestia (and Ethereum's Danksharding).
The difference lies in how to ensure that the erasure coding is correctly encoded: Polygon Avail and Danksharding use KZG commitments, while Celestia employs fraud proofs. Both have their pros and cons; KZG commitments are not quantum-resistant, while fraud proofs rely on certain honesty and synchrony assumptions.
In addition to KZG commitments, there are also solutions using STARK and FRI to prove the correctness of erasure coding.
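As a rough intuition for Reed-Solomon erasure coding, the following Python sketch treats k data values as evaluations of a polynomial of degree below k, extends them to 2k values, and then recovers the original data from any k of them. It is a toy under stated assumptions: the field modulus `P`, the one-dimensional layout, and the function names are chosen here for illustration and are not taken from Celestia, Polygon Avail, or Danksharding. It also does not address the question discussed above of proving that the encoding was done correctly.

```python
P = 65537  # small prime modulus, chosen for illustration only


def lagrange_eval(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through `points`, with all arithmetic modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data: list[int]) -> list[int]:
    """Extend k values (each in [0, P)) to 2k values: originals plus k parity values."""
    k = len(data)
    points = list(enumerate(data))  # data[i] is the polynomial's value at x = i
    return data + [lagrange_eval(points, x) for x in range(k, 2 * k)]


def recover(shares: dict[int, int], k: int) -> list[int]:
    """Rebuild the k original values from any k (position, value) shares."""
    if len(shares) < k:
        raise ValueError("need at least k shares to reconstruct")
    points = list(shares.items())[:k]
    return [lagrange_eval(points, x) for x in range(k)]


data = [10, 20, 30, 40]
coded = encode(data)
surviving = {i: coded[i] for i in (1, 4, 6, 7)}  # half of the shares are lost
print(recover(surviving, k=4) == data)
```

Because any k of the 2k shares suffice, a producer who wants to make the data unrecoverable must withhold more than half of this extended block, which is what makes random sampling effective.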
(Note: The concepts of erasure coding and KZG commitments are mentioned in IOSG's article "Merging Soon: A Detailed Explanation of Ethereum's Latest Technical Roadmap." Due to space limitations, we will not elaborate on them in this article.)
DA in Rollup
Data availability in Rollups means: in zkRollup, it is necessary to allow anyone to reconstruct the Layer2 state independently to ensure censorship resistance; in Optimistic Rollup, it is necessary to ensure that all data from Layer2 is published, which is a prerequisite for constructing fraud proofs. So where is the problem?
Image source: https://forum.celestia.org/t/ethereum-rollup-call-data-pricing-analysis/141
Let's look at the cost structure of Layer2. Aside from fixed costs, the variable costs that scale with the number of transactions per batch are mainly Layer2's own gas costs and the expenditure on on-chain data availability. The former has a negligible impact; the latter is charged at a constant 16 gas per (non-zero) byte of calldata and accounts for 80%-95% of the overall Rollup cost. On-chain data availability is expensive; what can be done?
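To make the per-byte charge concrete, here is a small Python sketch of the calldata pricing rule. It assumes the schedule set by EIP-2028 (16 gas per non-zero byte, 4 gas per zero byte), ignores the base transaction cost and any execution gas, and uses hypothetical function names; the batch size and gas price in the example are made-up inputs, not measurements.

```python
NONZERO_BYTE_GAS = 16  # gas per non-zero calldata byte (EIP-2028)
ZERO_BYTE_GAS = 4      # gas per zero calldata byte


def calldata_gas(data: bytes) -> int:
    """Gas charged for publishing `data` as calldata, excluding all other costs."""
    zeros = data.count(0)
    return zeros * ZERO_BYTE_GAS + (len(data) - zeros) * NONZERO_BYTE_GAS


def calldata_cost_eth(data: bytes, gas_price_gwei: float) -> float:
    """Calldata cost in ETH at a given gas price (1 gwei = 1e-9 ETH)."""
    return calldata_gas(data) * gas_price_gwei * 1e-9


batch = bytes([1]) * 100_000  # hypothetical 100 kB batch with no zero bytes
print(calldata_gas(batch))
print(calldata_cost_eth(batch, gas_price_gwei=50))
```

Since the charge is linear in the number of bytes, compressing batches or moving the data elsewhere are the only ways for a Rollup to reduce it on its own.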
First, reduce the cost of storing data on-chain: this is what the protocol layer does. In IOSG's article "Merging Soon: A Detailed Explanation of Ethereum's Latest Technical Roadmap," we mentioned that Ethereum is considering introducing Proto-Danksharding and Danksharding to provide Rollups with "large blocks," i.e., a larger data availability space, and using erasure coding and KZG commitments to address the resulting node burden. However, from the perspective of Rollups, passively waiting for Ethereum to adapt for them is unrealistic.
Second, put data off-chain. The following diagram lists the current off-chain data availability solutions, with generalized solutions including Celestia and Polygon Avail; in Rollups, user-selectable options include StarkEx, zkPorter, and Arbitrum Nova.
Image source: IOSG Ventures
(Note: Validium originally referred specifically to the scaling solution combining zkRollup with off-chain data availability; for convenience, this article uses Validium to refer to off-chain data availability solutions and compares them together.)
Next, let's take a closer look at these solutions.
DA Provided by Rollup
In the simplest Validium solution, a centralized data operator is responsible for ensuring data availability, and users need to trust that the operator will not act maliciously. The advantage of this is low cost, but in reality, there is almost no security guarantee.
Thus, StarkEx further proposed a Validium solution maintained by a Data Availability Committee (DAC) in 2020. Members of the DAC are well-known individuals or organizations within legal jurisdictions, and the trust assumption is that they will not collude or act maliciously.
This year, Arbitrum proposed AnyTrust, also using a data committee to ensure data availability and building Arbitrum Nova based on AnyTrust.
zkPorter proposed that Guardians (holders of zkSync Tokens) maintain data availability; they need to stake zkSync Tokens, and if a data availability failure occurs, the staked funds will be forfeited.
All three provide an option called Volition: users can freely choose between on-chain or off-chain data availability based on specific use cases, balancing security and cost.
Image source: https://blog.polygon.technology/from-rollup-to-validium-with-polygon-avail/
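The committee-based schemes above share a common core: a batch is treated as available once enough committee members have attested to it. The Python sketch below illustrates that quorum check only. It is an assumption-laden toy: HMAC with per-member keys stands in for real digital signatures, and `sign`, `quorum_reached`, and the `threshold` parameter are hypothetical names, not the API of StarkEx, Arbitrum Nova, or zkPorter.

```python
import hashlib
import hmac


def sign(member_key: bytes, data_root: bytes) -> bytes:
    """Stand-in for a member's signature over the data root (illustration only)."""
    return hmac.new(member_key, data_root, hashlib.sha256).digest()


def quorum_reached(
    data_root: bytes,
    attestations: dict[str, bytes],
    member_keys: dict[str, bytes],
    threshold: int,
) -> bool:
    """Return True if at least `threshold` known members validly attested to `data_root`."""
    valid = sum(
        1
        for member, signature in attestations.items()
        if member in member_keys
        and hmac.compare_digest(signature, sign(member_keys[member], data_root))
    )
    return valid >= threshold


keys = {"alice": b"key-a", "bob": b"key-b", "carol": b"key-c"}
root = hashlib.sha256(b"batch 42 transaction data").digest()
attestations = {name: sign(keys[name], root) for name in ("alice", "bob")}
print(quorum_reached(root, attestations, keys, threshold=2))
```

The choice of threshold encodes the trust assumption: the more signatures required, the fewer members need to be honest for the data to be safe, but the easier it becomes for a few unresponsive members to stall the system.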
General DA Scenarios
The above proposals are based on the idea that since the credibility of ordinary operators is not high enough, a more authoritative committee is introduced to enhance credibility.
Is the security level of a small committee sufficient? Two years ago, the Ethereum community raised the issue of ransom attacks on Validium: if enough committee members' private keys are stolen, the attacker can make the off-chain data unavailable and hold users to ransom, since only by paying can they withdraw from Layer2. Given the lessons from the Ronin Bridge and Harmony Horizon Bridge hacks, we cannot ignore this possibility.
Since off-chain data availability committees are not sufficiently secure, what if we introduce blockchain as a trust entity to guarantee off-chain data availability?
If we understand the aforementioned DAC as a consortium chain, then what Polygon Avail and Celestia are doing is making the data availability layer more decentralized—essentially providing a "DA-Specific" public chain, with a series of validation nodes, block producers, and consensus mechanisms to enhance the trust level.
In addition to enhancing security, if the data availability layer itself is a chain, then it can actually provide data availability not limited to a specific Rollup or chain, but as a generalized solution.
Image source: https://blog.celestia.org/celestiums/
Let's take Celestia's Quantum Gravity Bridge for Ethereum Rollups as an example. The L2 Contract on the Ethereum main chain verifies validity proofs or fraud proofs as usual; the difference is that data availability is provided by Celestia. There are no smart contracts on the Celestia chain, and it performs no computation on the data; it only ensures data availability.
The L2 Operator publishes transaction data to the Celestia main chain, and Celestia's validators sign the Merkle Root of the DA Attestation and send it to the DA Bridge Contract on the Ethereum main chain for verification and storage.
This effectively uses the Merkle Root of the DA Attestation to prove all data availability, and the DA Bridge Contract on the Ethereum main chain only needs to verify and store this Merkle Root, greatly reducing overhead.
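The reason a single Merkle Root can stand in for a whole batch of data is that any individual piece can later be checked against it with a short proof. The Python sketch below shows a plain binary Merkle tree over SHA-256 with an inclusion proof. It is a simplified illustration, not the structure Celestia or the DA Bridge Contract actually uses: the function names are hypothetical, odd levels are padded by duplicating the last node, and there is no domain separation between leaves and inner nodes, both of which a production design would handle more carefully.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Root of a binary Merkle tree over the given leaves."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pad odd levels by repeating the last node
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True when the sibling is on the left."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path from `leaf` to the root and compare with `root`."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


batch = [b"tx0", b"tx1", b"tx2"]
root = merkle_root(batch)
print(verify(b"tx2", merkle_proof(batch, 2), root))
```

The contract only ever stores the 32-byte root, while proofs grow logarithmically with the number of leaves, which is where the reduction in on-chain overhead comes from.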
(Note: Other data availability solutions include Adamantium and EigenLayr. In the Adamantium solution, users can choose to host their off-chain data, signing to confirm its availability after each state transition; otherwise, funds will be automatically returned to the main chain to ensure security; or users can freely choose data providers. EigenLayr is a more academic solution proposing Coded Merkle Tree and data availability oracle ACeD. We will not elaborate on these here.)
Summary
Image source: IOSG Ventures, modified from Celestia Blog
After discussing the above solutions one by one, we will make a horizontal comparison from the perspectives of security/degree of decentralization and gas costs. Note that this coordinate chart only represents the author's personal understanding, serving as a rough qualitative division rather than a quantitative comparison.
Pure Validium, in the lower left corner, has the lowest security/degree of decentralization and the lowest gas costs.
The middle part includes the DAC solutions of StarkEx and Arbitrum Nova, the Guardians validator set solution of zkPorter, and the generalized solutions of Celestia and Polygon Avail. The author believes that zkPorter's use of Guardians as a validator set offers slightly higher security/degree of decentralization compared to DAC; while the DA-Specific blockchain solutions are slightly higher compared to a set of validators. At the same time, gas costs also increase accordingly. Of course, this is just a very rough comparison.
The box in the upper right corner contains on-chain data availability solutions, which have the highest security/degree of decentralization and the highest gas costs. Within the box, since the data availability of all three solutions is provided by the Ethereum main chain, they have equivalent security/degree of decentralization. Pure Rollup solutions clearly have lower gas costs than monolithic Ethereum, and with the introduction of Proto-Danksharding and Danksharding, the cost of data availability will decrease further.
Note: The context of "data availability" discussed in this article is mostly under Ethereum; it should be noted that Celestia and Polygon Avail are generalized solutions and are not limited to Ethereum itself.
Finally, we summarize the above solutions in a table.
Image source: IOSG Ventures
Closing Thoughts
- After discussing the above data availability issues, we find that all solutions are essentially balancing trade-offs under the constraints of a trilemma, and the differences between solutions lie in the "granularity" of the trade-offs.
- From the user's perspective, it is reasonable for protocols to provide options for both on-chain and off-chain data availability, because users' sensitivity to security and cost varies across application scenarios and user groups.
- The above discussion focuses more on the support of the data availability layer for Ethereum and Rollups. In cross-chain communication, Polkadot's relay chain provides native security guarantees for data availability for other parachains; while Cosmos IBC relies on the light client model, making it crucial to ensure that light clients can verify the data availability of both the source and target chains.
- The benefits of modularity lie in its plug-and-play and flexibility, allowing protocols to adapt as needed: for example, offloading Ethereum's data availability burden while ensuring security and trust levels; or enhancing the security level of light client communication models in a multi-chain ecology, reducing trust assumptions. Data availability can play a role not only in Ethereum but also in multi-chain ecosystems and even more future application scenarios.
- We believe that in the next 3 to 5 years, the architecture of blockchain will inevitably evolve from monolithic to modular, with the layers loosely coupled. We may see the emergence of many modular component providers such as Rollup-as-a-Service (RaaS) and Data Availability-as-a-Service (DAaaS), giving blockchain architecture Lego-like composability. Modular blockchain is one of the important narratives supporting the next cycle.
- Among them, the valuation behemoth of the execution layer (i.e., Rollup) has already been "divided into four parts," with few newcomers; the consensus layer (i.e., various Layer1s) is fiercely competitive, with public chains like Aptos and Sui starting to emerge. Although the competitive landscape of public chains has not yet settled, its narrative is old wine in a new bottle, making it difficult to find reasonable investment opportunities.
Meanwhile, the value of the data availability layer still remains to be explored.