Understanding B^2's New Technology Roadmap in One Article: Why Bitcoin Needs an Off-Chain DA and Verification Layer
Written by: Faust, Geek Web3
Abstract:
- B^2 Network has set up an off-chain DA layer called B^2 Hub, which draws on Celestia's ideas, introducing data sampling and erasure coding so that new data is quickly distributed to a large number of external nodes while data withholding is avoided as far as possible. Meanwhile, the Committer in the B^2 Hub network uploads the storage index of the DA data, together with the data hash, to the Bitcoin chain for anyone to read;
- To relieve pressure on DA-layer nodes, B^2 Hub does not store historical data permanently. B^2 is therefore also building a storage network that uses an Arweave-like storage incentive mechanism to encourage more nodes to retain a more complete historical dataset;
- For state validation, B^2 adopts a hybrid scheme: ZK proofs are verified off-chain, and the traces of that ZK verification can be challenged on-chain in the BitVM style. As long as a single challenger node initiates a challenge after detecting an error, the B^2 network is secure, which matches the trust model of fraud-proof protocols; but because ZK is involved, the state validation is in fact a hybrid of the two.
- According to B^2 Network's future roadmap, the EVM-compatible B^2 Hub can serve as an off-chain validation layer and DA layer connecting multiple Bitcoin Layer2s, becoming a Bitcoin off-chain functionality expansion layer similar to BTCKB. Since Bitcoin itself cannot support many scenarios, building such off-chain functionality expansion layers will become increasingly common in the Layer2 ecosystem.
B^2 Hub: A Universal DA Layer and Validation Layer Off the Bitcoin Chain
The current Bitcoin ecosystem is a blue ocean where opportunities and scams coexist. Revitalized by the inscription summer, this new field is like fertile virgin land permeated with the scent of money. With Bitcoin Layer2s springing up like mushrooms after rain this January, this once-barren land has instantly become a cradle for countless dreamers.
But return to the most essential question: what is Layer2? People never seem to have reached a consensus. Is it a sidechain? An indexer? Can a chain with a bridge be called Layer2? Can a simple plugin relying on Bitcoin or Ethereum count as a Layer? These questions are like a set of difficult equations that never yield a definitive answer.
In the view of the Ethereum and Celestia communities, Layer2 is merely a special case of modular blockchain: a tight coupling exists between the so-called "second layer" and the "first layer," so the second-layer network inherits the security of Layer1 to a greater or lesser extent. Security itself can be broken down into several sub-indicators, including DA, state validation, withdrawal validation, censorship resistance, and reorg resistance.
Various issues inherent in the Bitcoin network make it naturally ill-suited to supporting a fully fledged Layer2 network. Take DA: Bitcoin's data throughput is far lower than Ethereum's. With its roughly 4 MB block size (weight) limit and 10-minute average block time, Bitcoin's maximum data throughput works out to only about 6.8 KB/s, roughly 1/20 of Ethereum's, and such crowded block space naturally drives data publishing costs up.
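As a sanity check on that figure, a one-line calculation (assuming the post-SegWit 4 MB block weight limit and a 600-second average block interval) reproduces it:

```python
# Back-of-envelope Bitcoin DA throughput, assuming the post-SegWit
# 4 MB block weight limit and a 10-minute average block interval.
BLOCK_LIMIT_BYTES = 4 * 1024 * 1024   # 4 MiB
BLOCK_INTERVAL_SECONDS = 10 * 60      # 600 s

throughput = BLOCK_LIMIT_BYTES / BLOCK_INTERVAL_SECONDS / 1024
print(f"max DA throughput ~ {throughput:.1f} KiB/s")  # ~ 6.8 KiB/s
```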
(The data publishing cost in a Bitcoin block can even reach $1.13 per byte)
If a Layer2 publishes new transaction data directly into Bitcoin blocks, it can achieve neither high throughput nor low fees. It must therefore compress the data as much as possible before uploading it to Bitcoin blocks. Citrea currently takes this approach: it plans to upload the state diff accumulated over a period of time (the net result of state changes across many accounts), together with the corresponding ZK proof, to the Bitcoin chain.
In this case, anyone can download the state diff and ZKP from the Bitcoin mainnet to verify their validity, while the amount of data kept on-chain stays small.
(The white paper from the former Polygon Hermez explains the principles of the above compression scheme)
This approach greatly compresses the data size but ultimately still encounters bottlenecks. For example, suppose tens of thousands of transactions occur within 10 minutes, causing state changes across thousands of accounts; you still need to summarize and upload these account changes to the Bitcoin chain. Although this is much lighter than directly uploading each transaction's data, it still incurs considerable data publishing costs.
Thus, many Bitcoin Layer2s simply do not upload DA data to the Bitcoin mainnet at all, opting instead for third-party DA layers such as Celestia. B^2 takes yet another approach: building its own off-chain DA network (data distribution network), called B^2 Hub. In B^2's protocol design, transaction data and important data such as state diffs are stored off-chain; only the storage index and the data hash (actually a Merkle root, called the data hash for convenience) are uploaded to the Bitcoin mainnet.
These data hashes and storage indices are written to the Bitcoin chain in a manner similar to inscriptions, so anyone running a Bitcoin node can download them locally. Using the index value, they can then retrieve the original data from B^2's off-chain DA layer or storage layer, and by checking it against the data hash they can determine whether the retrieved data is correct (i.e., whether it matches the hash recorded on the Bitcoin chain). Through this simple method, a Layer2 can avoid over-relying on the Bitcoin mainnet for DA, saving fees and achieving high throughput.
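A minimal sketch of this read path, assuming a hypothetical fetch_batch client for the off-chain storage layer and an on-chain record holding the Merkle root and storage index (the names are illustrative, not B^2's actual interfaces):

```python
import hashlib

def merkle_root(chunks: list[bytes]) -> bytes:
    """Binary Merkle root over data chunks (odd levels duplicate the last node)."""
    level = [hashlib.sha256(c).digest() for c in chunks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

def fetch_and_verify(onchain_record: dict, fetch_batch) -> list[bytes]:
    """Pull a batch from the off-chain DA/storage layer and check it against
    the data hash (Merkle root) inscribed on the Bitcoin chain."""
    chunks = fetch_batch(onchain_record["storage_index"])  # hypothetical client call
    if merkle_root(chunks) != onchain_record["data_hash"]:
        raise ValueError("off-chain data does not match the on-chain data hash")
    return chunks
```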
Of course, one point cannot be overlooked: such an off-chain third-party DA platform may engage in data withholding, refusing to let the outside world access new data. This scenario has a specific name, the "data withholding attack," and boils down to a censorship-resistance problem in data distribution. Different DA solutions address it in different ways, but the core principle is always to spread the data as quickly and widely as possible, so that no small group of privileged nodes can control access to it.
According to B^2 Network's official new roadmap, its DA solution draws on Celestia. In Celestia's design, third-party data providers continuously supply data to the network, and Celestia block producers organize these data fragments into a Merkle Tree, insert it into Celestia blocks, and broadcast the blocks to the validators/full nodes in the network.
Because these blocks carry a lot of data, they are quite large, and most people cannot afford to run full nodes, only light nodes. Light nodes do not synchronize complete blocks; they synchronize only block headers, each of which contains the root of the Merkle Tree.
Light nodes, relying solely on the block header, naturally do not know the full picture of the Merkle Tree, nor do they know what new data exists, making it impossible to verify whether the data is problematic. However, light nodes can request a specific leaf from the full nodes. Full nodes will provide the requested leaf along with the corresponding Merkle Proof to the light nodes, allowing the latter to be assured that this leaf indeed exists in the Merkle Tree of the Celestia block and is not fabricated false data.
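The leaf check itself is simple. Here is a minimal sketch of how a light node can verify a Merkle proof against the root in its block header (a plain binary Merkle tree; Celestia's actual implementation uses namespaced Merkle trees, but the idea is the same):

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_merkle_proof(leaf: bytes, proof: list[tuple[bytes, str]], root: bytes) -> bool:
    """Walk from the leaf up to the root. Each proof step is
    (sibling_hash, side), where side says if the sibling is left or right."""
    node = sha256(leaf)
    for sibling, side in proof:
        node = sha256(sibling + node) if side == "L" else sha256(node + sibling)
    return node == root
```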
(Image source: W3 Hitchhiker)
There are many light nodes in the Celestia network, and they frequently initiate data sampling against different full nodes, each time randomly selecting a few data fragments from the Merkle Tree. After obtaining these fragments, light nodes can also propagate them to other nodes they can reach, quickly distributing the data to as many people/devices as possible and achieving efficient data dissemination. As long as enough nodes can quickly access the latest data, people no longer need to trust a small number of data providers, which is one of the core purposes of DA/data distribution.
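The security of this sampling is probabilistic: if a malicious producer withholds a fraction f of the fragments, a light node that samples k random fragments hits a missing one with probability 1 - (1 - f)^k. A quick illustration:

```python
# Probability that a light node sampling k random fragments notices
# withholding when a fraction f of the fragments is unavailable.
def detection_probability(f: float, k: int) -> float:
    return 1 - (1 - f) ** k

for k in (10, 30, 100):
    print(k, round(detection_probability(0.25, k), 6))
# With f = 0.25 (the 2D erasure-coding threshold discussed below),
# 30 samples already catch the withholding with probability > 0.9998.
```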
Of course, the scheme described above still leaves room for attack: it only ensures that people can quickly obtain data during distribution, not that the source producing the data is honest. For example, Celestia block producers might mix garbage data into a block, and even if people obtain every fragment in that block, they cannot restore the "complete dataset that should have been included." (Note: "should have been" is the key phrase here.)
Going further, the original dataset might contain 100 transactions; if the data of even one transaction is not fully disseminated, hiding just 1% of the fragments is enough to prevent the outside world from reconstructing the complete dataset. This is exactly the scenario discussed in the earliest work on data withholding attacks.
In fact, in the scenario just described, "availability" refers to whether the transaction data in a block is complete and usable and can be directly verified by others, not, as many people assume, to whether the blockchain's historical data can be read by the outside world. For this reason, both the Celestia team and the founder of L2BEAT have argued that "data availability" should be renamed "data publishing": has a complete, usable transaction dataset actually been published with the block?
To counter the data withholding attacks described above, Celestia introduces two-dimensional erasure coding: as long as 1/4 of the erasure-coded fragments in a block are valid, the original dataset can be restored. A block producer would have to fill 3/4 of a block with garbage fragments to make the original dataset unrecoverable, and that much garbage would quickly be noticed by sampling light nodes. For block producers, then, honesty is the better policy, because wrongdoing is quickly spotted by countless observers.
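To make the "any quarter of the fragments suffices" property concrete, here is a toy one-dimensional Reed-Solomon-style sketch over a small prime field (Celestia actually applies erasure coding in two dimensions over a data square; this only illustrates the recovery idea, with k/n = 1/4):

```python
# Toy Reed-Solomon-style erasure coding over GF(p): the message is read as
# the values of a degree-(k-1) polynomial at x = 0..k-1, and the n shares
# are its evaluations at x = 0..n-1, so ANY k shares recover the message.
P = 257  # prime field just large enough to hold one byte per symbol

def lagrange_eval(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate the unique degree-(len(points)-1) polynomial through points at x."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # den^-1 via Fermat
    return total

def encode(message: list[int], n: int) -> list[int]:
    """Extend k message symbols to n shares (the first k shares equal the message)."""
    points = list(enumerate(message))
    return [lagrange_eval(points, x) for x in range(n)]

def recover(shares: dict[int, int], k: int) -> list[int]:
    """Rebuild the k message symbols from any k surviving (index, value) shares."""
    points = list(shares.items())[:k]
    return [lagrange_eval(points, x) for x in range(k)]

msg = [72, 105, 33, 7]                               # k = 4 symbols
shares = encode(msg, 16)                             # n = 16, so k/n = 1/4
survivors = {i: shares[i] for i in (3, 7, 11, 15)}   # any 4 of the 16 survive
assert recover(survivors, k=4) == msg
```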
The scheme described above effectively prevents a "data distribution platform" from withholding data. B^2 Network will take Celestia's data sampling as an important reference going forward, possibly combining it with cryptographic techniques such as KZG commitments to further reduce light nodes' sampling and verification costs. As long as enough nodes perform data sampling, the distribution of DA data can be made effective and trustless.
Of course, the above only addresses data withholding by the DA platform itself. In a Layer2's underlying structure, other entities besides the DA platform can also withhold data, the sequencer among them. In the workflow of B^2 Network and most Layer2s, new data is generated by the sequencer, which aggregates and executes transactions sent by users, packages the transactions together with the resulting state changes into batches, and sends them to the B^2 Hub nodes acting as the DA layer.
If a batch produced by the sequencer is flawed from the start, data withholding and other forms of misbehavior remain possible. Therefore, after the B^2 DA network (B^2 Hub) receives a batch from the sequencer, it first verifies the batch's contents and rejects it if anything is wrong. In this sense, B^2 Hub acts not only as a Celestia-like DA layer but also as an off-chain validation layer, somewhat akin to CKB's role in the RGB++ protocol.
(Partial diagram of B^2 Network's underlying structure)
According to B^2 Network's latest technical roadmap, after B^2 Hub receives and verifies a batch, it retains the batch only for a limited time; once this window passes, the batch data expires and is pruned from B^2 Hub nodes. To address the resulting data expiry and loss, similar to the situation under EIP-4844, B^2 Network sets up a group of storage nodes responsible for permanently storing batch data, so that anyone can look up the historical data they need in the storage network at any time.
However, no one will run B^2 storage nodes for free. To encourage more people to run them and make the network more trustless, an incentive mechanism is needed; and to design an incentive mechanism, anti-cheating measures must come first. For example, under a naive scheme that rewards anyone who claims to store the data locally, some operators would download the data, secretly delete part of it, and still claim to store it in full. This is the most common form of cheating.
Filecoin uses proof protocols called PoRep and PoSt to allow storage nodes to present storage proofs to the outside world, proving that they have indeed preserved the data completely over a given period. However, this storage proof scheme requires generating ZK proofs and has high computational complexity, placing high demands on the hardware of storage nodes, which may not be an economically viable method.
In B^2 Network's new technical roadmap, storage nodes instead adopt a mechanism similar to Arweave's, competing for block-production rights to earn token incentives. If a storage node privately deletes some data, its probability of becoming the next block producer drops, while nodes that retain the most data have a better chance of producing blocks and earning more rewards. For most storage nodes, then, keeping a complete historical dataset is the better strategy.
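A minimal sketch of how such a "prove you still hold random old data" lottery can work, loosely modeled on Arweave's recall-block idea; the chunk-selection rule and eligibility check are illustrative assumptions, not B^2's specification:

```python
import hashlib

def recall_index(prev_block_hash: bytes, total_chunks: int) -> int:
    """Derive a pseudo-random historical chunk index from the previous block hash."""
    return int.from_bytes(hashlib.sha256(prev_block_hash).digest(), "big") % total_chunks

def eligible_to_produce(prev_block_hash: bytes, my_chunks: dict[int, bytes],
                        total_chunks: int) -> bool:
    """A node may compete for the next block only if it can serve the recall chunk,
    so a node that deleted data misses the draw in proportion to what it dropped."""
    return recall_index(prev_block_hash, total_chunks) in my_chunks
```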
The incentives are not only for storage nodes but also for the B^2 Hub nodes mentioned earlier. According to the roadmap, B^2 Hub will become a permissionless PoS network: anyone who stakes enough tokens can join B^2 Hub or the storage network. In this way, B^2 Network aims to build a decentralized off-chain DA and storage platform, and in the future to integrate Bitcoin Layer2s beyond B^2 itself, forming a universal off-chain DA layer and data storage layer for Bitcoin.
ZK and Fraud Proof Hybrid State Validation Scheme
Having covered B^2 Network's DA solution, we now turn to its state validation scheme, that is, how the Layer2 ensures its state transitions are sufficiently "trustless."
(The five major security indicators evaluated by the L2BEAT website for Scroll, where State Validation refers to the state validation scheme)
As mentioned earlier, in the workflow of B^2 Network and most Layer2s, new data is generated by the sequencer, which aggregates and executes transactions sent by users, packages the transactions together with the resulting state changes into batches, and sends them to other nodes in the Layer2 network, including ordinary Layer2 full nodes and B^2 Hub nodes.
After receiving the batch data, B^2 Hub nodes parse its contents and validate it, which includes the "state validation" mentioned above. State validation means checking whether the post-execution state changes recorded in the sequencer's batch are correct; if a batch contains erroneous states, B^2 Hub nodes reject it.
In essence, B^2 Hub is a PoS public chain that distinguishes block producers from validators. At regular intervals, B^2 Hub's block producers generate new blocks containing the batch data submitted by the sequencer and propagate them to the other nodes (validators). The rest of the workflow resembles what was described for Celestia: many external nodes frequently request data fragments from B^2 Hub nodes, and in the process the batch data is distributed to many devices, including the storage network mentioned earlier.
Within B^2 Hub there is a rotating role called the Committer, which submits the batch's data hash (actually a Merkle root) together with its storage index to the Bitcoin chain in the form of inscriptions. Anyone who reads this data hash and storage index can retrieve the complete data from the off-chain DA layer/storage layer. If N off-chain nodes store the batch data, then as long as one of them is willing to serve it, anyone can obtain the data they need; the trust assumption is 1/N.
It is not hard to see, though, that in this whole process, B^2 Hub, which verifies the validity of Layer2 state transitions, operates independently of the Bitcoin mainnet; it is merely an off-chain validation layer. At this point, the Layer2's state validation cannot yet be equated with the security of the Bitcoin mainnet.
Generally speaking, a ZK Rollup can fully inherit the security of its Layer1, but the Bitcoin chain currently supports only very simple computations and cannot verify ZK proofs directly. As a result, no Bitcoin Layer2 can equate its security model with that of an Ethereum ZK Rollup, and that includes Citrea and BOB.
Currently, the "more feasible" approach is as described in the BitVM white paper, where complex computations are moved off the Bitcoin chain, and only certain simple computations are performed on-chain when necessary. For example, the computational traces generated during ZK proof verification can be made public for external inspection. If people find an issue with a particular subtle computational step, they can verify this "disputed computation" on the Bitcoin chain. This requires using Bitcoin's scripting language to simulate the functions of special virtual machines like EVM, which may involve a significant engineering effort but is not unfeasible.
In B^2 Network's technical scheme, after the sequencer generates a new batch, it forwards the batch to the aggregator and the prover; the prover ZK-ifies the batch verification process and generates ZK proofs, which are ultimately forwarded to B^2 Hub nodes. B^2 Hub nodes are EVM-compatible and verify the ZK proofs through Solidity contracts. The entire verification computation is then broken down into very low-level logic-gate circuits, expressed in Bitcoin Script, and submitted to a sufficiently high-throughput third-party DA platform.
If someone doubts the published ZK verification trace and believes one small step in it is wrong, they can "challenge" it on the Bitcoin chain, having Bitcoin nodes check that problematic step directly and impose the corresponding penalty.
(Overall structure diagram of B^2 Network, excluding data sampling nodes)
So who gets punished? The Committer. In B^2 Network's setup, the Committer not only publishes the aforementioned data hash to the Bitcoin chain but also publishes the "commitment" of the ZK proof verification to the Bitcoin mainnet. Through certain Bitcoin Taproot constructions, anyone can, at any time, challenge the "ZK proof verification commitment" the Committer has published on the Bitcoin chain.
Let me explain what "commitment" means here. A commitment is a statement published on-chain in which someone claims that certain off-chain data is correct; the commitment value is bound to that specific off-chain data. In B^2's scheme, anyone who believes the Committer's ZK verification commitment is wrong can challenge it.
Some may ask: didn't we say earlier that B^2 Hub verifies a batch's validity directly upon receipt? Why "double-check" with a ZK proof instead of simply publishing the batch verification process and letting people challenge that? The answer is trace compression. If the entire process of verifying Layer2 transactions and computing the resulting state changes were published as logic gates and Bitcoin scripts, the data would be enormous; after ZK-ification, the trace that must be published can be compressed dramatically.
Here is a rough summary of B^2's workflow (see the sketch after this list):
- B^2's sequencer generates new Layer2 blocks and aggregates multiple blocks into a data batch, which is sent to the aggregator and to the validator nodes within the B^2 Hub network.
- The aggregator sends the data batch to the prover nodes, which generate the corresponding zero-knowledge proof; the ZK proof is then sent to B^2's DA and validation network (B^2 Hub).
- B^2 Hub nodes verify whether the ZK proof sent by the aggregator corresponds to the batch sent by the sequencer; if so, the verification passes. The data hash and storage index of the verified batch are then sent to the Bitcoin chain by a designated B^2 Hub node (the Committer).
- The B^2 Hub node publicly discloses the entire computational process of verifying the ZK proof and sends a commitment to that computation to the Bitcoin chain, where anyone can challenge it. If a challenge succeeds, the B^2 Hub node that published the commitment faces economic penalties (its UTXO on the Bitcoin chain is unlocked and transferred to the challenger).
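Read end to end, the flow above can be condensed into a few lines of illustrative Python; every name here is a hypothetical stand-in for the roles in the list, not a B^2 interface:

```python
def b2_workflow(sequencer, aggregator, prover, b2_hub, committer, bitcoin):
    # 1. The sequencer builds Layer2 blocks and packs them into a batch.
    batch = sequencer.build_batch()

    # 2. The aggregator hands the batch to the prover, which ZK-ifies
    #    the batch-verification computation.
    zk_proof = prover.prove(aggregator.prepare(batch))

    # 3. B^2 Hub checks that the proof corresponds to the sequencer's batch.
    if not b2_hub.verify(batch, zk_proof):
        return b2_hub.reject(batch)

    # 4. The Committer inscribes the batch's data hash + storage index and a
    #    commitment to the ZK verification trace on the Bitcoin chain.
    #    Anyone may later challenge the committed trace on-chain; a successful
    #    challenge unlocks the Committer's UTXO in favor of the challenger.
    return bitcoin.inscribe(committer.commit(batch, zk_proof))
```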
B^2 Network's state validation scheme thus introduces both ZK and fraud proofs, forming a hybrid method of state validation: as long as at least one honest off-chain node is willing to initiate a challenge upon detecting an error, the soundness of B^2 Network's state transitions is guaranteed.
According to members of the Western Bitcoin community, the Bitcoin mainnet may adopt suitable forks in the future to support richer computation. Directly verifying ZK proofs on the Bitcoin chain may then become a reality, bringing a new paradigm shift to all of Bitcoin Layer2. As a universal DA and validation layer, B^2 Hub can serve not only as a dedicated module for B^2 Network but can also empower other Bitcoin Layer2s. In the great contest of Bitcoin Layer2, off-chain functionality expansion layers will matter more and more, and B^2 Hub and BTCKB may be only the tip of that iceberg.