A Comprehensive Understanding of Blockchain Scalability Solutions in One Article
Author: Chasey, Buidler DAO
Every blockchain faces the impossible triangle composed of decentralization, security, and scalability. Among these, decentralization is the greatest advantage of blockchain technology and must be prioritized; however, if a long-lasting and sustainable ecosystem is to be established, security must also be taken to the extreme. This has led to the current situation where public chains generally have low scalability.
Image source: drawn by the author
How to improve blockchain throughput (i.e., scalability) while balancing decentralization and security is an urgent problem. In recent years, ETH2.0, Ethereum's scalability vision, has remained a focus of worldwide attention and high expectations despite repeated delays. This indicates that scalability has become a collective demand among public chain users, and that throughput is an essential indicator when analyzing and valuing a blockchain. This article provides a comprehensive overview of current blockchain scalability solutions, helping readers understand the foundational concepts behind them.
Why Scalability is Needed
Before discussing specific scalability solutions, let's first clarify the role and necessity of scalability.
Nodes on the blockchain are divided into full nodes and light nodes. To ensure the integrity and security of transaction data, full nodes need to store the entire blockchain's transaction data; light nodes only need to store the Block Header and verify transactions by requesting the corresponding Body from full nodes. The more nodes there are, the stronger the decentralization of the chain, but the workload required to reach consensus also increases, negatively impacting throughput. Additionally, as shown in the figure, Bitcoin's block size limit is 1MB, while Ethereum has also set a Gas Limit (to prevent DDoS attacks), which restricts the block size to around 130KB.
Image source: Blockchair
Because block size is limited, miners cannot pack every pending transaction into the same block, so they prioritize transactions by expected return (Gas Price), packaging them from highest to lowest price to maximize profit. Transactions with low Gas Prices therefore face long delays. As shown in the figure below, at any given time approximately 170,000 transactions are waiting for verification on Ethereum.
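The greedy selection rule described above can be sketched in a few lines of Python. This is an illustrative simplification under assumed field names, not any client's actual mempool logic:

```python
from dataclasses import dataclass

# Illustrative sketch of miner transaction selection: sort the mempool
# by gas price (descending) and pack greedily until the block gas limit
# is reached. Field names are hypothetical, for illustration only.

@dataclass
class Tx:
    gas_price: int   # fee per unit of gas offered by the sender
    gas_used: int    # gas this transaction consumes

BLOCK_GAS_LIMIT = 30_000_000  # illustrative limit

def pack_block(mempool: list[Tx]) -> list[Tx]:
    block, gas_left = [], BLOCK_GAS_LIMIT
    # highest-paying transactions first
    for tx in sorted(mempool, key=lambda t: t.gas_price, reverse=True):
        if tx.gas_used <= gas_left:
            block.append(tx)
            gas_left -= tx.gas_used
    return block

mempool = [Tx(gas_price=50, gas_used=21_000), Tx(gas_price=10, gas_used=100_000),
           Tx(gas_price=80, gas_used=50_000)]
print([tx.gas_price for tx in pack_block(mempool)])  # [80, 50, 10]
```

Under this rule, a low Gas Price transaction is only included once every higher-paying transaction that fits has been packed, which is exactly why it can wait indefinitely during congestion.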
Image source: Etherscan
Currently, Bitcoin's throughput is as low as 7 TPS (transactions per second), while Ethereum's is limited to 15~20 TPS. For comparison with traditional online payment rails: PayPal processes around 200 TPS, and VISA about 1,700 TPS, a significant gap.
Moreover, the continuously increasing transaction data puts pressure on the storage capacity required to maintain the blockchain. Currently, Bitcoin's storage has exceeded 400GB, with a year-on-year increase of 17.4%; Ethereum is nearly 900GB, with an average growth rate of 64.30%.
Image source: Blockchair
As shown in the figure, more than 1,250,000 transactions occur on Ethereum daily, and as public chain ecosystems continue to gain adoption, this number will keep growing, increasing the pressure on throughput and making scalability urgent.
Image source: YChart
Having understood why scalability matters, it's time to explore the methods that can achieve it.
Classification of Scalability Solutions
The following diagram is an illustration from the Handbook of Research on Blockchain Technology (2020). In this article, we will focus on the "Write Performance" section of the diagram, explaining the current scalability solutions from both on-chain and off-chain perspectives.
Image source: Handbook of Research on Blockchain Technology (2020)
On-Chain Scalability Solutions
On-chain scalability solutions achieve scalability by modifying the design of the original chain. Blockchain technology can be broken down into six structural layers: consensus layer, network layer, data layer, incentive layer, contract layer, and application layer. The first three are the foundational layers of blockchain and are the targets of on-chain scalability solutions.
1. Consensus Layer = BFT; Satoshi; Hybrid
The consensus mechanism refers to the process by which nodes in the blockchain reach consensus on the availability of data and the consistency of the ledger state. Since the consensus mechanism completely determines the entire process from downloading data to packaging blocks, the efficiency of nodes in verifying transactions heavily relies on the design of the consensus mechanism. Current mainstream consensus mechanisms can be divided into BFT-type consensus, Satoshi consensus, and hybrid consensus.
BFT-type Consensus
When discussing BFT (Byzantine Fault Tolerance), we must first mention the well-known Byzantine Generals Problem. The Byzantine Empire (Eastern Roman Empire) aimed to expand its territory. In one war, it sent 10 armies to surround an enemy that could withstand an attack by at most 5 of them, so at least 6 armies had to attack simultaneously to win. Because the armies were far apart, the generals had to reach consensus by sending attack/retreat signals to one another (counting its own signal, an army would attack if it received 6 or more attack signals, and retreat otherwise). The generals' biggest problem was: what if one of the armies harbored a traitor who deliberately sent false signals? The blockchain faces a similar problem: how do nodes reach consensus by exchanging messages when some nodes in the network are traitors (Byzantine nodes) that send incorrect information?
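The voting rule above can be simulated directly. The toy sketch below (parameters invented for illustration) shows how equivocating traitors can split the loyal armies' decisions:

```python
import random

# Toy simulation of the generals' rule described above: counting its own
# signal, an army attacks only if it sees 6 or more attack signals out
# of 10. Traitors may send different signals to different receivers.

def decide(signals: list[bool]) -> bool:
    """An army attacks iff at least 6 of the 10 signals say attack."""
    return sum(signals) >= 6

def simulate(honest_attack_votes: int, traitors: int) -> list[bool]:
    loyal = 10 - traitors
    decisions = []
    for receiver in range(loyal):
        signals = [True] * honest_attack_votes + [False] * (loyal - honest_attack_votes)
        # each traitor sends an arbitrary (random) signal to each receiver,
        # so different loyal armies may count different totals
        signals += [random.choice([True, False]) for _ in range(traitors)]
        decisions.append(decide(signals))
    return decisions

random.seed(1)
# 5 loyal armies vote attack, 2 traitors equivocate: the loyal armies
# can end up split between attacking and retreating -- consensus fails.
print(simulate(honest_attack_votes=5, traitors=2))
```

BFT consensus protocols are designed precisely so that honest nodes still agree despite such equivocation, provided the Byzantine fraction stays below the protocol's tolerance threshold.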
The most famous BFT-type consensus is PBFT, which is explained simply in reference [17], so we will not elaborate here. Using this type of consensus requires that all functioning nodes compute and generate blocks with the same random number and block algorithm: when the starting ledgers are identical, the computation results are identical, and the resulting ledger is immutable and permanently public. Because every node must exchange consensus messages with every other node, this approach achieves extremely high throughput with strong security when the node count is small; but as the number of nodes grows, the volume of messages to process grows with it, and transaction processing speed drops sharply.
Satoshi Consensus
Satoshi consensus mainly includes Proof of Work (PoW) and Proof of Stake (PoS). This section also covers DPoS, a variant of PoS, and compares how each mechanism performs in terms of throughput.
PoW: Computing power determines the right to record (and also the voting power), and no mechanism is needed to authorize nodes. Its main throughput problem is slow block generation due to high difficulty; additionally, to keep the ledger consistent, a deliberate packaging delay is built in: after miners package a block, at least one more block's worth of proof of work must be performed on top of it before the candidate block is considered confirmed.
PoS: Holding coins grants the right to record and voting rights (separately). After packaging candidate blocks, packaging nodes broadcast them, and voting nodes vote on whether to add the candidate block to the blockchain, using a majority rule. Compared to PoW, PoS sacrifices some security due to the introduction of a voting mechanism; however, in terms of throughput, it has low latency due to fast packaging speed and no waiting time.
DPoS: Holding coins grants voting rights, and a board of directors is elected to be responsible for recording. It sacrifices some degree of decentralization compared to PoS while achieving higher throughput.
Hybrid Consensus
As the name suggests, hybrid consensus refers to a consensus that combines the advantages of different consensus mechanisms. For example, using PoW on the main chain to ensure security while using PoS on the side chain to guarantee throughput; combining PoS and PBFT to reduce the number of nodes to a constant value, thereby further enhancing throughput, etc.
2. Data Layer = Increase Block Size; Reduce Data; DAG
In addition to the consensus mechanism, the number of transactions that can be packed into each block is also closely related to transaction throughput. We can enhance blockchain throughput by increasing block capacity, reducing transaction data, or directly using a DAG data structure to handle transactions.
Relaxing/Removing Block Size Limits
Increasing block capacity allows more transaction data to be packed into each block, but it also lengthens block propagation time, increasing network latency and thus raising the risk of chain forks.
Reducing Data Stored in Blocks
A well-known solution in this category is SegWit (Segregated Witness): the signature portion of block data is managed separately from the data used to compute transaction IDs, cutting transaction data by roughly 60%. This is an effective auxiliary measure for easing capacity pressure, but it does not solve the fundamental problem.
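The separation can be illustrated conceptually: compute the transaction ID over the transaction body without the witness (signature) data. This is a heavy simplification of Bitcoin's real serialization, for illustration only:

```python
import hashlib

# Conceptual sketch of Segregated Witness: the txid is computed over
# the transaction data *without* the signatures (the witness), so the
# witness bytes can be stored and relayed separately.

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

tx_body = b"inputs|outputs|locktime"      # placeholder serialization
witness = b"signature-and-pubkey-bytes"   # placeholder witness data

legacy_txid = double_sha256(tx_body + witness)  # pre-SegWit: signatures included
segwit_txid = double_sha256(tx_body)            # SegWit: witness excluded from txid

# Changing a signature changes the legacy txid but not the SegWit txid,
# which also fixes transaction malleability as a side effect.
print(legacy_txid.hex()[:16], segwit_txid.hex()[:16])
```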
DAG (Directed Acyclic Graph)
As shown in the figure below, a blockchain adopts a chain structure in which a block header can contain the hash of only one parent block, and new blocks can only be appended to the end of the chain, never continued from the middle. Under a DAG structure, a block header can contain the hashes of multiple blocks, and new blocks can continue from earlier blocks anywhere in the graph.
Image source: Russian Blogs
A blockchain keeps accounting synchronized, requiring nodes to record the same information at the same time; a DAG allows asynchronous accounting, where different nodes can record different information at the same time. A DAG can therefore package more transactions per unit of time, achieving extremely high TPS. The main DAG-based protocols today are SPECTRE and PHANTOM.
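A minimal sketch of the structural difference, with an invented block type for illustration: a DAG block header simply carries a list of parent hashes instead of a single one, so parallel blocks can later be merged.

```python
from dataclasses import dataclass, field
import hashlib

# Minimal sketch: a chain block references exactly one parent hash,
# while a DAG block header can reference several, so new blocks can
# attach anywhere in the graph, asynchronously.

@dataclass
class DagBlock:
    payload: str
    parents: list[str] = field(default_factory=list)  # multiple parent hashes

    @property
    def hash(self) -> str:
        data = (self.payload + "".join(sorted(self.parents))).encode()
        return hashlib.sha256(data).hexdigest()

genesis = DagBlock("genesis")
a = DagBlock("tx-batch-a", parents=[genesis.hash])
b = DagBlock("tx-batch-b", parents=[genesis.hash])    # created in parallel with a
c = DagBlock("tx-batch-c", parents=[a.hash, b.hash])  # merges both branches
print(c.parents)
```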
SPECTRE Protocol Resists Attacks Through Voting
As shown in the figure below, when the contents recorded in block X and block Y conflict: blocks 6~8, created after block X, contain the same information as block X and vote for X; blocks 9~11 record the same information as block Y and vote for Y; block 12 can trace back to both X and Y, so it follows the result of the previous round of voting (dashed line), which is X; blocks 1~5 vote according to the votes of the blocks that record their information, and since more blocks record X, blocks 1~5 vote for block X.
Since malicious blocks do not link to honest blocks until an attack begins, under the SPECTRE protocol conflicting transactions can be excluded as long as honest blocks remain in the majority. Its limitation is that it only suits ordinary transactions: it cannot linearly order all transactions by time, so it cannot run smart contracts. The PHANTOM protocol solves this problem.
Image source: An overview of SPECTRE
PHANTOM Protocol First Filters Honest Blocks and Then Performs Topological Sorting
Before looking at the filtering method, we need two concepts. The forking coefficient k is the number of tolerated forks (for example, in a blockchain where forks are not allowed, k is 0). The GHOSTDAG algorithm traces historical blocks and selects the longest chain as the main chain, forming a subset S whose blocks are assumed honest. Then, for each remaining block, it checks whether the set of blocks unrelated to it (neither its ancestors nor its descendants) intersects S in at most k blocks; if so, the block is deemed honest and added to S.
Image source: An overview of PHANTOM
The following example (k=3) illustrates this. Suppose we need to judge block I. The blocks derived from block I are K, M, O, P, R, and the blocks block I can trace back to are C, D, and the genesis block. The blocks completely unrelated to block I are therefore B, E, F, H, J, L, N, Q, S, T, U, of which B, F, J lie in subset S; that intersection has size 3, which does not exceed k, so block I is judged to be an honest block.
Image source: PHANTOM: A Scalable BlockDAG Protocol
The ordering step uses topological sorting: blocks with no unordered predecessors are emitted first (starting with the genesis block as block 0), then among the remaining blocks the next block whose predecessors have all been ordered is selected as block 1, and so on. A sketch of both steps follows the figure.
Image source: Kappo's Blog
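Below is a simplified sketch of the two PHANTOM steps just described, under assumed inputs: the DAG is given as a map from block to parent hashes, subset S is already chosen, and "unrelated" means neither an ancestor nor a descendant. It is not the real GHOSTDAG implementation.

```python
from collections import deque

def ancestors(dag: dict[str, set[str]], block: str) -> set[str]:
    """All blocks reachable by following parent links from `block`."""
    seen, stack = set(), list(dag[block])
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(dag[cur])
    return seen

def anticone(dag: dict[str, set[str]], block: str) -> set[str]:
    """Blocks with no path to or from `block` (the 'unrelated' set)."""
    ancs = ancestors(dag, block)
    descs = {b for b in dag if block in ancestors(dag, b)}
    return set(dag) - ancs - descs - {block}

def is_honest(dag, block, S: set[str], k: int) -> bool:
    # honest iff at most k members of S are unrelated to the block
    return len(anticone(dag, block) & S) <= k

def topo_order(dag: dict[str, set[str]]) -> list[str]:
    """Kahn-style sort: repeatedly emit blocks whose parents are all emitted."""
    emitted, order = set(), []
    pending = deque(sorted(dag))
    while pending:
        b = pending.popleft()
        if dag[b] <= emitted:
            emitted.add(b); order.append(b)
        else:
            pending.append(b)
    return order

# tiny example DAG: genesis -> {a, b} -> c
dag = {"genesis": set(), "a": {"genesis"}, "b": {"genesis"}, "c": {"a", "b"}}
print(is_honest(dag, "a", S={"genesis", "b", "c"}, k=1))  # True: anticone is {b}
print(topo_order(dag))  # ['genesis', 'a', 'b', 'c']
```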
Currently, projects using the DAG structure are relatively centralized, so we will not go deeper into them here. Readers interested in DAG can look up DAGLabs to learn more.
3. Network Layer = Sharding
Sharding refers to splitting the ledger into several parts, each managed by different groups of nodes. By implementing state sharding, the amount of transaction data each node needs to process decreases, which not only improves transaction processing speed but also reduces the performance requirements for nodes, lowering the barriers to participation in mining and enhancing decentralization.
Image source: Why sharding is great: demystifying the technical properties
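The core idea of state sharding can be sketched as a deterministic mapping from accounts to shards, so a node only processes the transactions of the shard it is assigned to. The hashing scheme below is illustrative, not any production protocol's:

```python
import hashlib

# Minimal sketch of state sharding: deterministically map each account
# to one of NUM_SHARDS shards; a node assigned to shard i only has to
# process transactions touching shard i's accounts.

NUM_SHARDS = 64

def shard_of(account: str) -> int:
    digest = hashlib.sha256(account.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

for acct in ["0xalice", "0xbob", "0xcarol"]:
    print(acct, "-> shard", shard_of(acct))
```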
Sharding 1.0
The initial idea for state sharding was to add n=64 data shards to the beacon chain: in each epoch, the validators assigned to each shard broadcast that shard's data (blobs), and a committee confirms the data's authenticity and availability; after confirmation, the blob is added to the execution chain. Because the validators assigned to each shard chain are reshuffled after every epoch, data cannot always be synchronized in time after a shard switch, causing delays. Beyond that, the approach faces four problems: it cannot guarantee that all transaction data required by each shard is written into every block on the beacon chain; it cannot perform a global check across all shards; validator nodes may suffer liveness failures; and, combined with PoS, anyone with enough money and enough controlled nodes finds it easier to capture a committee. Against a backdrop of declining ETH issuance and validator centralization, this mechanism creates opportunities for MEV.
DankSharding
To mitigate the risks in Sharding 1.0, DankSharding proposed two key points: all blobs will be added to the beacon block; each committee member only processes a subset of the shard data, allowing all beacon data and shard data to be checked together.
Specifically, the core mechanism of Danksharding is divided into three parts:
Data Availability Sampling: blobs are redundantly encoded with Reed-Solomon (RS) erasure coding so that nodes can verify availability by random sampling instead of downloading everything, while KZG polynomial commitments ensure the encoding is correct. On top of this, by further splitting the data blocks and recombining the pieces across shards, the RS encoding is extended into two dimensions, lowering the threshold for full data reconstruction and thus reducing centralization (see the sampling sketch after this list);
Proposer-Builder Separation: full nodes are split into two roles, proposers and builders. Low-configuration proposers handle the decentralized selection of builders and collect their bids, while high-configuration, high-performance builders win the right to build blocks through bidding, addressing the value-distribution problem of MEV;
Censorship-Resistance List: proposers specify a list of legitimate transactions, and builders must prove they have seen the list and include its transactions when building blocks, preventing builders from deliberately ignoring legitimate transactions.
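A back-of-the-envelope sketch of why random sampling works (figures are illustrative): with the 2D erasure-coding extension, an attacker must withhold a large fraction f of the extended data to block reconstruction, and a light node sampling c random chunks is fooled only if every sample happens to land on available data.

```python
# Probability that c independent random samples all miss the withheld
# data, i.e. the chance a sampling light node is fooled: (1 - f) ** c.

def miss_probability(f: float, c: int) -> float:
    return (1 - f) ** c

f = 0.25  # illustrative: ~25% of the 2D-extended data must be withheld
for c in (10, 30, 75):
    print(f"{c} samples -> fooled with probability {miss_probability(f, c):.6f}")
# 30 samples already push the failure probability below 0.02%
```

This is why each node only needs to download a few random chunks rather than the whole blob, drastically cutting verification load.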
Proto-Danksharding/EIP-4844
The full DankSharding mechanism is difficult to implement, so EIP-4844 emerged as a phased solution. EIP-4844 introduces time-limited blobs: like a temporary external drive, a blob exists for a limited period after being written to the mainnet and is then pruned. EIP-4844's design also adopts KZG polynomial commitments, ensuring forward compatibility with the later implementation of DankSharding.
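A sketch of the blob lifecycle, under the assumption (per the EIP's design) of a minimum retention window of 4096 epochs, roughly 18 days; field names and the store layout here are invented for illustration:

```python
from dataclasses import dataclass

# Sketch of EIP-4844's time-limited blobs: nodes keep the large blob
# payload only for a retention window and then prune it, while the
# small KZG commitment stays referenced on-chain permanently.

RETENTION_EPOCHS = 4096  # assumed minimum retention period

@dataclass
class BlobSidecar:
    epoch_added: int
    kzg_commitment: str    # permanent, stays with the block
    data: bytes | None     # large payload, prunable

def prune(sidecars: list[BlobSidecar], current_epoch: int) -> None:
    for sc in sidecars:
        if current_epoch - sc.epoch_added > RETENTION_EPOCHS:
            sc.data = None  # drop the payload, keep the commitment

store = [BlobSidecar(0, "0xc0mmit", b"rollup batch..."), BlobSidecar(4000, "0xc0mmit2", b"...")]
prune(store, current_epoch=5000)
print([(sc.kzg_commitment, sc.data is not None) for sc in store])
# [('0xc0mmit', False), ('0xc0mmit2', True)]
```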
Off-Chain Scalability Solutions
Off-chain scalability solutions refer to processing transactions outside the mainnet to alleviate the processing pressure on the mainnet without changing the original chain structure. The main types include state channels, off-chain computation, and multi-chain solutions, where sidechains and child chains are categorized under multi-chain solutions for convenience.
1. State Channels
A state channel locks a portion of blockchain state among specific participants (opening the channel) through multi-signature or similar means; state transitions that occur inside the channel are then updated off-chain with the consent of all participants; finally, the agreed final state is confirmed and broadcast to the main chain. Because only the final state needs to be broadcast, using state channels for trivial mutual transactions effectively reduces the number of transactions broadcast on the main chain and lowers transaction latency. Participants can also interact through intermediaries without opening new channels: if Alice and Bob have a channel, and Bob and Carol have a channel, then Alice can interact with Carol through Bob without opening a channel of her own. State channels have low transparency and are usually only suitable for frequent transactions between specific participants. A minimal sketch follows the figure.
Image source: EthHub
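The sketch below shows a two-party payment channel under simplifying assumptions: balances are updated off-chain with a strictly increasing nonce, every update is co-signed, and only the highest-nonce state is settled on-chain. Signatures are stubbed out; a real channel would use actual digital signatures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChannelState:
    nonce: int
    balance_alice: int
    balance_bob: int

def cosign(state: ChannelState) -> tuple[ChannelState, str]:
    # placeholder for both parties signing the state off-chain
    return state, f"sig(alice,bob,nonce={state.nonce})"

# open the channel by locking 10 + 10 on-chain, then transact off-chain
history = [cosign(ChannelState(0, 10, 10))]
history.append(cosign(ChannelState(1, 7, 13)))   # Alice pays Bob 3
history.append(cosign(ChannelState(2, 9, 11)))   # Bob pays Alice 2

def settle(history):
    """Only the latest co-signed state is broadcast to the main chain."""
    return max(history, key=lambda pair: pair[0].nonce)

print(settle(history))  # ChannelState(nonce=2, balance_alice=9, balance_bob=11)
```

However many payments flow through the channel, the main chain only sees the opening transaction and the single settlement, which is the source of the throughput gain.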
2. Off-Chain Computation
Off-chain computation enhances on-chain throughput by moving everything except verification off-chain. Its primary requirements are security and privacy, and the specific approaches divide into verifiable off-chain computation, "enclave"-style off-chain computation, off-chain secure multi-party computation, and incentive-driven off-chain computation.
Image source: Blockchain-Based Reputation Systems: Implementation Challenges and Mitigation
Verifiable Off-Chain Computation: zk-SNARKs, Bulletproofs, zk-STARKs
Off-chain provers upload the results of off-chain computations, together with a proof of correctness, to the chain, where on-chain validators verify them.
"Enclave" Type Off-Chain Computation: Enigma, Ekiden
A Trusted Execution Environment (TEE) is created on blockchain nodes, with data interfaces pre-set for computation. The TEE acts as a black box: it can operate on plaintext data while effectively protecting data privacy, improving computational efficiency.
Off-Chain Secure Multi-Party Computation
Data is split and distributed to multiple nodes; each node computes state changes based on the current blockchain state and the data it receives, and the pieces computed by the nodes are combined into the complete result. Each node computes less data, so efficiency is higher. A sketch using additive secret sharing follows.
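One concrete (and classical) instance of this splitting idea is additive secret sharing: a value is split into random shares that sum to it modulo a prime, each node works only on its own shares, and because the scheme is linear the shares of a sum can be combined without any node ever seeing the inputs. This sketch illustrates only the splitting/recombining mechanics, not a full MPC protocol:

```python
import random

P = 2**61 - 1  # a prime modulus

def share(secret: int, n: int) -> list[int]:
    """Split `secret` into n random shares that sum to it mod P."""
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % P

random.seed(7)
a_shares, b_shares = share(15, 3), share(27, 3)
# each of the 3 nodes locally adds its two shares...
sum_shares = [(a + b) % P for a, b in zip(a_shares, b_shares)]
# ...and combining the partial results reveals only the sum, 42
print(reconstruct(sum_shares))
```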
Incentive-Driven Off-Chain Computation
Solvers compute transaction data, stake collateral, and publish results; validators check the solvers' results, and if they find an error they can stake collateral of their own and initiate on-chain arbitration, with the correct party receiving the fees paid by users. A toy sketch of this game follows.
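The incentive structure can be reduced to a small payoff function. Everything below is an invented illustration of the game, not any specific protocol:

```python
# Toy sketch: a solver posts a result with a bond; a validator
# re-executes the computation and, if the result is wrong, stakes its
# own bond to trigger arbitration. Arbitration re-runs the task and
# awards the fee plus both bonds to whoever was right.

def arbitrate(task, solver_result, fee, solver_bond, validator_bond):
    correct = task()  # on-chain arbitration re-executes the disputed task
    pot = fee + solver_bond + validator_bond
    if solver_result == correct:
        return {"solver": pot, "validator": 0}
    return {"solver": 0, "validator": pot}

task = lambda: sum(range(100))  # the off-chain computation (answer: 4950)
print(arbitrate(task, solver_result=4950, fee=5, solver_bond=10, validator_bond=10))
print(arbitrate(task, solver_result=9999, fee=5, solver_bond=10, validator_bond=10))
```

Because dishonesty forfeits the bond, rational solvers report honestly and the expensive on-chain re-execution only happens in the rare dispute case.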
3. External Chains = Sidechains; Rollups
External chains create a new blockchain outside the main chain, transfer part of the transaction processing (such as computation and storage) to the new chain through cross-chain execution, and broadcast the results back to the main chain, improving the main chain's processing efficiency.
Sidechains
Sidechains are completely independent blockchains that project assets from the main chain onto the sidechain through locking + minting/burning, completing the entire process of transaction processing and storage on the sidechain. A sidechain's security relies entirely on its own nodes and consensus mechanism. Sidechains divide into anchored sidechains and federated sidechains (which add multi-signature addresses between the main chain and sidechain to verify transactions and reduce latency). The sketch after the figure illustrates the lock-and-mint flow.
Image source: EthHub
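Here is a minimal sketch of the lock + mint / burn + unlock flow, with both chains reduced to dictionaries of balances. A real bridge would verify cross-chain events with light clients or a multi-signature federation rather than a shared process:

```python
main_chain = {"alice": 100}
side_chain = {}
bridge_locked = 0

def deposit(user: str, amount: int) -> None:
    """Lock on the main chain, mint a mirrored asset on the sidechain."""
    global bridge_locked
    main_chain[user] -= amount
    bridge_locked += amount
    side_chain[user] = side_chain.get(user, 0) + amount

def withdraw(user: str, amount: int) -> None:
    """Burn on the sidechain, unlock the original asset on the main chain."""
    global bridge_locked
    side_chain[user] -= amount
    bridge_locked -= amount
    main_chain[user] += amount

deposit("alice", 40)   # alice: 60 on L1, 40 mirrored on the sidechain
withdraw("alice", 15)  # alice: 75 on L1, 25 on the sidechain
print(main_chain, side_chain, bridge_locked)  # {'alice': 75} {'alice': 25} 25
```

The invariant is that locked main chain funds always equal the sidechain's minted supply; if the sidechain's consensus is compromised, that invariant, and users' assets, break with it.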
Rollups
The difference between Rollups and sidechains is that transactions are only processed on the child chain, while data is still stored on the main chain, thus enjoying the data security of the mainnet while improving transaction processing efficiency. Rollups can be divided into four types based on data availability and transaction verification methods:
Fraud Proof (reporting invalid transactions) + Off-Chain DA (security -, scalability +) = Plasma
Plasma requires creating a smart contract on the main chain that encodes the child chain's state transition rules, connecting the main chain and child chain (and grandchild chains, and so on). Its scaling mechanism is similar to state channels: it improves throughput by reducing the transactions the main chain must process and store, but it establishes a new blockchain with an independent consensus mechanism that is shared, not a private channel between specific participants. The child chain periodically submits state updates to the main chain contract, and after surviving a 7-day challenge period (an optimistic proof, to ensure transaction security), the block state is written to the main chain. Plasma combines the advantages of state channels and sidechains: adding a new participant to a state channel requires opening a new channel on-chain, while Plasma does not; synchronizing state to the main chain from a state channel requires all participants' consent, while Plasma does not; a state channel retains only the final state, while Plasma's child chain keeps a complete record of state transitions; and state transitions on a sidechain directly affect main chain state, so asset security depends on the sidechain itself, whereas Plasma's security rests on the main chain.
The main issues with the Plasma mechanism are that child chain nodes must retain a large amount of the child chain's transaction data, and that nodes must stay online.
Image source: Plasma: An Innovative Framework to Scale Ethereum
Validity Proof (inferring transaction validity) + On-Chain DA (security +, scalability -) = ZK-Rollup
zkRollup effectively mitigates the problems Plasma faces. Since both zkRollup and Validium use zero-knowledge proofs as their verification mechanism, a brief description of the concept: a zero-knowledge proof proves a proposition true to someone without revealing any information beyond the truth of the proposition. For example, A uses a zero-knowledge proof to show B that he is an adult; B then knows only that A is an adult, not A's birthday. In zkRollup, zkSNARKs are used to verify a large batch of transactions, after which only the compressed transaction data plus a single zero-knowledge proof of the whole batch's validity needs to be uploaded to the mainnet. This dramatically compresses the volume of data that must be written on-chain.
zkRollup also has its issues: zero-knowledge proofs are computationally expensive to generate; zkSNARKs require an initial trusted setup; and generality is poor, making arbitrary smart contracts hard to support.
Validity Proof (inferring transaction validity) + Off-Chain DA (security -, scalability +) = Validium
Validium likewise uses zero-knowledge proofs to guarantee the validity of its transactions, but keeps data availability off-chain, which is its only difference from zkRollup. This gives Validium higher throughput, at the cost that whoever controls data availability can subtly modify the Merklized state and prevent users from moving their funds. As shown in the figure below, if d3 is modified, the owner of d1 can no longer obtain node m, which is needed to prove account ownership (see the Merkle-proof sketch after the figure). Since zkRollup emerged, Validium has largely lost its competitiveness.
Image source: Validium And The Layer 2
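The d1/d3/m example can be made concrete with a tiny 4-leaf Merkle tree. This is a generic Merkle-proof sketch, not Validium's actual state format:

```python
import hashlib

# The owner of leaf d1 proves ownership with the sibling hashes on the
# path to the root: for a 4-leaf tree, hash(d2) and the internal node
# m = hash(d3, d4). If the operator withholds or alters d3, the true m
# cannot be recomputed and the proof no longer matches the root.

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

d1, d2, d3, d4 = b"d1", b"d2", b"d3", b"d4"
m = h(h(d3), h(d4))             # the node the d1-proof needs
root = h(h(h(d1), h(d2)), m)

def verify(leaf: bytes, sibling: bytes, m_node: bytes, root: bytes) -> bool:
    return h(h(h(leaf), sibling), m_node) == root

print(verify(d1, h(d2), m, root))           # True: proof checks out
tampered_m = h(h(b"d3-modified"), h(d4))
print(verify(d1, h(d2), tampered_m, root))  # False: d3 changed, proof fails
```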
Fraud Proof (reporting invalid transactions) + On-Chain DA (security +, scalability -) = Optimistic Rollup
Optimistic Rollup is an upgraded version of Plasma and also solves zkRollup's generality problem. It assumes the transaction information submitted by nodes is correct; after transaction data is submitted to the mainnet, a 7-day challenge period lets anyone check the transactions' correctness. It differs from Plasma in writing transaction data on-chain, and from zkRollup in not using zero-knowledge proofs. As a balance between the other two designs, Optimistic Rollup also sacrifices some throughput.
Image source: ethereum.org
Summary and Future Outlook
Because of the impossible triangle, no perfect scalability solution can exist, and each solution's costs must be weighed. Personally, I believe on-chain scaling is harder to achieve because its costs are higher (both hard forks and technical difficulty), so under normal circumstances off-chain scaling solutions will be the main focus.
Among current off-chain solutions, Rollups hold the advantage in security. But although Rollups can significantly unlock L1 throughput, the rate at which L2's packaged data is returned to L1 is still bounded by Ethereum's block size: at roughly 100KB per block and one block every 13 seconds or so, fewer than five blocks fit in a minute, so less than 500KB of Rollup-packaged data can be processed per minute. The mainnet must also handle ordinary L1 transactions, which together bottlenecks the throughput of off-chain scaling solutions.
Fortunately, different solutions can be combined. Among on-chain solutions, DankSharding's sampling verification and RS encoding can greatly reduce the verification workload and data-download burden on validator nodes, addressing transaction processing speed, and is expected to complement Rollups once implemented. Taking the eventual implementation of sharding as a premise: among current Rollups, Optimistic Rollup's cost lies mainly in writing to the mainnet, making it the best match for Proto-Danksharding, which reduces exactly that cost. I therefore believe that in the near future (if EIP-4844 lands successfully), the combination of off-chain Optimistic Rollup and on-chain Proto-Danksharding will be the best choice.