1kx: The ultimate goal of blockchain scalability is trust minimization and horizontal scaling
Written by: weidai.eth
Compiled by: Luffy, Foresight News
Ethereum is a permissionless world computer that, at the time of writing, possesses the highest economic security and serves as a settlement ledger for a vast array of assets, applications, and services. Ethereum also has its limitations: block space is a scarce and expensive resource on the Ethereum mainnet. L2 scaling is considered the best solution to this problem, and in recent years, many projects have entered this space, most of which are Rollups. However, the strict definition of Rollup (where data resides on Ethereum L1) does not achieve infinite scalability for Ethereum, with a ceiling of processing a few thousand transactions per second.
First, let’s understand two concepts:
Trust Minimization: If the functionality of an L2 system does not require trusting parts outside of the underlying L1, then that system (or a function of it) is trust minimized.
Horizontal Scalability: If instances can be added without causing a global bottleneck, then the system is horizontally scalable.
In this article, we argue that trust minimized and horizontally scalable systems are the most promising way to scale blockchain applications, but this direction has not yet been fully explored. We arrive at our argument by exploring three questions:
Why should applications be trust minimized?
Why build horizontally scalable systems?
How can we enhance trust minimization and horizontal scalability?
Disclaimer: While this article focuses on Ethereum as the underlying L1, most of what we discuss here also applies to other decentralized settlement layers outside of Ethereum.
Why should applications be trust minimized?
Applications can connect to Ethereum in a trusted manner: they can write to and read from the Ethereum blockchain, but users must trust the operator to execute the business logic correctly. Centralized exchanges like Binance and Coinbase are excellent examples of trusted applications. Connecting to Ethereum means that applications can leverage a global settlement network with multiple assets.
Trusted off-chain services pose significant risks. The collapse of major exchanges and service providers in 2022 (such as FTX and Celsius) serves as a cautionary tale of what can happen when trusted services behave improperly or fail.
On the other hand, trust minimized applications can write to and read from Ethereum in a verifiable manner; examples include smart contract applications like Uniswap, Rollups like Arbitrum or zkSync, and co-processors like Lagrange and Axiom. Broadly speaking, the more functionality an application outsources to L1 and the more it is secured by the Ethereum network, the less external trust it requires. As a result, trust minimized financial services can be provided without counterparty or custodian risk.
By outsourcing functionality to L1, applications and services can gain three key properties:
Liveness (and Ordering): Transactions submitted by users should be included in a timely manner (executed and settled).
Validity: Transactions are processed according to pre-specified rules.
Data (and State) Availability: Users can access historical data as well as the current application state.
For each of the above properties, we can think about what trust assumptions are needed; in particular, whether Ethereum L1 provides that property or whether external trust is required. The table below categorizes different architectural examples.
Why build horizontally scalable systems?
Horizontal scalability refers to scaling by adding independent or parallel instances to the system, such as applications or Rollups. This requires that the system does not have a global bottleneck. Horizontal scalability can achieve exponential growth in system throughput.
Vertical scalability refers to scaling by increasing the throughput of the overall system (such as Ethereum L1 or data availability layers). When horizontal scalability encounters bottlenecks on shared resources, vertical scalability is often required.
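As a rough illustration of the difference, the toy model below treats aggregate throughput as the sum of the instances' throughput, capped by any shared resource. The function name, its parameters, and the numbers are assumptions made for this sketch, not measurements of any real system.

```python
from typing import Optional


def total_throughput(
    instances: int,
    per_instance_tps: float,
    shared_cap_tps: Optional[float] = None,
) -> float:
    """Aggregate throughput of `instances` parallel systems.

    `shared_cap_tps` models a global bottleneck (for example, a shared
    DA layer). `None` means the instances share no bottleneck.
    """
    uncapped = instances * per_instance_tps
    if shared_cap_tps is None:
        return uncapped  # horizontally scalable: grows with every instance
    return min(uncapped, shared_cap_tps)  # growth stops at the shared cap


# Compare growth with and without a hypothetical shared cap of 2,500 TPS.
for n in (1, 10, 100):
    print(n, total_throughput(n, 100.0), total_throughput(n, 100.0, shared_cap_tps=2_500.0))
```

In this model, adding instances only helps until the shared cap is reached; beyond that point the cap itself has to be raised, which is vertical scaling.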
Claim 1: Rollups that publish transaction data to L1 cannot scale horizontally, because they can be bottlenecked by data availability (DA). Scaling DA vertically requires compromises in decentralization.
Data availability (DA) remains a bottleneck for Rollups. Currently, each L1 block has a maximum target of about 1 MB (85 KB/s). EIP-4844 will provide approximately 2 MB more (171 KB/s) of available space. Through Danksharding, Ethereum L1 may ultimately support DA bandwidth of up to 1.3 MB/s. Ethereum L1 DA is a resource that many applications and services compete for. Therefore, while using L1 as DA provides the best security, it can become a bottleneck for the potential scalability of the system. Systems that use L1 as DA (typically) cannot scale horizontally and suffer from diseconomies of scale. Alternative DA layers, such as Celestia or EigenDA, also have bandwidth limits (though higher ones, at 6.67 MB/s and 15 MB/s, respectively). However, using them shifts trust assumptions from Ethereum to another (often less decentralized) network, which compromises security.
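The per-second figures above follow from dividing the data per block by the block interval. The sketch below reproduces that arithmetic and turns it into a throughput ceiling. It assumes a 12-second L1 block interval and binary units (1 MB = 1024 KB), and it borrows the roughly 100 bytes of L1 data per transaction that this article cites later for typical Rollup traffic; the function names are illustrative.

```python
SLOT_SECONDS = 12  # assumption: one Ethereum L1 block every 12 seconds
KIB = 1024
MIB = 1024 * KIB


def da_bandwidth_kib_s(bytes_per_block: int, slot_seconds: int = SLOT_SECONDS) -> float:
    """DA bandwidth in KiB/s for a given amount of data per block."""
    return bytes_per_block / slot_seconds / KIB


def tps_ceiling(bandwidth_kib_s: float, bytes_per_tx: int) -> float:
    """Upper bound on TPS if every transaction posts `bytes_per_tx` bytes to DA."""
    return bandwidth_kib_s * KIB / bytes_per_tx


calldata = da_bandwidth_kib_s(1 * MIB)  # about 85 KiB/s
blobs = da_bandwidth_kib_s(2 * MIB)     # about 171 KiB/s
print(round(calldata), round(blobs))
print(round(tps_ceiling(calldata + blobs, bytes_per_tx=100)))
```

Under these assumptions the combined ceiling lands in the low thousands of transactions per second, shared by every system that posts its data to L1, which is in line with the limit mentioned in the introduction.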
Claim 2: The only way to horizontally scale trust minimized services is to achieve (close to) zero marginal L1 data per transaction. Two known methods are State Difference Rollups (SDRs) and Validiums.
A State Difference Rollup (SDR) publishes the state differences produced by a batch of aggregated transactions to Ethereum L1. For the EVM, as transaction batches grow larger, the data published to L1 per transaction approaches a constant that is far smaller than the per-transaction data of Rollups that publish transaction data.
For example, during a stress test event with a surge of inscriptions, zkSync found that the calldata per transaction was reduced to 10 bytes. In contrast, for normal traffic, Rollups like Arbitrum, Optimism, and Polygon zkEVM have about 100 bytes of calldata per transaction.
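A toy model can show why state differences amortize across a batch: repeated writes to the same storage slot collapse into a single published value. The sketch below is only an illustration. The byte cost per published slot (`SLOT_DIFF_BYTES`) and the workload generated by `make_batch` are hypothetical, and real systems also compress their data.

```python
from typing import Dict, List, Tuple

Write = Tuple[str, int]  # (storage slot key, new value)
Tx = List[Write]         # a transaction, modeled only by its storage writes

SLOT_DIFF_BYTES = 40     # hypothetical cost of publishing one (slot, value) pair
TX_DATA_BYTES = 100      # roughly the per-transaction calldata cited above


def state_diff(batch: List[Tx]) -> Dict[str, int]:
    """Collapse a batch into the final value of every slot it touched."""
    diff: Dict[str, int] = {}
    for tx in batch:
        for slot, value in tx:
            diff[slot] = value  # later writes overwrite earlier ones
    return diff


def l1_bytes_per_tx(batch: List[Tx]) -> Tuple[float, float]:
    """Return (transaction-data rollup, SDR) bytes published to L1 per transaction."""
    sdr = len(state_diff(batch)) * SLOT_DIFF_BYTES / len(batch)
    return float(TX_DATA_BYTES), sdr


def make_batch(n: int, users: int = 50) -> List[Tx]:
    """Hypothetical workload: n deposits from a rotating set of users into one pool."""
    return [[(f"balance:{i % users}", i), ("pool:total", i)] for i in range(n)]


for size in (10, 100, 1000):
    print(size, l1_bytes_per_tx(make_batch(size)))
```

In this model the SDR cost per transaction falls as the batch grows, because the set of touched slots stops growing while the number of transactions keeps increasing. The cost of publishing full transaction data stays flat.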
Validium is a system that publishes validity proofs of state transitions to Ethereum without associated transaction data or state. Even under low traffic conditions, Validium has high horizontal scalability. Moreover, different Validiums can share a settlement layer.
In addition to horizontal scalability, a Validium can also provide on-chain privacy (from public observers). A Validium with private DA has centralized and gated data and state availability, meaning users must authenticate before accessing data, and operators can implement good privacy measures. This achieves a user experience similar to traditional web or financial services: user activities are not subject to public scrutiny, but there is a trusted custodian of user data, in this case the Validium operator.
What about centralized versus decentralized sequencers? To keep the system horizontally scalable, each system needs its own independent sequencer (whether centralized or decentralized). Notably, while systems that use a shared sequencer gain atomic composability, they cannot scale horizontally, because the sequencer may become a bottleneck as more systems are added.
What about interoperability? If horizontally scalable systems all settle on the same L1, they can interoperate without additional trust, because messages can be sent from one system to another through the shared settlement layer. There is a trade-off between operational cost and message latency (which can be addressed at the application layer).
Trust Minimization in Horizontally Scalable Systems
Can we further minimize the trust requirements for liveness, ordering, and data availability in horizontally scalable systems?
Notably, at the cost of horizontal scalability, we know how to recover trustless liveness and data availability. For example, L2 transactions can be initiated from L1 to guarantee inclusion, and a Volition can give users the option to opt in to L1 state availability.
Another solution is simply to decentralize (but not rely on L1). By using decentralized sequencers (such as Espresso Systems or Astria) instead of a single sequencer, the system can become more decentralized, thereby minimizing the trust required for liveness, ordering, and data availability. However, this approach has limitations compared to a single operator solution: (1) performance may be limited by the performance of the distributed system, and (2) for Validiums with private DA, the default privacy protection may be lost if the decentralized sequencer network is permissionless.
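One way to picture inclusion guaranteed from L1 is an inbox on L1 that users can write to directly, combined with a rule that the L2 operator's batches must include queued transactions once a grace period has passed. The sketch below is a simplified model of that idea. The class and function names and the length of the grace period are hypothetical and do not describe any specific Rollup's contract.

```python
from dataclasses import dataclass, field
from typing import List

FORCE_DELAY_BLOCKS = 50  # hypothetical grace period, in L1 blocks


@dataclass
class QueuedTx:
    payload: str
    queued_at: int  # L1 block number at which the user queued the transaction


@dataclass
class L1Inbox:
    queue: List[QueuedTx] = field(default_factory=list)

    def enqueue(self, payload: str, l1_block: int) -> None:
        """A user submits an L2 transaction directly on L1."""
        self.queue.append(QueuedTx(payload, l1_block))

    def overdue(self, l1_block: int) -> List[QueuedTx]:
        """Queued transactions whose grace period has expired."""
        return [tx for tx in self.queue if l1_block - tx.queued_at >= FORCE_DELAY_BLOCKS]


def batch_is_valid(batch: List[str], inbox: L1Inbox, l1_block: int) -> bool:
    """L1 accepts a batch only if it contains every overdue queued transaction."""
    return all(tx.payload in batch for tx in inbox.overdue(l1_block))


inbox = L1Inbox()
inbox.enqueue("withdraw:alice", l1_block=100)
print(batch_is_valid(["swap:bob"], inbox, l1_block=120))  # grace period still running
print(batch_is_valid(["swap:bob"], inbox, l1_block=150))  # operator ignored the queue
```

Every transaction queued this way consumes L1 block space, which is why this kind of guarantee is paid for with horizontal scalability.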
How much can we reduce trust dependencies for single operator Validiums or SDRs? Here are a few directions.
Direction 1: Trust minimized data availability in Validiums. Plasma and Plasma-like designs partially address the state availability problem, either by solving the withdrawal problem only for certain state models (including the UTXO state model) or by requiring users to be online regularly (Plasma Free).
Direction 2: Accountable pre-confirmations in SDRs and Validiums. The goal here is to give users a quick pre-confirmation that the sequencer has included their transaction; if the inclusion commitment is not fulfilled, users should be able to challenge the sequencer and have it penalized. The difficulty is proving that a transaction was not included, which may require additional data that the sequencer can simply withhold. Therefore, it is reasonable to assume that the SDR or Validium must at least employ a data availability committee for its complete calldata or transaction history, which can provide proof of non-inclusion (for pre-confirmed transactions) upon user request.
Direction 3: Rapid recovery from liveness failures. Single operator systems may encounter liveness failures (for example, Arbitrum went down during the inscription event). Can we design a system that does not experience service interruptions in similar situations? In a sense, allowing self-sequencing and state proposals in L2 does provide guarantees against long-term liveness failures. Single operator systems that are more resilient against short-term liveness failures have yet to be fully explored. One potential solution is to hold the relevant parties accountable by slashing them for liveness failures. Another is simply to shorten the delay period before a takeover (currently set to about a week).
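The challenge flow can be sketched as follows, assuming the complete batch history is available from a data availability committee. Everything here is a hypothetical simplification: the names are invented for illustration, a real pre-confirmation would carry the operator's signature, and non-inclusion would be shown with a succinct proof rather than by scanning full batches.

```python
import hashlib
from dataclasses import dataclass
from typing import List


def tx_hash(tx: bytes) -> str:
    """Identify a transaction by the SHA-256 hash of its bytes."""
    return hashlib.sha256(tx).hexdigest()


@dataclass(frozen=True)
class PreConfirmation:
    tx_hash: str
    deadline_batch: int  # index of the last batch in which the tx may appear


def is_violated(promise: PreConfirmation, batches: List[List[bytes]]) -> bool:
    """True if the deadline has passed and no batch up to it contains the tx.

    `batches` stands in for the full history served by the DA committee.
    """
    if len(batches) <= promise.deadline_batch:
        return False  # deadline batch not published yet, nothing to challenge
    window = batches[: promise.deadline_batch + 1]
    return all(tx_hash(tx) != promise.tx_hash for batch in window for tx in batch)


def settle_challenge(promise: PreConfirmation, batches: List[List[bytes]], bond: int) -> int:
    """Return the amount slashed from the operator's bond (0 if the promise was kept)."""
    return bond if is_violated(promise, batches) else 0


promise = PreConfirmation(tx_hash(b"tx-a"), deadline_batch=1)
print(settle_challenge(promise, [[b"tx-b"], [b"tx-a"]], bond=10))  # promise kept
print(settle_challenge(promise, [[b"tx-b"], [b"tx-b"]], bond=10))  # promise broken
```

The sketch makes the dependency explicit: without access to `batches`, the user has no way to show that the transaction is missing, which is the withholding problem described above.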
Conclusion
Scaling a global settlement ledger while maintaining trust minimization is a challenge. Today’s Rollup and data availability space does not clearly distinguish between vertical and horizontal scaling. To truly scale trust minimized systems to every corner of the Earth, we need to build systems that are both trust minimized and horizontally scalable.
Special thanks to Vitalik Buterin and Terry Chung for their feedback and discussions, and to Diana Biggs for her editorial comments.