Vitalik Backs the Epoch-and-Slot Route: Giving Ethereum Users Faster Transaction Confirmation Times
Author: Vitalik
Compiled by: Nan Zhi, Odaily Planet Daily
One of the important attributes of a good blockchain user experience is fast transaction confirmation times. Today, Ethereum has made significant improvements compared to five years ago. Thanks to EIP-1559 and the stable block times after transitioning to PoS (The Merge), transactions sent by users on L1 can typically be confirmed within 5-20 seconds, which is roughly comparable to the experience of using a credit card for payment. However, further improving user experience is valuable, as certain applications even require latencies of hundreds of milliseconds or shorter. This article will explore some practical options for Ethereum to improve transaction confirmation times.
Overview of Existing Ideas and Technologies
Single Slot Finality
Currently, Ethereum's Gasper consensus uses a slot-and-epoch architecture. A slot occurs every 12 seconds, during which a subset of validators votes on the head of the chain, and over the course of 32 slots (6.4 minutes), every validator gets a chance to vote once. These votes are then reinterpreted as messages in a consensus algorithm similar to PBFT, which after two epochs (12.8 minutes) provides a strong economic guarantee known as finality.
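The timing figures in this paragraph follow from simple arithmetic. The sketch below reproduces them; the constant and function names are illustrative choices made here, not identifiers taken from any Ethereum client.

```python
# Values stated in the text; the names themselves are hypothetical.
SECONDS_PER_SLOT = 12    # one slot every 12 seconds
SLOTS_PER_EPOCH = 32     # every validator votes once per 32 slots
EPOCHS_TO_FINALITY = 2   # finality is reached after two epochs


def epoch_seconds() -> int:
    """Length of one epoch in seconds."""
    return SECONDS_PER_SLOT * SLOTS_PER_EPOCH


def finality_minutes() -> float:
    """Time to finality in minutes under the figures above."""
    return EPOCHS_TO_FINALITY * epoch_seconds() / 60


def slot_to_epoch(slot: int) -> int:
    """Epoch index that a given slot number falls into."""
    return slot // SLOTS_PER_EPOCH


if __name__ == "__main__":
    print(epoch_seconds() / 60)  # 6.4 minutes per epoch
    print(finality_minutes())    # 12.8 minutes to finality
```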
In recent years, we have become increasingly dissatisfied with the current approach. There are two main reasons: first, the design is complex, with many interaction bugs between the slot-by-slot voting mechanism and the epoch-by-epoch finality mechanism; second, 12.8 minutes is too long, and no one wants to wait that long.
Single Slot Finality (SSF) replaces this architecture with a mechanism similar to Tendermint consensus, where block N is finalized before block N+1 is generated. The main difference from Tendermint is that we retain the "inactivity leak" mechanism, which allows the chain to continue operating and recover even when more than 1/3 of the validators are offline.
- (Note: The inactivity leak is a PoS mechanism that penalizes validators who stay offline while the chain is failing to finalize; their staked ETH is gradually reduced, rather than slashed outright, until the validators that remain online can finalize blocks again.)
- (Note: Tendermint is a Byzantine fault-tolerant consensus algorithm that provides fast transaction confirmation and keeps the blockchain operating normally even when some nodes are malicious or offline.)
The main challenge of single slot finality is that, naively, it requires every Ethereum staker to publish two messages every 12 seconds, which is a significant load on the chain. There are some clever ideas to alleviate this issue, including the recent Orbit SSF proposal. But while SSF improves user experience by greatly speeding up finality, it does not change the fact that users still need to wait 5-20 seconds.
- (Note: Finality is not the same as a transaction being included in a block and confirmed; a transaction that is confirmed but not yet finalized can still be reverted by a fork or rollback.)
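To get a feel for the load mentioned above (two messages per staker per 12-second slot), the following rough sketch computes the resulting message rate. The function name and the staker count used in the example are hypothetical; the text gives no staker count, so plug in whatever figure is relevant.

```python
def ssf_messages_per_second(
    num_stakers: int,
    messages_per_staker: int = 2,  # two messages per slot, per the text
    slot_seconds: int = 12,        # slot length, per the text
) -> float:
    """Naive network-wide message rate if every staker signs in every slot."""
    if num_stakers < 0 or messages_per_staker < 0 or slot_seconds <= 0:
        raise ValueError("inputs must be non-negative, slot length positive")
    return num_stakers * messages_per_staker / slot_seconds


if __name__ == "__main__":
    # 600_000 is an arbitrary illustrative number, not a measured value.
    print(ssf_messages_per_second(600_000))
```

The rate grows linearly with the number of stakers, which is why proposals such as Orbit SSF aim to reduce how many validators sign in each slot.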
Rollup Pre-confirmation
In recent years, Ethereum has followed a rollup-centric roadmap, designing the Ethereum base layer (L1) to support data availability and other features, which can then be utilized by L2 protocols (such as rollups, validiums, and plasmas) to provide users with a level of security comparable to Ethereum on a larger scale.
This has created a separation of concerns within the Ethereum ecosystem: Ethereum L1 focuses on censorship resistance, reliability, stability, and maintaining and improving certain core functions of the base layer, while L2s focus on reaching users more directly through different cultures and technologies. However, as we move along this path, an inevitable problem arises: L2s want to give users confirmations much faster than 5-20 seconds.
So far, at least theoretically, it has been the responsibility of L2 to create their own "decentralized sequencer" networks. A small group of validators could sign blocks every few hundred milliseconds and stake their assets behind these blocks. Ultimately, the headers of these L2 blocks would be published to L1.
But L2 validator sets can cheat: they could first sign block B1, then sign a conflicting block B2 and get it onto the chain before B1. If they do this, however, they are caught and lose their staked assets. In practice, we have seen centralized versions of this, but rollups have been slow to develop decentralized sequencer networks. One could argue that requiring every L2 to do decentralized sequencing is unfair: we are asking rollups to do work almost equivalent to creating an entirely new L1. For this reason, Justin Drake has been promoting a way to give all L2s (as well as L1) access to a shared, Ethereum-wide pre-confirmation mechanism: based pre-confirmations.
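The "sign B1, then sign a conflicting B2" scenario can be illustrated with a minimal sketch of equivocation detection. Everything here is an assumption made for illustration: the class and field names are hypothetical, signature verification is omitted, and the penalty (losing the whole stake) is a simplification rather than the rule of any particular L2.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SignedBlock:
    signer: str      # validator identifier (hypothetical format)
    height: int      # L2 block height the signature applies to
    block_hash: str  # hash of the block that was signed


class EquivocationDetector:
    """Tracks what each validator signed and penalizes conflicting signatures."""

    def __init__(self, stakes: Dict[str, int]) -> None:
        self.stakes = dict(stakes)
        self._seen: Dict[Tuple[str, int], str] = {}

    def observe(self, sb: SignedBlock) -> bool:
        """Record a signed block; return True if it proves equivocation."""
        key = (sb.signer, sb.height)
        previous = self._seen.setdefault(key, sb.block_hash)
        if previous != sb.block_hash:
            # Two different blocks signed at the same height: the stake is lost.
            self.stakes[sb.signer] = 0
            return True
        return False


if __name__ == "__main__":
    detector = EquivocationDetector({"validator-1": 100})
    detector.observe(SignedBlock("validator-1", 7, "B1"))
    print(detector.observe(SignedBlock("validator-1", 7, "B2")))
    print(detector.stakes["validator-1"])
```

Signing the same block twice is harmless in this sketch; only two different hashes at the same height count as cheating.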
Based Pre-confirmation
The based pre-confirmation approach assumes that Ethereum proposers will become highly sophisticated actors for MEV-related reasons. It takes advantage of this sophistication by incentivizing these proposers to take on the responsibility of providing pre-confirmations as a service.
The basic idea is to create a standardized protocol through which a user can pay an additional fee in exchange for a guarantee that their transaction will be included in the next block, possibly along with a claim about the result of executing that transaction. If a proposer violates any commitment it makes to a user, it can be penalized.
As described, based pre-confirmations provide guarantees for L1 transactions. If rollups are "based," then all L2 blocks are L1 transactions, so the same mechanism can be used to provide pre-confirmations for any L2.
- (Note: Ethereum proposers can group a series of transactions into a bundle and, in exchange for fees, include it in a block with its execution and ordering guaranteed. The well-known sandwich attack, for example, relies on buying just before a target transaction and selling just after it. The proposal Vitalik mentions is conceptually similar: proposers commit to a transaction's outcome in advance, which speeds up confirmation.)
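The commitment described in this section (inclusion in the next block plus a claimed execution result, with a penalty for violations) can be sketched as a simple check. This is only an illustration under stated assumptions: all type names, fields, and reason strings are hypothetical, and no standardized pre-confirmation protocol is being quoted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Preconfirmation:
    tx_hash: str
    target_slot: int      # the slot whose block must include the transaction
    promised_result: str  # proposer's claim about the execution result
    extra_fee: int        # additional fee paid by the user for the promise


@dataclass(frozen=True)
class IncludedTx:
    tx_hash: str
    result: str


@dataclass(frozen=True)
class Block:
    slot: int
    txs: Tuple[IncludedTx, ...]


def find_violation(promise: Preconfirmation, block: Block) -> Optional[str]:
    """Return None if the promise was honored, else the reason for a penalty."""
    if block.slot != promise.target_slot:
        raise ValueError("block does not belong to the promised slot")
    for tx in block.txs:
        if tx.tx_hash == promise.tx_hash:
            if tx.result != promise.promised_result:
                return "execution result differs from promise"
            return None
    return "transaction not included"


if __name__ == "__main__":
    promise = Preconfirmation("0xabc", 10, "ok", extra_fee=5)
    print(find_violation(promise, Block(10, (IncludedTx("0xabc", "ok"),))))
    print(find_violation(promise, Block(10, ())))
```

In a real design, the promise would be signed by the proposer so that a violation can be proven to whoever enforces the penalty; that part is left out here.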
What Are We Actually Looking At?
Suppose we implement single slot finality. We use techniques similar to Orbit to reduce the number of validators signing in each slot, but not by too much, so that we also make progress on the key goal of lowering the 32 ETH minimum staking requirement. As a result, the slot time may creep up to 16 seconds. We then use either rollup pre-confirmations or based pre-confirmations to give users faster confirmations. What do we end up with? An epoch-and-slot architecture.
There is a profound philosophical reason why the epoch-and-slot architecture seems so difficult to avoid: it takes less time to reach a rough consensus on something than to reach the maximum degree of "economic finality" agreement on that same thing.
One simple reason is the number of nodes. The old linear trade-off between decentralization, finality time, and overhead looks milder now thanks to hyper-optimized BLS aggregation and the upcoming ZK-STARKs, but two things remain true:
1. "Approximate consensus" requires only a small number of nodes, while economic finality requires a majority of all nodes.
2. Once the number of nodes exceeds a certain scale, you need to spend more time collecting signatures.
In today's Ethereum, the 12-second slot is divided into three sub-slots: block publication and distribution, attestation, and attestation aggregation. If the number of attesters were much lower, we could drop to two sub-slots and an 8-second slot time. Another, and realistically larger, factor is the "quality" of the nodes. If we could also rely on a professionalized subset of nodes to reach approximate consensus (while still using the full validator set for finality), we could plausibly bring the slot time down to about 2 seconds.
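The 8-second figure follows from the sub-slot arithmetic. The sketch below assumes the three sub-slots are of equal length and that each sub-slot keeps its duration when one is dropped; both are simplifying assumptions, and the names are illustrative. The roughly 2-second figure depends on node quality rather than on this arithmetic, so it is not modeled.

```python
CURRENT_SLOT_SECONDS = 12  # slot length stated in the text
CURRENT_SUB_SLOTS = 3      # number of sub-slots stated in the text


def sub_slot_seconds() -> float:
    """Length of one sub-slot, assuming the slot is split evenly."""
    return CURRENT_SLOT_SECONDS / CURRENT_SUB_SLOTS


def slot_time(num_sub_slots: int) -> float:
    """Slot length if each sub-slot keeps its current duration."""
    if num_sub_slots <= 0:
        raise ValueError("need at least one sub-slot")
    return num_sub_slots * sub_slot_seconds()


if __name__ == "__main__":
    print(sub_slot_seconds())  # 4.0 seconds per sub-slot
    print(slot_time(2))        # 8.0 seconds with two sub-slots
```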
Therefore, in my view, the epoch-and-slot architecture is clearly the right choice, but not all epoch-and-slot architectures are equal, and it is valuable to explore the design space more fully. One direction worth exploring in depth is a design that is not tightly coupled like Gasper, but instead has a stronger separation of concerns between the two mechanisms.
What Should L2 Do?
In my opinion, L2s currently have three reasonable strategies:
1. Be "based," both technically and philosophically. That is, they optimize for carrying over the technical properties of the Ethereum base layer and its values (high decentralization, censorship resistance, etc.). In its simplest form, you can think of these rollups as "branded shards," but they can also be more ambitious and experiment extensively with new virtual machine designs and other technical improvements.
2. Become "servers with blockchain scaffolding" and make the most of it. If you start from a server and then add STARK validity proofs to ensure the server follows the rules, guarantees of users' rights to exit or force transactions, and freedom of collective choice (through coordinated mass exits or by voting to change the sequencer), then you have gained most of the benefits of being on-chain while retaining most of the efficiency of a server.
- (Note: In software development, scaffolding usually refers to tools or methods that automatically generate a project's basic structure and code framework; here it refers to the blockchain-based guarantees built around the server.)
3. A compromise approach: a fast chain with a hundred nodes, where Ethereum provides additional interoperability and security. This is the current practical roadmap for many L2 projects.
For certain applications (such as ENS, key storage, and some payment protocols), a 12-second block time is sufficient. For those applications where this does not apply, the only solution is the epoch-and-slot architecture. In all three cases, the "epoch" is Ethereum's SSF, but the slot varies in the three cases mentioned above:
1. An Ethereum-native epoch-and-slot architecture
2. Server pre-confirmation
3. Committee pre-confirmation
A key question is how good we can make the first category. In particular, if it becomes very good, the third category starts to feel much less compelling. The second category will always exist, because no "based" solution works for off-chain-data L2s such as plasmas and validiums. But if an Ethereum-native epoch-and-slot architecture can get down to a 1-second slot time, the space for the third category becomes much smaller.
Today, we are still far from final answers to these questions. One key question, how sophisticated block proposers will become, remains an area of considerable uncertainty. Designs like Orbit SSF are very new, which suggests that the design space of epoch-and-slot architectures with something like Orbit SSF as the epoch is still under-explored. The more options we have, the better we can serve users of both L1 and L2, and the simpler we can make the work of L2 developers.