Vitalik's new article: Improving the Ethereum network's permissionlessness and decentralization
Author: Vitalik Buterin
Compiled by: Deng Tong, Jinse Finance
Special thanks to Dankrad Feist, Caspar Schwarz-Schilling, and Francesco for their quick feedback and review.
I am writing this on the last day of the Ethereum developer interop event in Kenya, where we made significant progress implementing and working out the technical details of important upcoming Ethereum improvements, most notably PeerDAS, the Verkle tree transition, and decentralized approaches to storing history in the context of EIP-4444. From my own perspective, the pace of Ethereum development, and our ability to ship large, important features that meaningfully improve the experience of node operators and (L1 and L2) users, keeps increasing.
The Ethereum client teams are working together to deliver the Pectra development network.
Given the enhanced technical capabilities, an important question that needs to be raised is: Are we moving towards the right goals? A recent series of frustrated tweets from long-time Geth core developer Peter Szilagyi has prompted us to reflect on this question:
These concerns are valid. They reflect worries expressed by many in the Ethereum community. I have personally worried about these issues multiple times. However, I do not think the situation is as dire as Peter's tweets suggest. On the contrary, many issues are being addressed through ongoing protocol features, and many other issues can be resolved with very realistic adjustments to the current roadmap.
To understand what this means in practice, let’s review the three examples provided by Peter one by one. These issues are of widespread concern among many community members, and addressing them is very important.
MEV and Builder Dependency
In the past, Ethereum blocks were created by miners, who used a relatively simple algorithm to build them. Users sent transactions to a public p2p network, commonly known as the "mempool" (or "txpool"). Miners listened to the mempool and accepted transactions that were valid and paid a fee. They included whatever transactions fit, and if there was not enough space, they prioritized those paying the highest fees.
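That "relatively simple algorithm" can be sketched as greedy fee-priority packing (an illustrative toy, not any client's actual code; the transaction fields and numbers here are made up):

```python
def build_block(mempool, gas_limit):
    """The pre-MEV 'simple algorithm': accept valid fee-paying
    transactions and greedily pack them highest fee-per-gas first."""
    block, gas_used = [], 0
    for tx in sorted(mempool, key=lambda t: t["fee_per_gas"], reverse=True):
        if gas_used + tx["gas"] <= gas_limit:
            block.append(tx["id"])
            gas_used += tx["gas"]
    return block

# Hypothetical pending transactions:
mempool = [
    {"id": "a", "fee_per_gas": 10, "gas": 21_000},
    {"id": "b", "fee_per_gas": 50, "gas": 21_000},
    {"id": "c", "fee_per_gas": 30, "gas": 21_000},
]
print(build_block(mempool, gas_limit=42_000))  # highest fees win: ['b', 'c']
```

Crucially, this strategy earns everyone the same revenue per block, regardless of sophistication, which is what made it decentralization-friendly.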
This is a very simple system and is friendly to decentralization: as a miner, you only need to run the default software, and you can earn the same level of fee income from a block as you would from a highly specialized mining farm. However, around 2020, people began to exploit what is known as miner extractable value (MEV): income that can only be obtained through executing complex strategies that understand the activities occurring within various DeFi protocols.
For example, consider a decentralized exchange like Uniswap. Suppose that at time T, the USD/ETH exchange rate is $3000 both on centralized exchanges and on Uniswap. At time T+11, the rate on centralized exchanges rises to $3005, but Ethereum has not yet produced its next block. By time T+12, it has. Whoever creates that block can make its first transactions a series of Uniswap buys, purchasing all the ETH available on Uniswap at prices between $3000 and $3004. This extra revenue is known as MEV. Similar issues exist for applications beyond DEXs; the Flash Boys 2.0 paper, published in 2019, covered this in detail.
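The arbitrage in this example can be worked out with a toy constant-product AMM model (the pool depth here is a made-up assumption, and real-world Uniswap fees, slippage across ticks, and gas costs are all ignored):

```python
import math

def arb_profit(eth_reserve, usd_reserve, cex_price):
    """Toy constant-product AMM arbitrage: buy ETH on the AMM until its
    marginal price reaches the CEX price, then sell that ETH on the CEX."""
    k = eth_reserve * usd_reserve          # invariant x * y = k
    pool_price = usd_reserve / eth_reserve
    if cex_price <= pool_price:
        return 0.0
    # After the arb trade, the pool's price equals the CEX price:
    new_eth = math.sqrt(k / cex_price)
    new_usd = math.sqrt(k * cex_price)
    eth_bought = eth_reserve - new_eth
    usd_spent = new_usd - usd_reserve
    return eth_bought * cex_price - usd_spent

# Hypothetical pool priced at $3000 with 10,000 ETH depth; CEX moves to $3005.
print(round(arb_profit(10_000, 30_000_000, 3005), 2))  # roughly $21 of profit
```

The profit is small per block, but it accrues every block to whoever is best at capturing it, which is exactly the centralization pressure described below.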
The chart in the Flash Boys 2.0 paper shows the amount of income obtainable using the various methods mentioned above.
The problem is that this breaks what made mining (or, after 2022, block proposing) "fair": now, larger participants who are better at optimizing such extraction algorithms earn more per block.
Since then, there has been an ongoing debate between two strategies, which I refer to as MEV minimization and MEV isolation. MEV minimization takes two forms: (i) actively developing MEV-free alternatives to Uniswap (e.g., Cowswap), and (ii) building in-protocol techniques, such as encrypted mempools, that reduce the information available to block producers and thereby the income they can extract. In particular, encrypted mempools can prevent strategies like sandwich attacks, in which an attacker places transactions immediately before and after a user's transaction in order to extract value from it.
MEV isolation works by accepting MEV but trying to limit its impact on staking centralization by splitting the market into two kinds of participants: validators are responsible for attesting to and proposing blocks, while the task of choosing block contents is outsourced to specialized builders through an auction protocol. Individual stakers no longer need to worry about optimizing DeFi arbitrage; they simply join the auction protocol and accept the highest bid. This is known as proposer/builder separation (PBS). This approach has precedents in other industries: one major reason restaurants can remain so decentralized is that they often rely on a fairly centralized set of suppliers for various operations that do have large economies of scale. So far, PBS has been quite successful at keeping a level playing field between small and large validators, at least with respect to MEV. However, it brings another problem: the task of choosing which transactions get included becomes more centralized.
My view has always been that MEV minimization is good and we should pursue it (I personally use Cowswap often!), though encrypted mempools face many challenges. Even so, MEV minimization is likely to be insufficient; MEV will not drop to zero, or even close to zero. Therefore, we also need some form of MEV isolation. This raises an interesting task: how can we make the "MEV isolation box" as small as possible? How can we give builders as little power as possible, while still letting them absorb the work of optimizing arbitrage and other forms of MEV collection?
If builders have the power to exclude transactions from blocks entirely, attacks become easy. Suppose you have a collateralized debt position (CDP) in a DeFi protocol, backed by an asset whose price is falling rapidly. You want to add collateral or close the CDP. A malicious builder might collude to refuse to include your transaction, delaying it until the price drops enough to forcibly liquidate your CDP. If that happens, you would have to pay a hefty liquidation penalty, a large share of which would go to the builder. So how do we prevent builders from excluding transactions and executing such attacks?
This is where inclusion lists come into play.
Source: ethresear.ch post
Inclusion lists allow block proposers (i.e., stakers) to select transactions that must be included in a block. Builders can still reorder transactions or insert their own, but they must include the proposer's transactions. In more recent versions of the proposal, the inclusion list constrains the next block rather than the current one. In either case, it takes away the builder's power to exclude transactions from blocks entirely.
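A minimal sketch of the resulting validity rule (ignoring gas accounting and the next-block deferral; the transaction names are placeholders):

```python
def block_is_valid(block_txs, inclusion_list):
    """Toy inclusion-list rule: the builder may reorder transactions or
    insert its own, but every transaction the proposer listed must appear
    somewhere in the block. Real designs also enforce gas limits and may
    defer the constraint to the next block."""
    return set(inclusion_list) <= set(block_txs)

proposer_list = ["user_tx_1", "user_tx_2"]
print(block_is_valid(["mev_tx", "user_tx_2", "user_tx_1"], proposer_list))  # True
print(block_is_valid(["mev_tx", "user_tx_1"], proposer_list))  # False: censored
```

The point of the rule is asymmetry: builders keep the profitable power to order and insert, but lose the dangerous power to censor.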
MEV is a complex issue; even the above description omits many important nuances. As the saying goes, "You may not be looking for MEV, but MEV is looking for you." Ethereum researchers have been very consistently committed to the goal of "minimizing the isolation box," trying to reduce the potential harm that builders may cause (for example, by excluding or delaying transactions as a way to attack specific applications).
That said, I do believe we can go further. Historically, inclusion lists have often been seen as a special-case feature: normally you wouldn't think about them, but in case malicious builders start doing crazy things, they give you a fallback. This attitude is reflected in current design decisions: in the current EIP, the gas limit of the inclusion list is around 2.1 million. But we can make a philosophical flip in how we view inclusion lists: treat the inclusion list as the block, and view the builder's role as an auxiliary function that adds a few transactions to collect MEV. What if it were the builder that had the 2.1 million gas limit?
I find the ideas in this direction, truly pushing to make the isolation box as small as possible, very interesting, and I support moving this way. This is a shift from the "2021-era philosophy": in that era, we were more enthusiastic about the idea that, since we now have builders, we can "overload" their functionality to serve users in more complex ways, for example by supporting ERC-4337 fee markets. In the new philosophy, the transaction validation parts of ERC-4337 would instead be enshrined in the protocol. Fortunately, the ERC-4337 team has become increasingly enthusiastic about this direction.
In summary: The MEV mindset has returned to empowering block producers, including granting block producers the direct power to ensure user transactions are included. The account abstraction proposal has returned to eliminating reliance on centralized relayers or even bundlers. However, there is a strong argument that we have not gone far enough, and I believe the pressure to push the development process further in this direction is very welcome.
Liquid Staking
Today, individual stakers make up a relatively small share of all Ethereum staking; most staking is done by various providers, some centralized operators and others DAOs such as Lido and RocketPool.
I have done my own research, various polls, surveys, and in-person conversations, asking the question "Why don't you, specifically, solo-stake today?" To me, a strong solo-staking ecosystem is by far the preferred outcome for Ethereum staking, and one of the best things about Ethereum is that we actually try to support a strong solo-staking ecosystem rather than simply giving in to delegation. However, we are far from that outcome. In my polls and surveys, a few consistent trends emerge:
The vast majority of those who do not stake solo attribute their main reason to the minimum of 32 ETH.
Among those who cited other reasons, the biggest one is the technical challenges of running and maintaining a validator node.
The loss of immediate availability of ETH, the security risks of "hot" private keys, and the loss of the ability to participate in DeFi protocols simultaneously are significant but smaller issues.
Farcaster polls show the main reasons people do not stake solo.
Staking research needs to address two key questions:
How do we address these concerns?
Even if we can address most of these concerns, if the majority still do not want to solo-stake, how do we keep the protocol stable and attack-resistant despite that?
Many ongoing research and development projects are specifically aimed at addressing these issues:
Verkle trees combined with EIP-4444 allow staking nodes to operate with very low disk requirements. Additionally, they allow staking nodes to synchronize almost instantly, greatly simplifying the setup process and operations such as switching from one implementation to another. They also make Ethereum light clients more feasible by reducing the data bandwidth required to provide proofs for each state access.
Research (such as these proposals) allows for a larger set of validators (achieving smaller staking minimums) while reducing the overhead for consensus nodes. These ideas can be implemented as part of single-slot finality. Doing so will also make light clients more secure, as they will be able to verify the full set of signatures instead of relying on a sync committee.
Ongoing optimizations of Ethereum clients continue to lower the cost and difficulty of running a validator node, despite the chain's growing history.
Research on penalty limits may alleviate concerns about private key risks and enable stakers to simultaneously stake their ETH in DeFi protocols (if they wish).
0x01 withdrawal credentials allow stakers to set an ETH address as their withdrawal address. This makes decentralized staking pools more viable, giving them an edge over centralized staking pools.
However, we can still do more. It is theoretically possible to let validators withdraw much faster: Casper FFG remains secure even if the validator set changes by a few percent each time it finalizes (i.e., once per epoch), so with some effort we could shorten the withdrawal period significantly. If we want to greatly reduce the minimum deposit size, we can make hard trade-offs in other directions: for example, quadrupling the finalization time allows a fourfold reduction in the minimum deposit size. Single-slot finality will later clean this up by moving beyond the "every staker participates in every epoch" model entirely.
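The finality-time/deposit-size trade-off can be seen with back-of-envelope arithmetic (all numbers here, including the signatures-per-slot budget, are illustrative assumptions, not protocol parameters):

```python
def min_deposit_eth(total_stake_eth, sigs_per_slot, slots_to_finality):
    """Back-of-envelope sketch: if nodes can process only a fixed number
    of signatures per slot, the validator count that can participate in
    one finalization grows linearly with the finality delay, so the
    minimum deposit shrinks by the same factor."""
    max_validators = sigs_per_slot * slots_to_finality
    return total_stake_eth / max_validators

base = min_deposit_eth(30_000_000, 30_000, 32)     # ~31 ETH minimum
slower = min_deposit_eth(30_000_000, 30_000, 128)  # 4x finality time
print(base, slower)  # the slower variant allows a 4x smaller deposit
```

This is why "quadruple the finalization time, quarter the minimum deposit" holds: the two quantities are inversely proportional under a fixed per-slot signature budget.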
Another important part of the whole issue is the economics of staking. A key question is: do we want staking to be a relatively niche activity, or do we want everyone, or almost everyone, to stake all their ETH? If everyone is staking, what responsibilities do we want everyone to bear? If people end up simply delegating responsibility out of laziness, this could ultimately lead to centralization. There are important and profound philosophical questions here. The wrong answers could lead Ethereum down a path of centralization and "recreating the traditional financial system with extra steps"; the right answers could create a shining example of a successful ecosystem with a broad and diverse set of solo stakers and highly decentralized staking pools. These questions touch on the core economics and values of Ethereum, so we need more diverse participation.
Node Hardware Requirements
Many key issues of Ethereum's decentralization ultimately boil down to a question that has defined blockchain for a decade: how conveniently do we want to run nodes, and how do we achieve that?
Today, running a node is difficult. Most people do not do it. On the laptop I am using to write this article, I have a reth node that occupies 2.1 TB, and that is already the result of heroic software engineering and optimization. I had to buy an extra 4 TB hard drive just to store that node on my laptop. We all want running a node to become easier. In my ideal world, people would be able to run nodes on their phones.
As I mentioned above, EIP-4444 and Verkle trees are two key technologies that bring us closer to this ideal. If both are implemented, the hardware requirements of a node could eventually drop below a hundred gigabytes, and close to zero if we eliminate historical-storage responsibility entirely (perhaps only for non-staking nodes). A type 1 ZK-EVM would remove the need to run EVM computation yourself, since you could simply verify a proof that the execution was correct. In my ideal world, we stack all of these technologies together, and even Ethereum browser-extension wallets (like Metamask or Rabby) would have a built-in node that verifies these proofs, performs data availability sampling, and is thereby assured that the chain is correct.
The above vision is often referred to as "The Verge."
This is all well known and understood, even by those who express concern about the scale of Ethereum nodes. However, an important worry remains: if we offload the responsibility of maintaining state and providing proofs to centralized actors, isn't that a centralization vector? Even if those actors cannot cheat by providing invalid data, doesn't relying on them so heavily run contrary to Ethereum's principles?
A recent version of this concern is the discomfort many have with EIP-4444: if regular Ethereum nodes no longer need to store old history, then who does? A common answer is: surely there are enough large participants (block explorers, exchanges, layer 2s) with an incentive to hold this data, and the Ethereum chain is tiny compared to the ~100 PB stored by the Wayback Machine, so the idea that any history would actually be lost is absurd.
However, this argument relies on a small number of large participants. In my trust-model taxonomy, this is a 1-of-N assumption, but with a very small N, and that carries tail risk. One thing we can do instead is store old history in a peer-to-peer network in which each node stores only a small portion of the data. Such a network would still replicate each piece of data enough to ensure robustness: thousands of copies of every item, and in the future we could use erasure coding (in fact, by putting history into EIP-4844-style blobs, which already have erasure coding built in) to improve robustness further.
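One way such a network could decide which nodes store which pieces of history is rendezvous hashing, sketched here (this illustrates the general idea only; it is not the Portal network's actual protocol, and the chunk and node names are made up):

```python
import hashlib

def nodes_for_chunk(chunk_id, node_ids, copies=3):
    """Illustrative rendezvous-hashing sketch: each history chunk is
    stored by the `copies` nodes whose hash with the chunk id sorts
    lowest. Every node ends up holding only a slice of total history,
    while each chunk keeps several deterministic replicas that any
    peer can locate without a directory."""
    def score(node):
        return hashlib.sha256(f"{chunk_id}:{node}".encode()).hexdigest()
    return sorted(node_ids, key=score)[:copies]

nodes = [f"node{i}" for i in range(10)]
print(nodes_for_chunk("block_19000000", nodes))  # a deterministic 3-node replica set
```

Because every peer computes the same assignment locally, retrieval needs no central index, which is the property that makes the 1-of-N set large instead of small.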
Blobs have erasure coding within and between blobs. The simplest way to provide ultra-stable storage for all of Ethereum's history is likely to place beacon and execution blocks into blobs. Image source: codex.storage
For a long time, this work has been on the back burner. The Portal network exists, but it has not received attention commensurate with its importance to Ethereum's future. Fortunately, there is now strong interest in putting far more resources into a minimized version of Portal that emphasizes distributed storage of, and accessibility to, history. We should build on this momentum and push to implement EIP-4444 soon, paired with a robust decentralized peer-to-peer network for storing and retrieving old history.
For state and ZK-EVMs, this kind of distributed approach is harder. To build an efficient block, you simply must have the full state. Here, I personally favor a pragmatic approach: we define, and commit to, some level of hardware requirements needed for a "do everything" node, higher than the (ideally ever-decreasing) cost of a simple verifying node but still low enough for hobbyists to afford. We rely on a 1-of-N assumption, and ensure that N is reasonably large.
ZK-EVM proving may be the trickiest part: real-time ZK-EVM provers are likely to require considerably more powerful hardware than an archive node, even with advances like Binius and the worst-case bounds that multidimensional gas provides. We could work on a distributed proving network, where each node takes responsibility for proving, say, one percent of a block's execution, and the block producer only needs to aggregate the hundred proofs at the end. Proof aggregation trees can help further. But if this does not work well, another compromise is to let proving hardware requirements rise while ensuring that a "do everything" node can verify Ethereum blocks directly (without a proof) fast enough to participate effectively in the network.
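The aggregation-tree idea can be sketched as follows (`combine` is a hypothetical stand-in for a real recursive proof-composition step, and plain addition is used in the demo purely as a placeholder):

```python
def aggregate_tree(proofs, combine):
    """Pairwise proof-aggregation tree: the leaves are the ~100
    per-chunk proofs, and each level combines adjacent pairs, so no
    single party ever aggregates more than two proofs at once and the
    tree has only O(log n) levels."""
    level = list(proofs)
    while len(level) > 1:
        pairs = [level[i:i + 2] for i in range(0, len(level), 2)]
        level = [combine(*p) if len(p) == 2 else p[0] for p in pairs]
    return level[0]

# Demo with addition standing in for proof composition:
print(aggregate_tree(list(range(8)), lambda a, b: a + b))  # 28
```

The design choice is that aggregation work, like proving work, is spread across many parties rather than landing entirely on the block producer.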
Summary
I think it is fair to say that the 2021-era Ethereum mindset had become comfortable with offloading responsibilities to a few large participants, as long as some market mechanism or zero-knowledge proof system existed to force those centralized actors to behave honestly. Such systems generally work well in the average case, but can fail catastrophically in the worst case.
At the same time, I think it is important to emphasize that current Ethereum protocol proposals have already moved significantly away from that model and take the needs of a truly decentralized network much more seriously. Ideas around stateless nodes, MEV mitigation, single-slot finality, and similar concepts have pushed further in this direction. A year ago, the idea of relying on relays as semi-centralized nodes for data availability sampling was being seriously considered; this year, we no longer need to, and PeerDAS has made surprisingly strong progress.
However, on all three central issues above, and many others, we can do much more in this direction. Helios has made great progress toward a "truly light client" for Ethereum. Now we need to get it included by default in Ethereum wallets, have RPC providers supply proofs alongside their results so they can be verified, and extend light client technology to layer 2 protocols. If Ethereum scales via a rollup-centric roadmap, layer 2s need the same security and decentralization guarantees as layer 1. In a rollup-centric world, there are many other things we should take more seriously; decentralized and efficient cross-L2 bridges are just one of many examples. Many dapps fetch logs through centralized protocols because Ethereum's native log scanning has become too slow. We could improve this with a dedicated decentralized sub-protocol; here is one suggestion of mine for how to do that.
There is an almost limitless number of blockchain projects targeting the niche of "we can be super fast, and we'll think about decentralization later." I do not believe Ethereum should join their ranks. Ethereum L1 can, and certainly should, be a strong base layer for layer 2 projects that take a hyper-scalable approach, using Ethereum as a backbone of decentralization and security. Even a layer-2-centric approach requires layer 1 itself to scale enough to handle a large volume of operations. But we should deeply respect the properties that make Ethereum unique, and keep working to maintain and improve them as Ethereum scales.