Vitalik: How will Ethereum's multi-client concept interact with ZK-EVM?
Original Title: How will Ethereum's multi-client philosophy interact with ZK-EVM?
Original Author: Vitalik Buterin
Translated by: Qianwen, ChainCatcher
Ethereum maintains its security and decentralization through a multi-client architecture, an important but under-discussed design choice. Ethereum deliberately has no "reference client" that everyone runs by default; instead, there is a collaboratively managed specification (these days written in highly readable but slow Python), and multiple teams implement that specification. Those implementations (the "clients") are what users actually run.
*Each Ethereum node runs a consensus client and an execution client. As of now, no consensus or execution client has more than 2/3 of the network share. If a client with less than 1/3 share in its category makes an error, the network will continue to operate normally. If a client with between 1/3 and 2/3 share in its category (e.g., Prysm, Lighthouse, or Geth) makes an error, the chain will continue to add blocks, but it will stop finalizing blocks, giving developers time to intervene.*
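The thresholds in this caption can be summarized in a small sketch. This is my own toy model of the rule, not client code; the string labels are purely illustrative:

```python
def faulty_client_impact(share: float) -> str:
    """Toy model of how a bug in a client with the given network share
    affects the chain, following the 1/3 and 2/3 thresholds above."""
    if share < 1 / 3:
        # A faulty minority cannot prevent a 2/3 majority from finalizing.
        return "chain unaffected"
    elif share < 2 / 3:
        # The chain keeps growing, but no 2/3 honest majority exists
        # for finality, so finalization halts until developers intervene.
        return "chain grows but stops finalizing"
    else:
        # A bug shared by a supermajority could finalize an invalid chain.
        return "faulty chain could finalize"

print(faulty_client_impact(0.25))  # chain unaffected
print(faulty_client_impact(0.45))  # chain grows but stops finalizing
```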
One under-discussed but important upcoming shift in how the Ethereum chain is verified is the rise of the ZK-EVM. SNARKs that prove EVM execution have been under development for years, and the technology is being actively adopted by L2 protocols known as ZK rollups. Some of these ZK rollups are already live on mainnet, with more coming soon. In the long run, however, ZK-EVMs will not only be used for rollups; we also want to use them to verify execution on the first layer.
In this way, ZK-EVM will become the de facto third type of Ethereum client, which is as crucial to the security of the network as the current execution client and consensus client. This naturally raises the question: how will ZK-EVM interact with the multi-client philosophy? One of the difficulties has been solved: we already have multiple implementations of ZK-EVM and are actively developing them. But other difficult parts remain: how will we truly leverage the "multi-client" ecosystem to ensure the correctness of ZK proofs of Ethereum blocks? This question brings some interesting technical challenges—of course, there is also the pressing question of whether these trade-offs are worth it.
What was the initial motivation for Ethereum's multi-client philosophy?
Ethereum's multi-client philosophy is a form of decentralization, just like general decentralization, where people can focus on both the technical benefits of architectural decentralization and the social benefits of political decentralization. Ultimately, the multi-client philosophy is driven by both and serves both.
Technical Decentralization
The main benefit of technical decentralization is simple: it reduces the risk that a single bug in one piece of software brings down the entire network. A historical instance of this risk is the 2010 Bitcoin overflow bug. At the time, the Bitcoin client code did not check whether the sum of transaction outputs overflowed (values summing past the maximum integer of 2^64 - 1 wrap around to zero), allowing someone to craft an overflow transaction and mint billions of bitcoins for themselves. The bug was discovered within hours and a fix was hurriedly deployed across the network; had the ecosystem been mature at the time, those coins would have been accepted by exchanges, bridges, and other entities before the fix, and the attacker would have made off with the funds. Had there instead been five different Bitcoin clients, it is very unlikely that all of them would have shared the same bug; there would have been an immediate split, and the side with the bug would likely have lost.
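The overflow failure mode is easy to reproduce. Below is a hedged Python sketch of it; Bitcoin's actual code was C++ using signed 64-bit values, so this is only an illustration of wraparound, not the original bug:

```python
MAX64 = 2**64 - 1  # illustrative 64-bit cap

def wrap64(x: int) -> int:
    """Simulate 64-bit unsigned wraparound."""
    return x & MAX64

def vulnerable_sum(outputs):
    """Pre-fix behavior: the running total silently wraps, so two huge
    outputs can sum to a tiny (here zero) apparent spend."""
    total = 0
    for v in outputs:
        total = wrap64(total + v)
    return total

def checked_sum(outputs):
    """Post-fix behavior: reject any total that would overflow."""
    total = 0
    for v in outputs:
        total += v
        if total > MAX64:
            raise ValueError("value out of range")
    return total

huge = 2**63  # two such outputs overflow past 2**64
print(vulnerable_sum([huge, huge]))  # 0: the overflow makes it look valid
```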
Using multiple clients to reduce the risk of catastrophic bugs comes at a cost: you can instead get consensus-failure bugs. If two clients interpret some protocol rule in subtly different ways, then even though both interpretations are reasonable and neither allows funds to be stolen, the divergence can cause a chain split. One serious split of this kind has occurred in Ethereum's history (along with other, smaller ones).
Defenders of the single-client approach might cite consensus failures as a reason not to have multiple implementations: if there is only one client, that client cannot disagree with itself. Their model of how the number of clients translates into risk might look like this:
Of course, I disagree with this analysis, because (1) the catastrophic class of bug from 2010 must also be counted, and it is quite serious; and (2) a truly single-client ecosystem has never actually existed. This was most evident in the 2013 Bitcoin fork: a divergence between two different versions of the Bitcoin client, one of which had an unexpected, undocumented limit on the number of objects that could be modified in a single block, led to a chain split. So the theoretical one client is in practice two, and the theoretical five clients may in practice be six or seven, so we might as well take the plunge to the right side of the risk curve (as shown above) and run at least a few different clients.
Political Decentralization
Client developers in a monopoly position hold significant political power. If a client developer proposes a change that users dislike, in theory users can refuse to download the updated version, or fork it out; in practice, this is often difficult. What if an unpleasant protocol change is bundled with a necessary security update? What if the main team threatens to quit? Or, in the most mundane scenario, what if the monopoly client team is simply the only group with deep protocol expertise? Then the rest of the ecosystem has a hard time judging whether the team's technical arguments hold up, and the team has wide latitude to push its own goals and values, which may not match those of the broader community.
Concerns about protocol politics, particularly in the wake of the 2013-14 Bitcoin OP_RETURN wars, in which some participants openly favored discriminating against particular uses of the chain, were a significant factor in Ethereum's early adoption of the multi-client philosophy: the goal was to avoid such situations arising again. Concerns within the Ethereum ecosystem itself, namely avoiding the concentration of power inside the Ethereum Foundation, provided further support for this direction. In 2018, the decision was made that the Foundation would deliberately not implement the Ethereum PoS protocol (what is now the "consensus client"), leaving that task entirely to external teams.
How will ZK-EVM appear in the first layer in the future?
Currently, ZK-EVMs are used in rollups: large amounts of expensive EVM execution happen off-chain, and everyone else only needs to verify an on-chain SNARK proving that the EVM execution was computed correctly, which is what provides scalability. It also allows some data (especially signatures) to be left off-chain to save on fees. Combining this with data availability sampling gives us a great deal of scalability.
However, today's Ethereum network also faces a problem that no amount of layer-2 scaling can resolve by itself: the first layer is difficult to verify, so few users run their own nodes; most simply trust third-party providers. Light clients such as Helios and Succinct are a step toward solving this, but a light client is far from a fully validating node: it only verifies the signatures of a random subset of validators called the sync committee, without checking that the chain actually follows the protocol rules. To let users genuinely verify that the chain follows the rules, we need to do something different.
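To make concrete how much weaker sync-committee checking is than full validation, here is a toy contrast between the two checks. This is my own sketch, not Helios or Succinct code, and all names are hypothetical:

```python
def light_client_accepts(header, committee_signatures, threshold=2 / 3):
    """A light client only checks that a supermajority of the sync
    committee signed the header; it never checks the state transition."""
    signed = sum(1 for s in committee_signatures if s["valid"])
    return signed >= threshold * len(committee_signatures)

def full_node_accepts(header, parent_state, execute):
    """A full node re-executes the block and checks the resulting
    state root against the header's claim."""
    return execute(parent_state, header["block"]) == header["state_root"]

# 400 of 512 committee members sign: the light client accepts the header
# even if the block inside violated protocol rules.
sigs = [{"valid": True}] * 400 + [{"valid": False}] * 112
print(light_client_accepts({"state_root": "0x.."}, sigs))  # True
```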
Option 1: Shrink the first layer, forcing almost all activity to shift to the second layer
We could gradually reduce the first-layer gas target per block from 15 million down to 1 million: enough for a block to contain a single SNARK and a few deposit and withdrawal operations, but little else, forcing almost all user activity onto second-layer protocols. Such a design could still support many rollups committing in each block: we could use off-chain aggregation protocols, run by customized builders, to merge the SNARKs from multiple second-layer protocols into a single SNARK. The first layer's only remaining function would be to act as a clearinghouse for second-layer protocols, verifying their proofs and occasionally facilitating large fund transfers between them.
This approach could work, but it has several significant weaknesses:
1. It is effectively not backward compatible. Many existing L1-based applications would become economically unviable. Gas fees would climb so high that they exceed the value in many accounts, effectively freezing users' funds, even in accounts holding hundreds or thousands of dollars. Letting users sign messages to opt into a mass migration to L2 could address this, but it adds complexity to the transition, and making that migration truly cheap would itself require some form of SNARK on the first layer. I am generally willing to break backward compatibility for things like the SELFDESTRUCT opcode, but in this case I do not recommend it.
2. It would not make verification radically cheaper. Ideally, the Ethereum protocol should be easy to verify not only on laptops but also on phones, inside browser extensions, and even on other chains. Syncing the chain for the first time, or after a long time offline, should also be easy. A laptop node can verify a 1-million-gas block in roughly 20 milliseconds, but even so, syncing after a day offline would take 54 seconds (assuming single-slot finality increases slot time to 32 seconds), and for phones or browser extensions each block would take a few hundred milliseconds, with non-negligible battery drain. These numbers are manageable, but they fall short of the ideal.
3. Even in an L2-focused ecosystem, there are benefits to keeping L1 reasonably affordable. Validiums gain a stronger security model if users can withdraw their funds to L1 when new state data becomes unavailable. And arbitrage becomes more efficient, especially for smaller tokens, if the minimum size at which a direct cross-L2 transfer is economical is smaller.
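The sync-time arithmetic from point 2 can be checked directly; the 20 ms and 32 s figures are taken from the text above:

```python
SECONDS_PER_DAY = 86_400
SLOT_TIME = 32            # seconds, assuming single-slot finality as above
VERIFY_MS_PER_BLOCK = 20  # laptop verifying one 1M-gas block

blocks_per_day = SECONDS_PER_DAY // SLOT_TIME              # 2700 blocks
catch_up_seconds = blocks_per_day * VERIFY_MS_PER_BLOCK / 1000

print(blocks_per_day, catch_up_seconds)  # 2700 54.0
```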
Therefore, using ZK-SNARKs to verify the first layer seems more reasonable.
Option 2: SNARK-verify the first layer
A Type 1 (fully equivalent to Ethereum) ZK-EVM can be used to verify the EVM execution of Ethereum blocks on the first layer, and we can write more SNARK code to verify the consensus side of a block as well. This would be a challenging engineering problem: today, ZK-EVMs take minutes to hours to prove an Ethereum block. Generating proofs in real time will require one or more of (i) improvements to Ethereum itself to strip out SNARK-unfriendly components, (ii) large efficiency gains from specialized hardware, and (iii) architectural improvements with much more parallelization. There is no fundamental technical reason it cannot be done, so I expect that, even if it takes many years, it will happen.
Here we need to consider the multi-client paradigm: if we want to use ZK-EVM to verify the first layer, which ZK-EVM should we use? There are three options:
1) Single ZK-EVM: abandon the multi-client model and choose a single ZK-EVM to verify blocks.
2) Closed multi-ZK-EVM: reach consensus on a specific set of multi-ZK-EVMs and stipulate in the consensus layer protocol that a block needs proofs from more than half of the ZK-EVMs in that group to be considered valid.
3) Open multi-ZK-EVM: different clients have different ZK-EVM implementations, and each client will only accept a block as valid once it receives a proof compatible with its own implementation.
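To make the difference between options 2 and 3 concrete, here is a hedged sketch of each validity rule; the predicate and ZK-EVM names are mine, purely for illustration:

```python
def valid_closed_multi(proofs_ok: dict, fixed_set: set) -> bool:
    """Option 2: a block is valid only if more than half of a fixed,
    protocol-enshrined set of ZK-EVMs have produced an accepting proof."""
    accepting = sum(1 for zkevm in fixed_set if proofs_ok.get(zkevm, False))
    return accepting * 2 > len(fixed_set)

def valid_open_multi(proofs_ok: dict, my_zkevm: str) -> bool:
    """Option 3: each client only needs a proof for its own ZK-EVM."""
    return proofs_ok.get(my_zkevm, False)

# Two of three hypothetical ZK-EVMs have proven this block so far.
proofs = {"zkevm_a": True, "zkevm_b": True, "zkevm_c": False}
print(valid_closed_multi(proofs, {"zkevm_a", "zkevm_b", "zkevm_c"}))  # True
print(valid_open_multi(proofs, "zkevm_c"))                            # False
```

Note how under option 3, clients on different ZK-EVMs may accept the same block at different times, which is exactly the latency concern discussed later.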
To me, option 3 is ideal, at least until our technology improves to the point where we can formally prove that all ZK-EVM implementations are equivalent, at which point we can simply pick whichever is most efficient.
Option 1 sacrifices the benefits of the multi-client model, while option 2 shuts down the possibility of developing new clients, leading to a more centralized ecosystem. Option 3 presents challenges, but these challenges seem smaller than those of the other two options, at least for now.
Implementing option 3 need not be difficult: we could establish a p2p subnet for each proof type; clients using a given proof type would listen on the corresponding subnet and wait until they receive a proof that their own verifier recognizes as valid.
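A minimal sketch of this subnet-per-proof-type flow, with all names and data structures hypothetical (real clients would use libp2p gossip, not in-process queues):

```python
from queue import Queue, Empty

# One gossip "subnet" per proof type, modeled as a simple queue.
subnets = {"zkevm_a": Queue(), "zkevm_b": Queue()}

def publish_proof(proof_type: str, proof: dict) -> None:
    """A prover gossips a proof onto the subnet for its proof type."""
    subnets[proof_type].put(proof)

def await_block_proof(my_type: str, block_hash: str, verify) -> bool:
    """Drain my own subnet only; accept the block on the first proof
    that my verifier accepts. Returns False if none has arrived yet."""
    while True:
        try:
            proof = subnets[my_type].get_nowait()
        except Empty:
            return False  # no valid proof yet; keep the block pending
        if proof["block"] == block_hash and verify(proof):
            return True

publish_proof("zkevm_a", {"block": "0xabc", "ok": True})
accepted = await_block_proof("zkevm_a", "0xabc", verify=lambda p: p["ok"])
print(accepted)  # True
```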
The two main challenges of option 3 may be as follows:
1) Latency: a malicious attacker could publish a block, together with a proof valid for only one client, late. It could then take a long time (realistically even 15 seconds) to generate proofs valid for the other clients, long enough to create a temporary fork and disrupt the chain for a few slots.
2) Data inefficiency: one benefit of ZK-SNARKs is that data needed only for verification (sometimes called "witness data") can be removed from the block. For example, once a signature has been verified, there is no need to keep it in the block; you can store a single bit saying the signature is valid, plus one proof in the block confirming that all the signatures are valid. However, if we want proofs of multiple types to be generated for the same block, the original signatures must be published in full.
Latency can be addressed by carefully designing the single-slot finality protocol. Single-slot finality is likely to require more than two rounds of consensus per slot anyway, so the first round can be required to include the block, while nodes only need to have verified a proof before signing in the third (or final) round. This guarantees a significant time window between the deadline for publishing a block and the moment proofs are expected to be available.
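As a toy illustration of the window this round structure creates; the specific timings below are my own assumptions, not protocol parameters:

```python
SLOT = 32.0                 # seconds per slot under single-slot finality
block_deadline = 4.0        # end of round 1: block must be published by now
final_sign_deadline = 24.0  # final round: proof must be verified by now

# Provers get the whole gap between the block deadline and the final round.
proving_window = final_sign_deadline - block_deadline
slack_after_final_round = SLOT - final_sign_deadline

print(proving_window, slack_after_final_round)  # 20.0 8.0
```

With a 20-second window, even the 15-second worst case for generating proofs for the other clients would fit before the final signing round.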
To solve the data-efficiency problem, a separate protocol must be established to aggregate verification-related data. For signatures, we can use BLS aggregation, which ERC-4337 already supports. The other major category of witness data is ZK-SNARKs used for privacy; these tend to have their own aggregation protocols.
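A back-of-the-envelope sketch of why signature aggregation helps: n individual signatures get replaced by one aggregate signature plus a participation bitfield. The sizes below are typical public figures for ECDSA and BLS12-381 signatures, not numbers from this article:

```python
SIG_SIZE = 65      # bytes: a typical ECDSA signature
AGG_SIG_SIZE = 96  # bytes: a BLS12-381 aggregate signature

def witness_bytes(n_sigs: int, aggregated: bool) -> int:
    """Witness data cost with and without aggregation."""
    if not aggregated:
        return n_sigs * SIG_SIZE
    # One aggregate signature plus one participation bit per signer.
    return AGG_SIG_SIZE + (n_sigs + 7) // 8

print(witness_bytes(1000, aggregated=False))  # 65000
print(witness_bytes(1000, aggregated=True))   # 221
```

The roughly 300x reduction is what makes it tolerable to keep witness data public so that multiple proof types can be generated over the same block.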
It is also worth mentioning that SNARK-verifying the first layer has an important benefit: on-chain EVM execution no longer needs to be verified by every node, which could lead to a significant increase in the volume of EVM execution, achievable by greatly increasing the gas limit of the first layer or introducing enshrined rollups, or both.
Conclusion
Achieving the smooth operation of a multi-client ZK-EVM ecosystem is no easy task. But the good news is that much of this work is happening or will happen:
We already have multiple powerful ZK-EVM implementations, which are not yet Type 1 (fully equivalent to Ethereum), but many of them are actively moving in that direction.
Work on lightweight clients (like Helios and Succinct) may ultimately lead to more comprehensive SNARK verification of Ethereum chain PoS consensus.
Clients may begin using ZK-EVMs to prove the execution of Ethereum blocks themselves, especially once we have stateless clients, at which point verifying a block no longer technically requires re-executing it to maintain state. We may see a slow transition from clients verifying Ethereum blocks by re-executing them to most clients verifying them by checking SNARK proofs.
The ERC-4337 and PBS ecosystems may soon begin to use aggregation techniques like BLS and proof aggregation to save on transaction costs. Related work has also begun.
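The gradual shift from re-execution to proof-checking described above could be sketched as follows; every name here is hypothetical, and real clients would wire this into their fork-choice and state logic:

```python
def validate_block(block, proof, snark_verify, re_execute) -> bool:
    """Transitional validation: prefer the cheap SNARK path, fall back
    to full re-execution when no proof for our ZK-EVM is available."""
    if proof is not None and snark_verify(block, proof):
        return True            # cheap path: check the SNARK only
    return re_execute(block)   # legacy path: re-run the EVM

# During the transition, most blocks arrive with proofs and take the
# cheap path; stub callables stand in for a real verifier and EVM.
ok = validate_block(
    block={"number": 1},
    proof={"ok": True},
    snark_verify=lambda b, p: p["ok"],
    re_execute=lambda b: True,
)
print(ok)  # True
```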
With these technologies in place, the future looks bright. Ethereum blocks will be smaller than they are now, and anyone will be able to run a fully validated node on their laptop or even on their phone or in a browser plugin, all while preserving Ethereum's multi-client philosophy.
In the longer term, anything could happen. Perhaps artificial intelligence will significantly enhance the performance of formal verification, making it easy to prove the equivalence of ZK-EVM implementations and identify all the bugs that lead to differences between them. We should even start this project now. If this formal verification-based approach can succeed, different mechanisms will need to be established to ensure the continued political decentralization of the protocol; perhaps by then, the protocol will be considered "complete," and the immutability of specifications will be more robust. However, even far into the future, an open multi-client ZK-EVM is a world that will eventually come.
In the short term, there is a long road ahead. ZK-EVMs exist, but for a ZK-EVM to be viable on the first layer, it needs to become a Type 1 ZK-EVM and achieve fast, real-time proving. With sufficient parallelization this is doable, but it will take a lot of work. Consensus changes, such as raising the gas costs of the KECCAK, SHA256, and other hash-function precompiles, will also be an important part of the roadmap. That said, the first step of the transition may come sooner than expected: once we switch to Verkle trees and stateless clients, clients could gradually start using ZK-EVMs, and the transition to an "open multi-ZK-EVM" world could then happen more or less on its own.