Vitalik's new work: What is the difference between L2 and execution sharding?

Vitalik Buterin
2024-05-24 08:15:45
The L2-centered ecosystem is, in a true technical sense, sharding.

Author: Vitalik Buterin

Compiled by: Peng Sun, Foresight News

Two and a half years ago, I argued in the article "Endgame" that the different possible development paths for blockchains look very similar, at least from a technical perspective. In both cases, there are a very large number of transactions on-chain, and processing them requires (1) a large amount of computation and (2) a large amount of data bandwidth. A regular Ethereum node (like the 2 TB reth archive node running on my computer right now) cannot directly verify that much data and computation, even with strong software engineering and Verkle trees. Instead, in both "L1 sharding" and a rollup-centric world, ZK-SNARKs are used to verify computation and data availability sampling (DAS) is used to verify data availability. Whether it is L1 sharding or rollups, the DAS is the same and the ZK-SNARK technology is the same; the only difference is that in one case it is smart contract code, and in the other it is an enshrined feature of the protocol. In a true technical sense, Ethereum is doing sharding, and rollups are the shards.

This naturally raises a question: what are the differences between the two? One difference is the consequences of code vulnerabilities: in rollups, tokens can be stolen; in sharding, consensus can break. However, I expect that as the protocol solidifies and formal verification technology improves, the impact of code vulnerabilities will diminish. So, what other differences exist between these two potentially long-lasting solutions?

Diversity of Execution Environments

In 2019, the Ethereum community briefly discussed the idea of execution environments. Essentially, Ethereum would have different "zones," each of which could set its own rules for how accounts work (including completely different approaches like UTXOs), how the virtual machine operates, and other features. This would allow a diversity of approaches in parts of the stack where diversity would be hard to achieve if Ethereum tried to do everything itself.

In the end, we abandoned some more ambitious plans and retained only the EVM. However, Ethereum L2 (including rollups, validiums, and Plasmas) can be said to ultimately serve as execution environments. Currently, we typically focus on EVM-equivalent L2s, but we actually overlook the diversity brought by many other methods:

  • Arbitrum Stylus, which adds a second, WASM-based virtual machine alongside the EVM;
  • Fuel, which uses a UTXO-based architecture similar to Bitcoin (but more feature-rich);
  • Aztec, which introduces a new language and programming paradigm centered around ZK-SNARK-based privacy-preserving smart contracts.

UTXO-based architecture, source: Fuel documentation

We could try to build the EVM into a super virtual machine that covers every possible paradigm, but it would then be far less effective at each of them. It is better to let these platforms specialize in what they do best.

Security Trade-offs: Scalability and Transaction Speed

Ethereum L1 provides very strong security guarantees. If some data is included in a finalized block on L1, then the entire consensus (including, in extreme cases, social consensus) will strive to ensure that this data is not modified, that any execution triggered by this data cannot be reverted, and that this data remains accessible. To achieve this security guarantee, Ethereum L1 is willing to accept high costs. At the time of writing, transaction fees are relatively low: Layer 2s charge less than 1 cent per transaction, and even a basic ETH transfer on L1 costs less than 1 dollar. If technology advances fast enough for available block space to keep pace with demand, these fees may remain low in the future, but they may not. And for many non-financial applications (such as social media or games), even 0.01 dollars per transaction is too high.

However, social media and games do not require the same security model as L1. If someone can pay a million dollars to erase their record of losing a chess game or make it seem like a tweet was posted three days later than it actually was, that’s fine. Therefore, these applications should not pay the same security costs. L2 solutions achieve this by supporting a range of data availability methods from rollups, plasma, to validiums.

Different types of L2 are suitable for different use cases

Another trade-off arises from moving assets between L2s. I expect that in the next 5 to 10 years, all rollups will be ZK rollups, and highly efficient proof systems like Binius and Circle STARKs with lookups, combined with proof aggregation layers, will make it possible for L2s to provide a finalized state root in every slot. For now, however, we are stuck with a complicated mix of optimistic rollups and ZK rollups with different proof time windows. If we had implemented execution sharding in 2021, the security model for keeping shards honest would have been optimistic rather than ZK, so L1 would have had to manage complex fraud proof logic on-chain, and moving assets between shards would have required a withdrawal period of up to a week. But like code vulnerabilities, I believe this issue is ultimately temporary.

Transaction speed is the third aspect of the security trade-off, and a more enduring one. Ethereum produces a block every 12 seconds and cannot go faster without becoming overly centralized. Many L2s, however, are exploring block times of a few hundred milliseconds. Twelve seconds is not too bad: the average wait between submitting a transaction and its inclusion in a block is about 6-7 seconds (not exactly 6 seconds, because the next block may not include it). This is comparable to the time I wait when I pay with a credit card. However, many applications require faster speeds, and L2s can deliver that.
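The 6-7 second figure can be reproduced with a short calculation. The sketch below is illustrative only: it assumes a transaction arrives at a uniformly random moment within a slot, and that each subsequent block independently includes it with a fixed probability. The function name and the inclusion probability are hypothetical and do not come from the article.

```python
SLOT_SECONDS = 12.0  # Ethereum L1 block interval, as stated in the text


def expected_wait(slot_seconds: float, inclusion_prob: float) -> float:
    """Expected seconds until a transaction is included in a block.

    Assumptions (hypothetical simplifications):
    - the transaction is sent at a uniformly random moment within a slot;
    - each upcoming block includes it independently with `inclusion_prob`.
    """
    if not 0.0 < inclusion_prob <= 1.0:
        raise ValueError("inclusion_prob must be in (0, 1]")
    # On average, the next block arrives half a slot after submission.
    wait_for_next_block = slot_seconds / 2.0
    # Blocks that skip the transaction follow a geometric distribution
    # with mean (1 - p) / p; each skipped block costs one full slot.
    expected_missed_blocks = (1.0 - inclusion_prob) / inclusion_prob
    return wait_for_next_block + slot_seconds * expected_missed_blocks


if __name__ == "__main__":
    for p in (1.0, 0.95, 0.9):
        print(f"p={p}: {expected_wait(SLOT_SECONDS, p):.2f} s")
```

With guaranteed inclusion the model gives exactly 6 seconds. With an assumed 95% chance of inclusion per block it gives about 6.6 seconds, which falls in the range quoted above.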

To achieve faster speeds, L2s rely on preconfirmation mechanisms: the L2's own validators digitally sign a commitment to include a transaction within a specific time frame, and they face penalties if the transaction is not included. Mechanisms like StakeSure generalize this further.
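The commit-then-penalize logic can be sketched in a few lines. This is a toy model under stated assumptions, not the design of any particular L2: an HMAC over a shared secret stands in for a real public-key signature, and the names `Preconfirmation`, `issue`, and `penalty_owed` are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Preconfirmation:
    tx_hash: str
    deadline_slot: int  # last slot in which inclusion keeps the promise
    tag: bytes          # stand-in for the validator's digital signature


def _message(tx_hash: str, deadline_slot: int) -> bytes:
    return f"{tx_hash}:{deadline_slot}".encode()


def issue(secret: bytes, tx_hash: str, deadline_slot: int) -> Preconfirmation:
    """Validator commits to including `tx_hash` by `deadline_slot`."""
    tag = hmac.new(secret, _message(tx_hash, deadline_slot), hashlib.sha256).digest()
    return Preconfirmation(tx_hash, deadline_slot, tag)


def is_authentic(secret: bytes, p: Preconfirmation) -> bool:
    """Check that the commitment was really issued by the key holder."""
    expected = hmac.new(secret, _message(p.tx_hash, p.deadline_slot), hashlib.sha256).digest()
    return hmac.compare_digest(expected, p.tag)


def penalty_owed(secret: bytes, p: Preconfirmation,
                 included_at_slot: Optional[int], penalty: int) -> int:
    """Return the penalty for a broken promise, or 0 if it was kept."""
    if not is_authentic(secret, p):
        return 0  # a forged or altered commitment cannot be enforced
    kept = included_at_slot is not None and included_at_slot <= p.deadline_slot
    return 0 if kept else penalty
```

In this model the user gets a fast answer (the signed commitment) long before the transaction is settled, and the penalty is what makes that answer credible.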

L2 Preconfirmation

Now, we could try to implement all of this on L1. L1 could include a "fast preconfirmation" and "slow final confirmation" system. It could contain different shards with different security levels. However, this would increase the complexity of the protocol. Additionally, having L1 do all the work risks overloading consensus, because many larger-scale or higher-throughput approaches carry greater centralization risks or require stronger forms of "governance." If L1 took on these stronger requirements, their effects would ripple through the rest of the protocol. By offering these trade-offs through L2s, Ethereum can largely avoid these risks.

Benefits of Layer 2 for Organization and Culture

Imagine a country split in two, with one half becoming a capitalist nation and the other half becoming a government-dominated state (unlike what happens in reality, assume in this thought experiment that this is not the result of any traumatic war, but simply that one day a border naturally appeared, and that's it). In the capitalist half, restaurants are run under various combinations of decentralized ownership, chains, and franchises. In the government-dominated half, they are all branches of the government, just like police stations. On the first day, not much would change. People would largely follow their existing habits, and what works and what doesn't would depend on technical realities like labor skills and infrastructure. A year later, however, you would see huge changes, because the different incentives and control structures would lead to significant changes in behavior, affecting who comes and goes, what gets built, what gets maintained, and what gets abandoned.

Industrial organization theory discusses many such distinctions: it not only talks about the differences between government-managed economies and capitalist economies but also about the differences between economies dominated by large franchise owners and those where each supermarket is run by independent entrepreneurs. I believe the distinction between L1-centric ecosystems and L2-centric ecosystems is also similar.

The "core developers manage everything" structure has major problems

I believe the main advantage of Ethereum being an L2-centric ecosystem is the following:

Since Ethereum is an L2-centric ecosystem, you can freely build an independent sub-ecosystem with its unique functionalities while still being part of the larger Ethereum.

If you are just building an Ethereum client, you are part of the larger Ethereum, and while you have some room for innovation, it is far less than L2. If you are building a completely independent chain, your creative space is vast, but you also lose the benefits of shared security and shared network effects. L2 is a good balance point.

It not only gives you the opportunity to try new execution environments and security trade-offs in pursuit of scalability, flexibility, and speed, but also creates an incentive for developers to build and maintain the L2 and for the community to support it.

The fact that each L2 is isolated also means that deploying new approaches is permissionless: there is no need to persuade all core developers that your new approach is "safe" for the rest of the chain. If your L2 fails, that is your responsibility. Anyone can propose quirky ideas (like Intmax's approach to Plasma), and even if Ethereum core developers are completely uninterested, they can keep building and eventually deploy. L1 features and precompiles are not like this; even in Ethereum, the success or failure of L1 development often ends up depending on politics, to a greater extent than we would like. Regardless of what can theoretically be built, the different incentives created by L1-centric and L2-centric ecosystems will ultimately have a significant impact on what actually gets built, at what level of quality, and in what order.

What Challenges Does Ethereum's L2-Centric Ecosystem Face?

The L1 + L2 architecture has major problems, image source: Reddit

This L2-centric approach faces a key challenge that L1-centric ecosystems barely have to confront: coordination. In other words, while Ethereum has many L2s, the challenge is to make the whole still feel like "Ethereum" and retain Ethereum's network effects, rather than becoming N independent chains. Today, the situation falls short in several ways:

  • Cross-chain transactions between L2s often require centralized cross-chain bridges, which are very complex for ordinary users. If you have tokens on Optimism, you cannot simply paste someone else's Arbitrum address into your wallet to send funds.

  • Cross-chain support for smart contract wallets is poor, both for personal smart contract wallets and for organizational wallets (including DAOs). If you change a key on one L2, you also need to change it on every other L2.

  • Decentralized verification infrastructure is often lacking. Ethereum has finally started to have decent light clients, like Helios. However, this is pointless if all activity happens on L2s that each require their own centralized RPC. In principle, once you have the Ethereum block header, building a light client for an L2 is not difficult; in practice, there has been too little emphasis on it.

The community is working to improve these three aspects. For cross-chain token swaps, the ERC-7683 standard is a new solution that, unlike existing "centralized cross-chain bridges," has no fixed centralized nodes, tokens, or governance. For cross-chain accounts, most wallets' approach is to use cross-chain replayable messages to update keys in the short term and keystore rollups in the long term. Light clients for L2 are starting to emerge, such as Beerus for Starknet. Additionally, recent improvements in user experience through next-generation wallets have addressed more fundamental issues, such as allowing users to access DApps without manually switching networks.
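The key-management problem, and the idea behind a keystore, can be illustrated with a toy model. In this sketch, wallets on several L2s do not each store their own copy of a key; they look it up in a single shared keystore, so one rotation takes effect everywhere. This is a loose illustration under strong assumptions: the class names are hypothetical, and a real design would have each wallet verify a proof of the keystore's state rather than read it directly from shared memory.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Keystore:
    """Single source of truth for account keys (hypothetical model)."""
    keys: Dict[str, str] = field(default_factory=dict)

    def rotate(self, account: str, new_key: str) -> None:
        # One update here replaces a separate key change on every L2.
        self.keys[account] = new_key


@dataclass
class L2Wallet:
    """A wallet on one L2 that defers to the shared keystore."""
    chain: str
    account: str
    keystore: Keystore

    def authorizes(self, key: str) -> bool:
        # Assumption: the wallet can read the keystore directly; in
        # practice this lookup would be backed by a verifiable proof.
        return self.keystore.keys.get(self.account) == key


if __name__ == "__main__":
    ks = Keystore()
    ks.rotate("alice", "key-1")
    wallets = [L2Wallet(c, "alice", ks) for c in ("optimism", "arbitrum", "starknet")]
    ks.rotate("alice", "key-2")  # a single rotation
    print(all(w.authorizes("key-2") for w in wallets))
```

The design choice being illustrated is indirection: each wallet holds a pointer to where the key lives rather than the key itself, so the cost of a key change does not grow with the number of L2s.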

Rabby multi-chain asset balance overview, which previous wallets could not achieve!

But it must be recognized that an L2-centric ecosystem is, to a significant extent, swimming against the current when it tries to coordinate. An individual L2 has no natural economic incentive to build coordination infrastructure: small L2s will not do so because they would capture only a small share of the benefit, and large L2s will not either, because they can gain just as much or more by strengthening their own local network effects. If each L2 only considers itself and no one thinks about how it fits into the broader Ethereum system, we will fail, much like the urban utopia in the image above.

It is hard to say there is a perfect solution to this problem. I can only say that the ecosystem needs to recognize more fully that cross-L2 infrastructure, like L1 clients, development tools, and programming languages, is a type of Ethereum infrastructure and should be given attention and funding. We have Protocol Guild; perhaps we need a Basic Infrastructure Guild.

Conclusion

In public discussions, "L2" and "sharding" are often presented as two opposing strategies for scaling blockchains. However, when you study the underlying technology, you find a puzzle: the actual underlying approaches to scaling are exactly the same. Both have some form of data sharding, both have fraud provers or ZK-SNARK provers, and both need solutions for cross-{rollup, shard} communication. The main difference is who is responsible for building and updating these components, and how much autonomy they have.

An L2-centric ecosystem is sharding in a true technical sense, but it is sharding in which you can build your own shard with your own rules. This is very powerful: it leaves room for a great deal of creativity and autonomous innovation. But it also presents some key challenges, especially in coordination. For an L2-centric ecosystem like Ethereum to succeed, it must understand these challenges and tackle them head-on, so that it captures as many of the benefits of an L1-centric ecosystem as possible and comes as close as possible to having the best of both worlds.
