Vitalik: Cross L2 reading for wallets and other use cases

Vitalik Buterin
2023-06-21 16:03:52
This article will focus more directly on the technical aspects of a specific subproblem: how to more easily read L1 from L2, read L2 from L1, or read one L2 from another L2.

Original: Deeper dive on cross-L2 reading for wallets and other use cases

Author: Vitalik Buterin

Compiled by: Lynn, MarsBit

In the article "Three Transitions," I outlined some key reasons why it is valuable to think of L1 + cross-L2 support, wallet security, and privacy as necessary foundational features of the ecosystem stack, rather than as additional components that can be designed separately by individual wallets.

This article will focus more directly on the technical aspects of a specific subproblem: how to more easily read L1 from L2, read L2 from L1, or read one L2 from another L2. Solving this problem is crucial for implementing an asset/key storage separation architecture, but it also has valuable applications in other areas, particularly optimizing reliable cross-L2 calls, including use cases for moving assets between L1 and L2.

Recommended Pre-Reading

Post on the Three Transitions

Ideas from the Safe team on holding assets across multiple chains

Why we need wide adoption of social recovery wallets

ZK-SNARKs, and some privacy applications

Dankrad on KZG commitments

Verkle trees

Table of Contents

  • What is the goal?

  • What do cross-chain proofs look like?

  • What types of proof schemes can we use?

  • Merkle proofs

  • ZK SNARKs

  • Special-purpose KZG proofs

  • Verkle tree proofs

  • Aggregation

  • Direct state reading

  • How does L2 learn the latest Ethereum state root?

  • Wallets on chains that are not L2s

  • Protecting privacy

  • Summary

What is the goal?

Once L2 becomes mainstream, users will have assets on multiple L2s and possibly L1s. Once smart contract wallets (multi-signature, social recovery, or otherwise) become mainstream, the keys required to access an account will change over time, and old keys will no longer be valid. Once these two things happen, users will need a way to change keys that have access to multiple accounts across many different places without having to perform a very high number of transactions.

In particular, we need a way to handle counterfactual addresses: addresses that have not been "registered" on the blockchain in any way but still need to be able to receive and securely hold funds. We all rely on counterfactual addresses: the first time you use Ethereum, you can generate an ETH address that others can use to pay you, without "registering" that address on the blockchain (which would require paying transaction fees, and hence already holding some ETH).

For EOAs (Externally Owned Accounts), all addresses start out counterfactual. For smart contract wallets, counterfactual addresses are still possible, largely thanks to CREATE2, which allows you to have an ETH address that can only be filled by a smart contract whose code matches a specific hash.


EIP-1014 (CREATE2) address calculation algorithm
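The EIP-1014 address derivation can be sketched in a few lines. This is a minimal illustration of the byte layout only: Python's standard library exposes NIST SHA3-256, not Ethereum's keccak-256 (the padding rules differ), so the hash below is an explicitly labeled stand-in; substitute a real keccak-256 implementation to reproduce actual mainnet addresses.

```python
import hashlib

def sha3_256(data: bytes) -> bytes:
    # Stand-in hash: hashlib's sha3_256 is NIST SHA3, NOT Ethereum's keccak-256
    # (the padding differs). Swap in a real keccak-256 for real addresses.
    return hashlib.sha3_256(data).digest()

def create2_address(deployer: bytes, salt: bytes, initcode: bytes, h=sha3_256) -> bytes:
    """EIP-1014 layout: hash(0xff ++ deployer ++ salt ++ hash(initcode))[12:]."""
    assert len(deployer) == 20 and len(salt) == 32
    preimage = b"\xff" + deployer + salt + h(initcode)
    return h(preimage)[12:]  # keep the low 20 bytes of the 32-byte hash
```

The address depends only on the deployer, salt, and initcode hash, which is why the wallet can exist counterfactually before any deployment transaction is sent.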

However, smart contract wallets introduce a new challenge: access keys may change. Addresses are hashes of initcode and can only contain the wallet's initial verification key. The current verification key will be stored in the wallet's storage, but this storage record will not automatically propagate to other L2 chains.

If a user has many addresses across many L2 chains, including addresses that the L2 chain does not know about (because they are counterfactual), there seems to be only one way to let users change their keys: an asset/key storage separation architecture. Each user has (i) a "key storage contract" (on L1 or a specific L2 chain) that stores all the wallets' verification keys and the rules for changing keys, and (ii) "wallet contracts" on L1 and many L2 chains that retrieve the verification keys through cross-chain reading.


There are two implementation approaches:

Lightweight version (only checks updated keys): Each wallet stores the verification key locally and exposes a function that anyone can call to check a cross-chain proof of the key storage's current state and update the locally stored verification key to match. The first time a wallet is used on a given L2, calling this function to fetch the current verification key from the key storage is mandatory.

Advantages: Very economical in using cross-chain proofs, so it doesn't matter if cross-chain proofs are expensive. All funds can only be accessed using the current key, so it remains secure.

Disadvantages: To change the verification key, an on-chain key change must be made both in the key storage and in every already-initialized wallet (though not in counterfactual ones). This can consume a lot of gas.

Heavyweight version (checks for every transaction): Each transaction requires a cross-chain proof showing the current key of the key storage.

Advantages: Lower system complexity and quick updates to the key storage.

Disadvantages: Each transaction is expensive, so more engineering is needed to make cross-chain proofs acceptably cheap. It is also not easily compatible with ERC-4337, which currently does not support cross-contract reading of mutable objects during validation.
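The trade-off between the two versions can be sketched as a toy model. This is not a contract implementation: `Keystore`, the stub `verify_proof`, and the two wallet classes are hypothetical names, and the "proof" is a plain dictionary standing in for a real Merkle/SNARK/KZG cross-chain proof.

```python
class Keystore:
    """Canonical key record on the keystore chain (hypothetical sketch)."""
    def __init__(self, key):
        self.key = key
    def prove_current_key(self):
        return {"key": self.key}   # stands in for a real cross-chain proof

def verify_proof(proof):
    return proof["key"]            # stub: a real wallet verifies the proof cryptographically

class LightWallet:
    """Lightweight: cache the key locally; one cross-chain proof per key rotation."""
    def __init__(self, keystore):
        self.keystore = keystore
        self.local_key = verify_proof(keystore.prove_current_key())
    def sync(self):                # called after a key change, on each wallet
        self.local_key = verify_proof(self.keystore.prove_current_key())
    def validate(self, credential):
        return credential == self.local_key   # cheap, purely local check

class HeavyWallet:
    """Heavyweight: one cross-chain proof per transaction; always current."""
    def __init__(self, keystore):
        self.keystore = keystore
    def validate(self, credential):
        return credential == verify_proof(self.keystore.prove_current_key())
```

The sketch makes the cost structure visible: the light wallet is stale between `sync()` calls but validates locally, while the heavy wallet pays for a proof on every validation.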

What do cross-chain proofs look like?

To demonstrate the full complexity, we will explore the most difficult case: the key storage is on one L2, while the wallet is on another L2. If the key storage or wallet is on L1, only half of this design is needed.


Assuming the key storage is on Linea and the wallet is on Kakarot. The complete proof process for the wallet key includes:

  • Providing a proof of the current Linea state root, given the current Ethereum state root known to Kakarot.
  • Providing a proof of the key in the current key storage, given the current Linea state root.

There are two main implementation challenges:

What kind of proof do we use? (Is it a Merkle proof? Or something else?)

How does L2 first learn the latest L1 (Ethereum) state root (or, as we will see, potentially the full L1 state)? And how does L1 learn the L2 state root?

In both cases, how long will the delay be between something happening on one side and the other side being able to provide proof?

What types of proof schemes can we use?

There are five main options:

  • Merkle proofs
  • General ZK-SNARKs
  • Special-purpose proofs (e.g., using KZG)
  • Verkle proofs, which lie between KZG and ZK-SNARKs in both infrastructure workload and cost
  • No proof, relying on direct state reading

In terms of the required infrastructure workload and user costs, I roughly rank them as follows:

[Figure: the five schemes ranked by infrastructure workload and user cost]

"Aggregation" refers to aggregating all proofs provided by users in each block into one large meta-proof, merging them together. This is feasible for SNARKs and KZG, but not for Merkle branches (you can merge Merkle branches slightly, but you can only save log(txs per block) / log(total number of keystores), about 15-30%, so it may not be worth the cost).

Aggregation only becomes valuable when the scheme has a large number of users, so from a practical perspective, implementation version 1 can ignore aggregation and implement it in version 2.

How do Merkle proofs work?

This is straightforward: directly follow the diagram from the previous section. More specifically, each "proof" (assuming the hardest case of proving from one L2 into another L2) contains the following:

A Merkle branch proving the state root of the keystore-holding L2, given the latest Ethereum state root known to the wallet-holding L2. The keystore-holding L2's state root is stored in a known storage slot of a known address (the L1 contract representing that L2), so the path can be hardcoded.

A Merkle branch proving the current verification key, given the state root of the keystore-holding L2. Again, the verification key is stored in a known storage slot of a known address, so the path can be hardcoded.

Unfortunately, Ethereum's state proofs are complex, but there are some libraries that can be used to verify them, and if these libraries are used, this mechanism is not too complicated.

A bigger issue is cost. Merkle proofs are long, and Patricia tree proofs are about 3.9 times longer than theoretically necessary (to be precise: an ideal binary Merkle proof over a tree storing N objects is 32 * log2(N) bytes long, while proofs in Ethereum's hexary Patricia trees are about 32 * 15 * log16(N) ≈ 125 * log2(N) bytes). In a state with about 250 million (2²⁸) accounts, each proof is about 125 * 28 = 3500 bytes, or approximately 56,000 gas of calldata, plus the additional cost of decoding and verifying hashes.
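The arithmetic above can be spelled out directly (using the article's 125 * log2(N) approximation for hexary Patricia proofs, and the 16-gas-per-nonzero-calldata-byte rate):

```python
from math import log2

def binary_proof_bytes(n):
    # ideal binary Merkle tree: one 32-byte sibling hash per level
    return 32 * log2(n)

def patricia_proof_bytes(n):
    # hexary Patricia: ~15 sibling hashes of 32 bytes per hex level,
    # 32 * 15 * log16(n) ≈ 125 * log2(n)
    return 125 * log2(n)

N = 2 ** 28                                   # ~250 million accounts
patricia = patricia_proof_bytes(N)            # 125 * 28 = 3500 bytes
overhead = patricia / binary_proof_bytes(N)   # ≈ 3.9x vs an ideal binary tree
calldata_gas = patricia * 16                  # ≈ 56,000 gas of calldata
```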

Two proofs combined would cost about 100,000 to 150,000 gas (not including the signature verification each transaction needs), far exceeding the current base of 21,000 gas per transaction. And if the proof is verified on L2, the imbalance gets worse. Computation inside an L2 is cheap, because it happens off-chain and among far fewer nodes than L1. Data, however, must be posted to L1. So the comparison is not 21,000 gas versus 150,000 gas; it is 21,000 L2 gas versus 100,000 L1 gas.

We can calculate what this means by comparing L1 gas costs and L2 gas costs.


Comparison of Ethereum L1 and L2 Gas

Currently, the cost of a simple send operation on the L1 network is about 15-25 times that of L2, while the cost of token swaps is about 20-50 times that of L2. Simple send operations are relatively data-heavy, while swap operations are more computationally intensive. Therefore, swap operations are a better benchmark for approximating the cost of L1 computation versus L2 computation. Considering the above, if we assume a cost ratio of 30 times between L1 computation costs and L2 computation costs, this suggests that placing Merkle proofs on L2 could cost about fifty regular transactions.

Of course, using a binary Merkle tree reduces costs by about 4x, but even so, in most cases the cost will still be too high—and if we are willing to give up compatibility with Ethereum's current hexary state tree, we might seek even better options.

How do ZK-SNARK proofs work?

Conceptually, using ZK-SNARKs is also easy to understand: you simply replace the Merkle proofs in the diagram above with ZK-SNARKs that prove the existence of these Merkle proofs. The computational cost of a ZK-SNARK is about 400,000 gas, with a size of about 400 bytes (in comparison, the computational cost of future basic transactions is 21,000 gas and 100 bytes, which may reduce to about 25 bytes when compressed). Therefore, from a computational perspective, the cost of a ZK-SNARK is 19 times that of a basic transaction, and from a data perspective, the cost of a ZK-SNARK is 4 times that of a basic transaction, potentially 16 times the cost of a basic transaction in the future.
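As a quick check of these ratios, using the approximate figures quoted in the paragraph above:

```python
snark_gas, snark_bytes = 400_000, 400   # approximate SNARK verification cost and size
tx_gas, tx_bytes = 21_000, 100          # basic transaction today
tx_bytes_compressed = 25                # plausible future compressed tx size

compute_ratio = snark_gas / tx_gas                     # ≈ 19x in computation
data_ratio = snark_bytes / tx_bytes                    # 4x in data today
future_data_ratio = snark_bytes / tx_bytes_compressed  # 16x vs compressed txs
```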

These numbers show a significant improvement over Merkle proofs, but they are still very expensive. There are two ways to improve this: (i) specific-purpose KZG proofs, or (ii) aggregation, similar to ERC-4337 aggregation but using more advanced mathematics. We can explore both methods.

How do specialized KZG proofs work?

Warning: this section is more mathematical than the others. This is because we are going beyond general-purpose tools to build a special-purpose, cheaper one, and to do that we have to dig deeper. If you do not enjoy deep math, skip to the next section.

First, let's review how KZG commitments work:

  • We can represent a set of data [D₁ … Dₙ] with a KZG commitment to a polynomial derived from the data: specifically, the polynomial P where P(w) = D₁, P(w²) = D₂ … P(wⁿ) = Dₙ. Here, w is a "root of unity", a value where wᴺ = 1 for some evaluation domain size N (this is all done in a finite field).
  • To "commit" to P, we create an elliptic curve point com(P) = P₀ * G + P₁ * S₁ + … + Pₖ * Sₖ. Here:
  • G is the generator point of the curve
  • Pᵢ is the i'th-degree coefficient of the polynomial P
  • Sᵢ is the i'th point in the trusted setup
  • To prove P(z) = a, we create a quotient polynomial Q = (P - a) / (X - z), and create a commitment com(Q) to it. It is only possible to create such a polynomial if P(z) actually equals a.
  • To verify a proof, we check the equation Q * (X - z) = P - a by doing an elliptic curve check on the proof com(Q) and the polynomial commitment com(P): we check e(com(Q), com(X - z)) ?= e(com(P) - com(a), com(1))

Some important key properties need to be understood:

  • A proof is just the com(Q) value, which is 48 bytes
  • com(P₁) + com(P₂) = com(P₁ + P₂)
  • This also means that you can "edit" a value into an existing commitment. Suppose we know that Dᵢ is currently a, we want to set it to b, and the existing commitment to D is com(P). To get a commitment to "P, but with P(wⁱ) = b and no other evaluations changed", we set com(newP) = com(P) + (b - a) * com(Lᵢ), where Lᵢ is the "Lagrange polynomial" that equals 1 at wⁱ and 0 at the other wʲ points.
  • To perform these updates efficiently, each client can pre-compute and store all N commitments to the Lagrange polynomials (com(Lᵢ)). Inside an on-chain contract it may be too much to store all N commitments, so instead you could make a KZG commitment to the set of com(Lᵢ) (or hash(com(Lᵢ))) values, so that whenever someone needs to update the tree on-chain, they can simply provide the appropriate com(Lᵢ) with a proof of its correctness.
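The algebra in the bullets above can be exercised with a deliberately insecure toy: instead of hiding the secret point s behind elliptic-curve points and checking the quotient identity with a pairing, we work in a small prime field, evaluate polynomials at s directly, and use plain integer points instead of roots of unity for the evaluation domain. This checks the identities Q * (X - z) = P - a and com(newP) = com(P) + (b - a) * com(Lᵢ), but provides no security whatsoever; all names are illustrative.

```python
P_MOD = (1 << 31) - 1   # a small prime field; real KZG uses the BLS12-381 scalar field
S = 123456789           # the "trusted setup" secret — known here, hence a toy only

def poly_eval(coeffs, x):                 # coeffs[i] is the X^i coefficient
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P_MOD
    return acc

def divide_by_linear(coeffs, z):
    """Synthetic division of P(X) by (X - z); returns (quotient, remainder)."""
    q, carry = [0] * (len(coeffs) - 1), 0
    for i in range(len(coeffs) - 1, 0, -1):
        carry = (coeffs[i] + z * carry) % P_MOD
        q[i - 1] = carry
    return q, (coeffs[0] + z * carry) % P_MOD

def commit(coeffs):
    return poly_eval(coeffs, S)           # stands in for sum_i coeffs[i] * S_i

def prove(coeffs, z):
    """Opening proof that P(z) = a: a commitment to Q = (P - a) / (X - z)."""
    a = poly_eval(coeffs, z)
    shifted = [(coeffs[0] - a) % P_MOD] + coeffs[1:]
    q, rem = divide_by_linear(shifted, z)
    assert rem == 0                       # division is exact iff P(z) really equals a
    return a, commit(q)

def verify(com_p, z, a, com_q):
    # Toy analogue of the pairing check e(com(Q), com(X - z)) = e(com(P) - com(a), com(1))
    return (com_q * (S - z)) % P_MOD == (com_p - a) % P_MOD

def lagrange_at_s(i, xs):
    """com(L_i) in the toy model: L_i evaluated at S over points xs."""
    num = den = 1
    for j, xj in enumerate(xs):
        if j != i:
            num = num * (S - xj) % P_MOD
            den = den * (xs[i] - xj) % P_MOD
    return num * pow(den, P_MOD - 2, P_MOD) % P_MOD
```

Because commitments are linear, updating entry i from a to b really does reduce to adding (b - a) * com(Lᵢ), which is the property the keystore scheme relies on.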

Thus, we have a structure where we can continually add values to the end of a growing list, although there is a certain size limit (in practice, hundreds of millions of values are feasible). We will then use this as our data structure to manage (i) the commitment of the key list stored on L2 and mirrored to L1, and (ii) the commitment of the L2 key list stored on Ethereum L1 and mirrored to each L2.

Keeping the commitments updated can become part of the core L2 logic, or it can be implemented through deposit and withdrawal bridges without requiring changes to the core L2 protocol.


The contents required for a complete proof are as follows:

  1. The latest com(key list) stored on L2 (48 bytes)

  2. A KZG proof of com(key list) as a value inside com(mirror list), the commitment to the list of all key-list commitments (48 bytes)

  3. A KZG proof of your key inside com(key list) (48 bytes, plus 4 bytes for the index)

The two KZG proofs (items 2 and 3) can actually be merged into one, so the total size is only 100 bytes.

Note a detail: since the key list is a list rather than a key/value mapping like state, the key list must allocate positions in order. The key commitment contract will contain its own internal registry that maps each key storage to an ID, and for each key, it will store hash(key, key storage address) rather than just key, to explicitly inform other L2s about the key storage referred to by a specific entry.
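The registry described here might be sketched as follows; the class and method names are hypothetical, and SHA-256 stands in for the keccak-256 a real contract would use.

```python
import hashlib

class KeyCommitmentRegistry:
    """Hypothetical sketch of the key commitment contract's internal registry:
    an append-only list whose entries bind each key to its keystore address,
    so other L2s know which keystore a given index refers to."""
    def __init__(self):
        self.entries = []       # entry i = hash(key, keystore address)
        self.index_of = {}      # keystore address -> assigned position

    def register(self, keystore_addr: bytes, key: bytes) -> int:
        # Positions are allocated in order, since the structure is a list,
        # not a key/value mapping like the state tree.
        entry = hashlib.sha256(key + keystore_addr).digest()
        idx = len(self.entries)
        self.entries.append(entry)
        self.index_of[keystore_addr] = idx
        return idx
```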

The advantage of this technique is that it performs very well on L2. The data is only 100 bytes, about 4 times shorter than ZK-SNARKs, and much shorter than Merkle proofs. The computational cost is mainly a size-2 pairing check, about 119,000 gas. On L1, computation is more important than data, so unfortunately, KZG is slightly more expensive than Merkle proofs.

How do Verkle trees work?


What Verkle trees look like. In practice, each node can be given a width of 256 = 2⁸ for IPA-based trees, or 2²⁴ for KZG-based trees

Proofs in Verkle trees are longer than KZG; they can be several hundred bytes long. They are also difficult to verify, especially when you try to merge many proofs into one.

From a practical perspective, Verkle trees should be considered similar to Merkle trees, but more feasible without SNARKs (because of lower data costs), and cheaper when using SNARKs (because of lower proof costs).

The main advantage of Verkle trees is the potential for harmonized data structures: Verkle proofs can be used directly over the L1 or L2 state, without overlay structures, using the exact same mechanism for L1 and L2. Once quantum computers become a problem, or once proving Merkle branches becomes efficient enough, Verkle trees could be replaced in-place with a binary hash tree using a suitably SNARK-friendly hash function.

Aggregation

If N users make N transactions (or, more realistically, N ERC-4337 UserOperations) that need to prove N cross-chain claims, we can save a lot of gas by aggregating these proofs: the builder that merges these transactions into a block, or the bundler that packs them into a bundle going into a block, can create a single proof that proves all of these claims simultaneously.

This could mean:

  • A ZK-SNARK proof of N Merkle branches
  • A KZG multi-proof
  • A Verkle multi-proof (or a ZK-SNARK of a multi-proof)

In all three cases, each proof only costs a few hundred thousand gas. Builders need to create such a proof for the users of that L2 on each L2; therefore, for this proof to be useful, the entire scheme needs to have enough usage so that there are often at least a few transactions in the same block across multiple major L2s.

If using ZK-SNARKs, the main marginal cost is the "business logic" of passing numbers around between contracts, so perhaps a few thousand L2 gas per user. If using KZG multi-proofs, the verifier needs to add 48 gas for each keystore-holding L2 used within that block, so the scheme's marginal cost increases by about 800 L1 gas per L2 (not per user). But these costs are far lower than the costs of not aggregating, which inevitably involve over 10,000 L1 gas and hundreds of thousands of L2 gas per user. For Verkle trees, you can either use Verkle multi-proofs directly, adding about 100-200 bytes per user, or do a ZK-SNARK of a Verkle multi-proof, which costs similarly to a ZK-SNARK of Merkle branches but is significantly cheaper to prove.

From an implementation perspective, allowing bundlers to aggregate cross-chain proofs through the ERC-4337 account abstraction standard may be the best approach. ERC-4337 already has a mechanism for builders to aggregate parts of UserOperations in a custom way. There is even an implementation for BLS signature aggregation, which can reduce L2 gas costs by 1.5 to 3 times, depending on what other forms of compression are included.


Diagram from a BLS wallet implementation post showing the workflow of BLS aggregated signatures in early versions of ERC-4337. The workflow for aggregating cross-chain proofs may look very similar

Direct State Reading

The final possibility, which only applies to L2 reading L1 (not L1 reading L2), is to modify L2s to allow them to make static calls directly to L1 contracts.

This can be achieved through an opcode or precompile that allows calling L1, where you provide the target address, gas, and calldata, and it returns the output, although because these calls are static calls, they cannot actually change any L1 state. L2 must know about L1 to handle deposits, so there is nothing fundamentally preventing such an implementation; it is mainly a technical implementation challenge (see: Optimism's RFP supporting static calls to L1).

Note that if the key storage is on L1 and the L2 has integrated L1 static-call functionality, then no proof is needed at all! However, if the L2 has not integrated L1 static calls, or if the key storage is on an L2 (which it may eventually have to be, once L1 becomes too expensive for users to use even a little), then proofs are needed.

How does L2 learn the latest Ethereum state root?

All of the above schemes require L2 to access the latest L1 state root or the entire latest L1 state. Fortunately, all L2s have some functionality to access the latest L1 state. This is because they need such functionality to handle messages from L1 to L2, most obviously deposits.

In fact, if an L2 has a deposit function, you can use that bridge as-is to move the L1 state root into a contract on L2: simply have a contract on L1 call the BLOCKHASH opcode and pass the result to L2 as a deposit message. The full block header can be received on the L2 side and its state root extracted. However, it would be much better for every L2 to have an explicit way to directly access either the full recent L1 state or recent L1 state roots.

The main challenge in optimizing how L2 receives the latest L1 state root is to achieve both security and low latency simultaneously:

  • If L2 implements the "direct L1 reading" functionality in a lazy manner, only reading finalized L1 state roots, then the latency is normally 15 minutes, but in the extreme case of an inactivity leak (which you must tolerate), the latency could be several weeks.
  • L2 can certainly be designed to read much more recent L1 state roots, but because L1 can revert (even with single-slot finality, reverts can happen during an inactivity leak), L2 also needs to be able to revert. From a software engineering perspective this is technically challenging, but at least Optimism already has this capability.
  • If you use a deposit bridge to bring L1 state roots into L2, simple economic viability may require long gaps between deposit updates: if the full cost of a deposit is 100,000 gas, and assuming ETH at $1800 and fees at 200 gwei, bringing an L1 root into L2 once per day would cost $36 per day, or $13,148 per year per L2 to maintain the system. Updating once per hour instead rises to $315,569 per year per L2. In the best case, a steady flow of impatient wealthy users pays the update fees and keeps the system up to date for everyone else. In the worst case, some altruistic actor has to pay for it themselves.
  • "Oracles" (at least, the kind of technology that some DeFi people call "oracles") are not an acceptable solution here: wallet key management is a very security-critical low-level function, so it should depend on at most a few pieces of very simple, cryptographically trustless, low-level infrastructure.
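The cost arithmetic in the deposit-bridge bullet can be reproduced directly (the article's yearly figures come out slightly higher, presumably from a different year-length or rounding convention):

```python
gas_per_update = 100_000    # full gas cost of one root-update deposit
gas_price_gwei = 200        # assumed L1 gas price
eth_usd = 1800              # assumed ETH price

# gas * gwei-price * usd-price, converted from gwei (1e9 gwei = 1 ETH)
usd_per_update = gas_per_update * gas_price_gwei * eth_usd // 10**9   # $36
usd_per_year_daily = usd_per_update * 365        # one update per day
usd_per_year_hourly = usd_per_update * 24 * 365  # one update per hour
```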

Additionally, in the opposite direction (L1s reading L2):

  • In optimistic rollups, due to the delay of fraud proofs, state roots take a week to reach L1. In ZK rollups, due to the combination of proof time and economic constraints, it currently takes several hours, although future technologies will reduce this.
  • Pre-confirmations (from sequencers, attesters, etc.) are not an acceptable solution for L1 reading L2. Wallet management is a very security-critical low-level function, so the security level of L2 -> L1 communication must be absolute: it should not even be possible to push a false L2 state root to L1 by taking over the L2 validator set. The only state roots L1 should trust are those accepted as final by the contract holding the L2's state roots.

For many DeFi use cases, the speed of these trustless cross-chain operations is unacceptable; for these use cases, you do need faster bridging and less robust security models. However, for use cases of updating wallet keys, longer delays are acceptable: you are not delaying transactions for hours; you are delaying key changes. You are just keeping the old key longer. If you change keys because of a theft, then you do have a fairly long vulnerable period, but this can be mitigated, for example, through a wallet freeze function.

Ultimately, the best latency-minimizing solution is for L2s to implement direct reading of L1 state roots in an optimal way, where each L2 block (or the state-root computation log) contains a pointer to the most recent L1 block, so that if L1 reverts, L2 can also revert. The key storage contract should be placed either on mainnet or on an L2 that is a ZK-rollup and can therefore commit to L1 quickly.


Blocks of an L2 chain can depend not only on previous L2 blocks but also on an L1 block. With such links, if L1 reverts, L2 reverts as well. Notably, this is also how earlier (pre-Dank) versions of sharding were envisioned to work; see the code here

How much connection to Ethereum does another chain need in order to hold wallets whose key storage is rooted on Ethereum or an L2?

Surprisingly, not much. It does not even need to be a rollup: if it is an L3 or a validium, holding wallets there is fine as long as you keep the key storage on L1 or on a ZK-rollup. What the chain does need is direct access to Ethereum state roots, and a technical and social commitment to reorg when Ethereum reorgs and to hard fork when Ethereum hard forks.

An interesting research question is to determine to what extent a chain can establish this form of connection with multiple other chains (such as Ethereum and Zcash). It is naively possible: if Ethereum or Zcash reorganizes, your chain can agree to reorganize (hard fork if Ethereum or Zcash hard forks), but this creates a dual technical and political dependency for your node operators and your community more broadly. Thus, such technology can be used to connect to some other chains, but the costs will increase. ZK bridge-based schemes have appealing technical characteristics, but they have a critical weakness in that they are not robust against 51% attacks or hard forks. There may be more clever solutions.

Protecting Privacy

Ideally, we also want to protect privacy. If you have many wallets managed by the same key storage, then we want to ensure:

  • Not letting the public know that these wallets are interconnected.
  • Social recovery guardians do not know what addresses they are guarding.

This raises some issues:

  • We cannot directly use Merkle proofs because they do not protect privacy.
  • If we use KZG or SNARKs, then the proof needs to provide a blind version of the verification key without revealing the location of the verification key.
  • If we use aggregation, then the aggregator should not know the plaintext location; instead, the aggregator should receive blind proofs and have a way to aggregate these proofs.
  • We cannot use the "lightweight version" (using cross-chain proofs only to update keys), because it creates a privacy leak: if many wallets are updated at almost the same time through an update procedure, the timing leaks information that those wallets are likely related. So we must use the "heavyweight version" (providing a cross-chain proof for each transaction).

Using SNARKs, the solution is conceptually easy: proofs are inherently information-hiding, and the aggregator needs to produce a recursive SNARK to prove the SNARKs.


The main challenge of this approach today is that aggregation requires the aggregator to create a recursive SNARK, which is currently quite slow.

For KZG, we can use this work on non-index revealing KZG proofs (see also: a more formal version of this work in the Caulk paper) as a starting point. However, the aggregation of blind proofs is an open problem that needs more attention.

Unfortunately, directly reading L1 from L2 does not protect privacy, although implementing direct reading functionality is still very useful, both to minimize latency and because of its utility for other applications.

Summary

  • To have cross-chain social recovery wallets, the most realistic workflow is to maintain a key storage in one place and wallets in many places, with the wallets reading the key storage either (i) to update their local view of the verification key, or (ii) as part of verifying each transaction.
  • A key ingredient in making this possible is cross-chain proofs. We need to work hard on optimizing these proofs. Either ZK-SNARKs, waiting for Verkle proofs, or a custom-built KZG solution seems to be the best option.
  • In the long run, aggregation protocols, where bundlers generate aggregate proofs as part of creating bundles of all UserOperations submitted by users, are necessary for minimizing costs. This may need to be integrated into the ERC-4337 ecosystem, although modifications to ERC-4337 may be required.
  • L2 should be optimized to minimize the latency of reading L1 state (or at least the state root) from within L2. Directly reading L1 state from L2 is ideal and can save proof space.
  • Wallets can live not only on L2s; you can also place wallets on systems with weaker levels of connection to Ethereum (L3s, or even standalone chains that simply agree to include Ethereum state roots and to reorg or hard fork when Ethereum reorgs or hard forks).
  • However, key storage should be on L1 or on a high-security ZK-rollup L2. Storing on L1 can save a lot of complexity, although in the long run, even that may become too expensive, necessitating the establishment of key storage on L2.
  • Protecting privacy will require additional work and make some options more difficult. However, we should perhaps move towards privacy-protecting solutions, ensuring that any proposals we put forward are compatible with privacy protection.