Paradigm: Challenges and Solutions for Ethereum State Growth
Written by: Storm Slivkoff, Georgios Konstantopoulos
Compiled by: Luffy, Foresight News
The growth of Ethereum's state and its relationship with gas limits is widely misunderstood. It is commonly believed that state growth is the primary scalability bottleneck for Ethereum. However, discussions about state growth are often hindered by imprecise terminology and a lack of detailed quantitative evidence.
A data-driven approach can clarify the state growth issue. In this article, we utilize high-resolution datasets to understand the magnitude and shape of state growth. In the process, we arrive at a surprising conclusion: modern consumer hardware can sustain the current rate of state growth for at least a decade. Furthermore, considering the continuous improvements in software and hardware, this runway may be extended indefinitely.
We believe Ethereum has a clear roadmap: 1) completely eliminate state growth as a scalability bottleneck; 2) raise gas limits to levels that support a globally scaled decentralized financial system. The goal of this series of articles is to develop a scientific approach to understanding and formulating this scalability roadmap.
This article is the first part of a series on Ethereum scalability, primarily focusing on state growth. Part 2 discusses historical growth, Part 3 addresses state access, and Part 4 covers gas limits.
What is State Growth?
The term "state growth" is often used as a catch-all for any Ethereum scalability bottleneck in which the size of some dataset exceeds the capacity of node hardware. However, state growth should not be treated as a single undifferentiated problem. Ethereum data comes in several types, each with its own relationship to the underlying hardware components of a node. It is therefore crucial to use precise terminology for each distinct scalability bottleneck.
State is the set of data required to build and validate new Ethereum blocks. It consists of contract bytecode, contract storage, account balances, and account nonces. History is the dataset required for nodes to synchronize from the genesis block to the latest block. It consists of blocks and transactions. State and history are non-overlapping datasets. From these definitions, there are at least three distinct phenomena that put significant pressure on node hardware:
- State growth: The accumulation of new accounts, new contract bytecode, and new contract storage.
- Historical growth: The accumulation of new blocks and new transactions.
- State access: A set of read and write operations used to build and validate blocks.
Each bottleneck has a unique relationship with the hardware limitations of the nodes. The four most relevant hardware limitations are:
- Network IO is the upload and download bandwidth that nodes must sustain to maintain stable consensus with peer nodes.
- Storage size is the amount of data that nodes must keep in permanent storage to build, validate, and distribute blocks.
- Memory size is the amount of data that nodes must cache in memory to stay in sync with the tip of the chain.
- Storage IO is the number of read and write operations that nodes must perform per second to stay in sync with the tip of the chain.
The relationship between these bottlenecks and hardware limitations is illustrated in Figure 1.
Figure 1: Ethereum Scalability Bottlenecks
Starting from the top of the figure, every time Ethereum executes a transaction, all resources used by that transaction are priced in gas. Therefore, Ethereum's gas limit is a one-dimensional quantity that rate-limits all forms of on-chain activity. Downstream of the gas limit are block size and operations per block. The more bytes in each block, the faster historical growth occurs. The more IO operations per block, the greater the state access rate, and (generally) the greater the state growth rate.
Thus, scalability bottlenecks are related to the hardware constraints of nodes, as follows:
- To support significant state growth, nodes must have sufficient storage and memory space. If the state becomes too large, it will either not fit in storage, or the frequently accessed portion of the state will not fit in memory, leading to degraded performance.
- To support significant historical growth, nodes must have sufficient network bandwidth to share large amounts of block data and enough storage capacity to store that data.
- To support significant state access, nodes must have ample memory to cache hot state and sufficient storage IO to support enough read and write operations.
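The relationships listed above can be written down as a small lookup table. This is only an illustrative sketch: the dictionary keys and the helper name `bottlenecks_stressing` are hypothetical labels chosen here, not terms from any client or specification.

```python
# Each bottleneck mapped to the hardware constraints it stresses,
# following the three bullet points above. All names are illustrative.
BOTTLENECK_TO_HARDWARE = {
    "state_growth": {"storage_size", "memory_size"},
    "history_growth": {"network_io", "storage_size"},
    "state_access": {"memory_size", "storage_io"},
}


def bottlenecks_stressing(resource: str) -> list[str]:
    """Return the bottlenecks that put pressure on one hardware resource."""
    return sorted(
        name
        for name, resources in BOTTLENECK_TO_HARDWARE.items()
        if resource in resources
    )


for resource in ("network_io", "storage_size", "memory_size", "storage_io"):
    print(resource, "<-", bottlenecks_stressing(resource))
```

Reading the table in this direction makes one point explicit: storage size and memory size are each stressed by two different bottlenecks, so relieving one bottleneck does not necessarily free up the hardware resource.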
For state growth in particular, the main challenge is to ensure that state size does not grow faster than consumer hardware improves. Node memory and storage are limited resources, and they will eventually become a bottleneck unless state growth stops or hardware is upgraded regularly. Fortunately, memory and storage hardware have been improving steadily over the years. Even so, these improvements are hard to predict accurately, and it should not be assumed that their rapid pace will continue indefinitely.
Note that the upcoming EIP-4844 introduces data blobs that will bring some changes to these scalability relationships. After EIP-4844, it is expected that the accumulated history on disk will be significantly less, and network IO may increase substantially when transferring large amounts of blob data.
In this article, we will primarily focus on state size and state growth rate, rather than memory size and state access patterns. We will explore other topics in future work.
Composition of Ethereum State
The next step in understanding state growth is to examine the total size of the state and the size of each contributor to it. Currently, Ethereum state occupies approximately 245.5 GB. This figure is measured using a reth node, but the numbers for other node clients are roughly comparable, as shown in the table. Accounts, contract bytecode, and contract storage account for 14.1%, 4.3%, and 81.7% of the state, respectively.
Figure 2 shows how much of the state is occupied by various types of smart contract protocols. In the figure, the size of each contract category represents the number of bytes occupied by its storage slots and bytecode.
Figure 2: Distribution of Ethereum State
The numbers in Figure 2 represent the total number of bytes that node clients must store on disk. This includes data used for indexing and other types of storage overhead. The average storage size per account and per storage slot is 133.6 bytes and 191.3 bytes, respectively.
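The figures quoted so far can be combined into a rough back-of-the-envelope breakdown. The sketch below multiplies the total state size by each share, then divides by the average on-disk size per account and per slot. It assumes decimal gigabytes (1 GB = 10^9 bytes), which the text does not specify, so the resulting counts are order-of-magnitude estimates rather than measured values.

```python
STATE_GB = 245.5            # total state size from the text (reth node)
SHARES = {                  # shares of state from the text
    "accounts": 0.141,
    "bytecode": 0.043,
    "storage": 0.817,
}
BYTES_PER_ACCOUNT = 133.6   # average on-disk bytes per account (from the text)
BYTES_PER_SLOT = 191.3      # average on-disk bytes per storage slot (from the text)
BYTES_PER_GB = 1e9          # assumption: decimal GB, not GiB

# Size of each component in GB.
sizes_gb = {name: STATE_GB * share for name, share in SHARES.items()}

# Implied entry counts, using the per-entry averages.
approx_accounts = sizes_gb["accounts"] * BYTES_PER_GB / BYTES_PER_ACCOUNT
approx_slots = sizes_gb["storage"] * BYTES_PER_GB / BYTES_PER_SLOT

for name, gb in sizes_gb.items():
    print(f"{name:>9}: {gb:6.1f} GB")
print(f"implied accounts:      {approx_accounts / 1e6:.0f} million")
print(f"implied storage slots: {approx_slots / 1e9:.2f} billion")
```

Under these assumptions the arithmetic implies a few hundred million accounts and roughly a billion storage slots. The shares sum to 100.1% because of rounding in the source figures.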
Here are some of the most important takeaways from Figure 2:
- Tokens are the largest contributors to state. The biggest contributors to Ethereum state are ERC-20 and ERC-721 tokens, which occupy 27.2% and 21.6% of the state, respectively. Tokens take up such a large share because each user's balance for each token must be stored separately in its own 32-byte storage slot. As a result, roughly half of Ethereum's state size scales with the number of Ethereum users and the number of distinct tokens each user holds.
- At least 7.4% of Ethereum's state is dormant. Some of the largest contracts in Ethereum state are no longer active. These protocols were launched when block space and state space were much cheaper than they are now, including most protocols in the gaming, gambling, and scam categories, as well as many inactive DEXs, including IDEX, Etherdelta, and Oasis. These protocols collectively account for at least 7.4% of Ethereum state. The true level of dormant state is likely higher, as it also includes long-tail projects in ERC-20, ERC-721, and other categories.
- L2 cross-chain bridges occupy less than 2% of Ethereum state. By using techniques such as compression, ZK proofs, and improved encoding, L2 transactions use state more efficiently than mainnet transactions. Although L2s account for less than 2% of mainnet state, the total number of transactions per second on L2 is five times that of the mainnet.
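To make the token takeaway concrete, here is a toy model of why token state scales with users times tokens held. It is a deliberate simplification: `ToyToken` and `estimated_state_bytes` are hypothetical names, real ERC-20 contracts also store allowances and other data, and the 191.3-byte figure is the average on-disk slot size quoted above rather than the raw 32-byte slot.

```python
from collections import defaultdict

BYTES_PER_SLOT_ON_DISK = 191.3  # average from the text, includes storage overhead


class ToyToken:
    """Toy ERC-20-style ledger: one storage slot per holder with a nonzero balance."""

    def __init__(self) -> None:
        self.balances: dict[str, int] = defaultdict(int)

    def mint(self, holder: str, amount: int) -> None:
        self.balances[holder] += amount

    def slot_count(self) -> int:
        # Only nonzero balances are counted as occupied slots in this toy model.
        return sum(1 for balance in self.balances.values() if balance != 0)


def estimated_state_bytes(tokens: list[ToyToken]) -> float:
    """Estimate on-disk bytes as (total occupied slots) x (average slot size)."""
    return sum(token.slot_count() for token in tokens) * BYTES_PER_SLOT_ON_DISK


# Three users who each hold two tokens occupy 3 x 2 = 6 balance slots.
token_a, token_b = ToyToken(), ToyToken()
for user in ("alice", "bob", "carol"):
    token_a.mint(user, 10)
    token_b.mint(user, 5)

total_slots = token_a.slot_count() + token_b.slot_count()
print(total_slots, "slots,", round(estimated_state_bytes([token_a, token_b]), 1), "bytes")
```

Adding a fourth user who holds both tokens adds two more slots, and adding a third token held by everyone adds one slot per user, which is the proportionality the takeaway describes.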
How Fast is Ethereum State Growing?
The most important aspect of state growth is the change in the rate of state growth over time. This rate reveals the severity of the state issue and its trends.
Figure 3 shows the state growth rate since Ethereum's inception in 2015. These growth rates are calculated by summing the contract bytecode and contract storage in each contract category.
Figure 3: Growth of Ethereum State Over Time
Here are some of the most important takeaways from Figure 3:
- Currently, the state is growing at about 2.62 GB per month, down from a peak of 5.99 GB per month. Based on these numbers, it is predicted that the total state size will be between 396 GB and 606 GB in five years. While one might describe the current growth rate as 12.8% per year, the absolute growth rate has been declining even as the state continues to grow, so simple exponential growth may not be an appropriate model.
- The recent decline in state growth is primarily due to a decrease in NFT activity. While one might expect some degree of correlation between different types of network activity, there is surprising independence among various state contributors. For example, despite the overall state growth rate declining in recent years, the ERC-20 state growth rate has actually been increasing each year since 2020.
- State growth has reached its lowest level since 2021. This decline is quite surprising, but it makes sense considering that state is primarily proportional to new token balances. If the state growth rate has been declining, one might think that Ethereum has the capacity to support higher gas limits. This may be true, but it is important to remember: 1) under the current gas pricing model, there is nothing to prevent a new surge in growth rates, and 2) state is not the only bottleneck downstream of gas limits.
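The five-year projection in the first takeaway can be approximated with a simple linear extrapolation, and contrasted with the compounding model the text cautions against. This is a sketch of the arithmetic only; the function names are illustrative, and the authors' exact method is not given in the text.

```python
STATE_GB = 245.5              # current state size from the text
LOW_GB_PER_MONTH = 2.62       # current growth rate from the text
HIGH_GB_PER_MONTH = 5.99      # peak growth rate from the text


def linear_projection(start_gb: float, gb_per_month: float, years: float) -> float:
    """State size if the absolute growth rate stays constant."""
    return start_gb + gb_per_month * 12 * years


def exponential_projection(start_gb: float, annual_rate: float, years: float) -> float:
    """State size if the percentage growth rate stays constant (compounding)."""
    return start_gb * (1 + annual_rate) ** years


# Current growth expressed as a fraction of current state per year.
annual_rate = LOW_GB_PER_MONTH * 12 / STATE_GB

low_5y = linear_projection(STATE_GB, LOW_GB_PER_MONTH, 5)
high_5y = linear_projection(STATE_GB, HIGH_GB_PER_MONTH, 5)
compound_5y = exponential_projection(STATE_GB, annual_rate, 5)

print(f"annual rate:         {annual_rate:.1%}")
print(f"linear, low rate:    {low_5y:.0f} GB")
print(f"linear, high rate:   {high_5y:.0f} GB")
print(f"compounding at low:  {compound_5y:.0f} GB")
```

The linear results land near 403 GB and 605 GB, close to but not identical to the 396 GB to 606 GB range quoted above, so the authors presumably used a slightly different baseline or rate. The compounding figure is noticeably higher than the low linear estimate, which illustrates why the choice of growth model matters over a multi-year horizon.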
What is an Acceptable State Growth Rate?
We now know 1) the scale, 2) the composition, and 3) the growth rate of Ethereum state. How do we determine the range of acceptable state growth rates? This question is complex because it depends on both unpredictable market forces and philosophical choices about what trade-offs Ethereum should make.
Let’s start with the simplest model: assuming no future improvements in hardware, how long can the current level of state growth be sustained on ordinary consumer hardware? As shown in Figure 3, in recent years the annual growth of state has been between 31 GB/year and 72 GB/year. Currently, common consumer hardware has a maximum storage capacity of about 4 TB and a memory capacity of about 64 GB. From this, we can create a simple forecasting model for storage and memory demand:
- Storage: Nodes currently need to store about 1 TB of data in total, covering both state and history. In practice, this means many nodes use disks of at least 2 TB. For simplicity, let’s ignore future historical growth, as if we were in a post-EIP-4444 world. We can calculate the remaining runway as (remaining storage capacity) / (state growth rate), as shown in the table. Node storage hardware can therefore support the current rate of state growth for over a decade without exhausting 2 TB of space. At the current rate of state growth, 4 TB would be sufficient for nearly half a century.
- Memory: Ethereum-on-arm users report that the minimum viable memory for running an Ethereum node is about 16 GB. If we assume that memory demand grows proportionally with state size, then an annual state growth of 31 GB to 72 GB would translate to an additional memory requirement of 2 GB to 4.7 GB per year. Therefore, at the current gas limit, 32 GB of RAM should be sufficient for 3 to 8 years, and 64 GB of RAM should be sufficient for 10 to 23 years.
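The two runway estimates above can be reproduced with a few lines of arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB), about 1 TB of disk already in use, and a memory requirement that scales with state size at the ratio of 16 GB of RAM per 245.5 GB of state. That scaling ratio is an interpretation of how the 2 GB to 4.7 GB per year figures arise, not something the text states explicitly.

```python
STATE_GB = 245.5                        # current state size from the text
GROWTH_GB_PER_YEAR = {                  # 2.62 and 5.99 GB/month from the text
    "low": 2.62 * 12,
    "high": 5.99 * 12,
}
USED_STORAGE_GB = 1000.0                # about 1 TB currently stored (decimal units assumed)
MIN_MEMORY_GB = 16.0                    # minimum viable RAM reported in the text


def storage_runway_years(disk_gb: float, growth_gb_per_year: float) -> float:
    """(remaining storage capacity) / (state growth rate), ignoring history growth."""
    return (disk_gb - USED_STORAGE_GB) / growth_gb_per_year


def memory_runway_years(ram_gb: float, growth_gb_per_year: float) -> float:
    """Years until RAM is exhausted if memory demand scales linearly with state size."""
    memory_per_state_gb = MIN_MEMORY_GB / STATE_GB   # assumed proportionality
    memory_growth_per_year = growth_gb_per_year * memory_per_state_gb
    return (ram_gb - MIN_MEMORY_GB) / memory_growth_per_year


for disk_gb in (2000, 4000):
    years = [storage_runway_years(disk_gb, g) for g in GROWTH_GB_PER_YEAR.values()]
    print(f"{disk_gb} GB disk: {min(years):.0f} to {max(years):.0f} years")

for ram_gb in (32, 64):
    years = [memory_runway_years(ram_gb, g) for g in GROWTH_GB_PER_YEAR.values()]
    print(f"{ram_gb} GB RAM:  {min(years):.0f} to {max(years):.0f} years")
```

With these inputs, memory rather than storage is the tighter constraint: the shortest runway comes from 32 GB of RAM at the peak growth rate, while even a 2 TB disk lasts well over a decade.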
This is a simplified model with many assumptions. Possible extensions to this model include 1) historical growth, 2) non-linear scaling of memory demand, 3) reduced hardware costs, 4) increased gas limits, 5) opcode gas repricing, and 6) future improvements in Ethereum architecture. Each of these factors can interact non-linearly and evolve over time. We will explore these model extensions in future work.
It must be emphasized that a long runway is a good thing. Even if modern hardware offers a runway of many years, shortening that runway should not be taken lightly. Any plan to accelerate state growth should include a significant buffer to accommodate unpredictable changes in the hardware or software environment.
How to Address the State Growth Issue?
Many different proposals have been made to address the state growth issue. Three improvements to Ethereum's architecture stand out: Rollups, Verkle tries, and state expiration. Together, these constitute a comprehensive roadmap for addressing state growth in the short, medium, and long term.
Short-term: Rollups do not solve the state growth problem, but they do alleviate the burden on the network. As shown in Figures 2 and 3, Rollups can utilize state more efficiently than the mainnet. Moving activity to L2 does require a certain amount of state to be stored on the mainnet to support user exits. However, the state footprint of L2 transactions is far lower than that of mainnet transactions. Therefore, Rollups can sustainably increase the total activity in the ecosystem. With the upcoming EIP-4844, the adoption of Rollups is expected to grow, and blobs will make Rollups cheaper.
Medium-term: Verkle tries address the state growth issue for validator nodes but do not solve it for nodes that need to build new blocks. Verkle tries are a new data structure for Ethereum state. They support more efficient light clients and "stateless" nodes, which will be able to validate new blocks without storing the existing state values. This eliminates the state growth problem for validator nodes. Building new blocks still requires storing and accessing state, but this is more sustainable than the current situation, because block building is a task that can be easily distributed across many machines. In terms of scope, Verkle tries represent a significant engineering effort that may take years to implement.
Long-term: State expiration addresses the state growth issue for all nodes but requires additional infrastructure. State expiration allows nodes to discard inactive portions of the state, such as the dormant state shown in Figure 2. Note that "state dormancy" may be a more appropriate name, as most existing proposals allow "expired" state to be recovered by supplying a proof. Concerns about losing expired state over time can be mitigated as long as historical records (block and transaction data) remain available to reconstruct the state. Therefore, whatever solution is developed for the history preservation problem of EIP-4444 will also address the state preservation problem. However, if Verkle tries successfully achieve their goals, state expiration may become unnecessary.
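The expire-and-revive idea can be illustrated with a toy key-value store. This is a conceptual sketch only and does not correspond to any specific state expiration proposal: the class `ExpiringState`, its methods, and the block-count threshold are all hypothetical, and a plain SHA-256 hash per key stands in for the Merkle or Verkle proofs that real designs would verify against a state root.

```python
import hashlib


def _digest(value: bytes) -> bytes:
    return hashlib.sha256(value).digest()


class ExpiringState:
    """Toy state store: entries untouched for more than `ttl_blocks` are dropped.

    Only a hash commitment of each expired value is kept, so the full value
    must be resupplied (and checked against the commitment) to revive it.
    """

    def __init__(self, ttl_blocks: int) -> None:
        self.ttl_blocks = ttl_blocks
        self.active: dict[str, tuple[bytes, int]] = {}  # key -> (value, last touched block)
        self.commitments: dict[str, bytes] = {}         # key -> hash of expired value

    def write(self, key: str, value: bytes, block: int) -> None:
        self.active[key] = (value, block)
        self.commitments.pop(key, None)

    def read(self, key: str, block: int) -> bytes:
        value, _ = self.active[key]        # raises KeyError if expired or unknown
        self.active[key] = (value, block)  # a read counts as activity
        return value

    def expire(self, current_block: int) -> int:
        """Drop stale entries from the active set; return how many were dropped."""
        stale = [
            key
            for key, (_, last_touched) in self.active.items()
            if current_block - last_touched > self.ttl_blocks
        ]
        for key in stale:
            value, _ = self.active.pop(key)
            self.commitments[key] = _digest(value)
        return len(stale)

    def revive(self, key: str, value: bytes, block: int) -> None:
        """Restore an expired entry if the supplied value matches its commitment."""
        if self.commitments.get(key) != _digest(value):
            raise ValueError("value does not match the stored commitment")
        self.write(key, value, block)


state = ExpiringState(ttl_blocks=100)
state.write("old_contract_slot", b"\x01", block=0)
state.write("busy_contract_slot", b"\x02", block=90)
dropped = state.expire(current_block=150)
print(dropped, "entry expired;", len(state.active), "still active")
state.revive("old_contract_slot", b"\x01", block=151)
```

In this toy version the node keeps a 32-byte commitment for every expired key, whereas actual proposals aim to avoid per-key bookkeeping. The point of the sketch is only that the value itself can live outside the node, for example in historical data, and still be restored verifiably.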
There are other proposed solutions to the state growth problem, including state rent and sharding, but these have historically raised concerns about user experience or soundness. Reaching a final solution in the more distant future may require combining several of these approaches.
Conclusion
Although state growth is a key challenge for scaling Ethereum, we believe it is a solvable problem. Our reading of the data suggests that Ethereum can sustain the current level of state growth for many years, which provides a comfortable buffer for architectural upgrades.
We believe that an empirical approach is crucial for designing Ethereum's gas limits and guiding Ethereum towards a final scalability solution. This article is just one step towards achieving that goal. There are other types of data beyond state, each of which imposes burdens on Ethereum nodes and Ethereum gas limits. We hope to explore these other bottlenecks in future work.