Paradigm: A Detailed Analysis of Ethereum's Historical Growth Issues and Their Solutions
Original Title: How to Raise the Gas Limit, Part 2: History Growth
Original Authors: Storm Slivkoff, Georgios Konstantopoulos
Original Compilation: Luffy, Foresight News
History growth is currently the biggest bottleneck for Ethereum scalability. Surprisingly, history growth has become a larger issue than state growth. Within a few years, historical data will exceed the storage capacity of many Ethereum nodes.
The good news is:
- History growth is a problem that is easier to solve than state growth.
- Solutions are actively being developed.
- Addressing history growth will alleviate the state growth problem.
In this article, we will continue to explore the Ethereum scalability issues discussed in Part 1, now shifting our focus from state growth to history growth. Using fine-grained datasets, our goals are 1) to technically understand Ethereum's scaling bottlenecks, and 2) to facilitate discussions around optimal solutions regarding Ethereum's gas limit.
What is History Growth?
History is the collection of all blocks and transactions executed by Ethereum throughout its entire lifecycle, encompassing all data from the genesis block to the current block. History growth refers to the accumulation of new blocks and transactions over time.
Figure 1 illustrates the relationship between history growth and various protocol metrics and Ethereum node hardware constraints. Compared to state growth, history growth is limited by a different set of hardware constraints. History growth puts pressure on network I/O because new blocks and transactions must be transmitted across the network. It also stresses the storage space of nodes, as each Ethereum node stores a complete copy of the historical record. If the rate of history growth exceeds these hardware limitations, nodes will no longer be able to reach stable consensus with their peers. For an overview of state growth and other scalability bottlenecks, see Part 1 of this series.
Figure 1: Ethereum Scalability Bottlenecks
Until recently, most of each node's network throughput was used to transmit historical records (e.g., new blocks and transactions). This changed with the introduction of blobs in the Dencun hard fork. Blobs now account for a significant portion of node network activity. However, blobs are not considered part of the historical record because 1) nodes store them for only 2 weeks before discarding them, and 2) they are not needed to replay the chain from Ethereum's genesis. Due to (1), blobs do not significantly increase the storage burden on each Ethereum node. We will discuss blobs further in the later sections of this article.
In this article, we will focus on history growth and discuss the relationship between history and state. Since state growth and history growth share some overlapping hardware constraints, they are related issues, and solving one can help address the other.
How Fast is History Growth?
Figure 2 shows the history growth rate since Ethereum's genesis. Each vertical line represents a month of growth. The y-axis indicates the number of gigabytes of historical growth for that month. Transactions are categorized by their "target address" and measured in RLP (https://ethereum.org/en/developers/docs/data-structures-and-encoding/rlp/) byte size. Contracts that cannot be easily identified are classified as "unknown." The "other" category includes a range of smaller categories such as infrastructure and gaming.
Figure 2: Ethereum History Growth Rate Over Time
Several key points from the chart above:
- History growth is 6 to 8 times faster than state growth: The history growth rate recently peaked at 36.0 GiB/month and is currently at 19.3 GiB/month. The peak state growth rate is about 6.0 GiB/month and is currently at 2.5 GiB/month. A comparison of history and state in terms of growth and cumulative size will be discussed later in this article.
- Before Dencun, the history growth rate was accelerating: While state growth has been roughly linear over the years (see Part 1), history has exhibited super-linear growth. Since a growth rate that rises linearly produces quadratic growth in total size, a super-linear growth rate produces faster-than-quadratic growth in total size. This acceleration stopped abruptly after Dencun, which marks the first significant decline in Ethereum's history growth rate.
- Most recent history growth comes from Rollups: Each L2 publishes a copy of its transactions back to the mainnet. This generates a large amount of historical records and has made Rollups the most significant contributor to history growth over the past year. However, Dencun allows L2s to use blobs instead of historical records to publish their transaction data, so Rollups no longer generate most of Ethereum's historical records. We will discuss Rollups in more detail later in this article.
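The "6 to 8 times" figure in the first point follows directly from the quoted rates. A minimal sketch of the arithmetic, using only the numbers stated above (the function and variable names are ours, chosen for illustration):

```python
# Growth rates quoted in the text, in GiB per month.
HISTORY_PEAK = 36.0
HISTORY_CURRENT = 19.3
STATE_PEAK = 6.0
STATE_CURRENT = 2.5


def growth_ratio(history_rate: float, state_rate: float) -> float:
    """Return how many times faster history grows than state."""
    return history_rate / state_rate


peak_ratio = growth_ratio(HISTORY_PEAK, STATE_PEAK)
current_ratio = growth_ratio(HISTORY_CURRENT, STATE_CURRENT)

print(f"peak: {peak_ratio:.1f}x, current: {current_ratio:.1f}x")
# prints "peak: 6.0x, current: 7.7x"
```

The peak rates give the lower end of the range, and the current rates give the upper end.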
Who are the Biggest Contributors to Ethereum's History Growth?
The amount of history generated by different contract categories reveals how Ethereum's usage patterns have evolved over time. Figure 3 shows the relative contributions of various contract categories. This is the same data as in Figure 2, but normalized.
Figure 3: Contributions of Different Contract Categories to History Growth
These data reveal four distinct periods in Ethereum's usage patterns:
- Early Days (Purple): In the initial years of Ethereum, there was almost no on-chain activity. Most of these early contracts are now difficult to identify and are marked as "unknown" in the chart.
- ERC-20 Era (Green): The ERC-20 standard was finalized at the end of 2015 but did not gain significant traction until 2017 and 2018. ERC-20 contracts became the largest source of historical growth in 2019.
- DEX/DeFi Era (Brown): DEX and DeFi contracts appeared on-chain as early as 2016 and began to gain attention in 2017. However, it wasn't until the DeFi summer of 2020 that they became the largest category of historical growth. DeFi and DEX contracts accounted for over 50% of historical growth at times in 2021 and 2022.
- Rollup Era (Gray): In early 2023, L2 Rollups began executing more transactions than the mainnet. In the months leading up to Dencun, they generated about 2/3 of Ethereum's historical records.
Each era represents a more complex usage pattern of Ethereum than the previous one. Over time, complexity can be seen as a form of Ethereum's scaling that cannot be measured by simple metrics like transactions per second.
In the most recent data month (April 2024), Rollups no longer produce most of the historical records. It remains unclear whether future historical records will stem from DEX and DeFi or if new usage patterns will emerge.
What About Blobs?
The Dencun hard fork introduced blobs, significantly altering the dynamics of history growth, allowing Rollups to use cheap blobs instead of historical records to publish data. Figure 4 zooms in on the history growth rates before and after the Dencun upgrade. This chart is similar to Figure 2, except each vertical line represents a day instead of a month.
Figure 4: Impact of Dencun on History Growth
From this chart, we can draw several key conclusions:
- Since Dencun, Rollups' history growth has decreased by about 2/3: Most Rollups have transitioned from call data to blobs, significantly reducing the amount of historical records they generate. However, as of April 2024, some Rollups have yet to make this transition.
- Since Dencun, total history growth has decreased by about 1/3: Dencun only reduced the history growth of Rollups. Other contract categories have seen a slight increase in history growth. Even after Dencun, history growth remains 8 times that of state growth (details in the next section).
Although blobs have reduced the speed of history growth, they are still a new feature of Ethereum. It remains unclear at what level history growth will stabilize in the presence of blobs.
How Much History Growth is Acceptable?
Raising the gas limit will increase the history growth rate. Therefore, proposals to raise the gas limit (such as Pump the Gas) must consider the relationship between history growth and the hardware bottlenecks of each node.
To determine an acceptable history growth rate, it is essential to understand how long current node hardware can sustain the network and storage load. Network hardware may be able to maintain the status quo indefinitely, as the history growth rate is unlikely to return to its pre-Dencun peak unless the gas limit is raised. However, the storage burden of history will continue to grow over time. Under the current storage strategy, historical records will inevitably fill each node's storage drive.
Figure 5 shows the changing storage burden on Ethereum nodes over time and predicts the growth of storage burden over the next 3 years. The forecast references the growth rate as of April 2024. This growth rate may rise or fall with future usage patterns or gas limit changes.
Figure 5: Size of Historical Records, State, and Full Node Storage Burden
From this chart, we can draw several key conclusions:
- Historical records occupy about 3 times the storage space of state. This difference will increase over time, as the history growth rate is about 8 times that of state.
- 1.8 TiB is a critical threshold, at which many nodes will be forced to upgrade their storage drives. 2 TB is a common storage drive size, providing only 1.8 TiB of usable space. Note that TB (terabytes, = 1000^4 bytes) and TiB (tebibytes, = 1024^4 bytes) are different units. For many node operators, the "real" critical threshold is even lower, as post-Merge validators must run a consensus client alongside the execution client.
- The critical threshold will be reached within 2 to 3 years. Any increase in the gas limit will bring this date correspondingly closer. Reaching this threshold will impose a significant maintenance burden on node operators and require the purchase of additional hardware (e.g., a $300 NVMe drive).
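The unit conversion and the timeline above can be checked with a rough back-of-the-envelope sketch. It assumes the 1.2 TiB full-node storage burden quoted later in this article, constant growth at the April 2024 rates (19.3 GiB/month of history plus 2.5 GiB/month of state), and no space reserved for a consensus client. The `growth_multiplier` parameter is a hypothetical knob for illustration only; it assumes growth scales linearly with the gas limit, which may not hold in practice.

```python
TB = 1000 ** 4   # bytes in one terabyte
TIB = 1024 ** 4  # bytes in one tebibyte


def tb_to_tib(terabytes: float) -> float:
    """Convert decimal terabytes to binary tebibytes."""
    return terabytes * TB / TIB


def months_until_full(current_gib: float, capacity_gib: float,
                      growth_gib_per_month: float,
                      growth_multiplier: float = 1.0) -> float:
    """Months until the drive fills, assuming a constant growth rate.

    growth_multiplier is a hypothetical scaling factor, e.g. 2.0 for a
    doubled gas limit under the assumption that growth scales linearly.
    """
    return (capacity_gib - current_gib) / (growth_gib_per_month * growth_multiplier)


capacity_tib = tb_to_tib(2.0)    # usable space on a 2 TB drive, in TiB
current_gib = 1.2 * 1024         # full-node burden quoted in this article
growth_gib_per_month = 19.3 + 2.5  # history + state, April 2024 rates

months = months_until_full(current_gib, capacity_tib * 1024, growth_gib_per_month)
years = months / 12

print(f"2 TB = {capacity_tib:.2f} TiB, drive full in about {years:.1f} years")
```

Under these assumptions the estimate lands between two and three years, consistent with the range given above. Passing `growth_multiplier=2.0` roughly halves the remaining time.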
Unlike state data, historical data is append-only and accessed much less frequently. Therefore, in theory, historical data can be stored separately from state data on cheaper storage media. Some clients, such as Geth, already support this.
In addition to storage capacity, network I/O is another major limitation of history growth. Unlike storage capacity, network I/O constraints will not pose immediate problems for nodes, but these constraints will become crucial for future increases in gas limits.
To understand how much history growth a typical Ethereum node's network capacity can support, it is necessary to know the relationship between history growth and various network health metrics, such as reorg rate, missed slots, finality misses, attestation misses, sync committee misses, and block submission latency. Analyzing these metrics is beyond the scope of this article, but more information can be found in previous investigations of consensus layer health. Additionally, the Ethereum Foundation's Xatu project has been building public datasets to accelerate such analyses.
How to Solve the History Growth Problem?
History growth is a problem that is easier to solve than state growth. It can almost entirely be addressed by the candidate proposal EIP-4444. This EIP changes each node's requirement from storing the entire Ethereum historical data to only storing one year of historical data. After implementing EIP-4444, data storage will no longer be a bottleneck for Ethereum scalability, and increasing gas limits will not be constrained in the long term. EIP-4444 is necessary for the long-term sustainability of the network; otherwise, the speed of history growth will quickly necessitate regular hardware upgrades for network nodes.
Figure 6 shows the impact of EIP-4444 on each node's storage burden over the next 3 years. This is similar to Figure 5 but adds lighter lines indicating the storage burden after the implementation of EIP-4444.
Figure 6: Impact of EIP-4444 on Ethereum Node Storage Burden
From this chart, we can draw several key conclusions:
- EIP-4444 will halve the current storage burden. The storage burden will decrease from 1.2 TiB to 633 GiB.
- EIP-4444 will stabilize the historical storage burden. Assuming a constant history growth rate, historical data will be discarded at the rate it is generated.
- After EIP-4444, node storage burdens will take many years to reach today's levels. This is because state growth will be the only factor increasing the storage burden, and the growth rate of state is slower than that of history growth.
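The stabilizing effect described in the second point can be illustrated with a small simulation of a rolling retention window. This is a simplified sketch, not a model of any client: it assumes a constant 19.3 GiB/month growth rate, a 12-month window, and a node that starts with no stored history (a real node would start with the previous year's records already on disk).

```python
from collections import deque


def simulate_retained_history(monthly_growth_gib, window_months=12):
    """Return the retained history size (GiB) after each month.

    Only the most recent `window_months` of history are kept, which
    mimics a one-year expiry window in a simplified way.
    """
    window = deque(maxlen=window_months)  # drops the oldest month once full
    retained = []
    for growth in monthly_growth_gib:
        window.append(growth)
        retained.append(sum(window))
    return retained


# Three years of constant growth at the April 2024 rate quoted in the text.
sizes = simulate_retained_history([19.3] * 36)

print(f"after 1 year: {sizes[11]:.1f} GiB, after 3 years: {sizes[35]:.1f} GiB")
```

Once the window is full, each new month of history displaces an equally sized old month, so the retained size stops growing. If the growth rate changes, the retained size moves to a new plateau within one window length.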
After the implementation of EIP-4444, history growth will still impose some degree of storage burden, as nodes will store one year of historical records. However, even if Ethereum reaches global scale, this burden is manageable. Once the method of storing historical records proves reliable, the one-year expiration time of EIP-4444 may be shortened to months, weeks, or even less.
How to Preserve Ethereum's Historical Records?
EIP-4444 raises a question: If historical records are not preserved by Ethereum nodes themselves, how should they be preserved? Historical records play a central role in Ethereum's validation, accounting, and analysis, making their preservation crucial. Fortunately, preserving historical records is a straightforward problem with a 1-of-n trust assumption: a single honest data provider is enough. This contrasts sharply with the state consensus problem, which requires 1/3 to 2/3 of participants to be honest. Node operators can verify the authenticity of a historical dataset by 1) replaying all transactions since the genesis block and 2) checking whether these transactions reproduce the same state root as the current tip of the chain.
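The replay-and-compare check can be sketched with a toy model. Everything here is a deliberate simplification for illustration: the state is a plain balance table, a transaction is a `(sender, recipient, amount)` tuple, and the "state root" is a SHA-256 hash of the serialized state rather than Ethereum's actual Merkle commitment. The point is only that a dataset from an untrusted provider can be checked against a root the verifier already trusts.

```python
import hashlib
import json


def state_root(state: dict) -> str:
    """Toy commitment: SHA-256 over canonical JSON (stand-in for a real state root)."""
    encoded = json.dumps(state, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def apply_transaction(state: dict, tx: tuple) -> None:
    """Apply a toy transfer (sender, recipient, amount) to the state in place."""
    sender, recipient, amount = tx
    if state.get(sender, 0) < amount:
        raise ValueError("insufficient balance")
    state[sender] -= amount
    state[recipient] = state.get(recipient, 0) + amount


def verify_history(genesis_state: dict, history: list, trusted_root: str) -> bool:
    """Replay history from genesis and compare the result to a trusted root."""
    state = dict(genesis_state)
    try:
        for tx in history:
            apply_transaction(state, tx)
    except ValueError:
        return False  # an invalid transaction means the dataset is not authentic
    return state_root(state) == trusted_root


genesis = {"alice": 100, "bob": 0}
honest_history = [("alice", "bob", 30), ("bob", "alice", 5)]
tampered_history = [("alice", "bob", 30), ("bob", "alice", 6)]

# The verifier is assumed to already know the root at the tip of the chain.
trusted_root = state_root({"alice": 75, "bob": 25})

print(verify_history(genesis, honest_history, trusted_root))    # True
print(verify_history(genesis, tampered_history, trusted_root))  # False
```

Because any altered dataset fails this check, it does not matter how many providers are dishonest as long as one of them serves the correct data.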
There are many ways to preserve historical records.
- Torrents/P2P: Torrents are the simplest and most reliable method. Ethereum nodes can periodically package portions of historical records and share them as public torrent files. For example, a node might create a new historical torrent file every 100,000 blocks. Node clients like Erigon have already implemented this process to some extent in a non-standardized way. To standardize this process, all node clients must use the same data format, parameters, and P2P network. Nodes will be able to choose whether to participate in this network based on their storage and bandwidth capabilities. The advantage of torrents is that they utilize a high-lindy open standard that has already received substantial support from data tools.
- Portal Network: The Portal Network is a new network designed specifically for hosting Ethereum data. It is a torrent-like approach that also provides additional features to make data verification easier. The advantage of the Portal Network is that these additional verification layers provide utility for light clients to effectively verify and query shared datasets.
- Cloud Hosting: Cloud storage services like AWS's S3 or Cloudflare's R2 offer a cheap and high-performance option for preserving historical records. However, this approach brings more legal risks and business operational risks, as there is no guarantee that these cloud services will always be willing and able to host cryptocurrency data.
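For the torrent approach, the example of one archive every 100,000 blocks amounts to partitioning the chain into fixed-size block ranges. A minimal sketch of that partitioning follows; the function name and the choice to publish only complete ranges are our assumptions, not part of any client or standard.

```python
def archive_ranges(latest_block: int, blocks_per_archive: int = 100_000):
    """Return inclusive (start, end) block ranges for every complete archive.

    Blocks after the last complete range are left out until enough
    new blocks exist to fill another archive.
    """
    ranges = []
    start = 0
    while start + blocks_per_archive - 1 <= latest_block:
        ranges.append((start, start + blocks_per_archive - 1))
        start += blocks_per_archive
    return ranges


ranges = archive_ranges(250_000)
print(ranges)  # [(0, 99999), (100000, 199999)]
```

Fixed boundaries mean that every node derives the same set of archives independently, which is one reason a shared set of parameters matters for standardization.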
The remaining implementation challenges are more social than technical. The Ethereum community needs to coordinate on specific implementation details so that they can be integrated directly into each node client. In particular, performing a full sync (rather than a snap sync) from the genesis block will require retrieving historical records from history providers rather than from Ethereum nodes. These changes do not technically require a hard fork, so they can be implemented even before Ethereum's next hard fork, Pectra.
All these historical preservation methods can also be used by L2s to preserve the blob data they publish to the mainnet. Compared to historical preservation, blob preservation is 1) more challenging due to the significantly larger total data volume; and 2) less critical, as blobs are not necessary for replaying mainnet history. However, blob preservation is still necessary for each L2 to replay its own history. Therefore, some form of blob preservation is important for the entire Ethereum ecosystem. Moreover, if L2s develop robust blob storage infrastructure, they may also be able to easily store L1 historical data.
It would be helpful to directly compare the datasets stored by various node configurations before and after EIP-4444. Figure 7 shows the storage burdens of different types of Ethereum nodes. State data refers to accounts and contracts, historical data refers to blocks and transactions, and archival data is an optional set of data indices. The byte counts in this table are based on recent reth snapshots, but the numbers for other node clients should be roughly comparable.
Figure 7: Storage Burden of Different Types of Ethereum Nodes
In other words,
- Archival nodes store state data, historical data, and archival data. Archival nodes can be used when someone wants to easily query the historical state of the chain.
- Full nodes store only historical data and state data. Most nodes today are full nodes. The storage burden of full nodes is about half that of archival nodes.
- Full nodes after EIP-4444 will store only state data and the historical data of the past year. This will reduce the storage burden of nodes from 1.2 TiB to 633 GiB and stabilize the storage space for historical data.
- Stateless nodes, also known as "light nodes," do not store any of these datasets and can begin validating at the tip of the chain immediately. This type of node becomes possible once Verkle tries or other state commitment schemes are added to Ethereum.
Finally, there are some additional EIPs that can limit the history growth rate itself, rather than just adapting to the current growth rate. This helps the network stay within network I/O constraints in the short term and within storage constraints in the long term. While EIP-4444 is necessary for the long-term sustainability of the network, these other EIPs will help Ethereum scale more effectively in the future:
- EIP-7623: Repricing call data so that certain call-data-heavy transactions become more expensive. Making these usage patterns more costly will push some of them to move from call data to blobs, which will reduce the history growth rate.
- EIP-4488: Imposing limits on the total amount of call data that can be included in each block. This will impose stricter limits on the speed of historical growth.
These EIPs are easier to implement than EIP-4444, so they may serve as short-term expedients before EIP-4444 is put into production.
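To make the repricing idea behind EIP-7623 concrete, here is a rough sketch of a floor-price rule for call data. The constants and the formula reflect our understanding of the published EIP text and should be checked against the EIP itself; contract-creation costs are omitted for brevity, and the function names are ours.

```python
# Constants as we understand them from the EIP text (assumption: verify against the EIP).
TX_BASE_COST = 21_000
STANDARD_TOKEN_COST = 4          # gas per call data token under standard pricing
TOTAL_COST_FLOOR_PER_TOKEN = 10  # gas per call data token under the floor price


def calldata_tokens(calldata: bytes) -> int:
    """Count tokens: a zero byte is 1 token, a nonzero byte is 4 tokens."""
    zero_bytes = calldata.count(0)
    nonzero_bytes = len(calldata) - zero_bytes
    return zero_bytes + nonzero_bytes * 4


def gas_used(calldata: bytes, execution_gas: int) -> int:
    """Charge the larger of the standard cost and the call data floor."""
    tokens = calldata_tokens(calldata)
    standard = STANDARD_TOKEN_COST * tokens + execution_gas
    floor = TOTAL_COST_FLOOR_PER_TOKEN * tokens
    return TX_BASE_COST + max(standard, floor)


# A data-heavy transaction: 1000 nonzero bytes and no execution.
data_heavy = gas_used(b"\x01" * 1000, execution_gas=0)
# An execution-heavy transaction: 100 nonzero bytes and 50,000 gas of execution.
execution_heavy = gas_used(b"\x01" * 100, execution_gas=50_000)

print(data_heavy, execution_heavy)  # 61000 72600
```

In this sketch the data-heavy transaction pays the floor price, while the execution-heavy transaction is unaffected because its standard cost already exceeds the floor. That asymmetry is what makes call data less attractive for pure data publication.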
Conclusion
The purpose of this article is to use data to understand 1) how history growth works and 2) how to address it. Much of the data in this article is difficult to obtain through traditional means, so we hope that making it public will provide new insights into the history growth problem.
History growth, as a bottleneck for Ethereum scalability, has not received enough attention. Even without increasing the gas limit, Ethereum's current practice of preserving historical records will force many nodes to upgrade their hardware within a few years. Fortunately, this is not a difficult problem to solve. A clear solution is already present in EIP-4444. We believe that the implementation of this EIP should be expedited to allow room for future increases in gas limits.