Solana Co-founder: What are the solutions for Solana's state growth?
Author: toly, Co-founder of Solana
Compiled by: Felix, PANews
Every day, about 1 million new accounts are added to Solana. The total state has now exceeded 500 million accounts, with a snapshot size of about 70 GB. With hardware improvements, these numbers are manageable on their own, but the goal of the SVM runtime is to provide the cheapest possible access to hardware. To achieve this, state and memory must be managed within current hardware limitations.
PCI Bandwidth
As of 2024, the latest PCI bandwidth can reach a throughput of 0.5 Tb/s to 1 Tb/s, or 64 GB to 128 GB per second. While this sounds large, if a tx reads/writes 128 MB, then 128 GB/s of PCI bandwidth would limit the chain to around 1,000 TPS. In reality, most txs access memory that was recently loaded and is already cached in RAM. An ideal design should allow 1,000 txs loading 128 MB of new state per block, plus 10k or more txs reading and writing already-cached state.
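The worst-case limit above is simple arithmetic; a quick sketch (the 128 MB-per-tx figure is the worst case from the text, and GiB/MiB rounding is used for convenience):

```python
# Worst case: every tx loads 128 MB of *new* state over PCI.
PCI_BANDWIDTH = 128 * 2**30   # 128 GiB/s, upper end of 2024 hardware
STATE_PER_TX = 128 * 2**20    # 128 MiB of fresh state per tx

tps_limit = PCI_BANDWIDTH // STATE_PER_TX
print(tps_limit)  # -> 1024, i.e. roughly 1000 TPS
```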
Account Indexing
Creating a new account requires proof that the account does not already exist. This check is typically done automatically by each validator, since every validator keeps a complete index of all valid accounts. Even if account data is not stored locally and only the hash of the data is kept, 500 million accounts at 32 bytes for the key plus 32 bytes for the data hash, or 64 bytes per entry, total 32 GB. That alone is enough to force a separation between RAM and disk.
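The 32 GB figure follows directly from the per-entry layout:

```python
ACCOUNTS = 500_000_000
KEY_BYTES = 32    # account pubkey
HASH_BYTES = 32   # hash of the account data

entry_bytes = KEY_BYTES + HASH_BYTES  # 64 bytes per index entry
total_bytes = ACCOUNTS * entry_bytes
print(total_bytes / 1e9)  # -> 32.0 (GB)
```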
Snapshot Size
At certain snapshot sizes, if part of the network suffers hardware failures, the time required to cold-start a new system can significantly extend the worst-case restart time. Bandwidth and hardware improve daily, and Solana is not close to this limit, but at any given point in time the limit exists.
Summary
Memory and disk have different performance characteristics and limits. If the SVM does not distinguish between them, transactions and limits must be priced for the worst case, which in turn limits performance. During transaction execution, at a minimum all account keys must be available; the total number of accounts affects RAM and disk PCI bandwidth utilization; and snapshots cannot grow arbitrarily. The ideal solution is to:
Allow more txs that do not require PCI resources to be packed into blocks.
Manage the total index size and snapshot size.
Chilly, Avocado, LSR. Bad names are often a hallmark of excellent software design. The engineers of Anza and Firedancer came up with the following solutions.
Chilly
The runtime's account cache is deterministically managed by all instances. At a high level, it is an LRU cache over state access. During block building and scheduling, the implementation can cheaply check an account's status without locking or iterating over the LRU cache, because the cache is built on a very simple counter mechanism.
Total loaded bytes are tracked as Bank::loaded_bytes:u64.
Each account is marked with the current running total account::load_counter:u64 when in use.
When loading an account, if Bank::loaded_bytes - Account::load_counter > CACHE_SIZE, the account is considered cold, and its size is charged against the per-block LOAD_LIMIT.
A new account's load_counter is 0, so all new accounts are cold accounts.
The leader's scheduler treats LOAD_LIMIT as a watermark, similar to write lock CU limits.
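The counter mechanism above can be sketched in a few lines. This is illustrative, not Anza's or Firedancer's actual implementation; the CACHE_SIZE and LOAD_LIMIT values are made-up budgets:

```python
CACHE_SIZE = 1 * 2**30    # assumed cache budget: 1 GiB
LOAD_LIMIT = 128 * 2**20  # assumed cold-load budget per block: 128 MiB

class Bank:
    def __init__(self):
        self.loaded_bytes = 0  # Bank::loaded_bytes, running total

class Account:
    def __init__(self, size):
        self.size = size
        self.load_counter = 0  # 0 => never loaded => cold

def try_load(bank, account, cold_bytes_in_block):
    """Return the cold bytes this load charges against LOAD_LIMIT,
    or None if the block's cold-load budget would be exceeded."""
    is_cold = bank.loaded_bytes - account.load_counter > CACHE_SIZE
    cost = account.size if is_cold else 0
    if cold_bytes_in_block + cost > LOAD_LIMIT:
        return None
    bank.loaded_bytes += account.size
    account.load_counter = bank.loaded_bytes  # stamp with the running total
    return cost
```

Note that no lock or LRU traversal is needed: a single subtraction against the running total decides hot vs. cold.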
The brilliance of this design lies in its natural fit with the current scheduler. Users only need to worry about their priority fees. The scheduler handles packing txs under both the LOAD_LIMIT and the account write-lock limits. The highest-priority txs are loaded first and consume the LOAD_LIMIT; once it is reached, all other txs can still be packed into the block. Validators can thus maximize the cache locality of txs.
Avocado
Avocado consists of two parts: state compression and index compression. First, account data is replaced with hashes, and then the account index is migrated to a Binary Trie / Patricia Trie. New accounts must provide proof that they do not exist in the "trie."
State Compression
The rough design is as follows:
During allocation, each account binds X lamports per byte.
If X < current economic floor price, the account remains in memory and will be compressed.
Compression is a multi-step process that runs over an epoch.
Account data is replaced with hash(data).
Account keys remain in state.
Transactions referencing compressed accounts fail.
Decompression requires re-uploading the data, similar to the loader.
The cost of decompression should be the same as allocating a new account.
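The compress/decompress round trip above can be sketched with a plain hash. SHA-256 is used here only as a stand-in; the text does not specify which hash the runtime would use, and the dict-based account is purely illustrative:

```python
import hashlib

def compress(account):
    """Replace account data with its hash; the key stays in state."""
    account["data_hash"] = hashlib.sha256(account["data"]).digest()
    account["data"] = None  # data is dropped from the snapshot

def decompress(account, data):
    """Re-upload the data, loader-style; it must match the stored hash."""
    if hashlib.sha256(data).digest() != account["data_hash"]:
        raise ValueError("uploaded data does not match stored hash")
    account["data"] = data
```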
It is estimated that 75% of accounts have not been accessed for over 6 months and are unlikely to ever be accessed. Compressing them can save 50% of the snapshot size.
Index Compression
This is a more challenging problem to solve. Even with state compression, validators still possess all possible valid accounts in the system. Creating new accounts requires checking this database. The cost for validators to store this database is high, while the cost for users to create new accounts is low. It must be ensured that new private keys do not conflict with existing accounts.
Binary Trie Mining
The Binary Trie is tracked as part of the snapshot.
Validators wishing to earn extra SOL can create a transaction to remove compressed account kv pairs from the state and add them to the Binary Trie.
Users remove their kv pair from the Trie during decompression, effectively reversing this action (this may need to be atomic with decompression so that a background service can compress accounts concurrently).
For validators, the Trie root is constant-size regardless of how many kv pairs the Trie contains.
Using ZKP, each tx can compress about 30 accounts.
Assuming one such transaction per block, compressing 500 million accounts would take about 80 days.
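The 80-day estimate checks out under the stated assumptions, taking a ~400 ms slot time (the slot duration is an assumption on my part, not stated in the text):

```python
ACCOUNTS = 500_000_000
PER_TX = 30          # accounts compressed per ZK-proof transaction
SLOT_SECONDS = 0.4   # assumed ~400 ms per block, one compression tx each

txs_needed = ACCOUNTS / PER_TX
days = txs_needed * SLOT_SECONDS / 86_400
print(round(days))  # -> 77, i.e. roughly 80 days
```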
The key point of this process is that validators executing this will be rewarded, but not all validators must perform this. If all validators had to perform this, then all would need to maintain the contents of the current Binary Trie, meaning the entire state must be part of the snapshot. Validators wanting to maintain the entire state should submit a transaction to compress N accounts in the index into the Trie.
New Account Proof
To create a new account, users must prove that the account does not exist in the Trie. Validators maintaining the entire state can generate proof that the account is not in the Trie. This places a burden on users, who must always connect to a large state provider to generate these proofs.
Alternatively, users can prove that their account was created with the most recent PoH hash. The simplest way to support this is:
Generate a new PKI.
The account address is hash(recent PoH hash, PKI::public_key).
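A sketch of that derivation. The actual hash function and byte layout are not specified in the text, so SHA-256 over a simple concatenation is only illustrative:

```python
import hashlib

def derive_address(recent_poh_hash: bytes, public_key: bytes) -> bytes:
    """address = hash(recent PoH hash, PKI::public_key)"""
    return hashlib.sha256(recent_poh_hash + public_key).digest()

addr = derive_address(b"\x01" * 32, b"\x02" * 32)
```

Because the address commits to a recent PoH hash, it demonstrably could not have existed before that hash was produced.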
Since accounts must undergo state compression before entering the Trie, and that takes a full epoch, no account in the Trie can have an address derived from a recent PoH hash. Another way to support this: PKI creation itself can provide a proof that the private key was derived as hash(user hidden secret, recent PoH hash).
LSR
Lightweight Simple Rent, also known as Less Stupid Rent. How should allocating new accounts be priced so that old, abandoned accounts are eventually compressed, reducing both the overall load on the system and the price for new users?
The rent system needs to be brought back. Rent is the fee an account pays for occupying state, at some rate of X dollars per byte per day, just as accounts pay AWS for storage.
Rent Rate Bonding Curve
RentRate = K*(state_size)^N
When the current state is small, the rate should be low; as it approaches the snapshot limit, the rate should become very high.
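A sketch of the curve; the constants K and N are placeholders, since the text does not fix their values:

```python
def rent_rate(state_size: float, k: float = 1e-12, n: float = 2.0) -> float:
    """RentRate = K * (state_size)^N: cheap when state is small,
    steep as the snapshot limit is approached."""
    return k * state_size ** n
```

With any N > 1 the rate grows superlinearly, so the marginal cost of each new byte rises as total state grows.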
Allocation Minimum Bonding Price
Accounts must exist for at least one epoch. Allocation brings the account into a Hot state, and hot accounts should remain resident for the caching period.
New Account bond = Epoch Slots * RentRate * Account::size
A new account must hold at least this many lamports in its balance to be created.
Hot Account Burn
lru_turnover_rate = the average time an account spends in the LRU cache, capped at 1 epoch. This value can be a constant, or it can be computed off-chain and reported to the SVM as a stake-weighted median constant.
Compression
When (current_slot - account::creation_slot) * RentRate * account::size > account::lamports, the account is compressed and all its lamports are burned.
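The eviction rule is a single inequality; a minimal sketch:

```python
def should_compress(current_slot, creation_slot, rent_rate, size, lamports):
    """Compress the account (and burn its remaining lamports) once
    accrued rent exceeds the balance."""
    accrued = (current_slot - creation_slot) * rent_rate * size
    return accrued > lamports
```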
These solutions should make state very cheap: over time, unused accounts drain to 0 lamports and are compressed. Data overhead falls, index overhead falls, and the current state shrinks. Reducing the size of the state in turn lowers the super-quadratic cost of allocation.