Analysis of Solana's Scalability Mechanism: An Extreme Attempt to Sacrifice Usability for High Efficiency | CatcherVC Research
Author: SA, CatcherVC
Technical Advisor: Liu Yang, Author of "Embedded System Security"
Abstract
- Solana's scalability is mainly based on: efficient use of network bandwidth, reducing communication between nodes, and accelerating node computation speed. These measures directly shorten the block production and consensus communication time, but also reduce system availability (security).
- Solana publicly discloses the list of block producers (Leaders), revealing a single trusted data source, which reduces consensus communication overhead. However, this also brings security risks such as bribery and targeted attacks.
- Solana treats consensus communication (voting messages) as a type of transaction event; over 70% of its TPS consists of consensus messages, leaving roughly 500 to 1000 TPS for actual user transactions;
- Solana's Gulf Stream mechanism eliminates the global transaction pool, which improves transaction processing speed but reduces the efficiency of filtering out spam transactions, making Leaders prone to downtime.
- Solana's Leader nodes publish transaction sequences rather than actual blocks. Combined with the Turbine transmission protocol, transaction sequences can be fragmented and distributed to different nodes, achieving extremely fast data synchronization.
- POH (Proof Of History) is essentially a method of timing and counting, which assigns serial numbers to different transaction events, generating transaction sequences. The Leader effectively publishes a globally consistent timer (clock) within the transaction sequence. During a very short window, the ledger advancement and time progression of different nodes are consistent;
- Solana has 132 nodes occupying 67% of the staking share, with 25 of these nodes holding 33% of the staking share, essentially forming an "oligarchy" or "senate." If these 25 nodes conspire, they could cause chaos in the network;
- Solana has high hardware requirements for nodes, achieving vertical scalability at the cost of device expenses. Most individuals running Solana nodes are whales or institutions, which is not conducive to true decentralization.
- In summary, Solana pushes Layer 1 scalability to extremes with high-end node devices, disruptive consensus mechanisms, and data transmission protocols, essentially reaching the TPS bottleneck that a non-sharded public chain can maintain. However, multiple downtimes have already indicated the outcome of sacrificing availability/security for efficiency.
Introduction
2021 was a turning point for blockchain and crypto. With concepts like Web3 becoming mainstream, the public chain sector experienced the strongest traffic growth in history. In this external environment, Ethereum, with its sufficient decentralization and security, became the North Star of the Web3 world, but its efficiency issues became its "Achilles' heel." Compared to VISA, which easily exceeds a thousand TPS, Ethereum's mere tens of transactions per second seem like a baby in a cradle, far from its grand vision of being a "world-class decentralized application platform."
In response, new public chains like Solana, Avalanche, Fantom, and Near, which focus on scalability, became major players in the Web3 narrative, attracting massive capital. Taking Solana as an example, this so-called "Ethereum killer" saw its market cap soar 170 times in 2021, reaching its peak and even briefly surpassing established public chains like Polkadot and Cardano, showing signs of competing with Ethereum.
However, this aggressive momentum did not last long. On September 14, 2021, Solana experienced its first downtime due to performance issues, lasting 17 hours, causing the price of the SOL token to drop rapidly by 15%; in January 2022, Solana experienced another downtime lasting 30 hours, sparking widespread discussion; in May, Solana faced two downtimes, and another in early June. According to Solana's official statements, its mainnet has experienced at least eight performance degradations or downtimes.
(Comment by Liu Feng, co-founder of Chain News, on Solana)
With the emergence of various issues, critics led by Ethereum supporters have repeatedly questioned Solana, with some even dubbing it "SQLana" (SQL is a system for managing centralized databases), leading to a plethora of comments and analyses. To this day, discussions about Solana's real usability seem never to cease, attracting countless curious observers. Driven by interest and attention to mainstream public chains, CatcherVC will provide a brief interpretation of Solana's scalability mechanism and some reasons for its downtimes from its own perspective in this article.
Solana System Architecture, Consensus Mechanism, and Block Transmission Process
The efficiency of a public chain mainly refers to its ability to process transactions, which is the throughput TPS (transactions per second). This metric is influenced by block production speed and block capacity, and it also affects transaction fees and user activity. From the much-discussed EOS in 2018 to the recently launched Optimism, all scalability solutions almost invariably revolve around the most critical element of "accelerating block production."
To enhance block production speed, it is often necessary to "manipulate" the block production process, and Solana is no exception. Its scalability approach mainly focuses on efficient use of network bandwidth, reducing the number of communications between nodes, and increasing the speed of node transaction processing. These measures directly shorten the time for block production and consensus communication. Solana's founder Anatoly Yakovenko and his team have meticulously crafted every detail, sacrificing system usability (security) to improve efficiency, essentially reaching the practical TPS limit of a non-sharded public chain, ultimately resulting in an "expensive" innovation.
Compared to other public chains that use POS, Solana's biggest innovation lies in its unique consensus protocol and network node communication method. This consensus protocol is based on POS and PBFT (Practical Byzantine Fault Tolerance), introducing the original POH (Proof of History) as a mechanism for advancing the blockchain ledger, creating its own unique consensus system.
From a purely formal perspective, Solana's consensus protocol is similar to Cardano's earliest Ouroboros algorithm, both containing two major time units: Epoch and Slot. Each Slot lasts about 0.4 to 0.8 seconds, equivalent to the time interval of one block. Each Epoch cycle contains 432,000 Slots (blocks), lasting 2 to 4 days.
In Solana's system architecture, the most important roles are divided into two categories: Leader (block producer) and Validator (verifier). Both are full nodes that have staked SOL tokens, but in different Slots (block production cycles), different full nodes will act as Leaders, while those not elected as Leaders will become Validators.
At the beginning of each new Epoch cycle, the Solana network randomly selects a block producer Leader rotation list based on the staking weight of each node, "designating" the block producers for different times in the future. Throughout the entire Epoch (2 to 4 days), block producers will rotate according to the specified order in the list, changing the Leader node every 4 Slots (block production cycles).
(A portion of the block producer rotation list for Solana's 313th Epoch)
By publicly disclosing the future block producer nodes in advance, the Solana network effectively gains a reliable new block data source, greatly facilitating the consensus process.
Overview of Solana's Block Production Process
To better understand Solana's scalability mechanism, let's start with the logic of block production and analyze Solana's general structure:
1. After a user initiates a transaction, it is either forwarded directly to the Leader node by the client, or first received by a regular node and then immediately relayed to the Leader;
2. The block producer Leader receives all pending transactions in the network, executing them while sorting the transaction instructions into a transaction sequence (similar to a block). After a certain period, the Leader sends the sorted transaction sequence to the Validator nodes for verification;
3. Validators execute transactions in the order specified by the transaction sequence (block), generating corresponding state information (executing transactions changes the state of nodes, such as altering the balances of certain accounts);
4. After sending N transaction sequences, the Leader periodically publishes its local state, and Validators compare it with their own state, casting affirmative or negative votes. This step is similar to the "checkpoint" in Ethereum 2.0 or other POS public chains;
5. If, within the specified time, the Leader collects affirmative votes from nodes accounting for 2/3 of the staking weight in the network, the previously published transaction sequence and state are finalized, and the "checkpoint" passes, equivalent to the block achieving final confirmation;
6. Generally, the Validator nodes that cast affirmative votes will have executed the same transactions as the block producer Leader, so their data is synchronized;
7. Every 4 Slot cycles, the Leader switches, meaning each Leader holds about 1.6 to 3.2 seconds of "supreme authority" over the network.
Detailed Analysis of Solana's Scalability Mechanism
On the surface, Solana's block production logic is generally consistent with other public chains that use POS mechanisms, involving a process of publishing blocks and voting on them. However, if we observe each step closely, it is not difficult to find that there are significant differences between Solana and other public chains, which is the root of its high TPS and low availability:
1. The most important point: Solana publicly discloses the Leader for each Slot cycle in advance, significantly reducing the workload of the consensus process. In other POS public chains, due to the lack of a single, trusted block producer, the consensus communication efficiency of the network is extremely low, resulting in time complexity often several orders of magnitude higher than Solana, which becomes a bottleneck for most public chains in terms of TPS.
Taking mainstream POS consensus protocols or PBFT algorithms as examples, most of these algorithms adopt the same time units and role divisions as Solana, with similar settings for Epochs, Slots, Leaders, Validators, and Votes, differing only in parameter settings and terminology. The biggest difference is that these algorithms often prioritize security (availability) and do not disclose the Leader list in advance.
(For example, Cardano also generates a Leader rotation list in advance, but it is not publicly disclosed. Each selected Leader only knows when to produce a block but does not know who the block producers are at other times. This makes the block producer unpredictable to the outside world.)
Without a public block producer, nodes will "distrust each other" and "act independently." At this point, if a node claims to be a legitimate block producer, others may not trust it and must require it to provide relevant Proof. However, the generation, propagation, and verification of such Proof will waste bandwidth resources and create additional workloads (even potentially involving ZK zero-knowledge proofs). Solana's public disclosure of each Leader for each period avoids such troubles.
More importantly, in the vast majority of POS consensus protocols or PBFT-type algorithms, voting on new blocks (a block must receive affirmative votes from 2/3 of the nodes in the network to be finalized) is often conducted by each node through a "gossip protocol," sending or collecting votes in a one-on-one manner, somewhat akin to viral random diffusion, essentially equivalent to requiring communication between every two nodes, with complexity and time consumption far exceeding Solana's consensus protocol.
In PBFT algorithms like Tendermint, a single Validator node must collect a single vote from at least 2/3 of the nodes in the network. If the total number of nodes in the network is N, then each node must receive at least 2/3×N votes, resulting in a total communication count of at least 2/3×N² across the entire network, which is clearly too large (proportional to the square of N). If the number of nodes is high, the time taken for the consensus process will often increase sharply.
(In a typical public chain, each node's voting propagation method is similar; each node will perform this process once, and each block production will require N such propagations.)
(For related popular science, see: "Avalanche DEX Developers Explain the Avalanche Consensus Mechanism")
In response, Solana and Avalanche have improved the communication process for collecting votes in different ways, reducing time complexity. Simply put, the Leader aggregates all votes cast by Validators, then packages these votes together (writing them into the transaction sequence) and pushes them to the network in one go.
As a result, nodes no longer need to frequently exchange vote information through "gossip protocols," reducing the number of communications to the order of N, or even log N, which significantly shortens block production time and greatly increases TPS.
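The scaling difference described above can be sketched numerically. This is an illustrative message count only, not Solana's actual networking code: all-to-all vote exchange costs on the order of (2/3)·N² messages, while leader aggregation costs on the order of 2N.

```python
# Illustrative comparison of consensus message counts:
# all-to-all gossip voting vs. leader-aggregated voting.

def gossip_messages(n_nodes: int) -> int:
    """Each node must collect votes from at least 2/3 of all nodes,
    so the network carries on the order of (2/3) * N^2 vote messages."""
    return (2 * n_nodes * n_nodes) // 3

def leader_aggregated_messages(n_nodes: int) -> int:
    """Validators each send one vote to the leader (N messages), and the
    leader publishes the aggregated votes back out once (another N)."""
    return 2 * n_nodes

for n in (100, 1000, 2000):
    print(n, gossip_messages(n), leader_aggregated_messages(n))
```

At 2000 nodes, the gossip scheme needs millions of messages per block while the aggregated scheme needs a few thousand, which is why the communication round stops being the bottleneck.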
Currently, Solana's block production cycle is essentially consistent with the duration of a single Slot, lasting 0.4 to 0.8 seconds, even three times faster than Avalanche. (The blocks displayed on the Solana browser are essentially the transaction sequences published by the Leader within each Slot.)
However, this also brings another problem: the Leader publishing the voting information of nodes within the transaction sequence (block) occupies block space. In Solana's setup, the Leader essentially treats consensus voting as a type of transaction event, and the transaction sequences it publishes contain node votes, which are the main components of Solana's TPS (generally accounting for over 70%).
According to data statistics from the Solana browser, its actual TPS maintains around 2000 to 3000, with over 70% being consensus voting messages unrelated to ordinary users, while the actual TPS related to user transactions remains at 500 to 1000. Although this is still an order of magnitude higher than high-performance public chains like BSC, Polygon, and EOS, it still fails to reach the tens of thousands level touted by officials.
At the same time, if Solana continues to increase its level of decentralization in the future, allowing more nodes to participate in consensus voting (currently there are nearly 2000 Validators), the transaction sequences published by the Leader will inevitably contain more voting messages, continuously compressing the TPS space related to user transactions. This indicates that under the premise of no sharding, Solana will find it difficult to achieve higher TPS.
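A back-of-envelope check, using only the figures cited above (the 2000 to 3000 observed TPS and the >70% vote share are taken from the text, not from live chain data), shows why user-facing TPS lands in the 500 to 1000 band:

```python
# Rough estimate of user-facing TPS from the figures cited in the text.
observed_tps = 2500      # midpoint of the 2000-3000 range quoted above
vote_fraction = 0.7      # at least 70% of transactions are consensus votes

user_tps = observed_tps * (1 - vote_fraction)
print(round(user_tps))   # lands inside the 500-1000 band cited above
```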
To some extent, Solana's transaction processing capacity of 500 to 1000 transactions per second has reached the peak of a non-sharded public chain. Given a larger number of nodes, no sharding, and support for smart contracts, it will be challenging for new public chains to surpass Solana's TPS level unless they adopt a "committee" model, allowing only a small number of nodes to participate in consensus, or regress to centralized servers. As long as the number of nodes participating in consensus is high, it will be difficult to achieve a higher "verifiable TPS" than Solana.
It is particularly noteworthy that because the list of block producers within each Epoch (2 to 4 days) is disclosed in advance, Solana's consensus protocol is not fundamentally different from the original Tendermint algorithm, as it effectively does not grant block producers unpredictability. Everyone can predict who will produce blocks at a future time, which creates many potential issues regarding availability/security.
(The Leader is vulnerable to premeditated DDoS attacks, increasing the failure rate; if several Leaders fail consecutively, the network is likely to go down; and users can bribe Leaders in advance, etc.)
2. Gulf Stream and Network Downtime: The public disclosure of the Leader list in Solana serves another important purpose: to work in conjunction with its unique Gulf Stream mechanism to improve the speed of transaction processing in the network.
After a user initiates a transaction, it is often directly forwarded to the designated Leader by the client program, or it is first received by a regular node and then quickly sent to the Leader. This method allows the Leader to receive transaction requests as quickly as possible, improving response speed. (This is called the Gulf Stream mechanism, which is one of the main reasons for Solana's downtimes.)
Solana's setup is a completely different transaction submission method compared to other public chains. Gulf Stream eliminates the "global transaction pool" setup of Bitcoin and Ethereum, and regular nodes do not run large-capacity transaction pools. When a node receives a user's pending transaction, it only needs to pass it to the Leader without sending it to other nodes, significantly improving efficiency. However, by eliminating the transaction pool, regular nodes cannot efficiently intercept spam transactions, making it easy for Leader nodes to go down.
To deeply understand this, we can compare it to ETH:
· Each Ethereum full node has a storage area called the transaction pool (mempool) for storing unconfirmed, pending transaction instructions.
· When a node receives a new transaction request, it first filters it, determining whether the transaction instruction is compliant (whether it is a duplicate/spam transaction), then stores it in the transaction pool before forwarding it to other nodes (viral diffusion).
· Ultimately, a legitimate pending transaction will spread throughout the network, entering the transaction pools of all nodes, allowing different nodes to access the same data, demonstrating "consistency."
Ethereum and Bitcoin adopt this mechanism for clear reasons: they do not know who the future block producers will be, and everyone has a chance to package new blocks. Therefore, it is necessary to ensure that different nodes receive the same pending transactions to prepare for block packaging.
If a mining pool node publishes a new block, the receiving nodes will parse the transaction sequence within it, execute it in order, and then remove this portion of transaction instructions from the transaction pool. Thus, a batch of pending transactions can be added to the blockchain.
Solana has eliminated the type of transaction pool that Ethereum uses, where pending transactions do not need to randomly diffuse throughout the network but are quickly submitted to the designated Leader, then packaged into transaction sequences and distributed in one go (similar to the earlier method of distributing votes). Ultimately, a transaction only needs to be included in the transaction sequence and spread once throughout the network (whereas Ethereum actually requires two spreads). In cases of high transaction volume, this subtle difference can significantly improve propagation efficiency.
However, as technical descriptions of the transaction pool (TxPool/mempool) make clear, the mempool essentially serves as a data buffer and filter that enhances the availability of public chains. All nodes run transaction pools, collecting all pending transactions in the network, so different nodes can independently filter out spam requests and intercept duplicate transactions in real time, alleviating traffic pressure. Although using a transaction pool may slow block production, if a user initiates a duplicate transaction (a request already recorded in the pool), or another type of spam request, the receiving node can filter it locally and decline to forward it, distributing the filtering workload across the entire network.
(In the Ethereum network, malicious users' duplicate transactions are more easily intercepted directly by various nodes.)
Solana takes the opposite approach. Under the Gulf Stream mechanism, regular nodes do not operate a globally consistent transaction pool and cannot efficiently intercept duplicate/spam transactions. The only thing regular nodes can do is check whether the transaction data packet conforms to the correct format, unable to identify malicious duplicate requests. At the same time, since regular nodes "push" transaction instructions to the Leader all at once, it effectively shifts the burden of filtering transactions onto the Leader itself. In cases of high traffic and numerous duplicate transactions, the Leader node may become overwhelmed and unable to produce blocks smoothly, causing consensus votes to fail to propagate, leading to potential network crashes.
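The mempool-style duplicate filtering that Gulf Stream forgoes can be sketched in a few lines. This is a minimal illustration with made-up names, not any client's real implementation: each node remembers the hashes of transactions it has seen and drops repeats locally instead of forwarding them.

```python
# Minimal sketch of mempool-style duplicate filtering (illustrative only):
# a node that remembers seen transaction hashes and drops replays locally.
import hashlib

class Mempool:
    def __init__(self):
        self.seen = set()       # hashes of transactions already received
        self.pending = []       # raw transactions awaiting inclusion

    def submit(self, raw_tx: bytes) -> bool:
        """Return True if the transaction is new and was queued,
        False if it is a duplicate and was dropped locally."""
        tx_hash = hashlib.sha256(raw_tx).hexdigest()
        if tx_hash in self.seen:
            return False        # duplicate: filtered here, never forwarded
        self.seen.add(tx_hash)
        self.pending.append(raw_tx)
        return True             # new: store, then forward to peers

pool = Mempool()
assert pool.submit(b"transfer A->B 5") is True
assert pool.submit(b"transfer A->B 5") is False  # spam replay intercepted
```

Because every node runs this check, a flood of identical requests dies at the network's edge; under Gulf Stream, the same flood lands undiluted on the Leader.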
In response, Solana's founder Anatoly Yakovenko stated on January 27 of this year that during the public sale period of certain popular projects, nearly 2 million transaction requests per second reached the same Leader node, with over 90% being completely identical duplicate transactions, ultimately causing Solana to go down.
(Reference: "In-depth Investigation: Why Do New Public Chains Frequently Experience Downtime?")
In summary, Ethereum essentially sacrifices efficiency for security, while Solana sacrifices security for efficiency. The issues it faces can be summarized as follows:
Since the order of Leader rotation is predetermined, the network must keep following this rotation chain; but because the traffic distribution mechanism is imperfect, Leader nodes fail easily. If user traffic spikes during a certain period (such as the launch of a hot NFT), multiple Leaders may fail in succession (for example, the Leaders for the next 40 Slots may all fail to produce blocks). The consensus process is then obstructed, the network may fork, the Leader rotation chain may break entirely, and the result is a total collapse.
3. Turbine Transmission Protocol Similar to BT Seeds: In conjunction with the Gulf Stream mechanism mentioned earlier, the Leader quickly receives all transaction requests within a certain period, checks their legitimacy, and then executes the transactions. At the same time, the Leader uses a mechanism called POH (Proof of History) to assign a serial number to each transaction and order the transaction events. (Details will be elaborated later)
After the Leader orders the transaction events, it will split the transaction sequence into X different fragments and send them to the X Validators with the most staked assets, which will then propagate them to other Validators. The Validator group will exchange the received fragments among themselves, locally piecing together the complete transaction sequence (block).
To facilitate understanding, we can view each different fragment as a small block with reduced data volume. When the Leader distributes X fragments externally, it is equivalent to issuing X different small blocks, allowing different nodes to receive and further disseminate them.
Solana's message distribution method is quite unique, inspired by BT seeds. (The principle is not easy to express in words; the main idea is to utilize the idle bandwidth of multiple nodes simultaneously for parallel data transmission.) Generally speaking, the more fragments a transaction sequence is split into, the faster the node group can disseminate the fragments and piece together the transaction sequence, significantly improving data synchronization efficiency.
In contrast, in other public chains, block producers send the same block to X neighboring nodes, equivalent to copying a block X times and sending it out, rather than distributing X different fragments (small blocks). This approach results in significant data redundancy and bandwidth waste. The root cause lies in the traditional block structure being indivisible, making flexible transmission impossible, while Solana simply replaces the block structure with a transaction sequence, combined with a Turbine protocol similar to BT seeds, achieving high-speed data distribution and greatly enhancing throughput TPS.
Solana has stated that under the Turbine transmission protocol, with 40,000 nodes in the network, a transaction sequence can be synchronized to all nodes within 0.5 seconds.
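The two ideas behind Turbine can be sketched together: splitting the sequence into fragments, and relaying them down a tree with fixed fan-out so that propagation depth grows only logarithmically with node count. The fragment size and fan-out below are assumptions for illustration, not Solana's actual parameters.

```python
# Illustrative sketch of Turbine-style dissemination (not Solana's code):
# the leader splits the transaction sequence into fragments, and each node
# relays to a fixed fan-out of peers, so depth grows as log_fanout(N).

def split_into_fragments(block: bytes, frag_size: int) -> list:
    """Cut the serialized transaction sequence into fixed-size pieces."""
    return [block[i:i + frag_size] for i in range(0, len(block), frag_size)]

def propagation_hops(n_nodes: int, fanout: int) -> int:
    """Layers of the relay tree needed to reach every node."""
    reached, hops = 1, 0
    while reached < n_nodes:
        reached *= fanout
        hops += 1
    return hops

block = bytes(10_000)                        # a pretend 10 KB sequence
frags = split_into_fragments(block, 1_280)   # ~MTU-sized pieces (assumed)
print(len(frags))                            # 8 fragments
print(propagation_hops(40_000, 200))         # 2 hops at fan-out 200
```

With a fan-out of 200, two relay layers already cover 40,000 nodes, which is consistent in spirit with the sub-second synchronization claim above.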
At the same time, under the Turbine protocol, nodes are categorized into different levels (priorities) based on the weight of their staked assets. Validators with more staked assets receive data from the Leader first, which is then passed on to the next layer. In this mechanism, the group of nodes accounting for 2/3 of the total staked assets will be the first to record the transaction sequence issued by the Leader, accelerating the confirmation speed of the ledger (block).
According to data disclosed by the Solana browser, currently, 2/3 of the staking weight is divided among 132 nodes. Combined with the previously mentioned propagation mechanism, these nodes will receive the transaction sequence issued by the Leader first and will also be the first to cast votes. As long as the transaction sequence published by the Leader receives votes from these 132 nodes, it can be finalized. From a certain perspective, these nodes get ahead of others, and if they conspire, they could create malicious scenarios.
More importantly, currently, 25 nodes in Solana occupy 1/3 of the staking weight. According to Byzantine fault tolerance theory, as long as these 25 nodes conspire (for example, deliberately not sending votes to a certain Leader), they could throw the Solana network into chaos. To some extent, the "oligarchy" issue that Solana faces is something all public chains using POS voting systems should pay attention to.
4. POH (Proof Of History): As mentioned earlier, the Turbine protocol allows the Leader to fragment the transaction sequence and publish different fragments. This approach requires a guarantee: the transaction sequence must be easily reconstructable after being fragmented. To address this issue, Solana deliberately incorporates erasure codes (to prevent data loss) into the data packets and introduces the original POH (Proof Of History) mechanism to order transaction events.
In Solana's white paper, Yakovenko uses the hash function SHA256 as an example to demonstrate the principle of POH. For ease of understanding, this article will explain the POH mechanism with the following example:
(Since the POH and corresponding time evolution logic are difficult to describe in words, it is recommended to first read the Chinese white paper of Solana for an interpretation of POH, and then use the following segments of this article as supplementary reading.)
· The output of the SHA256 function is effectively unique to its input (finding two inputs with the same output is computationally infeasible). Inputting a parameter X yields a single output SHA256(X), and different values of X yield different outputs;
· If we recursively compute SHA256, for example:
Define X2 = SHA256(X1), then calculate X3 = SHA256(X2), and then X4 = SHA256(X3), and so on, iterating this process, Xn = SHA256(X[n-1]);
· By repeatedly executing this process, we will ultimately obtain a sequence of X1, X2, X3……Xn, which has a characteristic: Xn = SHA256(X[n-1]), where the later Xn is the "descendant" of the earlier X[n-1].
· Once this sequence is publicly released, if someone wants to verify the correctness of the sequence, for example, to determine whether Xn is indeed the "legitimate descendant" of X[n-1], they can directly substitute X[n-1] into the SHA256 function and check if the result matches Xn.
· In this mode, without X1, X2 cannot be obtained; without X2, X3 cannot be obtained…… without Xn, the subsequent X[n+1] cannot be obtained. This way, the sequence gains continuity and uniqueness.
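The recursive chain described above can be written out directly. This is a minimal sketch of the principle, not Solana's PoH implementation:

```python
# Minimal sketch of the recursive SHA256 chain:
# X2 = SHA256(X1), X3 = SHA256(X2), ..., Xn = SHA256(X[n-1]).
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def poh_chain(seed: bytes, n: int) -> list:
    """Return [X1, X2, ..., Xn], each X the hash of its predecessor."""
    xs = [seed]
    for _ in range(n - 1):
        xs.append(sha256(xs[-1]))
    return xs

chain = poh_chain(b"X1-seed", 5)
# Continuity check: every Xn must equal SHA256(X[n-1]).
assert all(chain[i] == sha256(chain[i - 1]) for i in range(1, len(chain)))
```

Verification is cheap (one hash per link), but producing the chain requires performing every computation in order, which is exactly the property the sequence relies on.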
· The most critical point: transaction events can be inserted into the sequence. For example:
After X3 is born and before X4 appears, a transaction event T1 can be used as an external input, combined with X3, yielding X4 = SHA256(X3+T1). Here, X3 appears slightly earlier than T1, and X4 is the descendant born from (T1+X3). T1 is essentially sandwiched between the "birthdays" of X3 and X4.
Following this logic, T2 can be introduced after X8 is generated, as an external input, calculating X9=SHA256(T2+X8), thus placing T2's occurrence time between the "birthdays" of X8 and X9;
• In the above scenario, the POH sequence takes the following form: X1, X2, X3, (T1), X4, X5, X6, X7, X8, (T2), X9, ……
In this sequence, the transaction events T1 and T2 are data inserted into the sequence from the outside; in the POH sequence's time record, T1 occurs after X3 and before X4, and T2 occurs after X8 and before X9.
As long as the order number of T1 in the POH sequence is provided, it can be determined how many SHA256 computations occurred before it (T1 is preceded by X1, X2, X3, which means 3 SHA256 computations occurred).
The same reasoning applies to T2, which is preceded by eight Xs (X1~X8), indicating eight computations occurred.
The above process can be explained in layman's terms: a person uses a stopwatch to count seconds, and whenever they receive a letter, they record the time on the envelope according to the stopwatch reading. After receiving ten letters, the recorded seconds on these letters will definitely differ, allowing for a distinction of order, and the time intervals between letters can also be determined.
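The insertion step can be sketched concretely. As above, X4 = SHA256(X3 + T1) mixes the event into the next hash, pinning it between the "birthdays" of two chain entries; the seed and event payloads below are illustrative placeholders.

```python
# Sketch of inserting transaction events into the hash chain:
# X4 = SHA256(X3 + T1), X9 = SHA256(X8 + T2). Names are illustrative.
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

x = b"X1-seed"                    # stands in for X1
events = {3: b"T1", 8: b"T2"}     # insert T1 after X3, T2 after X8
log = []                          # (hashes preceding the event, event)

for counter in range(1, 9):       # derive X2 .. X9 from X1
    ev = events.get(counter, b"")
    if ev:
        log.append((counter, ev))
    x = sha256(x + ev)            # plain step, or step with event mixed in

print(log)  # [(3, b'T1'), (8, b'T2')]: 3 hashes before T1, 8 before T2
```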
· When the Leader publishes the transaction sequence, as long as it provides the value of X3 in T1's data packet and indicates that X3 is the third one, the receiving Validator can parse the complete POH sequence prior to T1;
As long as the value of X8 and its order number (8) are provided in T2's data packet, the Validator can parse the complete POH sequence prior to T2;
· According to the POH setup, as long as each transaction's order in the POH sequence (Counter) is marked and the adjacent X value (Last Valid Hash) is provided, the order of each transaction can be disclosed. Due to the nature of the SHA256 function, this order determined through hash computation is difficult to tamper with.
At the same time, Validators know how the Leader derives the POH sequence; they can perform the same operations to restore the complete POH sequence and verify the correctness of the data published by the Leader.
For example, if the data packet of the transaction sequence published by the Leader is:
T1, order number 3, adjacent to X3;
T2, order number 5, adjacent to X5;
T3, order number 8, adjacent to X8;
T4, order number 10, adjacent to X10;
Initial value of POH sequence X1;
When Validators receive the above data packets, they can use X1 as the initial parameter, iteratively substitute it into the SHA256 function, and parse the complete POH sequence: X1, X2, X3, (T1), X4, X5, (T2), X6, X7, X8, (T3), X9, X10, (T4).
Thus, as long as the total number of Xs in the sequence is known, the number of SHA256 computations performed by the calculator can be determined. By estimating the time taken for each hash computation, the time intervals between different transactions can be roughly determined (for example, if T1 and T2 are separated by two Xs, then they are separated by two SHA256 computations, approximately ?? milliseconds). Knowing the time intervals between different transactions allows for easier determination of the occurrence time of each transaction, saving a lot of trouble.
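How a Validator might rebuild and check the sequence from such packets can be sketched as follows; the packet layout and names are illustrative, following the example above (T1 at order 3, T2 at 5, T3 at 8, T4 at 10, plus the initial value X1):

```python
# Sketch of a Validator rebuilding the POH sequence from the packets above.
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

packets = {3: b"T1", 5: b"T2", 8: b"T3", 10: b"T4"}  # order -> event
x1 = b"X1-initial-value"

def rebuild(seed: bytes, packets: dict, last: int) -> list:
    """Redo the leader's hashing, mixing each event in at the counter
    recorded in its packet, re-deriving every X independently."""
    xs = [seed]
    for counter in range(1, last):
        ev = packets.get(counter, b"")
        xs.append(sha256(xs[-1] + ev))
    return xs

xs = rebuild(x1, packets, 11)          # X1 .. X11
# Spot-check: X4 must equal SHA256(X3 + T1), so T1 sits between X3 and X4.
assert xs[3] == sha256(xs[2] + b"T1")
# A tampered entry (say a forged X2') would fail this check and break
# every later X, which is why forgery is impractical.
assert sha256(b"forged X2") != xs[2]
```

Because every X is recomputed locally, the Validator needs no trust in the Leader's ordering claims beyond the initial value and the per-packet counters.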
· Generally speaking, the Leader will continuously execute the SHA256 function to obtain new Xs, pushing the sequence forward. If there are transaction events, they will be used as external inputs and inserted into the sequence;
· If a node attempts to publish a tampered sequence in the network, replacing the version published by the Leader, for example, replacing X2 with X2', resulting in the sequence X1, X2', X3……Xn, it is evident that others can compare X3 and SHA256(X2') to discover that the two do not match, indicating sequence forgery.
Thus, the forger must replace all Xs after X2', but doing so is costly, especially when the number of Xs is large, making forgery very time-consuming. In this scenario, the best approach is not to forge; after receiving the sequence issued by the Leader, simply forward it unchanged to other nodes.
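The detection step just described fits in a minimal example (a plain chain with no events; the names are illustrative):

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

x1 = b"seed"        # illustrative initial value X1
x2 = h(x1)          # honest X2
x3 = h(x2)          # honest X3: the chain is X1 -> X2 -> X3

x2_forged = h(x1 + b"tampered")   # forger substitutes X2'
# any node can recompute SHA256(X2') and see it does not equal X3
assert h(x2_forged) != x3
```

The mismatch at the very next link is why a forger cannot swap in a single X: every later element of the chain would also have to be recomputed.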
Additionally, considering that the Leader adds its own digital signature to each data packet published, the sequences propagated within the network are actually "unique" and "difficult to tamper with."
5. Consistent Time Advancement Across the Network: Solana's founder Yakovenko has emphasized that the greatest role of POH is to provide a "globally consistent single clock", that is, consistent time advancement across the network.
This statement can be understood as follows:
The Leader node publishes a unique, tamper-resistant transaction sequence within the network. Based on the data packets of this transaction sequence, nodes can parse the complete POH sequence, and the POH sequence is Solana's original timing method, serving as a time reference.
As mentioned earlier, since the Leader continuously executes SHA256 hash computations, pushing the POH sequence forward, this sequence records the results of N hash computations, corresponding to N computation processes, and contains time progression. Solana treats the number of computations as a unique timing method.
In the original parameter setup, it is assumed that 2 million hash computations correspond to 1 second in reality, with each Slot block production cycle being 400 ms, meaning that each Slot generates a POH sequence containing 800,000 hash computations. Solana also created a term called Tick, analogous to the ticking sound of a clock's hand moving forward. According to the setup, each Tick should contain 12,500 hash computations, with each Slot cycle containing 64 Ticks, and every 160 Ticks corresponding to 1 second in reality.
The above is merely the ideal state setup; in actual operation, the number of hash computations that can be produced per second is often not fixed, so the actual parameters should be dynamically adjusted. However, the above explanation can clarify the general logic of the POH mechanism, allowing Solana nodes to determine whether a Slot has ended and whether it is time for the next Leader to appear (every 4 Slots is a rotation) based on the number of hash computations contained within the POH sequence.
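The idealized parameters above are internally consistent, as a quick arithmetic check shows (all constants are the paper's idealized values, not measured ones):

```python
# Idealized POH calibration from the Solana white paper setup
HASHES_PER_SECOND = 2_000_000   # assumed: 2 million hashes ~ 1 second
SLOT_MS = 400                   # one Slot block-production cycle
HASHES_PER_TICK = 12_500
TICKS_PER_SLOT = 64

# hashes generated in one 400 ms Slot
hashes_per_slot = HASHES_PER_SECOND * SLOT_MS // 1000
assert hashes_per_slot == 800_000

# 64 Ticks of 12,500 hashes each fill exactly one Slot
assert HASHES_PER_TICK * TICKS_PER_SLOT == hashes_per_slot

# 160 Ticks correspond to one real second
ticks_per_second = HASHES_PER_SECOND // HASHES_PER_TICK
assert ticks_per_second == 160
```

In practice, as the text notes, the hash rate is not fixed, so these constants are recalibrated dynamically rather than taken as exact.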
Since the starting point of each Slot can be determined, Validators will classify the transaction sequences sandwiched between the starting points as a block. Once confirmed, it is equivalent to advancing the ledger by one block, and the system moves forward by one Slot.
This can be summarized in one sentence:
"The clock's hand does not turn back, but we can push it forward with our own hands." – Shinji Ikari, "Neon Genesis Evangelion"
In other words, as long as nodes receive the same transaction sequence, the POH sequence they parse and the corresponding time advancement will be consistent. This creates a "globally consistent single clock" (consistent time advancement across the network).
(In the original Tendermint algorithm, each node adds the same block to its local ledger, maintaining consistent block height and rarely forking, so according to Solana's explanation of "time advancement," the time advancement of different nodes in Tendermint should also be consistent.)
Furthermore, since the order of transaction events in the POH sequence is predetermined, nodes can independently determine how many hash computations (Xs) separate different transactions, allowing for a rough estimation of the time difference △T between different transactions.
With a rough estimate of △T and a certain initial timestamp TimeStamp 0, it becomes possible to roughly estimate the occurrence time of each event (timestamp) like a line of dominoes falling.
For example:
If T1 occurs at 01:27:01, and T2 is separated from T1 by 10,000 hash computations (10,000 Xs), if 10,000 hash computations take about 1 second, then T2 likely occurs 1 second after T1, which is 01:27:02. Following this logic, the occurrence times (timestamps) of all transaction events can be roughly estimated, providing immense convenience and allowing nodes to independently confirm the delivery time of certain data.
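Under the idealized calibration of 2 million hashes per second, this domino-style timestamp estimation is just a division (a sketch; the real calibration is adjusted dynamically, and the function name is illustrative):

```python
HASHES_PER_SECOND = 2_000_000   # idealized calibration from the text

def estimate_timestamp(anchor_ts: float, hashes_between: int) -> float:
    """Estimate when an event occurred, given an anchor event's timestamp
    (in seconds) and the number of SHA256 computations separating them."""
    return anchor_ts + hashes_between / HASHES_PER_SECOND

# If T1 is the anchor at t = 0 s and T2 sits 1,000,000 hashes later in
# the POH sequence, T2 is estimated at t = 0.5 s under this calibration.
```

Chaining this from one known timestamp (TimeStamp 0) assigns an approximate time to every event in the sequence, exactly the "falling dominoes" picture above.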
At the same time, the POH mechanism also makes it easy to time each node's vote. The Solana white paper states that Validators should provide their votes within 0.5 seconds after the Leader publishes the State information.
At 2 million hash computations per second, 0.5 seconds corresponds to 1 million hash computations (1 million Xs). If, after the Leader publishes the State, a Validator's vote does not appear in the subsequent sequence within 1 million consecutive hash computations, it can be inferred that this node is slacking and has not fulfilled its voting obligation within the specified time, at which point the system can apply corresponding punitive measures (Tower BFT).
6. Similarities with Optimism: The POH (Proof Of History) mechanism described above is similar to the transaction-ordering schemes of Optimism and Arbitrum: both establish an immutable, uniquely determined sequence of transaction events through hash-function computations, which is then published by the Leader/Sequencer to the Validator/Verifier nodes.
In Optimism, there is also a role similar to the Leader, called the Sequencer, which also eliminates block structures in data transmission, periodically publishing transaction sequences to a specific Ethereum contract address, allowing Validators to read and execute them themselves. As long as different Validators receive the same transaction sequence, the states they obtain after execution must also be the same. At this point, by comparing the state of the Sequencer, each Validator can verify its correctness without needing to communicate with other nodes.
In Optimism's consensus mechanism, there is no requirement for interaction between different Validators, nor is there a step for collecting votes; "consensus" is actually implicit. If a Validator finds that the state information provided by the Sequencer/Leader is incorrect after executing the transaction sequence, they can initiate a "challenge" to question the Sequencer/Leader. However, in this model, Optimism provides a 7-day finalization window for transaction events; after the Sequencer publishes the transaction sequence, it needs to go unchallenged for 7 days to be finally confirmed, which is clearly unacceptable to Solana.
Solana requires Validators to provide votes as quickly as possible, aiming to enable the network to quickly reach consensus and finalize transaction sequences, thus achieving higher efficiency than Optimism.
Additionally, Solana's method of distributing and verifying transaction sequences is more flexible, allowing a sequence to be fragmented and distributed, creating a perfect environment for the implementation of the Turbine protocol;
At the same time, Solana allows nodes to run multiple computing components simultaneously, verifying different fragments in parallel and saving significant time. In OP and Arbitrum, such practices are not possible: Optimism directly maps each transaction to one executed state, providing transaction sequences in a Transaction-to-State mapping format that can only be computed step by step, from start to finish, by a single CPU core to verify the correctness of the entire sequence, which is relatively cumbersome and inefficient. Solana's POH sequence, by contrast, can be verified starting from any position, and multiple computing units can simultaneously verify different POH segments, laying the foundation for a multi-threaded parallel verification model.
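The parallel-verification idea can be illustrated with a toy hash chain split into segments, each of which is checkable independently given only its claimed start hash, end hash, and length (a sketch; Solana's actual entry format and scheduling differ):

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def sha256(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def make_chain(seed: bytes, n: int) -> list[bytes]:
    """Build a plain POH-style hash chain of n steps."""
    xs = [seed]
    for _ in range(n):
        xs.append(sha256(xs[-1]))
    return xs

def verify_segment(seg: tuple[bytes, bytes, int]) -> bool:
    """A segment is (start_hash, claimed_end_hash, num_hashes); it can be
    checked without knowing anything about the other segments."""
    x, end, n = seg
    for _ in range(n):
        x = sha256(x)
    return x == end

chain = make_chain(b"seed", 300)
# split the published sequence into 3 independent segments of 100 hashes
segments = [(chain[i], chain[i + 100], 100) for i in range(0, 300, 100)]
with ThreadPoolExecutor() as pool:
    results = list(pool.map(verify_segment, segments))
assert all(results)
```

Producing the chain is inherently sequential, but verifying it is embarrassingly parallel, which is exactly the asymmetry the multi-core verification model exploits.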
7. Vertical Scalability Targeting Nodes Themselves: The above describes Solana's improvements in block production processes, consensus mechanisms, and data transmission protocols. In addition, Solana has created mechanisms called Sealevel, Pipeline, and Cloudbreak, supporting multi-threaded, parallel, and concurrent execution modes, and allowing GPUs to serve as execution computing components, significantly increasing the speed at which nodes process instructions and optimizing hardware resource utilization, falling under the category of vertical scalability. Due to the complexity of the relevant technical details and their lack of relevance to the focus of this article, they will not be elaborated upon here.
Although Solana's vertical scalability has greatly improved the speed at which node devices process transaction instructions, it has also raised the hardware configuration requirements. Currently, Solana's node configuration requirements are very high, with many people rating it as "enterprise-level hardware," and it has been criticized as "the public chain with the most expensive node devices."
The following are the hardware requirements for Solana's Validator nodes:
CPU: 12 or 24 cores; memory: at least 128 GB; storage: 2 TB SSD; network bandwidth: at least 300 MB/s, typically 1 GB/s.
In comparison, the current hardware requirements for Ethereum nodes (before transitioning to POS):
CPU: more than 4 cores; memory: at least 16 GB; storage: 0.5 TB SSD; network bandwidth: at least 25 MB/s.
Considering that the hardware requirements for Ethereum nodes will be lowered after transitioning to POS, Solana's requirements for node hardware are far higher than Ethereum's. According to some reports, the hardware cost of a Solana node is equivalent to that of hundreds of post-POS Ethereum nodes. Due to the high operational costs of nodes, running the Solana network has largely become the domain of whales, professional institutions, and enterprises.
In this regard, Gavin Wood, former CTO of Ethereum and founder of Polkadot, commented after Solana's first downtime last year that true decentralization and security are more valuable than high efficiency. If users cannot run full nodes of the network themselves, then such projects will be no different from traditional banks.
Full Text Summary
- Solana's scalability is mainly based on: efficient use of network bandwidth, reducing communication between nodes, and accelerating node transaction processing speed. These measures directly shorten the time for block production and consensus communication, but also reduce system availability (security).
- Solana publicly discloses the list of block producers (Leaders) for each block production cycle, effectively revealing a single trusted data source, which significantly streamlines the consensus communication process. However, disclosing Leader information brings potential security risks such as bribery and targeted attacks.
- Solana treats consensus communication (voting information) as a transaction event, with consensus messages often accounting for over 70% of TPS; the actual TPS related to user transactions is about 500–1000;
- Solana's Gulf Stream mechanism effectively eliminates the global transaction pool, which, while improving transaction processing speed, means that regular nodes cannot efficiently intercept spam transactions, putting immense pressure on Leaders, making them prone to downtime. If a Leader goes down, consensus messages cannot be published normally, leading to potential network forks or even crashes;
- Solana's Leader nodes publish transaction sequences rather than actual blocks. Combined with the Turbine transmission protocol, transaction sequences can be fragmented and distributed to different nodes, achieving extremely fast data synchronization.
- POH (Proof Of History) is essentially a method of timing and counting, which can assign immutable serial numbers to different transaction events, generating transaction sequences. At the same time, since only a single Leader publishes transaction sequences at any given time, the POH timing sequence contained within means that the Leader effectively publishes a globally consistent timer (clock). Within a very short window, the ledger advancement and time progression of different nodes are consistent;
- Solana has 132 nodes occupying 67% of the staking share, with 25 of these nodes holding 33% of the staking share, essentially forming an "oligarchy" or "senate." If these 25 nodes conspire, they could cause chaos in the network;
- Solana has high hardware requirements for nodes, achieving vertical scalability at the cost of device expenses. However, this also means that individuals running Solana nodes are mostly whales or institutions, which is not conducive to true decentralization.
From a certain perspective, Solana has become the most distinctive presence among public chains, pushing the narrative of Layer 1 scalability to extremes with high-end node hardware, disruptive consensus mechanisms, and network transmission protocols, essentially reaching the TPS bottleneck that a non-sharded public chain can maintain in the long term. However, Solana's multiple downtimes seem to indicate the ultimate outcome of sacrificing availability/security for efficiency.
In the long run, decentralization and security remain the core narratives in the public chain field. Although Solana, buoyed by temporary TPS figures and financial giants like SBF, once became a treasure embraced by capital, the fate of EOS has already shown that the Web3 world does not need mere marketing and high efficiency; only truly usable entities can stand firm against the tides of history and endure forever.
(Special thanks to Mr. Liu Yang, author of "Embedded System Security," the Rebase community, and the W3.Hitchhiker team for their assistance to the author of this article)
References
1. Solana White Paper (Chinese Version)
2. Gulf Stream: Solana's Mempool-less Transaction Forwarding Protocol
6. How the High-Performance Public Chain Solana Collaborates with Binance Wallet to Speed Up?
8. The Comeback Path of Solana - SQLANA
9. Explanation of PoS and Tendermint Consensus
10. Ethereum Source Code Analysis: Transaction Buffer Pool txpool
11. Ethereum Transaction Pool Architecture Design
13. Tendermint-2 Consensus Algorithm: Detailed Explanation of Tendermint-BFT
14. Cardano (ADA) Consensus Algorithm Ouroboros
15. In-Depth Investigation: Why Do New Public Chains Frequently Experience Downtime?