Detailed Explanation of Layer1 Parallel Execution: How do Aptos, Sui, Linera, and Fuel Achieve It?
Original Title: "The Case for Parallel Processing Chains"
Original Author: Mohamed Fouda
Original Compilation: 深潮 TechFlow
As we revisit the evolution of blockchain technology, a strong trend is emerging, where new L1s focus on parallel execution.
This is not a new technology; Solana is currently using it in the Sealevel execution environment.
However, the impressive growth of DeFi and NFTs during the last bull market made it clear that the technology urgently needs improvement.
In the next market cycle, several notable projects adopting the concept of parallel execution are set to emerge, including Aptos, Sui, Linera, and Fuel.
This article will discuss the similarities and differences among these projects, as well as the challenges they face.
The Problem
Smart contract platforms can create a wide range of decentralized applications. To execute these applications, a shared computing engine is needed. Every node in the network runs this computing engine, executing applications and facilitating user interactions with them. When nodes arrive at the same result from execution, they reach consensus and drive the chain's operation.
The Ethereum Virtual Machine (EVM) is the primary execution engine for smart contracts (SC), with about 20 different implementations.
Since its invention, the EVM has built up a critical mass of adoption among developers.
In addition to Ethereum and Ethereum's L2s, several other chains, including Polygon, BNB Smart Chain, and Avalanche C-Chain, have adopted the EVM as their execution engine, focusing on changing the consensus mechanism to improve network throughput.
One major limiting feature of the EVM is the sequential execution of transactions. The EVM essentially processes one transaction at a time, putting all other transactions on hold until the transaction is executed and the blockchain state is updated. Even if two transactions are independent, such as a payment from Alice to Bob and another payment from Carol to Dave, the EVM cannot execute these transactions in parallel. While this execution model allows for interesting use cases like flash loans, it is neither efficient nor scalable.
This sequential execution of transactions is one of the main bottlenecks for network throughput:
- First, it leads to longer execution times for transactions in a block, limiting block time;
- Second, it restricts the number of transactions that can be added to a block, so that nodes have enough time to execute them and confirm the block.
Ethereum's average throughput is about 17 tx/second. This low throughput means that during periods of high activity, such as NFT mints, network miners/validators cannot process all transactions, leading to fee bidding wars to ensure priority execution, driving transaction fees up. Ethereum's average fees have at times exceeded 0.2 ETH (about $800), causing many users to hesitate in using Ethereum.
The second problem with sequential execution is the inefficiency of network nodes. Sequential instruction execution cannot benefit from multiple processor cores, leading to low hardware utilization and inefficiency. This hinders scalability and results in unnecessary energy consumption.
Can parallel execution solve this problem?
The limitations of the EVM structure create conditions for a new realm of L1s focused on parallel execution (PE). Parallel execution allows for the division of transaction processing across multiple processor cores, improving hardware utilization and thus achieving better scalability. In high-throughput chains, the increase in hardware resources is directly related to the number of transactions that can be executed.
During periods of high activity, validator nodes can allocate more cores to handle the additional transaction load. This dynamic scaling of computational resources allows the network to achieve higher throughput during times of high demand, significantly improving user experience.
Another advantage of this approach is lower transaction confirmation latency: dynamically scaling node resources makes it possible to confirm transactions with low latency under any network load.
Transactions do not need to wait for dozens or hundreds of blocks, nor do they need to incur excessive fees for priority confirmation. Improved confirmation times enhance the finality of transactions, opening the door for low-latency blockchains. The guarantee of low-latency execution of transactions makes several previously impossible use cases feasible.
Changing the chain execution model to allow for PE is not a new idea; some projects have already explored this. One approach is to replace the account model used by the EVM with the Unspent Transaction Output (UTXO) model. The UTXO execution model used in Bitcoin allows for parallel processing of transactions, making it an ideal choice for payments.
However, due to the limited functionality of UTXO, it needs to be extended to accommodate the complex interactions required by smart contracts. For example, Cardano uses an extended UTXO model for this purpose, while Findora employs a hybrid UTXO model that implements both accounting models and allows users to change asset types between the two models.
Another approach to PE keeps the account model and instead focuses on improving how the chain state is structured and modified. Solana's Sealevel framework is an example of this approach.
How does parallel execution work?
The way parallel execution works is by identifying independent transactions and executing them simultaneously. If the execution of one transaction affects the execution of another, then the two transactions are considered related. For example, AMM transactions in the same pool are related and must be executed sequentially.
While the concept of parallel processing sounds simple, the difficulty lies in the details, with the main challenge being how to effectively identify "independent" transactions. Classifying transactions as independent requires understanding how each transaction alters the blockchain memory, or chain state. Transactions that interact with the same smart contract (like an AMM pool) can change the same contract state and therefore cannot be executed concurrently.
Given the current level of composability between applications, identifying whether transactions are related is a challenging task. Imagine an AMM transaction that swaps UNI for USDC, where the AMM finds that the most efficient route to execute it is UNI -> ETH -> DAI -> AAVE -> USDC. All pools involved in that transaction cannot process any other transactions until that transaction is fully executed, after which the states of all participating pools can be updated.
Identifying Independent Transactions
In this section, the methods used by different parallel execution engines are compared. The focus is on methods for controlling state (memory) access. The blockchain state can be thought of as a RAM storage, where each account or smart contract on the chain has a set of memory locations that can be modified. Related transactions are those that attempt to change the same memory location in the same block; different chains utilize different memory architectures and mechanisms to identify these transactions.
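The idea of related transactions as "writes to the same memory location" can be sketched in a few lines. This is only an illustration under simplifying assumptions: the `Tx` model, the account names (including Erin and Frank), and the `pool:UNI/ETH` key are hypothetical, and no chain discussed here is claimed to schedule transactions exactly this way.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tx:
    """A transaction plus the state locations it reads and writes (hypothetical model)."""
    name: str
    reads: frozenset = field(default_factory=frozenset)
    writes: frozenset = field(default_factory=frozenset)


def conflicts(a: Tx, b: Tx) -> bool:
    """Two transactions are related if either writes a location the other reads or writes."""
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & (a.reads | a.writes))


def schedule(txs: list[Tx]) -> list[list[Tx]]:
    """Greedily pack transactions into batches while keeping related ones in block order.

    Each transaction goes into the batch right after the last batch that
    contains a transaction it conflicts with. Transactions inside one batch
    are pairwise independent and could run on separate cores.
    """
    batches: list[list[Tx]] = []
    for tx in txs:
        slot = 0
        for i, batch in enumerate(batches):
            if any(conflicts(tx, other) for other in batch):
                slot = i + 1
        if slot == len(batches):
            batches.append([])
        batches[slot].append(tx)
    return batches


block = [
    Tx("alice->bob", writes=frozenset({"alice", "bob"})),
    Tx("carol->dave", writes=frozenset({"carol", "dave"})),
    Tx("swap-1", writes=frozenset({"erin", "pool:UNI/ETH"})),
    Tx("swap-2", writes=frozenset({"frank", "pool:UNI/ETH"})),
]
plan = [[tx.name for tx in batch] for batch in schedule(block)]
print(plan)
```

In this toy block, the two payments and the first swap land in the same batch, while the second swap has to wait because it writes to the same pool.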
Several chains in this category are built on the technology developed for the now-defunct Facebook blockchain project Diem. The Diem team created the smart contract language Move, specifically to improve SC execution. Aptos, Sui, and Linera are three high-profile projects belonging to this group. In addition to this group, Fuel is another well-known project focused on PE, using its own smart contract language.
Aptos
Aptos is built on Diem's Move language and MoveVM, creating a high-throughput chain that implements parallel execution.
Aptos's approach is to detect relationships while remaining transparent to users/developers, meaning that transactions are not required to explicitly declare which part of the state (memory location) they are using.
Aptos uses a modified version of Software Transactional Memory (STM), called Block-STM.
In Block-STM, transactions are pre-sorted within blocks and divided among processor threads for execution.
During execution, every transaction is assumed to be independent of the others. The memory locations modified by each transaction are recorded, and after execution, the results of all transactions are validated. During validation, if a transaction is found to have accessed a memory location modified by a previous transaction, that transaction is aborted: its results are discarded and it is re-executed.
This process repeats until all transactions in the block have been executed.
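The execute, validate, and re-execute loop described above can be illustrated with a heavily simplified sketch. It is not Block-STM itself: the real engine runs speculation and validation concurrently across threads, whereas this version simulates the two phases one after the other. All names (`execute_block`, `run_recording`, `transfer`) and the integer-balance state are assumptions made for illustration.

```python
from typing import Callable

State = dict[str, int]
# A transaction reads state through `get` and returns the writes it wants to apply.
TxFn = Callable[[Callable[[str], int]], State]


def run_recording(tx: TxFn, state: State) -> tuple[State, State]:
    """Run tx against `state` and return (values it read, values it wants to write)."""
    reads: State = {}

    def get(key: str) -> int:
        reads[key] = state.get(key, 0)
        return reads[key]

    writes = tx(get)
    return reads, writes


def execute_block(txs: list[TxFn], state: State) -> tuple[State, int]:
    """Optimistic execution: returns the final state and the number of re-executions."""
    # Phase 1: speculate. Every transaction runs against the pre-block state as
    # if no dependencies existed. These runs do not depend on each other, so a
    # real engine could spread them across processor threads.
    speculative = [run_recording(tx, state) for tx in txs]

    # Phase 2: validate in the preset block order.
    committed = dict(state)
    modified: set[str] = set()
    reexecutions = 0
    for tx, (reads, writes) in zip(txs, speculative):
        if not modified.isdisjoint(reads):
            # The transaction read a location that an earlier transaction
            # modified: discard its result and run it again on fresh state.
            reads, writes = run_recording(tx, committed)
            reexecutions += 1
        committed.update(writes)
        modified.update(writes)
    return committed, reexecutions


def transfer(src: str, dst: str, amount: int) -> TxFn:
    """Build a payment transaction (no balance checks, for brevity)."""
    def tx(get: Callable[[str], int]) -> State:
        return {src: get(src) - amount, dst: get(dst) + amount}
    return tx


start = {"alice": 10, "bob": 0, "carol": 10, "dave": 0}
block = [transfer("alice", "bob", 5), transfer("carol", "dave", 3), transfer("bob", "dave", 2)]
final_state, reexecutions = execute_block(block, start)
print(final_state, reexecutions)
```

Here the first two payments validate on the first attempt, while the third reads balances that the earlier payments changed and is therefore executed a second time. The final state matches what sequential execution would produce.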
When using multiple processor cores, Block-STM accelerates execution, with the degree of acceleration depending on the interdependence of transactions.
Results from the Aptos team indicate that using 32 cores yields an 8x speedup when interdependence is high and a 16x speedup when it is low. If all transactions in a block are interdependent, Block-STM incurs a slight performance loss compared to sequential execution. Aptos claims that this method can achieve a throughput of 160,000 TPS.
Sui
Another PE approach requires transactions to explicitly declare which parts of the chain state they modify, a method currently used by Solana and Sui.
Solana refers to memory units as accounts, and transactions must specify which accounts they modify. Sui also employs a similar approach.
Sui is also built on Diem's technology using MoveVM. However, Sui uses a different version of the Move language.
The implementation of Sui Move alters Diem's core storage model and asset permissions, representing a significant difference from Aptos, which uses core Diem Move.
Sui Move defines a state storage model that allows for easier identification of independent transactions.
In Sui, state storage is defined as Objects. Objects typically represent assets and can be shared, meaning multiple users can modify the same object. Each Object in the Sui execution environment has a unique ID and an internal pointer to the owner's address. By using these concepts, it becomes easier to identify relationships by checking whether transactions use the same Objects.
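Because objects are declared up front, finding related transactions reduces to an index lookup by object ID, with no need to execute anything first. The sketch below is a loose illustration of that idea only. The `Obj` and `Tx` classes, the IDs, and the owner names are hypothetical and do not reflect Sui's actual data structures or APIs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Obj:
    """Simplified stand-in for an object: unique ID, owner address, shared flag."""
    id: str
    owner: Optional[str]
    shared: bool = False


@dataclass(frozen=True)
class Tx:
    name: str
    objects: tuple  # Obj instances the transaction declares it will use


def contended_objects(txs):
    """Map each object ID used by more than one transaction to those transactions."""
    users = defaultdict(list)
    for tx in txs:
        for obj in tx.objects:
            users[obj.id].append(tx.name)
    return {oid: names for oid, names in users.items() if len(names) > 1}


coin_a = Obj("0xa1", owner="alice")
coin_c = Obj("0xc1", owner="carol")
coin_e = Obj("0xe1", owner="erin")
coin_f = Obj("0xf1", owner="frank")
pool = Obj("0xp1", owner=None, shared=True)  # shared: many users may modify it

block = [
    Tx("pay-1", (coin_a,)),
    Tx("pay-2", (coin_c,)),
    Tx("swap-1", (coin_e, pool)),
    Tx("swap-2", (coin_f, pool)),
]
contended = contended_objects(block)
print(contended)
```

Only the shared pool shows up as a point of contention in this example; the two payments touch disjoint objects and are independent of everything else in the block.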
By shifting the responsibility of declaring relationships to developers, the implementation of the execution engine becomes easier, theoretically allowing for better performance and scalability. However, this comes at the cost of a less than ideal developer experience.
Sui has not yet launched its mainnet and only recently released its testnet.
The founders of Sui claim that the implementation of parallel execution, along with the use of the Narwhal and Tusk consensus mechanisms, leads to a throughput exceeding 100,000 tx/second. If true, this throughput could represent a significant improvement over Solana's current throughput of about 2,400 tx/second and would surpass the throughput of Visa and Mastercard.
Linera
Linera is the latest member in the field of parallel processing, recently announcing their first round of funding led by a16z. There are few details about the project's implementation. However, based on their funding announcement post, we know it is based on the FastPay protocol, also developed at Facebook.
FastPay is based on a technology called Byzantine Consistent Broadcast, which focuses on accelerating independent payments, such as those occurring in point-of-sale networks. It allows a group of validators to ensure the integrity of payments as long as more than two-thirds of the validators are honest. FastPay is a variant of a real-time gross settlement (RTGS) system used for networks between banks and financial institutions.
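The two-thirds honesty assumption translates into a simple counting rule. The sketch below shows the standard Byzantine fault tolerance arithmetic for equally weighted validators; it is an illustration of the threshold only, not a description of FastPay's actual message flow, and the function names and validator IDs are hypothetical.

```python
def max_faulty(n: int) -> int:
    """Largest f with n >= 3f + 1, i.e. more than two-thirds of validators are honest."""
    return (n - 1) // 3


def quorum_size(n: int) -> int:
    """Smallest signature count such that any two quorums share an honest validator."""
    return (n + max_faulty(n)) // 2 + 1


def is_certified(signers: set[str], validators: set[str]) -> bool:
    """Treat a payment as certified once a quorum of known validators has signed it."""
    return len(signers & validators) >= quorum_size(len(validators))


validators = {"v1", "v2", "v3", "v4"}
print(max_faulty(4), quorum_size(4))
print(is_certified({"v1", "v2", "v3"}, validators))
```

With four validators, one may be faulty and three signatures are needed. Signatures from unknown parties are ignored by the set intersection.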
Building on FastPay, Linera plans to establish a blockchain that focuses on fast settlement and low latency by executing payment transactions in parallel. Notably, Sui also uses a Byzantine Consistent Broadcast approach for simple payments. For other transactions, Sui's own consensus mechanisms, Narwhal and Tusk, are used to efficiently process more complex, interdependent transactions such as DeFi transactions.
Fuel
Fuel focuses on being the execution layer in a modular blockchain, meaning Fuel does not implement consensus or store blockchain data on the Fuel chain. To form a functional blockchain, Fuel relies on other chains, such as Ethereum or Celestia, for consensus and data availability.
Fuel uses UTXO to create strict access lists, that is, lists that control access to the same piece of state. This model is built on the concept of regulating transaction ordering. In this scheme, the ordering of transactions within a block significantly simplifies the detection of relationships between transactions. To implement this architecture, the Fuel team has developed a new virtual machine called FuelVM and a new language called Sway.
FuelVM is a compatible, simplified version of the EVM, which makes it easier for developers to join the Fuel ecosystem.
Additionally, since Fuel focuses on modular blockchains, the execution of Fuel SCs can be settled on the Ethereum mainnet. This approach aligns with the vision of post-merge Ethereum as a rollup-centric settlement and data availability layer. In this architecture, Fuel can provide high-throughput execution, with transactions batched and settled on Ethereum.
To validate this concept, the Fuel team has created an AMM called SwaySwap, similar to Uniswap, and is running it on the testnet. The goal is to demonstrate that FuelVM outperforms EVM.
Challenges of Parallel Execution Approaches
While the methods of parallel execution seem logical and straightforward, we still face several challenges. The first is estimating the actual percentage of transactions that can be accelerated using this parallel execution method. The second challenge is the decentralization of the network, meaning if validators can easily scale computational power to increase throughput, how can full nodes keep up to ensure the correctness of the chain?
Percentage of Parallelizable Transactions
Accurately estimating the percentage of on-chain transactions that can be executed in parallel on any chain is challenging. Furthermore, this percentage can vary significantly between blocks depending on the type of network activity.
For example, an NFT mint might lead to a surge of highly interrelated transactions. That said, we can use some assumptions to obtain a rough estimate of the average percentage of parallelizable transactions.
For instance, we can assume that most ETH and ERC20 transfers are independent, meaning they are sent from different addresses and received by different addresses. We can then assume that only about 25% of ETH and ERC20 transfers are interrelated, for example deposits to SCs or exchanges aggregating assets from hot wallets into cold wallets.
On the other hand, all AMM transactions in the same pool are related. Given that most AMMs are typically dominated by a few pools, and AMM transactions have high composability and interact with multiple pools, we can safely assume that at least 50% of AMM transactions are interrelated.
By analyzing the categories of transactions on Ethereum, we can find that among Ethereum's approximately 1.2 million daily transactions, 20-30% are ETH transfers, 10-20% are stablecoin transfers, 10-15% are DEX transfers, 4-6% are NFT transactions, 8-10% are ERC20 approvals, and 12-15% are other ERC20 transfers.
Using these figures and assumptions, we can estimate that PE could accelerate about 70-80% of transactions on SC platforms.
This means that the sequential execution of related transactions accounts for 20-30% of all transactions. In other words, if using the same Gas limit, it is possible to achieve a 3-5 times increase in throughput through PE.
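One way to read the 3-5x figure is through Amdahl's law, which bounds the speedup of a workload when a fixed share of it must run sequentially. The source does not name this law, so treat the sketch below as an interpretation. It assumes that every transaction costs the same to execute and that all related transactions form a single sequential chain; the function name is hypothetical.

```python
def amdahl_speedup(serial_fraction: float, cores: float = float("inf")) -> float:
    """Upper bound on speedup when `serial_fraction` of the work cannot be parallelized."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for serial in (0.20, 0.30):
    print(
        f"serial={serial:.0%}  "
        f"32 cores: {amdahl_speedup(serial, 32):.1f}x  "
        f"limit: {amdahl_speedup(serial):.1f}x"
    )
```

With 20-30% of transactions executed sequentially, the bound with unlimited cores is between roughly 3.3x and 5x, which is consistent with the 3-5x estimate. With 32 cores the bound is slightly lower, at about 3.1x to 4.4x.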
Some experiments on building a parallel execution EVM have shown similar estimates, where a sustained throughput increase of 3-5 times can be achieved.
In practice, high-throughput chains use higher Gas limits and shorter block times to achieve throughput improvements of at least 100 times compared to Ethereum. The increased throughput requires robust validator nodes to handle these blocks, leading to the second challenge of network centralization.
Network Centralization
In high-throughput networks, the network can process tens of thousands of transactions per second.
Validator nodes are incentivized by fees and network rewards to process these transactions and invest in dedicated servers or scalable cloud architectures to handle this transaction load. However, for companies or individuals using the chain and needing to run full nodes to interact with the chain, the situation is different. These entities cannot afford complex servers to handle such a large transaction load. This will drive on-chain users to rely on specialized RPC node providers, such as Infura, leading to greater centralization.
Without the option to run full nodes on consumer-grade hardware, high-throughput chains may become closed systems, with a small number of entities holding absolute power over the network. In this scenario, these entities could coordinate the censorship of transactions, entities, or even applications, such as Tornado Cash, potentially turning these chains into permissioned systems not much different from Web 2.
Currently, the requirements for operating a full node on the Sui testnet are lower than those for Aptos testnet nodes. However, we expect these requirements to change significantly when the mainnet launches and applications begin to appear on the chain.
Decentralization advocates have been proposing solutions to address these anticipated issues. These solutions include using lightweight nodes to verify the correctness of blocks through ZK validity proofs or fraud proofs.
The Fuel team is proactive in this regard, aligning with the Ethereum community's spirit regarding the importance of decentralization. It is unclear whether the Aptos and Sui teams prioritize implementing these methods or other measures to promote decentralization. The Linera team briefly discussed these issues in their introductory post, but the protocol implementation has yet to confirm this commitment.
Conclusion
Parallel execution engines are a promising solution for increasing the throughput of smart contract platforms.
Combined with innovations in consensus mechanisms, the parallel execution of transactions could bring the throughput of chains close to or exceeding 100,000 TPS, a performance that could rival Visa and Mastercard, enabling some of today's most challenging use cases, such as fully on-chain gaming and decentralized micropayments.
These impressive throughput improvements do not come without challenges, particularly regarding how to preserve decentralization, and we look forward to seeing founders commit to addressing these issues.