In-depth Dialogue: Understanding Sui's Design Philosophy and Network Scalability from the Ground Up
Author: Sui Network
Recently, we interviewed George Danezis to discuss the complexity and scalability of the Sui infrastructure, as well as how Sui's transaction processing system facilitates a high-performance network. George Danezis is the co-founder and chief scientist of Mysten Labs (the initial contributor to Sui) and a professor in the field of security and privacy engineering at University College London.
Here is the content of the interview:
Q1 You come from an academic background; can you introduce your research focus?
I am a professor at University College London (UCL), and my research focus broadly falls under security and privacy. In the early 2000s, I conducted quite a bit of research on peer-to-peer systems and anonymous systems, many of which were large distributed systems focused on storage. As the blockchain space as a whole became more execution-focused, with Ethereum as the leading example, I became interested in distributed ledgers and blockchains and how to execute smart contracts. I was already familiar with the permissionless nature from my early work on peer-to-peer systems. So, my research group at UCL began exploring how to build higher-performance systems. We founded Chainspace to commercialize some of our ideas, and later the team was acquired by Facebook. Then, we helped Facebook propose solutions for scaling the Libra/Diem blockchain. But when the project did not progress, I left to seek other opportunities to realize the idea of high-performance blockchains.
Q2 You are still a professor; what do you think is the difference between application and research?
In fact, there is not much difference. When we conduct research, we consider all the possibilities for achieving specific goals, such as building a high-performance blockchain or specific functionalities. Of course, when building a blockchain or choosing specific features to use in a real system, we must select one of the possibilities. We must continually make judgments about which of these good ideas is actually most useful to people. Which one are people seeking? What are the bottlenecks in blockchain adoption? What prevents people from doing what they want to do? When building systems, you still consider all possibilities and try to understand the possible scenarios from the academic literature, then choose the most relevant ones. It is not just about intellectual curiosity; it is about creating value for users.
Q3 How do you determine which problems to solve when moving from theory to practical application?
The main problem I address in my research is how to scale different functionalities of blockchains. I focus on the system aspects of blockchains, such as how to increase transaction throughput and reduce latency. The issues in this area are evident; whenever we see a contract on Ethereum becoming very popular, the Ethereum platform cannot handle such a large transaction volume, leading to transaction congestion and skyrocketing fees. Whenever a blockchain succeeds, we see that the transaction volume it attracts exceeds its existing capacity. Therefore, it is clear that the problem lies in not having enough capacity to meet what people want to do on these blockchains. This is not just based on our ideas; we see this happening time and again. For a while, this was considered a valuable challenge, not just within my team but actually across the entire academic community, as everyone was trying to solve this problem in different ways. Now, quite a few technologies have been developed to scale the capabilities of blockchains to address these challenges. But at that time, it was well known that many people were trying to solve it in different ways.
Q4 L2 networks are one proposed solution to the scalability problem; what are the differences and benefits compared to building a new L1 network like Sui?
L2 is a solution for scaling within the Ethereum ecosystem. However, for application developers, using L2 networks can be a bit tricky. When an L2 network tries to interact with Ethereum, bridging activities must occur, which is true for any L2/L1 relationship. The state representing coins, assets, or other content on L1 must be mirrored on L2 and vice versa. In addition, L2 must have some mechanism for L1 to verify everything that happens within it. But this is just the first part; any asset that exists on L1 needs to be transferred to L2, some activity must occur on L2, and then somehow the assets must be transferred back to L1. This is quite cumbersome.
For tokens, which are fungible assets, this bridging activity is relatively smooth because people have two accounts and a bridging middleware. However, for more general assets, it does not work well. To actually use L2 networks to develop more complex applications than tokens on Ethereum, you need smart contracts on both sides, one for minting and the other for burning. They must shuttle between two different ecosystems, which is a custom activity for each contract. You cannot simply say, "I will create an L2 network, then take all the assets away, operate as I wish, and then bring them back," as there is no such concept. It is a manual process and very prone to errors. Therefore, it is not a good experience. Imagine you have assets on multiple different L2 networks, and you have these custom smart contracts on different L2 networks. Every time you want to operate on a state located on another L2 network, you must bridge all the way back to L1 and then back to L2. You cannot easily say, "I just did something on this blockchain, and now I want to do something else on another blockchain without considering which L1 or L2 it is on. Everything is here; I have it in hand, ready to perform more transactions on any state I want to access." This is why the experience of having states scattered across L2 networks is poor. Moving assets between different chains is very cumbersome, and users cannot help but notice it. This is why L2 networks have never really interested me.
Another example is Cosmos, which has a very interesting ecosystem that takes a different approach to scaling by using different blockchains for different apps. We can have different transaction speeds on different chains, and when operations need to occur between different apps, assets can be bridged between chains, but it faces the same issues. Every time you want to use a different app, you first have to bridge, which is a delicate step that users cannot help but notice, and then you can use that app and bridge back. You find yourself spending more time transferring assets from one chain to another than doing what you really want to do.
On Sui, our solution is to build one large database whose state is replicated by all validators. Once you complete a transaction, all states in the same database can be used for the next transaction, and users do not have to constantly move asset states between L1 and L2.
Q5 Sui Lutris is the foundation of the Sui protocol; what are its key innovations that enable Sui to have high throughput and low latency?
Sui Lutris consists of two key ideas: (1) many operations on the blockchain do not actually require consensus; (2) when you do need consensus, there is a very high-throughput way to reach it. Sui Lutris combines these two approaches. It is the core of the Sui distributed system, ensuring that when transactions occur on a distributed network, two different validating nodes following the protocol will never be in an inconsistent state. Thus, there will not be a situation where one validating node thinks you spent a coin and sent it to Alice, while another validating node thinks the same coin was actually sent to Bob.
Sui Lutris:
https://tech.mystenlabs.com/sui-lutris-the-distributed-system-protocol-at-the-heart-of-sui
Sui has two different paths: one does not require consensus (the fast path), and the other requires consensus (the consensus path). When the objects you are operating on belong solely to you, such as your own NFT character and the hat you want to combine it with so that your character can wear the hat, no one else should be able to operate on them. In these cases, Sui uses the fast path, which means you can operate on your own objects and achieve transaction finality without waiting for consensus, ensuring the transaction occurs and the hat is on your NFT's head.
But in some cases, transactions involve objects that are not solely yours; they are shared by many people. For example, if there is an auction selling small hats, this type of auction is represented in Sui as a shared object. People can bid, and the highest bidder wins the hat. This auction is an object that does not belong to a single entity; everyone must be able to bid and update the shared state regarding the latest bids, and these types of operations require consensus. Sui Lutris allows you to have shared objects and perform transactions on them, enabling you to own other objects, change the state of shared objects, or create new shared objects. It allows both paths to coexist, and it lets objects owned exclusively by specific individuals interact with shared objects used by many people.
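The distinction between the two paths can be illustrated with a small sketch. This is not Sui's implementation; the names `Ownership`, `Obj`, and `choose_path` are hypothetical, and the only rule modeled is the one described above: a transaction that touches only exclusively owned objects can take the fast path, while any shared input sends it to the consensus path.

```python
from dataclasses import dataclass
from enum import Enum


class Ownership(Enum):
    OWNED = "owned"    # exclusively owned by a single address
    SHARED = "shared"  # anyone may submit transactions that touch it


class Path(Enum):
    FAST = "fast path"
    CONSENSUS = "consensus path"


@dataclass(frozen=True)
class Obj:
    # Hypothetical stand-in for an on-chain object.
    name: str
    ownership: Ownership


def choose_path(inputs: list[Obj]) -> Path:
    """Route to consensus as soon as one input object is shared."""
    if any(o.ownership is Ownership.SHARED for o in inputs):
        return Path.CONSENSUS
    return Path.FAST


character = Obj("nft_character", Ownership.OWNED)
hat = Obj("hat", Ownership.OWNED)
auction = Obj("hat_auction", Ownership.SHARED)

# Putting your own hat on your own character touches only owned objects.
print(choose_path([character, hat]).value)
# Bidding in the auction touches a shared object.
print(choose_path([character, auction]).value)
```

In this toy model the hat example from the interview resolves to the fast path and the auction bid to the consensus path.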
These two different paths have different advantages. The fast path for exclusively owned objects has extremely low latency, taking less than a second, and can scale widely. The consensus path has higher latency, usually over a second, and a fairly high capacity, but it is more challenging to scale compared to the first path. On Sui, those who drive on-chain apps with millions of transactions daily typically use the first path and largely structure their apps so that most transactions operate on exclusively owned objects rather than shared objects. On the other hand, protocols performing complex tasks (such as DeFi) usually implement the second type of transaction because they must combine bids or liquidity from many different people to execute operations.
Q6 Can app developers on Sui design their apps to take advantage of the fast path?
Yes, absolutely. I think this is a core task for app designers to scale. Smart contract developers have complete control over whether the objects they operate on in their contracts are exclusively owned objects or shared objects at any given time. One trick to scaling apps on Sui is to ensure that most operations are essentially performed on exclusively owned objects, as Sui can manage many operations you want at very low latency, which is a great experience. Operations necessary for games should fall into this category, as their latency is very low compared to operations that require mediation through shared states and shared objects. Once clicked, transactions can be completed immediately on the network.
Smart contract designers have complete control over this; they can essentially specify exactly what transactions fall into each category. Of course, the first version of the contract might treat everything as shared state, and everything would go through the higher-latency consensus path, but as the need for scaling arises, developers need to consider to what extent they can avoid those parts.
Q7 How do programmable transaction blocks play a role in this?
Programmable transaction blocks can operate on either the fast path or the consensus path. If a programmable transaction block only involves your exclusively owned objects, it means you can perform multiple operations in a single transaction on-chain. For example, suppose you are a CEX app where many people buy and sell different coins; you can perform a single transaction on-chain that conceptually corresponds to what people are buying and selling. But because you are an exchange, they all belong to you, so you can settle a thousand transactions simultaneously, which is the fast path. On the other hand, if some objects within the programmable transaction block are shared, it enters the consensus path, where the latency will be slightly higher, not less than a second but several seconds.
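The batching idea can be sketched as follows. This is an illustrative toy, not the Sui SDK: `Command`, `TransactionBlock`, and the object identifiers are hypothetical names, and the set of shared object IDs is simply passed in as an assumption. It shows one block carrying a thousand settlement commands over exchange-owned objects, and how adding a single shared object changes the path of the whole block.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Command:
    # One step inside a block, with the object IDs it reads or writes.
    description: str
    object_ids: tuple[str, ...]


@dataclass
class TransactionBlock:
    # Hypothetical registry of which object IDs are shared.
    shared_ids: frozenset[str]
    commands: list[Command] = field(default_factory=list)

    def add(self, description: str, *object_ids: str) -> None:
        self.commands.append(Command(description, object_ids))

    def path(self) -> str:
        """The whole block goes through consensus if any command touches a shared object."""
        touched = {oid for c in self.commands for oid in c.object_ids}
        return "consensus" if touched & self.shared_ids else "fast"


block = TransactionBlock(shared_ids=frozenset({"liquidity_pool"}))

# A thousand settlements over balances the exchange owns exclusively.
for i in range(1000):
    block.add(f"settle trade {i}", f"exchange_balance_{i % 10}")
print(len(block.commands), block.path())

# One command touching a shared pool moves the whole block to consensus.
block.add("swap against pool", "exchange_balance_0", "liquidity_pool")
print(len(block.commands), block.path())
```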
Q8 The mainnet has been live for over 100 days; has Sui's performance validated your research theories? Is there anything that surprised you?
Several things have validated Sui's design, but there are also some thought-provoking aspects. One is that during periods of particularly high transaction volume, when daily transactions at one point exceeded 60 million, most of those transactions were on the fast path. Sui Lutris is highly scalable and has very low latency. Before that, it was unclear whether anyone would use this path, but when a large number of transactions and low latency were needed, it was used and very effectively! It is easy to see that this method works. On those days, Sui's transaction volume exceeded the total of all other blockchains. This is an interesting validation that Sui's design is sound.
At the same time, the Sui community found the fast path to be somewhat subtle. Because the owners of objects must manage the order of operations on their own objects to some extent, mistakes can occur. Sometimes they might even use libraries that do not help them, and the library itself might have bugs, causing objects to be locked. Typically, they will be unlocked at the end of the day, i.e., at the end of an epoch, but this is not a good experience. Designers of smart contracts may feel apprehensive about this, fearing that such situations could lead to errors, which prevents them from fully utilizing the facilities of low latency and scalability. A whole set of technologies is being developed to allow those with mistakenly locked objects to quickly unlock them within seconds. Therefore, if you try to use the fast path and an error occurs, locking your object, you can immediately use the consensus path to unlock it without waiting for the end of an epoch.
Moreover, strangely, this is not just about avoiding errors; it also allows developers to express more through the fast path, as there are some potential technologies where some objects are not solely owned by one party. Perhaps there is an object that you and I jointly own because it is shared, and typically transactions on that object must go through the consensus path. However, if Sui has a way to quickly unlock objects, developers can actually attempt to transact through the fast path. In the case where you and I happen to transact on the same object at the same time, the object will be locked because the system cannot decide which transaction occurs next, and then Sui can unlock it and route it through the consensus path, treating it as shared and resolving the conflict. But this situation is unlikely to occur unless people intentionally try to compete. Once Sui has the capability to unlock objects, it should be able to let objects owned by multiple people transact through the fast path. The game here is to pass as many transactions as possible through the fast path, and this is one of the initiatives being developed to assist the builder community.
Q9 Can you share more details about the current reasons for object locking?
When an object belongs to you, Sui does not need to go through consensus to learn the order of the operations on it, because no one else can operate on your object. Sui relies on you to tell the system that action A will occur first, action B will occur next, and action C will occur last. The system still needs to check whether A, B, and C are seen in the same order by everyone. The system achieves this through a distributed protocol that only checks whether we all see A, B, and C in sequence. The problem arises if you make a mistake or if your software makes an error. For example, if both your phone and your computer control your assets, your phone may indicate that A occurs first, while your computer indicates that B occurs first. You have mistakenly ordered two different things. This is a contradiction. In this case, Sui would say, "Well, the entity I entrusted to tell me the order seems to have given me two contradictory things, so I don't know what to do. I don't know how to resolve this issue." Sui would typically resolve this kind of issue through the consensus path, but here you are trying to use the fast path. So Sui raises its hand and says, "Well, there is an error here."
The initial assumption was that this situation would not occur frequently, but it turns out it happens often because people use different devices or try to transact multiple times on the same object simultaneously. Currently, when these objects are locked, Sui waits until the end of an epoch to unlock them, which is very concerning. Imagine if your assets are unusable for a day; this could actually be a serious problem.
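The locking behaviour described above can be modeled with a toy simulation. It is a sketch under simplifying assumptions, not the actual Sui Lutris protocol: `Validator` and `try_certify` are hypothetical names, every validator has equal weight, a transaction is considered certified once more than two thirds of validators sign it, and each validator signs only the first transaction it sees for a given object version.

```python
class Validator:
    def __init__(self):
        # (object_id, version) -> digest of the first transaction seen
        self.locks = {}

    def sign(self, object_id, version, tx_digest):
        """Sign only the first transaction seen for this object version."""
        first = self.locks.setdefault((object_id, version), tx_digest)
        return first == tx_digest

    def end_epoch(self):
        # Assumption: locks are dropped at the epoch boundary.
        self.locks.clear()


def try_certify(validators, object_id, version, tx_digest):
    """Return True if a quorum (more than two thirds) signs the transaction."""
    quorum = 2 * len(validators) // 3 + 1
    votes = sum(v.sign(object_id, version, tx_digest) for v in validators)
    return votes >= quorum


validators = [Validator() for _ in range(4)]

# The phone's transaction A reaches two validators first,
# the computer's transaction B reaches the other two.
validators[0].sign("coin", 1, "A")
validators[1].sign("coin", 1, "A")
validators[2].sign("coin", 1, "B")
validators[3].sign("coin", 1, "B")

# Neither transaction can gather a quorum: the object is stuck.
locked = not try_certify(validators, "coin", 1, "A") and \
         not try_certify(validators, "coin", 1, "B")
print("locked:", locked)

# At the end of the epoch the locks are released and A can go through.
for v in validators:
    v.end_epoch()
print("certified after epoch change:", try_certify(validators, "coin", 1, "A"))
```

In this model the two conflicting orders split the validators two against two, so the object stays unusable until the locks are cleared, which corresponds to waiting for the end of the epoch.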
Therefore, Sui now needs to evolve to take the correct action when something is locked. If the entity entrusted to provide the correct order gives an ambiguous sequence, Sui will resolve the entire situation through consensus. This will happen within seconds rather than at the end of an epoch.
Q10 Much of your research revolves around privacy. What are your thoughts on how public chains can best balance transparency, traceability, and privacy?
In public chains, how to balance transparency, traceability, and privacy is a question very relevant to applications, and my perspective on privacy is that what needs to be kept private largely depends on the application itself. For example, on Sui, it makes sense to let application developers develop contracts to protect their users' privacy. Because some people just want to develop games and may not be as concerned about privacy issues. Others may want to handle financial transactions on the blockchain, where privacy may be more concerning, but at the same time, it also involves other types of regulatory issues. So Sui's stance is that we will provide you with a good platform, and you need to build privacy on this platform.
To help people build privacy, Sui offers some cryptographic native support that may be useful when designing smart contracts. One of the most important is the ability to verify zero-knowledge proofs on Sui. There is a native function to verify one of the most widely used and understood schemes, the Groth16 scheme developed by my colleague Jens Groth. This means that, in practice, app designers can verify certain events off-chain without revealing what those events are. This is a fundamental building block for constructing a whole class of privacy-friendly applications that keep some states off-chain, but on-chain, you can verify that anything occurring off-chain is correct.
Application developers determine what kind of privacy protection their applications need and use these native supports to combine on-chain, off-chain, and encryption strategies to address the privacy issues they may encounter.
Q11 Is there more native support for privacy on Sui?
The community is considering the support developers need to write smart contracts in a more privacy-friendly manner, with zero-knowledge proofs being one of them. Some may think Sui needs more general mathematical or cryptographic functions on-chain. We would be happy to see smart contract designers provide feedback on what is missing, and there are other whole classes of technologies that can be used to protect privacy, such as multi-party computation or trusted hardware. Different blockchains have been moving in these directions, which require very complex additional systems. There needs to be sufficient evidence in the community to show that people want these technologies, as they represent some fundamental changes to the Sui architecture. However, if the community wants to move in this direction, there will be a process for proposing the addition of privacy protection methods.
Q12 How do you see Sui developing in the next 6 to 12 months?
It depends on what kind of applications people develop on Sui. In the short term, many improvements will target the applications that people are actually building. Over the longer term (and by blockchain standards, 6 to 12 months can be seen as a long time), we will improve the Sui Lutris protocol to achieve lower latency, simpler protocols, and better scalability for Sui. Additionally, this work will make the network more economically efficient, allowing validating nodes to run on more constrained hardware and to use existing hardware for actual transaction execution rather than for cryptographic or other blockchain overhead. This is what we expect to see.