Decoding the Holy Grail: Challenges and Solutions of On-Chain Fully Homomorphic Encryption
Written by: Jeffrey Hu, Arnav Pagidyala
Core Insights:
- Fully Homomorphic Encryption (FHE) is hailed as the "holy grail of cryptography," but its applications are currently limited by performance, development experience, and security concerns.
- As shown in the image above, to build a truly confidential and secure shared state system, FHE needs to be combined with Zero-Knowledge Proofs (ZKPs) and Multi-Party Computation (MPC).
- FHE is rapidly evolving; the development of new compilers, libraries, hardware, etc., along with the R&D efforts of Web2 companies (such as Intel, Google, DARPA, etc.), has greatly accelerated the advancement of FHE.
- As FHE and its surrounding ecosystem mature, we expect "verifiable FHE" to become the standard. Decentralized applications (DApps)/rollups may choose to outsource computation and verification to FHE co-processors.
- A fundamental limitation of on-chain FHE is "who holds the decryption key." Threshold decryption and MPC provide solutions to this limitation, but they often involve trade-offs between performance and security.
Introduction:
The transparency of blockchain is a double-edged sword. While its openness and observability are appealing, they are also precisely the concerns enterprises have when adopting blockchain technology.
On-chain privacy has been one of the most challenging issues in the field of cryptography for nearly a decade. Although many teams are building systems based on Zero-Knowledge Proofs (ZKPs) to achieve on-chain privacy, they cannot support shared encrypted states. The reason is that these solutions are a series of functions of ZKPs, making it impossible to apply arbitrary logic to the current state. This means we cannot simply use ZKPs to build encrypted shared state applications (like private Uniswap).
However, recent technological breakthroughs indicate that combining ZKPs with Fully Homomorphic Encryption (FHE) can achieve fully generalized, encrypted decentralized finance (DeFi). How is this achieved? FHE is an emerging area of cryptography that allows arbitrary computations on encrypted data. As shown in the image above, ZKPs can prove the integrity of user inputs and computations, while FHE can handle the computations themselves.
While FHE is hailed as the "holy grail of cryptography," we will attempt to provide an objective analysis of this field and its various challenges and potential solutions. This technical report will cover the following on-chain FHE topics:
- FHE Schemes, Libraries, and Compilers
- Secure Threshold Decryption
- ZKPs for User Inputs + Computing Party
- Programmable, Scalable DA Layer
- FHE Hardware
Fully Homomorphic Encryption (FHE) Schemes, Libraries, and Compilers
Challenges: Emerging FHE Schemes, Libraries, and Compilers
The fundamental bottleneck of on-chain FHE lies in the lag of its development tools and infrastructure. Unlike Zero-Knowledge Proofs (ZKPs) or Multi-Party Computation (MPC), FHE has developed relatively recently since 2009, resulting in lower maturity.
The main limitations of the FHE development experience include:
- A lack of user-friendly front-end languages that allow developers to code without needing to deeply understand back-end cryptography.
- A lack of fully functional FHE compilers that can handle all complex tasks (such as parameter selection, SIMD optimization, and parallel optimization for BGV/BFV).
- Existing FHE schemes (especially TFHE) are approximately 1000 times slower compared to ordinary computation.
To truly understand the complexity of integrating FHE, let’s look at the process developers must go through:
The first step in integrating FHE into an application is to choose an FHE scheme. The table below explains the main schemes:
As shown in the table above, Boolean circuits (such as FHEW and TFHE) have the fastest bootstrapping speeds, avoiding complex parameter selection processes and can be used for arbitrary/general computation, but are relatively slower; while BGV/BFV is suitable for general DeFi applications because they are more efficient in high-precision arithmetic computations, but developers must know the depth of the circuit in advance to set all parameters of the scheme. On the other hand, CKKS supports homomorphic multiplication and allows for precision errors, making it suitable for imprecise use cases like machine learning.
As a developer, you need to be very careful in choosing an FHE scheme, as it will affect all other design decisions and future performance. Additionally, several key parameters are crucial for correctly setting up the FHE scheme, such as the choice of modulus size and the role of polynomial degree.
Next, we discuss libraries, and the table below shows the features and capabilities of the currently widely used FHE libraries:
However, the relationships of these libraries with different FHE schemes and compilers vary, as shown below:
After selecting an FHE scheme, developers need to set parameters. Correctly choosing parameters will have a significant impact on the performance of the FHE scheme. The difficulty lies in the fact that this requires an understanding of abstract algebra, FHE-specific operations (such as re-linearization and modulus switching), and arithmetic or binary circuits. Finally, to effectively choose parameters, one needs to conceptually understand how they affect the FHE scheme.
At this point, developers may ask:
How large does my plaintext space need to be? What ciphertext size is acceptable? Where can I perform parallel computations? And so on…
Moreover, although FHE can support arbitrary computations, developers need to change their mindset when writing FHE programs. For example, you cannot write a branch (if-else) based on a variable in the program because the program cannot treat variables as ordinary data for direct comparison. Instead, developers need to change the code from branches to some computation that can encompass all branching conditions. Similarly, this also prevents developers from writing loops in the code.
In short, for developers unfamiliar with FHE, integrating FHE into their applications is nearly impossible. This will require significant development tools and infrastructure to abstract the complexity presented by FHE.
Solutions:
1. Web3-Specific FHE Compilers
While there are existing FHE compilers built by companies like Google and Microsoft, they:
- Are not designed with performance as a goal, adding a 1000x "overhead" compared to directly writing circuits.
- Are optimized for CKKS (i.e., ML), which is not as beneficial for the BFV/BGV needed for DeFi.
- Are not built for Web3. They do not support compatibility with ZKP schemes, programmable blockchains, etc.
Until the emergence of the Sunscreen FHE compiler. It is a Web3-native compiler that provides optimal performance for arithmetic operations (such as DeFi) without requiring hardware accelerators. As mentioned, parameter selection can be the most challenging part of implementing an FHE scheme. Sunscreen not only automates parameter selection but also implements data encoding, key selection, re-linearization, modulus switching, circuit setup, and SIMD operations.
As technology continues to advance, we look forward to seeing not only in the Sunscreen compiler but also other teams building their own further optimizations that can support other high-level languages.
2. New FHE Libraries
While researchers are continually exploring new effective schemes, FHE libraries can also enable more developers to integrate FHE.
Take FHE smart contracts as an example. Although choosing from different FHE libraries can be challenging, it becomes easier when we talk about on-chain FHE, as there are only a few dominant programming languages in Web3.
For instance, Zama's fhEVM integrates Zama's open-source library TFHE-rs into a typical EVM, exposing homomorphic operations as precompiled contracts. This effectively enables developers to use encrypted data in their contracts without making any changes to the compilation tools.
For other specific scenarios, some additional infrastructure may also be needed. For example, the TFHE library written in C++ is not maintained as well as its Rust version. Although TFHE-rs can support exports to C/C++, if C++ developers want better compatibility and performance, they must write their own version of the TFHE library.
Finally, a core reason for the lack of FHE infrastructure in the market is that we lack effective ways to build FHE-RAM. One possible direction for research is to study an FHE-EVM that lacks certain opcodes.
Secure Threshold Decryption
Challenges: Insecure/Centralized Threshold Decryption
Every confidential shared state system is based on encryption and decryption keys. Since no single party can be trusted, the decryption key is shared among network participants through Multi-Party Computation (MPC).
One of the most challenging aspects of on-chain Fully Homomorphic Encryption (FHE) is building a secure and high-performance threshold decryption protocol. There are two main bottlenecks: (1) FHE-based computations introduce significant "overhead," requiring high-performance nodes, which essentially reduces the potential scale of the validator set; (2) as the number of nodes participating in the decryption protocol increases, latency also increases.
At least for now, FHE protocols rely on a majority of honest validators; however, as mentioned above, on-chain FHE implies a smaller validator set, thus increasing the likelihood of collusion and malicious behavior.
What if threshold nodes collude maliciously? They would be able to bypass the protocol, essentially decrypting anything, including user inputs to any on-chain data. Furthermore, it is crucial to note that in the current system, decryption protocols can occur unnoticed (i.e., "silent attacks").
Solutions: Improved Threshold Decryption or Dynamic MPC
There are several possible approaches to address the shortcomings of threshold decryption. (1) Enable n/2 thresholds, which would make collusion more difficult; (2) Run threshold decryption protocols within hardware security modules (HSMs); (3) Use a threshold decryption network based on trusted execution environments (TEEs), controlled by a decentralized chain for authentication, allowing dynamic key management.
Rather than relying on threshold decryption, a more likely scenario is to use MPC. A phenomenal example of an MPC protocol that can be used in on-chain FHE is Odsy's new 2PC-MPC, which is the first non-colluding and massively decentralized MPC algorithm.
Another approach might be to encrypt data using only the user's public key. Then the validator handles the homomorphic operations, and the user can decrypt the results themselves if necessary. While this only applies to specific use cases, we can entirely avoid threshold assumptions.
In summary, on-chain FHE requires an efficient MPC implementation that has the following characteristics: (1) works even in the presence of malicious actors; (2) introduces minimal latency; (3) allows permissionless/flexible node joining.
ZKPs for User Inputs and Computing Party
Challenges: Verifiability of User Inputs and Computation:
In the Web2 world, when we request a computation task, we fully trust a company to run the task behind the scenes as they promised. In Web3, this model is completely reversed. We no longer rely on a company's reputation and simply trust that they will act honestly; instead, we operate in a trustless environment, meaning users should not need to trust anyone.
While Fully Homomorphic Encryption (FHE) allows for computations on encrypted data, it cannot verify the correctness of user inputs or computations. Without the ability to verify, FHE is nearly useless in the context of blockchain.
Solutions: ZKPs for User Inputs and Computing Party:
While FHE allows anyone to perform arbitrary computations on encrypted data, ZKPs allow people to prove something is true without revealing the underlying information itself. So how are they related?
FHE and ZKPs combine in three key ways:
- Users need to submit a proof that their input ciphertext is in the correct format. This means the ciphertext conforms to the requirements of the encryption scheme, not just any arbitrary string of data.
- Users need to submit a proof that their input plaintext meets the conditions of the given application. For example, the account balance is greater than the transfer amount.
- The validating node (i.e., the computing party) needs to prove that they have correctly executed the FHE computation. This is referred to as "verifiable FHE," which can be seen as an analogy to the ZKPs required for rollups.
To further explore the applications of ZKPs:
1. ZKP for Ciphertext
FHE is built on lattice-based cryptography, which means the construction of encryption primitives involves lattices that are post-quantum secure. In contrast, popular ZKPs, such as SNARKs, STARKs, and Bulletproofs, do not rely on lattice-based cryptography.
To prove that a user's FHE ciphertext is in the correct format, we need to show that it satisfies a matrix-vector equation with entries from rings, and that the values of these entries are small. Essentially, we need a proof system designed for handling lattice-based relations that is cost-efficient for on-chain verification, which is an open research area.
2. ZKP for Plaintext Inputs
In addition to proving the format correctness of ciphertext, users also need to prove that their input plaintext meets the application's requirements. Fortunately, unlike the previous step, we can leverage general SNARKs to prove the validity of user inputs, allowing developers to utilize existing ZKP schemes, libraries, and infrastructure.
However, the challenge lies in the need to "connect" these two proof systems. Connecting means we need to ensure that users used the same input for both proof systems. Otherwise, a malicious user could deceive the protocol. It is important to emphasize that this is an extremely difficult cryptographic feat and an open area of early research.
Sunscreen has already laid the groundwork for this, as their ZKP compiler supports Bulletproofs, making it easier to connect with SDLP. Research on connecting FHE and ZKP compilers is also ongoing.
Mind Network has been leading the integration of ZKPs to ensure zero-trust inputs and outputs while leveraging FHE for secure computation.
3. ZKP for Correct Computation
Running FHE on existing hardware is extremely inefficient and expensive. As we have seen in the dynamic manifestation of the scalability trilemma, networks with high resource demands on nodes cannot scale to sufficient levels of decentralization. One possible solution is a process known as "verifiable FHE," where the computing party (validator) submits a ZKP to prove the honest execution of transactions.
An early prototype of verifiable FHE can be demonstrated through a project called SherLOCKED. Essentially, FHE computations are loaded onto Risc ZERO's Bonsai zkVM, which processes computations on encrypted data off-chain and returns results with ZKPs, updating the state on-chain.
A recent example utilizing FHE co-processors is Aztec's on-chain auction demonstration. As discussed earlier, existing FHE chains use threshold key encryption for the entire state, meaning the strength of the system relies solely on its threshold committee. In contrast, in Aztec, users own their data, thus not being constrained by threshold security assumptions.
Finally, it is important to understand where developers can first build FHE applications. Developers can build their applications on FHE-supported L1, FHE rollups, or on any EVM chain while leveraging FHE co-processors. Each design has its own trade-offs, but we prefer well-designed FHE rollups (pioneered by Fhenix) or FHE co-processors, as they inherit security and other benefits from Ethereum.
Until recently, implementing fraud proofs on FHE encrypted data was complex, but the team at Fhenix.io recently demonstrated how to achieve fraud proofs using the Arbitrum Nitro stack with FHE logic compiled to WebAssembly.
Data Availability (DA) Layer for FHE
Challenges: Lack of Customizability and Throughput
While "storage" has been commoditized in Web2, the same is not true in Web3. Given that Fully Homomorphic Encryption (FHE) maintains over 10,000 times data inflation, we need to determine where to store large amounts of FHE ciphertext. If every node operator in a given DA layer has to download and store every piece of data from each FHE chain, then only institutional operators can keep up with the demand, increasing the risk of centralization.
Regarding throughput, Celestia's maximum speed is 6.7 MB/s, and if we want to run any form of FHEML, we will need several GB of bandwidth per second. According to recent data shared by 1k(x), existing DA layers cannot support FHE due to design decisions that intentionally limit throughput.
Solutions: Horizontal Scalability + Customizability
We need a DA layer that can support horizontal scalability. Existing DA architectures propagate all data to every node in the network, which is a major limitation for scalability. In contrast, as more DA nodes enter the system, horizontal scalability means that the amount of data stored by each node decreases, improving performance and cost as more block space becomes available.
Additionally, considering the large size of data associated with FHE, it makes sense to provide developers with higher levels of customizable erasure coding thresholds. In other words, to what extent do developers feel comfortable with the guarantees of the DA system? Is it based on weighted equity or decentralized weighting?
The architecture of EigenDA can serve as a foundation for some possibilities of a DA layer for FHE. Its attributes of horizontal scalability, structural cost reduction, and decoupling of DA and consensus pave the way for supporting scalability levels for FHE.
Finally, a higher-dimensional idea might be to build optimized storage slots for storing FHE ciphertext, as ciphertext follows specific encoding schemes, so having optimized storage slots may help efficiently use storage space and enable faster retrieval.
Fully Homomorphic Encryption (FHE) Hardware
Challenges: Low-Performance Fully Homomorphic Encryption (FHE) Hardware
In promoting on-chain Fully Homomorphic Encryption (FHE) applications, a major issue is the significant latency caused by computational overhead, making it impractical to run FHE on any standard processing hardware. This limitation arises from the extensive interactions with memory, as processors need to handle massive amounts of data. In addition to memory interactions, complex polynomial computations and time-consuming ciphertext maintenance operations also incur substantial overhead.
To gain a deeper understanding of the state of FHE accelerators, we need to unveil the specifics of their designs: application-specific FHE accelerators versus generalized FHE accelerators.
The computational complexity of FHE and hardware design is always closely related to the number of multiplications required by a given application, referred to as "the depth of homomorphic operations." In the case of specific applications, the depth is known, so system parameters and related computations are fixed. Therefore, hardware design for specific applications will be easier and can often be optimized for better performance than generalized accelerator designs. In general, if FHE needs to support an arbitrary number of multiplications, bootstrapping techniques must be introduced to remove some of the accumulated noise in homomorphic operations. Bootstrapping techniques are costly and require substantial hardware resources, including chip memory, memory bandwidth, and computation, meaning that hardware design will differ significantly from that of specific applications. Thus, while significant industry players like Intel, Duality, SRI, and DARPA are undoubtedly raising the ceiling with their work on GPU and ASIC designs, they may not be directly applicable to supporting blockchain scenarios.
In terms of development costs, generalized hardware requires more capital for design and manufacturing than application-specific hardware, but its flexibility allows the hardware to be used across a broader range of applications. On the other hand, for specific application designs, if the application's demands change and require support for deeper homomorphic computations, the application-specific hardware must be combined with some software techniques (like MPC) to achieve the same objectives as generalized hardware.
It is worth noting that accelerating on-chain FHE is much more challenging than for specific application use cases (like FHEML) because it needs to handle more general computations rather than a more specific set. For example, the current transaction processing speed of the fhEVM development network is limited to single-digit TPS.
Given the need for blockchain-customized FHE accelerators, another consideration is: how transferable is the hardware from ZKP to FHE?
There are some common modules between ZKP and FHE, such as Number Theoretic Transforms (NTT) and underlying polynomial operations. However, the bit widths of NTT used in these two cases differ. In ZKP, hardware needs to support multiple bit widths for NTT, such as 64-bit and 256-bit, while in FHE, due to the use of residue number systems, the operands for NTT are shorter. Secondly, the NTT degree processed in ZKP is typically higher than in FHE. For these two reasons, designing an NTT module that satisfies developers' performance needs for both ZKP and FHE is not an easy task.
In addition to NTT, there are other computational bottlenecks in FHE, such as automorphisms, which do not exist in ZKPs. One way to simply transfer ZKP hardware to an FHE setting is to load NTT computation tasks onto ZKP hardware and use a CPU or other hardware to handle the remaining computations in FHE.
Summarizing these challenges, using FHE to compute on encrypted data was once 100,000 times slower than on plaintext data. However, due to newer encryption schemes and recent developments in FHE hardware, current performance has significantly improved to about 100 times slower than plaintext data.
Solutions:
1. Improvements in FHE Hardware
Many teams are actively building FHE accelerators, but they are not focusing on FHE accelerators specific to generalized blockchain computation (like TFHE). An example of a hardware accelerator specific to blockchain is FPT, a fixed-point FPGA accelerator. FPT is the first hardware accelerator to heavily utilize the inherent noise in FHE computations, fully implementing TFHE bootstrapping using approximate fixed-point arithmetic. Another project that may be useful for FHE is BASALISC, a series of hardware accelerator architectures aimed at significantly accelerating FHE computations in the cloud.
As mentioned earlier, a major bottleneck in FHE hardware design is the extensive interactions with memory. To circumvent this barrier, a potential solution is to minimize interactions with memory as much as possible. Processing-in-Memory (PIM) or near-memory computation may be helpful in this case. PIM is a promising solution to the "memory wall" challenge facing future computer systems, allowing memory to simultaneously perform computation and storage functions, promising a fundamental reform of the relationship between computation and memory.
In the past decade, significant progress has been made in designing non-volatile memory solutions to address this issue, such as resistive RAM, spin-transfer torque magnetic RAM, and phase-change memory. Using this special memory, existing research has shown significant improvements in computational performance in machine learning and lattice-based public-key encryption.
2. Optimized Software and Hardware
In recent developments, software has been recognized as a key component in the hardware acceleration process. For example, well-known FHE accelerators like F1 and CraterLake use compilers for mixed hardware-software co-design.
As this field evolves, we can expect fully functional compilers co-designed with FHE compilers tailored for blockchain. FHE compilers can optimize FHE programs based on the cost models of the corresponding FHE schemes. These FHE compilers can be integrated with FHE accelerator compilers, improving end-to-end performance by combining hardware-level cost models.
3. Networked FHE Accelerators: From Vertical Scaling to Horizontal Scaling
Existing efforts for FHE hardware acceleration primarily focus on "vertical scaling," which emphasizes vertical improvements to individual accelerators. In contrast, "horizontal scaling" focuses on horizontally connecting multiple FHE accelerators through a network, which could significantly enhance performance, similar to parallel proof generation with Zero-Knowledge Proofs (ZKPs).
A major difficulty in FHE acceleration is the data inflation problem: the significant increase in data size that occurs during encryption and computation, posing challenges for efficient data transfer between on-chip and off-chip memory.
Data inflation presents a significant challenge for horizontally connecting multiple FHE accelerators through a network. Therefore, the co-design of FHE hardware and networks will be a promising direction for future research. For example, adaptive network routing for FHE accelerators: implementing a routing mechanism that dynamically adjusts data paths between FHE accelerators based on real-time computational load and network traffic. This would ensure optimal data transfer rates and efficient resource utilization.
Conclusion
FHE will fundamentally reshape how we protect data across platforms, paving the way for an unprecedented era of privacy. Building such a system will require significant advancements in FHE, Zero-Knowledge Proofs (ZKPs), and Multi-Party Computation (MPC).
As we enter this new domain, collaborative efforts among cryptographers, software engineers, and hardware experts will be essential. Not to mention legislators, regulators, and others, as compliance is the only path to mainstream adoption.
Ultimately, FHE will stand at the forefront of the wave of digital sovereignty transformation, heralding a future where data privacy and security are no longer mutually exclusive but inseparable.
Special Thanks
Special thanks to Mason Song (Mind Network), Guy Itzhaki (Fhenix), Leo Fan (Cysic), Kurt Pan, Xiang Xie (PADO), and Nitanshu Lokhande (BananaHQ) for their reviews. We hope readers will explore the impressive work and efforts these individuals have made in this field!