Advanced Formal Verification of Zero-Knowledge Proofs: How to Verify a ZK Instruction
Written by: Certik
To gain a deeper understanding of how formal verification techniques are applied to zkVM (Zero-Knowledge Virtual Machine), this article will focus on the verification of a single instruction. For an overview of advanced formal verification in ZKP (Zero-Knowledge Proof), please refer to our concurrently published article "Advanced Formal Verification of Zero-Knowledge Proof Blockchain."
What is the verification of ZK instructions?
zkVM (Zero-Knowledge Virtual Machine) can create short proof objects as evidence to demonstrate that a specific program can run on certain inputs and successfully terminate. In the Web3.0 domain, the application of zkVM increases throughput because L1 nodes only need to verify a short proof of the transition from input state to output state of the smart contract, while the actual execution of the contract code can be done off-chain.
The zkVM prover first runs the program to generate an execution record for each step, and then converts the data of the execution record into a set of numerical tables (a process known as "arithmeticization"). These numbers must satisfy a set of constraints (i.e., circuits), which include equations relating specific table cells, fixed constants, database lookup constraints between tables, and polynomial equations that must be satisfied between each pair of adjacent table rows (i.e., "gates"). On-chain verification can confirm that there indeed exists a table that satisfies all constraints while ensuring that the specific numbers in the table are not revealed.
The execution of each VM instruction faces many such constraints, and we will refer to this set of constraints for VM instructions as its "ZK instruction." Below is an example of a constraint for a memory load instruction in zkWasm written in Rust.
The formal verification of ZK instructions is accomplished through formal reasoning on this code, which we first translate into formal language.
Even a single erroneous constraint can allow an attacker to submit a malicious ZK proof. The data table corresponding to the malicious proof does not correspond to the legitimate execution of the smart contract. Unlike non-ZK chains like Ethereum, where many nodes run different EVM (Ethereum Virtual Machine) implementations, making it unlikely for the same error to occur simultaneously in the same place, a zkVM chain has only a single VM implementation. From this perspective, ZK chains are more vulnerable than traditional Web3.0 systems.
Worse still, unlike non-ZK chains, since the computational details of zkVM transactions are not submitted and stored on-chain, it becomes extremely challenging not only to discover the specific details of an attack after it occurs but also to identify the attack itself.
The zkVM system requires extremely rigorous scrutiny; unfortunately, the correctness of zkVM circuits is difficult to guarantee.
Why is the verification of ZK instructions difficult?
The VM (Virtual Machine) is one of the most complex parts of the Web3.0 system architecture. The powerful capabilities of smart contracts are at the core of Web3.0 capabilities, and their strength comes from the underlying VMs, which are both flexible and complex: to accomplish general computing and storage tasks, these VMs must support numerous instructions and states. For example, the geth implementation of EVM requires over 7,500 lines of Go code. Similarly, the ZK circuits that constrain the execution of these instructions are also large and complex. In the zkWasm project, the implementation of ZK circuits requires over 6,000 lines of Rust code.
zkWasm circuit architecture
Compared to the dedicated ZK circuits used in ZK systems designed for specific applications (such as private payments), zkVM circuits are much larger: the number of constraint rules may exceed the former by one or two orders of magnitude, and their arithmetic tables may include hundreds of columns with millions of numerical cells.
What does the verification of ZK instructions mean?
Here, we want to verify the correctness of the XOR instruction in zkWasm. Technically, the execution table of zkWasm corresponds to a legitimate Wasm VM execution sequence. So, from a macro perspective, what we want to verify is that each execution of this instruction always produces a new legitimate zkVM state: each row in the table corresponds to a legitimate state of the VM, and the immediately following row must always be generated by executing the corresponding VM instruction. The following diagram presents the formal theorem of the correctness of the XOR instruction.
Here, "staterel i st" indicates that the state "st" is a legitimate zkVM state of the smart contract at step "i." As you might guess, proving "staterel (i+1) …" is not an easy task.
How to verify ZK instructions?
Although the computational semantics of the XOR instruction are quite simple—calculating the bitwise XOR of two integers and returning the integer result—there are many aspects involved: first, it needs to retrieve two integers from the stack memory, perform the XOR calculation, and then store the resulting new integer back into the same stack. Additionally, the execution steps of this instruction should be integrated into the overall execution flow of the smart contract, and its execution order and timing must be correct.
Thus, the effect of executing the XOR instruction should be to pop two numbers from the data stack, push their XOR result, and increment the program counter to point to the next instruction of the smart contract.
It is not difficult to see that the correctness properties defined here are generally very similar to those we face when verifying traditional bytecode VMs (such as the EVM interpreter in Ethereum L1 nodes). It relies on a high-level abstract definition of machine states (here referring to stack memory and execution flow) and definitions of the expected behavior of each instruction (here referring to arithmetic logic).
However, as we will see below, due to the particularities of ZKP and zkVM, the process of verifying correctness often differs significantly from that of conventional VMs. Just verifying a single instruction here relies on the correctness of many tables in zkWASM: one table is used to constrain the size of values, another table is for intermediate results of binary bit calculations (bit table), one execution table stores VM states of constant size in each row (similar to data in registers and latches in a physical CPU), and there are memory and jump tables representing dynamically variable sizes of VM states (memory, data stack, and call stack).
Step 1: Stack Memory
Similar to traditional VMs, we need to ensure that the two integer parameters of the instruction can be read from the stack and that their XOR result is correctly written back to the stack. The formal representation of the stack also looks quite familiar (there is also global memory and heap memory, but the XOR instruction does not use them).
zkVM uses a complex scheme to represent dynamic data because ZK provers do not natively support data structures like stacks or arrays. Instead, for each value pushed onto the stack, the memory table records it in a separate row, with some columns indicating the activation time of that table entry. Of course, this data can be controlled by attackers, so additional constraints must be imposed to ensure that the table entries genuinely correspond to the actual push instructions in the contract execution. This is achieved by carefully calculating the number of pushes during the program execution process. When verifying each instruction, we need to ensure that this count remains correct. Additionally, we have a series of lemmas that relate the constraints generated by a single instruction to the table lookups and time range checks that implement stack operations. From the top level, the counting constraints for memory operations are defined as follows.
Step 2: Arithmetic Operations
Similar to traditional VMs, we want to ensure that the bitwise XOR operation is performed correctly. This seems easy, as our physical computer CPUs can perform this operation in one go.
However, for zkVM, this is not straightforward. The only two arithmetic instructions natively supported by ZK provers are addition and multiplication. To perform binary bit operations, the VM uses a rather complex scheme, where one table stores fixed byte-level operation results, and another table acts as a "draft" to show how to decompose a 64-bit number into 8 bytes across multiple table rows and then recombine the result.
Snippet of zkWasm bit table specification
The XOR operation, which is very simple in traditional programming languages, requires many lemmas here to verify the correctness of these auxiliary tables. For our instruction, we have:
Step 3: Execution Flow
Similar to traditional VMs, we need to ensure that the value of the program counter is correctly updated. For sequential instructions like XOR, the program counter needs to increment by one after each execution step.
Since zkWasm is designed to run Wasm code, it must also ensure that the invariant properties of Wasm memory are maintained throughout the execution process.
Traditional programming languages have native support for data types like boolean values, 8-bit integers, and 64-bit integers, but in ZK circuits, variables always change within the range of integers modulo a large prime (≈ 2254). Since VMs typically operate with 64-bit numbers, circuit developers need to use a set of constraint systems to ensure they have the correct numerical ranges, while formal verification engineers need to track the invariant properties regarding these ranges throughout the verification process. Reasoning about execution flow and execution tables will involve all other auxiliary tables, so we need to check whether the ranges of all table data are correct.
Similar to the management of memory operation counts, zkVM requires a similar set of lemmas to verify control flow. Specifically, each call and return instruction needs to use the call stack. The call stack is implemented using a table scheme similar to that of the data stack. For the XOR instruction, we do not need to modify the call stack; however, when verifying the entire instruction, we still need to track and verify whether the counts of control flow operations are correct.
Verifying this instruction
Let us integrate all the steps to verify the end-to-end correctness theorem of the zkWasm XOR instruction. The following verification is completed in an interactive proof environment, where each formal construct and logical reasoning step has undergone the strictest machine checks.
As seen above, formal verification of zkVM circuits is feasible. However, it is a daunting task that requires understanding and tracking many complex invariant properties. This reflects the complexity of the software being verified: every lemma involved in the verification needs to be correctly handled by the circuit developers. Given the high stakes involved, having them undergo machine checks by formal verification systems rather than relying solely on careful manual review is highly valuable.