Second Arithmetic Program: Discussion on Zero-Knowledge Proofs

2022-08-15 16:31:28

Collection

Author: Vitalik Buterin

Original Title: 《Quadratic Arithmetic Programs: from Zero to Hero》

Published on: December 10, 2016

Recently, there has been a lot of interest in the technology behind zk-SNARKs (zero-knowledge proofs), as people increasingly try to unveil the mystery of what many refer to as "moon math," which is considered to be very complex and difficult to understand. Understanding zk-SNARKs is indeed quite challenging, especially because there are so many moving parts that need to be assembled for the entire system to work, but if we break this technology down piece by piece, it becomes easier to understand.

The purpose of this article is not to provide a complete introduction to zk-SNARKs; it assumes you have the following background knowledge:

1- You know what zk-SNARKs are and their general principles;

2- You have enough mathematical knowledge to understand some basic polynomial concepts. (For example, if P(x) + Q(x) = (P + Q)(x), where P and Q represent polynomials, if you are very familiar with this type of polynomial representation, you meet the requirements to continue reading).

zk-SNARK Knowledge Pipeline Diagram, drawn by Eran Tromer

As shown in the diagram above, the zero-knowledge proof can be divided into two stages from top to bottom. First, zk-SNARKs cannot be directly applied to any computational problem; instead, you must convert the problem into the correct "form" for operation. This form is called a "Quadratic Arithmetic Program" (QAP), and converting the function's code into this code itself is very important. Along with the process of converting function code into QAP, there is another process that runs so that if there is input to the code, a corresponding solution (sometimes referred to as the "witness" of the QAP) can be created. This is what this article needs to discuss.

After this, there is another quite complex process to create the actual "zero-knowledge proof" for this QAP, and a separate process to verify the evidence sent to you by others, but these details are beyond the scope of this article.

In the example below, we will choose a very simple problem:

Find the solution to a cubic equation: x**3 + x + 5 == 35 (Hint: the answer is 3).

This problem is simple, but importantly, you can see how all the functionality works from this case.

Describing the above equation in a programming language looks like this:

def qeval(x):  
    y = x\*\*3  
    return x + y + 5

The simple programming language we are using here supports basic arithmetic (+, -, *, /), identity power exponentiation (x7 but not x*y), and variable assignment, which is powerful enough to theoretically perform any computation (as long as the number of computational steps is bounded; loops are not allowed). Note that modulo (%) and comparison operators (<, >, ≤, ≥) are not supported, because there is no effective way to do modulo or directly compare finite cyclic group algorithms (thankfully; if there were any way to do this, the speed of breaking elliptic curve cryptography would surpass "binary search" and "Chinese remainder theorem").

You can extend the language to support modulo and comparisons through bit decomposition (for example: 13 = 2**3 + 2**2 + 1 = 8 + 4 + 1) as auxiliary input, proving the correctness of these decompositions, and performing mathematical operations in binary circuits; in finite field algorithms, performing equality (==) checks is also feasible and actually a bit easier, but we will not discuss these two details now. We can extend the language to support conditional statements (for example, converting the statement: if x < 5: y = 7; else: y = 9; into arithmetic form: y = 7 * (x < 5) + 9 * (x >= 5);) However, note that both "paths" of the condition need to be executed, and if you have many nested conditions, this can lead to a lot of overhead.

Now let’s go through this process step by step. If you want to try any code yourself, I have implemented a piece of code in Python here (for educational purposes only; it is not yet ready to make QAPs for real-world zk-SNARKs!)

Step 1: Flattening

The first step is a "flattening" process, where we break down the original code (which may contain arbitrarily complex statements and expressions) into the simplest expressions, which come in two forms:

1- x = y (where y can be a variable or a number)
2- x = y(op)z (where op can be +, -, *, /, and y and z can be variables, numbers, or sub-expressions).

You can think of these statements as logic gates in a circuit. The flattening process of the expression x**3 + x + 5 results in the following:

sym_1 = x * x  
y = sym_1 * x // equivalent to implementing the power function y = x\*\*3  
sym_2 = y + x  
~out = sym_2 + 5

You can think of each line of the above declarations as a logic gate in a circuit. Compared to the original code, we have introduced two intermediate variables sym1 and sym2, as well as a redundant variable ~out representing the output. It is not difficult to see that the sequence of declarations after "flattening" is equivalent to the original code.

Step 2: Convert to R1CS

Now, we convert it into something called R1CS (Rank-1 Constraint System). R1CS consists of a sequence of three vectors (a, b, c), and the solution to R1CS is a vector s, where s must satisfy the equation

s . a * s . b - s . c = 0

where . represents the inner product operation.

For example, here is a satisfying R1CS:

a = (5,0,0,0,0,1),  
b = (1,0,0,0,0,0),  
c = (0,0,1,0,0,0),  
s = (1,3,35,9,27,30),

(Translator's note: The first 35 = 1*5 + 30*1, the second 35 = 35 * 1)

The above example is just one constraint; next, we will convert each logic gate (i.e., each declaration statement after "flattening") into a constraint (i.e., a group of three vectors (a, b, c)). The conversion method depends on what operation the declaration is (+, -, *, /) and whether the parameters of the declaration are variables or numbers. In our example, in addition to the five variables after "flattening" ('x', '~out', 'sym1', 'y', 'sym2'), we also need to introduce a redundant variable ~one at the first component position to represent the number 1. For our system, a vector corresponding to 6 components is (it can be in any other order as long as they correspond):

'~one', 'x', '~out', 'sym1', 'y', 'sym2'

First Gate

sym1 = x * x, i.e., x * x - sym1 = 0

We can obtain the following vector group:

a = [0, 1, 0, 0, 0, 0]  
b = [0, 1, 0, 0, 0, 0]  
c = [0, 0, 0, 1, 0, 0]

If the second scalar of the solution vector s is 3 and the fourth scalar is 9, it holds true regardless of what the other scalars are, because: a = 3 * 1, b = 3 * 1, c = 9 * 1, i.e., a * b = c. Similarly, if the second scalar of s is 7 and the fourth scalar is 49, it will also pass the check; the first check is merely to verify the consistency of the inputs and outputs of the first gate.

Second Gate

y = sym1 * x, i.e., sym1 * x - y = 0
We can obtain the following vector group:

a = [0, 0, 0, 1, 0, 0]  
b = [0, 1, 0, 0, 0, 0]  
c = [0, 0, 0, 0, 1, 0]

Third Gate

sym2 = y + x, the addition gate needs to be converted to: (x + y) * 1 - sym2 = 0
We get the following vector group:

a = [0, 1, 0, 0, 1, 0]  
b = [1, 0, 0, 0, 0, 0] // corresponding to constant 1, using ~one  
c = [0, 0, 0, 0, 0, 1]

Fourth Gate

~out = sym2 + 5, i.e., (sym2 + 5) * 1 - ~out = 0
We get the following vector group:

a = [5, 0, 0, 0, 0, 1]  
b = [1, 0, 0, 0, 0, 0]  
c = [0, 0, 1, 0, 0, 0]

Now, assuming x = 3, we get sym1 = 9 from the first gate, y = 27 from the second gate, sym2 = 30 from the third gate, and ~out = 35 from the fourth gate. Therefore, based on: '~one', 'x', '~out', 'sym1', 'y', 'sym2', we can obtain:

s = [1, 3, 35, 9, 27, 30]

If we assume a different x, we can get different s, but all s can be used to verify (a, b, c).

Now we have the R1CS for four constraints, and the complete R1CS is as follows:

[0, 1, 0, 0, 0, 0]  
[0, 0, 0, 1, 0, 0]  
[0, 1, 0, 0, 1, 0]  
[5, 0, 0, 0, 0, 1]

[0, 1, 0, 0, 0, 0]  
[0, 1, 0, 0, 0, 0]  
[1, 0, 0, 0, 0, 0]  
[1, 0, 0, 0, 0, 0]

[0, 0, 0, 1, 0, 0]  
[0, 0, 0, 0, 1, 0]  
[0, 0, 0, 0, 0, 1]  
[0, 0, 1, 0, 0, 0]

Step 3: From R1CS to QAP

The next step is to convert this R1CS into QAP form, which implements exactly the same logic, but uses polynomials instead of inner products. We do this by transforming the four groups of length six of three vectors into six groups of degree three polynomials, evaluating the polynomials at each x coordinate to represent a constraint. That is, if we evaluate the polynomial at x=1, we get the first group of vectors; if we evaluate it at x=2, we get the second group of vectors, and so on.

We can use Lagrange interpolation to perform this transformation. The problem solved by Lagrange interpolation is: if you have a set of points (i.e., (x, y) coordinate pairs), then performing Lagrange interpolation on these points yields a polynomial that passes through all these points. We break down the problem: for each x coordinate, we create a polynomial, using the x coordinates and y coordinates of the required y coordinates at all other x coordinates we are interested in, and then let the final result be the sum of all the polynomials.

Let’s do an example. Suppose we want a polynomial that passes through (1,3), (2,2), and (3,4). We first create a polynomial that passes through (1,3), (2,0), and (3,0). It turns out that a polynomial that "stretches out" at x = 1 and 0 at the other points of interest is easy to create; we just need to do the following polynomial:

y = (x - 2) * (x - 3)

As shown in the figure:

Then, we "stretch" it in the y-axis using the following equation:

y = (x - 2) * (x - 3) * 3 / ((1 - 2) * (1 - 3))

After simplification, we get:

y = 1.5 * x**2 - 7.5 * x + 9

This satisfies passing through the points (1,3), (2,0), and (3,0), as shown in the figure:

Substituting the points (2,2) and (3,4) into the above equation, we can get:

y = 1.5 * x**2 - 5.5 * x + 7

This is the coordinate equation we want. The above algorithm requires O(n^3) time because there are n points, and each point requires O(n^2) time to multiply the polynomials. With a little thought, this can be reduced to O(n^2) time, and with further thought, using fast Fourier transform algorithms, it can be further reduced—this is a key optimization when functions used in zk-SNARKs typically have thousands of gates.

Here, I will directly give the Lagrange interpolation formula:

The (n-1) degree polynomial passing through n points (x1,y1), (x2,y2), (x3,y3),…, (xn,yn) is:

For example, in the above case, the polynomial passing through the points (1,3), (2,2), and (3,4) is:

Once you learn to use this formula, we can continue with our steps. Now we need to convert the four groups of three vectors of length six into six groups of polynomials, each group containing three degree three polynomials, and we will evaluate different constraints at each x point. Here, we have four constraints, so we will evaluate the polynomials at x = 1, 2, 3, and 4 for these four vector groups.

Now we use the Lagrange interpolation formula to convert R1CS into QAP form. We first find the polynomial corresponding to the first value of each a vector for the four constraints, that is, using the Lagrange interpolation theorem to find the polynomial passing through the points (1,0), (2,0), (3,0), (4,0). Similarly, we can find the polynomial for the i-th value of each vector corresponding to the other four constraints.

Here, I will directly give the answers:

A polynomials

[-5.0, 9.166, -5.0, 0.833]  
[8.0, -11.333, 5.0, -0.666]  
[0.0, 0.0, 0.0, 0.0]  
[-6.0, 9.5, -4.0, 0.5]  
[4.0, -7.0, 3.5, -0.5]  
[-1.0, 1.833, -1.0, 0.166]

B polynomials

[3.0, -5.166, 2.5, -0.333]  
[-2.0, 5.166, -2.5, 0.333]  
[0.0, 0.0, 0.0, 0.0]  
[0.0, 0.0, 0.0, 0.0]  
[0.0, 0.0, 0.0, 0.0]  
[0.0, 0.0, 0.0, 0.0]

C polynomials

[0.0, 0.0, 0.0, 0.0]  
[0.0, 0.0, 0.0, 0.0]  
[-1.0, 1.833, -1.0, 0.166]  
[4.0, -4.333, 1.5, -0.166]  
[-6.0, 9.5, -4.0, 0.5]  
[4.0, -7.0, 3.5, -0.5]

These coefficients are sorted in ascending order; for example, the first polynomial is 0.833 * x**3 - 5 * x**2 + 9.166 * x - 5. If we substitute x=1 into the above eighteen polynomials, we can obtain the three vectors of the first constraint:

(0, 1, 0, 0, 0, 0),  
(0, 1, 0, 0, 0, 0),   
(0, 0, 0, 1, 0, 0),  
...

Similarly, substituting x = 2, 3, 4 into the above polynomials can recover the remaining parts of R1CS.

Step 4: Check QAP

By converting R1CS into QAP, we can check all constraints simultaneously through polynomial inner product operations rather than checking each constraint individually as in R1CS. As shown in the figure:

Because in this case, the dot product check is a series of additions and multiplications of polynomials, and the result itself is a polynomial. If the resulting polynomial, at each x coordinate we used to represent the logic gates, equals 0, it means all checks have passed; if the resulting polynomial has at least one non-zero value, it means the values entering and exiting the logic gates are inconsistent.

It is worth noting that the resulting polynomial itself does not necessarily have to be zero; in fact, in most cases, it is not; it can behave arbitrarily at points that do not satisfy any logic gates, as long as the result is zero at all points that satisfy certain gates. To verify correctness, we do not compute the polynomial t = A . s * B . s - C . s at each point corresponding to a gate; instead, we divide t by another polynomial Z and check if Z divides t evenly, meaning the division t / Z has no remainder.

Z is defined as (x - 1) * (x - 2) * (x - 3)… the simplest polynomial that equals 0 at all points corresponding to the logic gates. This is a fundamental fact of algebra: any polynomial that equals zero at all these points must be a multiple of this minimal polynomial; if a polynomial is a multiple of Z, then its value at any of these points is zero; this equivalence makes our work much easier.

Now, let’s perform the inner product check using the above polynomials.

First, we obtain the intermediate polynomials:

A . s = [43.0, -73.333, 38.5, -5.166]  
B . s = [-3.0, 10.333, -5.0, 0.666]  
C . s = [-41.0, 71.666, -24.5, 2.833]

*(Translator's note: The above calculations:
43.0 = -5 * 1 + 8 * 3 + 0 * 35 - 6 * 9 + 4 * 27 - 1 * 30,
-73.333 = 9.166 * 1 - 11.333 * 3 + 0 * 35 + 9.5 * 9 - 7 * 27 + 1.833 * 30,
…
-3 = 3 * 1 - 2 * 3 + 0 * 35 + 0 * 9 + 0 * 27 + 0 * 30
…)**

After calculating A . s * B . s - C . s, we get:

t = [-88.0, 592.666, -1063.777, 805.833, -294.777, 51.5, -3.444]

*(Translator's note: Calculation process:
A . s = [43.0, -73.333, 38.5, -5.166] = -5.166 * x3 + 38.5 * x2 - 73.333 * x + 43,
B . s = [-3.0, 10.333, -5.0, 0.666] = 0.666 * x3 - 5 * x2 + 10.333 * x - 3.0,
C . s = [-41.0, 71.666, -24.5, 2.833] = 2.833 * x3 - 24.5 * x2 + 71.666 * x - 41.0
A . s * B . s - C . s is the calculation of the above polynomials, and after calculating, we arrange the coefficients from low to high powers, obtaining: [-88.0, 592.666, -1063.777, 805.833, -294.777, 51.5, -3.444]**

The minimal polynomial is:

Z = (x - 1) * (x - 2) * (x - 3) * (x - 4)

That is:

Z = [24, -50, 35, -10, 1]

The above calculation process can be viewed here%20%5Ccdot%20%5Cleft(x%20-%203%5Cright)%20%5Ccdot%20%5Cleft(x%20-%204%5Cright)).

Now we calculate the polynomial division:

h = t / Z = [-3.666, 17.055, -3.444]

h must be an exact division with no remainder.

You can click here to verify%20%5Ccdot%5Cleft(x%20-%203%5Cright)%20%5Ccdot%5Cleft(x%20-%204%5Cright)%5Ccdot%5Cleft(-3.444x%5E%7B2%7D%2B17.055x-3.666%5Cright)).

We have the solution to the QAP. If we try to forge the variables in R1CS, and this R1CS derives a QAP solution— for example, setting the last number of s to 31 instead of 30, we will get a t polynomial that fails the check (specifically, at x = 3 = 1 instead of 0), and it will not be a multiple of Z; instead, dividing t / Z will yield [-5.0, 8.833, -4.5, 0.666] as the remainder.

Note that the above is just a very simple example; in the real world, addition, subtraction, multiplication, and division operations usually involve unconventional numbers, so all the algebraic laws we know and love still apply, but all answers are some elements of a certain size, usually integers in the range from 0 to n - 1. For example, if n = 13, then 1 / 2 = 7 (7 * 2 = 1), 3 * 5 = 2, and so on. Using finite field algorithms eliminates concerns about rounding errors and allows the system to work well with elliptic curves, which ultimately makes zk-SNARK protocols truly secure.

Related tags

Vitalik zk-SNARKs zero-knowledge proof

ChainCatcher reminds readers to view blockchain rationally, enhance risk awareness, and be cautious of various virtual token issuances and speculations. All content on this site is solely market information or related party opinions, and does not constitute any form of investment advice. If you find sensitive information in the content, please click "Report", and we will handle it promptly.

Vitalik Buterin

Vitalik Chinese Dialogue Community: Ethereum needs new stories and new users, EF is undergoing internal reforms

Under the pressure of public opinion, Vitalik posted a message urging L2: Come back and support ETH

Belonging Knowledge Base

Vitalik Buterin文集

第 2 章：去中心化金融 (DEFI) 是什么?

Related tags

Vitalik zk-SNARKs zero-knowledge proof