Interpretation of Arweave Version 17 White Paper (4): Storing Complete Data Copies is the Key

PermaDAO
2024-04-03 10:27:10
Arweave builds its core consensus protocol on the SPoRes mechanism, based on the assumption that, under incentives, both individual miners and collaborative mining groups will find that maintaining complete data copies is the best mining strategy. Storing complete data copies is the key!

In interpretation (III), we demonstrated the feasibility of SPoRes through mathematical derivation, with Alice and Bob playing out the proof game. In Arweave mining, the protocol deploys a modified version of this SPoRes game: the protocol plays the role of Bob, while all miners in the network collectively play the role of Alice. Each valid proof of the SPoRes game is used to create the next Arweave block. Specifically, the generation of Arweave blocks corresponds to the game:

SPoRes(BI, 800 × np, d)

Where:

BI = Block Index of the Arweave network;

800 × np = the maximum number of SPoA hash attempts unlocked per VDF checkpoint, where np is the number of partitions the miner stores, each 3.6 TB in size, and 800 is the per-partition maximum. Since checkpoints arrive roughly once per second, the product is the maximum number of hash attempts the miner can make per second.

d = Difficulty of the network.

A valid proof is one whose hash exceeds the difficulty value, which is adjusted over time to ensure that a block is mined on average every 120 seconds. If the time difference between block i and block (i+10) is t, then the adjustment from the old difficulty d_i to the new difficulty d_{i+10} is calculated as follows:

d_{i+10} = 2^256 − r × (2^256 − d_i)

Where:

r = t / (120 × 10)

Formula notes: From the above two formulas, it can be seen that the adjustment of network difficulty relies mainly on the parameter r, the ratio of the actual time taken to produce these ten blocks to the system's expected standard time of 120 seconds per block (1,200 seconds in total).

The newly calculated difficulty determines the probability that each generated SPoA proof successfully mines a block, as follows:

P(block | d_{i+10}) = (2^256 − d_{i+10}) / 2^256 = r × P(block | d_i)

Formula notes: From the above derivation, it can be concluded that the probability of mining success under the new difficulty is the success probability under the old difficulty multiplied by the parameter r.
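As an illustration, this retargeting rule and the resulting probability scaling can be sketched in Python. This is a simplified model, not protocol code: the difficulty is treated as a 256-bit threshold that a valid proof hash must exceed, and the function names are illustrative:

```python
MAX_HASH = 2 ** 256        # SPoA proof hashes are 256-bit numbers
EXPECTED = 120 * 10        # expected seconds between block i and block i+10

def retarget(d_old: int, t: int) -> int:
    """New difficulty after 10 blocks took t seconds: scales the per-proof
    success probability by r = t / EXPECTED (integer math keeps precision)."""
    return MAX_HASH - t * (MAX_HASH - d_old) // EXPECTED

def success_probability(d: int) -> float:
    """Probability that a single SPoA proof hash exceeds the threshold d."""
    return (MAX_HASH - d) / MAX_HASH
```

If blocks arrive too quickly (t < 1,200 s), r < 1, so the difficulty rises and the per-proof success probability shrinks by exactly the factor r.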

Similarly, the VDF difficulty is also recalculated over time to ensure that a checkpoint occurs once per second.

Incentive Mechanism for Complete Copies

Arweave generates each block through the SPoRes mechanism based on the following assumption:

Under incentives, both individual miners and collaborative mining groups will find that maintaining complete data copies is the best mining strategy.

In the SPoRes game introduced earlier, storing two copies of the same part of the dataset releases just as many SPoA hashes as storing one complete copy of the entire dataset, which leaves room for speculative behavior by miners. Therefore, when Arweave actually deployed the mechanism, it made some modifications. The protocol divides the number of SPoA challenges unlocked per second into two parts:

  • One part releases a set number of SPoA challenges within a specified partition among those the miner stores;
  • The other part releases SPoA challenges in a partition chosen from all of Arweave's data partitions. If the miner does not store a copy of that partition, this portion of the challenges is forfeited.
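The two-part split can be modeled with a toy simulation. This is illustrative only: the real protocol derives the second partition's location from a hash, not a random number generator, and all names here are invented:

```python
import random

RANGE_CHALLENGES = 400   # SPoA challenges unlocked per recall range

def usable_challenges(stored: set, total_partitions: int,
                      rng: random.Random) -> int:
    """Challenges one stored partition can actually attempt at a checkpoint."""
    # Part 1: challenges inside a partition the miner stores -- always usable.
    usable = RANGE_CHALLENGES
    # Part 2: challenges in a partition drawn from the whole weave; they are
    # forfeited unless the miner also stores a copy of that partition.
    target = rng.randrange(total_partitions)
    if target in stored:
        usable += RANGE_CHALLENGES
    return usable
```

A miner holding a full replica always gets both halves; a miner storing nothing from the second draw's partition loses half the challenge budget.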

You may wonder about the relationship between SPoA and SPoRes: if the consensus mechanism is SPoRes, why are the released challenges SPoA? In fact, SPoRes is the umbrella term for the consensus mechanism, which comprises a series of SPoA proof challenges that miners must complete.

To understand this, let us examine how the VDF described in the previous section is used to unlock SPoA challenges. Unlocking a recall range composed of a certain number of SPoA challenges via the VDF (verifiable delay function) proceeds as follows:

  1. Approximately every second, the VDF hash chain outputs a checkpoint (Check);
  2. This checkpoint is combined with the mining address (addr), partition index (index(p)), and the original VDF seed (seed), and hashed with the RandomX algorithm to produce H0, a 256-bit number;
  3. C1, the recall offset, is the remainder of H0 divided by the partition size size(p); it is the starting offset of the first recall range;
  4. The 400 chunks of 256 KB within the continuous 100 MB range starting at this offset are the unlocked SPoA challenges of the first recall range;
  5. C2, the starting offset of the second recall range, is the remainder of H0 divided by the sum of the sizes of all partitions; it likewise unlocks 400 SPoA challenges in the second recall range;
  6. The constraint on these challenges is that a SPoA challenge in the second range can only be used if the miner holds the corresponding SPoA challenge in the first range.
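The steps above can be sketched as follows. This is a simplified model, not the Arweave implementation: SHA-256 stands in for RandomX, and the function names and byte layouts are illustrative assumptions:

```python
import hashlib

PARTITION_SIZE = 3_600_000_000_000   # 3.6 TB per partition
RANGE_SIZE = 100 * 1024 * 1024       # each recall range spans 100 MB
CHUNK_SIZE = 256 * 1024              # chunks are 256 KB each

def unlock_recall_offsets(check: bytes, addr: bytes, partition_index: int,
                          seed: bytes, weave_size: int):
    """Derive the two recall-range start offsets for one partition at one
    VDF checkpoint (SHA-256 used here as a stand-in for RandomX)."""
    # Step 2: H0 is a 256-bit number derived from the checkpoint, mining
    # address, partition index, and original VDF seed.
    h0 = int.from_bytes(hashlib.sha256(
        check + addr + partition_index.to_bytes(8, "big") + seed
    ).digest(), "big")
    c1 = h0 % PARTITION_SIZE   # step 3: first recall offset, inside partition p
    c2 = h0 % weave_size       # step 5: second recall offset, anywhere in the weave
    return c1, c2

def range_chunks(offset: int):
    """Step 4: the 400 chunk offsets of 256 KB making up one 100 MB range."""
    return [offset + i * CHUNK_SIZE for i in range(RANGE_SIZE // CHUNK_SIZE)]
```

Because H0 is fixed by the checkpoint, address, and seed, the same inputs always yield the same recall offsets, which is what makes the unlocked challenges verifiable.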

Performance of Each Packaged Partition

The performance of each packaged partition refers to the number of SPoA challenges generated by each partition at each VDF checkpoint. When miners store unique replicas of the partition, the number of SPoA challenges will be greater than when miners store multiple copies of the same data.

The concept of "unique replica" here is significantly different from that of "backup." For more details, you can refer to the past article "Arweave 2.6 Perhaps More Aligned with Satoshi's Vision."

If a miner stores unique replica data of partitions, each packaged partition generates all the challenges of the first recall range, plus challenges of the second recall range in proportion to the fraction of the weave the miner stores. If the entire Arweave weave contains m partitions and the miner stores n unique replicas of them, the performance of each packaged partition is:

perf_unique(n, m) = (1/2) × (1 + n/m)

When the partitions the miner stores are copies of the same data, each packaged partition still generates all the challenges of the first recall range, but the second recall range falls within this partition in only 1/m of cases. This storage strategy therefore incurs a significant performance penalty, and the ratio of SPoA challenges generated is only:

perf_copy(m) = (1/2) × (1 + 1/m)
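These two per-partition performance measures can be written as small functions, reconstructed from the behavior described above (half the challenges always come from the first recall range; the rest depend on where the second range lands). Names are illustrative:

```python
def perf_unique(n: int, m: int) -> float:
    """Per-partition performance with n unique partitions out of m in the
    weave: all of the first recall range, plus the n/m share of second
    recall ranges that land on data the miner stores."""
    return 0.5 * (1 + n / m)

def perf_copy(m: int) -> float:
    """Per-partition performance when partitions are copies of one dataset:
    the second recall range falls inside this partition only 1/m of the time."""
    return 0.5 * (1 + 1 / m)
```

At n = m the unique-replica strategy reaches full efficiency, while the copy strategy stays near 50% for any large weave.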

Figure 1: When a miner (or a group of collaborating miners) completes packaging their dataset, the performance of the given partition will increase.

The blue line in Figure 1 shows the performance perf_unique(n, m) of storing unique partition replicas: when a miner stores only a few partitions, the mining efficiency of each partition is only about 50%; when every part of the dataset is stored and maintained, i.e. n = m, the mining efficiency reaches its maximum of 1.

Total Hash Rate

The total hash rate (as shown in Figure 2) is obtained by multiplying the per-partition value by n, giving:

rate_unique(n, m) = (n/2) × (1 + n/m)

rate_copy(n, m) = (n/2) × (1 + 1/m)

The formulas show that as the weave grows, a miner who does not store unique replica data forfeits a penalty term, rate_unique − rate_copy = n(n−1)/(2m), which grows quadratically with the number of stored partitions.
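A small sketch of the total-rate comparison, assuming the per-partition figures described above (half the challenges from the first recall range, the remainder proportional to stored coverage); function names are illustrative:

```python
def rate_unique(n: int, m: int) -> float:
    """Total hash rate when storing n unique partitions of the m in the weave."""
    return n * 0.5 * (1 + n / m)

def rate_copy(n: int, m: int) -> float:
    """Total hash rate when the n stored partitions are copies of the same data."""
    return n * 0.5 * (1 + 1 / m)

def copy_penalty(n: int, m: int) -> float:
    """Shortfall from copying instead of storing unique data: n(n-1)/(2m),
    i.e. it grows quadratically in the number of stored partitions."""
    return rate_unique(n, m) - rate_copy(n, m)
```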

Figure 2: Total mining hash rate of unique datasets and backup datasets

Marginal Partition Efficiency

Based on this framework, we explore the decision problem miners face when adding a new partition: whether to copy an existing partition, or to obtain new data from other miners and package it as a unique replica. When they already store n unique replicas of the maximum possible m partitions, their mining hash rate is proportional to:

rate_unique(n, m) = (n/2) × (1 + n/m)

Thus, the additional benefit of adding a unique replica of a new partition is:

1/2 + (2n + 1)/(2m)

While the (smaller) benefit of copying an already-packaged partition is:

(1/2) × (1 + 1/m)

Dividing the first quantity by the second yields the miner's relative marginal partition efficiency (rmpe):

rmpe(n, m) = [1/2 + (2n + 1)/(2m)] / [(1/2) × (1 + 1/m)] = (m + 2n + 1) / (m + 1)
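The rmpe ratio can be checked numerically with a short sketch. The marginal-gain expressions are reconstructed from the surrounding discussion (names are illustrative):

```python
def rmpe(n: int, m: int) -> float:
    """Relative marginal partition efficiency: the marginal gain of packaging
    a new unique partition divided by the marginal gain of copying an
    already-packaged one. Simplifies to (m + 2n + 1) / (m + 1)."""
    gain_unique = 0.5 + (2 * n + 1) / (2 * m)   # new unique partition
    gain_copy = 0.5 * (1 + 1 / m)               # copy of existing data
    return gain_unique / gain_copy
```

For large m this gives rmpe close to 3 at n = m, close to 2 at n = m/2, and just above 1 for small n, matching the cases discussed below.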

Figure 3: Miners are incentivized to build a complete replica (Option 1) rather than making additional copies of data they already have (Option 2)

The rmpe value can be seen as a penalty for miners when adding new data by copying existing partitions. In this expression, we can let m approach infinity and then consider the efficiency trade-offs at different n values:

  • When miners have nearly complete dataset replicas, the reward for completing a replica is highest. Because if n approaches m and m approaches infinity, the value of rmpe will be 3. This means that when close to a complete replica, the efficiency of seeking new data is three times that of repackaging existing data.
  • When miners store half of the weaving network (Weave), i.e. when n = m/2, rmpe is 2. This indicates that the benefit of miners seeking new data is twice that of copying existing data.
  • For lower n values, the rmpe value approaches but is always greater than 1. This means that the benefits of storing unique replicas are always greater than those of copying existing data.

As the network grows (m approaches infinity), the motivation for miners to build complete replicas will increase. This promotes the formation of collaborative mining groups that collectively store at least one complete copy of the dataset.

✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨

This article mainly introduces the details of the Arweave consensus protocol construction, and of course, this is just the beginning of the core content of this part. From the mechanism introduction and code, we can intuitively understand the specific details of the protocol. I hope it helps everyone understand.
