zkMIPS Beta: A Competitive Performance Report

ZKM

Sep 23, 2024

Overview and Rationale

In this post, we publish the first results from a general and fair zkVM benchmark framework built on earlier work by a16z. The framework compares proving time and energy cost between ZKM (zkMIPS) and other zkVM projects such as RISC Zero (R0) and SP1.

This initial publication focuses on comparing proving time with R0. The results show that zkMIPS's proving performance relative to R0 falls in a range of 76% to 317%, and we also analyze the slower cases and the upcoming optimizations.

We aim for zkMIPS to become one of the most production-ready zkVMs on the market and the most performant zkVM built on the MIPS instruction set.

Metrics

We seek to investigate the following questions:

General-Purpose VM Use Cases: evaluating which proof system is best suited to general-purpose VM use.
Faster Proofs: comparing ZKM and R0 to determine which implementation generates proofs faster.

Methodology

Note: To make the results directly comparable and fair, we implemented zkvm-benchmarks (https://github.com/zkMIPS/zkvm-benchmarks) on top of a16z/zkvm-benchmark, updated R0 to the latest version, v1.0.5, and report the specific instance types and benchmarks used.

On CPU-only machines, zkMIPS can still use the AVX instruction set to speed up Goldilocks field operations; based on our previous experience, this yields a 6-10% speedup. We enabled this feature for the benchmark by compiling with RUSTFLAGS="-C target-cpu=native".
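For context on what "Goldilocks operations" involve: Plonky2 works over the prime p = 2^64 - 2^32 + 1, whose shape lets a 128-bit product be reduced modulo p with a few shifts, additions, and subtractions instead of a general division. The sketch below illustrates that reduction in plain Python. It is not taken from zkMIPS or Plonky2, and the helper names (`reduce128`, `gl_mul`) are our own. Vector instruction sets such as AVX can speed up this kind of 64-bit arithmetic when it is applied to many field elements at once.

```python
# Goldilocks prime used by Plonky2: p = 2^64 - 2^32 + 1
P = 2**64 - 2**32 + 1
EPSILON = 2**32 - 1  # 2^64 mod p
MASK32 = 2**32 - 1
MASK64 = 2**64 - 1


def reduce128(x: int) -> int:
    """Reduce a value below 2^128 modulo p without a general division.

    Relies on 2^64 = 2^32 - 1 (mod p) and 2^96 = -1 (mod p).
    """
    assert 0 <= x < 2**128
    x_lo = x & MASK64         # bits 0..63
    x_hi = x >> 64            # bits 64..127
    x_hi_lo = x_hi & MASK32   # bits 64..95, weight 2^64 = EPSILON (mod p)
    x_hi_hi = x_hi >> 32      # bits 96..127, weight 2^96 = -1 (mod p)
    t0 = x_lo - x_hi_hi
    if t0 < 0:                # borrow: add p back to stay non-negative
        t0 += P
    t1 = x_hi_lo * EPSILON    # at most (2^32 - 1)^2, so it fits in 64 bits
    t2 = t0 + t1              # t2 < 2p, so one conditional subtraction suffices
    return t2 - P if t2 >= P else t2


def gl_mul(a: int, b: int) -> int:
    """Multiply two canonical Goldilocks elements (both in [0, p))."""
    return reduce128(a * b)


print(gl_mul(P - 1, P - 2))  # (-1) * (-2) = 2
```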

Benchmarks

All benchmarks can be found at https://github.com/zkMIPS/zkvm-benchmarks

sha2 and sha2-chain: sha2 hashes inputs of different sizes, and sha2-chain runs the sha2 function n times on a fixed-size (32-byte) input.
sha3 and sha3-chain: sha3 hashes inputs of different sizes, and sha3-chain runs the sha3 function n times on a fixed-size (32-byte) input.
Fibonacci: the classic Fibonacci sequence calculation with different lengths.
bigmem: allocates a 128000-byte array and uses black_box to prevent the compiler's dead-code elimination from optimizing it away.
Rust EVM: a Rust EVM implementation by Bluealloy at commit bef9edd (https://github.com/zkMIPS/zkm/tree/main/prover/examples/revme).
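To make the workloads above concrete, here is a rough Python model of what the sha2, sha2-chain, sha3-chain, and Fibonacci guest programs compute. It is a sketch under assumptions: the actual guests are Rust programs in the zkvm-benchmarks repository, and we assume that sha2 and sha3 mean SHA-256 and SHA3-256 and that each chain step hashes the previous 32-byte digest. The function names are hypothetical.

```python
import hashlib


def sha2_digest(data: bytes) -> bytes:
    """'sha2' workload: hash an input of arbitrary size (SHA-256 assumed)."""
    return hashlib.sha256(data).digest()


def sha2_chain(seed: bytes, n: int) -> bytes:
    """'sha2-chain' workload: apply SHA-256 n times, feeding each digest back in."""
    assert len(seed) == 32  # fixed-size 32-byte input
    h = seed
    for _ in range(n):
        h = hashlib.sha256(h).digest()
    return h


def sha3_chain(seed: bytes, n: int) -> bytes:
    """'sha3-chain' workload: the same chain using SHA3-256 (assumed variant)."""
    assert len(seed) == 32
    h = seed
    for _ in range(n):
        h = hashlib.sha3_256(h).digest()
    return h


def fibonacci(n: int) -> int:
    """'Fibonacci' workload: iterative computation of the n-th Fibonacci number."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

In the zkVM setting, each loop iteration turns into many executed instructions, which is why the chain length and input size are the knobs the benchmarks vary.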

Environment

CPU Instance: AWS r6a.8xlarge, 32 vCPUs, 256 GB RAM, AMD EPYC 7R13 processor

GPU Instance: 64 vCPUs, 480 GB RAM, AMD EPYC 9354 32-Core processor, 4x NVIDIA GeForce RTX 4090

CPU Instance

The segment size of zkMIPS is 262144 (2^18).
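The segment size controls how a long execution is split into separately proven pieces (the per-segment breakdown in Figure 1 refers to one such piece). Assuming the segment size counts executed instructions, the number of segments is a simple ceiling division. The helper below is hypothetical and not part of the zkMIPS API.

```python
SEG_SIZE = 262144  # zkMIPS segment size used in this benchmark
assert SEG_SIZE == 2**18


def num_segments(total_steps: int, seg_size: int = SEG_SIZE) -> int:
    """Segments needed for `total_steps` executed instructions (assumed unit)."""
    if total_steps <= 0:
        return 0
    return -(-total_steps // seg_size)  # ceiling division without floats


print(num_segments(10_000_000))  # 39
```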

SHA2:

SHA2-chain:

SHA3:

SHA3-chain:

bigmem:

Fibonacci:

Summary

From this comparison, we can see that zkMIPS is competitive with other top-tier zkVMs in proving performance and energy cost.

Furthermore, Figure 1 shows our analysis of how time is distributed across proof generation:

*Figure 1: time/cost distribution for a single segment*

Across the different segments, we found that trace generation takes about 10%-22% of the total time and computing the commitment of each table takes 25%-32%, while the time spent on CTL (cross-table lookup, built on a GKR-optimized LogUp scheme) is negligible.
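The idea behind the LogUp argument that the CTL builds on fits in a few lines: a multiset of looked-up values f_i is contained in a table of distinct values t_j exactly when sum_i 1/(alpha - f_i) = sum_j m_j/(alpha - t_j) as rational functions, where m_j counts how often t_j is looked up, and the prover checks this at a random challenge alpha. The sketch below evaluates both sides over the Goldilocks field. It is plain LogUp without the GKR optimization zkMIPS uses, and the values and challenge are toy inputs.

```python
from collections import Counter

P = 2**64 - 2**32 + 1  # Goldilocks prime


def inv(x: int) -> int:
    """Field inverse via Fermat's little theorem; x must be non-zero mod P."""
    x %= P
    assert x != 0
    return pow(x, P - 2, P)


def logup_sides(looking: list, table: list, alpha: int) -> tuple:
    """Evaluate both sides of the LogUp identity at the challenge alpha."""
    assert len(set(table)) == len(table), "table values must be distinct"
    mult = Counter(looking)  # m_j: how many times each table value is looked up
    lhs = sum(inv(alpha - f) for f in looking) % P
    rhs = sum(mult[t] * inv(alpha - t) for t in table) % P
    return lhs, rhs


alpha = 0x1234_5678_9ABC_DEF0  # stands in for a verifier challenge
ok_lhs, ok_rhs = logup_sides([1, 2, 2, 3], [1, 2, 3, 4], alpha)
bad_lhs, bad_rhs = logup_sides([1, 2, 5], [1, 2, 3, 4], alpha)
print(ok_lhs == ok_rhs, bad_lhs == bad_rhs)  # True False
```

Because the check reduces to summing field inverses, its cost is small next to committing to the trace tables, which is consistent with the negligible CTL time reported above.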

By design, trace generation runs on a single CPU, but the commitments for the different tables (CPU, Arithmetic, etc.) can be computed in parallel on multiple CPUs, which would reduce the current time usage by about two-thirds.
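The split described above, sequential trace generation followed by per-table commitments that do not depend on each other, can be sketched as below. Everything here is hypothetical and simplified: the table contents are toy values, a single SHA-256 digest stands in for the real Merkle commitment over low-degree-extended columns, and Python threads only illustrate the structure (a native prover would get true multi-core parallelism).

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def generate_traces(program_steps: int) -> dict:
    """Hypothetical sequential trace generation: one pass fills every table."""
    tables = {"cpu": [], "arithmetic": [], "memory": []}
    for step in range(program_steps):
        tables["cpu"].append(step.to_bytes(8, "little"))
        if step % 3 == 0:  # toy rule: some steps touch the arithmetic table
            tables["arithmetic"].append((step * step).to_bytes(16, "little"))
        if step % 2 == 0:  # toy rule: some steps touch memory
            tables["memory"].append((step + 1).to_bytes(8, "little"))
    return tables


def commit_table(rows: list) -> bytes:
    """Stand-in commitment: one hash over all rows (not a real Merkle commitment)."""
    h = hashlib.sha256()
    for row in rows:
        h.update(row)
    return h.digest()


def commit_all(tables: dict, workers: int = 4) -> dict:
    """Commit to each table independently, dispatching the tables in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(commit_table, rows) for name, rows in tables.items()}
        return {name: fut.result() for name, fut in futures.items()}


tables = generate_traces(10)   # sequential phase
commitments = commit_all(tables)  # parallel phase
```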

Proof generation takes up to about 45% of the total time. Figure 2 breaks the proof calculation down in more detail: memory and CPU operations take about 72.8% of the total time, which matches the relative sizes of their trace tables.

*Figure 2: time/cost distribution for a proof calculation*

For each STARK proof, the time distribution is shown in Tables 4, 5, and 6 below. The most time-consuming steps are "compute auxiliary polynomials commitment" and "compute openings proof". "Compute auxiliary polynomials commitment" includes the Poseidon-based Merkle hashing and the FFT that converts coefficient-form polynomials into point-value form. "Compute openings proof" evaluates the final polynomial at the opening points using polynomial multiplication and division (inversion) operations.
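The FFT step mentioned above turns a polynomial's coefficients into its values on a multiplicative subgroup of size n (the point-value form), and the inverse FFT goes back. Over Goldilocks, p - 1 = 2^32 (2^32 - 1), so power-of-two subgroups up to size 2^32 exist. The following is a minimal recursive radix-2 sketch for illustration only; Plonky2's actual FFT is an optimized Rust implementation, and the function names here are our own.

```python
P = 2**64 - 2**32 + 1  # Goldilocks prime
TWO_ADICITY = 32       # 2^32 divides p - 1


def find_non_residue() -> int:
    """Smallest quadratic non-residue; its powers give roots of unity of full 2-adic order."""
    g = 2
    while pow(g, (P - 1) // 2, P) != P - 1:
        g += 1
    return g


def root_of_unity(n: int) -> int:
    """Primitive n-th root of unity for n a power of two, n <= 2^32."""
    assert n > 0 and n & (n - 1) == 0 and n <= 2**TWO_ADICITY
    return pow(find_non_residue(), (P - 1) // n, P)


def ntt(coeffs: list, omega: int) -> list:
    """Recursive radix-2 FFT: coefficients -> values at omega^0 .. omega^(n-1)."""
    n = len(coeffs)
    if n == 1:
        return coeffs[:]
    even = ntt(coeffs[0::2], omega * omega % P)
    odd = ntt(coeffs[1::2], omega * omega % P)
    out = [0] * n
    w = 1
    for i in range(n // 2):
        t = w * odd[i] % P
        out[i] = (even[i] + t) % P           # f(w^i) = E(w^2i) + w^i O(w^2i)
        out[i + n // 2] = (even[i] - t) % P  # uses w^(n/2) = -1
        w = w * omega % P
    return out


def intt(evals: list, omega: int) -> list:
    """Inverse FFT: values -> coefficients (FFT with omega^-1, then scale by 1/n)."""
    n_inv = pow(len(evals), P - 2, P)
    return [c * n_inv % P for c in ntt(evals, pow(omega, P - 2, P))]
```

Zero-padding the coefficients to a larger power of two before calling `ntt` yields a low-degree extension (LDE) of the polynomial, which is what the LDE-related rows in Table 5 appear to refer to.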

Table 4: Time usage for computing a single STARK proof

Step	Time used (s)	Ratio
Compute lookup helper columns	2.562	3.38%
Compute auxiliary polynomials commitment	16.8139	22.21%
Compute quotient polys	7.0605	9.33%
Split quotient polys	0.1736	0.23%
Compute quotient commitment	10.5241	13.90%
Compute openings proof	38.5683	50.95%

Table 5: Time usage for computing the auxiliary polynomials commitment

Step	Time used (s)	Ratio
IFFT	0.9216	5.68%
FFT + blinding	3.6157	22.28%
Transpose LDEs	0.2759	1.70%
Build Merkle tree	11.4121	70.34%
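Since "Build Merkle tree" dominates Table 5, here is what that step does structurally: hash every row into a leaf, hash pairs of nodes layer by layer up to a single root, and later open individual rows with authentication paths. In this sketch SHA-256 from Python's standard library stands in for Poseidon (which Plonky2 actually uses), and the leaf/node prefixes are our own domain-separation choice, not Plonky2's scheme.

```python
import hashlib


def hash_leaf(row: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + row).digest()


def hash_node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_layers(rows: list) -> list:
    """All tree layers, leaves first; the row count must be a power of two."""
    assert rows and len(rows) & (len(rows) - 1) == 0
    layers = [[hash_leaf(r) for r in rows]]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([hash_node(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return layers


def merkle_path(layers: list, index: int) -> list:
    """Sibling hashes from leaf `index` up to (but excluding) the root."""
    path = []
    for layer in layers[:-1]:
        path.append(layer[index ^ 1])
        index //= 2
    return path


def verify_path(root: bytes, row: bytes, index: int, path: list) -> bool:
    h = hash_leaf(row)
    for sibling in path:
        h = hash_node(h, sibling) if index % 2 == 0 else hash_node(sibling, h)
        index //= 2
    return h == root
```

The number of hash calls grows linearly with the number of rows, so the cost of this step tracks the LDE size, and a cheaper hash function directly shrinks it.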

Table 6: Time usage for computing the openings proof

Step	Time used (s)	Ratio
Reduce batch	24.145	66.06%
Perform final FFT	8.72	23.86%
Fold codewords in the commitment phase	3.6842	10.08%
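The openings proof relies on polynomial multiplication and division. The core identity is that f(z) = y exactly when X - z divides f(X) - y, so the prover can form the quotient q(X) = (f(X) - y)/(X - z) with synthetic division. Provers commonly first combine many polynomials into one using powers of a random challenge so that a single quotient covers them all; whether this matches zkMIPS's "Reduce batch" step exactly is our assumption. The sketch below shows both operations over the Goldilocks field with hypothetical helper names and omits FRI folding and everything else a real prover does.

```python
P = 2**64 - 2**32 + 1  # Goldilocks prime


def poly_eval(coeffs: list, x: int) -> int:
    """Horner evaluation; coeffs[i] is the coefficient of X^i."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def divide_by_linear(coeffs: list, z: int) -> tuple:
    """Synthetic division: return (q, y) with f(X) = q(X) * (X - z) + y, so y = f(z)."""
    assert len(coeffs) >= 1
    q = [0] * (len(coeffs) - 1)
    acc = 0
    for i in range(len(coeffs) - 1, 0, -1):
        acc = (acc * z + coeffs[i]) % P
        q[i - 1] = acc
    y = (acc * z + coeffs[0]) % P
    return q, y


def batch_combine(polys: list, alpha: int) -> list:
    """Random linear combination sum_k alpha^k * f_k, computed coefficient-wise."""
    out = [0] * max(len(p) for p in polys)
    power = 1
    for p in polys:
        for i, c in enumerate(p):
            out[i] = (out[i] + power * c) % P
        power = power * alpha % P
    return out
```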

Closing

Unlike ZKM, which uses Plonky2 and performs FFTs and Poseidon hashing over the Goldilocks field, RISC Zero and SP1 use a more efficient hash function and a smaller field, which greatly benefits proving time. We have implemented a GPU version of Plonky2 and achieved a 3x speedup. Meanwhile, we plan to integrate a more efficient hash function and a smaller field into Plonky2 in place, which we expect to reduce proving time and cost by roughly half.

With these planned optimizations, we're confident that we can significantly improve the performance of zkMIPS and make it even more competitive with other leading zkVMs.

We have strived to be as accurate as possible in these benchmark comparisons. If any discrepancies are identified, please reach out to us at contact@zkm.io, and we will address any necessary corrections.
