Zilkworm in the ERE Benchmark
A C++ zkEVM guest benchmarked across 1077 fixtures
The ERE zkEVM benchmark workload offers a neutral, multi-client framework for measuring the correctness and performance of zkEVM implementations. Recently, we added support for Zilkworm and ran it head-to-head against Reth over (a subset of) the ERE benchmark suite.
This post presents the results. It covers what we integrated, how the comparison measurements were performed, and — without cherry-picking — where Zilkworm wins, where it falls short, and how you can reproduce the comparison yourself.
Background: a C++ EVM inside a zkVM
Most zkEVM projects are built around Rust execution engines. Zilkworm takes a different approach: the guest program that runs inside the zkVM is evmone, a battle-tested production-grade C++ EVM interpreter, cross-compiled to a bare-metal RISC-V ELF. Rust is still involved, but only as an orchestration layer around proving and verification.
Implementing a zkEVM involves a wide range of correctness challenges. We previously wrote about some of the harder parts, particularly around correctly verifying the execution witnesses.
Getting all the functional details right while maintaining strong performance is critical, but it's difficult to communicate how many trade-offs are at play. Benchmarking helps make them more visible and easier to reason about.
What the ERE benchmark measures
The ERE workload takes Ethereum Execution Spec Test (EEST) fixtures — self-contained blocks with their state witnesses — and runs each one through a stateless validation guest program inside a zkVM. Each execution client (Reth, Ethrex, and now Zilkworm) provides its own guest; the framework feeds all of them the same fixtures and records the total cycle count.
Two properties make this a fair comparison:
Same inputs, same fork. Every client validates identical blocks at the same fork, so any differences come from the guest, not the workload.
Cycles, not wall-clock. zkVM cycle counts are deterministic and host-independent. They ultimately determine proving cost, and they don't care whether you ran on a datacenter Linux box or, as we did, an emulated container on a laptop.
The integration
Adding Zilkworm to the workload took a small, self-contained change:
a new Zilkworm variant of the stateless-validation execution client;
a host-side adapter that re-encodes each fixture's stateless input into Zilkworm's unified-RLP bundle format, the wire format consumed by the C++ guest;
a download step that fetches the guest ELF and its verifying key directly from a pinned Zilkworm GitHub release tag.
That last point matters: the benchmark doesn't build Zilkworm from source, it consumes released artifacts. A successful run therefore validates the entire pipeline — the Zilkworm guest, its host adapter, and the release plumbing — in a single pass.
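To make the adapter's job more concrete, here is a minimal sketch of RLP encoding, the serialization that the bundle format is built on. The encoder follows the standard Ethereum RLP rules; the bundle layout at the bottom (a block followed by lists of trie nodes and contract code) is purely hypothetical and is not Zilkworm's actual wire format, which this post does not specify.

```python
def _length_prefix(length: int, offset: int) -> bytes:
    """Build the RLP prefix for a payload of `length` bytes."""
    if length <= 55:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """Encode bytes, or a (nested) list of bytes, using Ethereum's RLP rules."""
    if isinstance(item, (bytes, bytearray)):
        data = bytes(item)
        # A single byte below 0x80 is its own encoding.
        if len(data) == 1 and data[0] < 0x80:
            return data
        return _length_prefix(len(data), 0x80) + data
    if isinstance(item, (list, tuple)):
        payload = b"".join(rlp_encode(child) for child in item)
        return _length_prefix(len(payload), 0xC0) + payload
    raise TypeError(f"cannot RLP-encode {type(item).__name__}")


# HYPOTHETICAL bundle layout, for illustration only: the real field order
# and contents are defined by Zilkworm, not by this sketch.
block_rlp = b"\x01\x02\x03"
trie_nodes = [b"node-a", b"node-b"]
contract_code = [b"\x60\x00"]
bundle = rlp_encode([block_rlp, trie_nodes, contract_code])
```

Packing everything into a single RLP list gives the guest one self-describing byte string to parse, which is one plausible reason for a "unified" bundle; the actual motivation is an assumption here.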
The run
Zilkworm currently consumes only the legacy ERE fixture format; support for the new EEST format will be added soon. To keep the execution time of the ERE benchmark runs tractable, we filtered the full fixture set with --include 10M. We then executed the resulting 1077-fixture set (tests-benchmark@v0.0.9, 10M gas, Osaka) in SP1 execute mode for Zilkworm v0.1.0-alpha.2-ere and Reth v2.1.0, running on an Apple M1 Max under OrbStack. Full environment details and step-by-step instructions are in our ERE integration doc.
Correctness first
| Client | Completed | Outputs matched | Mismatches |
|---|---|---|---|
| Zilkworm v0.1.0-alpha.2-ere | 1077 / 1077 | 1077 | 0 |
| Reth v2.1.0 | 1077 / 1077 | 1077 | 0 |
Both clients complete every fixture and produce byte-identical public outputs. This is the baseline that any performance comparison depends on, and it holds.
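The check behind the table can be pictured as a byte-for-byte comparison per fixture. A minimal sketch, assuming each guest's public output is available as raw bytes keyed by fixture name (the function name, the data layout, and the sample values are illustrative, not the harness's actual API):

```python
def tally_outputs(reference: dict[str, bytes], candidate: dict[str, bytes]) -> tuple[int, int]:
    """Return (matched, mismatched) counts over the reference fixtures.

    A fixture missing from `candidate` counts as a mismatch.
    """
    matched = sum(1 for name, out in reference.items() if candidate.get(name) == out)
    return matched, len(reference) - matched


# Hypothetical public outputs for three fixtures.
reth_outputs = {"fixture_a": b"\x01", "fixture_b": b"\x02", "fixture_c": b"\x03"}
zilkworm_outputs = {"fixture_a": b"\x01", "fixture_b": b"\x02", "fixture_c": b"\x03"}
matched, mismatched = tally_outputs(reth_outputs, zilkworm_outputs)
```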
Cycle comparison
| Metric | Value |
|---|---|
| Total cycles — Reth | 301,880,882,267 |
| Total cycles — Zilkworm | 159,156,117,864 |
| Ratio (Zilkworm / Reth) | 0.527 — Reth uses 1.90× as many cycles |
| Median per-fixture ratio | 0.654 |
| Fixtures where Zilkworm is cheaper | 906 / 1077 (84%) |
Across the suite, Zilkworm executes the same blocks in just over half the cycles and is cheaper on the vast majority of fixtures.
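The summary figures are different aggregations of the same per-fixture cycle counts, which is why the total ratio (0.527) and the median per-fixture ratio (0.654) differ: the total is dominated by the most expensive fixtures, while the median weights every fixture equally. A minimal sketch, assuming per-fixture cycle counts are available as two dictionaries keyed by fixture name (the function name, the data layout, and the sample values are illustrative; only the two totals at the bottom come from the table above):

```python
from statistics import median


def summarize(reth: dict[str, int], zilkworm: dict[str, int]) -> dict[str, float]:
    """Aggregate per-fixture cycle counts into the three summary metrics."""
    names = sorted(reth.keys() & zilkworm.keys())
    ratios = [zilkworm[n] / reth[n] for n in names]
    return {
        # Ratio of summed cycles: dominated by the heaviest fixtures.
        "total_ratio": sum(zilkworm[n] for n in names) / sum(reth[n] for n in names),
        # Median of per-fixture ratios: every fixture counts equally.
        "median_ratio": median(ratios),
        # Fraction of fixtures where Zilkworm uses fewer cycles.
        "share_cheaper": sum(r < 1 for r in ratios) / len(ratios),
    }


# Hypothetical sample data, not taken from the benchmark run.
reth = {"modexp_a": 1_000_000, "memory_b": 100_000, "block_context_c": 10_000}
zilkworm = {"modexp_a": 150_000, "memory_b": 98_000, "block_context_c": 19_700}
summary = summarize(reth, zilkworm)

# Sanity check using the published totals from the table above.
published_ratio = 159_156_117_864 / 301_880_882_267
print(f"Zilkworm / Reth = {published_ratio:.3f}, Reth / Zilkworm = {1 / published_ratio:.2f}x")
```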
Where the difference comes from
The aggregate hides a more interesting story. Grouping by test family (ratio < 1 means Zilkworm is cheaper):
modexp 0.15 ← u256x2048 multiply syscall
bls12_381 0.23
alt_bn128 0.47
sha256 0.54
arithmetic 0.60
keccak 0.84
call_context 0.85
memory 0.98
-------------------------------- Reth cheaper below
block_context 1.97
ripemd160 3.06
p256verify 16.13 ← no dedicated syscall (yet)
The largest gains come from precompiles Zilkworm accelerates via SP1 syscalls. The most extreme case is modexp, where a syscall-accelerated 256×2048-bit multiply turns a ~1.2-billion-cycle Reth execution into ~26 million cycles, roughly a 48× reduction.
The losses are just as informative. p256verify (RIP-7212 secp256r1), which has no dedicated syscall yet, costs Zilkworm ~16× more cycles than Reth; RIPEMD-160 shows the same pattern at ~3×. A handful of very lightweight blocks (block_context opcodes) also favor Reth, whose smaller fixed per-block overhead wins on workloads that do little else.
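A per-family breakdown like the one above can be derived from the same per-fixture counts. The sketch below assumes the family ratio is the ratio of summed cycles within each family (this post does not state whether the published figures are sum ratios or per-fixture medians) and that the family can be read from the fixture name; both the naming scheme and the sample values are hypothetical.

```python
from collections import defaultdict
from typing import Callable


def family_ratios(
    reth: dict[str, int],
    zilkworm: dict[str, int],
    family_of: Callable[[str], str],
) -> dict[str, float]:
    """Return Zilkworm/Reth cycle ratios per family, sorted from best to worst."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for name in reth.keys() & zilkworm.keys():
        family = family_of(name)
        totals[family][0] += zilkworm[name]
        totals[family][1] += reth[name]
    ratios = {family: z / r for family, (z, r) in totals.items()}
    return dict(sorted(ratios.items(), key=lambda kv: kv[1]))


# Hypothetical fixture names of the form "<family>/<case>" and made-up counts.
reth = {"modexp/a": 1_000, "modexp/b": 3_000, "p256verify/a": 100}
zilkworm = {"modexp/a": 200, "modexp/b": 400, "p256verify/a": 1_600}
by_family = family_ratios(reth, zilkworm, lambda name: name.split("/")[0])

for family, ratio in by_family.items():
    marker = "Zilkworm cheaper" if ratio < 1 else "Reth cheaper"
    print(f"{family:<12} {ratio:6.2f}  {marker}")
```

Sorting by ratio puts the biggest wins first and the regressions last, which makes the next optimization targets easy to spot.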
We are reporting these openly because they directly shape the performance roadmap: the benchmark now tells us exactly what we should tackle next.
Reproduce it yourself
Everything here is reproducible from the Zilkworm repository. Two make targets drive the workload harness (cloning the fork, generating fixtures, running the SP1 guests):
# Validate Zilkworm only: 1077/1077 success + output match
make ere-validate
# Run both Zilkworm and Reth and print the cycle comparison
make ere-compare
make ere-compare prints the summary, per-family ratios, and best/worst fixtures. The detailed walkthrough — exact versions, fixture generation flags, and how to run memory-heavy fixtures on resource-constrained machines — is in our ERE integration doc.
Takeaways
Zilkworm validates the same 1077 EEST blocks as Reth, with identical outputs and about 47% fewer SP1 cycles overall (Reth uses ~1.9× as many).
The advantage is broad (906 of 1077 fixtures) and largest in precompile-heavy execution.
The remaining gaps are specific, measured, and actionable.
The results are reproducible from the Zilkworm repository with two make targets.
Note: Fixture witnesses are regenerated on every run by the harness's witness-generator-cli, and that step is not bitwise deterministic: the accessed trie nodes are serialized in unordered-set order. Every variant is a valid witness, so absolute cycle totals may shift by ~0.01% run-to-run while the ratio and per-family breakdown stay stable.
Zilkworm is open source at github.com/erigontech/zilkworm.
Support for Zilkworm in the ERE benchmark was introduced in PR #291.