ZKM Prover: Running an Emulator
Share on

1. Emulator

The emulator primarily implements the simulation of the MIPS instruction set and provides interfaces for running MIPS ELF programs and generating segments. All code can be found in the zkm/emulator directory.

2. Execution Process

The execution process of a MIPS program is as follows (the left side represents the conventional execution process, while the right side represents the segmented execution process):

elf_execuition_process.png

The main steps are:

  • load_elf: Loads the MIPS program into the emulated memory.
  • patch_elf: Hides certain ignorable processes (e.g., some runtime checks).
  • patch_stack: Initializes the runtime stack, including filling program arguments into the stack.
  • step: Executes a single instruction. In a conventional execution process, after executing one step, the system directly checks whether an exit condition is triggered. If triggered, it proceeds to the exit process; otherwise, it continues executing the next step. In a segmented execution process, after checking the exit condition, it also verifies whether the number of executed steps has reached the segment threshold. If the threshold is met, it enters the split_seg process. If an exit condition is triggered, it enters the split_seg + exit process.
  • split_seg: Generates a pre-memory snapshot (including system architecture state) and pre/post snapshot IDs for the current segment, then uses this information to construct the segment data structure and writes it to the corresponding segment output file.
  • Exit: Terminates program execution.

3. Main Data Structures

The main data structures include:

  • InstrumentedState: Maintains overall emulation system information, including the MIPS architectural state, the current segment ID, and the pre-execution state of the current segment (such as PC, snapshot ID, hash root, input, etc.).
pub struct InstrumentedState {
   /// state stores the state of the MIPS emulator
   pub state: Box<State>,

   /// writer for stdout
   stdout_writer: Box<dyn Write>,
   /// writer for stderr
   stderr_writer: Box<dyn Write>,

   pub pre_segment_id: u32,
   pre_pc: u32,
   pre_image_id: [u8; 32],
   pre_hash_root: [u8; 32],
   block_path: String,
   pre_input: Vec<Vec<u8>>,
   pre_input_ptr: usize,
   pre_public_values: Vec<u8>,
   pre_public_values_ptr: usize,
}

  • State: Maintains the MIPS architectural state within the emulation system, including registers, memory, heap pointers, etc.
pub struct State {
   pub memory: Box<Memory>,

   /// the 32 general purpose registers of MIPS.
   pub registers: [u32; 32],
   /// the pc register stores the current execution instruction address.
   pub pc: u32,
   /// the next pc stores the next execution instruction address.
   next_pc: u32,
   /// the hi register stores the multiplier/divider result high(remainder) part.
   hi: u32,
   /// the low register stores the multiplier/divider result low(quotient) part.
   lo: u32,

   /// heap handles the mmap syscall.
   heap: u32,

   /// brk handles the brk syscall
   brk: u32,

   /// tlb addr
   local_user: u32,

   /// step tracks the total step has been executed.
   pub step: u64,
   pub total_step: u64,

   /// cycle tracks the total cycle has been executed.
   pub cycle: u64,
   pub total_cycle: u64,

   /// A stream of input values (global to the entire program).
   pub input_stream: Vec<Vec<u8>>,

   /// A ptr to the current position in the input stream incremented by HINT_READ opcode.
   pub input_stream_ptr: usize,

   /// A stream of public values from the program (global to entire program).
   pub public_values_stream: Vec<u8>,

   /// A ptr to the current position in the public values stream, incremented when reading from public_values_stream.
   pub public_values_stream_ptr: usize,

   pub exited: bool,
   pub exit_code: u8,
   dump_info: bool,
}

  • Memory: Maintains the system’s current memory snapshot and access tracking information for the current segment.
pub struct Memory {
   /// page index -> cached page
   pages: BTreeMap<u32, Rc<RefCell<CachedPage>>>,

   // two caches: we often read instructions from one page, and do memory things with another page.
   // this prevents map lookups each instruction
   last_page_keys: [Option<u32>; 2],
   last_page: [Option<Rc<RefCell<CachedPage>>>; 2],

   // for implement std::io::Read trait
   addr: u32,
   count: u32,

   rtrace: BTreeMap<u32, [u8; PAGE_SIZE]>,
   wtrace: [BTreeMap<u32, Rc<RefCell<CachedPage>>>; 3],
}

  • Segment: Maintains segment-related information.
pub struct Segment {
   pub mem_image: BTreeMap<u32, u32>,  // initial memory image of segment
   pub pc: u32,                        // initial pc
   pub segment_id: u32,                // segment id
   pub pre_image_id: [u8; 32],         // image id of segment pre state 
   pub pre_hash_root: [u8; 32],       // hash root of segment pre memory image      
   pub image_id: [u8; 32],            // image id of segment post state 
   pub page_hash_root: [u8; 32],      // hash root of segment post memory image
   pub end_pc: u32,                   // end pc
   pub step: u64,                     // step number of cur segment
   pub input_stream: Vec<Vec<u8>>,
   pub input_stream_ptr: usize,
   pub public_values_stream: Vec<u8>,
   pub public_values_stream_ptr: usize,
}

4. Instruction Emulation

The emulator executes instructions using an instruction parsing approach: it first fetches the instruction, then decodes and executes the corresponding function based on the instruction encoding, updating the system state/memory accordingly.

The main function, mips_step(), can be found in state.rs.

The supported ISA can be found in mips_isa.

5. Memory Emulation and Image ID Computation

Memory is organized as a hash tree, with pages (4KB) as nodes. The hash page starts at address 0x8000000, the program address space ranges from 0 to 0x8000000, and the root hash page address is 0x81020000, as illustrated below.

memory.png

The computation process for page hashes and the snapshot ID is as follows:

  1. Organize memory (Mem) into 4KB pages, compute the corresponding hash, and store it in the respective hash page.
  2. Recursively compute the hash values of the hash pages until only a single hash page remains, which serves as the root hash page.
  3. Write register information at offset 0x400 in the hash page, compute the root hash page’s hash value, and await the root hash result.
  4. Concatenate the root hash value with the corresponding PC value, compute the final hash value, and obtain the snapshot ID.

To minimize the frequency of page hash updates, the emulator tracks modified memory pages during instruction execution. Consequently, during snapshot ID computation, only the hash pages corresponding to the modified memory pages need to be recursively updated, leading to the computation of the root hash and the retrieval of the snapshot ID.

Subscribe to the ZKM Blog to stay updated with the latest research by ZKM. If you have any questions about the article, you can contact the ZKM team on discord: discord.com/zkm

More articles
Hello World: April Newsletter
Hello Q2! Our team is beyond excited to begin the new quarter, and we grow stronger everyday. We grew slowly after our initial launch in July 2023, but during Q1 of 2024 our community exploded to nearly 50,000 members. With this newsletter we hope to communicate our gratitude and appreciation to our community and partners who have been such an important part of the journey.
Jolt and Lasso: A New Approach to Building zkVMs
Using zkMIPS, a client can outsource a program written using the MIPS instruction set, have it executed, and in addition to getting the result, receive an easily verifiable proof that this result is correct. The software component that creates this proof is called a zkVM.
ZKM Prover: Running an Emulator

1. Emulator

The emulator primarily implements the simulation of the MIPS instruction set and provides interfaces for running MIPS ELF programs and generating segments. All code can be found in the zkm/emulator directory.

2. Execution Process

The execution process of a MIPS program is as follows (the left side represents the conventional execution process, while the right side represents the segmented execution process):

elf_execuition_process.png

The main steps are:

  • load_elf: Loads the MIPS program into the emulated memory.
  • patch_elf: Hides certain ignorable processes (e.g., some runtime checks).
  • patch_stack: Initializes the runtime stack, including filling program arguments into the stack.
  • step: Executes a single instruction. In a conventional execution process, after executing one step, the system directly checks whether an exit condition is triggered. If triggered, it proceeds to the exit process; otherwise, it continues executing the next step. In a segmented execution process, after checking the exit condition, it also verifies whether the number of executed steps has reached the segment threshold. If the threshold is met, it enters the split_seg process. If an exit condition is triggered, it enters the split_seg + exit process.
  • split_seg: Generates a pre-memory snapshot (including system architecture state) and pre/post snapshot IDs for the current segment, then uses this information to construct the segment data structure and writes it to the corresponding segment output file.
  • Exit: Terminates program execution.

3. Main Data Structures

The main data structures include:

  • InstrumentedState: Maintains overall emulation system information, including the MIPS architectural state, the current segment ID, and the pre-execution state of the current segment (such as PC, snapshot ID, hash root, input, etc.).
pub struct InstrumentedState {
   /// state stores the state of the MIPS emulator
   pub state: Box<State>,

   /// writer for stdout
   stdout_writer: Box<dyn Write>,
   /// writer for stderr
   stderr_writer: Box<dyn Write>,

   pub pre_segment_id: u32,
   pre_pc: u32,
   pre_image_id: [u8; 32],
   pre_hash_root: [u8; 32],
   block_path: String,
   pre_input: Vec<Vec<u8>>,
   pre_input_ptr: usize,
   pre_public_values: Vec<u8>,
   pre_public_values_ptr: usize,
}

  • State: Maintains the MIPS architectural state within the emulation system, including registers, memory, heap pointers, etc.
pub struct State {
   pub memory: Box<Memory>,

   /// the 32 general purpose registers of MIPS.
   pub registers: [u32; 32],
   /// the pc register stores the current execution instruction address.
   pub pc: u32,
   /// the next pc stores the next execution instruction address.
   next_pc: u32,
   /// the hi register stores the multiplier/divider result high(remainder) part.
   hi: u32,
   /// the low register stores the multiplier/divider result low(quotient) part.
   lo: u32,

   /// heap handles the mmap syscall.
   heap: u32,

   /// brk handles the brk syscall
   brk: u32,

   /// tlb addr
   local_user: u32,

   /// step tracks the total step has been executed.
   pub step: u64,
   pub total_step: u64,

   /// cycle tracks the total cycle has been executed.
   pub cycle: u64,
   pub total_cycle: u64,

   /// A stream of input values (global to the entire program).
   pub input_stream: Vec<Vec<u8>>,

   /// A ptr to the current position in the input stream incremented by HINT_READ opcode.
   pub input_stream_ptr: usize,

   /// A stream of public values from the program (global to entire program).
   pub public_values_stream: Vec<u8>,

   /// A ptr to the current position in the public values stream, incremented when reading from public_values_stream.
   pub public_values_stream_ptr: usize,

   pub exited: bool,
   pub exit_code: u8,
   dump_info: bool,
}

  • Memory: Maintains the system’s current memory snapshot and access tracking information for the current segment.
pub struct Memory {
   /// page index -> cached page
   pages: BTreeMap<u32, Rc<RefCell<CachedPage>>>,

   // two caches: we often read instructions from one page, and do memory things with another page.
   // this prevents map lookups each instruction
   last_page_keys: [Option<u32>; 2],
   last_page: [Option<Rc<RefCell<CachedPage>>>; 2],

   // for implement std::io::Read trait
   addr: u32,
   count: u32,

   rtrace: BTreeMap<u32, [u8; PAGE_SIZE]>,
   wtrace: [BTreeMap<u32, Rc<RefCell<CachedPage>>>; 3],
}

  • Segment: Maintains segment-related information.
pub struct Segment {
   pub mem_image: BTreeMap<u32, u32>,  // initial memory image of segment
   pub pc: u32,                        // initial pc
   pub segment_id: u32,                // segment id
   pub pre_image_id: [u8; 32],         // image id of segment pre state 
   pub pre_hash_root: [u8; 32],       // hash root of segment pre memory image      
   pub image_id: [u8; 32],            // image id of segment post state 
   pub page_hash_root: [u8; 32],      // hash root of segment post memory image
   pub end_pc: u32,                   // end pc
   pub step: u64,                     // step number of cur segment
   pub input_stream: Vec<Vec<u8>>,
   pub input_stream_ptr: usize,
   pub public_values_stream: Vec<u8>,
   pub public_values_stream_ptr: usize,
}

4. Instruction Emulation

The emulator executes instructions using an instruction parsing approach: it first fetches the instruction, then decodes and executes the corresponding function based on the instruction encoding, updating the system state/memory accordingly.

The main function, mips_step(), can be found in state.rs.

The supported ISA can be found in mips_isa.

5. Memory Emulation and Image ID Computation

Memory is organized as a hash tree, with pages (4KB) as nodes. The hash page starts at address 0x8000000, the program address space ranges from 0 to 0x8000000, and the root hash page address is 0x81020000, as illustrated below.

memory.png

The computation process for page hashes and the snapshot ID is as follows:

  1. Organize memory (Mem) into 4KB pages, compute the corresponding hash, and store it in the respective hash page.
  2. Recursively compute the hash values of the hash pages until only a single hash page remains, which serves as the root hash page.
  3. Write register information at offset 0x400 in the hash page, compute the root hash page’s hash value, and await the root hash result.
  4. Concatenate the root hash value with the corresponding PC value, compute the final hash value, and obtain the snapshot ID.

To minimize the frequency of page hash updates, the emulator tracks modified memory pages during instruction execution. Consequently, during snapshot ID computation, only the hash pages corresponding to the modified memory pages need to be recursively updated, leading to the computation of the root hash and the retrieval of the snapshot ID.

Subscribe to the ZKM Blog to stay updated with the latest research by ZKM. If you have any questions about the article, you can contact the ZKM team on discord: discord.com/zkm