The emulator primarily implements the simulation of the MIPS instruction set and provides interfaces for running MIPS ELF programs and generating segments. All code can be found in the zkm/emulator directory.
The execution process of a MIPS program is as follows (the left side represents the conventional execution process, while the right side represents the segmented execution process):
The main steps are:
The main data structures include:
pub struct InstrumentedState {
/// state stores the state of the MIPS emulator
pub state: Box<State>,
/// writer for stdout
stdout_writer: Box<dyn Write>,
/// writer for stderr
stderr_writer: Box<dyn Write>,
pub pre_segment_id: u32,
pre_pc: u32,
pre_image_id: [u8; 32],
pre_hash_root: [u8; 32],
block_path: String,
pre_input: Vec<Vec<u8>>,
pre_input_ptr: usize,
pre_public_values: Vec<u8>,
pre_public_values_ptr: usize,
}
pub struct State {
pub memory: Box<Memory>,
/// the 32 general purpose registers of MIPS.
pub registers: [u32; 32],
/// the pc register stores the current execution instruction address.
pub pc: u32,
/// the next pc stores the next execution instruction address.
next_pc: u32,
/// the hi register stores the multiplier/divider result high(remainder) part.
hi: u32,
/// the low register stores the multiplier/divider result low(quotient) part.
lo: u32,
/// heap handles the mmap syscall.
heap: u32,
/// brk handles the brk syscall
brk: u32,
/// tlb addr
local_user: u32,
/// step tracks the total step has been executed.
pub step: u64,
pub total_step: u64,
/// cycle tracks the total cycle has been executed.
pub cycle: u64,
pub total_cycle: u64,
/// A stream of input values (global to the entire program).
pub input_stream: Vec<Vec<u8>>,
/// A ptr to the current position in the input stream incremented by HINT_READ opcode.
pub input_stream_ptr: usize,
/// A stream of public values from the program (global to entire program).
pub public_values_stream: Vec<u8>,
/// A ptr to the current position in the public values stream, incremented when reading from public_values_stream.
pub public_values_stream_ptr: usize,
pub exited: bool,
pub exit_code: u8,
dump_info: bool,
}
pub struct Memory {
/// page index -> cached page
pages: BTreeMap<u32, Rc<RefCell<CachedPage>>>,
// two caches: we often read instructions from one page, and do memory things with another page.
// this prevents map lookups each instruction
last_page_keys: [Option<u32>; 2],
last_page: [Option<Rc<RefCell<CachedPage>>>; 2],
// for implement std::io::Read trait
addr: u32,
count: u32,
rtrace: BTreeMap<u32, [u8; PAGE_SIZE]>,
wtrace: [BTreeMap<u32, Rc<RefCell<CachedPage>>>; 3],
}
pub struct Segment {
pub mem_image: BTreeMap<u32, u32>, // initial memory image of segment
pub pc: u32, // initial pc
pub segment_id: u32, // segment id
pub pre_image_id: [u8; 32], // image id of segment pre state
pub pre_hash_root: [u8; 32], // hash root of segment pre memory image
pub image_id: [u8; 32], // image id of segment post state
pub page_hash_root: [u8; 32], // hash root of segment post memory image
pub end_pc: u32, // end pc
pub step: u64, // step number of cur segment
pub input_stream: Vec<Vec<u8>>,
pub input_stream_ptr: usize,
pub public_values_stream: Vec<u8>,
pub public_values_stream_ptr: usize,
}
The emulator executes instructions using an instruction parsing approach: it first fetches the instruction, then decodes and executes the corresponding function based on the instruction encoding, updating the system state/memory accordingly.
The main function, mips_step(), can be found in state.rs.
The supported ISA can be found in mips_isa.
Memory is organized as a hash tree, with pages (4KB) as nodes. The hash page starts at address 0x8000000, the program address space ranges from 0 to 0x8000000, and the root hash page address is 0x81020000, as illustrated below.
The computation process for page hashes and the snapshot ID is as follows:
To minimize the frequency of page hash updates, the emulator tracks modified memory pages during instruction execution. Consequently, during snapshot ID computation, only the hash pages corresponding to the modified memory pages need to be recursively updated, leading to the computation of the root hash and the retrieval of the snapshot ID.
Subscribe to the ZKM Blog to stay updated with the latest research by ZKM. If you have any questions about the article, you can contact the ZKM team on discord: discord.com/zkm
The emulator primarily implements the simulation of the MIPS instruction set and provides interfaces for running MIPS ELF programs and generating segments. All code can be found in the zkm/emulator directory.
The execution process of a MIPS program is as follows (the left side represents the conventional execution process, while the right side represents the segmented execution process):
The main steps are:
The main data structures include:
pub struct InstrumentedState {
/// state stores the state of the MIPS emulator
pub state: Box<State>,
/// writer for stdout
stdout_writer: Box<dyn Write>,
/// writer for stderr
stderr_writer: Box<dyn Write>,
pub pre_segment_id: u32,
pre_pc: u32,
pre_image_id: [u8; 32],
pre_hash_root: [u8; 32],
block_path: String,
pre_input: Vec<Vec<u8>>,
pre_input_ptr: usize,
pre_public_values: Vec<u8>,
pre_public_values_ptr: usize,
}
pub struct State {
pub memory: Box<Memory>,
/// the 32 general purpose registers of MIPS.
pub registers: [u32; 32],
/// the pc register stores the current execution instruction address.
pub pc: u32,
/// the next pc stores the next execution instruction address.
next_pc: u32,
/// the hi register stores the multiplier/divider result high(remainder) part.
hi: u32,
/// the low register stores the multiplier/divider result low(quotient) part.
lo: u32,
/// heap handles the mmap syscall.
heap: u32,
/// brk handles the brk syscall
brk: u32,
/// tlb addr
local_user: u32,
/// step tracks the total step has been executed.
pub step: u64,
pub total_step: u64,
/// cycle tracks the total cycle has been executed.
pub cycle: u64,
pub total_cycle: u64,
/// A stream of input values (global to the entire program).
pub input_stream: Vec<Vec<u8>>,
/// A ptr to the current position in the input stream incremented by HINT_READ opcode.
pub input_stream_ptr: usize,
/// A stream of public values from the program (global to entire program).
pub public_values_stream: Vec<u8>,
/// A ptr to the current position in the public values stream, incremented when reading from public_values_stream.
pub public_values_stream_ptr: usize,
pub exited: bool,
pub exit_code: u8,
dump_info: bool,
}
pub struct Memory {
/// page index -> cached page
pages: BTreeMap<u32, Rc<RefCell<CachedPage>>>,
// two caches: we often read instructions from one page, and do memory things with another page.
// this prevents map lookups each instruction
last_page_keys: [Option<u32>; 2],
last_page: [Option<Rc<RefCell<CachedPage>>>; 2],
// for implement std::io::Read trait
addr: u32,
count: u32,
rtrace: BTreeMap<u32, [u8; PAGE_SIZE]>,
wtrace: [BTreeMap<u32, Rc<RefCell<CachedPage>>>; 3],
}
pub struct Segment {
pub mem_image: BTreeMap<u32, u32>, // initial memory image of segment
pub pc: u32, // initial pc
pub segment_id: u32, // segment id
pub pre_image_id: [u8; 32], // image id of segment pre state
pub pre_hash_root: [u8; 32], // hash root of segment pre memory image
pub image_id: [u8; 32], // image id of segment post state
pub page_hash_root: [u8; 32], // hash root of segment post memory image
pub end_pc: u32, // end pc
pub step: u64, // step number of cur segment
pub input_stream: Vec<Vec<u8>>,
pub input_stream_ptr: usize,
pub public_values_stream: Vec<u8>,
pub public_values_stream_ptr: usize,
}
The emulator executes instructions using an instruction parsing approach: it first fetches the instruction, then decodes and executes the corresponding function based on the instruction encoding, updating the system state/memory accordingly.
The main function, mips_step(), can be found in state.rs.
The supported ISA can be found in mips_isa.
Memory is organized as a hash tree, with pages (4KB) as nodes. The hash page starts at address 0x8000000, the program address space ranges from 0 to 0x8000000, and the root hash page address is 0x81020000, as illustrated below.
The computation process for page hashes and the snapshot ID is as follows:
To minimize the frequency of page hash updates, the emulator tracks modified memory pages during instruction execution. Consequently, during snapshot ID computation, only the hash pages corresponding to the modified memory pages need to be recursively updated, leading to the computation of the root hash and the retrieval of the snapshot ID.
Subscribe to the ZKM Blog to stay updated with the latest research by ZKM. If you have any questions about the article, you can contact the ZKM team on discord: discord.com/zkm