risc386: Restricted Instruction Set i386 Simulator
(C) 2013, Andreas Abel, Ludwig-Maximilians-University Munich
The main purpose of this simulator is to test i386 code
generated by a compiler before register allocation. Therefore,
it supports temporaries, a potentially infinite number of extra registers
t2, ... (Of course, it can also be used to execute symbolic
assembler after register allocation.)
The supported instruction set is very restricted but sufficient to write a compiler for MiniJava [Andrew Appel, Modern Compiler Implementation in Java].
You need a recent version of the Haskell Platform.
Install using Haskell's package manager cabal
or using stack
Running the simulator
Format of the input file
The input file must be symbolic assembler in Intel format.
Here is a small example. Note that in addition to actual
registers, it uses register variables (temporaries) such as t1001 and t8.
    Lmain:
        push %ebp
        mov %ebp, %esp
    L3:
        push 0
        call L_halloc
        add %esp, 4
        mov t1001, %eax
        push 10
        push t1001
        call LFac$ComputeFac
        add %esp, 8
        mov t1002, %eax
        push t1002
        call L_println_int
        add %esp, 4
    L4:
        leave
        ret
    .type LFac$ComputeFac, @function
    LFac$ComputeFac:
        push %ebp
        mov %ebp, %esp
    L5:
        cmp DWORD PTR [%ebp+12], 1
        jl L0
        jmp L1
    L2:
        mov %eax, t8
        jmp L6
    L1:
        mov t1004, DWORD PTR [%ebp+12]
        mov t1005, -1
        add t1005, DWORD PTR [%ebp+12]
        push t1005
        push DWORD PTR [%ebp+8]
        call LFac$ComputeFac
        add %esp, 8
        mov t1003, %eax
        mov %eax, t1004
        imul t1003
        mov t8, %eax
        jmp L2
    L0:
        mov t8, 1
        jmp L2
    L6:
        leave
        ret
(If you want to be sure, read the .x file, the lexer specification.)
White space is ignored (except as separator for alphanumeric tokens).
Lines beginning with a dot '.' are skipped. These lines are pragmas for the symbolic assembler, which risc386 ignores.
Lines beginning with a hash symbol followed by a space ('# ') are comments, which are ignored as well.
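The line-skipping rules above can be sketched in a few lines of Python. This is only an illustration, not the actual lexer (which is generated from the .x file); the function name `strip_ignored_lines` is made up, and the sketch assumes that leading white space before a '.' or '# ' does not matter.

```python
def strip_ignored_lines(source: str) -> list[str]:
    """Drop pragma lines (leading '.') and comment lines (leading '# ')."""
    kept = []
    for line in source.splitlines():
        stripped = line.strip()  # assumption: surrounding white space is irrelevant
        if not stripped:
            continue  # blank line
        if stripped.startswith(".") or stripped.startswith("# "):
            continue  # pragma for the symbolic assembler, or a comment
        kept.append(stripped)
    return kept


example = """\
.intel_syntax
# entry point
Lmain:
    push %ebp
    .type Lmain, @function
    ret
"""
print(strip_ignored_lines(example))  # prints ['Lmain:', 'push %ebp', 'ret']
```

Note that a hash symbol without a following space is not treated as a comment marker in this sketch, matching the wording of the rule.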
Valid tokens are
    [ ] : , . + - *
    dword ptr DWORD PTR
    mov lea MOV LEA
    add sub imul ADD SUB IMUL
    idiv inc dec neg IDIV INC DEC NEG
    shl shr sal sar SHL SHR SAL SAR
    and or xor AND OR XOR
    not NOT
    cmp CMP
    je jne jl jle jge JE JNE JL JLE JGE
    jmp call ret JMP CALL RET
    push pop enter leave PUSH POP ENTER LEAVE
    nop NOP
    eax ebx ecx edx esi edi ebp esp
    %eax %ebx %ecx %edx %esi %edi %ebp %esp
    t<number> (denoting a temporary register)
    <ident> (given by reg.ex. [a-zA-Z_][a-zA-Z0-9_'$]*)

Identifiers are used for labels.
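As a rough illustration of how operand names from the token list can be told apart, here is a small Python sketch. It is not taken from the lexer specification: the function `classify_operand` is hypothetical, `<number>` is assumed to mean one or more decimal digits, and temporaries are assumed to take priority over identifiers (a name like t1001 also matches the identifier pattern). Instruction keywords are not handled here.

```python
import re

REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}
TEMP_RE = re.compile(r"t[0-9]+")                      # t<number>
IDENT_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_'$]*")    # <ident>, as given above


def classify_operand(token: str) -> str:
    """Classify an operand name as register, temporary, identifier or other."""
    name = token[1:] if token.startswith("%") else token
    if name in REGISTERS:
        return "register"        # both eax and %eax are accepted
    if TEMP_RE.fullmatch(token):
        return "temporary"       # assumption: checked before identifiers
    if IDENT_RE.fullmatch(token):
        return "identifier"      # used for labels
    return "other"


for tok in ["%eax", "esp", "t1001", "LFac$ComputeFac", "12"]:
    print(tok, "->", classify_operand(tok))
```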
(If you want to know all of them, read the .y file)
The input file must be a sequence of procedures. There must be one procedure called
Lmain. This one is taken as the entry point.
Each procedure starts with a label and ends with a return instruction.
The body of each procedure is a list of i386 assembler instructions in Intel syntax. The supported instructions are listed above. Each instruction may be preceded by a label. Conditional and unconditional jumps are only allowed to a label, and only to one defined in the same procedure; cross-procedure jumps and jumps to a calculated address are not supported. Calls, like jumps, are only allowed to a label. risc386 assumes the cdecl calling convention.
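The rule that jumps must stay inside their own procedure can be checked mechanically. The following Python sketch is a simplified illustration, not the check risc386 itself performs: it assumes the input has already been reduced to one label or one instruction per line (labels on their own line, no blank lines), and the helper names `split_procedures` and `bad_jumps` are made up.

```python
JUMPS = {"je", "jne", "jl", "jle", "jge", "jmp"}


def split_procedures(lines):
    """Group lines into procedures; each one ends at a 'ret' instruction."""
    procedures, current = [], []
    for line in lines:
        current.append(line)
        if line.split()[0].lower() == "ret":
            procedures.append(current)
            current = []
    return procedures


def bad_jumps(lines):
    """Return (procedure name, target) for every jump that leaves its procedure."""
    errors = []
    for proc in split_procedures(lines):
        labels = {l[:-1] for l in proc if l.endswith(":")}
        name = proc[0][:-1]  # a procedure starts with its label
        for l in proc:
            parts = l.split()
            if parts[0].lower() in JUMPS and parts[1] not in labels:
                errors.append((name, parts[1]))
    return errors


program = ["Lmain:", "jmp L1", "L1:", "ret", "Lother:", "jmp L1", "ret"]
print(bad_jumps(program))  # prints [('Lother', 'L1')]
```

The second procedure jumps to L1, which is defined only in Lmain, so the sketch reports it.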
Restrictions for individual instructions:
- RET does accept arguments
- ENTER is only supported in the form ENTER <number>, 0
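On real i386 hardware, ENTER <number>, 0 behaves like push ebp; mov ebp, esp; sub esp, <number>, and LEAVE behaves like mov esp, ebp; pop ebp. The Python sketch below models that on a toy stack. It assumes risc386 follows the same semantics, and it uses plain integers for addresses, which is a simplification; the class `Machine` is purely illustrative.

```python
class Machine:
    """Toy model of esp/ebp and a word-addressed stack growing downwards."""

    def __init__(self, stack_top=1000):
        self.esp = stack_top
        self.ebp = 0
        self.mem = {}

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value

    def enter(self, frame_bytes):
        # ENTER <number>, 0  ==  push ebp; mov ebp, esp; sub esp, <number>
        self.push(self.ebp)
        self.ebp = self.esp
        self.esp -= frame_bytes

    def leave(self):
        # LEAVE  ==  mov esp, ebp; pop ebp
        self.esp = self.ebp
        self.ebp = self.pop()


m = Machine()
m.ebp = 2000          # pretend this is the caller's frame pointer
m.enter(8)            # reserve 8 bytes of locals
print(m.esp, m.ebp)   # prints 988 996
m.leave()
print(m.esp, m.ebp)   # prints 1000 2000
```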
risc386 knows a number of predefined procedures. They expect their arguments on the stack (cdecl calling convention) and return the result in eax.
- 1 Argument: number of bytes to allocate on the heap
- Result : pointer to first allocated byte.
- 1 Argument: signed 32-bit integer value to print
- Result : nothing
- 1 Argument: byte to print
- Result : nothing
- no Arguments
- Result : byte read from
- 1 Argument: error code
- Result : nothing, does not return, stops execution
risc386 supports 4 different types, all of size 32 bits:
Heap addresses consist of a base address, which was obtained
from L_halloc, plus an offset. The offset must be a multiple of 4.
Memory access to uninitialized memory locations is treated as illegal.
- Stack addresses.
%esp and %ebp may only be loaded with stack addresses.
- Return addresses.
They are pushed onto the stack by a CALL.
RET checks that a return address lies on top of the stack before returning. The content of the return address is ignored; RET jumps back to the procedure where the matching CALL was issued.
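To make the heap-address rules above concrete, here is a minimal Python model. It is a sketch under stated assumptions, not the simulator's implementation: addresses are represented as (base, offset) pairs, the alignment check is assumed to happen at access time, and the size passed to `halloc` is not tracked, so no bounds checking is modelled. The names `Heap` and `HeapError` are invented for this example.

```python
class HeapError(Exception):
    """Raised for accesses that the rules above treat as illegal."""


class Heap:
    def __init__(self):
        self.cells = {}      # (base, offset) -> stored 32-bit value
        self.next_base = 0

    def halloc(self, nbytes):
        """Return a fresh heap address; nbytes is ignored in this sketch."""
        base = self.next_base
        self.next_base += 1
        return (base, 0)

    @staticmethod
    def add(addr, delta):
        """Address arithmetic only changes the offset, never the base."""
        base, offset = addr
        return (base, offset + delta)

    @staticmethod
    def _check_aligned(addr):
        if addr[1] % 4 != 0:
            raise HeapError("offset is not a multiple of 4")

    def store(self, addr, value):
        self._check_aligned(addr)
        self.cells[addr] = value

    def load(self, addr):
        self._check_aligned(addr)
        if addr not in self.cells:
            raise HeapError("read of uninitialized memory")
        return self.cells[addr]


heap = Heap()
p = heap.halloc(8)
heap.store(Heap.add(p, 4), 42)
print(heap.load(Heap.add(p, 4)))  # prints 42
```

Loading from `p` itself would raise `HeapError` in this model, because offset 0 was never written.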
CMP is the only command that sets flags.
CALL saves all temporary registers, RET restores them.
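The interplay of CALL, RET, return addresses and temporaries can be pictured with the following Python sketch. It is an illustrative model only: the class names are invented, the resume point is an arbitrary value chosen by the caller, and it is an assumption of this sketch that the callee starts with a fresh, empty set of temporaries.

```python
class ReturnAddress:
    """Marker value pushed by CALL; its content is never inspected."""


class Simulator:
    def __init__(self):
        self.stack = []   # values pushed by PUSH and CALL
        self.temps = {}   # temporaries of the running procedure
        self.saved = []   # (temporaries, resume point) per active call

    def call(self, resume_at):
        # CALL saves all temporaries and pushes a return address.
        self.saved.append((self.temps, resume_at))
        self.stack.append(ReturnAddress())
        self.temps = {}   # assumption: the callee starts with fresh temporaries

    def ret(self):
        # RET insists on a return address on top of the stack ...
        if not self.stack or not isinstance(self.stack[-1], ReturnAddress):
            raise RuntimeError("RET: no return address on top of the stack")
        self.stack.pop()
        # ... and resumes after the matching CALL, restoring the temporaries.
        self.temps, resume_at = self.saved.pop()
        return resume_at


sim = Simulator()
sim.temps["t8"] = 5
sim.call("after call")
sim.temps["t8"] = 99        # the callee's t8 is independent of the caller's
print(sim.ret(), sim.temps)  # prints after call {'t8': 5}
```

This matches the recursive factorial example, where t1004 must keep its value across the recursive call.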