
risc386: Restricted Instruction Set i386 Simulator

(C) 2013, Andreas Abel, Ludwig-Maximilians-University Munich

The main purpose of this simulator is to test i386 code generated by a compiler before register allocation. It therefore supports temporaries, a potentially unlimited supply of extra registers named t0, t1, t2, ... (Of course, it can also be used to execute symbolic assembler after register allocation.)

The supported instruction set is very restricted but sufficient to write a compiler for MiniJava [Andrew Appel, Modern Compiler Implementation in Java].

System requirements

You need a recent version of the Haskell Platform.


Install using Haskell's package manager cabal

    cabal install

or using stack

    stack install

Running the simulator

    risc386 input-file.s

Format of the input file

The input file must be symbolic assembler in Intel format.

Here is a small example. Note that in addition to actual registers, it uses register variables, which are named t0, t1, t2, ...

Lmain:
        push    %ebp
        mov     %ebp, %esp
L3:     push    0
        call    L_halloc
        add     %esp, 4
        mov     t1001, %eax
        push    10
        push    t1001
        call    LFac$ComputeFac
        add     %esp, 8
        mov     t1002, %eax
        push    t1002
        call    L_println_int
        add     %esp, 4
L4:     leave
        ret
        .type LFac$ComputeFac, @function
LFac$ComputeFac:
        push    %ebp
        mov     %ebp, %esp
L5:     cmp     DWORD PTR [%ebp+12], 1
        jl      L0
        jmp     L1
L2:     mov     %eax, t8
        jmp     L6
L1:     mov     t1004, DWORD PTR [%ebp+12]
        mov     t1005, -1
        add     t1005, DWORD PTR [%ebp+12]
        push    t1005
        push    DWORD PTR [%ebp+8]
        call    LFac$ComputeFac
        add     %esp, 8
        mov     t1003, %eax
        mov     %eax, t1004
        imul    t1003
        mov     t8, %eax
        jmp     L2
L0:     mov     t8, 1
        jmp     L2
L6:     leave
        ret

Lexing rules:

(If you want to be sure, read the .x file, the lexer specification.)

  • White space is ignored (except as separator for alphanumeric tokens).

  • Lines beginning with a dot '.' are skipped. These lines are pragmas for the symbolic assembler, which risc386 ignores.

  • Lines beginning with a hash symbol followed by a space '# ' are comments, which are ignored as well.

  • Valid tokens are

    [ ] : , . + - *
    dword ptr                    DWORD PTR
    mov lea                      MOV LEA
    add sub imul                 ADD SUB IMUL
    idiv inc dec neg             IDIV INC DEC NEG
    shl shr sal sar              SHL SHR SAL SAR
    and or xor                   AND OR XOR
    not                          NOT
    cmp                          CMP
    je jne jl jle jge            JE JNE JL JLE JGE
    jmp call ret                 JMP CALL RET
    push pop enter leave         PUSH POP ENTER LEAVE
    nop                          NOP
    eax ebx ecx edx esi edi ebp esp
    %eax %ebx %ecx %edx %esi %edi %ebp %esp
    t<number>   (denoting a temporary register)
    <ident>     (given by reg.ex. [a-zA-Z_][a-zA-Z0-9_'$]*)
                Identifiers are used for labels
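
The line-level rules and the two token patterns above can be sketched in Python. This is only an illustration, not the actual lexer (the .x file remains authoritative). The names `is_skipped` and `classify` are hypothetical, and two points are assumptions: that leading white space is stripped before the '.' and '# ' checks, and that the t<number> pattern is tried before the identifier pattern.

```python
import re

# Identifier pattern quoted from the token list above.
IDENT = re.compile(r"[a-zA-Z_][a-zA-Z0-9_'$]*")
# Temporaries are written t<number>; trying them first is an assumption.
TEMP = re.compile(r"t[0-9]+")


def is_skipped(line: str) -> bool:
    """Return True for pragma lines ('.') and comment lines ('# ').

    Assumption: leading white space is stripped before the check.
    """
    stripped = line.lstrip()
    return stripped.startswith(".") or stripped.startswith("# ")


def classify(token: str) -> str:
    """Classify one alphanumeric token as 'temporary', 'identifier' or 'other'."""
    if TEMP.fullmatch(token):
        return "temporary"
    if IDENT.fullmatch(token):
        return "identifier"
    return "other"
```

Under these assumptions `classify("t1001")` yields `'temporary'` and `classify("LFac$ComputeFac")` yields `'identifier'`. The sketch does not separate instruction mnemonics or register names from identifiers, which a real lexer has to do.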

Parsing rules:

(If you want to know all of them, read the .y file)

  1. The input file must be a sequence of procedures. There must be one procedure called Lmain. This one is taken as the entry point.

  2. Each procedure starts with a label and ends with a return instruction.

  3. The body of each procedure is a list of i386 assembler instructions in Intel syntax. The supported instructions are listed above. Each instruction may be preceded by a label. Conditional and unconditional jumps are only allowed to a label, and only to one defined in the same procedure; cross-procedure jumps and jumps to a calculated address are not supported. Calls are likewise only allowed to a label. risc386 assumes the cdecl calling convention.

  4. Restrictions for individual instructions:

    • RET does not accept arguments
    • ENTER is only supported in the form ENTER <number>, 0
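
The jump restriction in rule 3 can be illustrated with a small Python check. It is a sketch under stated assumptions, not the simulator's parser: the function name `undefined_jump_targets` and the representation of a procedure as a list of (label, mnemonic, operand) triples are hypothetical and chosen for brevity.

```python
# Jump mnemonics taken from the token list; CALL is excluded on purpose,
# because calls may target other procedures.
JUMPS = {"jmp", "je", "jne", "jl", "jle", "jge"}


def undefined_jump_targets(procedure):
    """Return jump targets that are not labels of the same procedure.

    `procedure` is a list of (label, mnemonic, operand) triples, where
    `label` is None for unlabelled instructions (hypothetical format).
    """
    labels = {label for label, _, _ in procedure if label is not None}
    return [
        operand
        for _, mnemonic, operand in procedure
        if mnemonic.lower() in JUMPS and operand not in labels
    ]
```

An empty result means every jump stays inside the procedure; any returned name is a target that rule 3 would reject.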


risc386 knows a number of predefined procedures. They expect their arguments on the stack (cdecl calling convention) and return the result in eax.

  • L_halloc
    • 1 Argument: number of bytes to allocate on the heap
    • Result : pointer to first allocated byte.
  • L_println_int
    • 1 Argument: signed 32bit integer value to print
    • Result : nothing
  • L_write
    • 1 Argument: byte to print
    • Result : nothing
  • L_read
    • no Arguments
    • Result : byte read from stdin
  • L_raise
    • 1 Argument: error code
    • Result : nothing, does not return, stops execution
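
To make the cdecl convention concrete, the following Python sketch models where arguments end up relative to %ebp after the usual prologue `push %ebp; mov %ebp, %esp`. It is a toy model, not risc386 code: the class `Stack`, the function `frame_after_prologue`, the start address 1000 and the placeholder strings are all hypothetical, and plain integers stand in for the simulator's stack addresses.

```python
class Stack:
    """A toy downward-growing stack of 32-bit slots (4 bytes each)."""

    def __init__(self, top=1000):
        self.esp = top      # arbitrary start address, an assumption
        self.mem = {}

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value


def frame_after_prologue(args):
    """Push `args` as a cdecl caller would, then model CALL and the
    prologue `push %ebp; mov %ebp, %esp`. Returns (stack, ebp)."""
    s = Stack()
    for a in reversed(args):    # the last argument is pushed first
        s.push(a)
    s.push("return address")    # pushed by CALL
    s.push("saved ebp")         # push %ebp
    ebp = s.esp                 # mov %ebp, %esp
    return s, ebp
```

With two arguments, the first is found at [%ebp+8] and the second at [%ebp+12], which matches the operands used in the example program. After the call returns, the caller removes the arguments itself, for instance with `add %esp, 8`.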

Execution specialties

risc386 supports 4 different types, all of size 32 bits:

  1. Signed integers.

  2. Heap addresses.

Heap addresses consist of a base address which was obtained by L_halloc plus an offset. The offset must be a multiple of 4.

Memory access to uninitialized memory locations is treated as illegal.
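
The two heap rules (offsets are multiples of 4, and reads of uninitialized locations are illegal) can be modelled in a few lines of Python. This is a sketch of the idea only and does not reflect how the simulator represents values internally; `HeapAddr`, `Heap` and their methods are hypothetical names, and bounds checking against the allocated size is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeapAddr:
    base: int      # identifies the block returned by L_halloc
    offset: int    # byte offset into that block

    def __post_init__(self):
        if self.offset % 4 != 0:
            raise ValueError("offset must be a multiple of 4")

    def plus(self, delta: int) -> "HeapAddr":
        """Address arithmetic keeps the base and changes only the offset."""
        return HeapAddr(self.base, self.offset + delta)


class Heap:
    def __init__(self):
        self.cells = {}    # only initialized locations are present

    def store(self, addr: HeapAddr, value):
        self.cells[addr] = value

    def load(self, addr: HeapAddr):
        if addr not in self.cells:
            raise RuntimeError("read of uninitialized memory")
        return self.cells[addr]
```

In this model a value can be read back only from an address that was stored to earlier, and constructing an address with an offset such as 2 fails immediately.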

  3. Stack addresses.

%esp and %ebp may only be loaded with stack addresses.

  4. Return addresses.

These are pushed onto the stack by CALL.

RET checks that a return address lies on top of the stack before returning. The content of the return address is ignored; RET jumps back to the procedure where the matching CALL was issued.

CMP is the only instruction that sets flags.

CALL saves all temporary registers, RET restores them.
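
The save/restore behaviour of temporaries can be pictured with a short Python model. It is an illustrative sketch, not the simulator's implementation: the class `Temporaries` is hypothetical, and it assumes that the callee initially sees the caller's values, which this document does not specify.

```python
class Temporaries:
    def __init__(self):
        self.current = {}   # temporaries visible to the running procedure
        self.saved = []     # one saved copy per active CALL

    def call(self):
        """Model CALL: save a copy of the caller's temporaries."""
        self.saved.append(dict(self.current))

    def ret(self):
        """Model RET: restore the copy saved by the matching CALL."""
        self.current = self.saved.pop()
```

This is why the recursive example program can load t1004 before `call LFac$ComputeFac` and still read the same value afterwards, even though the callee writes to t1004 as well.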