rasp/vm.md
2020-03-09 18:42:31 -04:00

VM

This is an outline of the VM that drives this language.

Primitives

  • Numbers are little endian (LE) at the byte level.
  • Addresses point to single bytes.
  • Signed numbers use two's complement.

| Type     | Size (bits) |
|----------|-------------|
| Address  | 64          |
| Word     | 64          |
| Halfword | 32          |
| Byte     | 8           |
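
As a small illustration of these conventions, the sketch below uses Python's struct module to show how a word and a halfword would be laid out in memory. The helper names are hypothetical and are not part of the VM.

```python
import struct

def encode_word(value):
    """Encode an unsigned 64-bit word as 8 little-endian bytes."""
    return struct.pack("<Q", value & 0xFFFF_FFFF_FFFF_FFFF)

def encode_signed_word(value):
    """Encode a signed 64-bit word (two's complement, little endian)."""
    return struct.pack("<q", value)

def decode_halfword(data):
    """Decode an unsigned 32-bit halfword from 4 little-endian bytes."""
    return struct.unpack("<I", data)[0]
```

For example, the word 1 is stored with its least significant byte first, and -1 is stored as eight 0xFF bytes.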

Registers

CPU registers are addressed by a 6-bit index (0-63). All registers are 64 bits wide.

  • IP - Instruction pointer
  • SP - Stack pointer
  • FP - Frame pointer
  • FLAGS - CPU flags
  • STATUS - Generic status code
  • NIL - Always reads as zero; writes to it have no effect.
  • IVT - Interrupt vector table pointer
  • R0-R31
  • (25 reserved registers)

The following registers are caller-save (i.e., their value may change after a function call):

  • FLAGS
  • STATUS
  • IVT

The rest are callee-save.
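
A register file with these properties might be modeled as in the following sketch. The numeric index assigned to each register is an assumption made for illustration only, since this outline does not fix register numbers; the class and constant names are likewise hypothetical.

```python
NUM_REGISTERS = 64            # 6-bit register index
WORD_MASK = (1 << 64) - 1     # all registers are 64 bits wide

# Hypothetical index assignment: named registers first, then R0-R31.
NAMED = ["IP", "SP", "FP", "FLAGS", "STATUS", "NIL", "IVT"]
REG = {name: i for i, name in enumerate(NAMED)}
REG.update({f"R{n}": len(NAMED) + n for n in range(32)})

class RegisterFile:
    def __init__(self):
        self._values = [0] * NUM_REGISTERS

    def read(self, index):
        # NIL always reads as zero.
        return 0 if index == REG["NIL"] else self._values[index]

    def write(self, index, value):
        # Writes to NIL are discarded; other values are kept to 64 bits.
        if index != REG["NIL"]:
            self._values[index] = value & WORD_MASK
```

With 7 named registers and 32 general-purpose registers, 25 of the 64 indices remain unassigned, which matches the reserved count above.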

CPU Flags

CPU flags are addressed by bit index, going from right to left.

  • 00 - Halt flag
  • 01 - Compare flag
  • 02 - Enable interrupts
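
Since bit 0 is the rightmost (least significant) bit, flag access reduces to shifts and masks. A minimal sketch, with hypothetical helper names:

```python
HALT_BIT = 0
COMPARE_BIT = 1
ENABLE_INTERRUPTS_BIT = 2

WORD_MASK = (1 << 64) - 1  # FLAGS is a 64-bit register

def set_flag(flags, bit):
    """Return flags with the given bit set."""
    return (flags | (1 << bit)) & WORD_MASK

def clear_flag(flags, bit):
    """Return flags with the given bit cleared."""
    return flags & ~(1 << bit) & WORD_MASK

def flag_is_set(flags, bit):
    """Report whether the given bit is set."""
    return bool((flags >> bit) & 1)
```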

Flag ideas

  • "Trace" flag - halts the CPU when certain conditions are met that may be causing undesired behavior - for debugging
    • Overwriting a register without its value being used
    • Mixing arithmetic with bit twiddling on the same target

Instructions

All instructions have 16-bit opcodes. There are five types of instructions:

  • Those whose operations require a source and a destination.
  • Those whose operations require two sources.
    • The destination of these instructions is implied by the instruction itself; e.g. the CMPEQ instruction implicitly sets a bit in the FLAGS register.
  • Those whose operations require a source, but no destination.
  • Those whose operations require a destination, but no source.
    • There aren't any of these instructions yet.
  • Those whose operations require neither a source nor a destination.

Destinations may be:

  • A 64-bit address pointing at a 64-bit or 8-bit value
  • A 6-bit register

Sources may be one of:

  • A 64-bit address pointing at a 64-bit or 8-bit value
  • A 6-bit register
  • A 64-bit immediate value

Counting all source and destination value sizes as their own configuration, there are:

  • 3 possible destination types
  • 4 possible source types

Instructions have different layouts depending on whether their operation takes a source and/or a destination. For example, the ADD instruction takes a source and a destination, the JMP instruction takes a source, and the NOP instruction takes neither.

Instructions that take neither a source nor a destination are simply 16 bits long. In all other instructions, the opcode is followed by a byte that describes the source and/or destination.

An instruction that has a source and destination looks like this:

| XXXXXXXX | XXXXXXXX | DDDDSSSS | ...source and destination |

An instruction that has either a source or a destination (but not both) looks like this:

| XXXXXXXX | XXXXXXXX | YYYY0000 | ...source or destination |

An instruction that has neither a source nor a destination looks like this:

| XXXXXXXX | XXXXXXXX |
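
The source/destination byte can be packed and unpacked as in the sketch below. It assumes that the diagrams list bits from most significant to least significant, so that DDDD is the high nibble and SSSS the low nibble; the function names are hypothetical.

```python
def pack_spec(dest_flag, source_flag):
    """Combine two 4-bit flags into one DDDDSSSS byte."""
    if not (0 <= dest_flag <= 0xF and 0 <= source_flag <= 0xF):
        raise ValueError("flags must fit in 4 bits")
    return (dest_flag << 4) | source_flag

def unpack_spec(spec):
    """Split a DDDDSSSS byte into (dest_flag, source_flag)."""
    return (spec >> 4) & 0xF, spec & 0xF
```

For a single-operand instruction, the same helper would be called with the operand's flag in the high nibble and 0 in the low nibble, matching the YYYY0000 layout.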

Source/destination flags

| Bits   | Source/destination                |
|--------|-----------------------------------|
| 0b0000 | Address (64 bit value)            |
| 0b0001 | Address (32 bit value)            |
| 0b0010 | Address (16 bit value)            |
| 0b0011 | Address (8 bit value)             |
| 0b0100 | 6-bit register                    |
| 0b0101 | Immediate (64 bits, source only)  |
| 0b0110 | Immediate (32 bits, source only)  |
| 0b0111 | Immediate (16 bits, source only)  |
| 0b1000 | Immediate (8 bits, source only)   |
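
To make the encoding concrete, here is a sketch that assembles an ADD with a register destination and a 64-bit immediate source. Several details are assumptions rather than part of this outline: the opcode is written little endian, the destination operand precedes the source operand, an address operand is always a full 64-bit address, and a 6-bit register index occupies one byte.

```python
import struct

# Flag values from the table above.
ADDR64, ADDR32, ADDR16, ADDR8 = 0b0000, 0b0001, 0b0010, 0b0011
REGISTER = 0b0100
IMM64, IMM32, IMM16, IMM8 = 0b0101, 0b0110, 0b0111, 0b1000

# Assumed encoded size of each operand kind, in bytes.
OPERAND_BYTES = {
    ADDR64: 8, ADDR32: 8, ADDR16: 8, ADDR8: 8,
    REGISTER: 1,
    IMM64: 8, IMM32: 4, IMM16: 2, IMM8: 1,
}

def encode_add_reg_imm64(dest_reg, imm):
    """Encode `add <register>, <64-bit immediate>` under the assumptions above."""
    opcode = struct.pack("<H", 0x1000)            # ADD opcode
    spec = bytes([(REGISTER << 4) | IMM64])       # DDDDSSSS byte
    dest = bytes([dest_reg & 0x3F])               # 6-bit register index
    source = struct.pack("<Q", imm & 0xFFFF_FFFF_FFFF_FFFF)
    return opcode + spec + dest + source
```

Under these assumptions the instruction is 2 + 1 + 1 + 8 = 12 bytes long.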

Arithmetic

Arithmetic instructions store their result in the destination, which is the first operand. Overflow wraps around: a result that exceeds the largest representable value continues from 0.
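
For 64-bit operands, wrapping can be sketched as masking the result to 64 bits. This reads "wrapping" as arithmetic modulo 2^64, which is an interpretation; the function names are hypothetical.

```python
WORD_MASK = (1 << 64) - 1

def wrapping_add(a, b):
    """Add two 64-bit values, discarding any carry out of bit 63."""
    return (a + b) & WORD_MASK

def wrapping_sub(a, b):
    """Subtract b from a, wrapping below zero to the top of the range."""
    return (a - b) & WORD_MASK

def wrapping_mul(a, b):
    """Multiply two 64-bit values, keeping the low 64 bits."""
    return (a * b) & WORD_MASK
```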

  • Add
    • Opcode: 0x1000
    • Params: Destination, source
  • Sub
    • Opcode: 0x1001
    • Params: Destination, source
  • Mul
    • Opcode: 0x1002
    • Params: Destination, source
  • Div
    • Opcode: 0x1003
    • Params: Destination, source
  • Mod
    • Opcode: 0x1004
    • Params: Destination, source
  • And
    • Opcode: 0x1005
    • Params: Destination, source
  • Or
    • Opcode: 0x1006
    • Params: Destination, source
  • Xor
    • Opcode: 0x1007
    • Params: Destination, source
  • Shl
    • Opcode: 0x1008
    • Params: Destination, source
  • Shr
    • Opcode: 0x1009
    • Params: Destination, source
  • INeg
    • Opcode: 0x100a
    • Params: Destination, source
  • Inv
    • Opcode: 0x100b
    • Params: Destination, source
  • Not
    • Opcode: 0x100c
    • Params: Destination, source

TODO

  • Add signed instructions (iadd, imul, etc)
  • Sign-extending SHR
  • Overflow flag?

Control flow

  • CmpEq
    • Opcode: 0x2000
    • Params: Source, source
  • CmpLt
    • Opcode: 0x2001
    • Params: Source, source
  • Jmp
    • Opcode: 0x2002
    • Params: Source
  • Jz
    • Opcode: 0x2003
    • Params: Source
  • Jnz
    • Opcode: 0x2004
    • Params: Source

Functions

  • Call
    • Opcode: 0x3000
    • Params: Source
    • When this instruction is executed, these actions occur:
      • Push the current stack frame pointer
      • Push the IP of the next instruction
      • Update the IP (i.e., jump) to the value at the given source.
      • Update the frame pointer to the current stack pointer - 16
  • Ret
    • Opcode: 0x3001
    • When this instruction is executed, these actions occur:
      • Update the stack pointer to the current frame pointer + 16.
      • Pop the IP of the next instruction.
      • Pop the old stack frame.
      • Restore the last three values in an undefined order
  • Push
    • Opcode: 0x3002
    • Params: Source
    • When this instruction is executed, these actions occur:
      • Set the value in memory at the current stack pointer to the source value.
      • Increment the stack pointer by the size of the source value.
  • Pop
    • Opcode: 0x3003
    • Params: Dest
    • When this instruction is executed, these actions occur:
      • Decrement the stack pointer by the size of the destination value.
      • Copy the value at the stack pointer into the destination.
  • Int
    • Opcode: 0x3004
    • Params: Source, Source
    • When this instruction is executed, these actions occur:
      • Push the current stack frame pointer
      • Push the IP of the next instruction to be executed
      • Push the FLAGS register
      • Push the STATUS register
      • Push the R0-R31 registers
      • Update the IP (i.e., jump) to the address of the given interrupt vector in the IVT
      • Update the R0 register to the value in the first parameter
      • Update the R1 register to the value in the second parameter
      • Update the frame pointer to the current stack pointer - 288
  • IRet
    • Opcode: 0x3005
    • When this instruction is executed, these actions occur:
      • Update the stack pointer to the current frame pointer + 288
      • Pop the old R1 value
      • Pop the old R0 value
      • Pop the old STATUS value
      • Pop the old FLAGS value
      • Pop the IP of the next instruction
      • Pop the old stack frame
      • Restore the last 6 values in an undefined order
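
The push, pop, call, and ret steps above can be traced with a small model. This is a sketch only: it handles 64-bit values exclusively, and the Machine class and its method names are hypothetical. Note that the stack grows upward, because push increments the stack pointer.

```python
import struct

class Machine:
    """Minimal model of the stack-related state: memory plus IP, SP and FP."""

    def __init__(self, memory_size=256):
        self.memory = bytearray(memory_size)
        self.ip = 0
        self.sp = 0
        self.fp = 0

    def push(self, value):
        # Store at the current SP, then move SP up by the value size.
        struct.pack_into("<Q", self.memory, self.sp, value)
        self.sp += 8

    def pop(self):
        # Move SP down by the value size, then read from it.
        self.sp -= 8
        return struct.unpack_from("<Q", self.memory, self.sp)[0]

    def call(self, target, next_ip):
        self.push(self.fp)        # save the caller's frame pointer
        self.push(next_ip)        # save the return address
        self.ip = target
        self.fp = self.sp - 16    # FP points at the saved frame pointer

    def ret(self):
        self.sp = self.fp + 16    # drop anything the callee left behind
        self.ip = self.pop()      # return address
        self.fp = self.pop()      # caller's frame pointer
```

After a call followed by a ret, IP holds the return address and SP and FP are back at their values from before the call, even if the callee pushed values that it never popped.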

Data movement

  • Mov
    • Opcode: 0x4000
    • Params: Source, Dest

Miscellaneous

  • Halt
    • Opcode: 0xF000
  • Nop
    • Opcode: 0xF001
  • Dump
    • Opcode: 0xF002

Interrupts

Interrupts are signaled either explicitly by software or by hardware. When an interrupt signal is set, the CPU finishes the instruction it is executing and then begins handling the interrupt whose signal was set. Software interrupts are invoked using the int instruction, which supplies the index of the interrupt to invoke. Hardware interrupts are invoked directly by a hardware event, e.g. a keypress. The CPU treats hardware and software interrupts equally, so all of them are maskable.

An interrupt may be masked in two ways: either through its entry in the IVT, or through the "enable interrupts" CPU flag. If the "enabled" bit in the IVT is not set, that interrupt will not be handled when it is invoked. If the "enable interrupts" CPU flag is not set, no interrupts will be handled.
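
The two masking levels combine as a logical AND, as in this sketch (the function name is hypothetical):

```python
ENABLE_INTERRUPTS_BIT = 2  # bit index of the "enable interrupts" CPU flag

def should_handle(flags, entry_enabled):
    """An interrupt is handled only if both the CPU flag and its IVT entry allow it."""
    globally_enabled = bool((flags >> ENABLE_INTERRUPTS_BIT) & 1)
    return globally_enabled and entry_enabled
```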

Interrupt vector table

Interrupts are defined by the IVT register. The address stored in the IVT register must be a multiple of 64. The IVT always has 512 entries, with 8 bytes for each entry. Thus, the entire table is 512 * 8 = 4096 bytes, or one page.
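
Locating an entry is then a matter of simple address arithmetic. A sketch with hypothetical names:

```python
IVT_ENTRIES = 512
IVT_ENTRY_BYTES = 8
IVT_ALIGNMENT = 64
IVT_BYTES = IVT_ENTRIES * IVT_ENTRY_BYTES  # 4096 bytes, one page

def ivt_entry_address(ivt_base, vector):
    """Return the memory address of the IVT entry for the given vector."""
    if ivt_base % IVT_ALIGNMENT != 0:
        raise ValueError("IVT base must be a multiple of 64")
    if not 0 <= vector < IVT_ENTRIES:
        raise ValueError("vector out of range")
    return ivt_base + vector * IVT_ENTRY_BYTES
```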

Interrupt table entries

Interrupt table entries make up the interrupt vector table, each entry being 64 bits (8 bytes) long.

  • 1 bit - Enabled
  • 4 bits - Reserved, set to 0
  • 59 bits - Interrupt address, multiplied by 64 for the start address
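
An entry could be built and decoded as in the sketch below. The bit positions are an assumption: the fields are taken to be listed from the most significant bit down, so the enabled bit is bit 63 and the address field occupies the low 59 bits. The function names are hypothetical.

```python
ADDRESS_BITS = 59
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1

def make_entry(enabled, handler_address):
    """Build a 64-bit IVT entry for a handler at a 64-byte-aligned address."""
    if handler_address % 64 != 0:
        raise ValueError("handler address must be a multiple of 64")
    field = handler_address // 64
    if field > ADDRESS_MASK:
        raise ValueError("handler address does not fit in 59 bits")
    return (int(enabled) << 63) | field   # reserved bits stay 0

def decode_entry(entry):
    """Return (enabled, handler start address) for a 64-bit IVT entry."""
    enabled = bool((entry >> 63) & 1)
    address = (entry & ADDRESS_MASK) * 64
    return enabled, address
```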

Interrupt handling

After an interrupt is signaled, the CPU looks up the index of the interrupt in the IVT, calculates its address, sets up the stack for the interrupt handler, and jumps to the interrupt handler's address.

The interrupt stack is structured similarly to a normal call stack, but since interrupts may be invoked at any time, it saves additional state. Interrupt handlers have two explicit arguments: the interrupt index, and an auxiliary 64-bit value or pointer specific to that interrupt. The index is stored in the R0 register, and the auxiliary value is stored in the R1 register. These registers, along with the FP, IP, FLAGS, and STATUS registers, are saved on the stack before an interrupt handler is called.

Before an interrupt handler is called, these actions occur:

  • Push the current stack frame pointer
  • Push the IP of the next instruction to be executed
  • Push the FLAGS register
  • Push the STATUS register
  • Push the R0 register
  • Push the R1 register

Interrupt handlers must be exited using the iret instruction. When an interrupt call is exited, the above actions occur in reverse:

  • Update the stack pointer to the current frame pointer + 48
  • Pop the old R1 value
  • Pop the old R0 value
  • Pop the old STATUS value
  • Pop the old FLAGS value
  • Pop the IP of the next instruction
  • Pop the old stack frame
  • Restore the last 6 values in an undefined order
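
The six saved values occupy 6 * 8 = 48 bytes, which is where the frame pointer offset comes from. The entry and exit sequences can be sketched as follows, using a dict for the registers and another for memory; all names are hypothetical, and setting IP, R0, and R1 on entry follows the description of the handler arguments above.

```python
WORD_BYTES = 8
SAVED = ["FP", "IP", "FLAGS", "STATUS", "R0", "R1"]   # push order
FRAME_BYTES = len(SAVED) * WORD_BYTES                 # 48

def push(regs, mem, value):
    mem[regs["SP"]] = value
    regs["SP"] += WORD_BYTES

def pop(regs, mem):
    regs["SP"] -= WORD_BYTES
    return mem[regs["SP"]]

def enter_interrupt(regs, mem, vector, aux, handler_address):
    # regs["IP"] is assumed to already point at the next instruction.
    for name in SAVED:
        push(regs, mem, regs[name])
    regs["FP"] = regs["SP"] - FRAME_BYTES
    regs["IP"] = handler_address
    regs["R0"] = vector      # first handler argument
    regs["R1"] = aux         # second handler argument

def exit_interrupt(regs, mem):
    regs["SP"] = regs["FP"] + FRAME_BYTES
    # Pop everything first, then restore, so the restore order does not matter.
    popped = {name: pop(regs, mem) for name in reversed(SAVED)}
    regs.update(popped)
```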

Exceptions

The first 256 interrupt vectors are reserved for CPU and hardware-sourced events - these are known as exceptions. Exceptions may occur for a number of reasons:

  • Illegal operation attempted, e.g. divide by zero or accessing protected memory
  • Illegal operation attempted while handling an interrupt (double fault)
  • A hardware event occurred, e.g. a timer tick

The following list defines all exceptions that the CPU may invoke. All vectors in 0-255 that are not defined in this list are reserved and may be used in the future.

  • Divide by zero
    • Interrupt vector: 0
    • Auxiliary: N/A
    • Invoked upon a divide-by-zero
  • Invalid opcode
    • Interrupt vector: 1
    • Auxiliary: N/A
    • Attempted to invoke an illegal opcode
  • Illegal memory address
    • Interrupt vector: 2
    • Auxiliary: Memory address causing the interrupt
    • Attempted to access a memory address in an illegal way - either it's out of bounds or is protected in some way.
  • Hardware event
    • Interrupt vector: 3
    • Auxiliary: Pointer to the hardware event structure.
    • A hardware device has an event that needs attention.
  • Interrupt vector 4-255: Reserved for future use
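
In an implementation, the defined vectors might be captured as an enumeration. A sketch, with hypothetical names:

```python
from enum import IntEnum

class ExceptionVector(IntEnum):
    DIVIDE_BY_ZERO = 0
    INVALID_OPCODE = 1
    ILLEGAL_MEMORY_ADDRESS = 2   # auxiliary: the offending memory address
    HARDWARE_EVENT = 3           # auxiliary: pointer to the hardware event structure

def is_exception_vector(vector):
    """Vectors 0-255 are reserved for exceptions."""
    return 0 <= vector <= 255

def is_reserved_vector(vector):
    """Vectors 4-255 are reserved for future use."""
    return 4 <= vector <= 255
```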

Binary object format

The binary object format is composed of a header followed by sections that make up the content of the object.

Header

The header is composed of:

  • 64 bits - A magic number (0xDEAD_BEA7_BA5E_BA11).
  • 32 bits - Version of the file
  • 32 bits - The number of sections in the file
  • section descriptions detailed below
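
Reading the fixed part of the header could look like the sketch below. It assumes that the file uses the same little-endian byte order as the VM, which this outline does not state for the object format; the names are hypothetical.

```python
import struct

MAGIC = 0xDEAD_BEA7_BA5E_BA11
HEADER = struct.Struct("<QII")   # magic, version, number of sections

def parse_header(data):
    """Return (version, section_count, offset of the first section)."""
    magic, version, section_count = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("bad magic number")
    return version, section_count, HEADER.size
```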

Sections

The rest of the object is a list of sections. A section's layout is a section header, followed by the section contents.

Section header

  • 8 bits - Section kind
    • 0x00 - Data
    • 0xFF - Meta
  • 64 bits - Length of the section
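
A section header is 9 bytes when packed without padding. A parsing sketch, again assuming little-endian byte order and using hypothetical names:

```python
import struct

SECTION_HEADER = struct.Struct("<BQ")   # kind, length ("<" disables padding)
KIND_DATA = 0x00
KIND_META = 0xFF

def parse_section_header(data, offset=0):
    """Return (kind, length, offset of the section contents)."""
    kind, length = SECTION_HEADER.unpack_from(data, offset)
    if kind not in (KIND_DATA, KIND_META):
        raise ValueError(f"unknown section kind: {kind:#04x}")
    return kind, length, offset + SECTION_HEADER.size
```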

Data section

The data section contains static data that is initialized to some known value.

  • 16 bits - length of section name
  • N bits - section name
  • 64 bits - section load start - where in memory the content of this section begins
  • 64 bits - section length - how long the memory content is
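
The fields of a data section could be read as in the following sketch. It assumes that the name length counts bytes, that the name is UTF-8 encoded, and that all integers are little endian; none of this is fixed by the outline, and the function name is hypothetical.

```python
import struct

def parse_data_section(data, offset=0):
    """Return (name, load_start, length, offset just past these fields)."""
    (name_len,) = struct.unpack_from("<H", data, offset)
    offset += 2
    name = data[offset:offset + name_len].decode("utf-8")
    offset += name_len
    load_start, length = struct.unpack_from("<QQ", data, offset)
    offset += 16
    return name, load_start, length, offset
```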

Meta section

The meta section holds a table of metadata about the binary in a key-value format, with string keys mapping to 64-bit values. All strings are UTF-8 encoded.

  • 64 bits - the number of key-value entries

The remainder of the section holds the key-value pairs.

The layout for a key-value pair is the key, followed immediately by the value. The key is a string, and the value is a 64-bit value. A key starts with the length of the string, followed by the key string itself. A value is just the 8 bytes of the number.

The meta section should be used to place data that's readable by the VM, but is not used by the executing program. Data in the meta section is not copied to the program memory.

A VM must provide support for the following meta-values:

  • entry - a 64-bit address for where the VM should begin executing code.
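
A reader for the meta section might look like the sketch below. The width of the key length field is not specified here; the sketch assumes 16 bits counting bytes, mirroring the data section name, and little-endian integers. The function name is hypothetical.

```python
import struct

def parse_meta_section(data, offset=0):
    """Return a dict mapping each key string to its 64-bit value."""
    (count,) = struct.unpack_from("<Q", data, offset)
    offset += 8
    entries = {}
    for _ in range(count):
        (key_len,) = struct.unpack_from("<H", data, offset)  # assumed 16-bit length
        offset += 2
        key = data[offset:offset + key_len].decode("utf-8")
        offset += key_len
        (value,) = struct.unpack_from("<Q", data, offset)
        offset += 8
        entries[key] = value
    return entries
```

A VM would then look up the "entry" key to find the address at which to begin execution.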

General TODO

  • Memory permissions
  • MMIO regions
  • Paging
  • Decide how address sizes are determined
    • source size <= dest size - zero extend source and copy
      • mov %r0, (label)u32
    • source size > dest size - truncate to dest size
      • mov (label)u32, %r0
    • source size with unknown dest size - use dest size == source size
      • mov %r0, (label)
    • unknown source size with dest size - use dest size == source size
      • mov (label), %r0
    • unknown source size with unknown dest size - 64 bits
      • mov (label), (%r0)
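
The size rules proposed in the last item could be prototyped as follows. Since this is still a TODO, the sketch only restates the rules as listed; None stands for an unknown size, and the function names are hypothetical.

```python
def resolve_sizes(source_bits, dest_bits):
    """Fill in unknown (None) operand sizes according to the rules above."""
    if source_bits is None and dest_bits is None:
        return 64, 64
    if dest_bits is None:
        return source_bits, source_bits
    if source_bits is None:
        return dest_bits, dest_bits
    return source_bits, dest_bits

def copy_value(value, source_bits, dest_bits):
    """Zero extend or truncate an unsigned value from source to dest size."""
    value &= (1 << source_bits) - 1        # treat the source as unsigned
    return value & ((1 << dest_bits) - 1)  # truncation; zero extension is a no-op
```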