rasp/vm.md
2020-02-25 12:07:24 -05:00

VM

This is an outline of the VM that drives this language.

Primitives

  • Numbers are little endian (LE) at the byte level.
  • Addresses point to single bytes.
  • Signed numbers use two's complement.

| Type     | Size (bits) |
|----------|-------------|
| Address  | 64          |
| Word     | 64          |
| Halfword | 32          |
| Byte     | 8           |
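
The conventions above (little-endian byte order, two's complement) can be illustrated with a minimal Python sketch built on the standard `struct` module. The helper names are hypothetical and not part of the VM.

```python
import struct

MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def encode_word(value: int) -> bytes:
    """Encode a 64-bit word as 8 little-endian bytes (value is wrapped to 64 bits)."""
    return struct.pack("<Q", value & MASK64)


def decode_word_unsigned(data: bytes) -> int:
    """Decode 8 little-endian bytes as an unsigned word."""
    return struct.unpack("<Q", data)[0]


def decode_word_signed(data: bytes) -> int:
    """Decode 8 little-endian bytes as a two's complement signed word."""
    return struct.unpack("<q", data)[0]


# The least significant byte is stored first.
assert encode_word(1) == b"\x01" + b"\x00" * 7
# -1 in two's complement is all one bits.
assert encode_word(-1) == b"\xff" * 8
```

The same bytes read back as -1 when treated as signed and as 2^64 - 1 when treated as unsigned; only the interpretation differs.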

Registers

CPU registers are addressed by a value from 0 to 63 (6 bits). All registers are 64 bits wide.

  • IP - Instruction pointer
  • SP - Stack pointer
  • FP - Frame pointer
  • FLAGS - CPU flags
  • STATUS - Generic status code
  • NIL - Always reads as zero; writes to it are ignored.
  • R0-R31
  • (26 unused registers)
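
A register file with these properties might be sketched as follows. The index assignment is an assumption made for illustration only: this outline names the registers but does not say which 6-bit index each one receives.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

# Hypothetical index assignment (not specified by this outline).
IP, SP, FP, FLAGS, STATUS, NIL = range(6)
R0 = 6  # R0-R31 would then occupy indices 6..37, leaving 26 indices unused


class RegisterFile:
    """64 registers, each 64 bits wide, addressed by a 6-bit index."""

    def __init__(self) -> None:
        self._regs = [0] * 64

    @staticmethod
    def _check(index: int) -> None:
        if not 0 <= index <= 63:
            raise ValueError("register index must fit in 6 bits")

    def read(self, index: int) -> int:
        self._check(index)
        return self._regs[index]

    def write(self, index: int, value: int) -> None:
        self._check(index)
        if index == NIL:
            return  # writes to NIL are discarded, so it always reads as zero
        self._regs[index] = value & MASK64
```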

CPU Flags

CPU flags are addressed by bit index, counting from right to left, so bit 0 is the least significant bit of the FLAGS register.

  • 00 - Halt flag
  • 01 - Compare flag
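
As a sketch of how bit-indexed flags can be read and updated, assuming FLAGS is held as a plain integer (the helper names are hypothetical):

```python
HALT_FLAG = 0     # bit 0, the rightmost (least significant) bit
COMPARE_FLAG = 1  # bit 1


def set_flag(flags: int, bit: int, on: bool) -> int:
    """Return flags with the given bit set or cleared; other bits are untouched."""
    if on:
        return flags | (1 << bit)
    return flags & ~(1 << bit)


def flag_is_set(flags: int, bit: int) -> bool:
    """Return True if the given bit of flags is 1."""
    return ((flags >> bit) & 1) == 1


flags = set_flag(0, COMPARE_FLAG, True)
assert flags == 0b10
assert not flag_is_set(flags, HALT_FLAG)
```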

Flag ideas

  • "Trace" flag - halts the CPU when certain conditions are met that may be causing undesired behavior - for debugging
    • Overwriting a register without its value being used
    • Mixing arithmetic with bit twiddling on the same target

Instructions

All instructions have 16-bit opcodes. There are five types of instructions:

  • Those whose operations require a source and a destination.
  • Those whose operations require two sources.
    • The destination of these instructions is implied by the instruction itself; e.g. the CMPEQ instruction implicitly sets a bit in the FLAGS register.
  • Those whose operations require a source, but no destination.
  • Those whose operations require a destination, but no source.
    • There aren't any of these instructions yet
  • Those whose operations require neither a source nor a destination.

Destinations may be one of:

  • A 64-bit address pointing at a 64-bit or 8-bit value
  • A 6-bit register

Sources may be one of:

  • A 64-bit address pointing at a 64-bit or 8-bit value
  • A 6-bit register
  • A 64-bit immediate value

Counting all source and destination value sizes as their own configuration, there are:

  • 3 possible destination types
  • 4 possible source types

Instructions have different layouts depending on whether their operation takes a source and/or a destination. For example, the ADD instruction takes a source and a destination, the JMP instruction takes a source, and the NOP instruction takes neither a source nor a destination.

Instructions that take neither a source nor a destination are exactly 16 bits long. All other instructions are followed by one byte that describes their source and/or destination.

An instruction that has a source and destination looks like this:

| XXXXXXXX | XXXXXXXX | DDDDSSSS | ...source and destination |

An instruction that has either a source or a destination (but not both) looks like this:

| XXXXXXXX | XXXXXXXX | YYYY0000 | ...source or destination |

An instruction that has neither a source nor a destination looks like this:

| XXXXXXXX | XXXXXXXX |
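
The three layouts can be sketched as a small encoder for the opcode and the optional operand byte. This is an illustration under stated assumptions: the function name is hypothetical, and storing the 16-bit opcode in little-endian order is inferred from the general rule that numbers are little endian, not stated for opcodes specifically. The operands that follow the header are not encoded here.

```python
import struct
from typing import Optional


def encode_header(opcode: int,
                  first: Optional[int] = None,
                  second: Optional[int] = None) -> bytes:
    """Encode the opcode plus the operand-kind byte, if the instruction has one.

    first  -> high nibble (DDDD, or YYYY for single-operand instructions)
    second -> low nibble (SSSS), left as 0000 when there is only one operand
    """
    if not 0 <= opcode <= 0xFFFF:
        raise ValueError("opcode must fit in 16 bits")
    for kind in (first, second):
        if kind is not None and not 0 <= kind <= 0xF:
            raise ValueError("operand kind must fit in 4 bits")
    head = struct.pack("<H", opcode)  # assumption: little-endian opcode
    if first is None and second is None:
        return head  # no operand byte at all
    if first is None:
        raise ValueError("a single operand kind goes in the high nibble")
    return head + bytes([(first << 4) | (second or 0)])


# ADD (0x0000) with a register destination (0b0100) and a 64-bit immediate source (0b0101)
assert encode_header(0x0000, 0b0100, 0b0101) == bytes([0x00, 0x00, 0x45])
# NOP (0xF001) has no operand byte
assert encode_header(0xF001) == bytes([0x01, 0xF0])
```

Note that a single-operand instruction and a two-operand instruction whose second kind is 0b0000 produce the same byte; the opcode is what tells the decoder how many operands to expect.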

Source/destination flags

| Bits   | Source/destination                |
|--------|-----------------------------------|
| 0b0000 | Address (64-bit value)            |
| 0b0001 | Address (32-bit value)            |
| 0b0010 | Address (16-bit value)            |
| 0b0011 | Address (8-bit value)             |
| 0b0100 | 6-bit register                    |
| 0b0101 | Immediate (64 bits, source only)  |
| 0b0110 | Immediate (32 bits, source only)  |
| 0b0111 | Immediate (16 bits, source only)  |
| 0b1000 | Immediate (8 bits, source only)   |
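
One way to work with these flag values is an enumeration plus a decoder for the operand byte. The names below are hypothetical; the numeric values come from the table above.

```python
from enum import IntEnum
from typing import Tuple


class OperandKind(IntEnum):
    ADDR_64 = 0b0000
    ADDR_32 = 0b0001
    ADDR_16 = 0b0010
    ADDR_8 = 0b0011
    REGISTER = 0b0100
    IMM_64 = 0b0101
    IMM_32 = 0b0110
    IMM_16 = 0b0111
    IMM_8 = 0b1000


def valid_as_destination(kind: OperandKind) -> bool:
    """Immediates are source only; addresses and registers may be written to."""
    return kind < OperandKind.IMM_64


def decode_spec(spec: int) -> Tuple[OperandKind, OperandKind]:
    """Split a DDDDSSSS byte into (destination kind, source kind).

    Raises ValueError for nibble values that the table does not define.
    """
    return OperandKind(spec >> 4), OperandKind(spec & 0x0F)


assert decode_spec(0x45) == (OperandKind.REGISTER, OperandKind.IMM_64)
assert not valid_as_destination(OperandKind.IMM_8)
```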

Arithmetic

Arithmetic instructions store their result in the destination, which is the first parameter. On overflow the result wraps around; for example, adding 1 to the largest representable value yields 0.

  • Add
    • Opcode: 0x0000
    • Params: Destination, source
  • Sub
    • Opcode: 0x0001
    • Params: Destination, source
  • Mul
    • Opcode: 0x0002
    • Params: Destination, source
  • Div
    • Opcode: 0x0003
    • Params: Destination, source
  • Mod
    • Opcode: 0x0004
    • Params: Destination, source
  • And
    • Opcode: 0x0005
    • Params: Destination, source
  • Or
    • Opcode: 0x0006
    • Params: Destination, source
  • Xor
    • Opcode: 0x0007
    • Params: Destination, source
  • Shl
    • Opcode: 0x0008
    • Params: Destination, source
  • Shr
    • Opcode: 0x0009
    • Params: Destination, source
  • INeg
    • Opcode: 0x000a
    • Params: Destination, source
  • Inv
    • Opcode: 0x000b
    • Params: Destination, source
  • Not
    • Opcode: 0x000c
    • Params: Destination, source
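
Wrap-around behavior can be sketched by masking results to the operand width. This assumes unsigned 64-bit operands; signed variants are still listed as a TODO, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def add(dest: int, source: int) -> int:
    """Add with wrap-around at 64 bits."""
    return (dest + source) & MASK64


def sub(dest: int, source: int) -> int:
    """Subtract with wrap-around at 64 bits."""
    return (dest - source) & MASK64


def mul(dest: int, source: int) -> int:
    """Multiply, keeping only the low 64 bits of the product."""
    return (dest * source) & MASK64


assert add(MASK64, 1) == 0   # the largest value plus one wraps to zero
assert sub(0, 1) == MASK64   # underflow wraps to the largest value
```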

TODO

  • Add signed instructions (iadd, imul, etc)
  • Sign-extending SHR
  • Overflow flag?

Control flow

  • CmpEq
    • Opcode: 0x1000
    • Params: Source, source
  • CmpLt
    • Opcode: 0x1001
    • Params: Source, source
  • Jmp
    • Opcode: 0x1002
    • Params: Source
  • Jz
    • Opcode: 0x1003
    • Params: Source
  • Jnz
    • Opcode: 0x1004
    • Params: Source

Data movement

  • Mov
    • Opcode: 0x2000
    • Params: Destination, source

Miscellaneous

  • Halt
    • Opcode: 0xF000
  • Nop
    • Opcode: 0xF001
  • Dump
    • Opcode: 0xF002

Other instructions TODO

  • Call
    • Takes address and number of bytes on the stack that are for args(?)
    • Updates SP, FP, IP, storing previous values starting at the new FP
  • Ret
    • Uses FP to determine previous SP, FP, and IP and restores them
  • Push
  • Pop

Binary object format

The binary object format is composed of a header followed by sections that make up the content of the object.

Header

The header is composed of:

  • 64 bits - A magic number (0xDEAD_BEA7_BA5E_BA11).
  • 32 bits - Version of the file
  • 32 bits - The number of sections in the file
  • The sections themselves, in the format described below
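
The fixed part of the header is 16 bytes and might be packed and parsed as follows. This sketch assumes the file uses the same little-endian byte order as the VM; the function names are hypothetical.

```python
import struct

MAGIC = 0xDEAD_BEA7_BA5E_BA11
# "<" selects little-endian with no padding: u64 magic, u32 version, u32 section count.
HEADER = struct.Struct("<QII")


def pack_header(version: int, section_count: int) -> bytes:
    """Build the 16-byte object header."""
    return HEADER.pack(MAGIC, version, section_count)


def parse_header(data: bytes):
    """Return (version, section_count), or raise ValueError on a bad header."""
    if len(data) < HEADER.size:
        raise ValueError("object is too short to contain a header")
    magic, version, section_count = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("bad magic number")
    return version, section_count


assert HEADER.size == 16
assert parse_header(pack_header(1, 2)) == (1, 2)
```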

Sections

The rest of the object is a list of sections. A section's layout is a section header, followed by the section contents.

Section header

  • 8 bits - Section kind
    • 0x00 - Data
    • 0xFF - Meta
  • 64 bits - Length of the section
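
A section header is therefore 9 bytes: one byte for the kind, then eight for the length. A minimal parsing sketch, assuming little-endian fields and using hypothetical names:

```python
import struct

KIND_DATA = 0x00
KIND_META = 0xFF
# "<" disables alignment padding, so this is exactly 1 + 8 = 9 bytes.
SECTION_HEADER = struct.Struct("<BQ")


def parse_section_header(data: bytes, offset: int = 0):
    """Return (kind, length, offset of the section contents)."""
    kind, length = SECTION_HEADER.unpack_from(data, offset)
    if kind not in (KIND_DATA, KIND_META):
        raise ValueError(f"unknown section kind 0x{kind:02X}")
    return kind, length, offset + SECTION_HEADER.size


raw = bytes([KIND_META]) + (16).to_bytes(8, "little")
assert parse_section_header(raw) == (KIND_META, 16, 9)
```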

Data section

The data section contains static data that is initialized to some known value.

  • 64 bits - section load start - where in memory the content of this section begins
  • 64 bits - section length - how long the memory content is
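
Loading a data section into VM memory could look like the sketch below. It assumes that the memory content follows the two 64-bit fields directly and that fields are little endian; neither detail is spelled out above, and the function name is hypothetical.

```python
import struct

# u64 load start, u64 content length
DATA_PREFIX = struct.Struct("<QQ")


def load_data_section(body: bytes, memory: bytearray) -> None:
    """Copy the section's content into memory at its load address."""
    start, length = DATA_PREFIX.unpack_from(body, 0)
    content = body[DATA_PREFIX.size:DATA_PREFIX.size + length]
    if len(content) != length:
        raise ValueError("data section is truncated")
    if start + length > len(memory):
        raise ValueError("data section does not fit in memory")
    memory[start:start + length] = content


memory = bytearray(8)
load_data_section(struct.pack("<QQ", 2, 3) + b"abc", memory)
assert memory == b"\x00\x00abc\x00\x00\x00"
```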

Meta section

The meta section holds a table of metadata about the binary in a key-value format, with string keys mapping to 64-bit values. All strings are UTF-8 encoded.

  • 64 bits - the number of key-value entries

The remainder of the section consists of the key-value pairs.

The layout for a key-value pair is the key, followed immediately by the value. The key is a string, and the value is a 64-bit value. A key starts with the length of the string, followed by the key string itself. A value is just the 8 bytes of the number.

The meta section should be used to place data that's readable by the VM, but is not used by the executing program. Data in the meta section is not copied to the program memory.

A VM must provide support for the following meta-values:

  • entry - a 64-bit address for where the VM should begin executing code.
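
A parser for the meta section might be sketched as below. The width of the key length prefix is not specified above; this sketch assumes a 64-bit little-endian length, and the function name is hypothetical.

```python
import struct
from typing import Dict


def parse_meta(body: bytes) -> Dict[str, int]:
    """Parse a meta section body into a dict of key -> 64-bit value."""
    (count,) = struct.unpack_from("<Q", body, 0)
    offset = 8
    entries: Dict[str, int] = {}
    for _ in range(count):
        # Assumption: the key length prefix is a 64-bit little-endian integer.
        (key_len,) = struct.unpack_from("<Q", body, offset)
        offset += 8
        key_bytes = body[offset:offset + key_len]
        if len(key_bytes) != key_len:
            raise ValueError("meta section is truncated")
        offset += key_len
        (value,) = struct.unpack_from("<Q", body, offset)
        offset += 8
        entries[key_bytes.decode("utf-8")] = value
    return entries


body = struct.pack("<Q", 1) + struct.pack("<Q", 5) + b"entry" + struct.pack("<Q", 0x1000)
assert parse_meta(body) == {"entry": 0x1000}
```

A VM would then look up `entry` in the resulting table to find where execution begins.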

General TODO

  • Interrupts
  • MMIO regions
  • Paging?
  • Decide how operand sizes are determined
    • source size <= dest size - zero extend source and copy
      • mov %r0, (label)u32
    • source size > dest size - truncate to dest size
      • mov (label)u32, %r0
    • source size with unknown dest size - use dest size == source size
      • mov (label), %r0
    • unknown source size with dest size - use source size == dest size
      • mov %r0, (label)
    • unknown source size with unknown dest size - 64 bits
      • mov (label), (%r0)
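
The proposed size rules can be written down as a small function to check that they cover every case. This is only a sketch of the idea in the list above, with a hypothetical name; `None` stands for an operand whose size is unknown.

```python
from typing import Optional, Tuple


def resolve_sizes(source_bits: Optional[int],
                  dest_bits: Optional[int]) -> Tuple[int, int, str]:
    """Return (source size, dest size, action) for a move."""
    if source_bits is None and dest_bits is None:
        return 64, 64, "copy"                  # both unknown: default to 64 bits
    if source_bits is None:
        return dest_bits, dest_bits, "copy"    # unknown source takes the dest size
    if dest_bits is None:
        return source_bits, source_bits, "copy"  # unknown dest takes the source size
    if source_bits <= dest_bits:
        return source_bits, dest_bits, "zero-extend"
    return source_bits, dest_bits, "truncate"


assert resolve_sizes(32, 64) == (32, 64, "zero-extend")
assert resolve_sizes(64, 32) == (64, 32, "truncate")
assert resolve_sizes(None, None) == (64, 64, "copy")
```

When both sizes are known and equal, the zero-extend case applies but has nothing to extend, so it amounts to a plain copy.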