rasp/vm.md
Alek Ratzloff e198da5825 Finish up parser and assembler with more-or-less complete syntax
Major changes include:

* Bit the bullet and now instructions have their length hard-coded
* Move from_utf8 object parsing to be done by their objects (instead of
  a Parser god object)
* A list of AST sections are assembled into an Object using the new
  vm::obj::assemble module.
* Changed the object layout some in the spec, and adjusted code to match
  this.

Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
2020-02-09 13:04:56 -05:00


VM

This is an outline of the VM that drives this language.

Primitives

  • Numbers may be big endian (BE) or little endian (LE) at the byte level. This guide will use LE.
  • Addresses point to single bytes.
  • Signed numbers use two's complement.

| Type     | Size (bits) |
|----------|-------------|
| Address  | 64          |
| Word     | 64          |
| Halfword | 32          |
| Byte     | 8           |
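
As a rough illustration of these conventions, the sketch below shows how a 64-bit word would be laid out in little-endian byte order and how a signed value maps onto its two's complement bit pattern. The function names are hypothetical and are not taken from the VM's source.

```rust
/// Encode a 64-bit word as 8 bytes, least significant byte first (LE).
pub fn word_to_bytes(word: u64) -> [u8; 8] {
    word.to_le_bytes()
}

/// Decode 8 little-endian bytes back into a 64-bit word.
pub fn word_from_bytes(bytes: [u8; 8]) -> u64 {
    u64::from_le_bytes(bytes)
}

/// Reinterpret a signed value as its two's complement bit pattern.
pub fn signed_to_word(value: i64) -> u64 {
    value as u64
}
```

Under these rules the signed value -1 has every bit set, so it is indistinguishable from the largest unsigned word.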

Registers

CPU registers are addressed by a 6-bit value from 0 to 63. All registers are 64 bits wide.

  • IP - Instruction pointer
  • SP - Stack pointer
  • FP - Frame pointer
  • FLAGS - CPU flags
  • NULL - Always reads as zero; writes to it are discarded.
  • (8 unused registers)
  • STATUS - Generic status code
  • R0-R49
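
A minimal sketch of a register file follows. It ASSUMES that registers are numbered in the order they are listed (IP = 0 through R49 = 63); the list does not state the numbering explicitly, so the constants here are illustrative only. The `Registers` type and its methods are hypothetical names.

```rust
// ASSUMPTION: indices follow the order of the list in this section.
pub const IP: u8 = 0;
pub const SP: u8 = 1;
pub const FP: u8 = 2;
pub const FLAGS: u8 = 3;
pub const NULL: u8 = 4;
// Indices 5..=12 would be the eight unused registers.
pub const STATUS: u8 = 13;
pub const R0: u8 = 14; // R49 would then be index 63.

/// All 64 registers, each 64 bits wide.
pub struct Registers {
    values: [u64; 64],
}

impl Registers {
    pub fn new() -> Self {
        Registers { values: [0; 64] }
    }

    /// Read a register; NULL always reads as zero.
    pub fn read(&self, index: u8) -> u64 {
        let index = index & 0x3F; // keep only the 6 address bits
        if index == NULL {
            0
        } else {
            self.values[index as usize]
        }
    }

    /// Write a register; writes to NULL are silently dropped.
    pub fn write(&mut self, index: u8, value: u64) {
        let index = index & 0x3F;
        if index != NULL {
            self.values[index as usize] = value;
        }
    }
}
```

Special-casing NULL in both `read` and `write` keeps the rest of the interpreter free of checks for it.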

CPU Flags

CPU flags are addressed by bit index, counting from the least significant (rightmost) bit.

  • 00 - Halt flag
  • 01 - Compare flag
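
Reading and updating a flag comes down to a shift and a mask on the FLAGS register. The helpers below are a sketch with hypothetical names; only the two bit positions come from the list above.

```rust
pub const HALT_BIT: u32 = 0;
pub const COMPARE_BIT: u32 = 1;

/// Returns true if the flag at `bit` is set in `flags`.
pub fn flag_is_set(flags: u64, bit: u32) -> bool {
    (flags >> bit) & 1 == 1
}

/// Returns `flags` with the flag at `bit` set or cleared.
pub fn with_flag(flags: u64, bit: u32, on: bool) -> u64 {
    if on {
        flags | (1u64 << bit)
    } else {
        flags & !(1u64 << bit)
    }
}
```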

Flag ideas

  • "Trace" flag - halts the CPU when certain conditions are met that may be causing undesired behavior - for debugging
    • Overwriting a register without its value being used
    • Mixing arithmetic with bit twiddling on the same target

Register ideas

  • Other possible names for NULL: Z, NIL

Instructions

Instructions attempt to be as small as possible while conforming to 8-bit, 16-bit, 32-bit, or 64-bit alignment. All instructions have 16-bit opcodes.

Arithmetic

Arithmetic instructions store their result in the first register specified. Overflow wraps around: results are truncated to 64 bits (taken modulo 2^64).
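
Assuming the wrap-around described here means ordinary modulo-2^64 arithmetic, Rust's `wrapping_*` methods on `u64` match the behavior directly. The function names below are hypothetical stand-ins for the Add, Mul, and INeg instructions.

```rust
/// Add: unsigned addition that wraps on overflow.
pub fn add(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}

/// Mul: unsigned multiplication that wraps on overflow.
pub fn mul(a: u64, b: u64) -> u64 {
    a.wrapping_mul(b)
}

/// INeg: two's complement negation of the 64-bit pattern.
pub fn ineg(a: u64) -> u64 {
    a.wrapping_neg()
}
```

For example, adding 1 to the largest 64-bit value yields 0.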

  • Add
    • Opcode: 0x0000
    • Params: REG1, REG2
    • REG1 = REG1 + REG2
    • Unsigned addition
    32                 16       10        4     0
     opcode              reg1     reg2     unused
    /                  /        /         /
    +-------------------------------------------+
    | 0000000000000000 | ...... | ...... | XXXX |
    +-------------------------------------------+
  • Mul
    • Opcode: 0x0001
    • Params: REG1, REG2
    • REG1 = REG1 * REG2
    • Unsigned multiplication
    32                 16       10        4     0
     opcode              reg1     reg2     unused
    /                  /        /         /
    +-------------------------------------------+
    | 0000000000000001 | ...... | ...... | XXXX |
    +-------------------------------------------+
  • Div
    • Opcode: 0x0002
    • Params: REG1, REG2
    • REG1 = REG1 / REG2
    • Unsigned division
    32                 16       10        4     0
     opcode              reg1     reg2     unused
    /                  /        /         /
    +-------------------------------------------+
    | 0000000000000010 | ...... | ...... | XXXX |
    +-------------------------------------------+
  • Mod
    • Opcode: 0x0003
    • Params: REG1, REG2
    • REG1 = REG1 % REG2 (exact semantics TBD)
    32                 16       10        4     0
     opcode              reg1     reg2     unused
    /                  /        /         /
    +-------------------------------------------+
    | 0000000000000011 | ...... | ...... | XXXX |
    +-------------------------------------------+
  • INeg
    • Opcode: 0x0004
    • Params: REG1
    • REG1 = REG1 * -1
    • Signed (two's complement) negation
    32                 16       10           0
     opcode              reg1    unused
    /                  /        /
    +----------------------------------------+
    | 0000000000000100 | ...... | XXXXXXXXXX |
    +----------------------------------------+
  • And
    • Opcode: 0x0005
    • Params: REG1, REG2
    • REG1 = REG1 & REG2
    32                 16       10        4     0
     opcode              reg1     reg2     unused
    /                  /        /         /
    +-------------------------------------------+
    | 0000000000000101 | ...... | ...... | XXXX |
    +-------------------------------------------+
  • Or
    • Opcode: 0x0006
    • Params: REG1, REG2
    • REG1 = REG1 | REG2
    32                 16       10        4     0
     opcode              reg1     reg2     unused
    /                  /        /         /
    +-------------------------------------------+
    | 0000000000000110 | ...... | ...... | XXXX |
    +-------------------------------------------+
  • Inv
    • Opcode: 0x0007
    • Params: REG1
    • REG1 = ~REG1
    32                 16       10           0
     opcode              reg1    unused
    /                  /        /
    +----------------------------------------+
    | 0000000000000111 | ...... | XXXXXXXXXX |
    +----------------------------------------+
  • Not
    • Opcode: 0x0008
    • Params: REG1
    •   if REG1 == 0 {
            REG1 = 1;
        } else {
            REG1 = 0;
        }
    • Boolean NOT; equivalent of C's `!` unary operator
    32                 16       10           0
     opcode              reg1    unused
    /                  /        /
    +----------------------------------------+
    | 0000000000001000 | ...... | XXXXXXXXXX |
    +----------------------------------------+
    
  • Xor
    • Opcode: 0x0009
    • Params: REG1, REG2
    • REG1 = REG1 ^ REG2
    32                 16       10        4     0
     opcode              reg1     reg2     unused
    /                  /        /         /
    +-------------------------------------------+
    | 0000000000001001 | ...... | ...... | XXXX |
    +-------------------------------------------+
  • Shl
    • Opcode: 0x000A
    • Params: REG1, REG2
    • REG1 = REG1 << REG2
    32                 16       10        4     0
     opcode              reg1     reg2     unused
    /                  /        /         /
    +-------------------------------------------+
    | 0000000000001010 | ...... | ...... | XXXX |
    +-------------------------------------------+
  • Shr
    • Opcode: 0x000B
    • Params: REG1, REG2
    • REG1 = REG1 >> REG2
    • Does not sign extend
    32                 16       10        4     0
     opcode              reg1     reg2     unused
    /                  /        /         /
    +-------------------------------------------+
    | 0000000000001011 | ...... | ...... | XXXX |
    +-------------------------------------------+
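
The two-register layout used by most of the instructions above (16-bit opcode, two 6-bit register fields, 4 unused bits) can be packed and unpacked with shifts and masks. This is a sketch with hypothetical function names; it treats the instruction as a single 32-bit value and does not address how its bytes are ordered in memory.

```rust
/// Pack an opcode and two 6-bit register indices into a 32-bit instruction.
/// The low 4 bits are unused and left as zero.
pub fn encode_reg2(opcode: u16, reg1: u8, reg2: u8) -> u32 {
    ((opcode as u32) << 16)
        | (((reg1 & 0x3F) as u32) << 10)
        | (((reg2 & 0x3F) as u32) << 4)
}

/// Split a 32-bit instruction into (opcode, reg1, reg2).
pub fn decode_reg2(word: u32) -> (u16, u8, u8) {
    let opcode = (word >> 16) as u16;
    let reg1 = ((word >> 10) & 0x3F) as u8;
    let reg2 = ((word >> 4) & 0x3F) as u8;
    (opcode, reg1, reg2)
}
```

Single-register instructions such as INeg fit the same helpers if `reg2` is passed as 0, since their unused field covers the same bits.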
    
    

TODO

  • Add signed instructions (iadd, imul, etc)
  • Sign-extending SHR
  • Overflow flag?

Control flow

  • CmpEq

    • Opcode: 0x1000
    • Params: REG1, REG2
    •   if REG1 == REG2 {
            FLAGS[1] = 1;
        } else {
            FLAGS[1] = 0;
        }
      
    • Sets the COMPARE flag to 1 if REG1 == REG2, and to 0 otherwise.
    32                 16       10        4     0
     opcode              reg1     reg2     unused
    /                  /        /         /
    +-------------------------------------------+
    | 0001000000000000 | ...... | ...... | XXXX |
    +-------------------------------------------+
    
  • CmpLt

    • Opcode: 0x1001
    • Params: REG1, REG2
    •   if REG1 < REG2 {
            FLAGS[1] = 1;
        } else {
            FLAGS[1] = 0;
        }
      
    • Sets the COMPARE flag to 1 if REG1 < REG2, and to 0 otherwise.
    32                 16       10        4     0
     opcode              reg1     reg2     unused
    /                  /        /         /
    +-------------------------------------------+
    | 0001000000000001 | ...... | ...... | XXXX |
    +-------------------------------------------+
    
  • Jmp

    • Opcode: 0x1100
    • Params: REG1
    • IP = REG1;
    • Jumps to the address in REG1 unconditionally.

    32                 16       10           0
     opcode              reg1    unused
    /                  /        /
    +----------------------------------------+
    | 0001000100000000 | ...... | XXXXXXXXXX |
    +----------------------------------------+

    
    
  • Jz

    • Opcode: 0x1101
    • Params: REG1
    •   if FLAGS[1] == 0 {
            IP = REG1;
        }
      
    • Jumps to the address in REG1 if COMPARE flag is 0.
    32                 16       10           0
     opcode              reg1    unused
    /                  /        /
    +----------------------------------------+
    | 0001000100000001 | ...... | XXXXXXXXXX |
    +----------------------------------------+
    
  • Jnz

    • Opcode: 0x1102
    • Params: REG1
    •   if FLAGS[1] != 0 {
            IP = REG1;
        }
      
    • Jumps to the address in REG1 if COMPARE flag is 1.
    32                 16       10           0
     opcode              reg1    unused
    /                  /        /
    +----------------------------------------+
    | 0001000100000010 | ...... | XXXXXXXXXX |
    +----------------------------------------+
    +----------------------------------------+
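
The compare and jump instructions above interact only through the COMPARE flag. The sketch below models them as pure functions; the names are hypothetical, and treating `CmpLt` as an unsigned comparison is an ASSUMPTION, since signedness is not stated. Here `next_ip` stands for the address execution would continue at if the jump is not taken.

```rust
/// Mask for FLAGS[1], the COMPARE flag.
pub const COMPARE: u64 = 1 << 1;

/// CmpEq: returns the updated FLAGS value.
pub fn cmp_eq(flags: u64, a: u64, b: u64) -> u64 {
    if a == b { flags | COMPARE } else { flags & !COMPARE }
}

/// CmpLt: returns the updated FLAGS value (ASSUMED unsigned comparison).
pub fn cmp_lt(flags: u64, a: u64, b: u64) -> u64 {
    if a < b { flags | COMPARE } else { flags & !COMPARE }
}

/// Jz: jump to `target` if COMPARE is 0, otherwise fall through.
pub fn jz(flags: u64, next_ip: u64, target: u64) -> u64 {
    if (flags & COMPARE) == 0 { target } else { next_ip }
}

/// Jnz: jump to `target` if COMPARE is 1, otherwise fall through.
pub fn jnz(flags: u64, next_ip: u64, target: u64) -> u64 {
    if (flags & COMPARE) != 0 { target } else { next_ip }
}
```

A loop of the form "repeat while R0 < R1" would then be a `CmpLt` followed by a `Jnz` back to the top of the loop.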
    

Data movement

  • Load
    • Opcode: 0x2000
    • Params: REG1, REG2
    • REG1 = MEM[REG2];
    • Sets REG1 to the value at the memory address in REG2.
    32                 16       10        4     0
     opcode              reg1     reg2     unused
    /                  /        /         /
    +-------------------------------------------+
    | 0010000000000000 | ...... | ...... | XXXX |
    +-------------------------------------------+
    
  • RegCopy
    • Opcode: 0x2001
    • Params: REG1, REG2
    • REG1 = REG2
    • Copies the value in REG2 into REG1.
    32                 16       10        4     0
     opcode              reg1     reg2     unused
    /                  /        /         /
    +-------------------------------------------+
    | 0010000000000001 | REG1.. | REG2.. | XXXX |
    +-------------------------------------------+
  • StoreImm64
    • Opcode: 0x2100
    • Params: REG1, IMM_64
    • REG1 = IMM_64
    • Sets REG1 to the specified 64-bit number.
  • StoreImm32
    • Opcode: 0x2101
    • Params: REG1, IMM_32
    • REG1 = IMM_32
    • Sets REG1 to the specified 32-bit number.
    64                 48       42       36     32                                 0
     opcode              reg1     reg2     unused
    /                  /        /        /       immediate 32 bit value
    /                  /        /        /      /
    +------------------------------------------------------------------------------+
    | 0010000100000001 | REG1.. | REG2.. | XXXX | IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII |
    +------------------------------------------------------------------------------+
  • MemCopy
    • Opcode: 0x2200
    • Params: REG1, REG2
    • MEM[REG1] = MEM[REG2]
    • Copies the value at the memory address in REG2 to the memory address in REG1.
    32                 16       10        4     0
     opcode              reg1     reg2     unused
    /                  /        /         /
    +-------------------------------------------+
    | 0010001000000000 | REG1.. | REG2.. | XXXX |
    +-------------------------------------------+
  • Store
    • Opcode: 0x2201
    • Params: REG1, REG2
    • MEM[REG2] = REG1;
    • Sets the value at the memory address in REG2 to the value in REG1.
    32                 16       10        4     0
     opcode              reg1     reg2     unused
    /                  /        /         /
    +-------------------------------------------+
    | 0010001000000001 | REG1.. | REG2.. | XXXX |
    +-------------------------------------------+
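
The width of a `MEM[...]` access is not stated above. The sketch below ASSUMES that Load and Store transfer a full 64-bit word in little-endian order and that an out-of-range access is reported as `None`; both choices, and the function names, are hypothetical.

```rust
use std::convert::{TryFrom, TryInto};

/// Load: read a 64-bit little-endian word starting at `addr`.
/// Returns None if the access would run past the end of memory.
pub fn load(mem: &[u8], addr: u64) -> Option<u64> {
    let start = usize::try_from(addr).ok()?;
    let end = start.checked_add(8)?;
    let bytes: [u8; 8] = mem.get(start..end)?.try_into().ok()?;
    Some(u64::from_le_bytes(bytes))
}

/// Store: write `value` as a 64-bit little-endian word starting at `addr`.
/// Returns None (and writes nothing) if the access is out of range.
pub fn store(mem: &mut [u8], addr: u64, value: u64) -> Option<()> {
    let start = usize::try_from(addr).ok()?;
    let end = start.checked_add(8)?;
    mem.get_mut(start..end)?.copy_from_slice(&value.to_le_bytes());
    Some(())
}
```

Under the same assumptions, MemCopy could be expressed as a `load` from the source address followed by a `store` to the destination.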
    

Miscellaneous

  • Halt
    • Opcode: 0xF000
    • Params: (none)
    • FLAGS[0] = 1
    • Halts the machine
    16
     opcode
    /
    +------------------+
    | 1111000000000000 |
    +------------------+
  • Nop
    • Opcode: 0xF001
    • Params: (none)
    • Does nothing
    16
     opcode
    /
    +------------------+
    | 1111000000000001 |
    +------------------+
    
    

Other instructions TODO

  • Call
    • Takes address and number of bytes on the stack that are for args(?)
    • Updates SP, FP, IP, storing previous values starting at the new FP
  • Ret
    • Uses FP to determine previous SP, FP, and IP and restores them
  • Push
  • Pop
  • More immediate stores?
    • Idea: Store42 (or whatever number of bits) that maximizes the usage of a 64-bit instruction

Binary object format

The binary object format is composed of a header followed by sections that make up the content of the object.

Header

The header is composed of:

  • 64 bits - A magic number (0xDEAD_BEA7_BA5E_BA11).
  • 32 bits - Version of the file
  • 32 bits - The number of sections in the file
  • The sections themselves, as described below
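
Reading the fixed part of the header could look like the sketch below. It ASSUMES that header fields are stored little-endian, matching the convention used elsewhere in this guide; the `Header` type and `parse_header` function are hypothetical names and are not taken from the project's code.

```rust
use std::convert::TryInto;

pub const MAGIC: u64 = 0xDEAD_BEA7_BA5E_BA11;

#[derive(Debug, PartialEq)]
pub struct Header {
    pub version: u32,
    pub section_count: u32,
}

/// Parse the 16-byte fixed header; returns None on truncated input
/// or a magic number mismatch.
pub fn parse_header(bytes: &[u8]) -> Option<Header> {
    let magic = u64::from_le_bytes(bytes.get(0..8)?.try_into().ok()?);
    if magic != MAGIC {
        return None;
    }
    let version = u32::from_le_bytes(bytes.get(8..12)?.try_into().ok()?);
    let section_count = u32::from_le_bytes(bytes.get(12..16)?.try_into().ok()?);
    Some(Header { version, section_count })
}
```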

Sections

The rest of the object is a list of sections. A section's layout is a section header, followed by the section contents.

Section header

  • 8 bits - Section kind
    • 0x00 - Data
    • 0x10 - Code
    • 0xFF - Meta
  • 64 bits - Length of the section
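
A section header is therefore 9 bytes: one kind byte followed by a 64-bit length. The sketch below decodes it, ASSUMING the length is stored little-endian; the type and function names are hypothetical.

```rust
use std::convert::TryInto;

#[derive(Debug, PartialEq)]
pub enum SectionKind {
    Data,
    Code,
    Meta,
}

/// Parse a section header into its kind and length.
/// Returns None on truncated input or an unknown kind byte.
pub fn parse_section_header(bytes: &[u8]) -> Option<(SectionKind, u64)> {
    let kind = match *bytes.first()? {
        0x00 => SectionKind::Data,
        0x10 => SectionKind::Code,
        0xFF => SectionKind::Meta,
        _ => return None,
    };
    let length = u64::from_le_bytes(bytes.get(1..9)?.try_into().ok()?);
    Some((kind, length))
}
```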

Data section

The data section contains static data that is initialized to some known value.

  • 64 bits - section load start - where in memory the content of this section begins
  • 64 bits - section load end - where in memory the content of this section ends

Code section

The code section contains executable code.

  • 64 bits - section load start - where in memory the content of this section begins
  • 64 bits - section load end - where in memory the content of this section ends

The remaining length of the section is the code itself.

Meta section

The meta section holds a table of metadata about the binary in a key-value format, mapping string keys to 64-bit values. All strings are UTF-8 encoded.

  • 64 bits - the number of key-value entries

The remainder of the section consists of the key-value pairs.

The layout for a key-value pair is the key, followed immediately by the value. The key is a string, and the value is a 64-bit value. A key starts with the length of the string, followed by the key string itself. A value is just the 8 bytes of the number.

The meta section should be used to place data that's readable by the VM, but is not used by the executing program. Data in the meta section is not copied to the program memory.

A VM must provide support for the following meta-values:

  • entry - a 64-bit address for where the VM should begin executing code.
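
To make the key-value layout concrete, the sketch below parses a single entry. The width of the key's length prefix is not specified above, so this ASSUMES a 64-bit little-endian byte count; that assumption and the function name are hypothetical.

```rust
use std::convert::{TryFrom, TryInto};

/// Parse one meta entry from the front of `bytes`.
/// Returns the key, the 64-bit value, and the number of bytes consumed.
/// ASSUMPTION: the key is prefixed by a 64-bit little-endian byte length.
pub fn parse_meta_entry(bytes: &[u8]) -> Option<(String, u64, usize)> {
    let key_len = u64::from_le_bytes(bytes.get(0..8)?.try_into().ok()?);
    let key_end = 8usize.checked_add(usize::try_from(key_len).ok()?)?;
    // Keys must be valid UTF-8; anything else is rejected.
    let key = String::from_utf8(bytes.get(8..key_end)?.to_vec()).ok()?;
    let value_end = key_end.checked_add(8)?;
    let value = u64::from_le_bytes(bytes.get(key_end..value_end)?.try_into().ok()?);
    Some((key, value, value_end))
}
```

A loader would call this once per entry, advancing by the returned byte count, and then look up the `entry` key to find where execution starts.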

General TODO

  • Interrupts
  • MMIO regions
  • Paging?